Chars74K English hnd (public)

- Summary
Images and data files of characters and digits drawn on a tablet PC
- License
- ODbL
- Tags
- character-recognition computer-vision handwritten-digits
- Download
tgz (13.0 MB)
- Original Data Format
- tgz
- Description
This dataset contains hand-printed (Hnd) characters used in the English language (Latin characters and Hindu-Arabic numerals). It contains 62 classes, and 55 samples per class, giving a total of 3410 samples.
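The class and sample counts above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 62 classes are the 10 digits plus the 26 uppercase and 26 lowercase Latin letters, which is consistent with the description but not spelled out in it. The variable names are illustrative.

```python
import string

# Assumption: 62 classes = 10 digits + 26 uppercase + 26 lowercase letters.
n_classes = (
    len(string.digits)
    + len(string.ascii_uppercase)
    + len(string.ascii_lowercase)
)

samples_per_class = 55  # stated in the description
total_samples = n_classes * samples_per_class

print(n_classes, total_samples)  # prints "62 3410"
```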
The TGZ file contains two main directories:
Trg: contains M files (in MATLAB format). Each file holds a vector of rows and a vector of columns; together they give the screen coordinates of the data points generated by the strokes used to draw the characters.
Img: contains PNG images of the bitmaps generated from the hand-drawn characters. This is the format used in the paper by de Campos et al. (VISAPP 2009).
Each directory contains a set of sub-directories in the format Samplexxx, where xxx is the class label.
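The Samplexxx naming described above can be turned into class labels with a small helper. The following is a minimal Python sketch, not part of the dataset's own tooling. The function names are hypothetical, and the ordering of the 62 classes (digits, then uppercase, then lowercase letters) is an assumption that the description does not state, so it should be checked against the extracted data.

```python
import re
import string
from pathlib import Path

# ASSUMPTION: class labels 1..62 are ordered digits, uppercase, lowercase.
# The dataset description does not confirm this ordering.
CLASS_CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase

SAMPLE_DIR = re.compile(r"^Sample(\d+)$")


def class_index(dirname):
    """Return the integer class label encoded in a 'Samplexxx' directory name."""
    match = SAMPLE_DIR.match(dirname)
    if match is None:
        raise ValueError(f"not a Samplexxx directory: {dirname!r}")
    return int(match.group(1))


def class_char(index):
    """Map a 1-based class label to a character under the assumed ordering."""
    if not 1 <= index <= len(CLASS_CHARS):
        raise ValueError(f"class label out of range: {index}")
    return CLASS_CHARS[index - 1]


def list_samples(root, suffix=".png"):
    """Map each class label to the sorted files found in root/Samplexxx/."""
    samples = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir() and SAMPLE_DIR.match(sub.name):
            samples[class_index(sub.name)] = sorted(
                p for p in sub.iterdir() if p.suffix.lower() == suffix
            )
    return samples
```

For example, `list_samples("Img")` would collect the PNG files per class once the archive has been extracted, assuming the sub-directories sit directly under `Img`. Passing a different `suffix` would do the same for the files under `Trg`.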
- URLs
- http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
- Publications
T. E. de Campos, B. R. Babu and M. Varma. Character Recognition in Natural Images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), 2009.
http://www.ee.surrey.ac.uk/CVSSP/Publications/papers/deCampos-VISAPP-2009.pdf
This paper tackles the problem of recognizing characters in images of natural scenes. In particular, we focus on recognizing characters in situations that would traditionally not be handled well by OCR techniques. We present an annotated database of images containing English and Kannada characters. The database comprises of images of street scenes taken in Bangalore, India using a standard camera. The problem is addressed in an object categorization framework based on a bag-of-visual-words representation. We assess the performance of various features based on nearest neighbour and SVM classification. It is demonstrated that the performance of the proposed method, using as few as 15 training images, can be far superior to that of commercial OCR systems. Furthermore, the method can benefit from synthetically generated training data obviating the need for expensive data collection and annotation.
- Data Source
- This dataset was captured from 55 volunteers using a tablet PC, with the pen thickness set to match the average thickness found in hand-painted public information boards.
- Measurement Details
Raw trajectories and image bitmaps
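To illustrate how raw trajectories relate to image bitmaps, here is a rough Python sketch that marks trajectory points on a blank canvas. It is not the procedure used to produce the dataset's PNG files. The function name `rasterize` is hypothetical, coordinates are assumed to be 0-based (row, column) pairs, and the sketch neither connects consecutive points nor applies any pen thickness.

```python
def rasterize(rows, cols, height, width):
    """Return a height x width grid of 0/1 with each trajectory point set to 1.

    ASSUMPTION: rows and cols are parallel sequences of 0-based screen
    coordinates. Points that fall outside the canvas are skipped.
    """
    image = [[0] * width for _ in range(height)]
    for r, c in zip(rows, cols):
        ri, ci = round(r), round(c)  # snap to the nearest pixel
        if 0 <= ri < height and 0 <= ci < width:
            image[ri][ci] = 1
    return image


# Three points on a 3x3 canvas; the last one lies outside and is dropped.
grid = rasterize([0, 1, 2], [0, 1, 5], height=3, width=3)
for line in grid:
    print("".join("#" if v else "." for v in line))
```

A fuller version would draw line segments between successive points of a stroke and thicken them, since the description notes that pen thickness was set deliberately during capture.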
- Usage Scenario
Hand-drawn character recognition
- revision 1
- by teo on 2012-03-16 13:46
- revision 2
- by teo on 2012-03-16 13:57
- revision 3
- by teo on 2012-03-16 13:59
- revision 4
- by teo on 2012-03-16 14:00
- revision 5
- by teo on 2012-03-16 14:02
- revision 6
- by teo on 2012-03-16 14:10
- revision 7
- by teo on 2012-03-16 15:05
- revision 8
- by teo on 2012-03-16 15:06
- revision 9
- by teo on 2012-03-16 15:15
- revision 10
- by teo on 2012-03-16 15:20
- revision 11
- by teo on 2012-03-16 15:23
- revision 12
- by teo on 2012-03-16 15:23
- revision 13
- by teo on 2012-09-24 20:35
This item was downloaded 3395 times and viewed 16257 times.
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.