mldata :: Repository :: :: Chars74K Kannada hnd


View Chars74K Kannada hnd (public)

2012-09-24 22:17 by teo | Version 4


Summary: Images and data files of Kannada characters drawn on a tablet PC
License: ODbL
Tags: character-recognition computer-vision
Download: tgz (125.8 MB)

Original Data Format

tgz


Description

This dataset contains hand-printed (Hnd) symbols used in the Kannada language. Compound symbols are treated as individual classes: a combination of a consonant and a vowel forms a third, separate class in our dataset. As a result, more than 600 classes have been annotated.
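To illustrate why this labelling inflates the class count, here is a minimal sketch that builds a class list from a tiny, hypothetical subset of Kannada consonants and vowel signs. The subset and the `build_classes` helper are assumptions made for illustration; they are not the dataset's actual class inventory or tooling.

```python
import unicodedata
from itertools import product

# Hypothetical subset for illustration only; NOT the dataset's class inventory.
CONSONANTS = ["\u0c95", "\u0c97", "\u0cae"]   # KA, GA, MA
VOWEL_SIGNS = ["\u0cbe", "\u0cbf", "\u0cc1"]  # AA, I, U


def build_classes(consonants, vowel_signs):
    """Return one class per bare consonant plus one per consonant+vowel-sign compound."""
    classes = list(consonants)
    classes += [c + v for c, v in product(consonants, vowel_signs)]
    return classes


classes = build_classes(CONSONANTS, VOWEL_SIGNS)
print(len(classes), "classes from", len(CONSONANTS), "consonants and",
      len(VOWEL_SIGNS), "vowel signs")
for symbol in classes:
    # Print the Unicode names of the code points that make up each class.
    print(" + ".join(unicodedata.name(ch) for ch in symbol))
```

With n consonants and m vowel signs, this scheme already yields n + n*m classes, which is how a script with a few dozen base characters can end up with several hundred classes.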

Clearly this is not the ideal representation for this type of script, as it leads to a very large number of classes, and many of the symbols are only ever used in conjunction with other symbols. However, we decided to use this representation for our baseline evaluations as a way to evaluate a generic recognition method for this problem.

The TGZ file contains two main directories:
Trg: contains M files (in MATLAB format). Each file contains a vector of rows and a vector of columns; together they give the screen coordinates of the data points generated by the strokes used to draw the characters.
Img: contains PNG images of the bitmaps generated from the hand-drawn characters. This is the format used in the paper by [deCampos et al VISAPP2009].

Each directory contains a set of sub-directories in the format Samplexxx, where xxx is the class label.
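The directory layout described above can be indexed with a few lines of Python. This is a minimal sketch, assuming that the `xxx` part of each `Samplexxx` name is a run of digits and that `root` points at one of the two extracted directories; the `index_by_class` helper is hypothetical and not part of the dataset.

```python
import re
from pathlib import Path

# Assumption: class labels are numeric, e.g. "Sample001".
SAMPLE_DIR = re.compile(r"^Sample(\d+)$")


def index_by_class(root):
    """Map integer class label -> sorted list of files in root/Samplexxx."""
    index = {}
    for sub in sorted(Path(root).iterdir()):
        match = SAMPLE_DIR.match(sub.name)
        if sub.is_dir() and match:
            label = int(match.group(1))
            index[label] = sorted(p for p in sub.iterdir() if p.is_file())
    return index
```

For example, `index_by_class("Img")` would return a dictionary whose keys are the class labels found under `Img`, each mapped to the files of that class. Directories that do not match the `Samplexxx` pattern are skipped.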

This dataset was collected, and the experiments presented in the paper were carried out, at Microsoft Research India by Teofilo de Campos, with mentoring support from Manik Varma. Additional SVM and MKL experiments were performed by Rakesh Babu.

We would like to acknowledge the help of several volunteers who annotated this dataset. In particular, we would like to thank Arun, Kavya, Ranjeetha, Riaz and Yuvraj. We would also like to thank Richa Singh and Gopal Srinivasa for developing some of the annotation tools.

We kindly request that any publication that uses this dataset cite our original benchmark paper [deCampos et al VISAPP2009].

URLs

http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/

Publications

T. E. deCampos, B. R. Babu and M. Varma; Character Recognition in Natural Images; In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), 2009.

http://www.ee.surrey.ac.uk/CVSSP/Publications/papers/deCampos-VISAPP-2009.pdf

This paper tackles the problem of recognizing characters in images of natural scenes. In particular, we focus on recognizing characters in situations that would traditionally not be handled well by OCR techniques. We present an annotated database of images containing English and Kannada characters. The database comprises images of street scenes taken in Bangalore, India, using a standard camera. The problem is addressed in an object categorization framework based on a bag-of-visual-words representation. We assess the performance of various features based on nearest neighbour and SVM classification. It is demonstrated that the performance of the proposed method, using as few as 15 training images, can be far superior to that of commercial OCR systems. Furthermore, the method can benefit from synthetically generated training data, obviating the need for expensive data collection and annotation.

Data Source

This dataset was captured from 25 volunteers using a tablet PC, with the pen thickness set to match the average thickness found in hand-painted public information boards.

Measurement Details

Raw trajectories and image bitmaps
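The relationship between a raw trajectory (a vector of rows and a vector of columns) and a bitmap can be sketched as follows. This is only an illustration under stated assumptions: it marks individual sample points in a small grid, does not connect consecutive points or apply any pen thickness, and is not the procedure used to generate the PNG files in the dataset. The `rasterize` function and the example stroke are hypothetical.

```python
def rasterize(rows, cols, height, width):
    """Mark each (row, col) trajectory point in a height x width binary grid."""
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    grid = [[0] * width for _ in range(height)]
    for r, c in zip(rows, cols):
        r, c = int(round(r)), int(round(c))
        # Ignore points that fall outside the canvas.
        if 0 <= r < height and 0 <= c < width:
            grid[r][c] = 1
    return grid


# Hypothetical stroke: a short diagonal from the top-left corner.
rows = [0, 1, 2, 3]
cols = [0, 1, 2, 3]
grid = rasterize(rows, cols, height=4, width=4)
for line in grid:
    print("".join("#" if v else "." for v in line))
```

A fuller renderer would interpolate between consecutive points and draw with a chosen stroke width.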

Usage Scenario

Hand-drawn character recognition.

Please use the lists available with the associated task in order to follow the protocol for experiments.

revision 1: by teo on 2012-03-27 15:47
revision 2: by teo on 2012-03-27 15:53
revision 3: by teo on 2012-07-23 12:39
revision 4: by teo on 2012-09-24 22:17


This item was downloaded 2634 times and viewed 3860 times.


Acknowledgements

This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)

http://www.pascal-network.org/.