mldata :: Repository :: :: datasets-UCI letter


View datasets-UCI letter (public)

2010-11-06 09:57 by mldata | Version 1

Summary: (No information yet)
License: unknown (from Weka repository)
Dependencies
Tags: arff slurped Weka
Attribute Types: Integer,String
# Instances: 20000 / # Attributes: 17
Download: HDF5 (2.0 MB) XML CSV ARFF LibSVM Matlab Octave

Original Data Format

arff

Name

letter

Version mldata

Comment

TITLE: Letter Image Recognition Data

The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts, and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 through 15. We typically train on the first 16,000 items and then use the resulting model to predict the letter category for the remaining 4,000. See the article cited under Past Usage for more details.
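
The train/test convention described above can be sketched in Python. This is a minimal illustration that assumes the records have already been loaded, in file order, into a list; `split_letter_data` and its `n_train` parameter are hypothetical names introduced here, not part of any library.

```python
def split_letter_data(rows, n_train=16000):
    """Split rows, kept in file order, into (train, test).

    The first n_train rows form the training set and the remaining
    rows form the test set, mirroring the 16,000/4,000 convention.
    """
    if not 0 <= n_train <= len(rows):
        raise ValueError("n_train must lie between 0 and len(rows)")
    return rows[:n_train], rows[n_train:]


# Placeholder integers standing in for the 20,000 real records.
rows = list(range(20000))
train, test = split_letter_data(rows)
print(len(train), len(test))  # 16000 4000
```

Passing `n_train=15000` would instead give the 15000/5000 split listed in the dataset description.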

2. USE IN STATLOG

2.1 Testing Mode
    Train and Test

2.2 Special PreProcessing
    No

2.3 Test Results
                  Error Rate            TIME
    Algorithm     Train    Test     Train     Test
    ----------------------------------------------
    Alloc80       0.065    0.064    39575     ?
    KNN           0        0.068    15        2135
    LVQ           0.057    0.079    1487      48
    QuaDisc       0.101    0.113    3736      1223
    Cn2           0.021    0.115    40458     52
    BayTree       0.015    0.124    276       7
    NewId         0        0.128    1056      2
    IndCart       0.010    0.130    1098      1020
    C4.5          0.042    0.132    309       292
    Dipol92       0.167    0.176    1303      80
    Radial        0.220    0.233    ?         ?
    LogDisc       0.234    0.234    5062      39
    Ac2           0        0.245    2529      92
    Castle        0.237    0.245    9455      2933
    Kohonen       0.218    0.252    ?         ?
    Cal5          0.158    0.253    1033      8
    Smart         0.287    0.295    400919    184
    Discrim       0.297    0.302    326       84
    BackProp      0.323    0.327    277445    22
    Bayes         0.516    0.529    75        18
    Itrule        0.585    0.594    22325     69
    Default       0.955    0.960    ?         ?
    Cascade       1.0
    Cart          1.000

SOURCE Information and Past Usage

3.1 Source
    -- Creator: David J. Slate
       Odesta Corporation; 1890 Maple Ave; Suite 115; Evanston, IL 60201
    -- Donor: David J. Slate (dave@math.nwu.edu) (708) 491-3867
    -- Date: January, 1991

3.2 Past Usage: -- P. W. Frey and D. J. Slate (Machine Learning Vol 6 #2 March 91): "Letter Recognition Using Holland-style Adaptive Classifiers".

The research for this article investigated the ability of several variations of Holland-style adaptive classifier systems to learn to correctly guess the letter categories associated with vectors of 16 integer attributes extracted from raster scan images of the letters. The best accuracy obtained was a little over 80%. It would be interesting to see how well other methods do with the same data.

DATASET DESCRIPTION

Number of Instances: 20000 (Train 15000, Test 5000)

Number of Attributes: 16 (numeric features)

Number of Classes: 26 capital letters (26 values from A to Z)

Class Distribution:
789 A      766 B     736 C     805 D     768 E     775 F     773 G
734 H      755 I     747 J     739 K     761 L     792 M     783 N
753 O      803 P     783 Q     758 R     748 S     796 T     813 U
764 V      752 W     787 X     786 Y     734 Z
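
As a quick consistency check, the counts above can be summed and used to estimate the error of a classifier that always predicts the most frequent letter. The sketch below is illustrative only; the variable names are hypothetical, and it computes the figure over the full 20,000 instances rather than a particular train/test split.

```python
# Per-letter instance counts, copied from the class distribution above.
CLASS_COUNTS = {
    "A": 789, "B": 766, "C": 736, "D": 805, "E": 768, "F": 775, "G": 773,
    "H": 734, "I": 755, "J": 747, "K": 739, "L": 761, "M": 792, "N": 783,
    "O": 753, "P": 803, "Q": 783, "R": 758, "S": 748, "T": 796, "U": 813,
    "V": 764, "W": 752, "X": 787, "Y": 786, "Z": 734,
}

total = sum(CLASS_COUNTS.values())

# Most frequent class and the error rate of always predicting it.
majority_letter, majority_count = max(CLASS_COUNTS.items(), key=lambda kv: kv[1])
baseline_error = 1 - majority_count / total

print(total)                           # 20000
print(majority_letter, majority_count) # U 813
print(f"{baseline_error:.3f}")         # 0.959
```

The resulting majority-class error is close to the Default row in the test results, although how that row was computed is not stated here.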

Attribute Information:

 1. x-box   horizontal position of box      (integer)
 2. y-box   vertical position of box        (integer)
 3. width   width of box                    (integer)
 4. high    height of box                   (integer)
 5. onpix   total # on pixels               (integer)
 6. x-bar   mean x of on pixels in box      (integer)
 7. y-bar   mean y of on pixels in box      (integer)
 8. x2bar   mean x variance                 (integer)
 9. y2bar   mean y variance                 (integer)
10. xybar   mean x y correlation            (integer)
11. x2ybr   mean of x * x * y               (integer)
12. xy2br   mean of x * y * y               (integer)
13. x-ege   mean edge count left to right   (integer)
14. xegvy   correlation of x-ege with y     (integer)
15. y-ege   mean edge count bottom to top   (integer)
16. yegvx   correlation of y-ege with x     (integer)
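
The attribute list and the stated 0 through 15 integer range suggest a simple validation step when reading records. The following is a minimal sketch; `to_record` is a hypothetical helper, and the example vector is made up for illustration rather than taken from the dataset.

```python
# Attribute names in the order given in the list above.
ATTRIBUTES = [
    "x-box", "y-box", "width", "high", "onpix", "x-bar", "y-bar", "x2bar",
    "y2bar", "xybar", "x2ybr", "xy2br", "x-ege", "xegvy", "y-ege", "yegvx",
]


def to_record(values):
    """Map 16 integer features to their names, checking the 0..15 range."""
    if len(values) != len(ATTRIBUTES):
        raise ValueError(f"expected {len(ATTRIBUTES)} values, got {len(values)}")
    for name, value in zip(ATTRIBUTES, values):
        if not isinstance(value, int) or not 0 <= value <= 15:
            raise ValueError(f"{name}={value!r} is not an integer in 0..15")
    return dict(zip(ATTRIBUTES, values))


# Hypothetical feature vector, for illustration only.
example = [2, 4, 4, 3, 2, 7, 8, 2, 9, 11, 7, 7, 1, 8, 5, 6]
record = to_record(example)
print(record["onpix"])  # 2
```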

Missing Attribute Values: None

CONTACTS statlog-adm@ncc.up.pt bob@stams.strathclyde.ac.uk

================================================================================

Num Instances:  20000
Num Attributes: 17
Num Continuous: 16 (Int 16 / Real 0)
Num Nominal:    1
Missing values: 0 / 0.0%

    name      type   enum   ints   real   missing   distinct   (1)

 1  'x-box'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 2  'y-box'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 3  'width'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 4  'high'    Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 5  'onpix'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 6  'x-bar'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 7  'y-bar'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 8  'x2bar'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
 9  'y2bar'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
10  'xybar'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
11  'x2ybr'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
12  'xy2br'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
13  'x-ege'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
14  'xegvy'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
15  'y-ege'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
16  'yegvx'   Int    0%     100%   0%     0 / 0%    16 / 0%    0%
17  'class'   Enum   0%     100%   0%     0 / 0%    26 / 0%    0%

Relabeled values in attribute 'class'
From: 1 To: A
From: 2 To: B
From: 3 To: C
From: 4 To: D
From: 5 To: E
From: 6 To: F
From: 7 To: G
From: 8 To: H
From: 9 To: I
From: 10 To: J
From: 11 To: K
From: 12 To: L
From: 13 To: M
From: 14 To: N
From: 15 To: O
From: 16 To: P
From: 17 To: Q
From: 18 To: R
From: 19 To: S
From: 20 To: T
From: 21 To: U
From: 22 To: V
From: 23 To: W
From: 24 To: X
From: 25 To: Y
From: 26 To: Z
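
The relabeling listed above maps the integer codes 1 through 26 to the letters A through Z in order. A minimal sketch of that mapping, with `RELABEL` and `relabel` as hypothetical names:

```python
import string

# Integer class code (1..26) -> capital letter (A..Z), as in the list above.
RELABEL = {code: letter
           for code, letter in enumerate(string.ascii_uppercase, start=1)}


def relabel(code):
    """Return the letter for an integer class code; KeyError if out of range."""
    return RELABEL[code]


print(relabel(1), relabel(26))  # A Z
```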

Names

x-box,y-box,width,high,onpix,x-bar,y-bar,x2bar,y2bar,xybar,

Types

numeric
numeric
numeric
numeric
numeric
numeric
numeric
numeric
numeric
numeric

Data (first 10 data points)

x-box	y-box	width	high	onpix	x-bar	y-bar	x2bar	y2bar	xybar	...
2	4	4	3	2	7	8	2	9	11	...
4	7	5	5	5	5	9	6	4	8	...
7	10	8	7	4	8	8	5	10	11	...
4	9	5	7	4	7	7	13	1	7	...
6	7	8	5	4	7	6	3	7	10	...
4	7	5	5	3	4	12	2	5	13	...
6	10	8	8	4	7	8	2	5	10	...
1	0	2	0	1	6	10	7	2	7	...
5	9	7	6	7	7	7	2	4	9	...
1	0	2	1	1	5	7	8	6	7	...
...	...	...	...	...	...	...	...	...	...	...

Description

A jarfile containing 37 classification problems, originally obtained from the UCI repository (datasets-UCI.jar, 1,190,961 Bytes).


Data Source

http://www.ics.uci.edu/~mlearn/MLRepository.html


revision 1: by mldata on 2010-11-06 09:57


This item was downloaded 3980 times and viewed 3238 times.


Disclaimer

We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.


Acknowledgements

This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)

http://www.pascal-network.org/.
