View datasets-UCI iris (public)

2011-09-14 15:17 by phoyer | Version 7

Summary

The classic iris flower data

License
unknown (from Weka repository)
Dependencies
Tags
arff slurped Weka
Attribute Types
Floating Point,String
Download
# Instances: 150 / # Attributes: 5
HDF5 (24.4 KB) XML CSV ARFF LibSVM Matlab Octave
Completeness of this item currently: 88%.
Original Data Format
arff
Name
iris
Version mldata
0
Comment
  1. Title: Iris Plants Database

  2. Sources:
     (a) Creator: R.A. Fisher
     (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
     (c) Date: July, 1988

  3. Past Usage:
     Publications: too many to mention!!! Here are a few.
     1. Fisher,R.A. "The use of multiple measurements in taxonomic problems" Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).
     2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
     3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 1, 67-71.
        -- Results:
           -- very low misclassification rates (0% for the setosa class)
     4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions on Information Theory, May 1972, 431-433.
        -- Results:
           -- very low misclassification rates again
     5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II conceptual clustering system finds 3 classes in the data.

  4. Relevant Information:
     --- This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
     --- Predicted attribute: class of iris plant.
     --- This is an exceedingly simple domain.

  5. Number of Instances: 150 (50 in each of three classes)

  6. Number of Attributes: 4 numeric, predictive attributes and the class

  7. Attribute Information:
     1. sepal length in cm
     2. sepal width in cm
     3. petal length in cm
     4. petal width in cm
     5. class:
        -- Iris Setosa
        -- Iris Versicolour
        -- Iris Virginica

  8. Missing Attribute Values: None

  Summary Statistics:
                   Min  Max  Mean  SD    Class Correlation
     sepal length: 4.3  7.9  5.84  0.83   0.7826
      sepal width: 2.0  4.4  3.05  0.43  -0.4194
     petal length: 1.0  6.9  3.76  1.76   0.9490 (high!)
      petal width: 0.1  2.5  1.20  0.76   0.9565 (high!)

  9. Class Distribution: 33.3% for each of 3 classes.
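
The comment does not say how the "Class Correlation" column was computed. One plausible reading, assumed here and not confirmed by the source, is the Pearson correlation between an attribute and the class coded as an integer. The sketch below shows that calculation; the `CLASS_CODE` mapping and both function names are hypothetical.

```python
import math

# Assumed integer coding of the nominal class; the source does not state one.
CLASS_CODE = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def class_correlation(values, labels):
    """Correlate one numeric attribute with the integer-coded class label."""
    return pearson(values, [CLASS_CODE[label] for label in labels])
```

Applied to a full attribute column and the matching 150 class labels, `class_correlation` would return a single value in [-1, 1] that can be compared with the table above.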
Names
sepallength,sepalwidth,petallength,petalwidth,class
Types
  1. numeric
  2. numeric
  3. numeric
  4. numeric
  5. nominal:Iris-setosa,Iris-versicolor,Iris-virginica
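
As a rough illustration of the schema above, the following sketch parses one comma-separated record into typed fields. It assumes the CSV download carries the five attributes in the listed order with no header handling; `IrisRow`, `parse_row`, and the field name `iris_class` (used because `class` is a Python keyword) are hypothetical names.

```python
from dataclasses import dataclass

# Nominal values taken from the Types listing above.
CLASSES = ("Iris-setosa", "Iris-versicolor", "Iris-virginica")


@dataclass(frozen=True)
class IrisRow:
    sepallength: float
    sepalwidth: float
    petallength: float
    petalwidth: float
    iris_class: str  # the "class" attribute


def parse_row(line: str) -> IrisRow:
    """Parse one 'a,b,c,d,label' record into an IrisRow."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    *numeric, label = fields
    if label not in CLASSES:
        raise ValueError(f"unknown class label: {label!r}")
    return IrisRow(*(float(value) for value in numeric), label)
```

For example, `parse_row("5.1,3.5,1.4,0.2,Iris-setosa")` yields a row whose `sepallength` is `5.1` and whose `iris_class` is `"Iris-setosa"`.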
Data (first 10 data points)
    sepa... sepa... peta... peta... class
    5.1 3.5 1.4 0.2 Iris...
    4.9 3.0 1.4 0.2 Iris...
    4.7 3.2 1.3 0.2 Iris...
    4.6 3.1 1.5 0.2 Iris...
    5.0 3.6 1.4 0.2 Iris...
    5.4 3.9 1.7 0.4 Iris...
    4.6 3.4 1.4 0.3 Iris...
    5.0 3.4 1.5 0.2 Iris...
    4.4 2.9 1.4 0.2 Iris...
    4.9 3.1 1.5 0.1 Iris...
    ... ... ... ... ...
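
The preview rows can be used to sketch how per-attribute summary statistics of the kind listed in the comment might be computed. The numbers below come only from the ten rows shown, so they will not match the full-data table; the use of the sample standard deviation is an assumption, and `column_summary` is a hypothetical helper.

```python
import statistics

NAMES = ("sepallength", "sepalwidth", "petallength", "petalwidth")

# The ten numeric preview rows shown above (class column omitted).
PREVIEW = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (4.7, 3.2, 1.3, 0.2),
    (4.6, 3.1, 1.5, 0.2),
    (5.0, 3.6, 1.4, 0.2),
    (5.4, 3.9, 1.7, 0.4),
    (4.6, 3.4, 1.4, 0.3),
    (5.0, 3.4, 1.5, 0.2),
    (4.4, 2.9, 1.4, 0.2),
    (4.9, 3.1, 1.5, 0.1),
]


def column_summary(rows):
    """Return min, max, mean, and sample SD for each numeric column."""
    summary = {}
    # zip(*rows) transposes the row tuples into one tuple per column.
    for name, column in zip(NAMES, zip(*rows)):
        summary[name] = {
            "min": min(column),
            "max": max(column),
            "mean": statistics.mean(column),
            "sd": statistics.stdev(column),
        }
    return summary


if __name__ == "__main__":
    for name, stats in column_summary(PREVIEW).items():
        print(f"{name}: {stats['min']} {stats['max']} "
              f"{stats['mean']:.2f} {stats['sd']:.2f}")
```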
Description

This is the classic Iris flower data set, collected by Edgar Anderson and used as an example of linear discriminant analysis by Ronald Fisher. See http://en.wikipedia.org/wiki/Iris_flower_data_set. Briefly, there are 150 instances, 50 each of Iris setosa, Iris versicolor, and Iris virginica. For each instance, there are measures of sepal length, sepal width, petal length, and petal width, in addition to the class indicator.

URLs
(No information yet)
Publications
    Data Source
    http://www.ics.uci.edu/~mlearn/MLRepository.html
    Measurement Details

    For details see: Edgar Anderson (1935). "The irises of the Gaspé Peninsula". Bulletin of the American Iris Society 59: 2–5.

    Usage Scenario

    This dataset has been widely used as a test case for classification algorithms.

    revision 1 by mldata on 2010-04-29 20:06
    revision 2 by phoyer on 2010-08-31 09:12
    revision 3 by phoyer on 2010-08-31 09:25
    revision 4 by phoyer on 2010-08-31 07:27
    revision 5 by phoyer on 2010-08-31 07:27
    revision 6 by phoyer on 2010-08-31 09:30
    revision 7 by phoyer on 2011-09-14 15:17

    This item was downloaded 15778 times and viewed 18526 times.

    Disclaimer

    We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.

    Acknowledgements

    This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
    http://www.pascal-network.org/.