View uci-20070111 trains (public)

- Summary
(No information yet)
- License
- unknown (from Weka repository)
- Dependencies
- Tags
- arff slurped Weka
- Attribute Types
- Integer,Floating Point,String
- Download
-
# Instances: 10 / # Attributes: 33
- Original Data Format
- arff
- Name
- trains
- Version mldata
- 0
- Comment
Title: INDUCE Trains Data set
Sources:
Donor: GMU, Center for AI, Software Librarian, Eric E. Bloedorn (bloedorn@aic.gmu.edu)
Original owners: Ryszard S. Michalski (michalski@aic.gmu.edu) and Robert Stepp
Date received: 1 June 1994
Date updated: 24 June 1994 (Thanks to Larry Holder (UT Arlington) for noticing a translation error)
Past usage:
This set most closely resembles the data sets described in the following two publications:
- R.S. Michalski and J.B. Larson "Inductive Inference of VL Decision Rules" In Proceedings of the Workshop in Pattern-Directed Inference Systems, Hawaii, May 1977. Also published in SIGART Newsletter, ACM No. 63, pp. 38-44, June 1977.
- R.E. Stepp and R.S. Michalski "Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects" In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.) "Machine Learning: An Artificial Intelligence Approach, Volume II". Los Altos, Ca: Morgan Kaufmann.
Both of these papers describe a set of 10 trains, 5 east-bound and 5 west-bound. Both refer to the same 10 trains, as shown by the figures in these publications. The differences are:
1) This dataset has 10 attributes and no wheel color or load color attributes.
2) Reference 2 (Stepp, Michalski) does not completely list the attributes used, but does mention wheel color, an attribute not present in this dataset.
3) Reference 1 (Michalski, Larson) mentions 12 attributes, but explicitly describes only 6. These 6 are included in the dataset below and in the Stepp and Michalski set.
Results:
[1] Michalski and Larson found the following decision rules:
(1) There exists car1, car2, lod1 and lod2 such that [infront(car1, car2)][lcont(car1, lod1)][lcont(car2, lod2)][load-shape(lod1)=triangle][load-shape(lod2)=polygon] => dir=east
(2) There exists a car1 such that [ln(car1)=short][car-shape(car1)=closed-top] => dir=east
(3) [ncar=3] v There exists car1 such that [car-shape(car1)=jagged-top] => dir=west
(4) There exists car1 such that [#cars(ln=long)=2][cshape(car1)=open,trapezoid,u-shaped] v [location(car1)=2][cshape(car1)=closed,rectangle] => dir=west
[2] The goal of the cluster research is to develop a general method for clustering structured objects that can generate conjunctive descriptions that occur in human classifications or invent new concepts that have similar appeal. CLUSTER/S was able to find the following cognitively appealing clusters:
1) a) "There are two different car shapes in the train" b) "There are three or more different car shapes in the train"
2) a) "Wheels on all cars have the same color" b) "Wheels on all cars do not have the same color"
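As an illustration of how such a rule reads when applied to a train, here is a minimal Python sketch of the west-bound rule that begins with [ncar=3]. The `Car` class, the function name, and the example trains are all hypothetical; the trains are made up and are not rows of the dataset. The sketch also assumes that ncar is simply the length of the car list, which the documentation does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Car:
    """Hypothetical container for two of the per-car attributes."""
    length: str  # "long" or "short"
    shape: str   # a cshape value such as "jaggedtop" or "openrect"


def rule_predicts_west(cars: List[Car]) -> bool:
    """[ncar=3] v (exists car1: car-shape(car1)=jagged-top) => dir=west."""
    has_three_cars = len(cars) == 3
    has_jagged_top = any(car.shape == "jaggedtop" for car in cars)
    return has_three_cars or has_jagged_top


# Made-up trains for illustration only.
three_cars = [Car("long", "closedrect"), Car("short", "openrect"), Car("short", "ushaped")]
four_with_jagged = three_cars + [Car("long", "jaggedtop")]
four_plain = three_cars + [Car("short", "openrect")]

for train in (three_cars, four_with_jagged, four_plain):
    print(len(train), rule_predicts_west(train))
```

The rule is only a sufficient condition for west: when it returns False, nothing is concluded about the direction.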
Relevant information:
Additional "background" knowledge is supplied that provides a partial ordering on some of the attribute values.
We are providing this dataset both in its original form and in a form similar to the more typical propositional datasets in our repository. Since the trains dataset records relations between attributes, this transformation was somewhat challenging. However, it may offer some insight into this problem for people who are more familiar with the simple one-instance-per-line dataset format.
Hierarchy of values:
if cshape is one of {openrect, opentrap, ushaped, dblopnrect} then cshape is opentop
if cshape is one of {hexagon, ellipse, closedrect, jaggedtop, slopetop, engine} then cshape is closedtop
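The value hierarchy can be applied as a simple lookup. The following Python sketch maps a specific cshape value to its more general parent; the function and constant names are hypothetical, and the value spellings are taken from the hierarchy as listed here.

```python
# Specific car shapes grouped under their more general parent value.
OPEN_TOP = {"openrect", "opentrap", "ushaped", "dblopnrect"}
CLOSED_TOP = {"hexagon", "ellipse", "closedrect", "jaggedtop", "slopetop", "engine"}


def generalize_cshape(cshape: str) -> str:
    """Return "opentop" or "closedtop" for a specific cshape value."""
    if cshape in OPEN_TOP:
        return "opentop"
    if cshape in CLOSED_TOP:
        return "closedtop"
    raise ValueError(f"unknown cshape value: {cshape!r}")


print(generalize_cshape("ushaped"))
print(generalize_cshape("jaggedtop"))
```

A rule stated at the general level, such as one that tests for a closed-top car, can then be checked against cars described with the specific shapes.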
Prediction task: Determine concise decision rules distinguishing trains traveling east from those traveling west.
Number of instances: 10
Number of attributes: 10, not including the class attribute
ccont(train idx1, car idx2): car idx2 is contained in train idx1
ncar(train idx): # of cars in train idx (int)
infront(car idx1, car idx2): relative positions of cars in train
loc(car idx): absolute position of car in train (int)
nwhl(car idx): # of wheels of car idx (int)
ln(car idx): length of car idx (long, short)
cshape(car idx): shape of car (engine, dblopenrect, closedrect, openrect, opentrap, ushaped, hexagon, ellipse, jaggedtop, slopetop, opentop, closedtop)
npl(car idx): number of loads in car idx
lcont(car idx, load idx): description of which cars hold which loads
lhshape(load idx): description of load shape (trianglod, rectanglod, circlelod, hexagonlod)
Class: direction (east, west)
The following format was used for the "transformed" dataset representation as found in trains.transformed.data (one instance per line):
Attributes: 33
1. Number_of_cars (integer in [3-5])
2. Number_of_different_loads (integer in [1-4])
3-22: 5 attributes for each of cars 2 through 5 (20 attributes total):
- num_wheels (integer in [2-3])
- length (short or long)
- shape (closedrect, dblopnrect, ellipse, engine, hexagon, jaggedtop, openrect, opentrap, slopetop, ushaped)
- num_loads (integer in [0-3])
- load_shape (circlelod, hexagonlod, rectanglod, trianglod)
23-32: 10 Boolean attributes describing whether 2 types of loads are on adjacent cars of the train:
- Rectangle_next_to_rectangle (0 if false, 1 if true)
- Rectangle_next_to_triangle (0 if false, 1 if true)
- Rectangle_next_to_hexagon (0 if false, 1 if true)
- Rectangle_next_to_circle (0 if false, 1 if true)
- Triangle_next_to_triangle (0 if false, 1 if true)
- Triangle_next_to_hexagon (0 if false, 1 if true)
- Triangle_next_to_circle (0 if false, 1 if true)
- Hexagon_next_to_hexagon (0 if false, 1 if true)
- Hexagon_next_to_circle (0 if false, 1 if true)
- Circle_next_to_circle (0 if false, 1 if true)
33. Class attribute (east or west)
The number of cars varies between 3 and 5. Therefore, attributes referring to properties of cars that do not exist (such as the 5 attributes for the "5th" car when the train has fewer than 5 cars) are assigned a value of "-".
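The 33-attribute layout can be sketched in Python by generating the attribute names in order and pairing them with the tokens of one instance, treating "-" as a missing value. This is an illustrative sketch only: the helper names are hypothetical, the class column name "class" is an assumption, the per-car names follow the num_wheels_2 pattern shown in the Names listing, and the example tokens are made up rather than copied from the data file. How the tokens are delimited in trains.transformed.data is not stated here, so the sketch starts from an already split list.

```python
CAR_FIELDS = ["num_wheels", "length", "shape", "num_loads", "load_shape"]
LOADS = ["Rectangle", "Triangle", "Hexagon", "Circle"]


def attribute_names():
    """Build the 33 attribute names in the order described above."""
    names = ["Number_of_cars", "Number_of_different_loads"]
    for car in range(2, 6):  # cars 2 through 5
        names += [f"{field}_{car}" for field in CAR_FIELDS]
    for i, first in enumerate(LOADS):  # 10 unordered pairs of load types
        for second in LOADS[i:]:
            names.append(f"{first}_next_to_{second.lower()}")
    names.append("class")  # assumed name for the class attribute
    return names


def parse_record(tokens):
    """Pair tokens with attribute names; "-" becomes None (car does not exist)."""
    names = attribute_names()
    if len(tokens) != len(names):
        raise ValueError(f"expected {len(names)} tokens, got {len(tokens)}")
    return {name: (None if tok == "-" else tok) for name, tok in zip(names, tokens)}


# Made-up 3-car train: cars 2 and 3 exist, cars 4 and 5 are filled with "-".
tokens = (["3", "2"]
          + ["2", "short", "openrect", "1", "circlelod"]
          + ["2", "long", "closedrect", "1", "rectanglod"]
          + ["-"] * 10
          + ["0", "0", "0", "1", "0", "0", "0", "0", "0", "0"]
          + ["east"])
record = parse_record(tokens)
print(len(record), record["shape_4"], record["Rectangle_next_to_circle"])
```

Keeping the values as strings avoids guessing at types; the site itself lists these columns as nominal.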
- Distribution of classes:
- There are 5 east-bound trains and 5 west-bound trains (i.e., 50% east, 50% west)
Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last
- Names
- Number_of_cars,Number_of_different_loads,num_wheels_2,length_2,shape_2,num_loads_2,load_shape_2,num_wheels_3,length_3,shape_3,
- Types
- nominal:3,4,5
- nominal:1,2,3,4
- nominal:2,3
- nominal:long,short
- nominal:closedrect,dblopnrect,openrect,opentrap,ushaped
- nominal:1,3
- nominal:circlelod,rectanglod,trianglod
- nominal:2,3
- nominal:long,short
- nominal:closedrect,dblopnrect,hexagon,jaggedtop,openrect,opentrap,slopetop,ushaped
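The Names and Types listings can be combined into a small schema. The Python sketch below assumes the two listings pair up positionally (first name with first type, and so on), which the page does not state explicitly; it covers only the ten attributes shown, and the helper names are hypothetical.

```python
# First ten attribute names and type descriptors, as listed above.
NAMES = ["Number_of_cars", "Number_of_different_loads", "num_wheels_2",
         "length_2", "shape_2", "num_loads_2", "load_shape_2",
         "num_wheels_3", "length_3", "shape_3"]
TYPES = ["nominal:3,4,5", "nominal:1,2,3,4", "nominal:2,3", "nominal:long,short",
         "nominal:closedrect,dblopnrect,openrect,opentrap,ushaped", "nominal:1,3",
         "nominal:circlelod,rectanglod,trianglod", "nominal:2,3", "nominal:long,short",
         "nominal:closedrect,dblopnrect,hexagon,jaggedtop,openrect,opentrap,slopetop,ushaped"]


def parse_type(spec):
    """Split a descriptor such as "nominal:long,short" into (kind, values)."""
    kind, _, values = spec.partition(":")
    return kind, values.split(",")


# Assumption: names and types correspond by position.
SCHEMA = {name: parse_type(spec) for name, spec in zip(NAMES, TYPES)}


def is_allowed(name, value):
    """Check a string value against the nominal values listed for an attribute."""
    _, allowed = SCHEMA[name]
    return value in allowed


print(SCHEMA["length_2"])
print(is_allowed("shape_3", "hexagon"), is_allowed("shape_2", "hexagon"))
```

The nominal value sets reflect only the values that occur in this 10-instance file, so they are narrower than the ranges given in the attribute documentation.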
- Data (first 10 data points)
Numb...  Numb...  num_...  leng...  shap...  num_...  load...  num_...  leng...  shap...  ...
5        4        2        long     open...  3        rect...  2        short    slop...  ...
4        3        2        short    usha...  1        tria...  2        short    open...  ...
4        2        2        short    open...  1        circ...  2        short    hexa...  ...
5        2        2        short    open...  1        tria...  2        short    dblo...  ...
4        3        2        short    dblo...  1        tria...  3        long     clos...  ...
3        2        2        long     clos...  3        circ...  2        short    open...  ...
4        2        2        short    dblo...  1        circ...  2        short    usha...  ...
3        2        3        long     clos...  1        rect...  2        short    usha...  ...
5        2        2        short    open...  1        circ...  2        long     jagg...  ...
3        1        2        short    usha...  1        rect...  2        long     open...  ...
- Description
A gzip'ed tar containing UCI and UCI KDD datasets (uci-20070111.tar.gz, 17,952,832 Bytes)
- URLs
- (No information yet)
- Publications
- Data Source
- http://www.ics.uci.edu/~mlearn/MLRepository.html http://kdd.ics.uci.edu/
- Measurement Details
- Usage Scenario
- revision 1
- by mldata on 2010-11-06 09:58
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.