View statlib-20050214 biomed (public)

- Summary
(No information yet)
- License
- unknown (from Weka repository)
- Dependencies
- Tags
- arff slurped Weka
- Attribute Types
- Integer,Floating Point,String
- Download
# Instances: 209 / # Attributes: 9
HDF5 (30.0 KB) XML CSV ARFF LibSVM Matlab Octave
- Original Data Format
- arff
- Name
- biomed
- Version mldata
- 0
- Comment
February 23, 1982
The 1982 annual meetings of the American Statistical Association (ASA) will be held August 16-19, 1982 in Cincinnati. At that meeting, the ASA Committee on Statistical Graphics plans to sponsor an "Exposition of Statistical Graphics Technology." The purpose of this activity is to more fully inform the ASA membership about the capabilities and uses of computer graphics in statistical work. This letter is to invite you to participate in the Exposition.
Attached is a set of biomedical data containing 209 observations (134 for "normals" and 75 for "carriers"). Each vendor or provider of statistical graphics software participating in the Exposition is to analyze these data using their software and to prepare tabular, graphical and text output illustrating the use of graphics in these analyses and summarizing their conclusions. The tabular and graphical materials must be direct computer output from the statistical graphics software; the textual descriptions and summaries need not be. The total display space available to each participant at the meeting will be a standard poster board (approximately 4' x 2 1/2'). All entries will be displayed in one location at the meetings, together with brief written commentary by the committee summarizing the results of this activity.
Reference
Exposition of Statistical Graphics Technology, L. H. Cox, M. M. Johnson, K. Kafadar, ASA Proc Stat. Comp Section, 1982, pp 55-56.
Enclosures
THE DATA
The following data arose in a study to develop screening methods to identify carriers of a rare genetic disorder. Four measurements m1, m2, m3, m4 were made on blood samples. One of these, m1, has been used before. Because the disease is rare, there are only a few carriers of the disease from whom data are available. The data come in two files, one for normals and one for carriers of the disease. A description of the files is provided. The data have been stripped of the names and other identifiers. Otherwise the data are as received by the analyst.
PURPOSE OF THE ANALYSIS
The purpose of the analysis is to develop a screening procedure to detect carriers and to describe its effectiveness. Experts in the field have noted that young people tend to have higher measurements. The laboratory which prepared the measurements is worried that there may be a systematic drift over time in their measurement process. These effects should be considered in the analysis. Can graphical displays show the differences between the distributions of carriers and normals?
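One simple way to "describe the effectiveness" of a screening rule is to report its sensitivity (fraction of carriers flagged) and specificity (fraction of normals not flagged). The sketch below is only an illustration and not the method used by any Exposition participant: the function name, the rule "flag when the measurement exceeds a threshold", and the threshold value are all assumptions. The usage lines take the ten ml values from the data preview on this page, which are all carriers, so only sensitivity is defined for them.

```python
def screening_effectiveness(values, labels, threshold):
    """Flag a sample as a suspected carrier when its measurement exceeds threshold.

    Returns (sensitivity, specificity); a value is None when that class is absent.
    The "higher means carrier" direction is an assumption made for illustration.
    """
    tp = fn = tn = fp = 0
    for value, label in zip(values, labels):
        if value is None:  # skip missing measurements
            continue
        flagged = value > threshold
        if label == "carrier":
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# The ten ml values shown in the data preview (all labelled carrier).
preview_ml = [167.0, 104.0, 30.0, 44.0, 65.0, 440.0, 58.0, 129.0, 104.0, 122.0]
preview_labels = ["carrier"] * len(preview_ml)

# 100.0 is an arbitrary threshold chosen only to exercise the function.
sensitivity, specificity = screening_effectiveness(preview_ml, preview_labels, 100.0)
print(sensitivity, specificity)
```

A fuller analysis would run this over all 209 observations and would also account for the age effect and the possible measurement drift mentioned above.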
FILE DESCRIPTION
Column  Content
1       Observation number (sequence number per patient). Note that there are several samples per patient for some patients.
2-8     Blank
9-12    Hospital identification number for blood sample
13-18   Blank
19-20   Age of patient
21-26   Blank
27-32   Date that blood sample was taken (mmddyy). Note that all day entries are 00.
33-39   Blank
40-43   m1 (measurement 1) sss.s
44-50   Blank
51-54   m2 (measurement 2) xxx.x. Eight missing data points.
55-61   Blank
62-65   m3 (measurement 3) xxx.x
66-72   Blank
73-75   m4 (measurement 4) xxx. Seven missing data points.
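The column layout above can be read with plain string slicing. The following is a minimal sketch that takes the documented column ranges at face value (1-based, inclusive); the field names, `parse_record`, and `build_record` are hypothetical names introduced here, and the actual raw StatLib files may not match this layout exactly (for example, a four-column field cannot hold a value such as 167.0). The sample record is constructed from the fifth row of the data preview on this page rather than copied from the raw file.

```python
# Column ranges (1-based, inclusive) taken from the file description.
FIELDS = {
    "observation_number": (1, 1),
    "hospital_id": (9, 12),
    "age": (19, 20),
    "date_mmddyy": (27, 32),
    "m1": (40, 43),
    "m2": (51, 54),
    "m3": (62, 65),
    "m4": (73, 75),
}
RECORD_WIDTH = 75


def parse_record(line):
    """Slice one fixed-width record into a dict of strings (None for blank fields)."""
    padded = line.rstrip("\n").ljust(RECORD_WIDTH)
    record = {}
    for name, (start, end) in FIELDS.items():
        text = padded[start - 1:end].strip()
        record[name] = text or None
    return record


def build_record(values):
    """Hypothetical helper: place values, right-aligned, at the documented columns."""
    chars = [" "] * RECORD_WIDTH
    for name, (start, end) in FIELDS.items():
        width = end - start + 1
        value = values.get(name)
        text = "" if value is None else str(value)
        if len(text) > width:
            raise ValueError(f"{name}={text!r} does not fit in {width} columns")
        chars[start - 1:end] = text.rjust(width)
    return "".join(chars)


# Constructed from the fifth preview row; m2 is left out to show a missing value.
sample = build_record({
    "observation_number": 1, "hospital_id": 966, "age": 20,
    "date_mmddyy": "100078", "m1": "65.0", "m3": "23.8", "m4": 198,
})
print(parse_record(sample))
```

Blank fields come back as `None`, which is one way to represent the missing m2 and m4 values noted in the description.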
Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last
- Names
- Observation_number,Hospital_identification_number_for_blood_sample,Age_of_patient,Date_that_blood_sample_was_taken,ml,m2,m3,m4,class
- Types
- nominal:1,2,3,4,5,6,7
- numeric
- numeric
- numeric
- numeric
- numeric
- numeric
- numeric
- nominal:carrier,normal
- Data (first 10 data points)
Obse...  Hosp...  Age_...  Date...  ml     m2     m3    m4   class
1        1027     30       100078   167.0  89.0   25.6  364  carr...
1        1013     41       100078   104.0  81.0   26.8  245  carr...
1        1324     22       80079    30.0   108.0  8.8   284  carr...
2        1332     22       80079    44.0   104.0  17.4  172  carr...
1        966      20       100078   65.0   87.0   23.8  198  carr...
1        979      42       90078    440.0  107.0  20.2  239  carr...
1        1327     59       80079    58.0   88.2   11.0  259  carr...
1        978      35       90078    129.0  93.1   18.3  188  carr...
2        1290     36       60079    104.0  87.5   16.7  256  carr...
3        1139     35       20079    122.0  88.5   21.6  263  carr...
...      ...      ...      ...      ...    ...    ...   ...  ...
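The date attribute is typed numeric, so values such as 80079 in the preview appear to be mmddyy dates whose leading zero was dropped. The sketch below decodes such a value into a (year, month) pair, which is the part needed to look for drift over time. The function name is hypothetical, and reading the two-digit year as 19yy is an assumption based on the 1982 date of the letter. Because all day entries are 00, the day is ignored rather than turned into a calendar date.

```python
def decode_sample_date(value):
    """Decode a numeric mmddyy value (leading zero possibly dropped) into (year, month).

    Assumes two-digit years belong to the 1900s; the day field is ignored
    because the file description says all day entries are 00.
    """
    text = str(int(value)).zfill(6)  # restore a dropped leading zero
    if len(text) != 6:
        raise ValueError(f"not an mmddyy value: {value!r}")
    month = int(text[0:2])
    year = 1900 + int(text[4:6])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {value!r}")
    return year, month


# Date values taken from the data preview.
for raw in (100078, 80079, 90078, 60079, 20079):
    print(raw, decode_sample_date(raw))
```

Sorting or grouping samples by the decoded (year, month) pair is one way to set up a check for the systematic drift the laboratory was worried about.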
- Description
A gzip'ed tar containing StatLib datasets (statlib-20050214.tar.gz, 12,785,582 Bytes)
- URLs
- (No information yet)
- Publications
- Data Source
- http://lib.stat.cmu.edu/datasets/
- Measurement Details
- Usage Scenario
- revision 1
- by mldata on 2010-11-06 09:59
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.