View datasets-numeric pharynx (public)

2010-11-06 09:57 by mldata | Version 1
Summary

(No information yet)

License
unknown (from Weka repository)
Dependencies
Tags
arff slurped Weka
Attribute Types
Integer,Floating Point
Download
# Instances: 195 / # Attributes: 12
HDF5 (27.6 KB) XML CSV ARFF LibSVM Matlab Octave
Original Data Format
arff
Name
'pharynx'
Version mldata
0
Comment


Case number deleted.

As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems. Singapore: Springer-Verlag.


Name: Pharynx (A Clinical Trial in the Treatment of Carcinoma of the Oropharynx). SIZE: 195 observations, 13 variables.

DESCRIPTIVE ABSTRACT:

The .dat file gives the data for a part of a large clinical trial carried out by the Radiation Therapy Oncology Group in the United States. The full study included patients with squamous carcinoma of 15 sites in the mouth and throat, with 16 participating institutions, though only data on three sites in the oropharynx reported by the six largest institutions are considered here. Patients entering the study were randomly assigned to one of two treatment groups, radiation therapy alone or radiation therapy together with a chemotherapeutic agent. One objective of the study was to compare the two treatment policies with respect to patient survival.

SOURCE: The Statistical Analysis of Failure Time Data, by JD Kalbfleisch & RL Prentice, (1980), Published by John Wiley & Sons

VARIABLE DESCRIPTIONS:

The data are in free format. That is, at least one blank space separates each variable in the .dat file. The variables are as follows:

Case: Case Number
Inst: Participating Institution
sex: 1=male, 2=female
Treatment: 1=standard, 2=test
Grade: 1=well differentiated, 2=moderately differentiated, 3=poorly differentiated, 9=missing
Age: In years at time of diagnosis
Condition: 1=no disability, 2=restricted work, 3=requires assistance with self care, 4=bed confined, 9=missing
Site: 1=faucial arch, 2=tonsillar fossa, 3=posterior pillar, 4=pharyngeal tongue, 5=posterior wall
T staging: 1=primary tumor measuring 2 cm or less in largest diameter, 2=primary tumor measuring 2 cm to 4 cm in largest diameter with minimal infiltration in depth, 3=primary tumor measuring more than 4 cm, 4=massive invasive tumor
N staging: 0=no clinical evidence of node metastases, 1=single positive node 3 cm or less in diameter, not fixed, 2=single positive node more than 3 cm in diameter, not fixed, 3=multiple positive nodes or fixed positive nodes
Entry Date: Date of study entry: Day of year and year
Status: 0=censored, 1=dead
Time: Survival time in days from day of diagnosis
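The free-format layout and the 9=missing convention described above can be sketched in Python as follows. This is only an illustration: the field order follows the variable list, the names `FIELDS` and `parse_record` are hypothetical, and the sample line is made up rather than taken from the .dat file.

```python
# Field order assumed to match the variable list above (13 variables).
FIELDS = ["Case", "Inst", "sex", "Treatment", "Grade", "Age", "Condition",
          "Site", "T", "N", "Entry", "Status", "Time"]

# Only Grade and Condition are documented as using 9 for "missing".
MISSING_CODED_AS_9 = {"Grade", "Condition"}


def parse_record(line):
    """Split one whitespace-separated line into a dict of integer fields.

    A value of 9 in Grade or Condition is replaced by None.
    """
    tokens = line.split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    record = {}
    for name, token in zip(FIELDS, tokens):
        value = int(float(token))  # tolerate values written as "1.0"
        if name in MISSING_CODED_AS_9 and value == 9:
            value = None
        record[name] = value
    return record


# Made-up line for illustration only (not a row of the real data).
example = parse_record("1  2 2 1 9 51 1 2 3 1 2468 1 631")
```

Treating 9 as missing before any modelling avoids having the code interpreted as a fourth grade or a fifth condition level.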

STORY BEHIND THE DATA:

Approximately 30% of the survival times are censored owing primarily to patients surviving to the time of analysis. Some patients were lost to follow-up because the patient moved or transferred to an institution not participating in the study, though these cases were relatively rare. From a statistical point of view, an important feature of these data is the considerable lack of homogeneity between individuals being studied. Of course, as part of the study design, certain criteria for patient eligibility had to be met which eliminated extremes in the extent of disease, but still many factors are not controlled.

This study included measurements of many covariates which would be expected to relate to survival experience. Six such variables are given in the data (sex, T staging, N staging, age, general condition, and grade). The site of the primary tumor and possible differences between participating institutions require consideration as well.

The T,N staging classification gives a measure of the extent of the tumor at the primary site and at regional lymph nodes. T=1 refers to a small primary tumor, 2 centimeters or less in largest diameter, whereas T=4 is a massive tumor with extension to adjoining tissue. T=2 and T=3 refer to intermediate cases. N=0 refers to there being no clinical evidence of a lymph node metastasis and N=1, N=2, N=3 indicate, in increasing magnitude, the extent of existing lymph node involvement. Patients with classifications T=1,N=0; T=1,N=1; T=2,N=0; or T=2,N=1, or with distant metastases, were excluded from the study.
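The exclusion rule in the paragraph above can be written as a small check, which may be useful for validating rows of the data. This is a sketch only: the function name `is_eligible` and the `distant_metastases` flag are hypothetical, since the dataset itself carries no metastasis variable.

```python
# (T, N) combinations that the description says were excluded from the study.
EXCLUDED_TN = {(1, 0), (1, 1), (2, 0), (2, 1)}


def is_eligible(t, n, distant_metastases=False):
    """Return True if a patient with this T,N staging could be in the study."""
    if distant_metastases:
        return False
    return (t, n) not in EXCLUDED_TN


# Staging pairs from the first rows of the data preview all pass the check.
preview_pairs = [(3, 1), (2, 3), (3, 3), (4, 0), (4, 3)]
all_preview_eligible = all(is_eligible(t, n) for t, n in preview_pairs)
```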

The variable general condition gives a measure of the functional capacity of the patient at the time of diagnosis (1 refers to no disability whereas 4 denotes bed confinement; 2 and 3 measure intermediate levels). The variable grade is a measure of the degree of differentiation of the tumor (the degree to which the tumor cell resembles the host cell) from 1 (well differentiated) to 3 (poorly differentiated).

In addition to the primary question of whether the combined treatment mode is preferable to the conventional radiation therapy, it is of considerable interest to determine the extent to which the several covariates relate to subsequent survival. It is also imperative in answering the primary question to adjust the survivals for possible imbalance that may be present in the study with regard to the other covariates. Such problems are similar to those encountered in the classical theory of linear regression and the analysis of covariance. Again, the need to accommodate censoring is an important distinguishing point. In many situations it is also important to develop nonparametric and robust procedures since there is frequently little empirical or theoretical work to support a particular family of failure time distributions.
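One common nonparametric procedure that accommodates censoring is the Kaplan-Meier product-limit estimator. The description above does not name it, so the following is offered only as an illustrative sketch. It uses the Status coding from the variable list (1=dead, 0=censored); the function name `kaplan_meier` and the toy values are assumptions, and the toy values are not taken from the dataset.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- survival or censoring times
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, estimated survival probability) pairs,
    one per distinct time at which at least one death occurred.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Toy values for illustration only (not from the pharynx data).
toy_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is how the estimator uses the partial information they carry.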

Names
Inst,sex,Treatment,Grade,Age,Condition,Site,T,N,Entry,
Types
  1. nominal:2,5,4,6,3,1
  2. nominal:2,1
  3. nominal:1,2
  4. nominal:1,2,3
  5. numeric
  6. nominal:1,2,3,0,4
  7. nominal:2,4,1
  8. nominal:3,2,4,1
  9. nominal:1,3,0,2
  10. nominal:2468,2968,3368,5768,9568,10668,10768,12068,13368,15468,18268,18468,19068,20768,21768,22768,23368,25968,28068,28268,28968,29468,29868,30468,30868,31068,31868,32468,33568,33368,33868,369,769,969,1769,2469,3569,4469,4569,4969,5169,5669,2769,8369,9369,11869,12569,12769,12969,13269,13569,14369,15569,15669,16669,16769,17869,19969,20469,23069,24569,26669,27969,26869,28069,28969,29069,30469,32869,33069,33269,33569,33669,34469,35369,36369,870,4270,4470,4870,4970,5470,5770,7870,8270,9670,11070,11870,12470,13170,14470,14670,15270,15870,16070,16670,17470,18770,18970,19070,20570,21170,21970,23170,24370,25170,25470,25870,28570,28770,31670,32770,33370,33670,34170,34270,34370,34470,35570,36270,1271,1571,1871,2271,2671,3371,4371,4971,6771,7571,7771,8871,10571,11371,15371,15471,15971,16171,18371,18871,20171,20271,20971,21671,21871,22171,23771,25371,26371,27371,28071,28471,29471,29971,31471,31971,32171,32371,32671,33071,34071,34271,34771,1272,3572,4672,5472,5572,5672,5972,8072,8272,13671,14372,15672,15772,20572,20772,20972,22772,24372,24872,27672,12371
Data (first 10 data points)
    Inst sex Trea... Grade Age Cond... Site T N Entry ...
    2 2 1 1.0 51 1.0 2 3 1 2468 ...
    2 1 2 1.0 65 1.0 4 2 3 2968 ...
    2 1 1 2.0 64 2.0 1 3 3 3368 ...
    2 1 1 1.0 73 1.0 1 4 0 5768 ...
    5 1 2 2.0 64 1.0 1 4 3 9568 ...
    4 1 2 1.0 61 1.0 2 3 0 10668 ...
    4 1 1 2.0 65 1.0 2 4 3 10768 ...
    4 1 2 3.0 84 1.0 4 1 3 12068 ...
    6 1 1 2.0 54 2.0 1 3 3 13368 ...
    3 1 1 2.0 72 2.0 4 2 2 15468 ...
    ... ... ... ... ... ... ... ... ... ... ...
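The Entry column is described as "Day of year and year", so a value such as 2468 plausibly reads as day 24 of year 68. A minimal decoding sketch under that assumption follows; the helper name `decode_entry` is hypothetical, and mapping the two-digit year to the 1900s is also an assumption.

```python
from datetime import date, timedelta


def decode_entry(code):
    """Decode an entry code assumed to be <day of year><two-digit year>."""
    day_of_year, two_digit_year = divmod(int(code), 100)
    year = 1900 + two_digit_year  # assumption: all years are in the 1900s
    if not 1 <= day_of_year <= 366:
        raise ValueError(f"day of year out of range in code {code}")
    # Day 1 is January 1, so offset by day_of_year - 1.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


first_entry = decode_entry(2468)    # day 24 of 1968
later_entry = decode_entry(10668)   # day 106 of 1968
short_entry = decode_entry(369)     # day 3 of 1969
```

Decoding the codes into dates turns Entry into an ordered quantity rather than the 100+ level nominal attribute listed under Types.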
Description

A jarfile containing 37 regression problems, obtained from various sources (datasets-numeric.jar, 169,344 Bytes).

URLs
(No information yet)
Publications
    Data Source
    Measurement Details
    Usage Scenario


    This item was downloaded 4503 times and viewed 2526 times.
