## Statistics Courses

### DATA FILES

#### Data File Description

acrylic.dat A random sample of sheets of acrylic material was selected in a quality assurance program, and the thickness of each sheet was measured. The results (in mm) are listed in several columns.
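
Several files here (acrylic.dat, age.dat, expense.dat, glass.dat, pcb.dat, phone.dat, pthours.dat, system1.dat, system2.dat, webvisit.dat) are described as listing a single variable "in several columns". Below is a minimal Python sketch for reading such a file. It rests on two assumptions: the file is plain whitespace-separated numbers with no header line, and the column arrangement carries no meaning. The name `parse_several_columns` is hypothetical and not part of any course material.

```python
def parse_several_columns(text):
    """Return every value in a whitespace-separated table as one flat list.

    Assumes the values are plain numbers with no header line, and that the
    column layout carries no meaning, so the table is read row by row and
    the rows are concatenated.
    """
    return [float(token) for token in text.split()]
```

For example, `parse_several_columns(Path("acrylic.dat").read_text())` (with `from pathlib import Path`) would return the thickness values in row order, provided the file is in the working directory.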
age.dat A childcare worker interested in the age profile of unmarried mothers in her province recorded the age, to the NEAREST BIRTHDAY, at the time of giving birth of each of the 88 unmarried mothers who gave birth during the study period. The ages are listed in several columns.
battery.dat An analysis of the lifetimes of field equipment batteries under one set of environmental conditions assessed the usable lifetimes of six sample batteries of each of four battery types, coded A, B1, B2, B3 (actually two types, the second with three sub-types). The lifetimes are listed in four columns, one for each of A, B1, B2, B3, in that order.
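
battery.dat uses one column per group instead. The sketch below splits the file into one list per battery type and computes each type's mean lifetime. It assumes every row holds exactly one whitespace-separated value for each of A, B1, B2, B3. The names `parse_group_columns` and `group_means` are illustrative only. capsule.dat is described with the same layout, but its description does not say whether every week has the same number of capsules. If its columns differ in length, this parser raises an error on the short rows.

```python
def parse_group_columns(text, labels):
    """Split a whitespace-separated table into one list of values per column.

    Assumes each non-blank row holds exactly one value per group, in the
    order given by `labels` (for battery.dat: A, B1, B2, B3).
    """
    groups = {label: [] for label in labels}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(labels):
            raise ValueError(
                f"line {line_no}: expected {len(labels)} values, got {len(fields)}"
            )
        for label, field in zip(labels, fields):
            groups[label].append(float(field))
    return groups


def group_means(groups):
    """Sample mean of each group."""
    return {label: sum(values) / len(values) for label, values in groups.items()}
```

Typical use would be `group_means(parse_group_columns(text, ["A", "B1", "B2", "B3"]))`, where `text` is the contents of battery.dat.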
benefits.dat An analysis of the quality of employee benefits packages assessed the packages of 240 sample employee groups classified into four types, coded A, B, C, and D. Each package was rated for quality as Minimal, Adequate or Excellent. The data are in 240 rows, one row per group, and two columns: the first gives the type and the second the quality.
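
benefits.dat holds one pair of category labels per row. care.dat, classify.dat, house.dat, smokers.dat and success.dat are described the same way. The sketch below turns such a file into a contingency table with `collections.Counter`. The descriptions do not show the exact file formatting, so it assumes the two labels in each row are separated by whitespace and contain no spaces themselves. The helper names are hypothetical.

```python
from collections import Counter


def cross_tabulate(text):
    """Count (first column, second column) label pairs, one pair per row."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # Unpacking raises ValueError if a row does not have exactly two fields.
            row_label, col_label = fields
            counts[(row_label, col_label)] += 1
    return counts


def to_table(counts, row_order, col_order):
    """Arrange the counts as a list of rows, in the category order given."""
    # A Counter returns 0 for pairs that never occurred.
    return [[counts[(r, c)] for c in col_order] for r in row_order]
```

For benefits.dat one might call `to_table(cross_tabulate(text), ["A", "B", "C", "D"], ["Minimal", "Adequate", "Excellent"])`, assuming the labels in the file are spelled exactly as in the description. Passing the category order explicitly keeps ordinal categories such as Minimal < Adequate < Excellent in their natural order rather than alphabetical order.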
capsule.dat An arthritis pain medication is prepared in capsules, each of which should contain 500 mg of the main ingredient, a form of glucosamine. Natural variation causes the amount to deviate somewhat from 500 mg, but values are expected to be within 0.5% of 500 mg. Amounts of glucosamine were recorded for sample capsules from the production runs in four different weeks. The data are listed in four columns, one for each of weeks 1, 2, 3, 4.
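
The 0.5% specification means limits of 500 ± 2.5 mg, that is, 497.5 mg to 502.5 mg. Below is a small sketch that computes those limits and flags values outside them. The description does not say how out-of-range capsules are handled, so the sketch only reports them. The function names are illustrative.

```python
def tolerance_limits(nominal=500.0, percent=0.5):
    """Lower and upper acceptable amounts: nominal minus/plus percent% of nominal."""
    # 500 * 0.5 / 100 = 2.5 exactly, so the limits come out as exactly 497.5 and 502.5.
    delta = nominal * percent / 100
    return nominal - delta, nominal + delta


def out_of_tolerance(amounts, nominal=500.0, percent=0.5):
    """Return the amounts (in mg) that fall outside the tolerance limits."""
    low, high = tolerance_limits(nominal, percent)
    return [x for x in amounts if not low <= x <= high]
```

Computing the allowance as `nominal * percent / 100` rather than `nominal * (1 + 0.005)` avoids a floating-point rounding error that would push the upper limit just below 502.5.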
capsule1.dat This file has one column, containing the week 1 data from capsule.dat above.
capsule2.dat This file has one column, containing the week 2 data from capsule.dat above.
capsule3.dat This file has one column, containing the week 3 data from capsule.dat above.
capsule4.dat This file has one column, containing the week 4 data from capsule.dat above.
care.dat A labour force survey examined the possible effects of child caring on the employment status of women. One part of the survey covered 370 women aged 18 to 65, classified according to child caring and employment status. The data file has 370 rows, one row per woman, and two columns. The first column gives the child caring classification: U5 for caring for one or more children under 5, U16N5 for caring for one or more children under 16 but none under 5, and N16 for not caring for any child under 16. The second column gives the employment status: EMPLOY for employed or seeking employment, and NOT for neither employed nor seeking employment.
classify.dat It is expensive and difficult to determine with certainty which subjects should be excluded from a project, so different affordable screening methods were tested for accuracy. A total of 400 subjects KNOWN TO BE UNSUITABLE were randomly assigned to four procedures (labelled A, B, C, D). Each procedure yields one of three classifications: 'suitable' (incorrect), 'unclear' (non-informative) or 'unsuitable' (correct). The data are listed in two columns of 400 rows, one row per subject. The first column gives the procedure and the second the resulting classification.
coffee.dat This data file is used to develop a model for predicting coffee consumption in office complexes with on-site coffee sales. Sales on sample days were recorded for several complexes, and the file has one row for each complex and day. The file has four columns. The first gives the average number of people in the building (a methodology was developed to determine this 'average'). The second gives the hours of operation, the third the price per cup of coffee, and the fourth the number of cups sold.
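
The description says only that coffee.dat is used to develop a prediction model. It does not give the model's form. One common choice, assumed here for illustration, is a linear model that predicts cups sold from the other three columns by ordinary least squares. The sketch solves the normal equations by Gaussian elimination so that it needs only the standard library. `fit_linear_model` is a hypothetical helper, and it expects the response in the last position, as the description gives it for coffee.dat.

```python
def fit_linear_model(rows):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    `rows` is a list of tuples (x1, ..., xk, y) with the response last.
    Returns [b0, b1, ..., bk]. Solves the normal equations (X'X) b = X'y
    by Gaussian elimination with partial pivoting.
    """
    X = [[1.0, *map(float, r[:-1])] for r in rows]  # leading 1.0 is the intercept column
    y = [float(r[-1]) for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Forward elimination, swapping in the largest available pivot at each step.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent (or nearly so)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b
```

If the coffee data are loaded as rows `(people, hours, price, cups)`, `fit_linear_model(rows)` would return `[b0, b_people, b_hours, b_price]`. The same function would apply to energy.dat, whose last column is energy consumption. Forming X'X squares the condition number. For poorly scaled or nearly collinear predictors, a solver that avoids X'X, such as `numpy.linalg.lstsq`, is the safer choice.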
energy.dat The data relate to energy consumption in a building over fourteen months that required heating. The file has one row per month and three columns. These give, respectively, the number of heating degree days (the number of days multiplied by the difference, in Fahrenheit degrees, between 70 and the outside temperature), the average wind speed in miles per hour, and the energy consumption in megawatt hours.
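
The heating degree day definition above can be made concrete. This sketch assumes a common reading of it: each day contributes 70 minus its mean outdoor temperature (°F) when that difference is positive, and nothing otherwise, and the month's figure is the sum over its days. When every day is equally cold, this reduces to the description's "number of days multiplied by the difference". The base of 70 °F is the one stated in this description. Other sources often use 65 °F.

```python
BASE_TEMP_F = 70.0  # base temperature stated in the energy.dat description


def heating_degree_days(daily_mean_temps_f, base=BASE_TEMP_F):
    """Heating degree days for a period, from daily mean outdoor temperatures (°F).

    Days at or above the base temperature contribute zero.
    """
    return sum(max(0.0, base - t) for t in daily_mean_temps_f)
```

For example, a 30-day month with every day at 40 °F gives 30 × (70 − 40) = 900 heating degree days.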
expense.dat The data are one part of the results of a random audit sample of n = 90 travel expense accounts drawn from a full collection of N = 1583 such accounts. For each sample account, the file gives the average cost of meals per day. The data are listed in several columns.
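
The n = 90 accounts are drawn without replacement from only N = 1583, so a finite population correction is often applied to the standard error of the sample mean: SE = sqrt(1 − n/N) · s/√n. The description does not say whether the course uses this correction, so the sketch below is illustrative only. Some texts write the factor as (N − n)/(N − 1) instead. That form goes with a population variance defined using divisor N.

```python
import math
import statistics


def mean_and_se(sample, population_size):
    """Sample mean and estimated standard error, with finite population correction.

    Assumes a simple random sample drawn without replacement, and uses
    SE = sqrt(1 - n/N) * s / sqrt(n), where s is the sample standard
    deviation (divisor n - 1).
    """
    n = len(sample)
    if not 2 <= n <= population_size:
        raise ValueError("need 2 <= n <= population_size")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)
    fpc = 1 - n / population_size
    return mean, math.sqrt(fpc) * s / math.sqrt(n)
```

Here n/N = 90/1583 ≈ 0.057, so the factor sqrt(1 − n/N) ≈ 0.971 shrinks the standard error by roughly 3%. For pthours.dat (n = 120, N = 2758), the same call applies and the reduction is roughly 2%.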
glass.dat These data, in several columns, record the weights in grams of wine, spirits and beer glass in 88 sample 'blue boxes' collected in a recycling program.
house.dat A poll was conducted to determine reaction to a proposal to change zoning of a parcel of land near a school and playground from low density to high density plus retail.  Respondents were selected at random in a nearby mall and were classified according to a map as living in the same area as the land in question, in an adjacent area or in some other area.  Each respondent was coaxed into giving a response to the proposal, expressing strong support, mild support, mild opposition or strong opposition.  The data are in two columns with a row for each respondent.  The first column gives the area SAME, ADJACENT or OTHER and the second gives the degree of support as STR_SUP, MILD_SUP, MILD_OPP, or STR_OPP.
pcb.dat The data, in several columns, are measurements of polychlorinated biphenyls (PCBs), in micrograms per gram of wet weight, in four dozen samples of marine tissue from one species of shellfish. The samples were taken after ten days of exposure at 15° Celsius to a set PCB level in water.
phone.dat The data, in several columns, are daily usage times, in minutes to the nearest 0.1 minute, from a sample audit of 25 cellular telephone records.
pthours.dat As part of an analysis of hours worked by members of a pool of N = 2758 part-time workers, a random sample of n = 120 of the workers was taken, and the hours each worked in one week were recorded. Hours worked can range from 3 to 35 and are recorded to the nearest quarter hour. The sample values are listed in several columns, in order from minimum to maximum.
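
The pthours.dat description states a range (3 to 35 hours), a resolution (quarter hours) and an ordering, and all three can be checked after loading. Below is a minimal sketch that assumes the values have already been read into a list in row order. `check_pthours` is a hypothetical name. The description does not say whether the minimum-to-maximum order runs across rows or down columns. An ordering complaint from this check may therefore just mean the file is sorted column by column.

```python
def check_pthours(hours, low=3.0, high=35.0):
    """Return a list of problems; an empty list means the data match the description.

    Checks the stated range, the quarter-hour resolution and the ascending
    order, assuming `hours` is in the order the values were read.
    """
    problems = []
    for i, h in enumerate(hours):
        if not low <= h <= high:
            problems.append(f"value {i}: {h} is outside [{low}, {high}]")
        # Multiples of 0.25 are exact in binary floating point, so 4*h is then a whole number.
        if h * 4 != round(h * 4):
            problems.append(f"value {i}: {h} is not a whole number of quarter hours")
    if any(a > b for a, b in zip(hours, hours[1:])):
        problems.append("values are not in ascending order")
    return problems
```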
smokers.dat In a survey on a possible relationship between education level and smoking, subjects chosen at random were classified according to education level and smoking status.  The results are stored in two columns, one row for each subject.  The first column gives smoking status as CURRENT, QUIT, or NEVER.  The second gives education level as ELEM for elementary school only, SECON for secondary school only, SOMEPOST for some post secondary education or UNIV for a university degree.
success.dat Subjects were randomly assigned to four conditions involving varying degrees of stress (75 subjects per condition). Each subject was then assessed for how successfully they completed a task under the imposed level of stress. The data are tabulated in 300 rows and two columns, one row per subject. The first column gives the stress level (no, low, mod [for moderate] or high). The second gives the degree of success: 'success' for fully successful, 'partial' for partially successful and 'not' for not successful.
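
A two-way table such as the one in success.dat (four stress levels by three outcomes) is often analyzed with Pearson's chi-square test of independence. The description does not say which analysis is intended, so treating it this way is an assumption. The sketch computes the statistic Σ (O − E)²/E, where E = row total × column total / grand total, along with the degrees of freedom (r − 1)(c − 1). For success.dat the degrees of freedom are (4 − 1)(3 − 1) = 6. The 5% critical value of the chi-square distribution with 6 degrees of freedom is about 12.59.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts. Raises ZeroDivisionError
    if any row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df
```

The chi-square approximation also relies on the expected counts not being too small. A frequently quoted rule of thumb is that each expected count should be at least 5.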
system1.dat Two data systems were compared with regard to the time to extract subsets of a data bank on the basis of various selection criteria.  Times for two dozen sample searches with system 1 are listed in several columns in this file.  (See below for system 2.)
system2.dat Two data systems were compared with regard to the time to extract subsets of a data bank on the basis of various selection criteria. Times for 21 sample searches (an original two dozen, with 3 missing values) with system 2 are listed in several columns in this file. (See above for system 1.)
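
Comparing system1.dat with system2.dat is a two-sample problem. The descriptions do not say that the two systems have equal variances, so the sketch below uses Welch's t statistic with the Welch-Satterthwaite degrees of freedom. It treats the 21 system 2 times as a complete sample, which assumes the 3 values are missing for reasons unrelated to search time. If the same 24 searches were run on both systems, a paired comparison on the complete pairs may be more appropriate. The descriptions do not say, and the several-column layout may not preserve the pairing.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Treats x and y as independent samples with possibly unequal variances;
    each needs at least two values.
    """
    nx, ny = len(x), len(y)
    vx_n = statistics.variance(x) / nx  # squared standard error of mean(x)
    vy_n = statistics.variance(y) / ny  # squared standard error of mean(y)
    se2 = vx_n + vy_n
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(se2)
    df = se2 ** 2 / (vx_n ** 2 / (nx - 1) + vy_n ** 2 / (ny - 1))
    return t, df
```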
visits.dat These time series data list the number of visits per month to a mental health clinic from January 1988 through December 1992, inclusive.
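
January 1988 through December 1992 inclusive is 5 × 12 = 60 monthly values. Below is a small sketch that generates matching year-month labels. It assumes the file lists the counts in chronological order, one value per month. `month_labels` is an illustrative name.

```python
def month_labels(start_year=1988, start_month=1, count=60):
    """Year-month labels ("YYYY-MM") for `count` consecutive months."""
    labels = []
    for k in range(count):
        offset = start_month - 1 + k  # months elapsed since January of start_year
        year = start_year + offset // 12
        month = offset % 12 + 1
        labels.append(f"{year}-{month:02d}")
    return labels
```

`list(zip(month_labels(), counts))` would pair each count with its month. Because `zip` stops silently at the shorter input, checking `len(counts) == 60` first guards against a layout different from the one assumed.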
webvisit.dat These data, in several columns, are the daily numbers of visits to a web site over a three-month sample period.
wire.dat These data, in two columns, give the diameters (column one) and strengths (column two) of ten sample sections of wire, one row per section, ordered from smallest to largest diameter.
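
The wire.dat description names two variables but no model. A natural first look, assumed here, is a straight-line least squares fit of strength on diameter. The sketch computes the intercept and slope from the usual formulas slope = Sxy/Sxx and intercept = ȳ − slope·x̄. `least_squares_line` is a hypothetical name. On Python 3.10 or later, `statistics.linear_regression` gives the same slope and intercept.

```python
def least_squares_line(xs, ys):
    """Intercept and slope of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

With the two columns read as lists `diameters` and `strengths`, `least_squares_line(diameters, strengths)` returns the fitted intercept and the change in strength per unit of diameter. Whether a straight line fits these ten points well is something a residual plot should confirm.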