DTSC71 302 Statistical Learning and Regression Models
Answer ALL questions.
Total Marks: 45
Question 1: (9 Total Marks)
Provide a short answer to each question (3 marks each).
- An analysis of factors affecting the usage of public transportation (e.g., buses, trains, ferries, light rail, etc.) is being undertaken. Survey responses from a sample of people are collected, recording various categorical demographic and socioeconomic variables as well as the number of times during the last week they used any form of public transportation. Briefly describe what type of model you would use to analyse this data and why.
- An analysis of restaurant ratings is being undertaken. A sample of restaurants was selected and rating levels (one to five stars) were recorded along with other basic characteristics such as type of food (e.g., French, Italian, Asian Fusion, Rotisserie Grill, etc.), average cost of a meal, location, seating capacity, etc. The goal is to assess how ratings differ across restaurants according to these business-oriented covariates. Briefly discuss what type of model you would use and why.
- The survey in part (a) is to be modified. The same categorical covariates are collected, but this time the survey asks which mode of public transport is preferred (e.g., buses, trains, ferries, light rail, none of them, no preference, etc.). Briefly discuss whether you would use the same model structure you proposed for (a) and, if you would change your model, what new structure you would replace it with.
Question 2: (18 Total Marks)
The file MRI.csv contains data on the accuracy of MRI readings for various types of scans. Readings were made on two Types of MRI machine (T1 and T2) in each of three different modes (A, B and C) on two days and three different sessions per day. For each of these 2×3×2×3 = 36 combinations, 3 readings each were taken on a range of different “Reference Targets”. The goals of the analysis are:
- Fit an appropriate model to determine if there are statistically significant fluctuations in mean readings over the various days and/or sessions.
- Fit an appropriate model to determine the machine type and scan mode that best “hits the target”.
- For the type/mode combination that is best, find a simple calibration to the reading which should make it as close to the reference target as possible. In other words, use your model to determine a simple procedure to calculate an Adjusted Reading = f(Reading) which improves the accuracy of the original reading in terms of hitting the reference target.
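As a check on the design described in this question, the factor combinations can be enumerated directly. The following is a minimal sketch in Python; the level labels for days and sessions (`Day1`, `S1`, and so on) are hypothetical, since the question gives only their counts, and the final total assumes that every reference target is read three times in every combination.

```python
from itertools import product

# Factor levels taken from the question text. The day and session labels
# are hypothetical placeholders: the question states only how many there are.
machine_types = ["T1", "T2"]
modes = ["A", "B", "C"]
days = ["Day1", "Day2"]
sessions = ["S1", "S2", "S3"]
readings_per_cell = 3

# Every combination of type x mode x day x session.
design = list(product(machine_types, modes, days, sessions))
n_cells = len(design)

# Assumption: each reference target is read three times in every cell.
n_readings_per_target = n_cells * readings_per_cell

print(n_cells)                # 36
print(n_readings_per_target)  # 108
```

The actual column names and layout of MRI.csv are not given here, so the file itself should be inspected before relying on this structure.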
Question 3: (18 Total Marks)
The file Injury.csv contains data regarding soft-tissue training injuries in a sample of professional athletes. It contains each athlete's age (including fractions of a year), the amount of high-intensity sprinting they have done at training in the past 4 weeks (in metres) and whether they have been injured during the current day's training session (1 = Injured). The goals of the analysis are:
- Fit an appropriate model to determine, graphically or otherwise, the range of sprint training that minimises the chance of soft-tissue injury.
- Assess the effect of age on the shape of the relationship between injury rate and training load.