DTSC13/71-302 Statistical Learning and Regression Models


Question 1: (9 Total Marks)

Provide a short answer to each question (3 marks each)

  1. An analysis of factors affecting the usage of public transportation (e.g., buses, trains, ferries, light rail, etc.) is being undertaken. A sample of people is surveyed, recording various demographic and socioeconomic variables as well as the number of times they used public transportation during the last week. Briefly describe what type of model you would use and why.
  2. An analysis of restaurant ratings is being undertaken. A sample of restaurants was taken, and rating levels (one to five stars) were recorded along with other basic characteristics such as type of food (e.g., French, Italian, Asian Fusion, Rotisserie Grill, etc.), average cost of a meal, location, seating capacity, etc. The goal was to assess differences in ratings given to restaurants with various demographic covariates. Briefly discuss what type of model you would use and why.
  3. The survey in part 1 is to be modified. The same categorical covariates are collected, but this time the survey asks which mode of public transport is preferred (e.g., buses, trains, ferries, light rail, none of them, no preference, etc.). Briefly discuss whether you would use the same model structure you proposed for part 1 and, if you would change your model, what new structure you would replace it with.

Question 2: (18 Total Marks)

The file MRI.csv contains data on the accuracy of MRI readings for various types of scan. Readings were made on two types of MRI machine (T1 and T2), in each of three different modes (A, B and C), on two days with three different sessions per day. For each of these 2×3×2×3 = 36 combinations, 3 readings were taken on different “Reference Targets”. The goals of the analysis are:

  1. Fit an appropriate model to determine if there are statistically significant fluctuations in mean readings over the various days and/or sessions.
  2. Fit an appropriate model to determine the type and method that best “hits the target”.
  3. For the type/method combination that is best, find a simple calibration to the reading which should make it as close to the reference target as possible. In other words, use your model to determine a simple procedure to calculate an Adjusted Reading = 𝑓(Reading) which improves the accuracy of the original reading in terms of hitting the reference target.
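
As a quick check on the design described above, the following minimal sketch enumerates the factor combinations and the total number of readings you should expect to find in the data. The labels used for days and sessions are placeholders only; the actual column names and coding in MRI.csv are not specified here and may differ.

```python
from itertools import product

# Factor levels taken from the question text. The day and session labels
# are hypothetical placeholders; MRI.csv may code them differently.
types = ["T1", "T2"]
modes = ["A", "B", "C"]
days = [1, 2]
sessions = [1, 2, 3]
readings_per_cell = 3  # readings taken in each combination

# Every Type x Mode x Day x Session combination in the design.
cells = list(product(types, modes, days, sessions))
n_cells = len(cells)
n_readings = n_cells * readings_per_cell

print(n_cells, n_readings)
```

If the number of rows in the file does not match the total computed here, check for missing or duplicated readings before fitting any model.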

Question 3: (18 Total Marks)

The file Injury.csv contains data regarding soft-tissue training injuries in a sample of professional athletes. It contains their age (including fractions of a year), the amount of high-intensity sprinting they have done at training in the past 4 weeks (in metres) and whether they were injured during the current day's training session (1 = Injured). The goals of the analysis are:

  1. Fit an appropriate model to determine, graphically or otherwise, where the optimal range of sprint training is to minimise the chance of soft-tissue injury.
  2. Assess the effect of age on the shape of the relationship between injury rate and training load.