
R Language Assignment Question 9

Econ 3818 R exercise 3

Submit the R code used to answer all questions as part of the text file. Set the R code apart with a line of three asterisks (***) above and below it.

You may work in groups of three (but no more than three!). Please put the names of all group members at the top of the text file. Submit work individually.

1. You sample 100 people’s showering habits and find the average shower time is 12.2 minutes. Given that the population variance of shower time is 15, use the qnorm() command to construct an 86% confidence interval for the population mean.

Make sure you are using qnorm() correctly. Note: qnorm(1 − 𝛼) returns 𝑍𝛼, the value satisfying 𝑃(𝑍 ≤ 𝑍𝛼) = 1 − 𝛼 (equivalently, 𝑃(𝑍 > 𝑍𝛼) = 𝛼). For a 100𝛾 percent confidence interval, we have 𝛼 = 1 − 𝛾. See help(qnorm) for more details.
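As an illustration of the qnorm() note, here is a minimal sketch of a two-sided z interval when the population variance is known. A two-sided interval splits 𝛼 across both tails, so the critical value comes from qnorm(1 − 𝛼/2). The function name z_interval_known_var and the inputs in the example call are hypothetical and are not taken from the assignment.

```r
# Hypothetical helper (not part of the assignment): two-sided z interval
# for a mean when the population variance is known.
z_interval_known_var <- function(xbar, pop_var, n, level) {
  alpha  <- 1 - level               # total probability left in both tails
  z      <- qnorm(1 - alpha / 2)    # critical value with alpha/2 in the upper tail
  margin <- z * sqrt(pop_var / n)   # critical value times the standard error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Illustrative call with made-up inputs: mean 50, variance 4, n = 25, 95% level
z_interval_known_var(xbar = 50, pop_var = 4, n = 25, level = 0.95)
```

Passing the variance rather than the standard deviation matches how the question states its inputs; the square root is taken once inside the function.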

2. Load the housing data we’ve been using:

housing_df <- read.csv("https://mattbutner.github.io/data/housing_df.csv")

  1. Use a combination of the mean(), sd(), length(), qnorm(), and sqrt() functions to construct a 90% confidence interval for the CRIM variable.
  2. Interpret the confidence interval from CRIM.
  3. Download CI_sim.R from D2L. This file simulates a number of random samples of the same size, computes the mean and a confidence interval for each sample, and reports the percentage of the confidence intervals that capture the true population mean. Make sure you have sample_size <- 100, num_samples <- 50, and ci_level <- 0.95 set at the beginning of the document. You will need to install the user-written package ggplot2. To do this, type install.packages("ggplot2") into the console before you run the R script. You will need to be connected to the internet. For more information, see https://ggplot2.tidyverse.org/.
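The kind of simulation described for CI_sim.R can be sketched in base R as follows. This is not the contents of CI_sim.R, which are not shown here. It assumes a normal population with a made-up mean and standard deviation (true_mean, true_sd) and leaves out the ggplot2 plotting.

```r
# Settings named in the assignment text
sample_size <- 100
num_samples <- 50
ci_level    <- 0.95

# Assumption: a normal population with known mean and sd (CI_sim.R may differ)
true_mean <- 10
true_sd   <- 3

set.seed(1)  # make the run reproducible

# Two-sided critical value for the chosen confidence level
z <- qnorm(1 - (1 - ci_level) / 2)

# For each simulated sample, build a z interval and record whether it
# captures the true population mean
covers <- replicate(num_samples, {
  x      <- rnorm(sample_size, mean = true_mean, sd = true_sd)
  margin <- z * sd(x) / sqrt(length(x))
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

coverage <- mean(covers)  # share of intervals that capture the true mean
coverage
```

Changing sample_size, num_samples, or ci_level at the top and rerunning is one way to explore how interval width and coverage respond to each setting.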

No need to provide the R code for these questions.

  1. Suppose you increase the sample size from 100 to 200.
    1. What happens to the width of the confidence intervals?
    2. Does the true population mean fall inside more of the confidence intervals?
  2. Return the sample size back to 100. Now change the number of samples from 50 to 100.
    1. How does this change the percentage of the confidence intervals that capture the population mean?
    2. As you increase the number of samples, towards infinity, what percentage of the confidence intervals will capture the true population mean?
  3. Return the number of samples back to 50. Now change the confidence level to 0.8.
    1. How does the width of the confidence intervals change?
    2. Does the percentage of confidence intervals that capture the population mean increase or decrease?
  4. You started taking the bus to work. The local transit authority says that a bus should arrive at your bus stop every five minutes. After a while, you notice you spend a lot more than five minutes waiting for the bus, so you start to keep a record.

You spend the next two months recording how long it takes for the bus to arrive at the bus stop. This gives a total of sixty observations that denote the number of minutes it took for the bus to arrive (rounded to the nearest minute). These observations are hosted at

https://mattbutner.github.io/data/bus_stop_time.csv

  1. Load these data into R as a data frame titled bus_stop_time using the read.csv() command.
  2. Create a histogram of the time_until_bus variable. Would you say that five minutes is a reasonable guess for the average arrival time based on this picture alone?
  3. Create a 95% confidence interval for the bus arrival times using the z distribution. Does 5 minutes fall within the 95% confidence interval?
  4. How would you communicate your finding to the local transit authority?
  5. Identify and download the data set you plan on using for the R data project. Verify there are enough observations and variables to do something meaningful. Attach a link or a short preview of the data set. Briefly explain what you intend to do.
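For the bus-stop questions above, a z-based interval built from a data vector and a check of whether a claimed value falls inside it can be sketched as follows. The helper z_ci_from_sample is hypothetical, and the data here are simulated stand-ins, not the observations in bus_stop_time.csv.

```r
# Hypothetical helper: z-based interval from a numeric vector, using the sample sd
z_ci_from_sample <- function(x, level = 0.95) {
  n      <- length(x)
  z      <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  margin <- z * sd(x) / sqrt(n)          # critical value times estimated standard error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Stand-in data only: simulated waits, NOT the real bus_stop_time observations
set.seed(42)
fake_waits <- rpois(60, lambda = 7)

ci      <- z_ci_from_sample(fake_waits, level = 0.95)
claimed <- 5  # the transit authority's stated interval between buses, in minutes

# TRUE if the claimed value lies inside the interval, FALSE otherwise
inside <- ci[["lower"]] <= claimed && claimed <= ci[["upper"]]

# With the real data loaded, the call would instead be:
# z_ci_from_sample(bus_stop_time$time_until_bus, level = 0.95)
```

Returning a named vector keeps the lower and upper bounds easy to reference when writing up the result.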