301114 The Nature Of Data-Diastolic Assessment Answers

In this assignment there are 3 questions (split into various parts). For each question/part you should draw appropriate plots, conduct the analysis and describe the conclusions in words. A report can be submitted as either a PDF (preferred) or Word document. It is advisable that your report includes the R-code used so that partial credit can be awarded in case of error. You can create the word/pdf file using Rmarkdown if you know how, but this is not compulsory.
Submission is due by Friday for week 12. Submission is by the vUWS online system.
1. a. The following table shows the number of prisoners in Australian prisons in 2014, broken down by age group and gender. Is there evidence that the age distribution differs by gender? Make an appropriate plot of the data.
Age group             Males   Females
19 years and under      920        65
20 to 24 years         4684       316
25 to 29 years         5624       475
30 to 34 years         5407       483
35 to 39 years         4609       404
40 to 44 years         3756       349
45 to 49 years         2408       234
50 to 54 years         1541       135
55 to 59 years          975        72
60 to 64 years          595        32
65 years and over       681        27

prisoners = cbind(Males = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
Females= c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27))
rownames(prisoners) = c("19 years and under", "20 to 24 years", "25 to 29 years",
"30 to 34 years", "35 to 39 years", "40 to 44 years",
"45 to 49 years", "50 to 54 years", "55 to 59 years",
"60 to 64 years", "65 years and over")

b. Data has been collected on the number of car insurance claims in two areas of Sweden: Stockholm and surrounds, and rural southern Sweden. In Stockholm there were 23174 claims from 326149 policies, whereas in the rural area there were 31913 claims from 846957 policies. Is there evidence that the rate of car insurance claims is different in the two areas?
2. The file PIMA.csv contains information about the Pima people from North America. A number of social and environmental factors have contributed to them having one of the highest rates of type 2 diabetes in the world. This data set contains information on around 700 female individuals, including:
• ever.pregnant — whether the individual has ever been pregnant
• diastolic — diastolic blood pressure
• bmi — Body Mass Index
• age — age in years

Is there evidence that body mass index differs for women who have had a pregnancy versus those that haven't? In what direction is any difference? Make an appropriate plot of the data.
3. a. Diastolic blood pressure is thought to vary by age. Make an appropriate plot of the data. Compute a 95% confidence interval for the Pearson correlation of age and diastolic blood pressure in female Pima people.
b. Fit the simple linear regression of diastolic and age. Interpret the slope of the regression. Compute a 95% confidence interval for the mean diastolic blood pressure of a 40 year old female of the Pima people.

Answer:

Although the age distribution differs slightly between male and female prisoners, the difference does not appear to be large. This is illustrated by the bar plots, which are produced with the following R code:

prisoners = cbind(Males = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
                  Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27))
rownames(prisoners) = c("19 years and under", "20 to 24 years", "25 to 29 years", "30 to 34 years",
                        "35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years",
                        "55 to 59 years", "60 to 64 years", "65 years and over")
# One stacked bar per gender; rainbow() gives one distinct colour per age group
age.cols = rainbow(nrow(prisoners))
barplot(prisoners, main="Bar plot of Gender vs age of Prisoners", col=age.cols,
        xlab="gender of prisoners", legend.text=rownames(prisoners))
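A bar plot alone cannot show whether the age distributions differ by more than chance would allow. The following is a minimal sketch of one way to check this formally, assuming a chi-squared test of independence is acceptable for the question; the names age_share and age_test are introduced here for illustration only.

```r
# Age-by-gender counts taken from the table in the question.
prisoners <- cbind(
  Males   = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
  Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27)
)
rownames(prisoners) <- c("19 years and under", "20 to 24 years", "25 to 29 years",
                         "30 to 34 years", "35 to 39 years", "40 to 44 years",
                         "45 to 49 years", "50 to 54 years", "55 to 59 years",
                         "60 to 64 years", "65 years and over")

# Proportion of each gender falling in each age group (each column sums to 1).
# Proportions are comparable across genders even though the totals differ.
age_share <- prop.table(prisoners, margin = 2)
print(round(age_share, 3))

# Chi-squared test of independence between age group and gender.
age_test <- chisq.test(prisoners)
print(age_test)
```

A small p-value in the printed output would indicate that the age distribution differs between male and female prisoners. Comparing the columns of age_share shows in which age groups the difference lies.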

The bar plot produced by barplot() is shown in Figure 1.

The rate (ratio) of car insurance claims to policies in Stockholm is 23174/326149 = 0.0711, whereas the rate of car insurance claims to policies in the rural area is 31913/846957 = 0.0377. Thus, the claim rate is higher in Stockholm than in the rural area. With samples this large, the difference is far greater than sampling variation would explain; a two-sample test of proportions (for example R's 'prop.test') makes this formal.

Yes, the Body Mass Index (BMI) of Pima females who have been pregnant differs from that of those who have never been pregnant. This is seen in the box plots of BMI by pregnancy status. The R code to produce the box plots is as follows:

Sheet = read.csv("/Downloads/new_file.csv", sep=",", header=TRUE)
boxplot(BMI ~ EVER.PREGNANT, data=Sheet, main="Difference in BMI with State of Pregnancy",
        xlab="State of Pregnancy", ylab="BMI")

Figure 1: Bar plot of data

In the above code, 'new_file.csv' is the modified PIMA.csv file, in which the first column is split into four columns, AGE, DIASTOLIC, BMI and EVER.PREGNANT, for ease of analysis.
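Box plots show the direction of a difference but not whether it is larger than chance. The following is a minimal sketch of a Welch two-sample t-test for BMI by pregnancy status. Because the real file is not available here, it runs on simulated data: the data frame sim, its group sizes, means and the labels "No" and "Yes" are all hypothetical stand-ins, not values from PIMA.csv.

```r
set.seed(1)

# Hypothetical stand-in for the modified PIMA data (simulated, not real values).
sim <- data.frame(
  EVER.PREGNANT = rep(c("No", "Yes"), times = c(100, 600)),
  BMI = c(rnorm(100, mean = 33, sd = 7), rnorm(600, mean = 32, sd = 7))
)

# Welch two-sample t-test (t.test does not assume equal variances by default).
bmi_test <- t.test(BMI ~ EVER.PREGNANT, data = sim)
print(bmi_test)

# Group means show the direction of any difference;
# the confidence interval is for the difference in means.
print(bmi_test$estimate)
print(bmi_test$conf.int)
```

With the real data, the same call on Sheet (with its own column names and group labels) would give the p-value and the direction of the difference.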

The box plots show that females who have never been pregnant tend to have a higher BMI than those who have been pregnant.

The scatter plot shows a non-zero correlation between diastolic blood pressure and age in the Pima females. We calculate the correlation coefficient by Pearson's method using R's built-in 'cor' function, and also by Spearman's rank correlation. The 95% confidence interval is calculated by bootstrapping the data, and a permutation of the data is used to examine the null hypothesis of zero correlation. Alternatively, the 95% confidence interval is calculated using the function 'CIr' from the package "psychometric". The exact R code is as follows:

(i) For plotting the scatter plot and histograms:

Sheet = read.csv("/Downloads/new_file.csv", sep=",", header=TRUE)
plot(Sheet$AGE, Sheet$DIASTOLIC)
par(mfrow=c(2,1))
hist(Sheet$AGE)
hist(Sheet$DIASTOLIC)

[Figure: scatter plot of DIASTOLIC against AGE]

The code for calculating the coefficient of correlation is:

Sheet = read.csv("/Downloads/new_file.csv", sep=",", header=TRUE)
cor(Sheet$AGE, Sheet$DIASTOLIC, method="pearson")
cor(Sheet$AGE, Sheet$DIASTOLIC, method="spearman")

The coefficient of correlation was found to be 0.3259467 (Pearson's method) and 0.3676724 (Spearman's method).

The code for calculating the 95% confidence interval is given below, together with code for checking that the correlation is not due to chance. The distribution of the correlation under the null hypothesis (that the true correlation is zero) is simulated by permuting the data, and the observed correlation is compared with it. The observed correlation coefficient is greater than those obtained from the permuted data. Moreover, the bootstrap confidence interval for the correlation coefficient does not contain zero. Hence, the null hypothesis can be rejected: there is a correlation between age and diastolic blood pressure.

Sheet = read.csv("/home/prajnan/Downloads/new_file.csv", sep=",", header=TRUE)
obs.cor = cor(Sheet$DIASTOLIC, Sheet$AGE)
n = nrow(Sheet)  # sample size (n was not defined in the original code)
# Permutation: shuffle AGE to simulate the correlation under the null hypothesis
x = replicate(1000, {
  post.perm = sample(Sheet$AGE)
  cor(Sheet$DIASTOLIC, post.perm)
})
hist(x, col=colors(), xlab="correlation coefficient assuming null hypo")
# Bootstrap: resample rows with replacement to get the sampling distribution
x = replicate(1000, {
  samp = sample(1:n, replace=TRUE, size=n)
  cor(Sheet$DIASTOLIC[samp], Sheet$AGE[samp])
})
# 2.5% and 97.5% quantiles bound a 95% interval (0.05 and 0.95 would give 90%)
quantile(x, c(0.025, 0.975))
hist(x, col=colors(), xlab="correlation coefficient of the simulations")

The confidence interval was obtained to be (0.2692707, 0.3780436). Note that this interval corresponds to the 5% and 95% quantiles of the bootstrapped correlations, which strictly bound a 90% interval; the 2.5% and 97.5% quantiles bound the 95% interval.

[Figure: histogram of correlation coefficients assuming the null hypothesis]

[Figure: histogram of simulated correlation coefficients showing the confidence interval]

The 95% confidence interval was also calculated using the package "psychometric" as follows:

library(psychometric)
CIr(r=0.3259467, n=732, level=0.95)

In the above code, 'r' corresponds to the coefficient of correlation, 'n' to the sample size and 'level' to the confidence level.
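The interval from 'CIr' can also be computed by hand. The following is a minimal sketch, assuming that 'CIr' is based on the Fisher z-transformation of the correlation (the usual large-sample method); the variable names are chosen here for illustration.

```r
# Inputs as passed to CIr in the text.
r     <- 0.3259467  # observed Pearson correlation
n     <- 732        # sample size
level <- 0.95       # confidence level

# Fisher z-transformation: atanh(r) is approximately normal
# with standard error 1 / sqrt(n - 3).
z    <- atanh(r)
se   <- 1 / sqrt(n - 3)
crit <- qnorm(1 - (1 - level) / 2)

# Build the interval on the z scale, then transform back with tanh().
ci <- tanh(z + c(-1, 1) * crit * se)
print(ci)
```

Under that assumption, the printed limits should agree closely with the interval returned by 'CIr'. The interval is not symmetric about r, because the back-transformation is non-linear.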

The output of CIr was the interval with lower limit = 0.2596146 and upper limit = 0.3892176, which differs somewhat from the bootstrap interval.

A linear regression of diastolic blood pressure on age is fitted using the data provided in the 'csv' file and the R functions 'lm' and 'abline'. The null hypothesis that the slope of the regression line is zero is then tested by the same permutation method that was applied to the correlation coefficient. Since the observed slope lies well outside the bulk of the slopes simulated under the null hypothesis, the null hypothesis is rejected. The exact code is as follows:

# Regress DIASTOLIC on AGE (the original call had response and predictor swapped)
fit = lm(DIASTOLIC ~ AGE, data=Sheet)
fit
plot(Sheet$AGE, Sheet$DIASTOLIC)
abline(fit)
slope = coef(fit)[2]
# Permutation: shuffle AGE to simulate the slope under the null hypothesis
x = replicate(1000, {
  age.perm = sample(Sheet$AGE)
  coef(lm(Sheet$DIASTOLIC ~ age.perm))[2]
})
hist(x, col=colors(), xlab="slope assuming null hypo")

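[Figure: scatter plot with the fitted regression line]

[Figure: histogram of the slope assuming the null hypothesis]

The intercept and the slope of the regression were reported as: Intercept = 10.9172 and slope (Sheet$DIASTOLIC) = 0.3095. Note that these coefficients come from lm(Sheet$AGE~Sheet$DIASTOLIC), which has age as the response, so the slope of 0.3095 is the average increase in age (in years) per unit increase in diastolic blood pressure. The slope asked for in the question is that of DIASTOLIC regressed on AGE, which gives the average change in diastolic blood pressure per additional year of age of a Pima female.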

To calculate the 95% confidence interval for the mean diastolic blood pressure of 40-year-old females, we first filtered the 'csv' file to include only those rows with 40 in the AGE column. Then, we used the AVERAGE() and STDEV() functions to calculate the mean and standard deviation of the diastolic blood pressure of the twelve 40-year-old females. Lastly, we used R's built-in function 'qnorm' to calculate the 95% confidence interval as a ± qnorm(0.975) × s/sqrt(n), using the following code:

a = 75
s = 8.50668
n = 12
error = qnorm(0.975)*s/sqrt(n)
left = a - error
right = a + error

In the above code, the error is assumed to be normally distributed. The variables 'a' and 's' are the mean and standard deviation of the diastolic blood pressure of the 12 females aged 40, and 'n' is the sample size, which is 12. 'left' is the lower confidence limit and 'right' is the upper confidence limit. A 95% confidence level is used. The result of the above code was: lower limit = 70.18698 and upper limit = 79.81302, which implies that the mean diastolic blood pressure of 40-year-old females lies between 70.18698 and 79.81302 with 95% confidence.
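The interval above uses only the 12 females aged exactly 40. An alternative is to take the interval from the fitted regression, which uses all observations. The following is a minimal sketch of that approach with R's 'predict' function. It runs on simulated data, so the data frame sim and every number in it are hypothetical and will not reproduce the results above.

```r
set.seed(2)

# Hypothetical stand-in for the AGE and DIASTOLIC columns (simulated values).
sim <- data.frame(AGE = sample(21:70, 300, replace = TRUE))
sim$DIASTOLIC <- 60 + 0.3 * sim$AGE + rnorm(300, sd = 11)

# Simple linear regression of diastolic blood pressure on age.
fit <- lm(DIASTOLIC ~ AGE, data = sim)

# 95% confidence interval for the MEAN response at age 40.
# interval = "prediction" would instead give the wider interval
# for a single new individual.
ci_mean <- predict(fit, newdata = data.frame(AGE = 40),
                   interval = "confidence", level = 0.95)
print(ci_mean)
```

The result has columns fit, lwr and upr: the estimated mean at age 40 and its lower and upper confidence limits. With the real data, fit would be the regression of DIASTOLIC on AGE from Sheet.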
