HI6007 Statistics for business decisions T2 2021 Final Assignment Holmes Institute
Briefly discuss the following with relevant examples.
ANSWER:
PART A
A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).
PART B
Descriptive Statistics
It describes the important characteristics of the data using measures of central tendency, such as the mean, median and mode, and measures of dispersion, such as the range, variance and standard deviation.
Inferential Statistics
It uses data from a sample to make inferences about the larger population from which the sample is drawn. The goal of inferential statistics is to draw conclusions from a sample and generalize them to the population.
ANSWER:
The government wants to analyse people's desire for COVID vaccination and their willingness to support the government's plan for a COVID-free Australia.
The best sampling plan would be simple random sampling of citizens. It is a reliable method in which every member of the population has the same probability of being selected, so the sample is chosen purely by chance.
The alternative sampling plan would be stratified sampling. Stratified random sampling is a method in which the researcher divides the population into smaller, non-overlapping groups (strata) that together cover the entire population, and then draws a random sample from each stratum separately. Thus the government can divide citizens into strata by age group or annual income level and then pick a random sample from each stratum.
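The two sampling plans described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 citizen IDs, the three age groups and the 10% sampling fraction are all hypothetical, and the function names are made up for this example.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction, independently, from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (citizen ID, age group) pairs, not real data.
AGE_GROUPS = ("18-39", "40-64", "65+")
citizens = [(i, AGE_GROUPS[i % 3]) for i in range(1000)]

srs = simple_random_sample(citizens, 100)
strat = stratified_sample(citizens, stratum_of=lambda c: c[1], fraction=0.10)

print(len(srs), len(strat))
```

The stratified version guarantees that every age group appears in the sample in proportion to its size, which simple random sampling only achieves on average.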
A group of researchers wants to estimate the living standard of people in regional Victoria.
The best sampling plan would be convenience sampling. This method depends on ease of access to subjects, such as surveying customers at a mall in regional Victoria or passers-by on a busy street there.
The alternative sampling plan would be snowball sampling. The researchers would recruit a few people, who would then nominate acquaintances living in regional Victoria to participate in the survey.
Sales revenue ($M):              9.6   11.3   12.5   9.5   8.5   12   11.4   12.5   13.8   14.6
Advertising expenditure ($000):  23    40     55     54    28    25   31     36     88     90
(Note: Excel calculations are not allowed, and students are required to show all the steps in calculations)
ANSWER:
Let sales revenue be X and advertising expenditure be Y.
X Values
∑X = 115.7
Mean (Mx) = 11.57
∑(X − Mx)^2 = SSx = 33.761
Y Values
∑Y = 470
Mean (My) = 47
∑(Y − My)^2 = SSy = 5490
X and Y Combined
N = 10
∑(X − Mx)(Y − My) = 305.2
R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))
r = 305.2 / √((33.761)(5490)) = 0.7089
The value of r is 0.7089.
This is a moderate positive correlation, which means there is a tendency for high X values to go with high Y values (and low X values with low Y values).
Covariance Calculation
Cov(X, Y) = ∑(X − Mx)(Y − My) / (N − 1) = 305.2 / 9 = 33.911
The covariance is positive, implying that sales revenue and advertising expenditure move together: as one increases (decreases), the other also tends to increase (decrease).
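As a cross-check on the hand calculation, the same sums of squares, correlation coefficient and sample covariance can be reproduced with plain Python. This is a minimal sketch; the variable names are chosen for this example only, and the covariance is assumed to use the sample divisor N − 1.

```python
from math import sqrt

sales = [9.6, 11.3, 12.5, 9.5, 8.5, 12, 11.4, 12.5, 13.8, 14.6]   # X, $M
advertising = [23, 40, 55, 54, 28, 25, 31, 36, 88, 90]            # Y, $000

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(advertising) / n

# Sums of squared deviations and the sum of cross-products
ss_x = sum((x - mean_x) ** 2 for x in sales)
ss_y = sum((y - mean_y) ** 2 for y in advertising)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, advertising))

r = sp / sqrt(ss_x * ss_y)     # Pearson correlation coefficient
cov = sp / (n - 1)             # sample covariance

print(f"SSx={ss_x:.3f} SSy={ss_y:.0f} SP={sp:.1f} r={r:.4f} cov={cov:.3f}")
```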
ANSWER:
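Z = 2.576 at the 0.01 level of significance (99% confidence)
Margin of error (E) = 3% = 0.03
Then, with p = 0.5,
n = p*(1 − p)*Z^2/E^2 = 0.5*(1 − 0.5)*(2.576)^2/(0.03)^2 = 1843.3, which is rounded up to 1844. (If Z is rounded to 2.58, the same formula gives 1849.)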
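The sample-size formula for a proportion can be checked with a short function. This is a sketch under the assumptions used in the answer (p = 0.5, Z = 2.576, margin of error 0.03); the function name is hypothetical.

```python
from math import ceil

def sample_size_for_proportion(z, margin, p=0.5):
    """n = p(1 - p) z^2 / E^2, rounded up to the next whole respondent."""
    return ceil(p * (1 - p) * z ** 2 / margin ** 2)

n_required = sample_size_for_proportion(z=2.576, margin=0.03)
print(n_required)
```

With Z = 2.576 the formula evaluates to about 1843.3, so 1844 respondents are needed; a table value rounded to Z = 2.58 gives the slightly larger figure of 1849.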
You have been given the following data set on sales of Product X (units) in 3 different locations.
Location 1:   45   27     39     42     28
Location 2:   30   29     36     21     24
Location 3:   19   25.5   27.6   31.5   34.6
You are required to answer following questions.
ANSWER:
Null Hypothesis, H_{0}: µ_{1} = µ_{2} = µ_{3}
Alternative Hypothesis, H_{a}: Not all means are equal
ANSWER:
At the 5% level of significance, we reject the null hypothesis H_{0} if the p-value is less than 0.05 (equivalently, if the calculated F exceeds the critical value F(0.05; 2, 12) = 3.89).
ANSWER:
The F value is 2.569 and the p-value is 0.1178. The result is not significant at the 0.05 level.
             Location 1   Location 2   Location 3
             45           30           19
             27           29           25.5
             39           36           27.6
             42           21           31.5
             28           24           34.6
N            5            5            5
∑X           181          140          138.2
Mean         36.2         28           27.64
∑X^{2}       6823         4054         3962.42
Std. Dev.    8.228        5.7879       5.9702

Source     SS         df   MS         F
Between    234.4053   2    117.2027   2.56943
Within     547.372    12   45.6143
Total      781.7773   14
ANSWER:
The p-value is 0.1178.
Since the p-value (0.1178) is greater than the significance level (0.05), we fail to reject the null hypothesis. The result is not significant at the 0.05 level.
Therefore, we cannot conclude that mean sales differ significantly between the three locations.
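The one-way ANOVA figures reported above can be reproduced step by step in plain Python. This is a sketch, not the required hand working. It relies on one assumption worth flagging: the closed-form p-value and critical value used here hold only when the numerator degrees of freedom equal 2, as they do for three groups.

```python
groups = {
    "Location 1": [45, 27, 39, 42, 28],
    "Location 2": [30, 29, 36, 21, 24],
    "Location 3": [19, 25.5, 27.6, 31.5, 34.6],
}

all_values = [v for g in groups.values() for v in g]
n_total, k = len(all_values), len(groups)
grand_mean = sum(all_values) / n_total

# Between-groups and within-groups sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((v - sum(g) / len(g)) ** 2
                for g in groups.values() for v in g)

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Valid only for df_between == 2: P(F > x) = (1 + 2x/d2)^(-d2/2)
assert df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
alpha = 0.05
f_critical = (df_within / 2) * (alpha ** (-2 / df_within) - 1)

print(f"F={f_stat:.5f} p={p_value:.4f} F_crit={f_critical:.3f}")
print("reject H0" if f_stat > f_critical else "fail to reject H0")
```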
Note: No excel ANOVA output allowed. Students need to show all the steps in calculations.
An agronomist undertook an experiment to investigate the factors that affect potato harvest. In his research, the agronomist decided to divide the farm into 30 half-hectare plots and apply various levels of fertilizer. Potatoes were then planted and the harvest at the end of the season was recorded.
Fertilizer (kg)   Harvest (tons)
210               43.5
220               40.0
230               48.0
240               65.0
250               80.0
260               85.0
270               95.0
280               80.0
290               97.3
Note: No excel ANOVA output allowed. Students need to show all the steps in calculations.
You are required to;
ANSWER:
Let fertilizer (kg) be X
Let harvest (tons) be Y
Sum of X = 2250
Sum of Y = 633.8
Mean of X (MX) = 250
Mean of Y (MY) = 70.4222
Sum of squares (SSX) = ∑(X − MX)^2 = 6000
Sum of products (SP) = ∑(X − MX)(Y − MY) = 4492
Regression equation: ŷ = bX + a
b = SP/SSX = 4492/6000 = 0.74867; where b is the slope coefficient of fertilizer
a = MY − b*MX = 70.4222 − (0.74867*250) = −116.74444; where a is the constant
ŷ = 0.74867X − 116.74444
This implies that without any fertilizer (X = 0) the predicted harvest is −116.74 tons. A negative harvest is not physically meaningful; X = 0 lies far outside the observed range of 210 to 290 kg, so the intercept should not be interpreted literally.
The slope coefficient denotes that for every 1 kg increase in the application of fertilizer, the predicted harvest increases by about 0.749 tons.
The regression equation for Y is:
ŷ = 0.74867X − 116.74444
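The least-squares slope and intercept can be verified directly from the nine data points. This is a minimal sketch using the same SP/SSX formulas as the answer; the variable names are chosen for this example.

```python
fertilizer = [210, 220, 230, 240, 250, 260, 270, 280, 290]        # X, kg
harvest = [43.5, 40.0, 48.0, 65.0, 80.0, 85.0, 95.0, 80.0, 97.3]  # Y, tons

n = len(fertilizer)
mean_x = sum(fertilizer) / n
mean_y = sum(harvest) / n

ss_x = sum((x - mean_x) ** 2 for x in fertilizer)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(fertilizer, harvest))

b = sp / ss_x               # slope
a = mean_y - b * mean_x     # intercept (negative for this data)

print(f"SSX={ss_x:.0f} SP={sp:.0f} b={b:.5f} a={a:.5f}")
```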
ANSWER:
r = SP/√(SSX*SSY), where SSY = ∑(Y − MY)^2 = 3904.936
Then r = 4492/√(6000*3904.936) = 0.928
Then the coefficient of determination (R^2) = 0.928*0.928 = 0.8612
This means that about 86.12% of the variation in the harvest can be explained by the variation in the application of fertilizer.
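The correlation coefficient and the coefficient of determination can be recomputed from the raw data as a check. This sketch assumes the same definitions as above (deviation sums of squares and cross-products); nothing beyond the nine observations is used.

```python
from math import sqrt

fertilizer = [210, 220, 230, 240, 250, 260, 270, 280, 290]        # X, kg
harvest = [43.5, 40.0, 48.0, 65.0, 80.0, 85.0, 95.0, 80.0, 97.3]  # Y, tons

n = len(fertilizer)
mean_x = sum(fertilizer) / n
mean_y = sum(harvest) / n

ss_x = sum((x - mean_x) ** 2 for x in fertilizer)
ss_y = sum((y - mean_y) ** 2 for y in harvest)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(fertilizer, harvest))

r = sp / sqrt(ss_x * ss_y)
r_squared = r ** 2          # share of harvest variation explained by fertilizer

print(f"SSY={ss_y:.3f} r={r:.3f} R^2={r_squared:.4f}")
```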
ANSWER:
Since the coefficient of determination is high (about 0.86), the model appears useful for predicting the potato harvest within the observed range of fertilizer levels.
Harvest = −116.7444 + 0.74867*(250)
= 70.423
Hence, the predicted harvest for 250 kg of fertilizer is about 70.42 tons.
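The prediction can be checked by refitting the line and evaluating it at 250 kg. In this sketch the helper name predict_harvest is hypothetical, and the coefficients are kept at full precision rather than rounded.

```python
fertilizer = [210, 220, 230, 240, 250, 260, 270, 280, 290]        # X, kg
harvest = [43.5, 40.0, 48.0, 65.0, 80.0, 85.0, 95.0, 80.0, 97.3]  # Y, tons

n = len(fertilizer)
mean_x = sum(fertilizer) / n
mean_y = sum(harvest) / n

b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(fertilizer, harvest))
     / sum((x - mean_x) ** 2 for x in fertilizer))
a = mean_y - b * mean_x

def predict_harvest(fertilizer_kg):
    """Fitted harvest (tons) for a fertilizer level inside the 210-290 kg range."""
    return a + b * fertilizer_kg

prediction_250 = predict_harvest(250)
print(f"{prediction_250:.4f}")
```

Because 250 kg is exactly the mean fertilizer level, the unrounded prediction coincides with the mean harvest, about 70.42 tons. Substituting a slope rounded to 0.7487 shifts the result slightly, to about 70.43.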
ABX Delivery provides its service across all the states in Australia. The marketing manager of this company wants to identify key factors that affect the time to unload a truck. A random sample of 50 deliveries was observed and the following data were reported:
Time to unload a truck (in minutes),
total number of cartons and
the total weight (in hundreds of kilograms).
The following tables show the regression output for the sample data set.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.836420803
R Square            0.699599759
Adjusted R Square   0.68681677
Standard Error      8.823384264
Observations        50

ANOVA
             df   SS            MS         F          Significance F
Regression   2    8521.530836   4260.765   54.72897   0.000000
Residual     47   3659.049164   77.85211
Total        49   12180.58

             Coefficients   Standard Error   t Stat     P-value
Intercept    13.669         7.829028389      1.74599    0.087346
Cartons      0.5172         0.067246763      7.691119   0.000000
Weight       0.2901         0.11166803       2.597671   0.012494
ANSWER:
Time to unload a truck = 13.669 + 0.5172*Cartons + 0.2901*Weight
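To show how the fitted equation is used, the sketch below wraps it in a small function. The function name and the example delivery (100 cartons weighing 50 hundred kilograms) are hypothetical and are not taken from the sample.

```python
# Coefficients copied from the regression output
INTERCEPT = 13.669
B_CARTONS = 0.5172    # minutes per additional carton
B_WEIGHT = 0.2901     # minutes per additional 100 kg

def predicted_unload_time(cartons, weight_hundred_kg):
    """Predicted unloading time in minutes from the fitted equation."""
    return INTERCEPT + B_CARTONS * cartons + B_WEIGHT * weight_hundred_kg

# Hypothetical delivery, for illustration only
example_minutes = predicted_unload_time(cartons=100, weight_hundred_kg=50)
print(f"{example_minutes:.3f} minutes")
```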
ANSWER:
CASE 1:
For Cartons
Null hypothesis H0: b1 = 0
Alternate hypothesis Ha: b1 ≠ 0
A t-test is conducted on the regression coefficient of cartons (b1) at the 5% level of significance (95% confidence). From the regression table above, the p-value for the coefficient of cartons is 0.0000. As the p-value is less than 0.05, the null hypothesis is rejected, and it can be concluded that the independent variable Cartons is significant at the 5% level of significance.
CASE 2:
For Weight
Null hypothesis H0: b2 = 0
Alternate hypothesis Ha: b2 ≠ 0
A t-test is conducted on the regression coefficient of weight (b2) at the 5% level of significance (95% confidence). From the regression table above, the p-value is 0.012494. As the p-value is less than 0.05, the null hypothesis is rejected, and it can be concluded that the independent variable Weight is significant at the 5% level of significance.
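The same conclusions can be reached through the t statistics rather than the p-values. The sketch below divides each coefficient by its standard error and compares the result with a critical value. The figure 2.012 is an assumption: it is the approximate two-tailed 5% critical value for 47 degrees of freedom read from a t-table, not a number given in the output.

```python
# (coefficient, standard error) pairs copied from the regression output
coefficients = {
    "Cartons": (0.5172, 0.067246763),
    "Weight": (0.2901, 0.11166803),
}

T_CRITICAL = 2.012  # assumed: approx. two-tailed 5% value for df = 50 - 2 - 1 = 47

results = {}
for name, (coef, std_err) in coefficients.items():
    t_stat = coef / std_err
    results[name] = (t_stat, abs(t_stat) > T_CRITICAL)
    decision = "reject H0" if abs(t_stat) > T_CRITICAL else "fail to reject H0"
    print(f"{name}: t = {t_stat:.3f}, {decision}")
```

Small differences from the t Stat column in the output arise because the coefficients printed there are rounded to four decimal places.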
ANSWER:
The value of R^2 is 0.699599759. It can be interpreted that 69.96% of the variance in the dependent variable (unloading time) is explained by the chosen independent variables (cartons and weight). Thus, the model fit is reasonably good.
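R Square and Adjusted R Square can both be recovered from the ANOVA sums of squares. This is a minimal sketch using the standard adjustment for n = 50 observations and k = 2 predictors.

```python
# Values copied from the ANOVA table in the regression output
ss_regression = 8521.530836
ss_residual = 3659.049164
n_obs = 50        # observations
k = 2             # predictors: cartons and weight

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

# The adjustment penalises R^2 for the number of predictors used
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
```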
ANSWER:
Two new explanatory variables that could affect unloading time are (i) the number of workers involved in unloading the truck and (ii) the total weight of the workers involved in unloading the truck.
Adding these two variables has an implication for the OLS model: it may introduce multicollinearity. Multicollinearity generally occurs when there are high correlations between two or more predictor variables.
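One common way to gauge multicollinearity is the variance inflation factor, VIF = 1/(1 − R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below only illustrates the formula. The value 0.81 is hypothetical, since the assignment provides no data on the proposed variables.

```python
def variance_inflation_factor(r_squared_j):
    """VIF for a predictor whose auxiliary regression has the given R^2."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R^2 of the auxiliary regression must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical: suppose the other predictors explain 81% of the variation
# in one predictor (not an estimate from the assignment data).
example_vif = variance_inflation_factor(0.81)
print(f"VIF = {example_vif:.2f}")
```

A VIF of 1 means the predictor is uncorrelated with the others; larger values indicate that its coefficient's standard error is inflated by overlap with the other predictors.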