Enn543 Data Analysis And Optimisation Assessment Answers

A set of data were collected on multiple samples of red and white wine. These data include both objective measurements on chemical and physical properties of the wines, and subjective measurements of quality based on expert judgements. The data are included in the files:
winequality-red.csv
winequality-white.csv
Using these data
 
(a) Fit GLMs to the quality as a function of the other variables for both types of wine. Assume that quality follows a Poisson distribution.
 
(b) Compare which variables are significant in each case. What are the differences? What are the similarities?

Answer:

Given that,

Random variable X follows normal distribution with mean µ = 5 and standard deviation σ = 10.

Hence, Prob(X > 10) = 1 - Prob(X <= 10) = 1 - Prob(Z <= (10-5)/10) = 1 - Prob(Z <= 0.5) = 1 - 0.6915 = 0.3085. (Probabilities for Z values are obtained from the standard normal table.)

Prob(-20 < X < 15) = Prob(X <= 15) - Prob(X <= -20) = P(Z <= (15-5)/10) - P(Z <= (-20-5)/10) = P(Z <= 1) - P(Z <= -2.5) = 0.8413 - 0.0062 = 0.8351. (Probability values are obtained from the standard normal table.)

Now, P(X > x) = 0.95 => P(X <= x) = 1 - 0.95 = 0.05 (as the total probability is 1).

Now, from the standard normal table, the area under the normal curve to the left of Z = -1.65 is approximately 0.05 (the more precise value is Z = -1.645).

Hence, (x-5)/10 = -1.65 => x = -16.5 + 5 = -11.5

Hence, for the value x = -11.5 (or -11.45 using Z = -1.645), the area under the normal curve to the right of x is 0.95.
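
The table lookups above can be cross-checked numerically. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the variable names are illustrative and not part of the original answer. Because norminv does not round Z to two decimals, the last value comes out slightly different from the table-based -11.5.

```matlab
% Cross-check of the normal probabilities for X ~ N(mu = 5, sigma = 10).
% Assumes the Statistics and Machine Learning Toolbox (normcdf, norminv).
mu = 5;
sigma = 10;

% P(X > 10) = 1 - P(X <= 10)
pGreater10 = 1 - normcdf(10, mu, sigma);

% P(-20 < X < 15) = P(X <= 15) - P(X <= -20)
pBetween = normcdf(15, mu, sigma) - normcdf(-20, mu, sigma);

% x such that P(X > x) = 0.95, i.e. the 5% quantile of X
xCut = norminv(0.05, mu, sigma);

fprintf('P(X > 10)       = %.4f\n', pGreater10);
fprintf('P(-20 < X < 15) = %.4f\n', pBetween);
fprintf('x with P(X > x) = 0.95: %.2f\n', xCut);
```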

Given that,

Random variable N follows Poisson distribution with mean µ=10000.

Now, for a large mean the Poisson distribution can be approximated by a normal distribution with the same mean and variance. This follows from the central limit theorem, because a Poisson variable with mean 10000 is the sum of 10000 independent Poisson variables with mean 1. For a Poisson distribution the variance equals the mean.

In this case the approximation is N ~ normal(mean = 10000, standard deviation = sqrt(10000) = 100).

Hence, by the approximation the value of P(N > 10,200) = 1 – P(N<=10200)

= 1- P(Z<=(10200-10000)/100) = 1 – 0.9772 = 0.0228.

Now, computing P(N > 10,200) = 1 - P(N <= 10200) in MATLAB from the exact Poisson CDF gives the result 0.0227.

MATLAB code:

P = 1 - cdf('Poisson',10200,10000);

disp(P)

ans =

0.0227

Hence, the error of the normal approximation to the Poisson distribution is |0.0228 - 0.0227| = 0.0001, which is very small.
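
The two values can be computed side by side. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox; poisscdf and normcdf are used in place of the generic cdf call, and the continuity-corrected variant is an extra that is not part of the original answer.

```matlab
% Exact Poisson tail versus its normal approximation for N ~ Poisson(10000).
lambda = 10000;
k = 10200;

% Exact: P(N > k) = 1 - P(N <= k)
pExact = 1 - poisscdf(k, lambda);

% Normal approximation with mean lambda and standard deviation sqrt(lambda)
pNormal = 1 - normcdf(k, lambda, sqrt(lambda));

% Same approximation with a continuity correction (evaluated at k + 0.5)
pNormalCC = 1 - normcdf(k + 0.5, lambda, sqrt(lambda));

fprintf('Exact Poisson:                 %.4f\n', pExact);
fprintf('Normal approximation:          %.4f\n', pNormal);
fprintf('Normal, continuity-corrected:  %.4f\n', pNormalCC);
fprintf('Absolute error (uncorrected):  %.1e\n', abs(pExact - pNormal));
```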

Question 6:

 

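The MLE (maximum likelihood estimate) of the parameter (written $\theta$ here) is the value of $\theta$ that maximizes the likelihood function $L(\theta) = f(x_1, x_2, x_3, \ldots \mid \theta)$. Here, f = probability density function.

So, for independent observations,

$$L(\theta) = f(x_1 \mid \theta)\, f(x_2 \mid \theta)\, f(x_3 \mid \theta) \cdots = \prod_{i=1}^{n} f(x_i \mid \theta).$$

Now, taking the log of the above equation,

$$\log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta).$$

Now, maximizing: at the maximum of $\log L(\theta)$,

$$\frac{d}{d\theta} \log L(\theta) = 0.$$

Hence, solving this equation for the given probability density function gives the maximum likelihood estimate.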

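It is stated that the sample of data x = x1, x2, ..., xn follows a Poisson distribution with mean λ and that λ follows an exponential distribution with parameter θ.

So, the likelihood is

$$P(x \mid \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\, \lambda^{\sum_{i} x_i}}{\prod_{i} x_i!}$$

and the prior is

$$P(\lambda) = \theta e^{-\theta\lambda}, \quad \lambda > 0.$$

Hence, the posterior probability is proportional to (likelihood) * (prior probability):

$$P(\lambda \mid x) \propto e^{-n\lambda}\, \lambda^{\sum_{i} x_i} \cdot \theta e^{-\theta\lambda} \propto \lambda^{\sum_{i} x_i}\, e^{-(\theta + n)\lambda}.$$

Now, the above distribution is a Gamma distribution with parameters (β is the rate, α is the shape)

β = θ + n,  α = $\sum_{i=1}^{n} x_i + 1$ (Proved)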
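
The conjugacy result can be checked numerically by normalising likelihood times prior on a grid and comparing it with the Gamma density. This is a minimal sketch under stated assumptions: the data vector and the value of θ are made up for illustration, the shape is taken as the sum of the observations plus one, and gampdf from the Statistics and Machine Learning Toolbox is assumed to be available.

```matlab
% Numerical check that a Poisson likelihood with an exponential prior
% gives a Gamma posterior. The data vector and theta are made-up values.
x = [2 4 3 1 5 3 2 4 3 3];    % illustrative Poisson counts
n = numel(x);
theta = 0.5;                   % illustrative prior parameter

% Unnormalised log posterior on a grid, terms constant in lambda dropped:
% log p(lambda | x) = sum(x)*log(lambda) - (n + theta)*lambda + const
lambda = linspace(0.01, 10, 2000);
logPost = sum(x) .* log(lambda) - (n + theta) .* lambda;
post = exp(logPost - max(logPost));      % subtract max for numerical safety
post = post ./ trapz(lambda, post);      % normalise to integrate to 1

% Gamma density with shape alphaShape and rate betaRate
% (gampdf is parameterised by scale, so scale = 1 / rate)
alphaShape = sum(x) + 1;
betaRate = theta + n;
gammaDensity = gampdf(lambda, alphaShape, 1 / betaRate);

maxAbsDiff = max(abs(post - gammaDensity));
fprintf('Largest pointwise difference: %.2e\n', maxAbsDiff);
```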

Question 8:

The variables of the yacht.dat file are the following in order.

V1 Longitudinal position of the center of buoyancy, adimensional

V2 Prismatic coefficient, adimensional

V3 Length-displacement ratio, adimensional

V4 Beam-draught ratio, adimensional

V5 Length-beam ratio, adimensional

V6 Froude number, adimensional

X7 Residuary resistance per unit weight of displacement, adimensional

Now, using the fitlm command in MATLAB, the dependent variable X7 is fitted with respect to the independent variables V1 to V6.

MATLAB command:

% the yacht.dat is loaded by selecting it from folder

lrm = fitlm(yacht,'X7~V1+V2+V3+V4+V5+V6');

disp(lrm)

Linear regression model:

    X7 ~ 1 + V1 + V2 + V3 + V4 + V5 + V6

Estimated Coefficients:

                   Estimate      SE        tStat        pValue  

                   ________    _______    ________    __________

    (Intercept)      154.51     32.359       4.775    2.8055e-06

    V1             0.018076    0.44595    0.040534       0.96769

    V2              -301.54     52.185     -5.7783    1.8779e-08

    V3              -9.8484     18.656    -0.52791       0.59795

    V4               7.0168     7.2464     0.96832       0.33366

    V5               7.6548     18.712     0.40908       0.68277

    V6               73.168     5.1483      14.212    1.8803e-35

Number of observations: 309, Error degrees of freedom: 302

Root Mean Squared Error: 11.8

R-squared: 0.402,  Adjusted R-Squared 0.39

F-statistic vs. constant model: 33.9, p-value = 3.44e-31

Hence, the linear regression model is,

X7 = 154.51 + 0.018V1 - 301.54V2 - 9.848V3 + 7.017V4 + 7.655V5 + 73.168V6.

Now, this linear regression model can be used as a function of the independent variables, and for given values of the independent variables the estimate of the response can be evaluated using the ‘feval’ function in MATLAB. The accuracy of the regression equation can be verified by dividing the total dataset in two, namely the training dataset (80% of the data) and the validation dataset (20% of the data). The fitlm command is run on the training dataset, and the regression equation obtained is then evaluated with the feval function on the validation set.
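
The 80/20 validation described above can be sketched as follows. This is an illustration under stated assumptions, not the original author's code: the table is synthetic stand-in data with the same variable names, cvpartition is used for the split, and predict is used in place of feval because it accepts a table directly.

```matlab
% Hold-out validation sketch for a linear model (synthetic stand-in data).
rng(0);                                % reproducible split and noise
n = 300;
V = rand(n, 6);                        % six made-up predictors
X7 = 2 + V * [0; -3; 0; 0; 0; 5] + 0.1 * randn(n, 1);   % made-up response
tbl = array2table([V X7], 'VariableNames', ...
    {'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'X7'});

% 80% training, 20% validation
c = cvpartition(n, 'HoldOut', 0.2);
trainTbl = tbl(training(c), :);
validTbl = tbl(test(c), :);

% Fit on the training part only
lrmTrain = fitlm(trainTbl, 'X7~V1+V2+V3+V4+V5+V6');

% Predict the held-out responses and measure the error
yHat = predict(lrmTrain, validTbl);
rmseValid = sqrt(mean((validTbl.X7 - yHat).^2));
fprintf('Validation RMSE: %.3f (training RMSE: %.3f)\n', ...
    rmseValid, lrmTrain.RMSE);
```

A validation RMSE close to the training RMSE suggests the regression equation generalises; a much larger validation RMSE points to overfitting or a poorly specified model.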

Question 9:

In this question a generalized linear model is fitted to the ‘quality’ variable of both the red wine data and the white wine data, assuming a Poisson distribution.

  1. Model fitting for red wine and white wine model:

MATLAB code with output:

% winequality-red.csv and winequality-white.csv are loaded manually from the folder
% as the tables winequalityred and winequalitywhite

model = 'quality ~ fixedacidity + volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';

lrm1 = fitglm(winequalityred,model,'Distribution','poisson');

disp(lrm1)

lrm2 = fitglm(winequalitywhite,model,'Distribution','poisson');

disp(lrm2)

Output:

lrm1 =

Generalized linear regression model:

    quality ~ [Linear formula with 12 terms in 11 predictors]

    Distribution = Poisson

Estimated Coefficients:

                           Estimate          SE         tStat       pValue  

                          ___________    __________    ________    _________

 

    (Intercept)                3.6538         13.67     0.26728      0.78925

    fixedacidity            0.0036583      0.016633     0.21994      0.82592

    volatileacidity           -0.1977       0.08039     -2.4593     0.013921

    citricacid              -0.035923      0.096141    -0.37365      0.70866

    residualsugar           0.0026177      0.009736     0.26887      0.78803

    chlorides                -0.33176       0.27688     -1.1982      0.23084

    freesulfurdioxide      0.00082523     0.0014126     0.58418       0.5591

    totalsulfurdioxide    -0.00061063    0.00047979     -1.2727      0.20312

    density                   -2.1729        13.953    -0.15573      0.87624

    pH                      -0.074826       0.12406    -0.60317       0.5464

    sulphates                 0.15912      0.072618      2.1912     0.028434

    alcohol                   0.04815      0.016999      2.8325    0.0046188

 

 

1599 observations, 1587 error degrees of freedom

Dispersion: 1

Chi^2-statistic vs. constant model: 66.1, p-value = 6.81e-10

lrm2 =

Generalized linear regression model:

    quality ~ [Linear formula with 12 terms in 11 predictors]

    Distribution = Poisson

 

Estimated Coefficients:

                           Estimate          SE         tStat        pValue  

                          ___________    __________    ________    __________

 

    (Intercept)                28.094        11.144      2.5211      0.011698

    fixedacidity             0.012809      0.011881      1.0781         0.281

    volatileacidity          -0.33456      0.064234     -5.2085    1.9041e-07

    citricacid              0.0025292      0.053278    0.047471       0.96214

    residualsugar            0.014557     0.0043653      3.3347    0.00085393

    chlorides               -0.062667       0.31275    -0.20037       0.84119

    freesulfurdioxide      0.00062244    0.00046312       1.344       0.17894

    totalsulfurdioxide    -3.6945e-05    0.00021042    -0.17558       0.86063

    density                   -27.359        11.298     -2.4215      0.015457

    pH                         0.1235      0.059026      2.0922      0.036417

    sulphates                 0.10875      0.054501      1.9953      0.046011

    alcohol                   0.03036      0.014207       2.137      0.032594

 

 

4898 observations, 4886 error degrees of freedom

Dispersion: 1

Chi^2-statistic vs. constant model: 185, p-value = 1.04e-33

  2. As the overall p-value of the white wine model (1.04e-33) is less than the chosen significance level of 0.05, the model fits significantly better than a constant model. The independent variables which are significant are volatileacidity, residualsugar, density, pH, sulphates and alcohol, as the p-values of these variables are less than 0.05.

Similarly, the red wine model is significant, as its overall p-value of 6.81e-10 is less than the significance level of 0.05.

In this model the independent variables which are significant are volatileacidity, sulphates and alcohol, as their p-values are less than 0.05.

So, the white wine model has more significant predictor variables than the red wine model: residualsugar, density and pH are significant only for white wine. The similarities between the two models are

  (a) both models are significant
  (b) volatileacidity, sulphates and alcohol are significant independent variables in both.
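
Reading significant terms off the coefficient table by eye is error-prone, so the selection can also be done programmatically. This is a minimal sketch on synthetic data (x1 affects the response, x2 does not, by construction); the variable names and the 0.05 level are illustrative assumptions.

```matlab
% Listing the significant predictors of a Poisson GLM programmatically.
% Synthetic data: x1 affects the response, x2 does not by construction.
rng(0);
n = 500;
x1 = randn(n, 1);
x2 = randn(n, 1);
y = poissrnd(exp(1.5 + 0.3 * x1));
tbl = table(x1, x2, y);

mdl = fitglm(tbl, 'y ~ x1 + x2', 'Distribution', 'poisson');

% Keep the terms whose p-value is below the significance level,
% leaving out the intercept
alphaLevel = 0.05;
isSig = mdl.Coefficients.pValue < alphaLevel;
isSig(strcmp(mdl.CoefficientNames, '(Intercept)')) = false;
sigNames = mdl.CoefficientNames(isSig);
disp(sigNames)
```

Applied to the two fitted wine models, the same lines followed by intersect on the two name lists would give the predictors that are significant for both wines.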

Copyright © 2009-2023 UrgentHomework.com, All right reserved.