Ict110 Introduction To Data Science Assessment Answers

Describe how to load the data and the libraries needed. Provide an overview of the data, including its dimensions and structure.

Answer:

Introduction

Authorisation and Purpose

This study analyses vital statistics for the East Asia and Pacific region from 2001 to 2015, using a dataset collected from the World Bank. The findings are intended for use by government planners working to improve health in that region.

Limitations

The primary constraint of this study is that the analysis is limited to the East Asia and Pacific region. A second limitation is that the data are secondary in nature, having been collected from the World Bank.

Scope

The dataset consists of 26 attributes holding health information about the above-mentioned region and covers a period of 15 years. The analysis can be performed using statistical methods and by interpreting graphs. However, the data have many missing observations.

The analysis proceeds through one-variable and two-variable analyses. Next, the data are clustered using the k-means technique, and finally linear regression lines are fitted between pairs of attributes.

Methodology

The data were obtained from the World Bank. The dataset is quantitative in nature and contains health information for the period 2001 to 2015.

Data Setup

The data are loaded into R before the analysis. Running the first line of the code opens a pop-up window, in which the data file (in csv format) is selected by browsing to its location. The same line of code declares which entries are to be read in as missing values.

In the second step, the required libraries are loaded into R to perform the statistical analyses and to produce the graphical output.
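The loading step can be illustrated with a short sketch. The report itself was prepared in R and its code is not reproduced here, so the following is a Python illustration only. The column names, country codes, values and the set of missing-value markers are all hypothetical, chosen to show how missing entries can be declared at read time and how the dimensions can then be inspected.

```python
import csv
import io

# Hypothetical sample laid out like a World Bank export. The country codes,
# column names and values are invented for illustration.
SAMPLE_CSV = """Country.Code,Year,GNI.per.capita,Unemployment.total
AAA,2014,5200,4.1
BBB,2014,..,
CCC,2014,1830,..
"""

# Assumed missing-value markers; the report does not say which ones it used.
MISSING_MARKERS = {"", "..", "NA"}


def load_rows(handle, numeric_columns):
    """Read CSV rows, turning missing markers into None and numbers into float."""
    rows = []
    for record in csv.DictReader(handle):
        for column in numeric_columns:
            raw = record[column].strip()
            record[column] = None if raw in MISSING_MARKERS else float(raw)
        rows.append(record)
    return rows


numeric = ["GNI.per.capita", "Unemployment.total"]
rows = load_rows(io.StringIO(SAMPLE_CSV), numeric)

# Overview of the data: dimensions and number of missing values per column.
n_rows, n_cols = len(rows), len(rows[0])
missing_counts = {
    column: sum(1 for row in rows if row[column] is None) for column in numeric
}
print(n_rows, n_cols, missing_counts)
```

Declaring the markers once, at load time, means every later summary can simply skip `None` rather than re-checking for text placeholders.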

Exploratory Data Analysis

One variable analysis

One Variable Analysis – 1

The per capita gross national income (GNI) is analysed in the first one-variable study. The average GNI per capita in the region is 11522.45 and the standard deviation is 15406. The minimum and maximum GNI per capita are 310 and 76300 respectively. The boxplot shows that the distribution has many outliers.
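A one-variable summary of this kind can be sketched as follows. This is a Python illustration under stated assumptions, not the report's R code: only the minimum (310) and maximum (76300) come from the report, the remaining values are invented, and outliers are flagged with the common 1.5 × IQR rule, which may differ slightly from the quartile convention used by a particular boxplot routine.

```python
import statistics


def summarise(values):
    """Return mean, sd, min, max and 1.5*IQR outliers, skipping missing values."""
    clean = [v for v in values if v is not None]
    q1, _median, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(clean),
        "sd": statistics.stdev(clean),  # sample standard deviation (n - 1)
        "min": min(clean),
        "max": max(clean),
        "outliers": [v for v in clean if v < low_fence or v > high_fence],
    }


# Illustrative GNI per capita values; None marks a missing observation.
gni_per_capita = [310, 1200, 2500, None, 4100, 5300, 9800, 76300]

summary = summarise(gni_per_capita)
print(summary)
```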

One Variable Analysis – 2

The second variable is the percentage of tertiary school enrolment, for which the mean is 37.81 and the standard deviation is 22.19. The minimum percentage of school enrolment is 3.1 and the maximum is 83.6. The boxplot shows that the distribution of tertiary school enrolment percentage is negatively skewed.

One Variable Analysis – 3

The distribution of the total unemployment rate is analysed as the third one-variable study. The distribution is represented by a histogram, which shows that it is positively skewed.
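The direction of skew read off a boxplot or histogram can be checked numerically. The sketch below, in Python and with invented values, computes the moment coefficient of skewness: a positive result indicates a longer right tail (positive skew) and a negative result a longer left tail (negative skew). Whether the report used any numeric skewness measure is not stated, so this is an illustrative assumption.

```python
def skewness(values):
    """Moment coefficient of skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented unemployment-like rates with a long right tail.
right_tailed = [2, 3, 3, 4, 4, 4, 5, 6, 9, 15]
# Mirroring the values flips the tail to the left.
left_tailed = [-v for v in right_tailed]

print(skewness(right_tailed) > 0)  # long right tail
print(skewness(left_tailed) < 0)   # long left tail
```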

Two-variable analysis

Two-variable analysis 1

In the first two-variable analysis, gross national income is analysed country by country for the region and represented graphically by side-by-side boxplots. The graph shows large variation in GNI over the period 2001 to 2015. The maximum gross national income is observed for Macao SAR, China.

Two-variable analysis 2

The distribution of total unemployment is analysed here with respect to its change within each country over the 15-year time period. Side-by-side boxplots are used to represent the variation in the unemployment rate across countries. There are outliers for the countries with country codes ‘KIR’, ‘PLW’ and ‘SLB’. The boxplot with the longest whiskers belongs to the country with country code ‘MAC’, which indicates that the spread of the unemployment rate distribution is widest for this country.
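The grouping behind a side-by-side boxplot can be sketched as below. This is a Python illustration with hypothetical country codes and invented rates; it groups observations by country and compares their spread, using the plain range as a rough stand-in for whisker length (a real boxplot would exclude outliers from the whiskers).

```python
import statistics
from collections import defaultdict

# (country code, year, unemployment rate); codes and values are invented.
observations = [
    ("AAA", 2001, 3.0), ("AAA", 2002, 3.5), ("AAA", 2003, 4.0),
    ("BBB", 2001, 1.0), ("BBB", 2002, 5.0), ("BBB", 2003, 9.0),
    ("BBB", 2004, None),  # missing observation, skipped below
    ("CCC", 2001, 6.0), ("CCC", 2002, 6.2),
]


def group_by_country(records):
    """Collect the non-missing values for each country code."""
    groups = defaultdict(list)
    for code, _year, value in records:
        if value is not None:
            groups[code].append(value)
    return groups


def spread_table(groups):
    """Summarise each group by min, median, max and range."""
    return {
        code: {
            "min": min(values),
            "median": statistics.median(values),
            "max": max(values),
            "range": max(values) - min(values),
        }
        for code, values in groups.items()
    }


groups = group_by_country(observations)
table = spread_table(groups)
widest = max(table, key=lambda code: table[code]["range"])
print(widest, table[widest])
```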

Advanced analysis

Clustering

Brief explanation of k-means and clustering

Clustering means dividing the entire dataset into smaller groups whose members have similar characteristics. K-means is a non-hierarchical clustering technique that uses distances to centroids for group segmentation (Oleiwi 2016). The centroids are selected initially and each data point is assigned to the nearest centroid. The centroids are then recomputed from their assigned points, and the process is repeated until the assignments no longer change (Cohen et al. 2015).
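The assign-and-update loop described above can be sketched in a few lines. This is a minimal Python illustration of the standard k-means iteration, not the routine used in the report; the points are invented, and the initial centroids are picked by hand (one point from each visible group) so that the run is deterministic.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (keep it if it has none)."""
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            moved.append(old)
    return moved


def k_means(points, initial_centroids, max_iter=100):
    """Repeat assignment and update until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Invented two-dimensional points forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (1, 10), (0, 11)]
labels, centroids = k_means(points, [points[0], points[3], points[6]])
print(labels)
print(centroids)
```

In practice the initial centroids are usually chosen at random and the algorithm is run several times, because different starting points can lead to different final groupings.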

Clustering Analysis

The per capita gross national income and the total unemployment rate for the year 2014 are used for the k-means clustering analysis. Three clusters were found to be optimal after scaling. The graphical analysis shows three groups:

Low GNI and High rate of total unemployment
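Scaling matters here because GNI per capita is measured in thousands while the unemployment rate is a single-digit percentage, so unscaled distances would be dominated by GNI. The report does not say which scaling it applied; the Python sketch below assumes z-score standardisation (subtract the mean, divide by the sample standard deviation) and uses invented values.

```python
import statistics


def standardise(values):
    """Z-score scaling: zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Invented values on very different scales.
gni_per_capita = [1000, 5000, 20000, 60000]
unemployment_rate = [2.0, 4.0, 6.0, 8.0]

scaled_gni = standardise(gni_per_capita)
scaled_unemployment = standardise(unemployment_rate)

# After scaling, both variables contribute comparably to a distance.
scaled_points = list(zip(scaled_gni, scaled_unemployment))
print(scaled_points)
```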

Linear regression

Brief definition of linear regression

Linear regression models the linear relationship between an explained (dependent) variable and one or more explanatory (independent) variables (Theobald and Freeman 2014).
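For a single explanatory variable, the least-squares line has slope Sxy / Sxx and intercept ȳ − slope × x̄. The sketch below is a Python illustration of that textbook formula on invented, exactly linear data; it is not the fitting routine used in the report.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented data lying exactly on y = 2 + 3x, so the fit should recover it.
x = [0, 1, 2, 3, 4]
y = [2, 5, 8, 11, 14]

intercept, slope = fit_line(x, y)
print(intercept, slope)
```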

Linear Regression 1

The dependent variable in this case is the total unemployment rate and the independent variable is tertiary school enrolment. The total unemployment rate is predicted by the following regression equation:

Total unemployment rate = 3.769558 + 0.009769 * Tertiary school enrolment

Linear Regression 2

The relationship between the total unemployment rate (independent variable) and GNI per capita (dependent variable) is shown in the following graph. The fitted regression equation is:

GNI = 12974.4 + 395.2 * Total unemployment rate

The slope is positive, which indicates that per capita GNI is predicted to increase as the total unemployment rate increases (Darlington and Hayes 2016).
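The two fitted equations can be used directly for prediction. The Python sketch below takes the coefficients exactly as reported; the function names and the input values (50 % enrolment, 5 % unemployment) are hypothetical choices for illustration.

```python
def predict_unemployment(tertiary_enrolment_pct):
    """Linear Regression 1: total unemployment rate from tertiary enrolment."""
    return 3.769558 + 0.009769 * tertiary_enrolment_pct


def predict_gni(unemployment_rate_pct):
    """Linear Regression 2: GNI per capita from total unemployment rate."""
    return 12974.4 + 395.2 * unemployment_rate_pct


# Hypothetical inputs chosen for illustration.
unemployment_at_50 = predict_unemployment(50)
gni_at_5 = predict_gni(5)

# The slope is the predicted change for a one-unit increase in the input.
change_per_point = predict_unemployment(51) - predict_unemployment(50)

print(round(unemployment_at_50, 6), round(gni_at_5, 1), round(change_per_point, 6))
```

These are fitted associations only. A positive slope describes how the predicted value moves with the input across the observed data; it does not by itself show that one variable causes the other.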

Conclusion

This report analyses the health and population statistics of the East Asia and Pacific region. The analysis shows a high level of GNI per capita for the country with country code MAC, and the country-by-country distribution of gross national income contains outliers. K-means clustering on GNI per capita and the total unemployment rate yields three optimal clusters. The fitted regressions indicate that higher tertiary school enrolment is associated with a higher total unemployment rate, and that a higher total unemployment rate is associated with higher GNI per capita.

Reflection

Examining different attributes over different time periods made the analysis interesting. The study shows variation in the total unemployment rate and in GNI per capita across the East Asia and Pacific region.

References

Cohen, M.B., Elder, S., Musco, C., Musco, C. and Persu, M., 2015, June. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing (pp. 163-172). ACM.

Darlington, R.B. and Hayes, A.F., 2016. Regression analysis and linear models: Concepts, applications, and implementation. Guilford Publications.

Oleiwi, W.K., 2016. Using the Fuzzy Logic to Find Optimal Centers of Clusters of K-means. International Journal of Electrical and Computer Engineering, 6(6), p.3068.

Theobald, R. and Freeman, S., 2014. Is it the intervention or the students? Using linear regression to control for student characteristics in undergraduate STEM education research. CBE-Life Sciences Education, 13(1), pp.41-48.
