Analysing Household Data
Have you been part of a national census? Privacy issues aside, a census provides lots of data that can inform a government's policies and actions, but to be useful, the data needs to be analysed and interpreted.
In this assignment, we will use statistical methods to analyse and interpret real-world demographic data.
The goals of this assignment are to
- Test your understanding of statistical methods and approaches
- Improve your ability to use Excel for manipulation of data (see here for some guides on using Excel)
- Understand the real-world applications and implications of statistics
To complete this assignment you must
- Complete a set of statistical analysis tasks on a unique data set (both tasks and data set will be provided to you)
- Submit a report in Word detailing your response to each task (the final answer and the reasoning / calculations that led to it)
- Submit an Excel document that contains your data set and the calculations you used to complete the tasks
Tasks for Analysis of Data Set
Complete the following tasks based on the unique data set you generated.
- Draw a random sample of two hundred (200) households as per the sample selection procedure. What sampling method have you used to select your sample data? In your opinion, is this the best method of sampling? Why or why not?
- Compute the descriptive statistics and draw a Box-Whisker plot of Expenditures on the following variables (all series in one graph!);
- Also, use an appropriate measure of variation to compare the variability in expenditures on these four variables. Explain why this is an appropriate measure.
- Present a summary of your findings about the shape and spread of the distribution of these variables using information from the boxplots and the descriptive statistics.
- Construct a frequency distribution of the expenditures on Utilities, using the following classification:
- What is the percentage of households that spend on Utilities:
- at most $1200 per annum
- between $1200 and $2400 per annum, and
- more than $2400 per annum.
- Draw the histogram of the expenditures on Utilities by households in your sample. Do you think the utility expenditures are normally distributed? Provide the “statistical reason” for your answer.
- What are the top 10% and bottom 10% values of households' annual after-tax income (AtaxInc)? What do these two values imply?
- What does the mean (average) of the variable OwnHouse imply?
- What is the probability that a randomly selected household will have a family size (FS = Adults + Children) equal to 5?
- Draw a scatter plot of the natural log of total expenditures against the natural log of after-tax income, that is, ln(texp) against ln(ataxinc), and compute the coefficient of correlation. Describe what you find about the relationship between the two variables.
- Construct a contingency table between the gender and the level of education. Using information in this table, can we say that male and female heads of households differ in their highest level of qualification?
- What is the probability that the head of household is a female and her highest level of education is Intermediate?
- What is the probability that the head of household is a male and has a Bachelor degree?
- Among female heads of household, what proportion have Secondary as their highest level of education?
- Do you think that the events "gender of household head is male" and "having the Master Degree" are independent?
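The contingency-table questions above rest on three quantities: a joint probability (cell count over the grand total), a conditional proportion (cell count over a row total), and the independence condition P(A and B) = P(A) x P(B). The following is a minimal Python sketch of those calculations, offered only as an illustration alongside the Excel workflow the assignment requires. The ten records, the codes "M"/"F", and the function names `joint`, `conditional` and `independent` are all made up for this example and are not part of the assignment data set.

```python
from collections import Counter

# Hypothetical (gender, highest education) records; NOT the assignment data.
heads = [
    ("M", "Secondary"), ("F", "Intermediate"), ("M", "Bachelor"),
    ("F", "Secondary"), ("M", "Master"), ("F", "Bachelor"),
    ("M", "Bachelor"), ("F", "Intermediate"), ("M", "Secondary"),
    ("F", "Master"),
]

n = len(heads)                                  # grand total
table = Counter(heads)                          # cell counts of the contingency table
gender_totals = Counter(g for g, _ in heads)    # row totals
edu_totals = Counter(e for _, e in heads)       # column totals


def joint(gender, edu):
    """P(gender and edu): cell count divided by the grand total."""
    return table[(gender, edu)] / n


def conditional(edu, gender):
    """P(edu | gender): cell count divided by the row total for that gender."""
    return table[(gender, edu)] / gender_totals[gender]


def independent(gender, edu, tol=1e-9):
    """True when P(A and B) matches P(A) * P(B) within a small tolerance."""
    product = (gender_totals[gender] / n) * (edu_totals[edu] / n)
    return abs(joint(gender, edu) - product) <= tol


print(joint("F", "Intermediate"))      # female and Intermediate
print(conditional("Secondary", "F"))   # Secondary among females
print(independent("M", "Master"))      # male vs. Master Degree
```

With real sample data the two sides of the independence condition will rarely match exactly, so in practice you would compare how close they are and explain your judgement rather than rely on a strict equality check.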