1 OVERVIEW OF THE ASSIGNMENT

This assignment tests your skills in collecting and analysing data to address a specific business problem. It also gives you the opportunity to apply the techniques you have learned in this course, such as computing numerical summaries, displaying data with appropriate graphs and using statistical inference to solve business problems, including constructing hypotheses, testing them and interpreting the findings. You will need two datasets. One dataset will be sent to you individually via your KOI student email, and you need to find or collect the other.

Suppose you are working for an agency that analyses NSW transport system data in order to recommend improvements to the public transport system. You will be given a series of research questions. Use the knowledge you have gained from this course to answer these questions by displaying appropriate outputs from Excel, StatKey or Wolfram Alpha. Use these answers to write an executive summary that could serve as a valuable recommendation to Transport for NSW.

2 TASK DESCRIPTION: WRITTEN REPORT

There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.

Dataset 1: You will receive an email containing a dataset allocated specifically to you. This dataset is a subset of the __Opal Tap on and Tap Off Location - 8th to 14th August 2016__ individual sample file, provided by Transport for NSW Open Data, and has been edited to include only a subset of the cases and variables. The original dataset can be obtained from https://opendata.transport.nsw.gov.au/dataset/opal-tap-on-and-tap-off and is licensed under __Creative Commons Attribution 3.0 Australia__. The data dictionary of the edited dataset is given in the following table.

Variable | Description | Values |
mode | Type of public transport | Bus, Train, Ferry and Light Rail |
date | Date of the tap on/off | Day/month/year |
tap | Whether the record is a tap on or a tap off | On and Off |
loc | Location of the stop: postcodes for buses, station names for the other modes | Postcodes and names of the stations |
count | Total number of taps on or off at the given location on the given date | Number |

Dataset 2: Collect data (e.g. via a survey) that will answer the research question given in Section 4 of the report (see below). There is no requirement on the number of variables, the sampling method or the sample size, but you need to justify your approach in Section 1 of the report (see below).

Both datasets should be saved in a single Excel file, on separate worksheets. All data processing should be performed in Excel or StatKey (http://www.lock5stat.com/StatKey/).

Prepare a report in a document file (.doc or .docx) which includes all relevant tables and figures, using the following structure:

Section 1: Introduction

- Give a brief introduction to the assignment. Find a related article and write a one-paragraph summary of it that supports your assignment. Give the full citation of the article.
- Dataset 1: Give a short description of this dataset. Is it primary or secondary data? What types of variables are involved? Explain briefly what the cases are in this study.
- Dataset 2: Explain how you collected the data and discuss its limitations (e.g. whether your sample is biased). Is it primary or secondary data? What is/are the type(s) of variable(s) involved? Describe the cases you consider for this dataset.

Section 2: Analysis of a single variable in Dataset 1

- To answer the research question “Which type of public transport was most used by people in NSW from 8th to 14th August 2016?”, provide a suitable numerical summary and graphical display for the variable mode of Dataset 1. Comment in detail to answer the research question.

- To answer the research question “Do more than 50% of public transport users in NSW use the mode of transport found in part a?”, set up appropriate hypotheses, perform a hypothesis test and answer the research question by writing the conclusion of the test.
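As a way to cross-check your Excel or StatKey output for the two parts above, the following is a minimal Python sketch, not the required method. It totals the counts per mode, finds the most used mode, and runs a right-tailed one-proportion z-test of H0: p = 0.5 against Ha: p > 0.5. The data values, the function names `mode_summary` and `one_proportion_z_test`, and the choice of a normal-approximation z-test are all assumptions made for illustration; the sketch also assumes each tap can be treated as one independent observation, which you would need to justify.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

# Hypothetical (mode, count) rows; illustrative numbers only, not real Opal data.
records = [("Train", 520), ("Bus", 310), ("Ferry", 40), ("Light Rail", 30), ("Train", 100)]

def mode_summary(rows):
    """Return {mode: (total count, relative frequency)} from (mode, count) rows."""
    totals = Counter()
    for mode, count in rows:
        totals[mode] += count
    n = sum(totals.values())
    return {mode: (c, c / n) for mode, c in totals.items()}

def one_proportion_z_test(successes, n, p0=0.5):
    """Right-tailed z-test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)   # upper-tail area of the standard normal
    return p_hat, z, p_value

summary = mode_summary(records)
top_mode = max(summary, key=lambda m: summary[m][0])
n_total = sum(c for c, _ in summary.values())
p_hat, z, p_value = one_proportion_z_test(summary[top_mode][0], n_total)
print(top_mode, round(p_hat, 3), round(z, 2), p_value)
```

If the p-value is below your chosen significance level, the sample gives evidence that more than half of the taps were on the most used mode; your written conclusion should still be based on the output you produce in Excel or StatKey.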

Section 3: Analysis of two variables in Dataset 1

The NSW Government needs to decide whether to build an underground railway line to Central from Parramatta, Bankstown or Gosford. To prepare a recommendation:

- Give a numerical summary and an appropriate graphical display for the variable location (loc), considering only those three stations, and for the variable count, considering train data only.
- Perform a suitable hypothesis test at the 5% level of significance to test whether there is a difference between the mean counts of taps on and taps off.
- Use the conclusion of the test in part b and the outputs in part a to write a recommendation to the NSW Government.
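One possible way to approach the test in part b is a randomization test for a difference in means, similar in spirit to what StatKey offers. The Python sketch below is only an illustration under stated assumptions: the counts are made up, the function name `randomization_test` is hypothetical, and the two groups are treated as independent samples. If you pair tap-on and tap-off counts by date and station, a paired test would be the more natural choice and would give different results.

```python
import random
from statistics import mean

# Hypothetical daily train counts at one station; illustrative values only.
tap_on = [1200, 1350, 1280, 1400, 1330, 900, 850]
tap_off = [1150, 1300, 1310, 1380, 1290, 950, 800]

def randomization_test(x, y, reps=5000, seed=1):
    """Two-sided randomization test for a difference in means.

    Returns the observed difference mean(x) - mean(y) and the share of
    reshuffled differences at least as extreme as the observed one.
    """
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)             # reassign group labels at random
        diff = mean(pooled[:len(x)]) - mean(pooled[len(x):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reps

observed, p_value = randomization_test(tap_on, tap_off)
alpha = 0.05                            # 5% level of significance
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(round(observed, 2), p_value, decision)
```

Here H0 states that the mean counts of taps on and taps off are equal, and Ha states that they differ. The decision printed by the sketch depends entirely on the made-up values; use your own Dataset 1 and report the Excel or StatKey output in your submission.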

Section 4: Collection and analysis of Dataset 2

You are interested in finding out whether the preferred transport mode (Bus, Train, Ferry and Light Rail) differs by gender. Using an appropriate number of cases and variables, give a suitable graphical display and use it to write a comment.
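Before drawing the graph, it can help to arrange the survey responses in a two-way table of gender by preferred mode, with proportions computed within each gender. The Python sketch below shows one way to do this; the responses, the gender categories and the function name `two_way_table` are hypothetical and only illustrate the layout. The same table can be built in Excel with a PivotTable and then plotted as a side-by-side or stacked bar chart.

```python
from collections import Counter

# Hypothetical survey responses as (gender, preferred mode); illustrative only.
responses = [
    ("Female", "Train"), ("Female", "Bus"), ("Female", "Train"), ("Female", "Ferry"),
    ("Male", "Bus"), ("Male", "Train"), ("Male", "Bus"), ("Male", "Light Rail"),
]

def two_way_table(pairs):
    """Return {gender: {mode: row proportion}} from (gender, mode) pairs."""
    counts = Counter(pairs)                      # count of each (gender, mode) cell
    row_totals = Counter(g for g, _ in pairs)    # number of respondents per gender
    table = {}
    for (gender, mode), c in counts.items():
        table.setdefault(gender, {})[mode] = c / row_totals[gender]
    return table

table = two_way_table(responses)
for gender, row in sorted(table.items()):
    print(gender, {mode: round(p, 2) for mode, p in sorted(row.items())})
```

Comparing proportions within each gender, rather than raw counts, keeps the comparison fair when the two groups have different numbers of respondents.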

Section 5: Discussion & Conclusion

Write an executive summary that combines all your findings from the previous sections into a valuable recommendation for Transport for NSW. Give a suggestion for further research.

3 TASK DESCRIPTION: PRESENTATION/INTERVIEW

A presentation/interview for the assignment is scheduled for Week 11, in your allocated tutorial.

You do NOT need to prepare presentation material (e.g. PowerPoint slides). Instead, you will be asked to demonstrate and/or explain how you summarised the data and how you performed the analysis. You may be asked to reproduce what you produced in your written report (e.g. generate a chart or numerical summary using Excel or StatKey).

