NIT 6130 Introduction to Research

Assignment – 4

Experiment Design & Result Analysis

Big Data Predictive Analytics to Overcome Flight Delays

Masters in Applied Information Technology (NMIT)
Victoria University, Melbourne, Victoria

Table of Contents

  1. Collection of data for experiment

1a. Identification and selection of available data sources

1b. Collection of Raw Data

  2. Experiment Design and Implementation

2a. Data pre-processing

2b. Feature Selection or Dimension Reduction

2c. Experiment Design

2d. Experiment Implementation Records

  3. Experiment Result Analysis and Summary
  4. Outline of Experiment and Result Analysis Chapter
  1. Collection of data for experiment

1a. Identification and selection of available data sources

To conduct the experimental analysis, the available data sources were identified, analysed and collected. The following table briefly describes each source.

| Data Source Name | Source Organization | Data Description | Data File Format | URL | Charge/Fee | Target Data Source |
|---|---|---|---|---|---|---|
| Flight Delay Data 1 | Department of Transportation, Washington, United States | Commercial airline flight delay records in 2015 | .csv | https://www.kaggle.com/usdot/flight-delays/data | Free | Yes |
| Flight Delay Data 2 | Bureau of Transportation Statistics | Commercial airline (US) flight delay records in 2017 | .csv | https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time | Free | Yes |
| Flight Delay Data 3 | Open Flights Airport Database | Select the delay criteria or reasons | .csv, .txt | https://openflights.org/data.html | $50 | Yes |
| Flight Delay Data 4 | Data World Organization | Departure delay records | .csv | https://data.world/data-society/airlines-delay/workspace/file?filename=airlinedelaycauses%2FDelayedFlights.csv | Free | Yes |
| Flight Delay Data 5 | Bureau of Infrastructure, Transport and Regional Economics | International airline activity | .csv | https://data.gov.au/dataset/international-airline-activity | Free | Yes |

1b. Collection of Raw Data

The relevant data for the experiment was downloaded from the web and saved in a folder called ‘Raw Dataset’. The files are in comma-separated values (*.csv) format, which can be opened in Microsoft Excel. Details of these records are summarised in the following table.

| Data Source Name | Date of Collection | Saved File Location | Saved File Name | Saved File Format | No. of Data Records |
|---|---|---|---|---|---|
| Flight Data 1 | 19/10/2017 | C:\Users\LIZA\Desktop\Introduction to Research\Raw Dataset | AirlineDelayCauses.csv | .csv | 1048576 |
| Flight Data 2 | 19/10/2017 | C:\Users\LIZA\Desktop\Introduction to Research\Raw Dataset | Delya_T_Ontime.csv | .csv | 450018 |
| Flight Data 3 | 21/10/2017 | C:\Users\LIZA\Desktop\Introduction to Research\Raw Dataset | Flights.csv | .csv | 1048500 |
| Flight Data 4 | 22/10/2017 | C:\Users\LIZA\Desktop\Introduction to Research\Raw Dataset | PredictingAirlineDelays.csv | .csv | 560002 |
| Flight Data 5 | 22/10/2017 | C:\Users\LIZA\Desktop\Introduction to Research\Raw Dataset | InternationalAirlineActivity.csv | .csv | 402050 |
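The record counts in the table above can be checked programmatically rather than by opening each file in Excel. This matters because an Excel worksheet holds at most 1,048,576 rows (2^20), which is exactly the count recorded for Flight Data 1, so a count at that value may mean the file was truncated when it was opened in Excel. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and the UTF-8 encoding is an assumption about the files.

```python
import csv
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # maximum rows in an Excel worksheet (2**20)


def count_records(csv_path):
    """Count data records (rows after the header) in a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if present
        return sum(1 for _ in reader)


def summarise_folder(folder):
    """Return {file name: record count} for every .csv file in the folder."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        n = count_records(path)
        counts[path.name] = n
        if n + 1 >= EXCEL_MAX_ROWS:  # header + records reach Excel's row cap
            print(f"warning: {path.name} may have been truncated by Excel")
    return counts


if __name__ == "__main__":
    # Folder path copied from the table above; adjust it for another machine.
    raw = r"C:\Users\LIZA\Desktop\Introduction to Research\Raw Dataset"
    if Path(raw).is_dir():
        for name, n in summarise_folder(raw).items():
            print(f"{name}: {n} records")
```

Reading the CSV directly with the `csv` module streams the file row by row, so it is not limited by Excel's worksheet size.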

  2. Experiment Design and Implementation

2a. Data pre-processing

A large volume of raw data is available for the research experiment, but not all of it can be used directly. The collected data therefore has to be pre-processed (cleaned, filtered and reduced) before the experiment can be conducted.
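Two of the pre-processing steps applied in this study, pre-filling missing values and discarding data that is more than five years old, can be sketched together as follows. This is a minimal illustration, not the procedure actually used: the column names `FL_DATE` and `DEP_DELAY`, the ISO date format and the choice of the median as the fill value are all assumptions.

```python
import csv
import io
from datetime import date
from statistics import median

# Hypothetical column names; change them to match the real file.
DATE_COL = "FL_DATE"     # flight date, assumed to be ISO formatted (YYYY-MM-DD)
DELAY_COL = "DEP_DELAY"  # departure delay in minutes; may be blank


def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or not value.strip()


def preprocess(rows, today, max_age_years=5):
    """Discard records older than max_age_years, then pre-fill missing delays.

    Missing delays are filled with the median of the observed delays; the
    median is an assumed choice, not one stated in the source.
    """
    try:
        cutoff = today.replace(year=today.year - max_age_years)
    except ValueError:  # today is 29 February and the target year is not a leap year
        cutoff = today.replace(year=today.year - max_age_years, day=28)

    recent = [r for r in rows if date.fromisoformat(r[DATE_COL]) >= cutoff]
    observed = [float(r[DELAY_COL]) for r in recent if not is_missing(r[DELAY_COL])]
    fill = median(observed) if observed else 0.0
    for r in recent:
        if is_missing(r[DELAY_COL]):
            r[DELAY_COL] = str(fill)
    return recent


# Usage with a tiny in-memory sample (values invented for illustration).
sample = io.StringIO(
    "FL_DATE,DEP_DELAY\n"
    "2010-03-01,12\n"
    "2017-05-02,\n"
    "2017-05-03,30\n"
    "2017-05-04,10\n"
)
cleaned = preprocess(csv.DictReader(sample), today=date(2017, 10, 23))
for row in cleaned:
    print(row)
```

The 2010 record is dropped as too old, and the blank delay is filled with 20.0, the median of the remaining observed delays.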

2b. Feature Selection or Dimension Reduction

Each collected data file contains many features, not all of which are relevant to the experiment. Irrelevant fields were therefore removed from the records and the files were updated accordingly. Reducing the dimensionality of the data in this way simplifies processing during the experiment analysis. The resulting data sets are recorded in the following table.

| Date | Data Source Name | Purpose of Pre-processing | Pre-processing Method | No. of Original Data Records | No. of Result Data Records | No. of Original Features | No. of Result Features | New Data File Name |
|---|---|---|---|---|---|---|---|---|
| 23/10/2017 | Flight Data 1 | Feature selection | Manual data processing | 1048576 | 2000 | 46 | 20 | AirlineDelayCauses_Updated.csv |
| 23/10/2017 | Flight Data 2 | Clean the missing data | Pre-fill the missing values | 450018 | 4000 | 32 | 15 | Delya_T_Ontime_Updated.csv |
| 23/10/2017 | Flight Data 3 | Discard data that is more than 5 years old | Manual data processing | 1048500 | 2000 | 30 | 15 | Flights_Updated.csv |
| 23/10/2017 | Flight Data 4 | Report making followed by better analysis | Automated data processing using Excel features | 560002 | 2000 | 35 | 15 | PredictingAirlineDelays_Updated.csv |
| 23/10/2017 | Flight Data 5 | Feature selection | Manual data processing | 402050 | 3000 | 28 | 15 | InternationalAirlineActivity_Updated.csv |
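The manual feature selection recorded above (for example, reducing Flight Data 1 from 46 to 20 features) can also be scripted so that it is repeatable. The sketch below keeps only a chosen list of columns and can optionally cap the number of records. The column names are hypothetical, and taking the first N rows is an assumption: the table does not say how the reduced record sets were chosen, and random sampling would usually be preferable.

```python
import csv
import io
from itertools import islice


def select_features(reader, keep):
    """Yield each row of a csv.DictReader reduced to the columns in `keep`."""
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")
    for row in reader:
        yield {c: row[c] for c in keep}


def reduce_file(src, dst, keep, max_records=None):
    """Copy only the selected features from src to dst; return rows written.

    max_records optionally caps the number of records (first N rows, an
    assumed selection rule).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep)
    writer.writeheader()
    rows = select_features(reader, keep)
    if max_records is not None:
        rows = islice(rows, max_records)
    written = 0
    for row in rows:
        writer.writerow(row)
        written += 1
    return written


# Usage with invented data and hypothetical column names.
src = io.StringIO(
    "YEAR,MONTH,CARRIER,DEP_DELAY,TAIL_NUM\n"
    "2015,1,AA,5,N101\n"
    "2015,1,DL,-2,N202\n"
    "2015,2,UA,40,N303\n"
)
dst = io.StringIO()
written = reduce_file(src, dst, keep=["CARRIER", "DEP_DELAY"], max_records=2)
print(written)
print(dst.getvalue())
```

For files on disk, both the source and the `_Updated.csv` output should be opened with `newline=""`, as the `csv` module documentation recommends.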

2c. Experiment Design

| Date | Experiment | Purpose of Experiment | Description of Procedure | Input Data | Expected Output | Result File |
|---|---|---|---|---|---|---|
| 24/10/2017 | Experiment 1 | Evaluate Method 1 | The Ground Delay Program (GDP) procedure | Historical data and weather information, processed using MapReduce | A join key and table tag | Output1.csv |
| 24/10/2017 | Experiment 2 | Evaluate Method 2 | Regression prediction mechanism | Database input to the Naive Bayes algorithm | Results of the departure delay prediction | Output2.csv |
| 24/10/2017 | Experiment 3 | Evaluate Method 3 | Flight delay propagation and delay probability distribution | Itineraries of passengers who have missed a flight; Reschedule_Pax algorithm | New passenger itineraries | Output3.txt |
| 24/10/2017 | Experiment 4 | Evaluate Method 4 | Heuristic algorithm: schedule minimisation for passenger trip delay | The flight schedule; itineraries; regression-based algorithm | The updated flight schedule | Output4.txt |
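Experiment 1 lists "a join key and table tag" as its expected output, which matches the pattern of a reduce-side join in MapReduce: each mapper emits records keyed by a shared join key and tagged with the table they came from, and each reducer uses the tags to combine the two inputs. The sketch below simulates the map, shuffle and reduce phases in a single Python process to show the idea. The record layouts and the (airport, date) join key are assumptions for illustration, not taken from the collected files.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical record layouts (not taken from the real data files):
#   flight:  (airport, date, flight_id, dep_delay_minutes)
#   weather: (airport, date, condition)
# The join key is (airport, date); the table tag records which input a value came from.


def map_flight(record):
    airport, day, flight_id, delay = record
    yield (airport, day), ("F", (flight_id, delay))


def map_weather(record):
    airport, day, condition = record
    yield (airport, day), ("W", condition)


def shuffle(pairs):
    """Group (key, value) pairs by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, tagged_values):
    """Use the table tags to separate the two inputs, then inner-join them."""
    flights = [v for tag, v in tagged_values if tag == "F"]
    weather = [v for tag, v in tagged_values if tag == "W"]
    for flight_id, delay in flights:
        for condition in weather:
            yield (*key, flight_id, delay, condition)


def reduce_side_join(flights, weather):
    # In Hadoop, map tasks would run in parallel on separate input splits;
    # here both inputs are simply mapped one after the other.
    mapped = chain(
        (kv for rec in flights for kv in map_flight(rec)),
        (kv for rec in weather for kv in map_weather(rec)),
    )
    joined = []
    for key, values in sorted(shuffle(mapped).items()):
        joined.extend(reduce_join(key, values))
    return joined


flights = [
    ("ORD", "2015-01-02", "AA100", 45),
    ("ORD", "2015-01-02", "UA7", 10),
    ("JFK", "2015-01-02", "DL5", 0),
]
weather = [("ORD", "2015-01-02", "snow"), ("JFK", "2015-01-03", "clear")]
result = reduce_side_join(flights, weather)
for row in result:
    print(row)
```

Only keys that appear in both inputs produce output, so the JFK flight and the JFK weather record (on different dates) are dropped, as in an inner join.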

2d. Experiment Implementation Records

A basic delay model can be built from the empirical cumulative distribution function (ECDF) of the observed delays. Kernel density estimation (KDE), which gives a smooth estimate of the delay distribution, is available as a built-in function in the programming language that will be used. A MapReduce algorithm will split the input data set into independent chunks, which the map tasks will process in a completely parallel manner. A linear regression model of the average daily delay will analyse the effects of arrival delay, airport capacity, traffic congestion and weather conditions.
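The ECDF and KDE delay models described above can be illustrated in a few lines. Because the programming language for the experiment is not named here, this sketch uses only the Python standard library and implements a Gaussian KDE directly (libraries such as SciPy provide `scipy.stats.gaussian_kde` for the same purpose). The delay values and the bandwidth rule are assumptions for illustration.

```python
import math
from bisect import bisect_right


def ecdf(samples):
    """Empirical CDF: F(x) is the fraction of observed delays that are <= x."""
    data = sorted(samples)
    n = len(data)

    def F(x):
        return bisect_right(data, x) / n

    return F


def gaussian_kde(samples, bandwidth=None):
    """Gaussian kernel density estimate f(x) of the delay distribution."""
    data = list(samples)
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed choice): h = 1.06 * sd * n^(-1/5)
        mean = sum(data) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def f(x):
        # Sum of Gaussian bumps centred on each observed delay, scaled to integrate to 1
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return f


# Hypothetical departure delays in minutes (negative = early departure).
delays = [-5, 0, 2, 3, 8, 15, 15, 30, 60, 120]
F = ecdf(delays)
f = gaussian_kde(delays)
print(F(15))  # → 0.7, i.e. 70% of these flights left at most 15 minutes late
print(f(10))  # smoothed density around a 10-minute delay
```

The ECDF answers questions such as "what share of flights are delayed by at most 15 minutes?" directly from the data, while the KDE trades some fidelity for a smooth curve whose shape depends on the chosen bandwidth.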

  3. Experiment Result Analysis and Summary

The experiments described above are expected to support the following findings:

  • Flight delays are one of the major causes of Total Passenger Trip Delay (TPTD); the remaining passenger trip delays arise from missed connections and flight cancellations.
  • Airline network design also has a significant impact on passenger trip delay. The balance between direct and connecting itineraries, service frequency, idle time between connection banks at hubs, aircraft size and target load factor all play a major role in determining passengers' trip reliability.
  • Flight delays caused by bad weather should be forecast well before the scheduled departure so that alternative arrangements can be made. Passengers can then be advised of the expected delay in advance and plan their journey accordingly, which reduces disruption among passengers.
  • The delays that passengers experience because of late or diverted flights can be minimised. Passengers also suffer trip delays from cancelled flights, missed connections and boarding issues. One remedy is to schedule an additional flight so that the delay does not propagate to later flights, but this increases the airline's costs. Conversely, if flight frequency is reduced, the load factor of the flights onto which passengers are rebooked increases. The experimental results can help minimise passenger trip delay either by rebooking passengers who arrive late from connecting flights onto the next available flight, or by holding the next flight until those passengers arrive. The second option may in turn delay all the other passengers on that flight and their onward connections.
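The trade-off in the last point, holding the next flight versus rebooking the late passengers, can be made concrete with a simple TPTD calculation. The sketch below is a deliberately simplified model under stated assumptions: all numbers are hypothetical, holding delays every passenger on the flight by the same amount, and downstream propagation to onward connections (which the text notes can occur when a flight is held) is ignored, so the "hold" figure is a lower bound.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    onboard: int              # passengers already at the gate for the outbound flight
    late: int                 # inbound passengers who will miss the scheduled departure
    lateness_min: int         # minutes after scheduled departure that they arrive
    next_flight_gap_min: int  # minutes until the next flight to the same destination
    next_flight_seats: int    # free seats on that next flight


def tptd_hold(c):
    """Hold the flight: every passenger on it departs lateness_min minutes late."""
    return (c.onboard + c.late) * c.lateness_min


def tptd_rebook(c):
    """Rebook late passengers on the next flight; None if it lacks free seats."""
    if c.late > c.next_flight_seats:
        return None
    return c.late * c.next_flight_gap_min


def choose(c):
    """Pick the option with the lower total passenger trip delay (in minutes)."""
    hold, rebook = tptd_hold(c), tptd_rebook(c)
    if rebook is None or hold < rebook:
        return "hold", hold
    return "rebook", rebook


busy = Connection(onboard=150, late=6, lateness_min=20,
                  next_flight_gap_min=120, next_flight_seats=10)
sparse = Connection(onboard=40, late=30, lateness_min=10,
                    next_flight_gap_min=180, next_flight_seats=50)
print(choose(busy))    # → ('rebook', 720)
print(choose(sparse))  # → ('hold', 700)
```

With a full flight and only a few late passengers, rebooking minimises TPTD; when many passengers are late and the next flight is far off, holding becomes the better option, which is the kind of decision the experiments aim to inform.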
  4. Outline of Experiment and Result Analysis Chapter

4.1 Data Analysis

4.1.1 Data Pre-processing and Transformation

4.1.2 Target Data Creation

4.1.3 Model descriptions and variables

4.1.3.1 The training dataset

4.1.3.2 Decision Trees

4.1.3.3 Random Forest Model

4.2 Delay Prediction

4.2.1 Classification Technique

4.2.2 Hadoop MapReduce

4.3 Analysis of Variance (ANOVA) on the average daily arrival delay

4.3.1 ANOVA test on seasonal pattern

4.4 BRYAGH: Basic Reduction Yare Approach for Flights

4.4.1 BRYAGH Algorithm

4.4.2 Pseudo code

4.5 Conclusion
