CS570 Assignment - MapReduce for Flight Cancellation

    Requirements:

    This assignment uses airline on-time performance data from the Bureau of Transportation Statistics (BTS), part of the Research and Innovative Technology Administration (RITA), to determine the number of cancelled flights, grouped by cancellation reason. The data is available for download and analysis at http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=OnTime. Lookup data for the cancellation codes and their descriptions can be downloaded from http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_CANCELLATION.

    Tasks:

    • Download the pre-zipped on-time performance file for January 2014, unzip it, and upload the CSV file to your project input folder.
    • Download the lookup data for the cancellation codes and their descriptions.
    • Create a new Maven Java project for this assignment, and then create several MapReduce classes to accomplish the following tasks:

    Create a job that selects the cancelled flights, that is, the records whose Cancelled field is 1.
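    The filter can be sketched in plain Java so that it runs without a Hadoop cluster. In a Hadoop Mapper, map() would call isCancelled on each input line and write the line to the context only when it returns true; with job.setNumReduceTasks(0) this becomes a map-only job. This is a sketch under stated assumptions, not a reference solution: CANCELLED_COL = 4 is a hypothetical column position that depends on which fields you select on the download page, the value may be exported as "1.00" rather than "1", and the quote-aware splitter is there because text fields in the export may be quoted and can contain commas (city names such as "Boston, MA", for example).

```java
import java.util.ArrayList;
import java.util.List;

final class CancelledFlightFilter {
    // Hypothetical position of the Cancelled column; it depends on the fields
    // selected on the download page, so check it against the CSV header row.
    static final int CANCELLED_COL = 4;

    /** Splits one CSV line, keeping commas that appear inside double-quoted fields. */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote ("") inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // entering or leaving a quoted field
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field (empty if the line ends with a comma)
        return fields;
    }

    /** True when the Cancelled field parses to 1; the export may write it as "1.00". */
    static boolean isCancelled(String line) {
        List<String> fields = splitCsv(line);
        if (fields.size() <= CANCELLED_COL) {
            return false; // short or malformed line
        }
        try {
            return Double.parseDouble(fields.get(CANCELLED_COL).trim()) == 1.0;
        } catch (NumberFormatException e) {
            return false; // header row or empty field
        }
    }
}
```

Parsing the field as a number rather than comparing it to the string "1" keeps the filter working whether the export writes 1 or 1.00, and it also skips the header row without a separate check.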

    Create a job that groups the cancelled-flight data by cancellation code and counts the number of flights in each group.
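    Grouping and counting follows the classic word-count pattern: the mapper emits (code, 1) for each cancelled flight and the reducer sums the ones for each code. The plain-Java sketch below simulates the map, shuffle, and reduce steps locally; in Hadoop the same logic would live in a Mapper<LongWritable, Text, Text, IntWritable> and a Reducer<Text, IntWritable, Text, IntWritable>. CODE_COL = 5 is a hypothetical column position, and the simple comma split assumes the output of the first job contains no commas inside fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CancellationCodeCount {
    // Hypothetical position of CancellationCode in a cancelled-flight line.
    static final int CODE_COL = 5;

    /** Map phase: emits (cancellationCode, 1) for one line, or nothing if the code is missing. */
    static List<Map.Entry<String, Integer>> map(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length <= CODE_COL) {
            return List.of();
        }
        String code = fields[CODE_COL].replace("\"", "").trim();
        if (code.isEmpty()) {
            return List.of();
        }
        return List.of(Map.entry(code, 1));
    }

    /** Reduce phase: sums the 1s emitted for one cancellation code. */
    static int reduce(String code, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return sum;
    }

    /** Local stand-in for the Hadoop shuffle: groups map output by key in sorted key order. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((code, ones) -> counts.put(code, reduce(code, ones)));
        return counts;
    }
}
```

Because addition is associative and commutative, the reducer class can also be registered as a combiner with job.setCombinerClass(...), which pre-sums counts on the map side and reduces shuffle traffic.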

    Create a job that sorts the data set by Year, then Carrier, then FlightNum, and uses Month to partition the records among the reducers.
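    A common way to sort on several fields is a composite key whose compareTo orders by Year, then Carrier, then FlightNum, combined with a custom partitioner that chooses the reducer from Month. The plain-Java sketch below models both. In Hadoop, FlightKey would implement WritableComparable<FlightKey> (adding write and readFields), MonthPartitioner would extend Partitioner and override getPartition(key, value, numPartitions), and the driver would register it with job.setPartitionerClass(...) and set the reducer count with job.setNumReduceTasks(...). The class names and the month-to-reducer formula are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Composite sort key: ordered by year, carrier, flight number. */
final class FlightKey implements Comparable<FlightKey> {
    final int year;
    final int month; // used only for partitioning, not part of the sort order
    final String carrier;
    final int flightNum;

    FlightKey(int year, int month, String carrier, int flightNum) {
        this.year = year;
        this.month = month;
        this.carrier = carrier;
        this.flightNum = flightNum;
    }

    @Override
    public int compareTo(FlightKey other) {
        int c = Integer.compare(year, other.year);
        if (c != 0) {
            return c;
        }
        c = carrier.compareTo(other.carrier);
        if (c != 0) {
            return c;
        }
        return Integer.compare(flightNum, other.flightNum);
    }

    @Override
    public String toString() {
        return year + "," + carrier + "," + flightNum;
    }
}

final class MonthPartitioner {
    /** Months 1..12 spread over the reducers; floorMod keeps the result non-negative. */
    static int getPartition(FlightKey key, int numPartitions) {
        return Math.floorMod(key.month - 1, numPartitions);
    }

    /** Local stand-in for partition + sort: each inner list is one reducer's sorted input. */
    static List<List<FlightKey>> partitionAndSort(List<FlightKey> keys, int numPartitions) {
        List<List<FlightKey>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (FlightKey key : keys) {
            partitions.get(getPartition(key, numPartitions)).add(key);
        }
        for (List<FlightKey> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }
}
```

Each reducer receives its keys already sorted, so every output file is ordered by Year, Carrier, FlightNum for the months it was assigned. With only the January 2014 file as input, this formula sends every record to partition 0, so the remaining reducers produce empty output.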

    Create a job that joins the cancellation code lookup data with the cancelled-flight data set.
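    One way to implement the join is a reduce-side join: one mapper reads the cancellation lookup file, another reads the cancelled flights (in Hadoop these are typically wired up with MultipleInputs.addInputPath), both emit the cancellation code as the key, and each value is tagged with its source so the reducer can attach the description to every flight. The plain-Java sketch below simulates that flow. The tags "L" and "F", the hypothetical CODE_COL position, and the assumption that the lookup file has one "code","description" pair per line are illustrative choices; flights whose code has no lookup entry are dropped (an inner join).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CancellationJoin {
    static final String LOOKUP_TAG = "L"; // hypothetical tag: value came from the lookup file
    static final String FLIGHT_TAG = "F"; // hypothetical tag: value came from the flight data
    static final int CODE_COL = 5;        // hypothetical position of CancellationCode

    static String unquote(String s) {
        return s.replace("\"", "").trim();
    }

    /** Lookup mapper: "A","Carrier" becomes (A, "L<TAB>Carrier"). */
    static List<Map.Entry<String, String>> mapLookup(String line) {
        String[] f = line.split(",", 2); // description keeps any later commas
        if (f.length < 2) {
            return List.of();
        }
        return List.of(Map.entry(unquote(f[0]), LOOKUP_TAG + "\t" + unquote(f[1])));
    }

    /** Flight mapper: keys the whole line by its cancellation code. */
    static List<Map.Entry<String, String>> mapFlight(String line) {
        String[] f = line.split(",", -1);
        if (f.length <= CODE_COL || unquote(f[CODE_COL]).isEmpty()) {
            return List.of();
        }
        return List.of(Map.entry(unquote(f[CODE_COL]), FLIGHT_TAG + "\t" + line));
    }

    /** Reducer: finds the description for this code and appends it to every flight. */
    static List<String> reduce(String code, List<String> taggedValues) {
        String description = null;
        List<String> flights = new ArrayList<>();
        for (String value : taggedValues) {
            String[] parts = value.split("\t", 2);
            if (parts[0].equals(LOOKUP_TAG)) {
                description = parts[1];
            } else {
                flights.add(parts[1]);
            }
        }
        List<String> out = new ArrayList<>();
        if (description == null) {
            return out; // inner join: no lookup entry, no output
        }
        for (String flight : flights) {
            out.add(flight + "," + description);
        }
        return out;
    }

    /** Local stand-in for the shuffle across both inputs. */
    static List<String> run(List<String> lookupLines, List<String> flightLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String line : lookupLines) {
            pairs.addAll(mapLookup(line));
        }
        for (String line : flightLines) {
            pairs.addAll(mapFlight(line));
        }
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        List<String> joined = new ArrayList<>();
        grouped.forEach((code, values) -> joined.addAll(reduce(code, values)));
        return joined;
    }
}
```

Because the lookup table is tiny, a map-side join that loads it in each mapper's setup() (for example after distributing it with job.addCacheFile(...)) would avoid the shuffle entirely; the reduce-side version is sketched here because it does not depend on either input fitting in memory.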