
MCIS6263 Big Data Assignment

Munge this data

An important part of any data analysis task is preparing the data. This preparation is essential to ensure the quality and suitability of the data for the next steps in the analysis pipeline. This exercise gives you practice in this critical task.

Objectives:

1- Understand data munging concepts.

2- Learn to work with a new programming language.

3- Learn to work with some of the tools needed for the munging task.

4- Get hands-on practice with data munging/wrangling.

5- Understand the different parts of the coded exercise and be able to work with existing code, including any configuration it requires.

Please visit this link:

https://dzone.com/articles/hands-on-data-wrangling-what-how-and-why

The page has an exercise with several parts/tasks:

  • Exercise 1: Fix Date Formats
  • Exercise 2: Fix Currency Values
  • Exercise 3: Fix Java Log4j Log File With Exception Stack Traces
  • Exercise 4: Web Scraping Top 50 Pop Songs in the Last Decade
  • Exercise 5: Web Scrape Using OpenRefine, R
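
To make the first two tasks concrete, here is a minimal sketch of what "fixing" date formats and currency values can mean. It is written in Python purely as an illustration. The linked exercises use R, OpenRefine and Trifacta Wrangler, and this is not their method. The list of input formats, the function names and the sample values are all hypothetical assumptions, not taken from the exercise data.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical input formats; the real data set may use different ones.
# Note: %b and %B match month names in the current locale (English assumed).
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y"]


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def normalize_currency(text):
    """Strip symbols and thousands separators and return a Decimal.

    Assumes '.' is the decimal separator and ',' groups thousands.
    Returns None when no digits remain after cleaning.
    """
    cleaned = re.sub(r"[^\d.\-]", "", text)
    if not re.search(r"\d", cleaned):
        return None
    return Decimal(cleaned)


if __name__ == "__main__":
    for raw in ["03/15/2016", "15 Mar 2016", "March 15, 2016"]:
        print(raw, "->", normalize_date(raw))
    for raw in ["$1,234.50", "USD 99"]:
        print(raw, "->", normalize_currency(raw))
```

Values that match none of the assumed formats come back as None rather than being guessed, which makes unparsed rows easy to spot and review by hand.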

What you should submit for this:

A report that shows the following:

  1. First page: the names and IDs of the students in the group.
  2. For each task:
     • The title of the task/exercise.
     • A brief description of the task in your own words.
     • A description, in simple steps, of how you performed the task.
     • A screenshot (print screen) of the first part of the original data.
     • A screenshot (print screen) of the data after performing the task.
     • Each screenshot should show part of your desktop.

Notes:

  • The report must be in PDF format.
  • You will need to install the following tools on your computer (the list also appears on the page linked above):

Tool               Version        Download & Install Instructions                      Type
R language         3.2.4          https://cran.rstudio.com/                            Open Source
R Studio           0.99.887       https://www.rstudio.com/products/rstudio/download/   Open Source
OpenRefine         2.6            http://openrefine.org/download.html                  Open Source
Trifacta Wrangler  3.0.1-client1  https://www.trifacta.com/trifacta-wrangler           Commercial, but free offering with limited functionality