Find the most common words in the comments for each star rating, after removing stop words such as "the".
In the report, you need to explain how you designed the PySpark programme for each problem. You should include the following sections:
1) The design of the programme.
2) Experimental results.
2.1) Screenshots of the output.
2.2) Description of the results.
Here are some examples of probably fake comments (e.g., "GREAT") and their corresponding ratings (e.g., 5 Star) in our data set:
6^220^Five Stars^2016-01-09^false^ Quality product.^5.00
6^221^Five Stars^2016-01-09^false^ Great quality.^5.00
6^222^Five Stars^2015-11-25^false^ Excellent^5.00
6^223^Five Stars^2016-01-14^false^ GREAT^5.00
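The sample records above can be split on the `^` delimiter before any analysis. The following is a minimal sketch, not the required solution. It assumes that the last field is the rating and that the text between the fifth field and the rating is the comment; the handout does not name the fields, so `Review`, `parse_review`, `comment` and `rating` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Review(NamedTuple):
    comment: str  # assumed: the field just before the rating
    rating: int   # assumed: the last field, e.g. "5.00" becomes 5


def parse_review(line: str) -> Optional[Review]:
    """Split one '^'-delimited record; return None if it looks malformed."""
    fields = line.split("^")
    if len(fields) < 7:
        return None
    try:
        rating = int(float(fields[-1]))
    except ValueError:
        return None
    # Re-join the middle fields in case the comment itself contains '^'
    # (an assumption about how such records would appear).
    comment = "^".join(fields[5:-1]).strip()
    return Review(comment, rating)
```

In a PySpark programme this function could be passed to `map` on an RDD of text lines, followed by a `filter` that drops the `None` results.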
There are 5 levels of rating here, where a 1 star rating represents the worst experience and a 5 star rating represents the best experience.
Hint: you can remove punctuation in each comment with the following code:
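The snippet referred to by the hint is not reproduced in this text. One possible way to strip punctuation, offered only as a sketch and not as the handout's own code, uses the standard library's `string.punctuation`. The function name `remove_punctuation` is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(comment: str) -> str:
    """Return the comment with ASCII punctuation removed."""
    return comment.translate(_PUNCT_TABLE)
```

Note that `string.punctuation` covers ASCII characters only, so typographic quotes and similar symbols would remain unless handled separately.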
1 star rating: average length of comments __
2 star rating: average length of comments __
3 star rating: average length of comments __
4 star rating: average length of comments __
5 star rating: average length of comments __
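One way to compute an average comment length per rating is to map each record to `(rating, (length, 1))`, sum the pairs per rating, and divide. The sketch below makes several assumptions that the handout does not confirm: length is counted in words after removing ASCII punctuation, the rating is the last `^`-separated field, and the comment is the field before it. All function names are hypothetical. `average_lengths_local` mirrors the logic in plain Python for small checks, and `average_lengths_spark` expects an existing `SparkContext`.

```python
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def to_rating_and_length(line):
    """Map one record to (rating, (word_count, 1)), or None if malformed."""
    fields = line.split("^")
    if len(fields) < 7:
        return None
    try:
        rating = int(float(fields[-1]))
    except ValueError:
        return None
    # Assumption: length means the number of words once punctuation is gone.
    words = fields[-2].translate(_PUNCT_TABLE).split()
    return rating, (len(words), 1)


def add_pairs(a, b):
    """Combine two (total_length, count) pairs; used as the reduce function."""
    return a[0] + b[0], a[1] + b[1]


def average_lengths_local(lines):
    """Plain-Python version of the same map/reduce logic, for small tests."""
    totals = {}
    for line in lines:
        pair = to_rating_and_length(line)
        if pair is None:
            continue
        rating, value = pair
        totals[rating] = add_pairs(totals.get(rating, (0, 0)), value)
    return {rating: total / count for rating, (total, count) in totals.items()}


def average_lengths_spark(sc, path):
    """Same computation on an RDD; `sc` is an existing SparkContext."""
    return (
        sc.textFile(path)
        .map(to_rating_and_length)
        .filter(lambda pair: pair is not None)
        .reduceByKey(add_pairs)
        .mapValues(lambda t: t[0] / t[1])
        .sortByKey()
        .collect()
    )
```

Carrying a `(total, count)` pair through `reduceByKey` avoids averaging averages, which would give wrong results when partitions hold different numbers of comments.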
2. (7.5 points) Top words
$ spark-submit 2-wordranking.py
top 10 common words
1 star rating : __ __ __ ...
2 star rating : __ __ __ ...
3 star rating : __ __ __ ...
4 star rating : __ __ __ ...
5 star rating : __ __ __ ...
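A possible shape for the word ranking is to emit `((rating, word), 1)` for every word, sum the counts, then keep the most frequent words per rating. This is a sketch under stated assumptions, not the expected solution: words are lower-cased, ASCII punctuation is removed, the rating is the last `^`-separated field, and `STOP_WORDS` is a tiny illustrative list rather than the stop-word list the assignment intends. All names are hypothetical.

```python
import string
from collections import Counter

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Illustrative only; the assignment's actual stop-word list is not given here.
STOP_WORDS = frozenset({"the", "a", "an", "and", "is", "it", "to", "of", "i", "this"})


def rating_word_pairs(line):
    """Return a list of ((rating, word), 1) pairs for one record."""
    fields = line.split("^")
    if len(fields) < 7:
        return []
    try:
        rating = int(float(fields[-1]))
    except ValueError:
        return []
    words = fields[-2].translate(_PUNCT_TABLE).lower().split()
    return [((rating, w), 1) for w in words if w not in STOP_WORDS]


def top_words_local(lines, n=10):
    """Plain-Python version of the ranking, for small tests."""
    counters = {}
    for line in lines:
        for (rating, word), one in rating_word_pairs(line):
            counters.setdefault(rating, Counter())[word] += one
    return {r: [w for w, _ in c.most_common(n)] for r, c in counters.items()}


def top_words_spark(sc, path, n=10):
    """Same ranking on an RDD; `sc` is an existing SparkContext."""
    counts = (
        sc.textFile(path)
        .flatMap(rating_word_pairs)
        .reduceByKey(lambda a, b: a + b)
    )
    return (
        counts
        .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))  # (rating, (word, count))
        .groupByKey()
        .mapValues(lambda ws: [w for w, _ in sorted(ws, key=lambda wc: -wc[1])[:n]])
        .sortByKey()
        .collect()
    )
```

Counting with `reduceByKey` before grouping keeps the grouped data small, since each rating then holds one entry per distinct word rather than one per occurrence.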