Resilient Distributed Data Sets (RDDs)
DataFrames
SQL Tables/Views
Datasets

An RDD is characterized by the following properties:

- A list of partitions

Specification of custom partitions may provide significant performance improvements when using key-value RDDs
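As a minimal sketch of the point above, the following defines a custom partitioner for a key-value RDD and applies it with partitionBy. The class name ItemPartitioner, the object name, the sample pairs, and the partition count are assumptions made for illustration; they do not come from the original slides.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical custom partitioner: routes each key by its hash code.
class ItemPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h // keep the index non-negative
  }
  // Equal partitioners let Spark recognise co-partitioned RDDs.
  override def equals(other: Any): Boolean = other match {
    case p: ItemPartitioner => p.numPartitions == numPartitions
    case _                  => false
  }
  override def hashCode: Int = numPartitions
}

object PartitionerSketch {
  // Builds a small key-value RDD (assumed sample data) and repartitions it by key.
  def partitionSales(spark: SparkSession): RDD[(String, Int)] = {
    val pairs = spark.sparkContext.parallelize(
      Seq(("bolt", 45), ("bolt", 5), ("drill", 1), ("screw", 2)))
    pairs.partitionBy(new ItemPartitioner(3))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]").appName("partitioner-sketch").getOrCreate()
    val partitioned = partitionSales(spark)
    println(partitioned.getNumPartitions)
    // Records with the same key now live in the same partition, so a
    // following reduceByKey can reuse the partitioner and avoid another shuffle.
    partitioned.reduceByKey(_ + _).collect().foreach(println)
    spark.stop()
  }
}
```

Whether this pays off depends on how often the same keyed RDD is reused; for a single aggregation the default hash partitioning is usually sufficient.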

Resilient Distributed Data Sets (RDDs)

val rdset = spark.sparkContext.parallelize(strings);
rdset: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at parallelize at <console>:26

Listing RDD

Creating RDD

val lines = spark.sparkContext.textFile("sales.txt")
lines: org.apache.spark.rdd.RDD[String] = sales.txt MapPartitionsRDD[1] at textFile at <console>:24

Listing RDD

lines.collect()
res0: Array[String] = Array(bolt 45, bolt 5, drill 1, drill 1, screw 1, screw 2, screw 3)
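Lines of this shape are a natural fit for a map step that builds (item, quantity) pairs, followed by a per-key sum. The sketch below is one possible way to do that; it assumes each line has the form "item quantity" and uses an in-memory copy of the listed lines instead of sales.txt so that it runs on its own. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SalesTotals {
  // Sums the quantity per item for lines of the assumed form "<item> <quantity>".
  def totals(spark: SparkSession, lines: Seq[String]): Map[String, Int] = {
    val rdd = spark.sparkContext.parallelize(lines)
    val pairs = rdd.map { line =>
      val Array(item, qty) = line.trim.split("\\s+")
      (item, qty.toInt) // key-value pair: (item, quantity)
    }
    pairs.reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("sales-totals").getOrCreate()
    // Same content as the collect() listing above.
    val sample = Seq("bolt 45", "bolt 5", "drill 1", "drill 1",
                     "screw 1", "screw 2", "screw 3")
    totals(spark, sample).toSeq.sorted.foreach(println)
    // (bolt,50)
    // (drill,2)
    // (screw,6)
    spark.stop()
  }
}
```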

Introduction to Spark
Outline

Resilient Distributed Data Sets (RDDs)
DataFrames
SQL Tables/Views
Datasets

DataFrames

A DataFrame can be created in the following way

dataFrame.show()

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

root
 |-- age: long (nullable = true)
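The creation step itself is not visible in this excerpt. One way to obtain a DataFrame with the same rows and a nullable long age column is sketched below; building it from an in-memory Seq of tuples is an assumption made to keep the example self-contained, and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PeopleDataFrame {
  // Builds a DataFrame with a nullable age column and a name column.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF on local collections
    val rows: Seq[(Option[Long], String)] =
      Seq((None, "Michael"), (Some(30L), "Andy"), (Some(19L), "Justin"))
    rows.toDF("age", "name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("people-dataframe").getOrCreate()
    val dataFrame = build(spark)
    dataFrame.show()        // prints the rows as a table
    dataFrame.printSchema() // prints the column names and types
    spark.stop()
  }
}
```

Using Option[Long] for age is what makes the missing value appear as null while keeping the column type long.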

SQL Tables/Views

CREATE VIEW

CREATE TEMP VIEW just_usa_global AS
SELECT *
FROM flights
WHERE dest_country_name = 'United States'
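The statement above can also be issued from Scala through spark.sql. The following is a sketch under stated assumptions: the three flights rows are invented stand-ins for the real table, the flights table is registered as a temporary view, and OR REPLACE is added so that the snippet can be rerun in the same session.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TempViewSketch {
  // Registers a small stand-in flights table, creates the view, and returns its rows.
  def usaFlights(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical rows; the real flights data is not part of this excerpt.
    val flights = Seq(
      ("United States", "Romania", 15L),
      ("United States", "Ireland", 344L),
      ("Egypt", "United States", 15L)
    ).toDF("dest_country_name", "origin_country_name", "count")
    flights.createOrReplaceTempView("flights")

    spark.sql(
      """CREATE OR REPLACE TEMP VIEW just_usa_global AS
        |SELECT *
        |FROM flights
        |WHERE dest_country_name = 'United States'""".stripMargin)

    spark.table("just_usa_global") // a view is queried like a table
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("temp-view-sketch").getOrCreate()
    usaFlights(spark).show()
    spark.stop()
  }
}
```

A temporary view stores no data of its own; the filter is re-evaluated against flights each time the view is queried, and the view disappears when the session ends.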

Spark SQL includes three core complex types: structs, lists, and maps

CREATE VIEW

SELECT DEST_COUNTRY_NAME as new_name,
collect_list(count) as flight_counts,
collect_set(ORIGIN_COUNTRY_NAME) as origin_set
FROM flights
GROUP BY DEST_COUNTRY_NAME
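The same aggregation can be written with the DataFrame API, which makes the difference between the two collectors easy to see: collect_list keeps duplicates, while collect_set keeps each distinct value once. This is only a sketch; the rows are invented for illustration and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, collect_set}

object CollectSketch {
  // Groups flights by destination and gathers counts and origins into arrays.
  def aggregate(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical rows standing in for the flights table.
    val flights = Seq(
      ("United States", "Romania", 15L),
      ("United States", "Ireland", 344L),
      ("United States", "Romania", 1L)
    ).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

    flights
      .groupBy(col("DEST_COUNTRY_NAME").as("new_name"))
      .agg(
        collect_list(col("count")).as("flight_counts"),        // all values, duplicates kept
        collect_set(col("ORIGIN_COUNTRY_NAME")).as("origin_set") // distinct values only
      )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("collect-sketch").getOrCreate()
    aggregate(spark).show(truncate = false)
    spark.stop()
  }
}
```

With these rows, flight_counts holds three values and origin_set holds two, because Romania appears twice as an origin. The order of elements inside either array is not guaranteed.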

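- A managed table is a table for which Spark manages both the data and the metadata; it is equivalent to an internal table in Hive

- An unmanaged table is a table for which Spark manages only the metadata, while the data stays at an external location; it is equivalent to an external table in Hive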

DEST_COUNTRY_NAME STRING,

ORIGIN_COUNTRY_NAME STRING,

Datasets

A Dataset can be defined, created, and used in the following way

Using a Dataset

caseClassDS.select($"name").show()

Results

+-----+
| name|
+-----+
|James|
+-----+
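The definition and creation steps are not included in this excerpt. A minimal sketch that would lead to a result like the one above follows; the case class Person, its age field, and the age value are assumptions, and only the name caseClassDS and the value James are taken from the text.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; it must be defined outside the method that uses it
// so that Spark can derive an encoder for it.
case class Person(name: String, age: Long)

object DatasetSketch {
  // Creates a typed Dataset from a local collection of case class instances.
  def build(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._ // provides the encoder and toDS
    Seq(Person("James", 21L)).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("dataset-sketch").getOrCreate()
    import spark.implicits._ // provides the $"..." column syntax

    val caseClassDS = build(spark)
    caseClassDS.select($"name").show()

    // Typed access: fields are checked at compile time.
    val names: Array[String] = caseClassDS.map(_.name).collect()
    println(names.mkString(", "))
    spark.stop()
  }
}
```

Unlike a DataFrame, whose rows are untyped, a Dataset[Person] lets the compiler reject a misspelled field such as _.nmae before the job runs.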

Copyright © 2009-2023 UrgentHomework.com, All rights reserved.