Preliminary matters
Input/Output in Pig Latin
Operations
Built-in functions
Foreach
Filter
Group
Order by
Distinct
Joins
Limit and sample
Comments
A = load 'foo'; -- this is a single-line comment
/*
This is a multiline comment.
*/
Input in Pig Latin
load
By default, load looks for your data on HDFS in a tab-delimited file, using the default load function PigStorage.
Loading data
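A minimal sketch of loading data with and without an explicit load function. The file names 'NYSE_dividends' and 'NYSE_dividends.csv' and the typed schema are assumptions made for illustration; only load, using, as, and PigStorage are Pig Latin itself.

```pig
-- Default behaviour: read a tab-delimited file with PigStorage.
divs = load 'NYSE_dividends';

-- Same loader named explicitly, here with a comma delimiter and a declared schema.
-- The file name and the field types are assumed for this example.
divs_csv = load 'NYSE_dividends.csv' using PigStorage(',')
           as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
```

Declaring a schema with as is optional; without it, fields are referenced by position ($0, $1, ...) and treated as bytearray.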
Output in Pig Latin
store
By default, Pig stores your data on HDFS in a tab-delimited file using PigStorage.
Storing data
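A short sketch of storing a relation, first with the default PigStorage and then with an explicit delimiter. The output paths '/data/examples/processed' and 'processed_csv' are hypothetical.

```pig
prices = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close, volume, adj_close);

-- Default: tab-delimited output written with PigStorage (hypothetical path).
store prices into '/data/examples/processed';

-- Explicit storage function with a comma delimiter (hypothetical path).
store prices into 'processed_csv' using PigStorage(',');
```

The target of store is a directory of part files rather than a single file, and the job normally fails if that directory already exists.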
dump
Occasionally you will want to see your data on the screen rather than store it; this is done with the dump statement.
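A small sketch of dump, combined with limit so that only a handful of records reach the screen. The alias names are illustrative.

```pig
prices = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close, volume, adj_close);

-- Keep the output small before printing it.
first5 = limit prices 5;

-- Print the records of first5 to the screen instead of storing them.
dump first5;
```

dump is mainly useful for debugging and quick checks; for large results, store is the usual choice.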
Pig Latin
Relational operations:
They allow you to transform your data through operations such as sorting, grouping, joining, projecting, and filtering.
Mathematical functions
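As an illustration of the built-in mathematical functions, the sketch below applies ABS and ROUND inside a foreach. The typed schema is an assumption (the functions need numeric inputs), and the alias names are illustrative.

```pig
-- Field types are assumed for this example.
prices = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray,
                               open:float, high:float, low:float, close:float,
                               volume:int, adj_close:float);

-- ABS gives the size of the daily move; ROUND rounds the closing price.
moves = foreach prices generate symbol,
                                ABS(close - open) as abs_move,
                                ROUND(close) as rounded_close;
```

Built-in function names are case sensitive in Pig Latin, so ABS and abs are not interchangeable.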
Loading and transforming
A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray,
prices = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close, volume, adj_close);
gain = foreach prices generate close - open;
gain2 = foreach prices generate $6 - $3; -- same as gain, using zero-based positional references
end = foreach prices generate volume..; -- produces volume, adj_close
all_in_one = foreach prices generate *; -- produces a tuple of all fields
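A sketch of a foreach that projects a few fields and names a computed one with as. The typed schema and the alias names gains and gain are assumptions for illustration.

```pig
-- Field types are assumed for this example.
prices = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray,
                               open:float, high:float, low:float, close:float,
                               volume:int, adj_close:float);

-- Keep two fields and add a computed field named gain.
gains = foreach prices generate symbol, date, close - open as gain;
```

Naming the expression with as lets later statements refer to gain by name rather than by position.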
In SQL, the GROUP BY clause creates a group that must feed directly into one or more aggregate functions.
In Pig Latin, there is no direct connection between group and aggregate functions.
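The separation between grouping and aggregation can be sketched as two independent statements. The two-field schema and the alias names are assumptions for illustration.

```pig
daily = load 'NYSE_daily' as (exchange, stock);

-- group only collects records that share a key into a bag named after the input relation.
grpd = group daily by stock;

-- Aggregation is a separate, optional step applied to that bag.
cnt = foreach grpd generate group, COUNT(daily);
```

Because grpd is an ordinary relation, it could equally be stored or passed to other operators without any aggregate function being applied.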
Group
Group supports multiple keys
grpd: {group: (exchange: bytearray,stock: bytearray),daily: {exchange: bytearray,stock: bytearray,date: bytearray,dividends: bytearray}}
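A sketch of grouping on two keys, consistent with the grpd schema shown above. The input file name and the alias avg_div are assumptions; the field names follow the schema.

```pig
daily = load 'NYSE_daily' as (exchange, stock, date, dividends);

-- Multiple keys are given as a parenthesised list; the group field becomes a tuple.
grpd = group daily by (exchange, stock);

-- Print the schema of the grouped relation.
describe grpd;

-- Average dividend per (exchange, stock) pair.
avg_div = foreach grpd generate group, AVG(daily.dividends);
```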
Loading and sorting
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray,
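A sorting sketch using order by. Since the load statement is cut off in the source, the schema below is an assumption: the field names are taken from the earlier untyped NYSE_daily example, and the types after date are guesses made for illustration.

```pig
-- Schema assumed for this example; types after date are not from the source.
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray,
                              open:float, high:float, low:float, close:float,
                              volume:int, adj_close:float);

-- Sort on one key, ascending by default.
bydate = order daily by date;

-- Sort on two keys.
bydatensymbol = order daily by date, symbol;

-- desc applies only to the key it follows: close descending, then open ascending.
byclose = order daily by close desc, open;
```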
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray);
uniq = distinct daily;
In a join, the first dataset specified is called the left dataset, and the second is called the right dataset.
Inner join
Right outer join
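A sketch of an inner join and a right outer join on a shared key. The 'NYSE_dividends' file and its schema are assumptions for illustration; daily is the left dataset and divs the right one.

```pig
daily = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close, volume, adj_close);
divs  = load 'NYSE_dividends' as (exchange, symbol, date, dividends);

-- Inner join: keeps only records whose symbol appears in both datasets.
jnd = join daily by symbol, divs by symbol;

-- Right outer join: keeps every record of divs (the right dataset),
-- with nulls for the daily fields when there is no match.
jnd_right = join daily by symbol right outer, divs by symbol;
```

Fields that exist on both sides can be disambiguated after the join with the :: operator, for example daily::symbol.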
The limit statement restricts the output to a given number of the first records.
With a limit of 10, the output contains at most 10 lines, i.e., the first 10 records.
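A sketch of limit, on its own and after a sort. The file name, the schema, and the alias names are assumptions for illustration.

```pig
divs = load 'NYSE_dividends' as (exchange, symbol, date, dividends);

-- At most 10 records are returned.
first10 = limit divs 10;

-- Sorting first makes it well defined which 10 records are returned.
sorted   = order divs by date;
oldest10 = limit sorted 10;
```

Without a preceding order, which records come back may vary from run to run, so the ordered form is the safer choice when the exact records matter.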
The sample statement restricts the output to a given percentage of the total number of records.
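A sketch of sample, keeping roughly 10% of the records. The file name and the alias some are assumptions for illustration.

```pig
divs = load 'NYSE_dividends';

-- The argument is a fraction between 0 and 1; 0.1 keeps about 10% of the records.
some = sample divs 0.1;
```

The selection is random, so the number of records returned is approximate and can differ between runs.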