• +1-617-874-1011 (US)
  • +44-117-230-1145 (UK)
Live Chat
Follow Us:

DS 715 - Assignment 4 Performing OLAP with SQL

Objectives:

  1. Demonstrate use of SQL to perform OLAP operations;

Estimated time to complete: 40 min reading +45-50 min. solve queries Points: 25/100.

Tasks:

In this assignment we are going to use the SQL ROLLUP and CUBE operators to generate and store a three dimensional cube with hierarchies based on the tables in the database given by the description below:

Tb_Supplier(Supp_ID, Name, City, State)

Tb_Consumer(Con_ID, Name, City, State)

Tb_Product(Prod_ID, Name, Product_Category, Product_Line, Product_Packaging)

Tb_Offers(Supp_ID, Prod_ID, Quantity, Price)

Tb_Requests(Con_ID, Prod_ID, Quantity, Price)

Tb_Transactions(Tran_ID, Supp_ID, Con_ID, Prod_ID, Quantity, Price) and represented by the diagram:

DS 715 - Assignment 4 Performing OLAP with SQL

The stored cube will then be used to efficiently solve SQL OLAP queries.

The ROLLUP Operator

ROLLUP and CUBE are SQL extension operators one would place in the GROUP BY clause of an SQL SELECT statement in order to compute and add aggregates to the result set returned by the SELECT statement. The ROLLUP operator is applied to a set of columns that represent a concept hierarchy and performs a rollup on one or more levels of the hierarchy.

For example run the following query against your database:

SELECT State, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier

GROUP BY ROLLUP(State)

and compare the result with what is returned by the following simple GROUP BY query:

SELECT State, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier GROUP BY State

Carefully comparing the results of the two queries reveals that the ROLLUP query adds an extra row to the result, and that row represents the rollup of the numbers of suppliers per state over all states in the database. This is an example of rolling up by climbing a hierarchy.

Note that the extra row has a NULL value on the column representing the state, i.e. the rolled-up column.

As another, very similar example, compare:

SELECT City, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier

GROUP BY ROLLUP(City)

with

SELECT City, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier

GROUP BY City

For an example that shows the effect of climbing two levels in a hierarchy compare the result of this query:

SELECT State, City, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier

GROUP BY ROLLUP(State, City)

with

SELECT State, City, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier

GROUP BY State, City

One can notice that the ROLLUP adds two types of rows to the result of the simple GROUP BY query:

  • Rows with City=NULL and State a state name where the last column represents the number of suppliers in each state – rolling-up cities by state;
  • One row with City=NULL and State=NULL where the last column represents the total number of suppliers – rolling-up states.

Note that in the ROLLUP parameters list the columns are listed in the descending order of the concept hierarchy (State > City). Actually, running the query below will return a meaningless result:

SELECT State, City, COUNT(Supp_ID) "#Suppliers" FROM Tb_Supplier

GROUP BY ROLLUP(City, State)

The CUBE Operator

The CUBE operator is applied on a number of attributes called the cube dimensions and calculates aggregates on all possible subsets of its dimensions. This is an example of rollups by dimension reductions. For example a cube on the supplier, consumer and product dimensions is the equivalent of the union of the results from 8 different GROUP BY queries.

Run the following query and carefully observe the result it returns:

SELECT DISTINCT S.Name "Supplier Name",

C.Name "Consumer Name",

P.Name "Product Name",

SUM(Quantity) "Total Transactions Quantity",

COUNT(Tran_ID) "Number of Transactions"

FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T

WHERE S.Supp_ID=T.Supp_ID AND

C.Con_ID=T.Con_ID AND

P.Prod_ID=T.Prod_ID

GROUP BY CUBE(S.Name, C.Name, P.Name)

On your sample database this cube returns a total of 396 rows.

There are 8 kinds of rows in this result, and each kind represents a cuboid:

  • One row where "Supplier Name"=NULL, "Consumer Name"=NULL, "Product Name"=NULL, and the last two columns contains the grand totals of their corresponding aggregates – this is a rollup by reducing dimensions: supplier, consumer and product and represents the apex node in the lattice of cuboids;
  • Rows where "Supplier Name"=NULL, "Consumer Name"=NULL, while "Product Name" has the name of a product and the last two columns represent aggregates over each product – these rows represent rollup by reducing dimensions: supplier and consumer; - . . .
  • Rows where "Supplier Name", "Consumer Name", and "Product Name" have all non-null values – this is similar to a GROUP BY query on the 3 dimension columns, and represent the base node in the lattice of cuboids, i.e. zero dimensions are reduced.

If we want a group of columns to be handled as one single column by the CUBE operator we would place those columns within another set of parenthesis.

In the query example below the group of S.State, S.City, S.Name columns, and group of C.State, C.City, C.Name columns are handled as one single column each.

The query produces a result with 433 rows of the same 8 types as the first CUBE query.

Run the following query and carefully observe the result it returns:

SELECT DISTINCT S.Name "Supplier Name",

S.City "Supplier City",

S.State "Supplier State",

 

C.Name "Consumer Name",

C.City "Consumer City",

C.State "Consumer State",

P.Name "Product Name",

SUM(Quantity) "Total Transactions Quantity",

COUNT(Tran_ID) "Number of Transactions"

FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T

WHERE S.Supp_ID=T.Supp_ID AND

C.Con_ID=T.Con_ID AND

P.Prod_ID=T.Prod_ID

GROUP BY CUBE((S.State, S.City, S.Name),

 (C.State, C.City, C.Name),

P.Name)

Notice that in all rows where "Supplier Name” or "Supplier City" or "Supplier State" is NULL the other two will also be NULL.

Same is true for "Consumer Name", "Consumer City" and "Consumer State".

After all this is still a 3 dimensional cube.

Indeed this cube contains rollups on supplier state and consumer state, but not on supplier or consumer name or city.

Cubes with Hierarchies

The columns "Supplier Name", "Supplier City" and "Supplier State", and "Consumer Name" with "Consumer City" and "Consumer State", represent two concept hierarchies in the supplier and consumer dimensions. Adding the missing rollups to the cube above is straightforward and can be done by adding ROLLUP operator for the two concept hierarchies to the cube calculation as below (notice the parameter attributes are listed in the descending order of the concept hierarchy – higher level concepts first):

SELECT DISTINCT S.Name "Supplier Name",

S.City "Supplier City",

S.State "Supplier State",

 

C.Name "Consumer Name",

C.City "Consumer City",

C.State "Consumer State",

P.Name "Product Name",

SUM(Quantity) "Total Transactions Quantity",

COUNT(Tran_ID) "Number of Transactions"

FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T

WHERE S.Supp_ID=T.Supp_ID AND

C.Con_ID=T.Con_ID AND

P.Prod_ID=T.Prod_ID

GROUP BY CUBE((S.State, S.City, S.Name),

 (C.State, C.City, C.Name),

P.Name),

ROLLUP(S.State, S.City, S.Name),

ROLLUP(C.State, C.City, C.Name)

The result is a 3 dimensional cube with two 3-level hierarchies producing on your database a total of 1414 rows which can be placed in 32 cuboids (4*4*2=32).

Notice we will have rows where:

"Supplier Name" "Supplier City" "Supplier State"

NULL - - --rollup supplier name

NULL NULL - --rollup supplier name

 and city

and

"Consumer Name" "Consumer City" "Consumer State" 

NULL - - --rollup consumer name NULL NULL - --rollup consumer name

 and city

and all their combinations, but never any of the patterns below:

"Supp/Cons Name" "Supp/Cons City" "Supp/Cons State"

NULL - NULL - NULL NULL - - NULL

Notice that calculating a cube on all the attributes, as in:

GROUP BY cube(S.State, S.City, S.Name, C.State, C.City, C.Name, P.Name)

would generate a result with 9108 rows on our sample database.

This 7 dimensional cube would contain many meaningless rows and would waste a lot of space while slowing down queries on the cube.

Cube Queries

Most OLAP queries are retrieving the rows of one single cuboid in the multi-dimensional cube or a sub-set of rows from one cuboid if slice and dice operations are involved.

Given a pre-calculated cube the rows of each cuboid are easily identified by a specific combination of NULL values within their columns.

Therefore, one can easily select the rows of a cuboid or any subset of it by a simple SELECT-FROM –WHERE statement, which is much more efficient than a multi-join GROUP BY query needed to extract the same rows directly from the data warehouse tables.

To calculate and store the sample cube with hierarchies from the previous section simply use the statement:

SELECT DISTINCT S.Name "Supplier Name",

S.City "Supplier City",

S.State "Supplier State",

C.Name "Consumer Name",

C.City "Consumer City", C.State "Consumer State",

P.Name "Product Name",

SUM(Quantity) "Total Transactions Quantity",

COUNT(Tran_ID) "Number of Transactions"

INTO Tb_Transactions_Cube

FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T

WHERE S.Supp_ID=T.Supp_ID AND

C.Con_ID=T.Con_ID AND

P.Prod_ID=T.Prod_ID

GROUP BY CUBE((S.State, S.City, S.Name),

 (C.State, C.City, C.Name),

P.Name),

ROLLUP(S.State, S.City, S.Name),

ROLLUP(C.State, C.City, C.Name)

which creates the Tb_Transactions_Cube table and populates with the rows of the cube.

It is now easy to write queries against this cube.

The grand total aggregates, i.e. the apex cuboid, is in the one row where all columns except the aggregate ones are NULL:

--apex cuboid

SELECT *

FROM Tb_Transactions_Cube

WHERE "Supplier Name" IS NULL AND "Supplier City" IS NULL

AND "Supplier State" IS NULL

AND "Consumer Name" IS NULL

AND "Consumer City" IS NULL AND "Consumer State" IS NULL

AND "Product Name" IS NULL

while the base cuboid, which details aggregates by all combinations of consumers, suppliers and products, is returned by:

--base

SELECT *

FROM Tb_Transactions_Cube

WHERE "Supplier Name" IS NOT NULL AND "Supplier City" IS NOT NULL

AND "Supplier State" IS NOT NULL

AND "Consumer Name" IS NOT NULL AND "Consumer City" IS NOT NULL

AND "Consumer State" IS NOT NULL AND "Product Name" IS NOT NULL The aggregates by products are given by the rows where product name is not NULL:

--aggregates by products - rollup supplier and consumer dimensions, 

--hierarchies included

SELECT *

FROM Tb_Transactions_Cube

WHERE "Supplier Name" IS NULL

AND "Supplier City" IS NULL

AND "Supplier State" IS NULL

AND "Consumer Name" IS NULL AND "Consumer City" IS NULL

AND "Consumer State" IS NULL

AND "Product Name" IS NOT NULL

The following query will return nothing, because the configuration below violates the hierarchies defined in the Tb_Transactions_Cube (e.g. state cannot be NULL where name is not NULL):

--impossible or non-sense

SELECT *

FROM Tb_Transactions_Cube

WHERE "Supplier Name" IS NOT NULL

AND "Supplier City" IS NOT NULL

AND "Supplier State" IS NULL

AND "Consumer Name" IS NOT NULL AND "Consumer City" IS NOT NULL

AND "Consumer State" IS NULL

AND "Product Name" IS NOT NULL

Other sample queries:

--aggregates by products sold to consumers in Wisconsin?

SELECT "Product Name", [Total Transactions Quantity], 

[Number of Transactions]

FROM Tb_Transactions_Cube

WHERE "Supplier Name" IS NULL

AND "Supplier City" IS NULL

AND "Supplier State" IS NULL

AND "Consumer Name" IS NULL AND "Consumer City" IS NULL

AND "Consumer State" = 'Wisconsin' AND "Product Name" IS NOT NULL

--Quantity of milk sold by each supplier from Wisconsin? --careful [Supplier City] has to be NOT NULL

SELECT [Supplier Name], [Total Transactions Quantity]

FROM Tb_Transactions_Cube WHERE "Supplier Name" IS NOT NULL AND "Supplier City" IS NOT NULL AND "Supplier State"='Wisconsin' AND "Consumer Name" IS NULL AND "Consumer City" IS NULL

AND "Consumer State" IS NULL

AND "Product Name" = 'Milk'

--Quantity of milk sold by each supplier from Wisconsin in the state of --Ilinois?

--careful [Supplier City] has to be NOT NULL

SELECT [Supplier Name], [Total Transactions Quantity] FROM Tb_Transactions_Cube

WHERE "Supplier Name" IS NOT NULL AND "Supplier City" IS NOT NULL

AND "Supplier State" = 'Wisconsin' AND "Consumer Name" IS NULL AND "Consumer City" IS NULL

AND "Consumer State" = 'Illinois'

AND "Product Name" = 'Milk'

Given the Tb_Transactions_Cube stored above solve the following queries:

  • Aggregates by combinations of supplier name and product name?
  • Aggregates by supplier states?
  • Number of transactions between supplier-city-consumer-city pairs?
  • Name of each product sold in Wisconsin and quantity of sales for the product?
  • Quantity of sales aggregated by product and supplier state?
  • Quantity of computer sales aggregated by suppliers in Wisconsin?
  • Quantity of auto sales by each supplier from Wisconsin to consumers in Illinois?
  • Quantity of auto sales by each supplier in Madison to consumers in Illinois?
  • Quantity of each product sold by supplier Bernstein to consumers in Chicago?
  • Quantity of milk sold by supplier Bernstein to consumers in Chicago?
  • (Extra Credit) For each product list quantity sold by suppliers in

Madison to consumers in Chicago versus quantity sold by suppliers in Chicago to consumers in Madison (result columns will be: product name, quantity Madison_Chicago, quantity Chicago_Madison?

Return:

Please post your solutions to the queries above in properly formatted SQL syntax (each SQL clause on its own line and with capitalized keyword and operators) in the D2L dropbox created for this assignment.