ITECH1103- Big Data and Analytics Lab 3 – Working with Data Items Federation University Australia
Objectives
* Creating, testing, and comparing models can be accomplished with SAS Visual Statistics andSAS Visual Data Mining and Machine Learning.
SAS Visual Data Studio uses a CAS table as input and creates a CAS table as output.SAS Visual Analytics uses a CAS table as input and creates a report that can be viewed in the Report Viewer or Mobile BI app. Any changes to data made in Visual Analytics apply to the report only and do not affect the CAS table.Beginning with Visual Analytics 8.3, report data views can be created to save and apply settings for a data source. A data view acts as a template for any settings that are modified, including data property changes, data source filters, hierarchies, geography data items, calculated items, and more. A data view does not update the CAS table. If the view is updated, your reports are not automatically updated with the new settings.Data views are saved separately from your reports. If you create a data view in one report, you can apply it to other reports that use the same data source.Data views can be shared so other users can also apply them to the data source.A data source can have a default view as set by an administrator. You can also set the default view for yourself. A default data view is automatically applied anytime you add the data source to a report.For more information about data views, see “Working with Data Views in Reports” in the SAS® Visual Analytics 8.3: Working with Report Data documentation.
Data Item Properties
In the Data pane, an icon next to each data item indicates the type of data item. The following types of data items are available:
Category: A data item whose distinct values are used to group and aggregate measures.
Date and Time: A category data item whose distinct values are used to group and aggregate measures. There are three types of date categories: date, datetime, and time.
Custom Category: A data item that can be created based on either a category or numeric data item. A custom category is always a category data item with alphanumeric values.
Calculated (category): A data item that is calculated from existing data items using an expression and returns an alphanumeric value.
Calculated (datetime): A data item that is calculated from existing data items using an expression and returns a datetime value. Calculated dates and times are treated as categories with distinct values being governed by the chosen date or time format.
Geography: A category data item whose values are mapped to geographical locations or regions. These data items can be used to show data on a geographic map.
Hierarchy: A data item with a predefined arrangement of category data items, typically whose values are arranged with more general information at the top and more specific information at the bottom. The first level of the hierarchy is known as the root level.
Geographic Hierarchy: A hierarchy whose members are all geographic data items.
Interaction Effect : A user-created data role that can be used when there is a non-additiverelationship between two variables (the effect of one variable on a model changes as another variable changes). SAS Visual Statistics must be licensed for you to create and use an interaction effect.
Measure: A data item whose values can be used in computations. These values are numeric. By default, almost all measures have a default aggregation of Sum, but the aggregation can be modified.
Calculated (measure): A data item that is calculated from existing data items using an expression and returns a numeric value. Numeric data items are treated as measures (with an aggregation type of Sum), or they can be changed to category data items.
Frequency: A measure data item whose value represents the number of observations in the selected data source. This data item is automatically added to the Data pane under the Measure group. You cannot change the classification for this data item. This data item is automatically assigned to some report objects when no measure is assigned.
Frequency Percent: A measure data item whose value represents the percentage of observations in the selected data source. This data item is automatically added to the Data pane under the Aggregated Measure group. You cannot change the classification for this data item.
Aggregated Measure or Time Period Calculation: A data item that represents special predefined operations, like distinct count, percentage of totals, percentage of subtotals, or frequency percent. Users can also create their own aggregated measure calculations. Aggregated measures cannot be used in all report objects, filters, controls, spark lines, or time series graphs. Some aggregated measures cannot be used in a detail rank. Percentage of subtotal items can be used only in a crosstab.
Working with Data Items
This demonstration illustrates how to modify data item properties (name, format, aggregation) in Visual Analytics.
1.From the browser window, select SAS Drive from the bookmarks bar.
2. If necessary, clickand select New Report in the top left corner of SAS Drive.
3.Click Data.
4.In the Open Data Source window, select Data Sources.
5. Double-click cas-shared-default.
6. Double-click Public PATH.
7. SelectCUSTOMERS_CLEAN.Note: If CUSTOMERS_CLEAN does not exist, then right-click CUSTOMERS_CLEAN.sashdatand select Load.
8.ClickOK.The Data pane is displayed, and it contains a list of data items from the CUSTOMERS_CLEANtable.
9. If necessary, click the Data icon in the left pane.
10.Verify that Customer ID and Order ID appear in the Category group, because the data type waschanged to character in SAS Data Studio.
Note: Character and datetime data items appear as categories in Visual Analytics.
11.Verify that the new columns created in SAS Data Studio (Customer_FirstName,Customer_LastName, and Title) appear in the Category group.
12.Verify that the new columns created in SAS Data Studio (Days to Delivery and Profit) appear inthe Measure group. Note: Numeric (double) data items appear as measures in Visual Analytics
Note: Cost and Retail Price were renamed in SAS Data Studio to Unit Cost and Total Revenue, respectively. Those new names are not reflected because Visual Analytics displays labels, not data source names.
13.Modify properties for a data item, Date Order was Delivered.
14. Modify properties for a data item,Profit.
15.Modify properties for a data item, Discount in percent of Normal Total Retail Price.
16.Modify the aggregation for a data item, Days to Delivery.
17.Rename data items.
18.Create a data view.
19.Save the report.
Business Scenario: Employees
Exercise
1.Working with Data Items
Old name |
New name |
Employee ID |
ID |
Employee Name |
Name |
Manager at 1. level |
Manager ID |
Frequency |
Number of Employees |
Exploring Data with Charts and Graphs
Objectives
Discuss when to use descriptive graphs (histogram, box plot, bar chart) in Visual Analytics.
Maximize graphs objects to view details. • Modify roles and options for graph objects.
Co p yrigh t © SAS In sti tu te In c. All righ ts reserved .
Objects: Graphs (Descriptive)
Histogram Box plot |
The histogram contains a series of bars that represent the number of observations (or percentage of all observations) for a measure that fit in a specified value range (or bin). The shape of the distribution can be affected by the number of bins specified for the histogram. Note: If you use the default number of bins, then the minimum and maximum values on the histogram might not match your actual data values. However, if you specify the number of histogram bins, then the minimum and maximum values on the histogram match your actual data values exactly. |
The size and location of the box indicate the range of values between the 25th and 75th percentile (or the interquartile range). The diamond marker inside the box indicates the mean value, and the line inside the box indicates the median value. You can modify options to display outliers in the plot. Outliers are data points whose distance from the interquartile range are more than 1.5 times the size of the interquartile range. The whiskers (lines protruding from the box) can indicate either minimum and maximum values of the plot or the range of values outside of the interquartile range but close enough not to be considered outliers. If there are a large number of outliers, the range of outlier values is represented by a bar colored to represent the number of values inside the outlier range (as seen above). |
Bar chart |
A bar chart displays data aggregated by the distinct values of a category. By default, the bars are sorted by descending order of the value of the first measure. For ranked bars, the data is sorted based on the values of the rank. Stacked bar charts enable you to compare totals for each category, as well as totals for all categories. However, comparing segments is difficult, and when there are many segments in the chart, it is difficult to read. To see relative differences (parts of a whole) in a bar chart, select Normalize groups to 100% for the Group scale option. Note: Nominal values are categories whose data has no particular order. |
Exploring Data: Part 1
This demonstration illustrates how to use the automatic chart to explore data and modify roles and options for charts and graphs in Visual Analytics.
The Welcome to SAS Visual Analytics window appears.
The automatic chart functionality determines the best way to display the selected data.
2) Enter Distribution of Profit in the Name field.
Note: The Automatic title setting was turned off for Graph objects in an earlier demo. You can turn it on for this graph by selecting Automatic title, or you can create a custom title by selecting Custom title.
A majority of the products ordered are low profit items, in the $0 to $25 range. Also notice that a little more than 20% of items result in a loss. Why is this problem occurring? Are these products ordered from a similar product area, geographical area, or order type? Could the costs be too high in these areas? What can we do to reduce costs?
Profti. The Roles pane should resemble the following:
Note: The Measures role is required for the crosstab object. The crosstab should resemble the following:
Profits are much lower in the internet and catalog channels. A company-wide policy mandates that we need to try to improve profits for orders through these channels.
By default, totals are added to rows and columns.
The updated crosstab should resemble the following:
The updated crosstab should resemble the following:
Profits are much lower in North America than in Europe. Because our corporate office is located in North America, we would expect higher profits. Also notice the loss in Africa for internet sales. Why is this loss occurring? Is this due to start-up operations (for example, building distribution facilities in Africa)? Are the losses consistent over time or has this changed over time?
The bar chart should resemble the following:
The bar chart has many more roles available.
The Options pane should resemble the following:
The updated bar chart should resemble the following:
Profits in North America are less than half of total profits in Europe. We need to understand why this discrepancy exists and try to improve profits in non-European countries.
The Outline pane displays a list of all pages and objects in the report.
Exercise
Annual Salary
Frequency Percent
Name |
Distribution of Salary |
Bin range |
Measure values |
Set a fixed bin count |
<selected> |
Bin count |
4 |
The automatic chart should resemble the following:
e. Maximize the histogram to answer the following question: Into which range do a majority of salaries fall?
Hint: After answering the question, click (Restore) in the upper right corner.
f. Add a bar chart on the right of the automatic chart by assigning the following data items to the specified roles:
Category |
Job Title |
Measure |
Annual Salary |
Group |
Department |
g. Specify Total Salary by Job and Department as the name of the bar chart. The bar chart should resemble the following:
h. Answer the following questions: In which department are a majority of our salary costs spent? For which job title?
Answer:
Why do you think salary costs are so much higher for this group?
Answer:
The final report should resemble the following:
i. Save the report.
j. Sign out of Visual Analytics.
Exploring Data: Part 2
This demonstration illustrates how to use box plots to explore data in Visual Analytics.
The Welcome to SAS Visual Analytics window appears.
c. In the right pane, click the Roles icon.
d. For the Category role, select Add ¢ Order Type.
e. For the Measures role, select Add ¢ Profit and click OK.
The Roles pane should resemble the following:
The box plot should resemble the following:
f. In the right pane, click the Options icon.
g. If necessary, expand the Object section.
h. Enter Profit by Order Type in the Name field.
i. In the Box Plot group, select Ignore Outliers for the Outliers field.
j. Select the check box for Averages.
The box plot should resemble the following:
k.In the upper right corner of the box plot, click(Maximize) to view additional details. The table of data values displays descriptive statistics for Profit for each order type.
Even though total profits are highest for the retail sales channel, averages across all channels are very similar, but are a bit higher for catalog and internet sales. This reinforces our company-wide policy to try to increase profits in these channels. Total profits might be higher in retail because there are more customers or more orders for that channel.
l. In the upper right corner, click (Restore).
m. In the upper right corner of the Profit by Continent box plot, click (Maximize) to view additional details. The table of data values displays descriptive statistics for Profit for each continent.
Even though total profits are highest for Europe, averages are higher in North America and Asia. Because our corporate office is located in North America, we will start by focusing on increasing profits in North America. Total profits might be higher in Europe because there are more customers or more orders for that continent. Also, note the negative average profits in Africa. Why is this occurring? What can we do to increase profits for that continent?
n. In the upper right corner, click (Restore).
9. In the upper right corner, click (Menu) and select Save.
10. Select Email¢ Sign Out in the upper right corner to sign out.
3. Exploring Data: Part 2
a. Open the browser and sign in to Visual Analytics using Eric’s credentials.
b. Open the VA1- Exercise3.2b report from the Shared Data/Basics/Exercises (HR)folder.
c. On Page 2, create a box plot by assigning the following data items to the specified roles:
Category Measures |
Job Title |
Annual Salary |
d. Modify the following options for the box plot:
Name |
Salary Analysis by Job Title |
Outliers |
Show Outliers |
Show averages |
<selected> |
The box plot should resemble the following:
e. Maximize the box plot to answer the following questions:
Which job title has the highest average salary? The lowest?
Answer:
Orion Star has had a great sales year and would like to promote some employees. With which job title would you recommend starting the promotion analysis? Why?
Answer:
Hint: After answering the question, click (Restore) in the upper right corner. f. Save the report.
g. Sign out of Visual Analytics.
Follow Us