
Sample Assignment On IT18 Decision Support Systems and Data Warehousing

Designing

We often hear about data warehouses in the news, blogs, social media and online magazines. But what does the term mean? The concept has taken on a life of its own. Big Data is the new force in storage technology because it frees data storage from rigid structures. Big Data and the data warehouse are integral parts of a modern data management strategy. A data warehouse is the central location where data is stored and managed for all of an organization's applications. The volume of data that companies have to store is so large that they will eventually run out of space. A data warehouse is centralized and dynamic, and it involves the process of collecting and saving data, but it is not one massive database, which means that the company does not have to invest in expensive equipment and technologies. We have to move away from one-size-fits-all thinking and think outside the box: every company is different, so its needs are different as well.

One of the benefits of a modern data warehouse is that it is not just another database; instead, it is a component that you open and close as necessary to save and retrieve data. This is attractive because you can mix brands and components to suit your needs. A traditional data warehouse integrated data into a single source by keeping it in one place across multiple storage volumes, whereas modern systems are equipped with interfaces to the DW systems. There are specialized packages that have their own back-end databases in storage for each specific use and purpose.

The goal of a data warehouse is to give users quick and reliable access to a view of the organization's data and to support forecasting for better-informed decision making, down to the lowest level. The information needs to be reachable in a consistent manner, and it needs to be a single version that is consistent with what has been saved. The data is read-only, but the business rules will change over time, so the data warehouse needs to be resilient to those changes.

What type of data resides in a data warehouse?

A data warehouse is a collection of data that's:

  • Separate from operational systems
  • Accessible and available for queries
  • Subject-oriented by business
  • Integrated and consistently named and defined
  • Associated with defined periods of time
  • Static (non-volatile), meaning that updates aren't made

(Walters, 2016)

A successful data warehouse requires the right software. The first thing to consider is always the budget, so with that in mind we narrow the research until we find the right product for the business need. To make a selection we have to consider:

  • Scalability – Can it grow and change as the company changes?
  • Parallel Processing Support – systems with more than one CPU, and cloud storage.
  • Hardware combination – a mix of platform-dependent and platform-independent hardware, to avoid crashes and to allow bug fixes.
  • Fast Track Data Warehouse Sizing tool – it helps determine what you need.
  • Number of Users
  • Types of Queries
  • Amount of Data
  • Disk Subsystem – 25,600 megabytes per second.

In terms of software, we will consider the following:

  • Ab Initio Software – helps companies with data analysis and parallel processing. I consider this software because it specializes in high-volume data processing and enterprise data application integration.
  • Amazon Redshift – a hosted data warehouse product on a cloud platform. I consider this software because of its cloud data storage.
  • Analytix DS – a vendor specialized in mapping and tools for data integration. I consider this platform because many government entities use this type of software, and as a government organization we are mandated to look at at least one product that is already successful in the DOD.
  • IBM – combines data, analytics and AI technology with expertise to adapt to the company's architecture. It offers cloud and hybrid deployment and helps re-evaluate and find the value of the data. I consider IBM because the proposed change is a challenge and the Department of Defense has relied on IBM for decades.
  • SQL Server – a combination of integrated tools to help with production systems. It includes OLTP and an in-memory columnstore, which uses column compression to reduce the storage footprint and improve query performance. I considered SQL Server because it is a relational database that runs on Windows and Linux.
  • SAP – Systems, Applications and Products. This system allows the business to track customers and business interactions. It runs ERP and data management. I considered SAP because it combines many software products: it is cloud based, runs ERP, and manages inventories, orders and customer service.

For the data warehouse to succeed, we need to synchronize its elements at each step of the project; automation helps business and technology integrate effectively. The data warehouse has to offer value for money: it needs to reduce development time and increase the potential value delivered. We establish success by defining a timeframe and an environment that target it. The product should help implement best practices by standardizing the approach to the development process; data lineage and automation are a must. Maintaining and modifying the data warehouse in a seamless way keeps development efficient well beyond the initial period and avoids wasted time. Finally, manual effort should be eliminated from the design, build and administration processes.

For this project I will develop a data warehouse for the Global Combat Support System-Army (GCSSA). I will use SAP because it is state-of-the-art software that provides oversight of logistics and finance systems as an automated combat enabler for Soldiers, embedded in the DOD financial system. This will help provide highly accurate cost management and material support handling. I will use the web-based system so that it is accessible from virtually any computer by any personnel who possess a Common Access Card (CAC).

Planning

Sharing information has been linked with databases for as long as there has been systems development. Today, information sharing needs to be immediate, efficient and secure, but retrieving data effectively from all the databases within the enterprise requires a combined and coordinated effort between the systems. There is a need for one location for storing and sharing data instead of trying to link the multiple databases that exist today, and this is where the data warehouse comes into play.

For business analysts, data warehousing is a dream come true: a place where all the information and activities are joined together and you only need a set of analytical tools. But to accomplish this dream you need to plan a successful data warehouse system. The purpose of the data warehouse system is to provide accurate and timely information to determine the best path to take in the decision-making process. There are seven basic steps to plan, design and set up a successful data warehouse.

  1. Determine Business Objectives – What do we need to accomplish? Make sure that the key decision makers can access the information and can see how external forces will impact internal ones. Quantitative measurements of specific business activities help guide the organization. Key performance indicators (KPIs) measure those activities and are recorded in the fact tables.
  2. Collect and Analyze Information – Asking questions helps leaders acquire different views of the data to support the decision-making process. Data sources can be simple, yet they include systems such as customer relationship management (CRM), time reporting and performance evaluations. This step helps capture information that is sometimes overlooked by simple software, or that lives in spreadsheets and memos, such as telephone calls, shipping and deliveries; all of this information has a purpose.
  3. Identify Core Business Processes – Now the processes of the business need to be correlated. The KPIs and the CRM data put us on the right path. Because the data structures in a data warehouse are interrelated, each structure holds the KPIs for a specific business process and correlates those indicators with the factors that generated them. To design a successful structure for a business process, we need to identify the entities that work together and relate each KPI to the entity that generated it. The KPIs are then gathered as facts into fact tables and related to dimension tables, which hold the descriptive attributes; facts and their related dimensions can then be combined into virtual cubes.
  4. Construct a Conceptual Data Model – Once the business processes are identified, a conceptual data model is created and customized for the business. The model identifies the fact tables and the dimensions related to those facts; the KPIs they hold are combined to form OLAP cubes with a consistent unit of measure. The rows in the fact tables are generated by the interactions of the selected entities, so the dimensions and activities need to be correlated. When creating a data warehouse there are often older systems with incomplete data, and that must be corrected before the data warehouse can be used, because the fact table's primary key is a composite key made up of a foreign key from each dimension table.
  5. Locate Data Sources and Plan Data Transformation – This is the "how to get it" part. Having determined the critical information and the data warehouse structure, the next step is to decide how to move the data into a consolidated and consistent structure and how to correlate the CRM and other databases. The design reconciles the data in each separate database so that the information can be correlated and copied into the data warehouse tables. The data is then scrubbed, making sure that fields are not left blank, because the missing information might be crucial for accurate data analysis. The data must be transformed from one source format to another, and the timeframe is also a concern, because transformation can require complex programs with sophisticated algorithms to represent the values. We will use an extract, transform and load (ETL) process; loading the data into schemas will make the data warehouse easier to work with. At this point the information exists in silos and is not in a form that allows the data to be merged. The system we develop will acquire the data and present it to users in their desired format.
  6. Set Tracking Duration – Storing data takes a lot of space, so it is imperative to determine what storage to use and how to keep up with the data over time, because it needs to remain available indefinitely. The data is stored at levels of granularity, and the level must be consistent within each structure. For example, data can stay at the day grain for 2 years and then move to another structure to save space; after 3 to 5 years it moves to the next structure, the week grain, and later to the month grain. This structural planning helps determine the success of the data warehouse.
  7. Implement the Plan – Once the plan is developed, it lays out the steps that help forecast the work, schedule and budget of the project. This is a large, slow project in which delivery dates keep the timeline on track. Phased delivery is a good approach, because each phase of planning and implementation fits into the existing structures and adds capabilities to them, which adds value to the system (Walls & Scott, 1999).
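Steps 3 to 6 can be made concrete with a small sketch. The example below uses Python's built-in sqlite3 module, which is an assumption made purely for illustration and is not the SAP tooling chosen for this project. All table and column names (dim_date, dim_unit, fact_supply, qty_issued) and the sample rows are hypothetical. It builds two dimension tables and a fact table whose primary key is a composite of the dimension foreign keys, scrubs records with blank fields during loading, and then rolls the day grain up to the month grain.

```python
import sqlite3


def build_warehouse(rows):
    """Create a tiny star schema in memory and load day-grain facts.

    rows: iterable of (date as 'YYYY-MM-DD', unit name, quantity issued).
    Every table and column name here is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_date (date_key TEXT PRIMARY KEY, month TEXT NOT NULL);
        CREATE TABLE dim_unit (unit_key INTEGER PRIMARY KEY,
                               unit_name TEXT UNIQUE NOT NULL);
        -- The fact table key is a composite of one foreign key per dimension.
        CREATE TABLE fact_supply (
            date_key   TEXT    NOT NULL REFERENCES dim_date(date_key),
            unit_key   INTEGER NOT NULL REFERENCES dim_unit(unit_key),
            qty_issued INTEGER NOT NULL,
            PRIMARY KEY (date_key, unit_key)
        );
    """)
    for day, unit, qty in rows:
        # Scrub: skip records with blank fields instead of loading them.
        if not day or not unit or qty is None:
            continue
        con.execute("INSERT OR IGNORE INTO dim_date VALUES (?, ?)", (day, day[:7]))
        con.execute("INSERT OR IGNORE INTO dim_unit (unit_name) VALUES (?)", (unit,))
        unit_key = con.execute(
            "SELECT unit_key FROM dim_unit WHERE unit_name = ?", (unit,)
        ).fetchone()[0]
        con.execute("INSERT INTO fact_supply VALUES (?, ?, ?)", (day, unit_key, qty))
    return con


def monthly_rollup(con):
    """Aggregate the day grain to the month grain (the KPI is total quantity)."""
    return con.execute("""
        SELECT d.month, u.unit_name, SUM(f.qty_issued)
        FROM fact_supply AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_unit AS u ON u.unit_key = f.unit_key
        GROUP BY d.month, u.unit_name
        ORDER BY d.month, u.unit_name
    """).fetchall()


if __name__ == "__main__":
    sample = [
        ("2018-01-03", "Alpha", 10),
        ("2018-01-04", "Alpha", 5),
        ("2018-01-04", "Bravo", 7),
        ("2018-02-01", "Alpha", 2),
        ("2018-02-02", "", 9),  # blank unit name: scrubbed out
    ]
    for row in monthly_rollup(build_warehouse(sample)):
        print(row)
```

In a real warehouse the roll-up result would be written to a separate month-grain structure once the day-grain retention period has passed; here it is only queried.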

Implementation

The project now moves to the next step, implementation. We will use the SAP data warehouse software selected in the design phase. SAP provides the tasks and concepts for managing task chains and configuring scheduling profiles. During implementation we will design task chains that capture the dependencies in the data load processes. The schedule will be simple and flexible, and executions will be monitored. Access to the program requires user authorization and authentication, which will be implemented with the user account and authentication service. As a Department of Defense project, it must meet data privacy and protection obligations, which are set by legal requirements and privacy acts specific to the government. SAP provides features and functions that support compliance with federal requirements. After each transaction the personal data is deleted to avoid identity theft, and every time the task is activated the "responsible" database entry is cleared. The service routes used for this are:

DELETE '/instance/:instanceId' clears all information regarding a specific instanceId, for example, when the corresponding service is deleted.

POST '/instance/:instanceId/anonymize' anonymizes all known user relevant information for the whole instanceId.

Query parameters for route DELETE '/taskChain/:namespace/:taskChainId/' (same applies to /task/:namespace/:taskId):

?clear=true

The change log code is:

xs set-env dwf-toe AUDIT_OLD_VALUE "true"

(SAP, 2017)
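As a rough sketch of how the routes above could be called from a script, the Python example below only builds the HTTP requests with the standard library and does not send them. The host name is a placeholder, the helper function names are hypothetical, and authentication is left out entirely; only the route shapes and the clear=true parameter come from the text above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical host: replace with the real service URL.
BASE_URL = "https://dwf.example.invalid"


def delete_instance(instance_id):
    """Build DELETE /instance/:instanceId."""
    return Request(f"{BASE_URL}/instance/{quote(instance_id, safe='')}",
                   method="DELETE")


def anonymize_instance(instance_id):
    """Build POST /instance/:instanceId/anonymize."""
    return Request(f"{BASE_URL}/instance/{quote(instance_id, safe='')}/anonymize",
                   data=b"", method="POST")


def delete_task_chain(namespace, task_chain_id, clear=False):
    """Build DELETE /taskChain/:namespace/:taskChainId/ with optional ?clear=true."""
    url = (f"{BASE_URL}/taskChain/{quote(namespace, safe='')}/"
           f"{quote(task_chain_id, safe='')}/")
    if clear:
        url += "?" + urlencode({"clear": "true"})
    return Request(url, method="DELETE")


if __name__ == "__main__":
    req = delete_task_chain("gcssa", "load_daily", clear=True)
    print(req.get_method(), req.full_url)
```

Sending a request would be done with urllib.request.urlopen(req) once the real host and credentials are configured.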

We will use DataStore and name the project GCSS-A:

  1. Provide application ID.
  2. Enforce user authentication flag.
  3. Fixed Service name.
  4. Enable SSL validation.
  5. Trigger build automatically flag.
  6. Run backend module.

The database will run in development mode. In the task chain folder, right-click and choose New and then Task Chain, add a new task, set its properties and save.

We need to extract sources with synonyms, virtual tables and flat files.

[Figure: Sample of a virtual table definition.]

[Figure: Example of the SQL code editor.]

[Figure: Flowgraph in Web IDE.] (SAP, 2017)

-- All columns for every customer
SELECT *
 FROM SalesLT.Customer;

-- Name columns only
SELECT Title, FirstName, MiddleName, LastName, Suffix
 FROM SalesLT.Customer;

-- Build a display name from the title and last name
SELECT SalesPerson, Title + ' ' + LastName AS CustomerName, Phone
 FROM SalesLT.Customer;

-- Cast the numeric ID so it can be concatenated with text
SELECT CAST(CustomerID AS VARCHAR) + ': ' + CompanyName AS CustomerCompany
 FROM SalesLT.Customer;

-- Order number with revision, and the order date in ANSI format (style 102)
SELECT SalesOrderNumber + '(' + STR(RevisionNumber, 1) + ')' AS OrderRevision,
 CONVERT(nvarchar(30), OrderDate, 102) AS OrderDate
 FROM SalesLT.SalesOrderHeader;

-- Full name; ISNULL skips the middle name when it is missing
SELECT FirstName + ' ' + ISNULL(MiddleName + ' ', '') + LastName AS CustomerName
 FROM SalesLT.Customer;

-- Prefer the email address and fall back to the phone number
SELECT CustomerID, COALESCE(EmailAddress, Phone) AS PrimaryContact
 FROM SalesLT.Customer;

-- Derive a shipping status from the ship date
SELECT SalesOrderID, OrderDate,
 CASE
 WHEN ShipDate IS NULL THEN 'Awaiting Shipment'
 ELSE 'Shipped'
 END AS ShippingStatus
 FROM SalesLT.SalesOrderHeader;

-- Distinct city and state/province pairs
SELECT DISTINCT City, StateProvince
 FROM SalesLT.Address;

-- Customers with no orders and products that were never ordered
SELECT c.CustomerID, p.ProductID
 FROM SalesLT.Customer AS c
 FULL JOIN SalesLT.SalesOrderHeader AS oh
 ON c.CustomerID = oh.CustomerID
 FULL JOIN SalesLT.SalesOrderDetail AS od
 ON od.SalesOrderID = oh.SalesOrderID
 FULL JOIN SalesLT.Product AS p
 ON p.ProductID = od.ProductID
 WHERE oh.SalesOrderID IS NULL
 ORDER BY p.ProductID, c.CustomerID;
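To try the NULL-handling queries without a SQL Server instance, the sketch below reproduces them against an in-memory SQLite database from Python. This is an approximation under stated assumptions: SQLite concatenates with || rather than +, ISNULL is replaced by COALESCE, the tables are cut down to a few columns, and the two customers and two orders are invented sample rows.

```python
import sqlite3


def make_sample_db():
    """Create cut-down Customer and SalesOrderHeader tables with invented rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer (CustomerID INTEGER, FirstName TEXT, MiddleName TEXT,
                               LastName TEXT, EmailAddress TEXT, Phone TEXT);
        CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, ShipDate TEXT);
        INSERT INTO Customer VALUES
            (1, 'Ana', 'M.', 'Lopez', 'ana@example.com', '555-0100'),
            (2, 'Ben', NULL, 'Ortiz', NULL, '555-0101');
        INSERT INTO SalesOrderHeader VALUES (71774, '2008-06-08'), (71776, NULL);
    """)
    return con


def customer_contacts(con):
    """Full name (middle name optional) and the preferred contact detail."""
    return con.execute("""
        SELECT FirstName || ' ' || COALESCE(MiddleName || ' ', '') || LastName,
               COALESCE(EmailAddress, Phone)
        FROM Customer
        ORDER BY CustomerID
    """).fetchall()


def shipping_status(con):
    """Label each order by whether a ship date has been recorded."""
    return con.execute("""
        SELECT SalesOrderID,
               CASE WHEN ShipDate IS NULL THEN 'Awaiting Shipment'
                    ELSE 'Shipped' END
        FROM SalesOrderHeader
        ORDER BY SalesOrderID
    """).fetchall()


if __name__ == "__main__":
    db = make_sample_db()
    print(customer_contacts(db))
    print(shipping_status(db))
```

Concatenating a NULL middle name yields NULL, which is why the middle name and its trailing space are wrapped together before falling back to an empty string.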

Data warehouse security is challenging: a large system serving many user communities needs flexibility to ward off attacks while keeping the data available to users as needed and recording their activity. A data warehouse contains data from many sources, which makes it a lucrative target for hackers. A strong security structure improves the effectiveness of the data warehouse. We need warehouse security because the system is used by many divisions within the Army. The infrastructure must ensure that every employee can see only the data relevant to them and nothing else.
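The requirement that each user sees only relevant data, while activity is recorded, can be illustrated with a minimal Python sketch. It is not how SAP implements authorization; the AccessFilter class, the division names and the row layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessFilter:
    """Row-level filter: users see only rows for divisions they belong to."""
    memberships: dict                                # user name -> set of divisions
    audit_log: list = field(default_factory=list)    # (user, rows returned)

    def query(self, user, rows):
        allowed = self.memberships.get(user, set())  # unknown users see nothing
        visible = [r for r in rows if r["division"] in allowed]
        # Record who asked and how many rows they were allowed to see.
        self.audit_log.append((user, len(visible)))
        return visible


if __name__ == "__main__":
    rows = [
        {"division": "logistics", "item": "fuel"},
        {"division": "finance", "item": "budget"},
    ]
    acl = AccessFilter({"sgt_lee": {"logistics"}})
    print(acl.query("sgt_lee", rows))
    print(acl.query("visitor", rows))
    print(acl.audit_log)
```

Defaulting to an empty set for unknown users means access is denied unless it has been granted explicitly.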

For GCSSA we will use SAP Security. SAP maintains a strong security response because it is committed to identifying and addressing all issues in its software and cloud services to keep them secure. IT security requires close attention because it involves and affects processes, people and technology, and it must be considered from the beginning to avoid security issues. The secure configuration applies to SAP as a whole and to the related systems: authorization, encryption and logging, with access control checks by computer, IP address and company. The mission-critical data involved in the process is protected from attacks, whether on-site or in the cloud. This system safeguards the data and provides robust IT security.

Data Migration

Data migration is the process of moving data from one storage system to another. For GCSSA it is needed because of a system compatibility issue. We have embarked on a data migration project to replace the servers and storage equipment; it will also help with infrastructure maintenance and with migrating applications for data center relocations.

To carry out the data migration we need a plan. The plan covers the impact on the business of delays or hiccups in the migration, in order to prevent system downtime. Questions to ask include how long the migration will take and how much downtime is required, and we should allow some extra time for data compatibility issues. There are three categories of data moves:

  • Host – file copying.
  • Array – migrate between systems.
  • Network appliances migration – Cloud.

When we start the GCSSA migration, we will make sure that the information we are moving is the most up to date, in the right format and in the right order, and that the old data is saved before it is moved. After that there will be a validation period in which the migrated information is compared with the information saved from the old system.
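The validation step described above can be sketched as a comparison between a snapshot saved from the old system and the migrated records. The Python example below is one possible approach, not the project's actual procedure: it assumes each record is a dictionary with a unique id field, and the field names and sample records are hypothetical.

```python
import hashlib


def row_digest(row):
    """Stable digest of one record; keys are sorted so field order is irrelevant."""
    text = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_snapshots(old_rows, new_rows, key="id"):
    """Report records that are missing, unexpected or changed after migration."""
    old = {row[key]: row_digest(row) for row in old_rows}
    new = {row[key]: row_digest(row) for row in new_rows}
    return {
        "missing": sorted(old.keys() - new.keys()),
        "unexpected": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    saved = [
        {"id": 1, "item": "radio", "qty": 4},
        {"id": 2, "item": "tent", "qty": 9},
        {"id": 3, "item": "fuel", "qty": 7},
    ]
    migrated = [
        {"id": 1, "item": "radio", "qty": 4},
        {"id": 2, "item": "tent", "qty": 8},   # value changed in transit
        {"id": 4, "item": "map", "qty": 1},    # not present in the old system
    ]
    print(compare_snapshots(saved, migrated))
```

An empty result in all three lists would indicate that the migrated data matches the saved snapshot record for record.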

GCSSA is an ERP system that will be in charge of the whole logistics and funding enterprise. It replaces the legacy systems PBUSE, SAMS-E, SSF-MW and FCM. The development of this centralized system has opened new parameters and indexes for all the fields involved in the transition. The single database will help provide accurate, real-time logistics and financial information for every component. The deployment strategy has two phases: the first is test and evaluation, which supports future fielding, and the second is general fielding. In the first wave the full system migration is carried out for a small group only, so that it does not affect the larger group; this data migration impacts only 14,000 users. The second wave affects 140,000 users, 10 times the first wave. For this wave the blackout will happen while the Soldiers are in training, so the data migration can be validated by the end of the training. Data migration and implementation is not just another training event; it is also a cultural change for fields that were previously separate and are now coming together to give stakeholders visibility across all fields in real time.

For .csv (comma-separated values) data, two overloads of String.Split are useful:

String.Split(char[]) in C# or String.Split(Char()) in VB.NET
String.Split(char[], int) in C# or String.Split(Char(), Integer) in VB.NET

The first overload splits a comma-separated value:

string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
string[] sites = values.Split(',');
foreach (string s in sites) {
    // Trim removes the space that follows each comma in the input
    Console.WriteLine(s.Trim());
}

This produces:

TechRepublic.com
CNET.com
News.com
Builder.com
GameSpot.com

The equivalent VB.NET code:

Dim values As String
values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
Dim sites As String() = values.Split(","c)
For Each s As String In sites
    Console.WriteLine(s.Trim())
Next s

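The second overload takes an array of separators and a maximum number of substrings to return:

char[] sep = new char[3];
sep[0] = ',';
sep[1] = ':';
sep[2] = ';';
string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";
// With a count of 4, the fourth element holds the unsplit remainder:
// " Builder.com; GameSpot.com"
string[] sites = values.Split(sep, 4);
foreach (string s in sites) {
    Console.WriteLine(s);
}
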
The following classes write and read CSV files:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace ReadWriteCsv
{
    /// <summary>
    /// Class to store one CSV row
    /// </summary>
    public class CsvRow : List<string>
    {
        public string LineText { get; set; }
    }

    /// <summary>
    /// Class to write data to a CSV file
    /// </summary>
    public class CsvFileWriter : StreamWriter
    {
        public CsvFileWriter(Stream stream)
            : base(stream)
        {
        }

        public CsvFileWriter(string filename)
            : base(filename)
        {
        }

        /// <summary>
        /// Writes a single row to a CSV file.
        /// </summary>
        /// <param name="row">The row to be written</param>
        public void WriteRow(CsvRow row)
        {
            StringBuilder builder = new StringBuilder();
            bool firstColumn = true;
            foreach (string value in row)
            {
                // Add separator if this isn't the first value
                if (!firstColumn)
                    builder.Append(',');
                // Implement special handling for values that contain comma or quote
                // Enclose in quotes and double up any double quotes
                if (value.IndexOfAny(new char[] { '"', ',' }) != -1)
                    builder.AppendFormat("\"{0}\"", value.Replace("\"", "\"\""));
                else
                    builder.Append(value);
                firstColumn = false;
            }
            row.LineText = builder.ToString();
            WriteLine(row.LineText);
        }
    }

    /// <summary>
    /// Class to read data from a CSV file
    /// </summary>
    public class CsvFileReader : StreamReader
    {
        public CsvFileReader(Stream stream)
            : base(stream)
        {
        }

        public CsvFileReader(string filename)
            : base(filename)
        {
        }

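        /// <summary>
        /// Reads a row of data from a CSV file
        /// </summary>
        /// <param name="row">The row that receives the parsed values</param>
        /// <returns>True if any columns were read</returns>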
        public bool ReadRow(CsvRow row)
        {
            row.LineText = ReadLine();
            if (String.IsNullOrEmpty(row.LineText))
                return false;

            int pos = 0;
            int rows = 0;

            while (pos < row.LineText.Length)
            {
                string value;

                // Special handling for quoted field
                if (row.LineText[pos] == '"')
                {
                    // Skip initial quote
                    pos++;

                    // Parse quoted value
                    int start = pos;
                    while (pos < row.LineText.Length)
                    {
                        // Test for quote character
                        if (row.LineText[pos] == '"')
                        {
                            // Found one
                            pos++;

                            // If two quotes together, keep one
                            // Otherwise, indicates end of value
                            if (pos >= row.LineText.Length || row.LineText[pos] != '"')
                            {
                                pos--;
                                break;
                            }
                        }
                        pos++;
                    }
                    value = row.LineText.Substring(start, pos - start);
                    value = value.Replace("\"\"", "\"");
                }
                else
                {
                    // Parse unquoted value
                    int start = pos;
                    while (pos < row.LineText.Length && row.LineText[pos] != ',')
                        pos++;
                    value = row.LineText.Substring(start, pos - start);
                }

                // Add field to list
                if (rows < row.Count)
                    row[rows] = value;
                else
                    row.Add(value);
                rows++;

                // Eat up to and including next comma
                while (pos < row.LineText.Length && row.LineText[pos] != ',')
                    pos++;
                if (pos < row.LineText.Length)
                    pos++;
            }
            // Delete any unused items
            while (row.Count > rows)
                row.RemoveAt(rows);

            // Return true if any columns read
            return (row.Count > 0);
        }
    }
}
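
The quoting rule used by the classes above (enclose a value in quotes when it contains a comma or a quote, and double any embedded quotes) is the same convention that Python's standard csv module applies by default. As a cross-check, and not as a replacement for the C# classes, the short Python sketch below writes and re-reads one row; the helper names write_rows and read_rows are hypothetical.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text, quoting only the values that need it."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    original = [["plain", "has,comma", 'has "quote"']]
    text = write_rows(original)
    print(text, end="")
    print(read_rows(text) == original)
```

Reading the text back gives the original values, which is a quick way to confirm that a writer and a reader agree on the quoting convention.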

Alternatively, the LumenWorks CsvReader library can read the file. The first example prints each field with its header:

using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

void ReadCsv()
{
    // open the file "data.csv" which is a CSV file with headers
    using (CsvReader csv =
           new CsvReader(new StreamReader("data.csv"), true))
    {
        int fieldCount = csv.FieldCount;

        string[] headers = csv.GetFieldHeaders();
        while (csv.ReadNextRecord())
        {
            for (int i = 0; i < fieldCount; i++)
                Console.Write(string.Format("{0} = {1};",
                              headers[i], csv[i]));
            Console.WriteLine();
        }
    }
}

The second example binds the reader to a data control (here myDataRepeater):

using System.IO;
using LumenWorks.Framework.IO.Csv;

void ReadCsv()
{
    // open the file "data.csv" which is a CSV file with headers
    using (CsvReader csv = new CsvReader(
                           new StreamReader("data.csv"), true))
    {
        myDataRepeater.DataSource = csv;
        myDataRepeater.DataBind();
    }
}

References

Data warehouse reference architectures. (2014, March 26). Retrieved from https://www.lynda.com/SQL-Server-tutorials/Data-warehouse-reference-architectures/156150/167724-4.html

Database / Hardware Tool Selection in Data Warehousing. (n.d.). Retrieved from https://www.1keydata.com/datawarehousing/tooldatabase.html

Goals of a Data Warehouse - Rensselaer Data Warehouse Project. (n.d.). Retrieved from http://www.rpi.edu/datawarehouse/dw-goals.html

IBM Analytics - Analytic Solutions for Business. (n.d.). Retrieved from https://www.ibm.com/analytics

SQL Server 2016 | Microsoft. (n.d.). Retrieved from https://www.microsoft.com/en-us/sql-server/sql-server-2016

Good one to know! (n.d.). Retrieved from http://www.businessdictionary.com/definition/SAP.html

Fulton, S. M. (2013, September 25). What Is Data Warehousing Today? - Understanding Data Warehousing. Retrieved from http://www.tomsitpro.com/articles/data_governance-big_data-business_analytics-shadow_it-hadoop,2-549-2.html

Walls, D., & Scott, M. D. (1999, December 20). 7 Steps to Data Warehousing. Retrieved from http://www.itprotoday.com/microsoft-sql-server/7-steps-data-warehousing

SAP Help Portal. (n.d.). Retrieved from https://help.sap.com/viewer/ff18034f08af4d7bb33894c2047c3b71/7.5.9/en-US/b2e50138fede083de10000009b38f8cf.html

Develop your agile DW with SAP Web IDE - SAP HANA SQL Data Warehouse - SAP HANA. (2017, December 8). Retrieved from https://blogs.saphana.com/2017/12/08/web-ide-sap-hana-sql-data-warehouse/

Securing a Data Warehouse. (n.d.). Retrieved from https://docs.oracle.com/cd/B28359_01/server.111/b28314/tdpdw_security.htm#TDPDW0121

Why SAP for Security | SAP Security Overview. (n.d.). Retrieved from https://www.sap.com/corporate/en/company/security.html

McDonough, J. (2018, January 23). The United States Army | GCSS-Army. Retrieved from https://gcss.army.mil/Library/TopStories/WaveOneEnds.aspx

Patton, T. (2006, January 24). Easily parse string values with .NET. Retrieved from https://www.techrepublic.com/article/easily-parse-string-values-with-net/

Rouse, M. (2017, April). What is data migration? - Definition from WhatIs.com. Retrieved from http://searchstorage.techtarget.com/definition/data-migration

Wood, J. (2012, July 4). Reading and Writing CSV Files in C# - CodeProject. Retrieved from https://www.codeproject.com/Articles/415732/Reading-and-Writing-CSV-Files-in-Csharp