
Data Management

Data management is the development and execution of architectures, policies,
practices and procedures in order to manage the information lifecycle needs of an
enterprise in an effective manner.
Data life cycle management (DLM) is a policy-based approach to managing the flow
of an information system's data throughout its life cycle: from creation and initial
storage to the time when it becomes obsolete and is deleted. Several vendors offer
DLM products but effective data management involves well-thought-out procedures
and adherence to best practices as well as applications.
There are various approaches to data management. Master data management
(MDM), for example, is a comprehensive method of enabling an enterprise to link all
of its critical data to one file, called a master file, that provides a common point of
reference.
The effective management of corporate data has grown in importance as
businesses are subject to an increasing number of compliance regulations.
Furthermore, the sheer volume of data that must be managed by organizations has
increased so markedly that it is sometimes referred to as big data.
The goal of big data management is to ensure a high level of data quality and
accessibility for business intelligence and big data analytics applications.
Corporations, government agencies and other organizations employ big data
management strategies to help them contend with fast-growing pools of data,
typically involving many terabytes or even petabytes of information saved in a
variety of file formats. Effective big data management helps companies locate
valuable information in large sets of unstructured data and semi-structured data
from a variety of sources, including call detail records, system logs and social media
sites.
Most big data environments go beyond relational databases and traditional data
warehouse platforms to incorporate technologies that are suited to processing and
storing nontransactional forms of data. The increasing focus on collecting and
analyzing big data is shaping new platforms that combine the traditional data
warehouse with big data systems in a logical data warehousing architecture. As
part of the process, the organization must decide what data must be kept for compliance
reasons, what data can be disposed of and what data should be kept and analyzed
in order to improve current business processes or provide a business with a
competitive advantage. This process requires careful data classification so that
ultimately, smaller sets of data can be analyzed quickly and productively.


DATA VISUALIZATION

Data Visualization is the art of presenting data in a way that even a non-analyst can
understand. A good blend of aesthetic elements such as colors, dimensions and labels can create
visual masterpieces that reveal surprising business insights, which in turn help businesses
make informed decisions.

Data Visualization is an indispensable aspect of business analytics. As more and more sources of
data are discovered, business managers at all levels are adopting data visualization software
that lets them analyze trends visually and make quick decisions. Currently, two of the most popular
tools for visualization and data discovery are QlikView and Tableau.

Pros and Cons of Data Visualization

Here are some pros and cons of representing data visually.

Pros

It can be accessed quickly by a wider audience.
It conveys a lot of information in a small space.
It makes your report more visually appealing.

Cons

It can misrepresent information if an incorrect visual representation is chosen.
It can be distracting if the visual data is distorted or overused.

Visualize: To form a mental vision, image, or picture of
(something not visible or present to the sight, or of an
abstraction); to make visible to the mind or imagination.
Visualization is the use of computer graphics to create
visual images which aid in the understanding of complex,
often massive representations of data.
Visual Data Mining is the process of discovering implicit
but useful knowledge from large data sets using
visualization techniques.
Tableau
Tableau is a Business Intelligence tool for visually analysing data. Users can
create and distribute interactive, shareable dashboards that depict the trends,
variations and density of the data in the form of graphs and charts. Tableau can connect
to files, relational databases and Big Data sources to acquire and process data. The software
allows data blending and real-time collaboration, which sets it apart. It is
used by businesses, academic researchers and many governments to do visual data
analysis.

You can use Tableau's drag-and-drop interface to visualize data, explore
different views, and even combine multiple databases easily. It does not
need any complex scripting. Anyone who understands the business problem can
address it with a visualization of the relevant data.

Tableau Features
Tableau provides solutions for all kinds of industries, departments and data environments. Below
are the features that enable Tableau to handle so many diverse scenarios.

Speed of Analysis - As it does not require a high level of programming expertise, any
computer user with access to data can start using it to derive value from the data.

Self-Reliant - Tableau does not need a complex software setup. The desktop version
which is used by most users is easily installed and contains all the features needed to start
and complete data analysis.

Visual Discovery - The user explores and analyses the data using visual tools like
colours, trend lines, charts and graphs. Very little scripting is needed, as nearly
everything is done by drag and drop.

Blend Diverse Data Sets - Tableau allows you to blend different relational, semi-structured
and raw data sources in real time, without expensive up-front integration costs.
Users don't need to know the details of how the data is stored.

Architecture Agnostic - Tableau works on all kinds of devices where data flows, so the
user need not worry about specific hardware or software requirements to use Tableau.

Real Time Collaboration - Users can filter, sort, and discuss data on the fly and
embed a live dashboard in portals such as a SharePoint site or Salesforce. You can save your
view of the data and allow colleagues to subscribe to your interactive dashboards so they see
the very latest data just by refreshing their web browser.

Centralized Data - Tableau Server provides a centralized location to manage all of
the organization's published data sources. You can delete data sources, change permissions, add tags,
and manage schedules in one convenient location. It's easy to schedule extract refreshes
and manage them in the data server. Administrators can centrally define a schedule for
extracts on the server for both incremental and full refreshes.
There are three basic steps involved in creating any Tableau data analysis report. They are given
below.

Connect to a data source: This involves locating the data and using an appropriate type of
connection to read the data.

Choose Dimensions and Measures: This involves selecting the required columns from
the source data for analysis.

Apply Visualization technique: This involves applying the required visualization method,
such as a specific chart or graph type, to the data being analyzed.
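The sketch below is a rough, tool-independent illustration of the three steps, written in Python against a small hypothetical CSV. Here "connecting" means parsing the text, choosing a dimension and a measure means grouping and summing, and "visualizing" means drawing one text bar per group. The data, the column names and the helper functions `connect`, `aggregate` and `bar_chart` are all assumptions made for illustration. This is not how Tableau works internally.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for a data source (not real Superstore data).
RAW = """Category,Region,Sales
Furniture,East,200.0
Furniture,West,150.0
Technology,East,300.0
Technology,West,100.0
Office Supplies,East,50.0
"""


def connect(text):
    """Step 1 analogue: read the source into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def aggregate(rows, dimensions, measure):
    """Step 2 analogue: group by the chosen dimensions and SUM the chosen measure."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += float(row[measure])
    return dict(totals)


def bar_chart(totals, scale=50.0):
    """Step 3 analogue: draw one text bar per group, one '#' per `scale` units."""
    lines = []
    for key, value in sorted(totals.items()):
        label = " / ".join(key)
        lines.append(f"{label:<20} {'#' * int(value // scale)} {value:.0f}")
    return "\n".join(lines)


rows = connect(RAW)
totals = aggregate(rows, ["Category"], "Sales")
print(bar_chart(totals))
```

Passing `["Category", "Region"]` to `aggregate` would split each bar by region. This mirrors adding a second dimension to a view.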

Fortunately, with the help of powerful and easy-to-use visualization tools from companies such as
Tableau Software, that's rapidly changing. Here's a look at how data visualization is empowering
more and more companies to draw actionable insights from Big Data.

The power to show data visually

Tables of numbers can be difficult to understand, and attempting to gain insights from data in
spreadsheets can be long and laborious. When viewed in a visual format such as diagrams,
charts, maps and graphs, data becomes easier to understand, without anyone having to be a data
scientist. In fact, understanding visualized data can happen almost instantaneously, regardless of
the viewer's knowledge of complex mathematical or statistical algorithms.

The power to reveal details

Data visualization harnesses the ability of the human visual system to process an entire visual
field at once. This is particularly helpful when dealing with noisy, highly unstructured data.
When complex data is properly visualized, it's easier for the viewer to pick out details
that could lead to important insights, details that might otherwise remain buried in the math.
The key is finding a view of the data that reduces complexity without compromising
important information.

The power to get to answers fast

Due to its intuitive nature, data visualization allows those who are working with information
to ask and answer questions essentially at the speed of thought, rather than having to write
queries against a database. By eliminating the query-writing step, data visualization reduces
the time it takes to get to the real answers.

The power of simplicity

Data visualization is all about data simplification. Along with providing the ability to create the
visual that best serves the data, visualization software has built-in features designed to simplify
the data analysis process. One such feature in Tableau Software is a drag-and-drop
interface that lets data viewers look at various graphical views quickly and easily as they
analyze the data.

The power to improve team problem solving

One of the challenges for teams attempting to analyze problems is to quickly get all team
members on the same page. Data visualization helps unify teams by creating a shared view of
the information. In addition, data visualization allows team members to click on specific data
points to better determine what is going on, thus heightening the ability of the team to solve real
problems and gain insights together in real time.

The power to make better decisions

Data visualization helps people make smarter, more effective decisions by enabling them to see
the types of patterns, trends and relationships within the data that lead to insights. Armed with
these insights, decision makers can take clear and concrete actions to drive desired outcomes.

Don't let the vast amounts of data intimidate you. Leverage the power of Big Data platforms,
like Hadoop, and other data visualization technologies to more easily extract insights and present
your results to others.

What is Data Visualisation?

Data visualisation is, quite simply, the process of describing information through visual
rendering. Humans have used visualisations to explain the world around them for tens of
thousands of years.

Tableau leads the world in making the data visualisation process available to business users of
every background and industry. Businesses around the globe realize that the ability to visualise
data effectively leads directly to better understanding, insight and better business decisions.

Tableau Software enables businesses to keep pace with the evolving technology landscape and
outperform competitors through an adaptive and intuitive means of visualising their data.

Tableau is groundbreaking data visualisation software created by Tableau Software.

Tableau connects easily to nearly any data source, be it a corporate data warehouse,
Microsoft Excel or web-based data. Tableau allows for instantaneous insight by
transforming data into visually appealing, interactive visualisations called
dashboards. This process takes only seconds or minutes rather than months or
years, and is achieved through an easy-to-use drag-and-drop interface.
What is Data Architecture?
Data Architecture deals with designing and constructing data
resources. It provides methods to design, construct and
implement a fully integrated, business-driven data resource that includes
real-world objects and events, and to deploy it onto appropriate operating environments.
Data Architecture also covers data resource components.

Data architecture is one of the pillars of Enterprise Architecture. The other
pillars are Application Architecture, Business Architecture and Integration
Architecture. The Data Architecture pillar is the definition or blueprint of the
data design that will be used to implement a physical
database.

The data architecture can be compared to a house design, where everything about the
house to be built, from the choice of materials and the sizes and style of the rooms and roofing
to the layout of the plumbing and electrical structures, is described in the blueprint.

In the same manner, the data architecture describes the way data will be
processed, stored and used by the organization that will use it. It lays out the
criteria for processing operations, including the whole flow of the system.
Data architecture falls within the work of data architects. These
professionals define the target state and make developmental alignments. Then,
once the data architecture is implemented, data architects make
whatever enhancements are needed, in the spirit of the
original blueprint.

Designing a data architecture is a complex process because it involves
relating abstract data models to real-life business activities and entities
before implementing the database design and finally setting up the IT
hardware infrastructure. If the data architecture is not stable or robust, a
failure at one point can set off a wave of failures that can cause the whole
system to break down and result in heavy monetary losses for the
organization. The data architecture breaks subjects down to the atomic level
and then builds them up again into the desired form during the definition of
the target state phase. In breaking down the subject, there are three traditional
architectural processes to be considered. The Conceptual aspect represents
all the business entities and their related attributes. The Logical aspect
represents the entire logic of the entity relationships. The Physical aspect is
the actual data mechanism for particular types of functionality.
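The sketch below makes the three levels more concrete using Python's built-in `sqlite3` module. The customer/order model and its table and column names are hypothetical. The conceptual level is the idea that customers place orders. The logical level is the one-to-many relationship through `customer_id`. The physical level is the concrete SQLite DDL, with its column types and keys.

```python
import sqlite3

# Conceptual level (hypothetical): a customer places orders.
# Logical level: one customer relates to many orders through customer_id.
# Physical level: concrete SQLite tables, column types and keys, as below.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- REFERENCES records the relationship; SQLite enforces it only
    -- when PRAGMA foreign_keys = ON is set on the connection.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 99.5), (11, 1, 20.0)],
)

# The logical one-to-many relationship is used at query time through a join.
rows = conn.execute(
    """
    SELECT c.name, COUNT(o.order_id), SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    """
).fetchall()
print(rows)  # [('Acme Ltd', 2, 119.5)]
```

The same conceptual and logical design could be realized physically in a different DBMS with different types and storage choices. This is why the levels are kept separate.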

The general framework of the data architecture takes into consideration
three broad aspects of the whole information system.
The first is the physical data architecture, which is focused on actual tangible
hardware elements. Depending on the bulk of data to be processed and the
number of data consumers who may be accessing the data warehouse
simultaneously, investment in physical data architecture includes buying
top-of-the-line computer servers, routers, and other network paraphernalia.

The second aspect pertains to the elements of the data architecture. For
instance, the structure of the administrative section of the implementing
company and how data is used within the section should be described in the
data architecture to become the basis for data models.

Data analysis is a highly iterative and non-linear process, better described as a series of cycles
in which information is learned at each step. What is learned then informs whether (and how) to
refine and redo the step that was just performed, or whether (and how) to proceed to the next
step.

Setting the Scene

A data analysis is not the same as a study. While a study includes developing and executing
a plan for collecting data, a data analysis presumes the data have already been collected. More
specifically, a study includes the development of a hypothesis or question, the design of the
data collection process (or study protocol), the collection of the data, and the analysis and
interpretation of the data.

Activities of Data Analysis

There are 5 core activities of data analysis:

1. Stating and refining the question

2. Exploring the data

3. Building formal statistical models

4. Interpreting the results

5. Communicating the results

1. Stating and Refining the Question

Doing data analysis requires quite a bit of thinking, and we believe that when you've completed a
good data analysis, you've spent more time thinking than doing. The thinking begins before you
even look at a dataset, and it's well worth devoting careful thought to your question. This point
cannot be over-emphasized, as many of the fatal pitfalls of a data analysis can be avoided by
expending the mental energy to get your question right.
Types of Questions:

Descriptive

A descriptive question is one that seeks to summarize a characteristic of a set of data. Examples
include determining the proportion of males, the mean number of servings of
fresh fruits and vegetables per day, or the frequency of viral illnesses in a set of data collected
from a group of individuals.

Exploratory

An exploratory question is one in which you analyze the data to see if there are patterns, trends,
or relationships between variables. These types of analyses are also called hypothesis-generating
analyses because, rather than testing a hypothesis as would be done with an
inferential, causal, or mechanistic question, you are looking for patterns that would support
proposing a hypothesis.

Inferential

An inferential question would be a restatement of this proposed hypothesis as a question and
would be answered by analyzing a different set of data.

Predictive

A predictive question would be one where you ask what types of people will eat a diet high in
fresh fruits and vegetables during the next year. In this type of question you are less interested in
what causes someone to eat a certain diet, just what predicts whether someone will eat this
certain diet. For example, higher income may be one of the final set of predictors, and you may
not know (or even care) why people with higher incomes are more likely to eat a diet high in
fresh fruits and vegetables, but what is most important is that income is a factor that predicts this
behavior.

Mechanistic

A mechanistic question leads to an answer that tells us, if the diet does indeed cause a reduction
in the number of viral illnesses, how the diet leads to that reduction. For example, a
question that asks how a diet high in fresh fruits and vegetables leads to a reduction in the
number of viral illnesses would be a mechanistic question.

2. Exploratory Data Analysis

Exploratory data analysis is the process of exploring your data, and it typically includes
examining the structure and components of your dataset, the distributions of individual variables,
and the relationships between two or more variables. The most heavily relied upon tool for
exploratory data analysis is visualizing data using a graphical representation of the data.

There are several goals of exploratory data analysis, which are:

1. To determine if there are any problems with your dataset.

2. To determine whether the question you are asking can be answered by the data that you have.

3. To develop a sketch of the answer to your question.
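The three goals can be illustrated on a tiny hypothetical dataset using only Python's standard library. The records, field names and the question (does mean daily servings of fruit and vegetables differ by sex?) are assumptions made for this sketch. Goal 1 looks for missing values. Goal 2 checks that both groups needed to answer the question are present. Goal 3 computes group means as a rough sketch of the answer.

```python
import statistics

# Hypothetical survey records: daily servings of fruit/vegetables; None = missing.
records = [
    {"id": 1, "sex": "M", "servings": 3},
    {"id": 2, "sex": "F", "servings": 5},
    {"id": 3, "sex": "F", "servings": None},
    {"id": 4, "sex": "M", "servings": 2},
    {"id": 5, "sex": "F", "servings": 4},
]

# Goal 1: look for problems in the dataset, e.g. missing values.
missing = [r["id"] for r in records if r["servings"] is None]

# Goal 2: check the data can answer the question (are both groups present?).
groups = {r["sex"] for r in records}

# Goal 3: sketch an answer: mean servings per group, ignoring missing values.
sketch = {
    g: statistics.mean(
        r["servings"] for r in records
        if r["sex"] == g and r["servings"] is not None
    )
    for g in sorted(groups)
}
print(missing, sketch)
```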

3. Using Models to Explore Your Data

In a very general sense, a model is something we construct to help us understand the real world.
But a simple summary statistic, such as the mean of a set of numbers, is not enough to formulate
a model. A statistical model must also impose some structure on the data. At its core, a statistical
model provides a description of how the world works and how the data were generated. The
model is essentially an expectation of the relationships between various factors in the real world
and in your dataset. What makes a model a statistical model is that it allows for some
randomness in generating the data.
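The sketch below shows a model that has both structure and randomness, using Python's `statistics` and `random` modules. It assumes the data are generated as a fixed mean plus Normally distributed noise. The Normal model, the data values and the `simulate` helper are illustrative assumptions, not a prescribed method. The structure is the Normal shape with parameters `mu` and `sigma`. The randomness is the noise added to each simulated value.

```python
import random
import statistics

# Hypothetical observed data (e.g. daily servings of fruit and vegetables).
observed = [2.0, 3.5, 4.0, 3.0, 5.5, 4.5, 3.0, 2.5]

# Assumed model: each value = mu + noise, with noise ~ Normal(0, sigma).
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)


def simulate(n, seed=0):
    """Generate n fake observations from the fitted Normal(mu, sigma) model."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


fake = simulate(1000)
print(f"mu={mu:.2f} sigma={sigma:.2f} fake mean={statistics.mean(fake):.2f}")
```

The mean alone would only summarize the data. Adding `sigma` and the random noise lets the model describe how data like these might have been generated.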

4. Comparing Model Expectations to Reality

Inference is one of many possible goals in data analysis, so it's worth discussing what exactly
the act of making inference involves.

1. Describe the sampling process

2. Describe a model for the population (the sample is a subset of the population)

Drawing a fake picture: To begin with, we can make some pictures, like a histogram of what we
expect the data to look like under the model, and then compare it with a histogram of the actual data.

Reacting to Data: Refining Our Expectations

Okay, so the model and the data don't match very well, as was indicated by the histogram above.
So what do we do? Well, we can either

1. Get a different model

2. Get different data
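One way to compare a model's "fake picture" with reality is to compare counts per bin. This sketch assumes a Normal model fitted with `statistics.NormalDist.from_samples` (Python 3.8+). It counts the observations in each bin and the number the model expects there. The data and bin edges are hypothetical. A large gap between expected and observed counts would prompt one of the two reactions listed above.

```python
from statistics import NormalDist

# Hypothetical observed data and a Normal model fitted to it.
observed = [2.0, 3.5, 4.0, 3.0, 5.5, 4.5, 3.0, 2.5, 9.0, 9.5]
model = NormalDist.from_samples(observed)  # estimates mean and standard deviation

bins = [(float("-inf"), 3.0), (3.0, 6.0), (6.0, float("inf"))]
n = len(observed)

# Expected count per bin = n * probability the model assigns to that bin.
expected_counts = [n * (model.cdf(hi) - model.cdf(lo)) for lo, hi in bins]
# Observed count per bin (each bin includes its lower edge, excludes its upper edge).
observed_counts = [sum(lo <= x < hi for x in observed) for lo, hi in bins]

for (lo, hi), e, o in zip(bins, expected_counts, observed_counts):
    print(f"[{lo}, {hi}): expected {e:.1f}, observed {o}")
```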

5. Interpreting Your Results and Communication

Communication is fundamental to good data analysis. You gather data by
communicating your results, and the responses you receive from your audience
should inform the next steps in your data analysis. The types of responses you
receive include not only answers to specific questions, but also commentary and
questions your audience has in response to your report.

Connect to a Data Source

On opening Tableau, we get the start page showing various data sources. Under the header
Connect, we have options to choose a file, a server or a saved data source. Under Files, we choose
Excel. Then navigate to the file Sample Superstore.xls as mentioned above. The Excel file has
three sheets named Orders, People and Returns. We choose Orders.

Choose the Dimensions and Measures

Next, we choose the data to be analyzed by deciding on the dimensions and measures.
Dimensions are descriptive data, while measures are numeric data. Put together, they
let us visualize the performance of each dimension in terms of the measures.
We choose Category and Region as the dimensions and Sales as the measure. Drag and
drop them as shown below. The result shows the total sales in each category for each region.
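The text table Tableau produces in this step is essentially a crosstab. It sums the measure for every combination of the two dimensions. The Python sketch below reproduces that idea on a few hypothetical Orders rows. The rows are invented and are not the real Sample Superstore data, and `crosstab` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical Orders rows: (Category, Region, Sales). Not the real Superstore data.
orders = [
    ("Furniture", "Central", 120.0),
    ("Furniture", "East", 80.0),
    ("Technology", "Central", 200.0),
    ("Technology", "East", 50.0),
    ("Furniture", "East", 20.0),
]


def crosstab(rows):
    """Sum the Sales measure for every (Category, Region) pair of dimensions."""
    table = defaultdict(float)
    for category, region, sales in rows:
        table[(category, region)] += sales
    return table


table = crosstab(orders)
categories = sorted({c for c, _ in table})
regions = sorted({r for _, r in table})

# Print a text table: categories as rows, regions as columns, SUM(Sales) in the cells.
print("Category".ljust(12) + "".join(r.rjust(10) for r in regions))
for c in categories:
    print(c.ljust(12) + "".join(f"{table.get((c, r), 0.0):10.1f}" for r in regions))
```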
Apply Visualization Technique
In the previous step we see that the data is available only as numbers. We have to read and
calculate each of the values to judge the performance. But we can see them as graphs or charts
with different colours to get a quicker judgment.

We drag and drop the SUM(Sales) field from the Marks card to the Columns shelf. The table
showing the numeric values of sales now turns into a bar chart automatically.
We can apply a further technique of adding another dimension to the existing data, which adds
more colours to the existing bar chart as shown below.
Below is a flow of design steps that should ideally be followed to create effective dashboards.
Connect to Data Source
Tableau connects to all popular data sources. It has built-in connectors that take care of
establishing the connection once the connection parameters are supplied. Whether it is simple
text files, relational sources, NoSQL sources or cloud databases, Tableau connects to nearly
every data source.

Steps involved in Connecting to Excel Files in Tableau

Before we start, let us look at the data in the Excel file. As the screenshot below shows,
it is just a normal .xlsx file holding two sheets (tables).

The screenshot below shows the data in the Customers sheet.
The screenshot below shows the data in the Department sheet.
If you haven't started Tableau yet, double-click Tableau Desktop to open it. Once it is
open, it looks like the screenshot below.

First, under the Connect section, select the Excel option.
Once you select the Excel option, a new window opens for selecting the Excel file from
the file system. For now, we are selecting the CustomersAndDept.xlsx file as shown below.
Once you are done, the screen below appears. Please understand the following options
before you start creating a report.

1. Workbook: The Excel file we selected from our file system.

2. Sheets: This section displays the sheets (tables) present in the Excel
source. We have two tables, so it displays those two sheets (Customers
and Department). There is a search bar under this section, which is very
useful when there are many sheets. For instance, if you have 30 or 40 sheets,
you can use it to search for a specific table.

3. Drag Sheets Here: You have to drag table(s) from Sheets to this section.
Tableau will only use the tables present in this area. This is similar to a
dataset.

4. This region shows the data present in our dataset.

NOTE: CustomersAndDept is the default data source name (the Excel file name) assigned by
Tableau. Please change this default name to a more meaningful, unique name as per your
requirements.
We can add sheets to region 3 in multiple ways: as the name suggests, we can either drag
the Customers table from the Sheets region to region 3, or simply double-click the required
table to add it automatically.

TIP: Tableau allows us to add multiple tables using joins.

Once you drag the Customers sheet, the preview region displays the data present in that
sheet.

In the screenshot below, there is a message saying that Data Interpreter is on.

Data Interpreter: If Tableau finds that the format of the column data in the selected sheet is
difficult to analyze, it will pop up a message asking us to turn on Data Interpreter. By
turning Data Interpreter on, we can normalize the data.
Once you have finished, click the Sheet 1 tab to design the report.

1. Data: This displays the list of currently connected data sources. We have
only one at this time; otherwise, it would display all the available data sources.

2. Dimensions: Columns with string data are placed under the Dimensions
section.

3. Measures: Columns with numeric data or metric values are placed under
the Measures section.

4. This is the region where we design our Tableau reports by dragging measures
and dimensions.
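The sketch below encodes the simplified rule from the list: text columns become dimensions and numeric columns become measures. It is only an approximation. Tableau's real assignment also places dates and some other field types under Dimensions, and users can reassign fields. The `split_fields` helper and the column names are hypothetical.

```python
# Simplified rule from the text: text columns -> Dimensions, numeric columns -> Measures.
# This approximates, and does not reproduce, Tableau's actual field assignment.
def split_fields(rows):
    """Return (dimensions, measures) column names, judged from all row values."""
    dimensions, measures = [], []
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            measures.append(column)
        else:
            dimensions.append(column)
    return dimensions, measures


# Hypothetical Customers rows.
customers = [
    {"FirstName": "Ana", "Occupation": "Clerical", "YearlyIncome": 40000, "Sales": 120.5},
    {"FirstName": "Raj", "Occupation": "Management", "YearlyIncome": 90000, "Sales": 300.0},
]
dims, meas = split_fields(customers)
print(dims, meas)  # ['FirstName', 'Occupation'] ['YearlyIncome', 'Sales']
```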
NOTE: From the screenshot above, you can observe that we are only seeing the data present in
the Customers table, although our Excel file contains both the Customers and Department tables.
If you want to add the Department table, click the back button at the top and join the Department
table with the Customers table.
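Joining the two sheets matches rows on a shared key. The Python sketch below shows an inner join, one of the join types Tableau offers, in which only customers with a matching department are kept. The key column `DepartID`, the other column names and the `inner_join` helper are hypothetical. The sample workbook's real columns may differ.

```python
# Hypothetical sheets: each customer row carries a DepartID that points into Department.
customers = [
    {"CustID": 1, "Name": "Ana", "DepartID": 10},
    {"CustID": 2, "Name": "Raj", "DepartID": 20},
    {"CustID": 3, "Name": "Li", "DepartID": 99},  # no matching department
]
departments = [
    {"DepartID": 10, "DepartmentName": "Sales"},
    {"DepartID": 20, "DepartmentName": "Finance"},
]


def inner_join(left, right, key):
    """Keep left rows whose key matches a right row, merging both rows' columns.

    Assumes the key is unique in `right` (one department per DepartID).
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


joined = inner_join(customers, departments, "DepartID")
for row in joined:
    print(row)
```

A left join would instead keep customer 3 with an empty department. The join type therefore changes how many rows reach the report.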

Bar chart in Tableau

A bar chart in Tableau is very useful for comparing data visually. For example, if we want to
check sales by product color, we can use a bar chart. By looking at it, one can see
which color of product is performing better than the others.

Creating a bar chart in Tableau Desktop is very easy. If you drag a measure to the Columns shelf and
a dimension to the Rows shelf, Tableau will automatically generate a bar chart.

Create a Bar Chart in Tableau Approach 1

First, drag and drop the Color dimension to the Rows shelf and Sales Amount to the Text option on the
Marks card. Since Sales Amount is a measure, it will be aggregated using the default SUM.
Now we have to change the table report to a bar chart using the Show Me option. Please expand
the Show Me window and select the bar chart from it as shown below.
Once you select the bar chart from the Show Me window, the bar chart will be displayed as shown in the
screenshot below.
Tableau allows us to convert the horizontal bar chart to a vertical column chart. To do this,
click the Swap option in the Tableau toolbar as shown in the screenshot below.

Once you click the Swap option, our horizontal bar chart is converted to a vertical column
chart as shown below.
The screenshot above shows the correct result, but we cannot identify
the exact sales amount for each color. To resolve this, we have to display data
labels.

Adding Data labels to Bar Chart in Tableau

To add data labels to a bar chart, drag and drop the label values from the Dimensions or
Measures pane to the Label option on the Marks card. In this example, we want to display the Sales
Amount as data labels, so drag and drop Sales Amount from the Measures region to the Label
option.
Once you are done, you can see the data labels in the Tableau report.
Stacked Bar Charts in Tableau
Creating stacked bar charts in Tableau Desktop is very easy. In this example, we want to stack the
bar chart by country region, so we drag English Country Region Name from the Dimensions region to
the Color option on the Marks card.
Once you are done, you can see the stacked bar chart as shown in the screenshot below. Tableau
Desktop allocates default colors to every region, but from the marked region you can edit the
colors by selecting the Edit Colors... option from the context menu.
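In a stacked bar, each bar is split into segments, one per value of the dimension placed on Color. Each segment starts where the previous one ends. The Python sketch below computes those segment boundaries from hypothetical (color, region, sales) rows. The data and the `stack` helper are assumptions, and the regions are stacked in alphabetical order only for simplicity.

```python
from collections import defaultdict

# Hypothetical (product color, country region, sales) rows.
rows = [
    ("Black", "Australia", 100.0), ("Black", "France", 50.0),
    ("Red", "Australia", 70.0), ("Red", "Germany", 30.0),
    ("Black", "Australia", 25.0),
]


def stack(rows):
    """For each bar (color), return (region, start, end) segments stacked in region order."""
    sums = defaultdict(lambda: defaultdict(float))
    for color, region, sales in rows:
        sums[color][region] += sales  # SUM(Sales) per color/region pair
    bars = {}
    for color, by_region in sums.items():
        start, segments = 0.0, []
        for region in sorted(by_region):
            end = start + by_region[region]
            segments.append((region, start, end))
            start = end  # next segment begins where this one ends
        bars[color] = segments
    return bars


bars = stack(rows)
for color, segments in bars.items():
    print(color, segments)
```

The end of the last segment is the bar's total height, so a stacked bar shows both the total and its breakdown.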
Pie Chart in Tableau
A pie chart in Tableau is very useful for displaying sales by region, customers by country, sales
by country, etc. Pie charts are also useful in dashboard design: we can use a pie chart to
display country-wise sales and then use action filters to drill down further.

Create a Pie Chart in Tableau Approach 1

First, drag and drop Sales Amount from the Measures region to the Columns shelf. Since it is a
measure, Sales Amount will be aggregated using the default SUM.
Next, drag and drop English Country Region Name from the Dimensions region to the Rows
shelf. Once you drag them, the following screenshot will be displayed.
Now we have to change the default bar chart to a pie chart using the Show Me option. Please
expand the Show Me window and select the pie chart from it as shown below.

Once you select the pie chart from the Show Me window, the pie chart will be displayed with default
colors.
Use the Size option on the Marks card to expand or shrink the pie chart.
The screenshot above shows the correct result, but we cannot tell
the difference between sales in France and sales in Germany. To resolve this, we
have to display data labels.
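Each slice of a pie has an angle proportional to its share of the total: share × 360 degrees. The Python sketch below computes percentages and angles for hypothetical country totals. It shows why two similar countries are hard to tell apart without labels, since 45.0 and 48.6 degrees look almost identical. The sales figures are invented for illustration.

```python
# Hypothetical SUM(Sales Amount) per country, as the pie chart would aggregate it.
sales_by_country = {
    "Australia": 9000.0,
    "France": 2500.0,
    "Germany": 2700.0,
    "United States": 5800.0,
}

total = sum(sales_by_country.values())
slices = {
    # (percentage of total, slice angle in degrees)
    country: (amount / total * 100, amount / total * 360)
    for country, amount in sales_by_country.items()
}
for country, (pct, angle) in slices.items():
    print(f"{country:<14} {pct:5.1f}%  {angle:6.1f} deg")
```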

Adding Data labels to Pie Chart in Tableau

To add data labels to a pie chart, drag and drop the label values from the Dimensions or
Measures pane to the Label option on the Marks card.

In this example, we want to display the Sales Amount as data labels, so drag and drop Sales
Amount from the Measures region to the Label option.
Tableau allows us to add multiple measure values as data labels. This can be very useful when
we want to compare, for example, total sales against profits by region. To do this, use the same
technique to place the Yearly Income measure on the Label option as well.
Formatting Pie Charts in Tableau
One of the most common questions developers raise is how to format charts. This is
because the default colors or the default pie chart palette may not be attractive to the end user.

To change them, select the Edit Colors... option from the context menu as shown below.
Once you select the Edit Colors... option, a new window called Edit Colors opens, where you can
select the color palette for English Country Region Name. For demonstration purposes, we are
selecting Color Blind 10 as shown below.
Click Apply and then OK to finish.
Create a Pie Chart in Tableau Approach 2
This method is very easy and straightforward. First, select the Pie option from the drop-down
list on the Marks card.
Next, drag and drop Sales Amount from the Measures region to the field region as shown below.
Since it is a measure, Sales Amount will be aggregated using the default SUM.

Next, drag and drop English Country Region Name from the Dimensions region to the Color
option on the Marks card as shown below.
Once you drag them, the following screenshot will be displayed. As you can see, it is
easy to create a pie chart this way.

Remember, Tableau allows us to add both dimensions and measures as data labels. To
demonstrate this, we placed the Sales Amount measure and English Country Region Name
on the Label option on the Marks card.
NOTE: Pie charts are best used on high-level data. For instance, if you use the
same pie chart for state-wise sales rather than country-wise sales, you will end up with the following
screenshot. If you look closely, we cannot even identify some of the regions.
Tableau Histogram
A Tableau histogram is useful for visualizing statistical information that is organized into
user-specified ranges (bins). Although it looks like a bar chart, a histogram displays data in equal intervals.

Creating Tableau Histogram

First, drag and drop Sales Amount from the Measures region to the Text field in the Marks card,
and select the Histogram option from the Show Me window, as shown below.
Once you select the Histogram option, a new bin is created on Sales Amount. The histogram
automatically adds the Sales Amount bin to the Columns shelf and the Sales Amount measure
to the Rows shelf. In the screenshot below, the default bin size is 500, but you can change
the range value by editing it. Please refer to the Tableau Bins article to understand how to
create and edit bins in Tableau.
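As a rough illustration of what a bin does, the Python sketch below assigns each value to an equal-width bin (size 500, matching the default mentioned above) by flooring value / size, then records both a count and a sum per bin. The helper name `bin_histogram` and the sample values are hypothetical, and Tableau's own bin computation may differ in edge cases.

```python
import math
from collections import defaultdict


def bin_histogram(values, bin_size=500):
    """Group values into equal-width bins; return {bin_start: (count, total)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for v in values:
        start = math.floor(v / bin_size) * bin_size  # lower edge of the bin
        counts[start] += 1    # what a Count aggregation would show
        totals[start] += v    # what a Sum aggregation would show
    return {b: (counts[b], totals[b]) for b in sorted(counts)}


# Hypothetical sales amounts, for illustration only.
sales = [120.0, 480.0, 510.0, 999.0, 1500.0]
print(bin_histogram(sales))
```

The count in each tuple corresponds to the default Count aggregation described next, and the total corresponds to switching the measure to Sum.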
In the screenshot above, you can observe that Sales Amount on the Rows shelf is aggregated by
Count rather than Sum. To change the aggregate function, click the down arrow beside the
measure and change it from Count to Sum, as shown below.
Once you select the Sum option, the histogram displays the Sum of Sales Amount against the
Sales Amount bin.
In this example, our task is to display the Percent of Total. To do that, select the
Quick Table Calculation option from the drop-down menu and then select Percent of
Total, as shown below.
Once you are done, the histogram displays the percent of total Sales Amount against the
Sales Amount bin.
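The Percent of Total calculation divides each bin's sum by the grand total across all bins. A minimal Python sketch, assuming the per-bin sums are already computed (the function name and the numbers are hypothetical):

```python
def percent_of_total(bin_totals):
    """Return each bin's share of the grand total, in percent."""
    grand_total = sum(bin_totals.values())
    return {b: 100.0 * t / grand_total for b, t in bin_totals.items()}


# Hypothetical SUM(Sales Amount) per bin, keyed by bin start.
sums = {0: 600.0, 500: 1509.0, 1500: 1500.0}
shares = percent_of_total(sums)
print(shares)
```

The shares always add up to 100, which is a quick sanity check when reading the histogram.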
How to edit Tableau Histogram Table Calculation
If you want to edit the table calculation, select Edit Table Calculation... from the drop-down
menu.
Once you select the Edit Table Calculation... option, a new window called Table Calculation
opens where you can change the calculation type. From its drop-down list, select the
required calculation.
Let's understand the different regions present in the report design.

1. Data: Displays the list of currently connected data sources. We have only one at
this time; otherwise, it would display all the available data sources.

2. Dimensions: Columns with string data will be placed under the Dimensions section

3. Measures: Columns with Numeric data or Metric values will be placed under the
Measures section

4. This is the region where we design our Tableau reports by dragging Measures and
Dimensions
5. Columns or Rows: This is the region where we place the Measures and Dimensions as
per our requirement.

6. Marks: Using this section, we can change the color, size and displayed metric values. We can
also add a tooltip to the Tableau report.

7. Filters: Tableau reports will be filtered by the fields placed in this region.

8. Pages: The Tableau report preview is divided into different pages using the field placed
on this Pages card.

9. Show Me: This region will show the Visualizations or Charts available for the given data

10. Data Source: This tab is used to create a new data source or edit an existing one.

11. Sheet 1: This is the sheet we are currently working on

12. This button is used to create a new sheet.

13. This button is used to create a new dashboard.

14. This button is used to create a new story.


Data Labels in Tableau Reports
Data labels in Tableau reports, or in any other business intelligence reports, play a vital role
in understanding the report data. For example, by looking at a bar chart or pie chart we can
easily see which country's sales are greater than another's, but we can't see how much each
country sold (as a number). In these situations, we can enable data labels on the country
region.

Tableau Sort
In Tableau, sorting is the process of arranging the data in ascending or descending order.
For example, when we show a products-by-region report, it is helpful to show the products in
sorted order.

Tableau Sort Method 3


Right-click the dimension on which you want to perform sorting, and select the Sort... option
from the context menu.
Once you click the Sort... option, a new Sort window opens where you can configure the sorting
options. By default, data is sorted in the order specified in the data source, but we can
change this using the options in the Sort by section.

If you select Sort by Field, you have to select the field name from the drop-down list.
Remember, if you are viewing Sales Amount data in the Tableau report, you should select the
same field here; otherwise, the sorting results may look strange.
If you select the Sort by option Manual, you can change the order manually using the Up and
Down buttons.
For now, we selected the Sort by option Field and chose Sales Amount in descending order.
Once you click OK, the Color (group) data is sorted by its sales amount in descending
order, as shown below.
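Sorting by a field amounts to ordering the members by an aggregated measure value. A minimal Python sketch of that idea, using hypothetical group names and totals:

```python
def sort_by_measure(totals, descending=True):
    """Return (member, value) pairs ordered by value, like Sort by Field."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical sales totals per group, for illustration only.
sales_by_group = {"Group A": 120.0, "Group B": 450.0, "Group C": 300.0}
print(sort_by_measure(sales_by_group))
```

Sorting on a different field from the one being displayed would order the members by a value the reader cannot see, which is why the results "may look strange".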
Until now, Tableau sort looks nice and easy to implement, because we are performing sort
operations on Color (group), which lies inside English Country Region Name. Let's see
what happens if we perform sorting on the Country column.
What is QlikView?
QlikView is a business discovery platform that provides self-service BI for all business users in
organizations. With QlikView you can analyze data and use your data discoveries to support
decision making. QlikView lets you ask and answer your own questions and follow your own
paths to insight. QlikView enables you and your colleagues to reach decisions collaboratively.

QlikView compresses data and holds it in memory, where it is available for
immediate exploration by multiple users. For data sets too large to fit in memory,
QlikView connects directly to the data source. QlikView delivers an associative
experience across all the data used for analysis, regardless of where it is stored. You
can start anywhere and go anywhere; you are not limited to predefined drill paths
or preconfigured dashboards.
Features of QlikView
QlikView has patented technology that gives it many features useful for quickly creating
advanced reports from multiple data sources. Below is a list of features that make
QlikView unique.

Data Association is maintained automatically - QlikView automatically recognizes the
relationship between each piece of data present in a dataset. Users need not
preconfigure the relationships between different data entities.

Data is held in memory for multiple users, for a super-fast user experience - The
structure, data and calculations of a report are all held in the memory (RAM) of the
server.

Aggregations are calculated on the fly as needed - As the data is held in memory,
calculations are done on the fly. There is no need to store precalculated aggregate values.

Data is compressed to 10% of its original size - QlikView makes heavy use of a data dictionary
and keeps in memory only the essential bits of data required for analysis. Hence it compresses
the original data to a very small size.
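The general idea behind dictionary-based compression can be sketched in Python: each distinct value is stored once in a symbol table, and each row keeps only a small integer code that needs just enough bits to address the table. This illustrates the technique only; it is an assumption-level sketch, not QlikView's internal storage format, which the text does not describe. The function names are hypothetical.

```python
def dictionary_encode(column):
    """Return (symbols, codes, bits_per_code) for a column of values."""
    symbols = []   # each distinct value stored once
    index = {}     # value -> position in symbols
    codes = []     # one small integer per row
    for value in column:
        if value not in index:
            index[value] = len(symbols)
            symbols.append(value)
        codes.append(index[value])
    # Bits needed to address every symbol (at least 1).
    bits = max(1, (len(symbols) - 1).bit_length())
    return symbols, codes, bits


def decode(symbols, codes):
    """Rebuild the original column from the symbol table and codes."""
    return [symbols[c] for c in codes]


column = ["Food", "Food", "Sporting Goods", "Food", "Outdoor"]
symbols, codes, bits = dictionary_encode(column)
print(symbols, codes, bits)
```

Repeated long strings shrink to a few bits per row, which is where most of the size reduction comes from in columns with few distinct values.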

Visual relationship using colors - The relationship between data is shown not by arrows
or lines but by colors. Selecting a piece of data gives one color to the related data
and another color to the unrelated data.
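The associative behaviour behind these colors can be sketched as a set computation: after a selection in one field, the values of another field that co-occur with it in some row are "related", and the rest are "unrelated". A minimal Python sketch with hypothetical field names and rows; it models the idea, not QlikView's engine.

```python
def associated_values(rows, select_field, selected, target_field):
    """Split target_field values into (related, unrelated) for a selection."""
    related = {r[target_field] for r in rows if r[select_field] in selected}
    all_values = {r[target_field] for r in rows}
    return related, all_values - related


# Hypothetical rows, for illustration only.
rows = [
    {"Country": "US", "Product": "Bikes"},
    {"Country": "US", "Product": "Helmets"},
    {"Country": "FR", "Product": "Bikes"},
    {"Country": "FR", "Product": "Gloves"},
]
related, unrelated = associated_values(rows, "Country", {"US"}, "Product")
print(related, unrelated)
```

A front end would then paint `related` in one color and `unrelated` in another.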

Direct and Indirect searches - Instead of typing the exact value they are looking for,
users can enter some related data and get the exact result because of the data association.
Of course, they can also search for a value directly.

QlikView accepts an Excel spreadsheet for data analysis through simple drag and drop. Open
the QlikView main window and drag and drop the Excel file into the interface. QlikView
automatically creates a sheet showing the Excel data.

Select the Excel file

Keep the main window of QlikView open and browse for the Excel file you want to use.

Select a data source

When you drop the Excel file into the main window, the File Wizard appears. The File Type is
already chosen as Excel. Under Labels, choose Embedded Labels. Click "Next step" to proceed.
Load Script
The load script appears, showing the command that loads the data into the QlikView
document. This command can be edited.
Now the wizard prompts you to save the file with the *.qvw file extension and asks you to
select a location for it. Click "Next step" to proceed. Now it is time to see the data loaded
from the Excel file. We use a Table Box sheet object to display this data.
Create Table Box
The Table Box is a sheet object to display the available data as a table. It is invoked from the
menu Layout -> New Sheet Object -> Table Box.

On clicking Next, we get the option to choose the fields to show in the Table Box. You can use
the Promote or Demote buttons to rearrange the fields.
Table Box Data
On completing the above step the Table Box Sheet Object appears which shows the data that is
read from the Excel file.
Any data analysis involves a lot of calculations. In Tableau the calculation editor is used to apply
calculations to the fields being analyzed. Tableau has a number of inbuilt functions which help in
creating expressions for complex calculations.

The different categories of functions are described below.

Number Functions

String Functions

Date Functions

Logical Functions

Aggregate Functions

Number Functions
These functions are used for numeric calculations and take only numbers as inputs. Below
are some examples of important number functions.

Function | Description | Example
CEILING(number) | Rounds a number up to the nearest integer of equal or greater value. | CEILING(2.145) = 3
POWER(number, power) | Raises the number to the specified power. | POWER(5,3) = 125
ROUND(number, [decimals]) | Rounds a number to the specified number of decimal places. | ROUND(3.14152,2) = 3.14
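To make the behaviour of these number functions concrete, here is a Python sketch that reproduces the examples in the table. The function names are hypothetical, and the rounding of exact halves (here: away from zero) is an assumption, since the table does not say how Tableau breaks ties.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def tableau_ceiling(number):
    """Smallest integer greater than or equal to number."""
    return math.ceil(number)


def tableau_power(number, power):
    """number raised to power."""
    return number ** power


def tableau_round(number, decimals=0):
    """Round to `decimals` places; ties go away from zero (assumption)."""
    quantum = Decimal(1).scaleb(-decimals)  # e.g. decimals=2 -> Decimal('0.01')
    return float(Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP))


print(tableau_ceiling(2.145), tableau_power(5, 3), tableau_round(3.14152, 2))
```

Using `Decimal` avoids Python's built-in `round`, which rounds halves to the nearest even number.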

String Functions
String functions are used for string manipulation. Below are some important string functions
with examples.

Function | Description | Example
LEN(string) | Returns the length of the string. | LEN("Tableau") = 7
LTRIM(string) | Returns the string with any leading spaces removed. | LTRIM(" Tableau ") = "Tableau "
REPLACE(string, substring, replacement) | Searches string for substring and replaces it with replacement. If substring is not found, the string is not changed. | REPLACE("GreenBlueGreen", "Blue", "Red") = "GreenRedGreen"
UPPER(string) | Returns string, with all characters uppercase. | UPPER("Tableau") = "TABLEAU"
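The same string operations map directly onto Python's built-in `str` methods. This sketch (hypothetical function names) reproduces the table's examples; note that LTRIM removes only leading spaces, so the trailing space in " Tableau " survives.

```python
def tableau_len(s):
    return len(s)


def tableau_ltrim(s):
    # Strip leading space characters only; trailing spaces are kept.
    return s.lstrip(" ")


def tableau_replace(s, substring, replacement):
    # If substring is absent, str.replace returns the string unchanged.
    return s.replace(substring, replacement)


def tableau_upper(s):
    return s.upper()


print(repr(tableau_ltrim(" Tableau ")))
```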

Date Functions
Tableau has a variety of date functions for calculations involving dates. Many of them take a
date_part argument, a string naming the part of the date to use, such as 'month', 'day' or
'year'. Below are examples of some important date functions.

Function | Description | Example
DATEADD(date_part, increment, date) | Returns the specified increment added to date. The type of increment is specified in date_part. | DATEADD('month', 3, #2004-04-15#) = 2004-07-15 12:00:00 AM
DATENAME(date_part, date, [start_of_week]) | Returns date_part of date as a string. The start_of_week parameter is optional. | DATENAME('month', #2004-04-15#) = "April"
DAY(date) | Returns the day of the given date as an integer. | DAY(#2004-04-12#) = 12
NOW() | Returns the current date and time. | NOW() = 2004-04-15 1:08:21 PM
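A Python sketch of the month-based DATEADD and DATENAME examples, using only the standard library. The function names are hypothetical. When the target month is shorter (for example, adding one month to January 31), this sketch clamps to the last day of the month; that choice is an assumption, since the text does not cover Tableau's behaviour for such dates.

```python
import calendar
from datetime import date, datetime


def dateadd_month(d, months):
    """Add a number of months to a date, clamping the day to the month's end."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)


def datename_month(d):
    """Month name as a string (uses the current locale's month names)."""
    return calendar.month_name[d.month]


def day(d):
    return d.day


def now():
    return datetime.now()


print(dateadd_month(date(2004, 4, 15), 3), datename_month(date(2004, 4, 15)), day(date(2004, 4, 12)))
```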

Logical Functions
These functions evaluate a single value or the result of an expression. Some, such as ISDATE,
return a Boolean; others, such as IFNULL and MIN, return a value rather than a Boolean.

Function | Description | Example
IFNULL(expression1, expression2) | Returns the first expression if the result is not null, and returns the second expression if it is null. | IFNULL([Sales], 0) = [Sales]
ISDATE(string) | Returns TRUE if the string argument can be converted to a date and FALSE if it cannot. | ISDATE("11/05/98") = TRUE; ISDATE("14/05/98") = FALSE (assuming a month/day/year date format)
MIN(expression) or MIN(expression1, expression2) | Returns the minimum of an expression across all records, or the minimum of two expressions for each record. |
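IFNULL and ISDATE can be sketched in Python as follows, with `None` standing in for a null value. The month/day/year format string is an assumption that matches the ISDATE examples above ("14/05/98" fails because 14 is not a valid month); the function names are hypothetical.

```python
from datetime import datetime


def ifnull(expression1, expression2):
    """Return expression1 unless it is null (None), else expression2."""
    return expression1 if expression1 is not None else expression2


def isdate(text, fmt="%m/%d/%y"):
    """True if text parses as a date in the given (assumed) format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(ifnull(None, 0), isdate("11/05/98"), isdate("14/05/98"))
```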

Aggregate Functions

Function | Description
AVG(expression) | Returns the average of all the values in the expression. AVG can be used with numeric fields only. Null values are ignored.
COUNT(expression) | Returns the number of items in a group. Null values are not counted.
MEDIAN(expression) | Returns the median of an expression across all records. MEDIAN can only be used with numeric fields. Null values are ignored.
STDEV(expression) | Returns the statistical standard deviation of all values in the given expression, based on a sample of the population.
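The null-handling rules in this table can be illustrated with Python's `statistics` module, treating `None` as null. This is a sketch with hypothetical function names and sample values; STDEV uses the sample formula (dividing by n - 1), matching "based on a sample of the population".

```python
import statistics


def non_null(values):
    """Drop nulls (None), as these aggregates ignore them."""
    return [v for v in values if v is not None]


def agg_avg(values):
    return statistics.mean(non_null(values))


def agg_count(values):
    return len(non_null(values))


def agg_median(values):
    return statistics.median(non_null(values))


def agg_stdev(values):
    return statistics.stdev(non_null(values))  # sample standard deviation


# Hypothetical column with one null value.
values = [2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(agg_count(values), agg_avg(values), agg_median(values), agg_stdev(values))
```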

Tableau - Dashboard
A dashboard is a consolidated display of many worksheets and related information in a single
place. It is used to compare and monitor a variety of data simultaneously. The different data
views are displayed all at once. Dashboards are shown as tabs at the bottom of the workbook and
they usually get updated with the most recent data from the data source. While creating a
dashboard, we can add views from any worksheet in the workbook along with many supporting
objects such as text areas, web pages, and images.
Each view you add to the dashboard is connected to its corresponding worksheet. So when you
modify the worksheet, the dashboard is updated and when you modify the view in the dashboard,
the worksheet is updated.

Creating a Dashboard
Using Sample - Superstore, let's create a dashboard showing the sales and profits for
different segments and sub-categories of products across all the states. To achieve this
objective, we follow the steps below.

Step-1
Create a blank worksheet using the add worksheet icon at the bottom of the
workbook. Drag the dimension Segment to the Columns shelf and the dimension Sub-Category to
the Rows shelf. Drag and drop the measure Sales to the Color shelf and the measure Profit to the
Size shelf. This worksheet is referred to as the master worksheet. Right-click and rename this
worksheet as Sales_Profit. The chart below appears.
Step-2
Next, we create another sheet to hold the details of Sales across the states. For this, we drag
the dimension State to the Rows shelf and the measure Sales to the Columns shelf. Next, we
sort the State field so that Sales appears in descending order. Right-click and rename
this worksheet as Sales_state. Follow the diagram below to create this sheet.
Step-3
Next we create a blank dashboard by clicking on the create new dashboard link at the bottom of
the workbook. Right click and rename the dashboard as Profit-Dashboard.
Step-4
Drag the two worksheets onto the dashboard. Near the top border of the Sales_Profit worksheet,
you can see three small icons. Click the middle one, which shows the prompt Use as filter when
you hover the mouse over it.
Step-5
Now, in the dashboard, click the box representing the Sub-Category Machines and the Segment
Consumer.

You can notice that the right pane, Sales_state, is filtered to show only the states where
sales happened for this combination. This illustrates how the sheets are linked in a
dashboard.
Forecasting is about predicting the future value of a measure. There are many mathematical models
for forecasting; Tableau uses the model known as exponential smoothing. In exponential
smoothing, recent observations are given relatively more weight than older observations. These
models capture the evolving trend or seasonality of the data and extrapolate them into the future.
The result of a forecast can also become a field in the visualization.

Tableau takes one time dimension and one measure field to create a forecast.
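To show how exponential smoothing weights recent observations more heavily, here is the simplest (level-only) form in Python. The models Tableau fits can also include trend and seasonality, as described above; this sketch deliberately omits both and is not Tableau's implementation. The function name and the smoothing factor `alpha` are illustrative.

```python
def exponential_smoothing_forecast(series, alpha, horizon):
    """Simple exponential smoothing: return a flat forecast for `horizon` periods."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # New level mixes the latest observation with the previous level,
        # so older observations lose weight by a factor of (1 - alpha) per step.
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Hypothetical yearly sales, forecast two periods ahead.
print(exponential_smoothing_forecast([10.0, 20.0, 30.0], alpha=0.5, horizon=2))
```

A larger `alpha` reacts faster to recent changes; a smaller one produces a smoother, slower-moving forecast.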

Creating a Forecast
Using Sample - Superstore, let's forecast the value of the measure Sales for the next year. To
achieve this objective, we follow the steps below.

Step-1
Create a line chart with Order Date (Year) in the columns shelf and Sales in the Rows shelf. Go
to the Analysis tab as shown below and click on Forecast under Model.
Step-2
On completing the above step, we get various options for the forecast. We choose a
Forecast Length of 2 years and leave the Forecast Model as Automatic, as shown below.
On completing the above steps, we get the final forecast result shown below.
Describe Forecast
We can also get the finer details of the forecast model by choosing the Describe Forecast
option, which appears when we right-click the forecast diagram shown above.
A worksheet in the Tableau screen is the area where you create the views for data analysis. By default,
Tableau provides three blank worksheets when you have established a connection to a data source.
We can keep adding worksheets to look at different data views in the same screen, one
after another.

Adding a worksheet
We can add a worksheet in two ways: right-click the name of the current worksheet and
choose New Worksheet from the pop-up menu, or click the small icon to the right of the last
sheet name.
Trend lines are used to predict the continuation of a certain trend in a variable. They also help
identify the correlation between two variables by observing the trend in both of them simultaneously.
There are many mathematical models for establishing trend lines. Tableau gives us four options:
Linear, Logarithmic, Exponential and Polynomial. We will look into the linear model in
this chapter.

Tableau takes one time dimension and one measure field to create a Trend Line.

Creating Trend Line

Using Sample - Superstore, let's find the trend for the value of the measure Sales for the next
year. To achieve this objective, we follow the steps below.
Step-1
Drag the dimension Order Date to the Columns shelf and the measure Sales to the Rows shelf.
Choose Line as the chart type. In the Analysis menu, go to Model -> Trend Line. Clicking
it brings up a pop-up showing the different types of trend lines that can be added. We choose the
linear model, as shown below.

Step-2
On finishing the above step, we get the trend lines. Tableau also shows the mathematical expression
for the correlation between the fields, the P-value and the R-squared value.
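A linear trend line is an ordinary least-squares fit, and R-squared measures how much of the variation it explains. The Python sketch below computes the slope, intercept and R-squared for paired values; the P-value is omitted to keep it short. The function name and data are hypothetical.

```python
def linear_trend(xs, ys):
    """Least-squares fit y = intercept + slope * x; return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


print(linear_trend([1, 2, 3, 4], [3, 5, 7, 9]))
```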
Filtering is the process of removing certain values or ranges of values from a result
set.
There are three types of basic filters available in Tableau. They are as follows:

Filter Dimensions are the filters applied on the dimension fields.

Filter Measures are the filters applied on the measure fields.

Filter Dates are the filters applied on the date fields.


Filter Dimensions
These filters are applied on dimension fields. Typical examples include filtering based on
categories of text, or on numeric values using logical expressions with greater-than or
less-than conditions.

Example
We use the Sample - Superstore data source to apply dimension filters on the sub-category of
products. We create a view showing the profit for each sub-category of products according to
its shipping mode. For this, we drag the dimension field Sub-Category to the Rows shelf and
the measure field Profit to the Columns shelf.
Next, drag the Sub-Category dimension to the Filters shelf to open the Filter dialog box. Click the
None button at the bottom of the list to deselect all sub-categories. Then select the Exclude option in
the lower right corner of the dialog box. Finally, select Labels and Storage and click OK.
The picture below shows the result with these two sub-categories excluded.
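An Exclude filter on a dimension keeps every row whose member is not in the chosen set. A minimal Python sketch, with hypothetical rows and profit values:

```python
def exclude_members(rows, field, excluded):
    """Keep only rows whose value in `field` is not in the excluded set."""
    return [row for row in rows if row[field] not in excluded]


# Hypothetical rows, for illustration only.
rows = [
    {"Sub-Category": "Labels", "Profit": 5.0},
    {"Sub-Category": "Storage", "Profit": -3.0},
    {"Sub-Category": "Phones", "Profit": 40.0},
]
kept = exclude_members(rows, "Sub-Category", {"Labels", "Storage"})
print(kept)
```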
Filter Measures
These filters are applied on measure fields. Filtering is based on the calculations applied to
the measure fields. So, while dimension filters use only values, measure filters use
calculations based on fields.

Example
We use the Sample - Superstore data source to apply measure filters on the average value of
profit. First, we create a view with Ship Mode and Sub-Category as dimensions and the average
of Profit, as shown below.
Next, we drag the AVG(Profit) value to the Filters shelf. Choose Average as the filter mode. Next,
choose "At least" and enter a value to keep the rows that meet this criterion.
On finishing the above steps, we get the final view below, showing only the sub-categories whose
average profit is at least 20.
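A measure filter set to "At least" works on the aggregated value: first average the measure per member, then keep members whose average is greater than or equal to the threshold. A minimal Python sketch with hypothetical data:

```python
from collections import defaultdict


def filter_by_average(rows, threshold):
    """Average the measure per key and keep keys whose average is at least threshold."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, profit in rows:
        sums[key] += profit
        counts[key] += 1
    averages = {k: sums[k] / counts[k] for k in sums}
    return {k: avg for k, avg in averages.items() if avg >= threshold}


# Hypothetical (sub-category, profit) rows, for illustration only.
rows = [("Copiers", 30.0), ("Copiers", 50.0), ("Tables", -10.0), ("Tables", 20.0), ("Phones", 20.0)]
print(filter_by_average(rows, 20))
```

Filtering happens after aggregation, so a sub-category can pass even if some of its individual rows fall below the threshold.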
Filter Dates
Tableau treats date fields in three different ways when applying a date filter. It can filter
by a relative date compared to today, by an absolute date, or by a range of dates. Each of
these options is presented when a date field is dragged to the Filters shelf.

Example
We choose the Sample - Superstore data source and create a view with Order Date on the Columns
shelf and Profit on the Rows shelf, as shown below.
Next, drag the Order Date field to the Filters shelf and choose Range of dates in the filter dialog
box. Choose the dates as shown below.
On clicking OK, the final view below appears, showing the result for the chosen range of dates.
QlikView has many built-in functions which are available to be applied to data that is already
available in memory. These functions are organised into many categories and the syntax of the
function appears as soon as it is selected. We can click on the paste button to get the expression
into the editor and supply the arguments.

Create Table Box

Create a Table Box by following the menu in the screen below.
On completing the above step, we get a window showing the calculation condition at the bottom left.
List of Functions
Click the button next to calculation condition and go to the Function tab. It shows the list of
available functions.
On choosing String from the function category, we see only the functions that take a
string as an argument.
QlikView aggregate functions are used to produce aggregate data from the rows of a table. The
functions are applied to the columns when creating the load script. Below is a sample list of
aggregate functions. We also need to apply the Group by clause appropriately when using
aggregate functions.

SUM gives the sum of the numeric values of the column.

AVG gives the average of the numeric values of the column.

MAX gives the maximum of the numeric values of the column.

MIN gives the minimum of the numeric values of the column.

Applying SUM() function

Below is the load script to find the sum of the sales quantity and sales value across the product
lines and product categories.

Click OK and press Control+R to reload the data into the QlikView document. Now follow the same
steps as in Creating Sheet Objects above to create a QlikView Table Box that displays the
result of the script, as shown below.
Applying AVG() function
Below is the load script to calculate the average sales quantity and sales value for each
Product Line.

// Average sales of Quantity and Value in each Product Line.
LOAD Product_Line,
     avg(Quantity),
     avg(Value)
FROM
[E:\Qlikview\data\product_sales.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq)
Group by Product_Line;
Click OK and press Control+R to reload the data into the QlikView document. Now follow the same
steps as in Creating Sheet Objects above to create a QlikView Table Box that displays the
result of the script, as shown below.
Applying MAX() & MIN() function
Below is the load script to calculate the maximum and minimum sales quantity for each
Product Line.

// Maximum and minimum sales quantity in each Product Line.
LOAD Product_Line,
     max(Quantity) as MaxQuantity,
     min(Quantity) as MinQuantity
FROM
[E:\Qlikview\data\product_sales.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq)
Group by Product_Line;
Click OK and press Control+R to reload the data into the QlikView document. Now follow the same
steps as in Creating Sheet Objects above to create a QlikView Table Box that displays the
result of the script, as shown below.
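The effect of the Group by clause in the AVG and MAX/MIN load scripts above can be mimicked in Python: rows are grouped by Product_Line and each group collapses to one row of aggregates. The field names follow the scripts; the sample CSV content is hypothetical, and this is only a model of the result, not how QlikView executes the script.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for product_sales.csv, for illustration only.
SAMPLE = """Product_Line,Quantity,Value
Food,10,100
Food,30,250
Sporting Goods,5,80
"""


def group_by_product_line(rows):
    """One output row per Product_Line with avg, max and min of Quantity."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Product_Line"]].append(float(row["Quantity"]))
    return {
        line: {
            "AvgQuantity": sum(q) / len(q),
            "MaxQuantity": max(q),
            "MinQuantity": min(q),
        }
        for line, q in groups.items()
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(group_by_product_line(rows))
```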
The Match() function in QlikView is used to match the value of a string or
expression with the data values present in a column. It is similar to the IN operator
in SQL. It is useful for fetching rows containing specific strings,
and it has an extension in the form of the wildmatch() function.
Load Script with Match() Function
The script below is the load script that reads the file named product_categories.csv. We
search the field Product_Line for values matching the strings 'Food' and 'Sporting Goods'.
Load Script with Wildmatch() Function
The wildmatch() function is an extension of the match() function in which we can use wildcards as
part of the strings that are matched against the values in the fields being searched. We
search for the strings 'Off*' and '*ome*'.
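The matching rules can be sketched in Python: match() returns the 1-based position of the first candidate equal to the value (0 if none), and wildmatch() does the same with * (any run of characters) and ? (one character) wildcards. The sketch assumes match() compares case-sensitively and wildmatch() case-insensitively; the function names mirror QlikView's only for readability.

```python
import re


def match(value, *candidates):
    """1-based index of the first exact (case-sensitive) match, else 0."""
    for i, candidate in enumerate(candidates, start=1):
        if value == candidate:
            return i
    return 0


def wildmatch(value, *patterns):
    """1-based index of the first case-insensitive wildcard match, else 0."""
    for i, pattern in enumerate(patterns, start=1):
        # Translate * and ? into regex; escape every other character literally.
        regex = "".join(
            ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
        )
        if re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL):
            return i
    return 0


print(match("Food", "Food", "Sporting Goods"), wildmatch("Home Goods", "Off*", "*ome*"))
```

In a load script, a non-zero result is what makes a WHERE condition keep the row.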
Creating Table Box (Sheet Object)
For the above data, let's create a Table Box that shows the data in tabular form. Go to the
menu Layout -> New Sheet Object -> Table Box and choose the columns as shown below.
Click Apply and then OK to finish creating the Table Box. The screen below appears.
Using the Quick Chart Wizard
To start creating a bar chart, we use the Quick Chart Wizard. Clicking it opens the screen below,
which prompts for the chart type. Choose Bar Chart and click Next.
Choose the chart dimension
Choose Product Line as the First Dimension.
Choose the Chart Expression
The chart expression is used to apply functions like Sum, Average or Count to fields
with numeric values. We will apply the Sum function to the field named Value. Click Next.
Choose the Chart Format
The Chart format defines the style and orientation of the chart. We choose the first option in each
category. Click Next.
The Bar Chart
The Bar chart appears as shown below. It shows the height of the field value for different product
lines.
A pie chart is a representation of values as slices of a circle with different colors.
The slices are labeled, and the numbers corresponding to each slice are also
shown in the chart. QlikView creates a pie chart using the chart wizard or the Chart
sheet object.
Using the Quick Chart Wizard
To start creating a pie chart, we use the Quick Chart Wizard. Clicking it opens the screen below,
which prompts for the chart type. Choose Pie Chart and click Next.
Choose the chart dimension
Choose Product Line as the First Dimension.
Choose the Chart Expression
The chart expression is used to apply functions like Sum, Average or Count to fields
with numeric values. We will apply the Sum function to the field named Value. Click Next.
Choose the Chart Format
The Chart format defines the style and orientation of the chart. We choose the third option. Click
Next.
The Pie Chart
The pie chart appears as shown below. It shows the share of the field value for different
product lines.
