How to Blend Different Data sources in a Single Worksheet
Blending Different Data sources in a Single Worksheet
Beginning with Tableau 6.0, you can use multiple data sources for a single worksheet. One data source is primary; all others are secondary. For the primary resource, Tableau selects the data source of the first field you add to the view. Or, if one of your data sources is a cube, it must serve as the primary data source. The primary data source creates relationships with fields of the same name and data type that it finds in the other data sources.
If you need to use data from more than one data source, Tableau provides a solution which does not require building a middle-layer data repository. As long as the disparate data sources have at least one common field, Tableau facilitates using both sources via data blending.
How to Create a Data Blend
When your analysis requires data from multiple data sources, if one data source is large and takes a long time to query, using a join to link this table with another table can dramatically increase the query time.
One approach to reducing query time is by using data blending to blend only on the summary data of the larger table with the data of the smaller table.
Creating data blends requires a little planning. If you are going to bring data that doesn’t reside in your primary data source, you have to think about what field(s) you may need in order to achieve the desired result. There are two ways you can create a blend— the automatic method or manually defining the blend.
Tableau automatically recognizes when a field from the primary data source also exists in a secondary data source. These fields are marked with a gray broken link icon in the Data pane to indicate that they are potential linking fields. When you click the broken link icon, the link becomes orange and is connected. An automatic relationship is created between the two data sources. The relationship allows you to blend data from both data sources on a single sheet. You must have a linked field in order to use the data from the secondary data source.
You can modify the automatic relationships or create new custom relationships by selecting Data > Edit Relationships.
The automatic method works well if the field you are employing to create the blend has the same field name in both data sources, or if you alias the field names in Tableau so that they match. The Superstore datasource contains geographic sales data. The Superstore data set doesn’t include population data. But, the United States Census Bureau website has population data.
Population data :
Just two columns of data are included in the table. It is important to note the field description for the state. Once again— for automatic blending to work— the field name for the blend must be the same, in Superstore and the census data file. If the fields are not the same you will need to edit the name in the spreadsheet or rename the fields in Tableau so that they match.
To automatically blend the population data with the Superstore data builds a view in Tableau that contains the state field.
Superstore is the primary data source. The bar chart is filtered for the desired year. The population data is from a completely different data source, but both data sources include the word State. Automatic blending can now be done by pointing at the population data spreadsheet and dragging it into the worksheet shown in below diagram. Once that is done, the data from the population spreadsheet can be used in the workbook. The visualization uses the blended population data to express sales per hundred thousand population by state.
Look at the data window in the upper left of below diagram. The blue check next to the Superstore data source indicates that it is the primary data source. The orange check next to the population data denotes it is the secondary data source. Since the secondary source is highlighted, you see its dimensions and measure fields below. The orange border on the left side of the dimensions and measures shelves confirms that they come from the secondary data source and the orange link to the right of the State field indicates that the field is used for the blend.
Sales by state :
Population data blended with Superstore :
When you perform data blending you must ensure that all of the records you expected to blend actually must come into the data set. The states of Massachusetts (MA) and Missouri (MO) didn’t come over in the blend because the state names in the census data are not abbreviated. This can be fixed by right-clicking on the abbreviated state label for Missouri and Massachusetts and aliasing full spelling of each state name. After that is done, the population data from those states will be blended as well.
This is an important point with data blending. As a “designer” you must ensure the integrity of the data blend. The whole point in doing this exercise is to use the blended data to calculate per capita sales by state.
To save space, below diagram shows only the top seven states by per capita sales. The labels to the right of each bar show the sales per hundred thousand people. The color of each bar encodes the total sales of each state.
A scenario that requires a more complicated blend would be the comparison of the budget data from a spreadsheet with actual data from a database. Assume that you have defined a budget by product category for each month in the year 2012, and that you want to create a visualization that will display the actual and the budgeted sales by month. Building this view will require a blend on the product category and the date field. The steps required are:
- Connect to both data sources.
- Use the edit relationship meant to define the blend.
- Build the visualization.
Bar chart using blended data :
After connecting to the Superstore data set and the spreadsheet containing budgeted sales, it is possible to define the blend manually. The blending must include both the product category field and a date field. In this example, month and year are used.
Actual sales data from the primary data source (the orders table in Superstore) is displayed as blue or gray bars. Budgeted data from the secondary data source is plotted using vertical black reference lines for each cell. Notice the two orange links in the dimension shelf for the budget data source. Both fields are being used in the blend.
Bullet graph using blended data :
A bullet graph is a variation of a bar graph developed to replace dashboard gauges and meters. The bullet graph is generally used to compare a primary measure to one or more other measures in the context of qualitative ranges of performance such as poor, satisfactory, and good. You can create a bullet graph by adding two reference lines: a distribution to indicate the qualitative ranges of performance, and a line to indicate the target. Reference distributions can also be used to create bullet graphs.
The edit relationships menu :
Select the Edit Relationships option from the data menu. This exposes the relationships window. By default, the automatic radio button option will be selected. Product category will appear automatically because that field name exists in both data sources. Since the view contains sales data by month and year for the year 2012, the custom option must be used to select the date field . Clicking the Add button exposes the add/ edit field mapping window where the specific data aggregation can be selected from each data source. Clicking the OK button creates the second link. Confirming that, in the relationship window, finalizes the links for both fields.
The SUM (budget) pill (with the orange check mark on the marks card) was used to create the black reference line. The calculated field used to create the bar colors is displayed in the caption below the graph and is stored in the primary data source. Gray bars denote items above the budget plan. The gray color gradient behind the sales bars comes from a reference distribution that uses color hue to show sales at 60 percent, 80 percent, 100 percent, and 120 percent of planned sales.