How to Join Database Tables with Tableau
Join Database Tables with Tableau
When trying to determine whether to join tables or use data blending, consider some of the following:
- Where your data is coming from
- How many connections you have in your workbook
- The number of records you have in the data
Most Tableau users aren’t database experts. This blog post introduces a fundamental database concept— joining tables. Seldom your data source includes every bit of information you need in a single table. Even if you normally connect to Excel it may be advantageous to use related data from more than one tab. As long as the data resides in a single spreadsheet or database and each table includes unique identifiers that tie the tables or tabs together, you can perform joins of these tables within Tableau. These identifiers are called Key Records.
Database joins can be complex, but the basic principle is to bring together related information in your view. In Tableau , you can define joins when you make your initial data connection or add them later. This example will use the Orders and Return tabs (tables) from the Superstore sample data set. Figure 2.8 shows portions of both tables. The Orders table includes billing information. The Returns tab includes the smaller returned order table.
Start by connecting to the spreadsheet as you would if you were going to connect to one table. In the Connection Menu under Step 2, select Multiple Tables and click the Add Table button to expose the Add Table menu. Then select the Returns table as seen in the figure
Super store orders and return tables :
Joining multiple tables :
While in the Add Table menu, ensure that the Returns table is highlighted and click the Join button. This will expose the menu in which you define the join type as shown in figure.
In the example, you see that the Left outer join type has been selected. If you preview the results you will see that the join will result in 9,426. Following these steps results in a left outer join between the Orders and Returns tables. Keep in mind that you can also join additional tables later, just by pointing at the data source on your data shelf, right-clicking, and selecting the Edit Tables option. Using different join types can result in different record counts, so it is important that you understand the different join types.
Joining tables in Tableau :
Many relational data sources are made up of a collection of tables that are related by specific fields (columns). In order to analyze two tables together, you can join the two tables using a common field. When you combine tables using a join, you can view and use the data from both tables in your analysis.
The Default Inner Join when you join two tables together, Tableau will default to the inner join type. The below diagram illustrates the inner join. An inner join returns data if a given value for the primary key exists in both tables. Unless you have the query pre-built, Tableau recommends that you use the canvas area on the Data Source page to create inner joins.
Using an inner join returns only records that match in both the left and right tables. In the Superstore example, this join type returns only ninety-eight records. It is a good practice when you join tables to know how many records are in each table. If you’re working with a spreadsheet you can look at each tab and note the total row counts in each. Remember to deduct the header from your row totals. Alternatively, as you are doing the join, utilize the preview buttons to check the row counts.
- If the relationship between the tables is one-to-many (1:n) or many-to-many (n:n), the measures might not appear to double in Tableau. This is a normal consequence of a join where the relationship is not always one-to-one.
- Count Distinct aggregation type can be helpful when building views after this type of join
(Measure > Count (Distinct)).
The Inner join :
When you complete the join, you can also drag the record count field into the view to see how many total records are available. You can have more than one join clause to ensure that the correct results are returned. If you’re a database expert, this won’t present any challenge. If you are like many Tableau users you are probably not accustomed to creating joins. If you run into problems, ask for help from a database expert.
The Left and Right Join Types
Tableau provides two other join types via point and click options in the Join menu. These join types give priority to either the left table or the right in the set returned.
Pick the primary table first. In the previous example, the primary table is the Orders table so it is considered as the left table. The new table added in the join is the Returns table on the right. Selecting left gives priority to the original table. Selecting right gives priority to the new table. The below diagram represents left outer join type.
The Left outer join :
In the example, the left join returns every record in the orders table plus the matching records in the returns table. Earlier you saw that join generated over nine thousand records which are being returned. The right join gives priority to the right returns table in below diagram
The Right outer join :
Since there are fewer rows in the returns table the number of records will drop significantly and only include details from matching records in orders. If you preview results when using left and right joins you may see a lot of null fields in yellow. Or, if you check the record counts and place the key record that you use in the join on your row shelf, you will see the word null appear whenever a record exists in the primary table that is missing in the joined table.
In Superstore, a right join would result in 1,673 records being returned, but only 98 of those records will be matched to the orders table. The remaining 1,573 records will return null. These are the order records in the order table that have no matching record in the returns table.
Customizing Tableau’s Join Script
So far, all of the joins we’ve covered have been enabled via point and click menus. There may be times when your data needs require something other than the inner, left, or right outer join types. You may need to ask very specific questions of your data, and sometimes providing the answers might require a customized join.
For most relational data sources you can connect to a specific query rather than the entire data source. Often this can be useful when you know exactly the information you need and you understand how to write SQL queries. For those of you who have never written any SQL statements before, this may seem daunting. There are ways to edit Tableau’s default connection script to minimize manual coding. Using the same Superstore data, what if you needed to analyze the earliest sales date by the customer, but generalize that analysis by year? One possible way to express this is as in below diagram
Earliest initial sales over time :
The bar-in-bar chart shown in Figure 2.14 uses color to indicate the sales year in which customers purchased from the Superstore for the first time. The horizontal axis shows all sales by year. The dark gray bar denotes sales to customers who made their first purchase in 2010. The orange bar aggregates sales for customers who first purchased in 2012. Notice that the connection in the data window says Custom SQL. The data is coming from the Superstore data set, but the number of fields available has been restricted. Also, there is a dimension field called Original Order Date that does not exist in the Superstore data set. This was created by customizing the connection script.
You can let Tableau do some of the work by connecting to Superstore as you normally would, and then editing Tableau’s generated connection script. Below diagram shows the related menus.
Don’t try writing your SQL statements using the tiny space available in the workbook connection window. Instead, look at the small “…” button that is located in the Step 2 “selections to the right of the Custom SQL radio button”. Click on that to expose a more spacious editing window and manually edit the connection script there. Alternatively, you can copy the script into any text editor (Notepad or Notepad + +), make revisions, and paste the completed code back into Tableau. Below diagram shows larger view of Tableau’s custom SQL window.
The connection and custom SQL windows :
For most relational data sources you can connect to a specific query rather than the entire data source. Often this can be useful when you know exactly the information you need and you understand how to write SQL queries.
Editing the connection script :
The top select statement is the script generated by Tableau. Removing the lines related to fields that are not needed removes those fields from the result that is returned to Tableau. The customized script starts with the inner join statement and continues till the end. If you are interested in learning more about SQL scripting, refer to the reading references on the book’s website.