When to Use a Direct Connection or a Data Extract in Tableau
Knowing When to Use a Direct Connection or a Data Extract
Direct connections allow you to work with live data. When you extract data you import some or all of your data into the Tableau’s data engine. This is true in Tableau Desktop and Server. Which connection method is the best to use? There is no straightforward answer for this question. It is entirely dependent on your situation, requirements, and network resources.
Some General Things to Keep in Mind:
- If you hide fields in Tableau after creating an extract and before creating an incremental or full refresh, the fields will continue to be hidden but are not physically removed from the extract. The hidden fields can be unhidden at any time.
- Tableau Desktop uses the location pointed to by the system’s %TEMP% environment variable for all temporary files used in the process of creating extracts. Tableau Server uses the folder :\ProgramData\Tableau\Tableau Server\data\tabsvc\temp.
- Tableau data extracts attempt to retain the case sensitivity of the source data. For instance, string comparisons are by default case insensitive in SQL Server but case sensitive in Oracle. Extracts based on those data sources would match their respective case sensitivity for the purpose of making string comparisons.
- A user with the permission to schedule an extract refresh can change the type of the refresh from incremental to full. To prevent users from being able to schedule full extract refreshes during business hours, make sure to configure your schedules accordingly.
- A Tableau extract written to a contiguous area of free space will be more performant than one written in fragments across a hard drive. Regularly defragmenting a Tableau Server’s hard drive or using SSD drives are two ways to insure that extract files are contiguous to the extent possible.
The Flexibility of Direct Connections
Connecting to your data source with a direct connection means you are always visualizing the most up-to-date facts. If your database is being updated in real-time you only need to refresh the Tableau visualization via the F5 function key or by right-clicking on the data source in the data window and selecting the Refresh option.
If you connect to a massive data, the visualization is very dense, or your data is in a high-performance enterprise-class database; you may get faster response time with a direct connection . Choosing a direct connection doesn’t preclude the possibility of extracting the data later. You can also swap from an extract to a live connection by right-clicking the data source and un-checking the Use Extract option.
The Advantages of a Data Extract
Data extracts don’t have the advantage of real-time updating that a direct connection provides, but using Tableau’s data engine provides a number of benefits:
- Performance improvement : Data extraction not only offers increased performance when the underlying data source is unacceptably slow, but it also can speed up the performance when the use of Custom SQL slows it down.
- Additional functions
- Data portability : A TDE can be bundled with Tableau visualizations in a packaged workbook for easy sharing and collaboration.
Perhaps your primary database is already heavily loaded with requests. Using Tableau’s data engine enables you to split the load from your primary database server to the Tableau Server. The Tableau’s extract may be updated daily, weekly, or monthly during off-peak hours. The Tableau’s Server can also refresh extracts incrementally and in time intervals as low as fifteen minutes. In many cases, the small time consumed during the data extract update is more than offset by the performance gains.
There are several options available for creating an extract. First, you can aggregate the extract, which will roll up rows so that only the aggregation and fields used are updated for the visible dimensions and measures. Aggregating for Visible Dimensions when performing a data extract will reduce the amount of data that Tableau is importing. The appropriate level of fidelity is provided, but the size of the extract file is significantly reduced— making the extract file more portable as well as improves security.
Extracting incrementally also speeds refresh time because Tableau isn’t updating the entire extract file. It is adding only new records. To do incremental extracts you must specify a field to use as the index ; Tableau will only refresh the row if the index has changed, so you need to be aware that changes to a row of data that don’t change the index field to be excluded from the update.
Another way to speed extracts is to apply filters while extracting the data. If the analysis doesn’t require your entire data set you can filter the extract to include only the records required. If you have a very large data set you will rarely need to extract the entire contents of the database. For example, your database may include ten years of historical data but you may only require one year of history.
Once you have created an extract you may append another file. This may be a great alternative to custom SQL if you are considering a table UNION. This technique might be useful if you need to combine monthly data that is stored in separate tables.
If your data source is from a file (Excel, Access, text), doing an extract. It will add calculation functions (median and count distinct) that are not supported by the data source.
Extracts can be saved locally and used when the connection to your data source is not available. A direct connection doesn’t work if you don’t have access to your data source via a local network or the Internet. Perhaps you need to supply a dashboard to an executive that will be flying to a remote location. Providing that person with a data extract (.tde) file provides that person with a fully-functional, high-performance, datasource. Data extract files are also compressed and are normally much smaller than the host system database tables.
In enterprise environments, data governance is an important consideration. If you distribute many data extract files to field staff, keep in mind that you should consider the security of those files. Appropriate safeguards should be in place (non-disclosure agreements) before you provide these files to travelling or remote staff. Consider restricting what the extract includes via filters and aggregating to visible dimensions.
Using Tableau’s File Types Effectively
You can generate quite a few file types when using Tableau.
Tableau provides flexible options for the sharing of data and to design metadata. This is accomplished through a variety of file types:
- Tableau Workbook (. twb)
- Tableau Packaged Workbook (. twbx)
- Tableau Datasource (. tds)
- Tableau Bookmark (. twb)
- Tableau Data Extract (. tde)
You should see many of these files in your My Tableau repository folder which is normally located in My Documents. Data extract (. tde) files were covered in the previous section. Here, you will see how the other file types can be used.
TABLEAU WORKBOOK FILES
This file type is probably the most common that you will see and create when working with Tableau. It is in XML format (try editing it in a text editor) and contains all the information on each sheet and dashboard that is contained within your workbook. Information such as what fields are being used in each view and how measures are being aggregated, the formatting and styles applied and any other setup you’ve made to a sheet or dashboard (e.g. whether a quick filter is shown). It also includes data source connection information and any metadata you have created for that connection (see more on this below under .tds).
To create a .twb file, from Tableau Desktop, select File > Save
Tableau Workbook files (. twb ) are the main file type created by Tableau to save your entire workbook. These are normally small files because the only data they contain is the metadata related to your connection and the pill placements for rendering the views and dashboards in the workbook. What is not saved is the underlying data from the data source.
To clarify: A .twb file does not contain any of the actual data from the database. It contains the definition of how you wish to display the data. This means workbook files will normally be very small. But, if you want to share the workbook with other people you need to be certain that they have access rights to the database or that you also provide the data source with the workbook.
TABLEAU PACKAGED WORKBOOKS
Whilst a Tableau Workbook (.twb) as described above holds all the information, Tableau requires to draw your viz, it does not include the data itself. A Packaged Workbook however, combines the information in a workbook and bundles it with any local data – i.e. data that is not on a server. You can think of it as a zip file, and indeed if you rename the .twbx file as a .zip, you can open it with windows to see the .twb and the corresponding data files. A .twbx will also include any custom images, as well as any custom geocoding you may have used in your work.
The primary reason you would save your work as a Packaged Workbook is so that you can share it with other Tableau Desktop users, or for others to open using Tableau Reader.
To create a .twbx file, from Tableau Desktop, select File > Save As and then select the .twbx option from the dropdown menu at the bottom of the Save As dialogue box
If you want to share a workbook with a colleague that doesn’t have access to the data source to create the workbook, you can still share the file by saving it as a packaged workbook. People without a license to Tableau Server or Tableau Desktop can also access packaged workbooks using Tableau Reader.
Packaged workbooks (. twbx) bundle the data and metadata into a single file. If you later need to access the original data source file contained within the packaged workbook, point at the file, right-click, and select the Unpackage menu option.
TABLEAU DATASOURCE FILES
When you connect to your data for the fist time, you may have a little bit of data ‘modelling’ to do – setting the right data types, changing default aggregations, setting default colours, creating some custom calculated fields, etc. You are giving Tableau information about the data you will be using – you are setting up its ‘metadata’. When you want to connect to this data again, you don’t want to really go through all this data modelling a second time, so instead you can save your metadata as a .tds file (again, it is saved in XML format) and connect to your data though this file instead. You could also distribute this file so that your colleagues have access to the nice formatting and custom fields you have worked to set up.
The Tableau is clever enough to pick up new columns/fields in the data source if they appear and column ordering does not matter but if column names change or disappear completely, you will need to reconfigure.
To create a .tds file, from Tableau Desktop, right click on your data source connection and select Add to Saved Data Sources. Alternatively you can publish the .tds to Tableau Server by right clicking and selecting Publish to Server instead.
Changes made within your data window (the left side of your workbook) alter the metadata of your connection. Grouping, sets, aliased names, field-type changes, and any other modifications made in your workbook are part of the metadata. Can you share just the metadata with others? The answer is yes. This is done by creating a Tableau Datasource (. tds) file.
A Tableau datasource file defines where the source data is, how to connect to it, what field names have been changed, and other changes applied in the dimensions and measures shelves. Data source files can be saved locally or published to Tableau Server. This is particularly helpful if you work in a large enterprise. Perhaps you have a very small number of database experts that understand your database schema as well. They can create the connection, define table joins, group or rename fields, and then publish the data source file for less experienced staff to use as a starting point.
TABLEAU BOOKMARK FILES
A slightly lesser known Tableau file type is the Tableau bookmark. This file is a bit like an export of one single worksheet, which you can then import into another workbook to save you recreating the view from scratch. Tableau 8.1 introduces functionality to help copy and paste worksheets from one workbook to another, so this file type may become used less, but it can still be handy if you regularly use a particular view in many of the workbooks you create (a header page or appendix, for example).
To create a .tbm file, click Window > Bookmark > Create Bookmark. To reuse a bookmark, clcik Window > [bookmark name]. Note that you cannot create a bookmark from a dashboard page.
What if you have a massive workbook (with many worksheets) and you want to share one worksheet only with a colleague? This is done by using a bookmark (. tbm) file. Bookmark files, saves the data and metadata related to a worksheet within your workbook— including the connection and calculated fields.
To create a bookmark file go to the Windows menu bar, and look for the Bookmark menu option and select Create Bookmark.
The bookmark will become visible when a new Tableau session is started. The file will appear in the Windows menu. Opening the bookmark file will initiate the connection and add it to the workbook. Tableau bookmark files are stored in your Tableau Repository in the Bookmarks folder.