How to Create Visual Analytics with Tableau Desktop
The seeds for Tableau were planted in the early 1970s, when IBM invented Structured Query Language (SQL) and later in 1981 when the spreadsheet became the killer application of the personal computer. Data creation and analysis fundamentally changed for the better. Our ability to create and store data increased exponentially.
The business intelligence (BI) industry was created with this wave, each vendor providing a product “stack” based on some variant of SQL. The pioneering companies invented foundational technologies and developed sound methods for collecting and storing data. Recently, a new generation of NoSQL (Not Only SQL) databases has been enabling web properties like Facebook to mine massive, multi-petabyte data streams.
Deploying these systems can take years. Data today resides in many different proprietary databases and may also need to be collected from external sources. The traditional leaders in the BI industry have created reporting tools that focus on rendering data from their proprietary products. Performing analysis and building reports with these tools requires technical expertise and time. The people with the technical chops to master them are product specialists who don’t always know the best way to present the information.
The scale, velocity, and scope of data today demand reporting tools that deploy quickly. They must be suitable for non-technical users to master, and they should connect to a wide variety of data sources. The tools also need to guide us to use the best techniques known for rendering the data into information.
The Shortcomings of Traditional Information Analysis
Entities are having difficulty getting widespread usage of traditional BI tools. A study by the Business Application Research Center (BARC, 2009) reported that adoption rates are surprisingly low.
In any given BI-using organization, just over 8 percent of employees are actually using BI tools. Even in industries that have aggressively adopted BI tools (e.g., wholesale, banking, and retail), usage barely exceeds 11 percent.
In other words, about 92 percent of the people who have access to traditional BI tools don’t use them. The BARC survey noted some possible causes:
- The tools are too difficult to learn and use.
- Technical experts are needed to create reports.
- The turnaround time for reports is too long.
Companies that have invested millions of dollars in BI systems are using spreadsheets for data analysis and reporting. When BI system reports are received, traditional tools often employ inappropriate visualization methods. Stephen Few has written several books that illuminate the problem and provide examples of data visualization techniques that adhere to best practices. He also gives examples of inappropriate visualizations produced by legacy vendor tools. It turns out that the skills required to design and build database products are different from the skills needed to create dashboards that communicate effectively. The BARC study clearly indicates that this IT-centric control model has failed to deliver compelling answers that attract users.
People want to make informed decisions with reliable information. They need timely reports that present the evidence to support their decisions. They want to connect with a variety of data sources, and they don’t know the best ways to visualize data. Ideally, the tool used should automatically present the information using the best practices.
The Business Case for Visual Analysis
Whether the entity seeks profits or engages in non-profit activities, all enterprises use data to monitor operations and perform analysis. Insights gleaned from the reports and analysis are then used to maintain efficiency, pursue opportunity, and prevent negative outcomes. Supporting this infrastructure (from the perspective of the information consumer) includes three kinds of data.
Three Kinds of Data that Exist in Every Entity
Reports, analysis and ad hoc discovery are used to express three basic kinds of data.
- Known Data (type 1)
Encompassed in daily, weekly, and monthly reports that are used for monitoring activity, these reports provide the basic context used to inform discussion and frame questions. Type 1 reports aren’t intended to answer questions. Their purpose is to provide visibility of operations.
- Data you know you need to know (type 2)
Once patterns and outliers emerge in type 1 data, the question that naturally follows is: Why is this happening? People need to understand the cause of the outliers so that action can be taken. Traditional reporting tools provide a good framework to answer this type of query as long as the question is anticipated in the design of the report.
- Data you don’t know you need to know (type 3)
By interacting with data in real-time while using appropriate visual analytics, Tableau provides the possibility of seeing patterns and outliers that are not visible in type 1 and type 2 reports. The process of interacting with granular data yields different questions that can lead to new actionable insights. Software that enables quick-iterative analysis and reporting is becoming a necessary element of effective business information systems.
Distributing type 1 reports in a timely manner is important, but so is speed in the design and build stage when a new type 1 report is created. To effectively enable type 2 and 3 analyses, the reporting tool must adapt quickly to ad hoc queries and present the data in intuitively understandable ways.
How Visual Analytics Improves Decision-Making
Rendering data accurately with appropriate visual analytics reduces the time required to achieve understanding. Review the following examples to see how visual analytics can reduce the time to insight. The goal of these reports is to provide sales analysis by region, product, category and product sub-category.
Figure 1.1 presents data using a grid of numbers (crosstab) and pie charts. Crosstabs are useful for finding specific values. Pie charts are intended to show one-to-many comparisons of dimensions. The pie charts compare sales by product sub-category.
Figure 1.1: Sales Mix analysis using a crosstab and pie charts
Crosstabs are not the most effective way to make one-to-many comparisons or identify outliers. Pie charts are commonly used for comparisons but are one of the least effective ways to compare values across dimensions. It is difficult to make precise comparisons between slices, and even more so when there are many slices.
Figure 1.2 employs a bar chart and heat map to convey the same information. Bar charts provide a better means for comparing product sub-categories. The heat map on the right provides total sales for each category. The gray scale color range highlights the high and low selling product sub-categories. The color encoding in the bar chart provides additional information on profit ratio. Reference lines in the bar chart display the average sales for all product sub-categories within each region.
Clearly the bar chart and heat map communicate the sales values more quickly while adding profit ratio information with the use of color. The reference lines within each region and product category provide average sales values. One could argue that the bar chart doesn’t communicate the details available in the crosstab, but in Figure 1.3 those details and more are provided via Tooltips that pop out when you point your mouse at a mark.
Appropriate visual analytics improve decision-making by making summary trends and outliers easier to see while keeping the desired details available on demand.
Figure 1.2: Sales mix analysis using a bar chart and a heat map
Figure 1.3: Adding labels and Tooltips
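The reference lines and color encoding described for Figure 1.2 rest on simple aggregates. The following Python sketch shows one way they could be computed outside Tableau. The sample rows and function names are hypothetical, and the profit ratio is assumed here to be profit divided by sales; this is an illustration of the arithmetic, not Tableau's implementation.

```python
from collections import defaultdict

# Hypothetical, already-aggregated rows: (region, sub_category, sales, profit).
ROWS = [
    ("East", "Chairs", 400.0, 40.0),
    ("East", "Tables", 200.0, -20.0),
    ("West", "Chairs", 300.0, 60.0),
    ("West", "Tables", 100.0, 10.0),
]


def region_reference_lines(rows):
    """Return the average sales across sub-categories within each region."""
    sales_by_region = defaultdict(list)
    for region, _sub_category, sales, _profit in rows:
        sales_by_region[region].append(sales)
    return {region: sum(v) / len(v) for region, v in sales_by_region.items()}


def profit_ratios(rows):
    """Return profit / sales per (region, sub-category); an assumed definition."""
    return {
        (region, sub_category): profit / sales
        for region, sub_category, sales, profit in rows
    }


if __name__ == "__main__":
    print(region_reference_lines(ROWS))
    print(profit_ratios(ROWS))
```

Each bar would be drawn at its sales value, colored by its profit ratio, and compared against the reference value for its region.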
Turning Data into Information with Visual Analytics
Data that is overly summarized loses its ability to inform. When it’s too detailed, rapid interpretation of the data is compromised. Visual analytics bridges this gap by providing the right style of data visualization and the right level of detail for the situational need. The ideal analysis and reporting tool should possess the following attributes:
- Simplicity: Be easy for non-technical users to master.
- Connectivity: Connect seamlessly to a large variety of data sources.
- Visual Competence: Provide appropriate graphics by default.
- Sharing: Facilitate sharing of insight.
- Scale: Handle large data sets.
Traditional BI reporting solutions aren’t adapted for the variety of data sources available today. Analysis and reporting can’t occur in these tools until the architecture is created within the proprietary product stack. Tableau Software was designed to address these needs.
The Tableau Software Ecosystem
Tableau’s product line includes desktop design and analysis tools for creating and consuming data. For larger deployments, Tableau Server permits information consumers to access reports in a secure environment without the need to load software. Reports are consumed in Tableau Server via a web browser. Tableau Server also enables reports to be consumed on iOS or Android tablet computers. Tableau Public is a free tool that facilitates sharing public data on the web via blogs or web pages. For those who want a hosted solution, Tableau Public Premium is a fee-based service that uses the same technology as Tableau Public in a private consumption environment.
Tableau Desktop and Tableau Reader
Desktop is the design tool for creating visual analytics and dashboards. There are two versions: Personal Edition and Professional Edition. Professional Edition is more popular because it connects to a wider variety of data sources than Personal Edition. Less common data sources can be accessed via the Open Database Connectivity (ODBC) standard.
Table 1.1 displays the available connections arranged by the type of data source. Personal Edition only connects to local files.
Table 1.1: Data sources Accessible to Tableau Desktop
Tableau Desktop is licensed by named user. Tableau allows you to reassign licenses and also permits you to install Tableau Desktop on multiple computers as long as the named user is the only person with access to them.
Tableau also permits you to share content with another desktop tool. Tableau Reader is a free version that allows users to consume Tableau Desktop reports without the need for a paid license. The only requirement is that the Tableau report must be saved as a packaged workbook.
You can save and share data using a variety of different file types. The differences between the file types relate to the amount and type of information being stored in the file. Table 1.2 summarizes the different Tableau file types.
Table 1.2: Tableau File Types
When you save your work on the desktop, the default save method creates a workbook (twb) file. If you need to share your work with people that don’t have a Tableau Desktop license or don’t have access to the data source you can save your work as a packaged workbook (twbx) by using the Save As option while saving your file.
Tableau Data Sources (tds) are useful when you frequently connect to a particular data source or you have edited the metadata associated with that data source in some way (renaming or grouping fields, for example). Using saved data sources reduces the time required to connect to the data.
Tableau Bookmarks (tbm) allow you to share a single worksheet from your workbook with others. To create a bookmark file, use the Window > Bookmark > Create Bookmark option in the main menu.
Tableau Data Extracts (tde) leverage Tableau’s proprietary data engine. When you create an extract, your data is compressed. If your data source is a file (Excel, Access, text), Data Extracts add formula functions that don’t exist in those sources, including count distinct and median. If you are publishing workbooks via Tableau Server, Data Extracts provide an effective way to separate the analytical load Tableau generates from your source database.
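For readers unfamiliar with the two aggregates named above, the following Python sketch shows what count distinct and median compute. The sample orders and the helper name are hypothetical, and the code uses Python's standard library rather than Tableau's data engine.

```python
from statistics import median

# Hypothetical order rows: (customer_id, order_amount).
ORDERS = [("C1", 120.0), ("C2", 80.0), ("C1", 200.0), ("C3", 50.0)]


def count_distinct(values):
    """Count the unique values in an iterable, ignoring repeats."""
    return len(set(values))


# Number of unique customers, regardless of how many orders each placed.
distinct_customers = count_distinct(customer for customer, _ in ORDERS)

# Middle value of the order amounts; with an even count, the mean of the
# two middle values.
median_amount = median(amount for _, amount in ORDERS)

if __name__ == "__main__":
    print(distinct_customers, median_amount)
```

Neither result can be derived from pre-summed totals, which is why these aggregates need access to row-level data such as an extract.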
Tableau Server
If you produce a large number of workbooks that have to be updated regularly, or if you have a large number of people consuming your work, Tableau Server will save you time. Server allows people to view and interact with your work via a web browser. Server will also automatically refresh data extracts that have been published to Tableau Server.
Tableau Server is licensed in two ways: by named user or by core. Named-user licensing makes sense in smaller deployments when fewer than 150 people need access to Tableau reports. In larger deployments with dynamic access requirements, core licensing is more cost-effective and reduces administrative time because the license is defined by the number of processor cores in the hardware that runs Tableau Server.
Tableau Server provides enhanced security and permits users to customize their access to reports within boundaries defined by the server administrator. Tableau Server’s interface provides users with tools for finding, organizing, and commenting on reports. The server enables users to create subscriptions that provide e-mail notification when updated reports are published. It also provides administrators with the ability to monitor access and monitor system performance. Details regarding installation, access, and administration will be covered in Chapters 9 and 10.
Tableau Public
Tableau Public is a free hosted web service that can be used to publish Tableau reports on the web. It supports commonly used content management systems like WordPress, Tumblr, and TypePad. Tableau’s licensed desktop editions can also publish content to Tableau Public. Tableau also offers a free Public desktop edition for creating and publishing reports. Tableau Public has the following limitations:
- The free Public desktop edition connects only to Microsoft Access, Excel, or text files.
- Your work can only be saved to Tableau’s public server.
- Storage space on Tableau Public is limited to 50 megabytes per named-user.
- Datasource size is limited to 100,000 records.
- Workbooks saved on Tableau Public can be viewed and downloaded by anyone.
Within these limits, Tableau Public is an ideal way for hobbyists and bloggers to create and share interactive visualizations on the web. However, it is not a substitute for full desktop or server licensing.
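The numeric limits listed above can be expressed as a simple pre-publication check. The following Python sketch is hypothetical: the function and parameter names are invented for illustration, it is not part of any Tableau API, and the limits are the ones stated in this chapter at the time of writing.

```python
# Limits as stated in the text for the free Tableau Public service.
MAX_RECORDS = 100_000    # records per data source
MAX_STORAGE_MB = 50      # storage per named user


def public_limit_problems(record_count, storage_used_mb, workbook_mb):
    """Return a list of reasons a workbook would exceed the stated limits.

    An empty list means none of the numeric limits is exceeded.
    """
    problems = []
    if record_count > MAX_RECORDS:
        problems.append(
            f"data source has {record_count} records (limit {MAX_RECORDS})"
        )
    if storage_used_mb + workbook_mb > MAX_STORAGE_MB:
        problems.append(
            f"storage would reach {storage_used_mb + workbook_mb} MB "
            f"(limit {MAX_STORAGE_MB} MB)"
        )
    return problems


if __name__ == "__main__":
    print(public_limit_problems(250_000, 45, 10))
```

The check covers only the record and storage limits; the restrictions on data source types and public visibility still apply regardless of size.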
Tableau Public Premium
The premium edition is a fee-based service that permits subscribers to protect the confidentiality of their data by preventing information consumers from downloading source workbook files. Subscriber fees are based on customized record and storage limits. For entities that do not have the resources or desire to manage their own instance of Tableau Server, Tableau Public Premium offers a cost-effective way to share proprietary data over the web and maintain security over the source data set used to create the visualizations.
Recommended Hardware Configuration
Tableau provides minimum hardware specifications on its website, which are presented below. Analysts who build reports should have better equipment. More internal memory will have a significant positive effect on speed.
Install 4 to 8 gigabytes of internal memory for the best performance. Tableau’s rendering engine will take advantage of modern graphics cards as well. Solid-state disk drives outperform physical hard disks. But don’t outfit your report-building analysts with state-of-the-art equipment if the majority of your user base is using 4-year-old junk. What performs well on a well-appointed computer may not be as enjoyable an experience on a dated system.
Tableau Desktop
- Microsoft Windows 7, Vista, XP, Server 2008, Server 2003 (on x86 or x64 chipsets), or Microsoft Windows 8.
- 32-bit or 64-bit versions of Windows.
- Minimum of an Intel Pentium 4 or AMD Opteron processor.
- 250 megabytes minimum free disk space.
- 32-bit color depth recommended.
- Note: Internet Explorer is not supported.
At the time of this writing (January 2013) Tableau did not support Apple operating systems. Many people successfully use Apple products to run Tableau by running a virtual Windows environment on their laptop. Apple’s Boot Camp provides a means to run Windows on a MacBook. Other commercial products such as VMware Fusion or Parallels Desktop can be used to run Tableau on a MacBook as well.
Tableau is believed to be planning a desktop version for Mac OS X, but there have been no official statements from the company regarding release dates.
Tableau Server
- Microsoft Windows Server 2008, 2008 R2, 2003 SP1, or higher; Windows 7 on x86 or x64 chipsets; or Microsoft Windows 8
- 32-bit or 64-bit version of Windows
- Minimum of a Pentium 4 or AMD Opteron processor
- 32-bit color depth recommended
- Internet Protocol version 4 (IPv4)
Very Small Deployments (proofs of concept, initial evaluations, 1-2 users)
- Dual-core 2.0 GHz or higher, minimum recommended CPU
- 4.0 gigabytes minimum system memory
- 2.5 gigabytes minimum free disk space
Small Deployments (fewer than 25 users)
- Quad-core, 2.0 GHz or higher, minimum recommended CPU
- 8 gigabytes minimum system memory
- 5 gigabytes minimum free disk space
Medium Deployments (fewer than 100 users)
- Two quad-core, 2.0 GHz or higher, minimum recommended CPU
- 32 gigabytes minimum system memory
- 50 gigabytes minimum free disk space
Large Enterprise Deployments
Many factors affect the sizing and configuration of hardware for large enterprise deployments. The number of concurrent users, demand patterns, and network infrastructure must all be considered. Server licenses can be deployed over multiple hardware boxes to ensure good response times. Consult your Tableau representative for configuration recommendations.
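The deployment tiers above can be summarized as a lookup from user count to minimum server hardware. The following Python sketch is a hypothetical helper: the function name and dictionary keys are invented for illustration, and the figures are the minimums listed in this chapter. It returns no specification for large deployments, which depend on factors beyond user count.

```python
def minimum_server_spec(users):
    """Map an expected user count to the minimum hardware tier from the text.

    Returns None for 100 or more users, where sizing depends on concurrency,
    demand patterns, and network infrastructure.
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    if users <= 2:
        return {"tier": "very small", "cpu": "dual-core 2.0 GHz",
                "memory_gb": 4.0, "disk_gb": 2.5}
    if users < 25:
        return {"tier": "small", "cpu": "quad-core 2.0 GHz",
                "memory_gb": 8.0, "disk_gb": 5.0}
    if users < 100:
        return {"tier": "medium", "cpu": "two quad-core 2.0 GHz",
                "memory_gb": 32.0, "disk_gb": 50.0}
    return None


if __name__ == "__main__":
    for count in (2, 20, 60, 500):
        print(count, minimum_server_spec(count))
```

These are floor values only; as noted for desktop hardware, additional memory generally improves performance.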