Improve Tableau Performance By Data Optimization
“Why is my workbook taking so long to load? How do I improve my Tableau dashboard performance?”
As a data analytics consultant, one of the more frequent pains described to me by clients is slow performing dashboards. To troubleshoot and improve Tableau dashboard performance, first, we need to understand how Tableau creates the views.
The process from request to render a visualisation
Source: Designing Efficient Workbooks, Tableau Software
Put very simply, when a user requests a new view on Tableau Server, a query is sent to the data source. Tableau takes the data returned from the query to compute a model. This model is then back to the browser. Given this multi-step process to render a view, it’s no surprise that there isn’t a silver bullet solution to improving workbook performance. Instead, there are multiple areas we can investigate for optimising performance. The first of which is data source optimisation.
Your workbook will be slow if the incoming data is a mess! This is because the quality of the data will determine the type of queries from Tableau to the data source, the speed of query execution, and the type of data returned.
Here are my 6 tips for optimising your data source to improve Tableau performance:
1. Referential Integrity allows Join Culling
Imagine you’ve joined multiple tables in Tableau, but the query only requires data from one of the tables. Tableau has a backend functionality, join culling, that culls unnecessarily joins. In this scenario, no joins statements would be included in the query.
However, join culling only works if referential integrity is set, and it works more reliably with inner joins.
It’s best practice to set referential integrity in the database through using foreign keys, checking constraints with user-defined functions, or triggers. Tableau will also automatically join cull if referential integrity is pre-set in the database.
Otherwise, you can select the Assume Referential Integrity option from the Data menu. However, later method will be slower than the former as it only affects performance on Tableau’s end, and the results may be unreliable if the joins are not a pure star schema.
2. Set Indexes on joining/filtering dimensions
Indexes can significantly speed up the performance of queries by organising how information is stored in a database.
The fields you should consider indexing:
The fields frequently filtered on
Fields you use as context filter often (Recommended to set as the primary index)
The best type of index classification is dependent on the DBMS and data.
3. Parsing and Transforming
Having worked with JSON, XML or HTML data, I prefer to parse and clean my data prior to input into Tableau. This reduces the number of calculations required on Tableau’s side. However, if you need to perform string calculations in Tableau, REGEX is the fastest way to do it.
This includes transposing my data. Tableau prefers long over wide data and while this could be done in-platform using Pivot. However, I’ve found that it’s easier to group and control data transformations in preparation tools such as Tableau Prep and Alteryx.
4. Try to reduce complex calculations, especially LODs
Level of Detail calculations allow computation of values at different levels and granularity. However, with large and complicated data sets requiring views with statements calculating multiple granularities, they can be performance heavy.
If a LOD calculation is causing performance issues, one way around this is to set granularity IDs in your data source beforehand.
For example, using a data preparation tool such as Alteryx, I could replicate LOD calculations by sorting and then assigning a unique identifier using a multi-row formula.
5. Join on integers instead of strings
Given that INTs are calculated faster than strings, joins on INT type fields will be faster than string joins. Therefore, wherever possible, try to cast IDs data types to INT in the data source.
6. Cast your data types correctly in the data source
While it is possible to recast data types in Tableau (e.g. strings into dates, int), it’s best practice to cast them correctly in the data source. This is because when you re-cast a data type in Tableau, every query between Tableau and the data source will include the CAST function.