Course: Creating and publishing data sources in Tableau
In this course we look at creating, enriching and publishing a trusted and governed data source that we can use in our own analysis as well as making it available for other users to utilise too.
This Week’s Project
For this week’s project we’re going to be creating a curated data source that other users in our organisation can then use for their own analysis and reports. If you’re using Tableau in an organisation you may be responsible for managing particular data sources or providing access to that data to others within your organisation.
Creating a curated data source means you can create a consistent data source that users can trust is correct, of good quality and is governed so that their analysis is accurate. It also makes it easier for users to know which data to use and ensures everyone is using the same data in a consistent way for a given topic.
By doing most of the work up front to prepare, clean and size the data also saves your organisation time as the same steps don’t need to be done by each user every time the data is used or refreshed and you can provide regular updates
Throughout this project we’ll also take a look at connecting to various data sources within Tableau Desktop including:
Text/CSV Files.
Excel Files.
PDF Files.
JSON Files.
Databases such as Microsoft SQL Server.
Online databases such as Snowflake.
Cloud services such as Google Drive & Google Sheets.
Native connectors to systems such as Salesforce.
Using Web Data Connectors to connect to third party software or APIs.
In addition we’ll look at:
Combining data sources together using unions.
Adding additional data sources using data relationships/joins.
A template and checklist for preparing, cleaning and enriching your data source for your users.
Creating data extracts and filtering your data.
Sharing and publishing your curated data source within Tableau Desktop and Tableau Online/Tableau Server.
Connecting to and using your curated data source as a user in Tableau Desktop and Tableau Online/Tableau Server.
Required Data:
The files you need for this can be found in the Course Data Files folder for this project. You’ll need to download a copy of each of the files in this folder.
The Optional Data folder contains some data which you can use if you would like to follow along with the steps in that stage of the course where we look at connecting to databases and Google Drive. You can simply upload the Database Data into your database (if you have one to use) and copy the Google Drive Data into your Google Drive.
If you do wish to follow along with the steps that require connecting to a system/database and you don’t already have one here is where you can create each:
Microsoft SQL Server Database - You can either download and run the developer or express edition of SQL Server for free or take a free 30 day trial to SQL Server hosted in Microsoft Azure.
Snowflake Database - Offers a free 30 day trial, no credit card needed.
Google Drive - Free to use for individuals.
Salesforce - Developers can sign up for a free developer account which enables use of the Tableau Connector (note: free trials of the none developer edition do not enable use of the Tableau Connector).
Jira - Free for up to 10 users, once set up you can take a free 30 day trial of the Tableau Connector Pro for Jira in the Atlassian Market Place.
Tableau Online - You can get access to a your own free development instance of Tableau Online by joining the Tableau Developers Programme.
Please note: Most of these connectors are only available within Tableau Desktop and not Tableau Public, if using Tableau Public or you don’t have access to these systems you can still see the steps to connect to them throughout the course.
Solution
The curated data source that we’re going to create is going to cover our sales orders for the current calendar year, we’re also going to add in data on returns as well as the sales territories and sales people each customer and account is linked to. Once complete the curated data source will provide all the data our users need in order to analyse sales performance within our organisation.
The steps below take you through how to connect to each data source, combine and enhance your data and then prepare and publish as a curated data source.
Each step has detailed instructions the first time you come across a new item/function within the course, the accompanying videos then show you a demonstration of going through the steps.
Please Note: The videos in this coursedon’t contain any audio or voice over.
Section 1 - Connecting to local data files on your computer or a shared network drive
1. Connect to a single table of data from MS Excel
To start with we’re going to connect to data which is contained in a Microsoft Excel spreadsheet. This is very common requirement with Tableau, if you’re using Tableau at work you may connect to databases more often but it’s highly unlikely you’ll never have to connect to an Excel file!
Our data is contained in a file called Orders.xlsx which contains a sheet with the orders for each quarter within the year and another sheet which lists all the orders that have been returned.
Let’s begin by getting this data into Tableau.
Open Tableau Desktop or Tableau Public.
Click Connect to Data.
Under To a File click Microsoft Excel.
Navigate to the Orders.xlsx file and click Open.
Drag Q4 Orders from the left pane and drop it onto the canvas in the middle of the screen.
Tip: If you’re unfamiliar with the data source page and the different areas on it you can get an overview on the Tableau Help Data Source Page here.
You should now see a preview of the Q4 orders data at the bottom of the screen in the data grid as well as a summary of the columns/fields in your data in the meta data grid (depending on the version of Tableau you’re using you may need to toggle the meta data grid on to see it).
Getting data into Tableau is as simple as that, if we just wanted to analyse the Q4 orders data we could go ahead and click Sheet1 at the bottom of the screen and start analysing the data.
It’s more likely that you’d have to bring in more data, or supplement your data so we’ll take a look at doing that next.
2. Union data from the same file
The data that we’ve connected to so far shows us all our orders for the 4th quarter of the year (Q4), but we can also see on the left pane that there are sheets in our data for the first three quarters as well.
We’d like to also bring this data in so we can see how we’ve been doing across the entire year. We can do this by “unioning” these sheets together. A union just means we’re going to add (or append) some data to the bottom of another sheet, as long as the sheets contain the same amount of columns and the data in each column is in a consistent format across the then we can union them together.
A typical example of this might be when we want to add in data from previous time periods, or perhaps if we’re combining data from multiple regions, teams or business units.
Drag the Q3 Orders sheet from the list of sheets in the left pane over to the Q4 Orders table that’s already on your canvas from the previous step.
As you do this the Q4 Orders table should highlight in orange and a Union box should appear below it, drop Q3 Orders onto the Union box.
Repeat the same with both Q2 Orders and Q1 Orders.
Tip: Watch the video at the end of this section to see how to make the union box appear if it’s not clear when you try to do it.
Tableau has now added the Q3, Q2 and Q1 orders to your original data and If you scroll through the data grid now you should see that the table looks the same but it has data from all four quarters contained in it now.
To see what Tableau has done, or if you want to edit the union, you can double click the Q4 Orders table on your canvas and the title at the top should show that it’s made up of 4 tables. If you then right click on the table and click Edit Union… you can see what the union you created is made up of.
If you know at the start you want to union your data you can simply highlight each of the tables in the left pane (on windows click Q1 Orders, then hold the shift key and click Q4 Orders to select them all) and drag and drop them onto the canvas and Tableau will automatically union them for you.
You can also manually create a union using the New Union option, we rarely use this but it can be useful if you want to use any of the more advanced options which we’ll take a look at now:
Remove the Q4 Orders table from your canvas by dragging it off the canvas.
Drag the New Union from the bottom of the left pane and drop it onto the canvas.
Highlight all four of the Orders tables in the left pane and then drop them onto the Union pop up menu that should have appeared.
This now gives us the Union window that allows us to tell Tableau how we want to union our data. We’re going to use the Specific (manual) option but click on Wildcard (automatic) and we’ll take a quick look at what you can do in here.
The Wildcard (automatic) gives us some more advanced options, these are really useful if you need to regularly add data to your report and the files aren’t always supplied in a consistent way.
Notice that you can select which workbooks and sheets you want to include or exclude, and that you can also use a wildcard character. You may use this if you always get a file called Orders_Nov21 but the date can change, in this case you’d use the wildcard and enter Orders* to say get any file that starts Orders but can have anything after that.
You can also use the two Expand search check boxes to tell Tableau to search for files in any subfolders or the parent folder that your source file is in too!
You can read more about these advanced options in the Tableau Help article on unions here.
Click back to the Specific (manual) option.
Make sure you have all 4 tables in your list and that the Tables in union says 4, then click OK.
Notice that your table now says Union to indicate that it’s a union of your data? Let’s rename this to make it a bit clearer what it is for anyone that comes to use it in the future.
Right click the Union table in the middle of your data pane and click Rename.
Give it a more meaningful name such as Current Year Orders.
At the top of the screen click the name of the data source (it should say Union (Orders) and rename it to something more meaningful such as Sales Data.
We’ve now got a table of data with all of our orders and sales fata from each quarter of the year.
Quite often you’ll want to supplement your data with data from the same source so we’ll take a look at that in the next step by adding in the returns data.
3. Join data from the same source (multiple Excel Sheets)
In this step we’re going to add in our Returns data which is the final sheet in the Orders.xlsx file and will enable us to flag which orders were returned and decide if we want to keep them in the sales figures or not.
If you take a look at the Returns sheet you’ll notice that this is essentially a list of order IDs, whereas before we added rows to the bottom of our data using a union this time we want to essentially add the order ID column from the Returns sheet to our Orders data so we can see if each order had a return associated with it.
Combining data in this way is usually called a “Join” since you’re joining data together from two different sources. From version 2019.4 onwards Tableau introduced the concept of Data Relationships, these aren’t strictly the same as data joins however we use the term interchangeably in this course as they essentially do the same thing from a user perspective.
With a join in Tableau you explicitly tell Tableau how to join the two data sources together and it does so into one table. Relationships work in a very similar way in that you tell Tableau which fields to use to relate one data table to another, however Tableau doesn’t actually join your data together into one table, instead it keeps the original tables and only joins them when you pull data into a view.
The difference here might seem quite small but it was actually a significant change in the way Tableau works, previously you’d have to do a lot of work to remove duplicates from your joined data depending on how you were using it, whereas with the new data relationships it works a lot smoother and it’s much easier to ‘join’ data in Tableau now.
You can read more about Relationships in the Tableau help pages here.
Lets go ahead and join (or relate!) our Returns data to our Current Year Orders data:
Drag Returns from the list of Sheets on the left pane and drag it near to Current Year Orders on the canvas.
As you do this an orange ‘noodle’ should appear linking the two tables together, drop Returns anywhere while the orange noodle is visible.
Notice that Tableau has automatically worked out that the Order ID field exists in both tables and should therefore be used to join the data together (we’ll look at an example in a later step where Tableau couldn’t determine how to relate our data).
And that’s it done! Notice in the canvas that both tables still exists on their own, if you click on either you should see their respective data in the data grid when you click either of them. Rather than joining the two tables into one physical table Tableau has created a ‘relationship’ between the two tables and will only join the data when you start to use it and only if you bring on a field from both data sets.
We’ve now got all the data we need from our Orders file, next we’ll look at how we can add in data from different files.
4. Combine data from multiple MS Excel files
It’s pretty common to not have all the data you need in one file and need to bring data in from a number of files, Tableau doesn’t restrict you to using just one file and you can easily bring in and join data from multiple data sources or files in exactly the same way as above.
In this step we’re going to add our customer allocation and territory assignments data from our Sales Allocations.xlsx file. These will allow us to show which of our sales territories each of our customers, and therefore orders, are allocated too and then also see which sales people/teams each territory belongs to and allow us to do deeper analysis on our sales performance using this additional data.
This is really useful as it allows our data to be much more flexible. Imagine if we decided to re-allocate some of our customers to different territories, or perhaps a sales person left and was replaced by someone else. If we had all this data sat in one sheet we’d have to get that sheet updated with the new data and then load it back in again, by relating our data we can just load in the sheet that has changed and automatically update all of our data.
Let’s add in the new Sales Allocations.xlsx file and the sheets from within it:
From the data source page click Add at the top of the screen next to Connections to add a new data source.
Under To a File click Microsoft Excel.
Navigate to the Sales Allocations.xlsx file and click Open.
Drag Customer Allocations and drop it near Current Year Orders ensuring that the orange noodle connects to the Current Year Orders table and not the Returns one!
Notice when you drop it this time there’s a red warning on the connecting noodle? This is because Tableau couldn’t work out which fields to join our data on.
Click into the Relationships box at the bottom of the screen and then Select Customer ID from Current Year Orders list and Customer Account Number from the Customer Allocations list to join the data on.
The noodle should turn orange to indicate it has successfully joined the two data tables.
Repeat the same steps and this time drag Territory Assignments and drop it so that it joins to to Customer Allocations.
We’ve now got much more data to use in our analysis, as well as our orders and returns we can now see which sales territories, teams and divisions each customer is allocated too and therefore start to analyse our sales data from that perspective.
In the next step we’ll look at how we can connect to and add data from different data sources including other files such as CSV, Text or JSON files as well as directly from databases and systems plus other advanced options such as importing from PDF files and APIs.