Course: Creating a curated data source in Tableau
This week we look at preparing and enriching data for our sales orders so we can create and share a trusted and governed curated data source that our users can use confidently in their own analysis.
This Week’s Project
For this week’s project we’re going to be creating a curated data source that other users in our organisation can then use for their own analysis and reports. If you’re using Tableau in an organisation you may be responsible for managing particular data sources or providing access to that data to others within your organisation.
Creating a curated data source means you can create a consistent data source that users can trust is correct, of good quality and is governed so that their analysis is accurate. It also makes it easier for users to know which data to use and ensures everyone is using the same data in a consistent way for a given topic.
By doing most of the work up front to prepare, clean and size the data also saves your organisation time as the same steps don’t need to be done by each user every time the data is used or refreshed and you can provide regular updates
Throughout this project we’ll also take a look at connecting to various data sources within Tableau Desktop including:
Text/CSV Files.
Excel Files.
PDF Files.
JSON Files.
Databases such as Microsoft SQL Server.
Online databases such as Snowflake.
Cloud services such as Google Drive & Google Sheets.
Native connectors to systems such as Salesforce.
Using Web Data Connectors to connect to third party software or APIs.
In addition we’ll look at:
Combining data sources together using unions.
Adding additional data sources using data relationships/joins.
A template and checklist for preparing, cleaning and enriching your data source for your users.
Creating data extracts and filtering your data.
Sharing and publishing your curated data source within Tableau Desktop and Tableau Online/Tableau Server.
Connecting to and using your curated data source as a user in Tableau Desktop and Tableau Online/Tableau Server.
Required Data:
The files you need for this can be found in the Course Data Files folder for this project. You’ll need to download a copy of each of the files in this folder.
The Optional Data folder contains some data which you can use if you would like to follow along with the steps in that stage of the course where we look at connecting to databases and Google Drive. You can simply upload the Database Data into your database (if you have one to use) and copy the Google Drive Data into your Google Drive.
If you do wish to follow along with the steps that require connecting to a system/database and you don’t already have one here is where you can create each:
Microsoft SQL Server Database - You can either download and run the developer or express edition of SQL Server for free or take a free 30 day trial to SQL Server hosted in Microsoft Azure.
Snowflake Database - Offers a free 30 day trial, no credit card needed.
Google Drive - Free to use for individuals.
Salesforce - Developers can sign up for a free developer account which enables use of the Tableau Connector (note: free trials of the none developer edition do not enable use of the Tableau Connector).
Jira - Free for up to 10 users, once set up you can take a free 30 day trial of the Tableau Connector Pro for Jira in the Atlassian Market Place.
Tableau Online - You can get access to a your own free development instance of Tableau Online by joining the Tableau Developers Programme.
Please note: Most of these connectors are only available within Tableau Desktop and not Tableau Public, if using Tableau Public or you don’t have access to these systems you can still see the steps to connect to them throughout the course.
Solution
The curated data source that we’re going to create is going to cover our sales orders for the current calendar year, we’re also going to add in data on returns as well as the sales territories and sales people each customer and account is linked to. Once complete the curated data source will provide all the data our users need in order to analyse sales performance within our organisation.
The steps below take you through how to connect to each data source, combine and enhance your data and then prepare and publish as a curated data source.
Each step has detailed instructions the first time you come across a new item/function within the course, the accompanying videos then show you a demonstration of going through the steps.
Please Note: The videos in this coursedon’t contain any audio or voice over.
Section 1 - Connecting to local data files on your computer or a shared network drive
1. Connect to a single table of data from MS Excel
To start with we’re going to connect to data which is contained in a Microsoft Excel spreadsheet. This is very common requirement with Tableau, if you’re using Tableau at work you may connect to databases more often but it’s highly unlikely you’ll never have to connect to an Excel file!
Our data is contained in a file called Orders.xlsx which contains a sheet with the orders for each quarter within the year and another sheet which lists all the orders that have been returned.
Let’s begin by getting this data into Tableau.
Open Tableau Desktop or Tableau Public.
Click Connect to Data.
Under To a File click Microsoft Excel.
Navigate to the Orders.xlsx file and click Open.
Drag Q4 Orders from the left pane and drop it onto the canvas in the middle of the screen.
Tip: If you’re unfamiliar with the data source page and the different areas on it you can get an overview on the Tableau Help Data Source Page here.
You should now see a preview of the Q4 orders data at the bottom of the screen in the data grid as well as a summary of the columns/fields in your data in the meta data grid (depending on the version of Tableau you’re using you may need to toggle the meta data grid on to see it).
Getting data into Tableau is as simple as that, if we just wanted to analyse the Q4 orders data we could go ahead and click Sheet1 at the bottom of the screen and start analysing the data.
It’s more likely that you’d have to bring in more data, or supplement your data so we’ll take a look at doing that next.