In this blog series, we examine common scenarios in Business Intelligence using Finance Data. The series explores common use cases that often occur in business intelligence, such as pivoting data, changing the format, and working with dates. The blog series uses Power BI and Azure Synapse Analytics to showcase the data quickly and scale effortlessly to meet the business need.
You can watch the whole thing in this video, or carry on below for part two of the blog.
In the previous post, we covered some simple data engineering with Power Query, and we imported the data into Azure Synapse Analytics storage.
CONTENTS
Blog 1: Data Engineering with Power Query
Blog 2: Working with SQL in Azure Synapse Analytics
– Creating tables in Azure Synapse Analytics.
— Creating a table in Azure Synapse Analytics using a Select statement.
— Validating the creation of the new data table.
— Create a new Country dimension table.
— Validating the creation of the new Dimension Table.
– Shaping the Data using Views in Azure Synapse Analytics.
— Creating a view to display Units.
— Creating a view to display Percentages.
— Creating a view to display Billions.
Blog 3: Visualising Azure Synapse Analytics Data in Power BI.
Bonus: Cleaning up Azure Resources
Working with SQL in Azure Synapse Analytics
In this post, we will focus on Azure Synapse Analytics and conduct tasks such as creating tables and views, testing out the tables, and formatting the data. If the Azure Synapse Analytics SQL Pool instance is paused, then it will need to be restarted.
Creating tables in Azure Synapse Analytics
The next step is to create new tables which will store the cleansed data. One table will store the IMF Data in a better format, and the other table will store only the required columns from the Country data.
Creating a table in Azure Synapse Analytics using a Select statement
In this section, we will create a new table based on the original data held in the IMFData.AzureData table, which was created when we imported the Excel file from the storage account into Azure Synapse Analytics. The new table will be called IMFData.IMFData.
To create a table, we use the CREATE TABLE statement in the same way as we do in ANSI SQL. Here is an example of the script:
CREATE TABLE IMFData.IMFData
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
ROW_NUMBER() OVER (ORDER BY [WEO Country Code]
,[ISO]
,[WEO Subject Code]
,[Country]
,[Subject Descriptor]
,[Units]
,[Scale]
,[Year] ASC) as ROW_ID
,[WEO Country Code]
,[ISO]
,[WEO Subject Code]
,[Country]
,[Subject Descriptor]
,[Units]
,[Scale]
,[Year]
,
CASE WHEN ISNUMERIC([Value]) = 1
THEN CONVERT(DECIMAL(18,2), [Value]) else 0 END
AS Value
FROM [IMFData].[AzureData];
Figure 16 Creating a Table in Azure Synapse Analytics
In the example, the schema and table name are assigned in the first row as expected. The new target columns have got the same name in the new table as they do in the source table. In this example, we are also using some error handling to ensure that valid numbers are imported into the new table, and if the value is not numeric, it is replaced with a zero. To achieve this feature, a CASE statement is utilized which will check if the value column is a number or not. If the value is a number, then it will be converted to a decimal. If the value cannot be converted to a number, then it will be replaced with a zero.
One key difference is that, when creating a table in Azure synapse, it is crucial to use the WITH statement to specify the table type. In this example, we are using the clustered column store that will be used as an index. Further, in this example, the distribution being used is round-robin.
To create an index, a row number function is used, it will give each individual row a specific ID number.
In the example script, you can see that data is being extracted from the SELECT statement to create the new data table. The table will be created when the select statement is executed.
To create the table, select the RUN button at the top left-hand side of the query pane. There is an example below:
The next step is to check that the table has been created properly and this step will be illustrated in the next section.
Validating the creation of the new data table
It’s important to check that the table has been created properly and the data has been imported correctly.
Figure 17 The Data view in Azure Synapse Analytics Studio
In the data view, the SQL Database can be viewed along with the tables. To create a new script, right-click on the table name and there are a few options. An example is provided here:
Figure 18 Executing a SQL Script on a table
Using Azure Synapse Analytics studio, the table may not appear immediately. To view the new table, the user can refresh the Table folder by right-clicking on it and selecting the Refresh option.
As a test that the new IMFData.IMFData table has been created, it is a good idea to execute a command to view the data. In this example, there is a default SQL command that allows the user to select the top 100 rows of data. We can select the top 100 rows by right clicking on the table name, selecting the option for New SQL Script and then selecting the option to SELECT TOP 100 rows as displayed in the previous illustration. When the user selects this option, a new SQL Script file will be created with the necessary SQL command pre-populated in the Azure Synapse Analytics Studio window. Here is an example:
We can then execute the SQL statement using the Run button, which is outlined in yellow in the previous illustration. When the query has completed execution, we can see the results and the bottom pane of the Azure Synapse Studio.
Note the results in the Scale column. The Scale column indicates whether the number is presented in Billions, Units or Percentages. It will be difficult to present the data correctly and clearly in Power BI because the scales will impact how the data appears; there is a big difference between 3 billion and 3, and the graphs will be difficult to read. Later, we will shape the data further by creating views that will help make the data easier for Power BI to display.
Create a new Country dimension table
When building analytics, it is important to make the reports and dashboards as easy as possible for the business users to understand. Using interactive maps is a good way to help users to make the most of their data.
In this example, the country data can be used as a dimension. It can be used later to help create rich visualizations in Power BI.
Here is an example:
CREATE TABLE Dim.DimCountry
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
CAST([Index] AS INT) AS CountryKey
,[ISO]
,[CountryName]
FROM [dim].[country]
Figure 19 Create table statement for country data in Azure Synapse Analytics
In this example, the code creates a new table which has dim as the schema name and dim country as the table name. In this example, you can see clustered column store indexes configured and the distribution is round robin. This SQL Statement will create a table using a select statement. In this example, the Index column is converted from a string to an integer using a cast function. Some of the original columns are dropped by virtue of simply not being selected as part of the table creation process. When we execute the statement, the table is created, and the data is imported from the source table.
Validating the creation of the new Dimension Table
Once the SQL statement is executed using the Run button, it is important to check that the table has been created correctly and the data meets expectations. In Azure Synapse Analytics Studio, it is possible to check by selecting the Tables folder at the right-hand side and choose the refresh option. It is possible to check that the table has been created and populated properly. To do that, we can right-click on the table and select the option to run a SQL statement. Here’s an example, where we select the top 100 rows from the Dim.DimCountry table:
Figure 20 Creating a new script using the Azure Synapse Studio interface
The Azure Synapse Analytics Studio will create a new query to view the contents of the table by running the following SQL command, which will display the top 100 rows of data.
SELECT TOP (100) [CountryKey]
,[ISO]
,[CountryName]
FROM [dim].[DimCountry]
Figure 21 Selecting data from the Country dimension
The results will appear in the Results pane in the Azure Synapse Analytics Studio.
Shaping the Data using Views in Azure Synapse Analytics
Azure Synapse Analytics uses standard SQL to work with data. Therefore, if data specialists are familiar with SQL, then it will be easier to extrapolate that expertise from other database technologies such as SQL Server.
As it was noted earlier, the IMF data is presented in different scales, such as Billions, Percentages and Units. One way of resolving this issue is to create different views on the data that display data that is restricted to a specific scale. The data will be clearer for the end-user to understand.
Creating a view to display Units
The following view will use the [IMFData].[IMFData] table as a basis for data, and use a WHERE clause to restrict the data where the Scale is equal to ‘Units’ and the Units are specified in US dollars only.
CREATE VIEW [IMFData].[vIMF U.S. Dollars in Units]
AS
SELECT [ROW_ID]
,[WEO Country Code]
,[ISO]
,[WEO Subject Code]
,[Country]
,[Subject Descriptor]
,[Units]
,[Scale]
,[Year]
,[Value]
FROM [IMFData].[IMFData]
WHERE Units = ‘U.S. dollars‘ AND Scale = ‘Units‘
Figure 22 Create View to display Units data
Here is an example of the query being used to display the top 100 rows of data from the view.
Figure 23 View execution result in Azure Synapse Analytics Studio
Creating a view to display Percentages
The following view will use the [IMFData].[IMFData] table as a basis for data, and use a WHERE clause to restrict the data where the Unit is equal to ‘Percentage of GDP’.
CREATE VIEW [IMFData].[vIMF Percent of GDP]
AS SELECT [ROW_ID]
,[WEO Country Code]
,[ISO]
,[WEO Subject Code]
,[Country]
,[Subject Descriptor]
,[Units]
,[Scale]
,[Year]
,[Value]
FROM [IMFData].[IMFData]
WHERE Units = ‘Percent of GDP‘;
GO
Figure 24 Create View to display Percentage of GDP data
It’s possible to test the view with the following query, which will return the GDP Percentages only.
SELECT [ROW_ID]
,[WEO Country Code]
,[ISO]
,[WEO Subject Code]
,[Country]
,[Subject Descriptor]
,[Units]
,[Scale]
,[Year]
,[Value]
FROM [IMFData].[vIMF Percent of GDP]
Creating a view to display Billions
The following view will use the [IMFData].[IMFData] table as a basis for data, and use a WHERE clause to restrict the data where the Scale is equal to ‘Units’ and the Units are specified in US dollars only.
CREATE VIEW [IMFData].[vIMF U.S. Dollars in Billions]
AS SELECT [ROW_ID]
,[WEO Country Code]
,[ISO]
,[WEO Subject Code]
,[Country]
,[Subject Descriptor]
,[Units]
,[Scale]
,[Year]
,[Value]
FROM [IMFData].[IMFData]
WHERE Units = ‘U.S. dollars‘ AND Scale = ‘Billions‘;
It’s possible to test the view with the following query, which will return data that is specified in Billions. We can see the results in the Studio.
Figure 25 Query Result in the Studio
To Wrap-up
In this post, we focused on Azure Synapse Analytics and conducted activities such as creating tables and views, testing out the tables, and formatting the data. In the next post, we will explore Power BI in detail, using the data that was prepared in the blog series.
If the Azure Synapse Analytics SQL Pool instance is still running, then it will need to be paused so that you are not charged.