Data mining, in its present format, has been part of Microsoft Business Intelligence for about 10 years. Unfortunately, it is one of least discovered and least used features of Microsoft BI.
Many companies have heard of, and use, MDX (MultiDimensional eXpressions) and/or DAX (Data Analysis eXpressions) to extract meaning /reports from their cubes and tabular models. Yet few have investigated DMX (Data Mining eXtensions to SQL) to extract meaning/reports from their data mining models – usually because they don’t have data mining models in the first place. This blog is a modest attempt to spark your interest in the possibilities of data mining (but as a short blog, it can only look at a tiny subset of the features and has to make a few generalisations).
Data mining is also referred to as knowledge discovery and is a major part of a broader field called data analytics. It allows you to ‘discover’ new meaning and insights from your data, whether the data is relational, multidimensional, or tabular. The Microsoft implementation of data mining is part of SSAS (SQL Server Analysis Services), although you can use Excel 2013, 2010, or 2007 as one of the many interfaces. Please note that data mining requires a multidimensional installation of SSAS, it is not part of SSAS tabular (this is true up to and including SSAS 2014 – although this may change in the future).
To explain data mining is difficult in a brief article – partly due to the fact that there are nine (eleven counting variations) algorithms or methods provided by Microsoft. Essentially it discovers meaning in your source data. The meaning that it discovers is often meaning that you are unaware of. Data mining highlights and makes obvious this hidden meaning – it can also be used to find meaning that you are vaguely aware of, but would be very difficult or impossible to verify using more conventional SQL/MDX/DAX queries and reports. To illustrate the concept, we will concentrate upon the Association Rules algorithm to discern cross-selling and up-selling opportunities. This is often referred to as market basket or basket analysis. You can see a similar algorithm at work when you search for a book or CD on Amazon, and it presents a list of other products you may be interested in.
There are other algorithms too. For example, you can look at sales and data mining will show you projected future sales – used for planning future production resources. Or you can analyse sales/behaviour by customer demographics and data mining will predict future sales/behaviour of new customers – used for planning marketing campaigns and anomaly detection. Other useful algorithms examine correlations – used in predicting outcomes in health, science, and sales. The beauty of the Microsoft offering is you don’t need to understand the R language or Mahout (both popular in data analytics) – you don’t even need to know DMX (Microsoft’s data mining language) to produce amazing results in a few minutes.
The source data for SSAS data mining can be in a cube (multidimensional), in a conventional database (relational), in a tabular model (tabular!), or Hadoop (Big Data). To utilise the latter you may need to use SSIS (SQL Server Integration Services) possibly with Hive or Pig, or Excel’s Power Query and Power Pivot, to provide some structure to the data. The source data is fed into a mining structure. The mining structure contains one or more data mining models – these can be based on different algorithms and/or the same algorithms with different parameters. The source data educates (trains) the models so they can arrive at their conclusions. To retrospectively view the source data once it’s in the structure you can use a DMX Cases query. As for most of data mining there are GUI alternatives for nearly everything you can do in a DMX query – these graphical interfaces are available in Excel, SSMS (SQL Server Management Studio), SSIS, SSRS (SQL Server Reporting Services), BIDS/SSDT-BI (Business Intelligence Development Studio/SQL Server Data Tools-Business Intelligence). The findings or results of the data mining models can be seen in a DMX Content query or graphically – this is where the meaning first becomes apparent, by highlighting behaviour, trends, correlations, similarities, differences, and more. Further meaning can be discovered (I almost said mined!) by looking at new, as well as existing, data. This is where you begin to predict future behaviours and trends. Not surprisingly, these new results are obtained by writing a DMX Prediction query – again there are GUI alternatives that make life much, much easier.
In our example, we are going to look at the Association Rules (market basket analysis) algorithm. You will need Adventure Works Multidimensional to try this for yourself as we are going to examine a pre-built Microsoft example. You can easily build your own example for SSAS from Excel or BIDS/SSDT-BI in less than five minutes – the source data is from Adventure Works DW relational database via the vAssocSeqOrders and vAssocSeqLineItems views. To view the results of the mining, we will use SSMS (although you could use Excel just as well). In SSMS connect to your SSAS multidimensional instance and expand, in turn, Databases, Adventure Works DW (i.e. multidimensional not relational), Mining Structures, Market Basket, Mining Models. Finally, right-click on Association and choose Browse, then click on the Dependency Network tab. This shows the relationships between purchases, including the direction (one-way or two-way) and the strength of the relationship. If you select Water Bottle, for example, examine the relationships and the colour-coding of these relationships. A completely obvious relationship exists with Road Bottle Cage and it’s a two-way relationship. A totally non-intuitive relationship (and this is what data mining is all about) is with Cycle Cap. Customers who bought a Cycling Cap then went on to buy a Water Bottle, and vice versa (see the following screenshot).
Now we have the content, we can easily build a prediction query. Again, we will use SSMS, but you can also do this from Excel, SSIS, SSRS, and more. Right-click on Association a second time and choose Build Prediction Query …. This opens a graphical query designer. Right-click on the background and choose Singleton Query. In the Singleton Query Input click in the Value column and then click the ellipsis that appears. Click Water Bottle, Add, OK in the Nested Table Input dialog to return to the designer. In the first row at the bottom of the designer, choose Prediction Function as Source and PredictAssociation as Field. Drag v Assoc Seq Line Items (from the Mining Model at the top left and not from the Singleton Query Input) to Criteria/Argument. Type a comma then 5 (to show the five most likely purchases) after this. Your graphical query should look like this:
Now click the third button at the top right to see the results – it shows that customers who buy a Water Bottle are likely to buy a Cycling Cap (a little more work could also show us the probability). Click the middle button (it’s labelled SQL not DMX as DMX is an extension to SQL) to look at the DMX generated. You may want to copy this syntax to the clipboard and paste into a DMX query in SSMS. Run the query and expand the resulting nested table to verify the relationship between Water Bottle and Cycling Cap. To expand the nested table, type in the FLATTENED key word after the first SELECT and re-run the query. The final query (including a natural prediction join) and query result are shown below:
PredictAssociation([Association].[v Assoc Seq Line Items],5)
NATURAL PREDICTION JOIN
(SELECT (SELECT ‘Water Bottle’ AS [Model]) AS [v Assoc Seq Line Items]) AS t
Mountain Bottle Cage
Road Bottle Cage
I hope this little ‘taster’ has whetted your appetite. If you would like to pursue Microsoft data mining a little further, here are two books to get you started (they are both still totally relevant for SSAS 2014):
“Data Mining with Microsoft SQL Server 2008” by Jamie MacLennan et al (Wiley)
“Practical DMX Queries for Microsoft SQL Server Analysis Services 2008” by Art Tennick (McGraw-Hill)
By Art Tennick
Art Tennick is a Microsoft Business Intelligence trainer, consultant and author based in the UK with over 15 years of BI exposure and the author of 20 published books and hundreds of magazine articles. To contact Art, email [email protected] or visit him on LinkedIn.
Have a comment? Let us know below, or send an email to [email protected]