INDEX
- Moving from Reporting to Business Intelligence and Optimization with Analytics
- – The Gartner Roadmap from Descriptive Analytics to Prescriptive Analytics
- Moving from Business Intelligence to Prescriptive Analytics with Azure
- Choosing your Data Storage for Business Intelligence: Data Warehouse or Data Lake?
- Data Warehouse or Data Lake?
- – Do you simply want to store data?
- – Do you need cloud data warehousing?
- – Do you need to store data with a flexible data model?
- Moving from Storage to Analytics
- – Do you require user-friendly analytics?
- – Analytics – the choice between proprietary and open source
- Visualizing the Data
- What’s coming next, and what does this mean for businesses?
- – How Ballard Chalmers can help
Moving from Reporting to Business Intelligence and Optimization with Analytics
In the last two years alone, 90% of the world’s data has been created (Source: IORG, 2019). Business leaders understand that their data is valuable, but many businesses collect data and produce static Business Intelligence reports, yet struggle to find a path from reporting to analytics.
The purpose of Business Intelligence is to optimize the business, not simply create reports. When allocating budgets, it can be hard to prove a return on investment in Business Intelligence projects since the output is a series of reports or dashboards rather than a demonstration of business optimization.
Further, in the competition for budget, Business Intelligence projects can lose out to technology upgrades, where it is easier to demonstrate business optimization and value. It is hard to provide evidence of return on investment where the organization limits Business Intelligence activity to patterns of simple, static reporting. This constraint makes it harder to embark on a path of optimization through Business Intelligence and then on to further optimization through analytics.
How can organizations break out of this pattern and move from simply storing volumes of raw data to gaining valuable insights? Gartner has put forward a roadmap that categorizes the purpose of reporting and analytics as organizations journey from static business reporting to optimization through Business Intelligence and analytics.
The Gartner Roadmap from Descriptive Analytics to Prescriptive Analytics
The initial Business Intelligence stage, defined as Descriptive Analytics, focuses on what happened. It is characterized by traditional static Business Intelligence reports and visualizations such as pie charts, bar charts, and line graphs. At this point, Business Intelligence has an operational focus: gathering data, generating reports, and producing ad-hoc reporting. Often this stage involves basic reports, and it may include flagging under-performing areas that require attention.
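As a small illustration of the kind of question Descriptive Analytics answers, the sketch below (plain Python, with made-up sales figures and a hypothetical target) aggregates history and flags under-performing regions, the "what happened" view:

```python
# Descriptive analytics sketch: summarize what happened (illustrative data only).
monthly_sales = {
    "North": [120, 115, 90],
    "South": [200, 210, 205],
    "West":  [80, 75, 70],
}

TARGET = 300  # hypothetical quarterly target per region

report = {}
for region, figures in monthly_sales.items():
    total = sum(figures)
    report[region] = {"total": total, "under_target": total < TARGET}

for region, row in sorted(report.items()):
    flag = "UNDER TARGET" if row["under_target"] else "on target"
    print(f"{region}: {row['total']} ({flag})")
```

Nothing here predicts or recommends anything; it only describes the past, which is exactly where this first stage stops.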
Organizations usually stop at this point, believing that this is what ‘good’ looks like for a successful data-driven organization. However, this is a good starting point rather than an end. The objective is to move the organization from the ‘Descriptive Analytics’ stage towards optimizing the organization.
The objective of business analytics is to predict and prescribe actions with the overall business goals in mind. Analytics needs good Business Intelligence as a foundation: without good Descriptive Analytics, it is difficult to make predictions or find meaningful connections between siloed datasets.
Ideally, businesses move towards the Predictive Analytics stage in the Gartner roadmap, which emphasizes prediction rather than description. In Gartner’s approach, analytics are intended to be easily consumed through accessible tools, promoting rapid, relevant analysis. For some organizations, the current architecture will not support predictive or prescriptive analytics because there is no opportunity to collaborate with data scientists.
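To make the shift from description to prediction concrete, here is a minimal sketch in plain Python: fitting a straight-line trend to invented historical figures with ordinary least squares and projecting one period ahead. Real predictive work would use richer models and data; this only shows the change in question being asked.

```python
# Predictive analytics sketch: fit a linear trend and forecast the next period.
# The history is made up purely for illustration.
history = [100, 110, 125, 130, 145]  # e.g. monthly sales

n = len(history)
xs = range(n)
x_mean = sum(xs) / n
y_mean = sum(history) / n

# Ordinary least squares for slope and intercept.
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

forecast = intercept + slope * n  # prediction for the next period
print(f"trend: {slope:.1f}/period, next period forecast: {forecast:.1f}")
```

Descriptive Analytics would report the 145; Predictive Analytics asks what next month’s figure is likely to be.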
Business Intelligence is often categorized as an IT function that uses data and technology rather than a business function aimed at business performance optimization. Moving from simple Business Intelligence to analytics can involve moving from an IT mindset about data to a business mindset focused on outcomes. From the IT perspective, this shift means more than reducing storage and managing data costs. Instead, the focus shifts to analyzing the data and using it to help drive decisions and further analytical questions.
Moving from Business Intelligence to Prescriptive Analytics with Azure
Why is it essential to move the organization from the initial Descriptive Analytics stage of Business Intelligence to the later stages? Descriptive Analytics is an excellent place to start: the initial data must be correct, and more complex analytics will not work properly unless the descriptive stage is in place.
From the Gartner definition, it is possible to dive deeper into how organizations can move through the maturity stages from descriptive Business Intelligence, which Gartner describes as Descriptive Analytics, through to Prescriptive Analytics, which uses data to provide actionable recommendations that support decision making.
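The prescriptive step can be sketched as a thin layer on top of a forecast: a rule that maps a prediction to a recommended action. The stock-ordering rule and numbers below are entirely hypothetical; the point is the shape of the question, "what should we do?".

```python
# Prescriptive analytics sketch: turn a forecast into a recommended action.
# The ordering rule and figures are hypothetical, purely to show the idea.
def recommend_order(forecast_demand: float, stock_on_hand: float,
                    safety_margin: float = 0.1) -> float:
    """Recommend how many units to order given forecast demand and stock."""
    required = forecast_demand * (1 + safety_margin)
    return max(0.0, required - stock_on_hand)

# e.g. a forecast of 155 units against 120 in stock:
print(recommend_order(forecast_demand=155, stock_on_hand=120))
```

Descriptive tells you what sold, predictive estimates what will sell, and the prescriptive layer converts that estimate into an order quantity a decision maker can act on.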
How can we move from business intelligence technology to technology that supports predictive analytics? When we look at Azure, for example, the range of Azure technologies plus the sheer cadence of Azure updates means that it is easy to get lost in all the options.
When you are architecting a system, how do you know which datastore option to choose? And what is the tipping point that moves you from one solution to another: data size, data complexity, cost? The list goes on, and each consideration adds complexity. Let’s start by looking at storing data, through to creating a data platform that supports predictive analytics.
Choosing your Data Storage for Business Intelligence: Data Warehouse or Data Lake?
In today’s multi-cloud environments, organizations have data in different clouds as well as on file servers, and possibly even on their laptops. As a first step, the organization should take an inventory of the data sources it uses on a regular basis. As the organization comes to understand its data better, one common question arises: should the data be stored in a data warehouse or a data lake?
Data Warehouse or Data Lake?
In Business Intelligence, data warehouses continue to be popular because they are very mature technologies. They work well with tools such as Excel, Power BI and Tableau. Data warehouses perform well because the data is wholly curated and structured to answer specific query patterns quickly.
Organizations can use data lakes to improve their analytical processes. Here, a data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.
Data lakes address the shortcomings of data warehouses in two ways. First, data lakes store data in structured, semi-structured, or unstructured formats. Second, the schema is applied when the data is read (schema-on-read) rather than when it is written (schema-on-write). Data lakes are also popular because of their cost-effectiveness: there is never any need to throw away or archive the raw data, and it is always there should any of the analysts want to revisit it.
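The schema-on-read idea can be shown in a few lines: raw records land in whatever shape they arrive in, and a schema is applied only when a particular question is asked. The field names below are invented for illustration.

```python
import json

# Schema-on-read sketch: raw records are stored as-is (lake style) and only
# shaped into a schema at read time. Field names are hypothetical.
raw_lake = [
    '{"user": "ana", "clicks": 3}',
    '{"user": "ben", "clicks": 5, "device": "mobile"}',   # extra field is fine
    '{"user": "cam"}',                                    # missing field is fine
]

def read_with_schema(lines):
    """Apply a schema at read time, choosing defaults for missing fields."""
    for line in lines:
        doc = json.loads(line)
        yield {"user": doc["user"], "clicks": doc.get("clicks", 0)}

total_clicks = sum(row["clicks"] for row in read_with_schema(raw_lake))
print(total_clicks)
```

A schema-on-write warehouse would have rejected or reshaped the second and third records at load time; here every record is kept, and each new question can bring its own schema.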
Do you simply want to store data?
Azure Data Lake stores large volumes of semi-structured and unstructured data for advanced analytics. Processes are run on this data to yield Business Intelligence answers and analytics insights.
If the organization simply wants to store data, teams can store it in Azure Blob Storage. This storage facility is great for storing unstructured content such as images.
Data lakes are considered cost-effective for storing content of all kinds and offer different processing options across various data formats. Data lakes also offer fast availability of data, agility, and flexibility.
Organizations can enhance their data warehouses with data lakes. The data lake can serve as a staging area for the data warehouse, which then acts as the more curated data to be analyzed. Indeed, data warehouses are slowly accumulating many of the features that formerly were found only in data lakes.
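The staging pattern above can be sketched as a tiny pipeline: raw, mixed-quality records sit in the "lake", and a curation step loads only clean, typed rows into the "warehouse" while keeping rejects for later inspection. The record shapes and cleaning rules here are hypothetical.

```python
# Lake-to-warehouse staging sketch: curate raw records into a structured table.
# Record shapes and cleaning rules are invented for illustration.
raw_lake = [
    {"order_id": 1, "amount": "19.99", "country": "UK"},
    {"order_id": 2, "amount": "bad-data", "country": "UK"},   # unparseable
    {"order_id": 3, "amount": "5.00", "country": "FR"},
]

warehouse = []   # curated, typed rows only
rejected = []    # stays in the lake for analysts to revisit

for record in raw_lake:
    try:
        row = (record["order_id"], float(record["amount"]), record["country"])
    except (KeyError, ValueError):
        rejected.append(record)
    else:
        warehouse.append(row)

print(len(warehouse), len(rejected))
```

Nothing is thrown away: the warehouse serves fast, curated queries, while the raw record that failed curation remains available in the lake.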
Do you need cloud data warehousing?
In a traditional on-premises data warehouse, the data processing engine and data storage are coupled together. In the Microsoft stack, Microsoft SQL Server couples a proprietary SQL data processing engine with a proprietary data storage mechanism, and now also supports R and Python as processing languages.
For cloud data warehousing, Microsoft offers Azure Synapse Analytics, which provides data warehousing in the cloud at scale. The data is stored in a structured format, with support for Big Data analytics in Spark and Python. Azure Synapse Analytics fits in well with data visualization technologies; check out our latest post on this topic here, along with our end-to-end, three-part Azure Synapse Analytics and Power BI tutorial.
In a data lake, data processing engines and data storage are distinct from each other, so technologists can select which engines to use to analyze the data and which data format to store it in.
In Azure, technologists can keep data in Azure Storage for backup purposes or use Azure Data Lake Storage for fast Big Data analytics. As we move from structured to unstructured data, the options change to accommodate the data’s changing requirements. Technologies such as PolyBase offer data virtualization to help mix unstructured and structured data. PolyBase is built into the SQL Server 2019 Database Engine, so it is simpler to query unstructured data.
Azure Data Lake is excellent for enterprise analytics applications that hold large amounts of data where conversion and loading are the only actions needed. It is helpful for processing data from relational databases into Azure, or for repetitive loads with no intermediary step.
Do you need to store data with a flexible data model?
In Business Intelligence, another way of working with operational data is to use a NoSQL database. In Microsoft Azure, we have Azure Cosmos DB, a fully managed NoSQL database service built for a flexible data model. It has the advantage of scale with consistently low latencies. Developers tend to like the flexibility of NoSQL, and its rich query capabilities make it an excellent fit for web, mobile, gaming, IoT, and many other applications that need seamless scale.
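To see why a flexible data model appeals to developers, the sketch below uses plain Python dictionaries (a stand-in for a document store, not the Cosmos DB SDK) to show items in one collection carrying different fields, something a fixed relational schema would resist:

```python
# A document-style collection where items need not share a schema.
# Plain Python stand-in for a NoSQL store -- not the Cosmos DB SDK.
devices = [
    {"id": "d1", "type": "thermostat", "temp_c": 21.5},
    {"id": "d2", "type": "camera", "resolution": "1080p", "night_mode": True},
    {"id": "d3", "type": "thermostat", "temp_c": 19.0, "battery_pct": 80},
]

# Query across heterogeneous documents: average temperature of thermostats.
temps = [d["temp_c"] for d in devices if d["type"] == "thermostat"]
avg_temp = sum(temps) / len(temps)
print(avg_temp)
```

Adding a new device type with entirely new fields requires no schema migration, which is the flexibility that makes this model attractive for fast-evolving web, mobile, and IoT workloads.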
Moving from Storage to Analytics
Once the data is in Azure Blob Storage, Azure Data Lake Storage or Azure Cosmos DB, how can you move from simply storing the data to analytics? The choice often depends on the type of data science required by the enterprise.
Do you require user-friendly analytics?
Databricks is a strong choice for both beginner analysts and seasoned data scientists, and it is also well suited to encouraging collaboration across different teams. Databricks’ vision is to make ‘big data easy’ so that every organization has simplified access to data science, whether the users are IT administrators or data scientists. From the data scientist’s perspective, Databricks does not require much maintenance, leaving the analyst free to focus on producing results using the collaborative, interactive notebook environment it provides. From the Azure perspective, one of its greatest strengths is its zero-management cloud solution and ease of use.
Azure Databricks is an excellent choice for business intelligence and ad-hoc analytics. It is also advantageous if you have various data science language skills in the enterprise since it can facilitate collaboration among data scientists who use different languages, such as Python, R, Scala or SQL. Azure Databricks can serve as a data source to other technologies such as Azure Synapse Analytics, Azure Cosmos DB, and Power BI.
Analytics – the choice between proprietary and open source
If you require open-source technologies, then you could consider HDInsight, which offers clusters billed on a per-minute basis. HDInsight uses Apache Hadoop, an open-source distributed data analysis solution that manages the processing of large datasets across large clusters of computers and detects and handles failures. HDInsight is very effective at rapidly collecting large amounts of data, and the data scientist can quickly spin up open-source projects and clusters, with no hardware to install or infrastructure to manage.
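Hadoop’s processing model can be illustrated with the classic map/shuffle/reduce word count, written here as single-process Python purely to show the shape of the paradigm, not how HDInsight clusters are actually programmed:

```python
from collections import defaultdict

# Map/shuffle/reduce word count -- a single-process sketch of the Hadoop
# paradigm; on a real cluster these phases run distributed across machines.
documents = ["big data easy", "big clusters handle big data"]

# Map: emit (word, 1) pairs from each document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(values) for word, values in groups.items()}
print(counts["big"], counts["data"])
```

Because each phase operates independently on its slice of the data, Hadoop can spread the work over many machines and retry any slice whose machine fails, which is the failure handling mentioned above.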
Why HDInsight? It uses open-source technology, but it is also backed by the proprietary HDInsight offering through Azure. HDInsight provides dynamic machines that are billed only when active: Azure enables elastic computing, where you can add machines for particular workloads or projects and remove them when they are no longer needed, and HDInsight takes full advantage of this scalable platform. It can also capitalize on the security and management features of Azure, including integration with Azure Active Directory and Log Analytics. You can also make use of Hadoop with Azure Databricks, but as a storage layer rather than for data analysis and management.
However, anecdotal evidence shows that there is more of a learning curve when it comes to HDInsight. Generally, comprehensive training is required, and background knowledge of SQL is very helpful. There are a number of Microsoft-certified intensive courses that can help to teach learners how to use HDInsight. Some of these are full-time 5-day courses whereas others are self-paced. Many courses also contain exams. Overall, HDInsight is more difficult to learn and it may be better suited to organizations where there is a well-established data science team.
Visualizing the Data
Microsoft’s flagship data visualization tool is called Power BI, and it can consume data from all of these sources. The best tool is the right tool for the job, and fortunately, Azure provides a range of technologies to suit a variety of needs.
What’s coming next, and what does this mean for businesses?
Microsoft places great significance on delivering a growing NoSQL and open-source stack to its customers, and this is a space to watch as the need for analytics increases.
Note that it is not always necessary to restrict the architecture to the smallest possible number of technologies. For example, organizations sometimes mix Azure HDInsight and Databricks because Databricks is more user-friendly and easier to work with, making it better for exploration, whereas HDInsight is better for processing data.
Organizations may find that their technological architectures become increasingly complex, but sitting the technologies in one place – Azure – can help with support and maintenance.
How Ballard Chalmers can help
If you need help demystifying your data or choosing the right data storage, Ballard Chalmers can help. We are Microsoft Gold Partners in Data Platform and Data Analytics, and experts in Azure, data warehousing and Business Intelligence. We can help you plan, build and deliver your custom solution, moving you from descriptive analytics to prescriptive analytics.
Give us a call on 01342 410 223 or contact us here.