BI 101

What is Business intelligence?

Business Intelligence (BI) software consolidates, analyzes and displays the data collected or generated by your company’s transactional business systems to provide powerful insights into how your business works and how it can be improved.

BI software consolidates data from different sources and assembles it in “data warehouses” or “data marts” that eliminate distinctions in data formats. It then presents the results through a reporting, analytics or dashboard interface. BI software thus serves as a common platform for shared, company-wide insight. BI software makes analysis and report-making much faster and more reliable.

Using BI software, users can do things like profile customer behavior, monitor buying patterns, detect bottlenecks an opportunities to reduce costs, and pinpoint top customers. The data warehouse serves as a gateway to the underlying data systems. Users can drill-down into the source data to find the details behind a particular query.

Properly implemented, BI is a decision-support system that puts the entire wealth of your enterprise’s electronically stored information to optimal use.

What is a data warehouse?

A data warehouse is a repository of your organization’s stored data. In essence, it is a place where content from different systems can be consolidated together in one logical location. A related term is “data mart”. Although there can be important design differences between data warehouses and data marts, in general a “warehouse” supports an entire corporation or enterprise, and a “mart” supports one specific business area. For the purposes of this discussion, we will not make a distinction between data warehouses and data marts.

Is “data warehouse” just a fancy name for a large database?

No.

Some operational systems and databases provide reporting capability. Nonetheless, the key difference is that these systems are optimized for data gathering and recording of business transactions, whereas a BI data warehouse is optimized to support decision making across the entire enterprise.

A data warehouse provides a common model and structure for all data regardless of its original source. This common data model makes it easier to report on and analyze information.

The end result with this kind of system is that people spend less time consolidating or looking for information and more time using information for business insight and to drive better performance.

What is “Dimensional Data”?

Dimensional data is a data that is organized so that individual datums, or datapoints–such as number of products ordered, sales date, customer, location, can be combined along any number of axes, or dimensions.

To see what we mean by this, consider that in a spreadsheet, data is arrayed in two dimensions—rows and columns. Say, for example, that you had a spreadsheet that listed customer names in rows and values of transactions in a column. The datapoint in a cell that listed a parituclar customer name and a particular transaction value would have two dimensions. Now imagine that you had 365 such spreadsheets with the exact data format, one spreadsheet for every day of the year. If you were to logically stack them on top of each other, as in a three-dimensional chess game like Mr. Spock plays on Start Trek, you would have a cube, and any data point could have three attributes—three dimensions—customer name, date, and value of transaction. That way you could ask the system to tell you, for example, all transactions above a certain value after a given date. In BI stystems, any datapoint can have an arbitrarily large number of attributes, or dimensions.

What is a “Cube”?

A Cube is a data warehose in which data is partitioned in dimensions. This approach makes it much easier for users to understand and manipulate the data available to them. In BI data warehouses, data is stored in a multi dimensional cube that is optimized for analysis and reporting. You can have not just two or three dimensions, but basically as many as you need.

ETL is the name given to the software tools that create cubes from existing data sources.

What is ETL?

Extract, Transform, Load (ETL) is a common approach to building a data warehouse from an organization’s various sources.

The process can be particularly complicated because of the need to extract and converge data across various systems. In big organizations, different data sets can contain hundreds of millions of rows of data.

On this order of magnitude, you need a powerful process to bring the data together in a standard environment. The most common method is to use ETL software to extract the data from its source, transform it into a consistent format, and load it into the data warehouse.

The ETL process is typically a series of connected steps:

  • Build the reference data.
  • Extract data from the various input sources.
  • Validate that data (make sure it is internally consistent)
  • Transform the data into reference format (“clean” it, apply business rules, check for data integrity, create aggregates or disaggregates, and so forth).
  • Load the transformed data into staging tables.
  • Audit reports (such as compliance with business rules).
  • Publish the staged data to target tables in the cube.
  • Create an archive of the data and the processes used to create it.
  • Clean up.

In large companies, the process can be very complex and costly, and big operational headaches can happen when the process isn’t properly designed, debugged, and implemented.

How do BI tools allow users to exploit the wealth of the data mart?

BI presentation and visualization tools make it easier for people to find meaning inherent in a data warehouse or cube. The following are different BI tools:

  • Reports present corporate data in a consistent, highly readable format using text, charts, and graphs. Most BI systems provide templates to kick-start the reporting process. Users can also create their own report formats.
  • Dashboards present metrics in a visual way. They provide users with at-a-glance content, using graphics (such as bar charts, graphs, or pie charts) and gauges (such as traffic lights) to highlight important information.
  • Alerts notify users when there is a change in data conditions, or when a user must perform a task at a particular time.
  • Online Analytical Processing (OLAP) tools allow users to “slice and dice” data to highlight key trends of interest.
  • Data mining techniques use advanced algorithms to reveal hidden patterns and transform data into information.
  • Performance management tools allow you to monitor key performance metrics according to pre-set criteria.

How is BI software installed and delivered to customers

There are two approaches to deploying BI software, on premise or as a service.

On premise is the traditional approach, where BI software licenses are purchased from a vendor and installed on computers owned by the enterprise.

Larger organizations often make substantial investments in BI solutions to bring these decision support capabilities to hundreds or thousands of staff. In this case, in-house deployments can be complex, involving large-scale implementation, administration and management.

Deploying enterprise BI on the premises means spending time understanding a company’s information infrastructure, its data sources, the information needs of users, the ways in which data can best be turned into the right information, and how to get that information to the right people.

Because of the time and effort involved in getting the system up and running, it can take months or years before users are able churn out their first reports. For many smaller organizations, this kind of enterprise-scale BI deployment is too expensive, complex, and resource intensive to be practical.

The other option is where the BI software is offered as a service (Software as a Service: SaaS) over the Internet. In this case the software is not “purchased” and the computers on which it executes are not owned by the enterprise. Rather, the company subscribes to the service.