Business Intelligence (abbreviated BI) is one of the hottest areas in Information Technology today. In fact, a number of specialists in other areas, such as predictive analytics, are rebranding themselves. It isn’t that they are doing anything fundamentally new. Rather it’s that their potential clients, who tended to glaze over at the old labels, are now interested in and receptive to “business intelligence.”
This is a short blog entry, and, as such, in no way purports to cover the topic of BI. Moreover, BI can be approached from a number of different directions – some radically different from others. All we want to do here is to provide an entrée into some of the areas covered under the rubric of “business intelligence.”
By its name BI means intelligence in and for business. Thus it is not just the collection of data or information, but rather implies “intelligent” processing, understanding, and usage of that information. The intelligence part will lead to useful interpretations, judgments, correlations, and conclusions – and provide evidence and support for better decisions, namely those which lead to increased revenue and profits and which reduce and prevent errors and missteps.
First, let’s break BI down into two parts:
- The sources of the information.
- The analysis or processing of that information.
This is a somewhat artificial distinction, as collection and processing may be intertwined at least in part. However, it will give us a place to start investigating. As the title of this blog indicates, we will focus on some of the sources of BI. The different sources will give us a broad categorization for BI applications.
Surveys
Surveys are a widely used and potentially very effective method for collecting business data. However, there are two important caveats with surveys:
- In order to be maximally useful and effective they must be properly designed. It takes genuine expertise to put together such a survey.
- The proper interpretation of survey results also requires expert analysis with sophisticated math and statistics. The “obvious” conclusions of the survey may be wrong, and the deeper, more correct (read: profitable) conclusions may even be counterintuitive.
There is also a great deal of variation in surveys. For example, they may be anonymous, or else the respondent identities may be known and therefore connectable with business or personal profiles. The appropriate treatment and weighting in each case, along with how best to utilize profile information when available, are also important topics calling for expertise. In addition, time factors and timeliness are key considerations.
BI experts specializing in surveys are sometimes able to enter the scene after a survey has been completed, even if it hasn’t been done under optimal conditions, and draw very useful conclusions.
When good surveys are combined with the proper analysis, the results can be staggering (translate: a huge increase in profitability).
One of the advantages of surveys is that they are a relatively lightweight approach. They don’t require a major technology deployment, and a well selected sample can represent much larger numbers.
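A quick way to see why a well-selected sample can stand in for much larger numbers is the standard margin-of-error calculation for a survey proportion. The sketch below is illustrative; the function name and defaults are mine, not from any particular survey toolkit:

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a survey proportion.
    z=1.96 corresponds to 95% confidence; proportion=0.5 is the most
    conservative (widest) assumption about the true proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# A well-designed random sample of ~1,000 respondents pins down a
# population proportion to within about +/-3 percentage points,
# regardless of whether the population is fifty thousand or fifty million.
print(round(margin_of_error(1000), 3))
```

Note that quadrupling the sample only halves the margin of error, which is one reason careful design matters more than sheer sample size.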
Social Networking Analysis
In my earlier blog, The Overlooked Power of Social Networking, I noted: “The social media are great repositories of sentiment, and this sentiment can be mined and analyzed. What’s more, because the social media are highly dynamic, changes in sentiment can be monitored with virtually no lag time.” In addition, I indicated that social networking analysis can supplement the survey as a vehicle for garnering business intelligence.
Surveys and social networking analysis have a great deal in common. They both offer the means to reach out beyond the organization to the customer and prospect, or to any other strategic group or segment outside the walls. They both can be targeted to specific demographics as desired. Of course, technically they involve different ways and means, and require different types of expertise to facilitate. However, once the survey or social networking data are collected, they can be run through various types of analytics to yield valuable BI.
Both survey and social networking data are, at bottom, matters of statistics. Surveys often involve subjective, evaluative responses. They will often give range breakdowns (such as annual household income) that may or may not be optimal. Poorly designed questions may be leading, crafted to garner positive answers while skirting very specific nagging problems or failing to allow expression of legitimate complaints. And many responses are not totally honest.
The social media also have their limitations. The ratio of valid comments to “junk” may be low. They also may represent a skewed demographic. The structure of the information is minimal or non-existent. The sheer quantity of data, especially from Twitter, can be daunting. It is no simple feat to sift through all this to glean useful conclusions. Nevertheless, conclusions will start to emerge almost immediately even when very simple techniques are used. The more refinement applied, the more focused and interesting the results become. Statistical studies, artificial intelligence, and natural language processing are among the advanced techniques that can be used. The social media also have a “primitive” or “uncensored” quality reflecting a raw edge of sentiment difficult to extract from anywhere else.
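To make the point that conclusions start to emerge even with very simple techniques, here is a deliberately naive lexicon-based sentiment scorer: count positive and negative words per post. The word lists and scoring are invented for illustration; real systems use the statistical and NLP techniques mentioned above:

```python
import re

# Illustrative word lists -- a production lexicon would be far larger
# and weighted, and a real system would handle negation, sarcasm, etc.
POSITIVE = {"love", "great", "awesome", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "broken", "disappointed"}

def sentiment(post):
    """Crude score: (# positive words) - (# negative words)."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new phone, awesome camera",
    "Battery life is terrible, very disappointed",
    "Shipping was fine",
]
scores = [sentiment(p) for p in posts]
print(scores)  # one crude number per post: positive, negative, neutral
```

Even this blunt instrument, run over a large stream of posts and bucketed by hour, begins to show the shifts in volume and sentiment that more refined analytics sharpen.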
A big advantage of social networking over surveys is that the data are ongoing and real-time. For example, you can measure the increase in volume and change of sentiment about a product immediately following a Super Bowl commercial.
Workflow and Business Process
In my two books, The Tech Advisor and Workflow 101, I discuss workflow – or the software automation of organizational process – at some length. At its core, workflow automation is built around the collection, retention, management, and utilization of data. It is “data driven.” A brand-new workflow system will draw on existing data, and new data will be created for it. Furthermore, workflow generates its own data as it runs, such as process status, employee performance, and throughput statistics.
For example, workflow automation for customer service will be a repository for all kinds of data, about various products and the issues that plague them, and even information that could lead to improvements in service rep training and support tools. Such data have value throughout the organization – for manufacturing, quality control, human resources, management, training, project planning, and so on.
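As a small illustration of pulling BI out of workflow data, the sketch below computes resolution-time statistics from a customer-service event log. The log format, ticket IDs, and timestamps are all hypothetical:

```python
from datetime import datetime
from statistics import mean

# Hypothetical event log: (ticket_id, event, timestamp) records such as a
# customer-service workflow might emit as it runs.
log = [
    ("T1", "opened", "2012-05-01 09:00"), ("T1", "closed", "2012-05-01 11:30"),
    ("T2", "opened", "2012-05-01 09:15"), ("T2", "closed", "2012-05-01 14:15"),
    ("T3", "opened", "2012-05-01 10:00"), ("T3", "closed", "2012-05-01 10:45"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

def cycle_times_hours(log):
    """Hours from 'opened' to 'closed' for each completed ticket."""
    opened, closed = {}, {}
    for ticket, event, ts in log:
        (opened if event == "opened" else closed)[ticket] = parse(ts)
    return {t: (closed[t] - opened[t]).total_seconds() / 3600 for t in closed}

times = cycle_times_hours(log)
print(round(mean(times.values()), 2))  # average resolution time in hours
```

The same log, sliced by product, rep, or issue type, is exactly the kind of data that has value well beyond the service department.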
In large enterprises there are often many workflow automations, as well as databases, applications, and systems of various kinds, typically of heterogeneous origin. There is an organizational impetus to connect and integrate these systems, either piecemeal or in a cohesive, overarching way. There are usually a number of motivations for doing so, including:
- Streamlining operations.
- Eliminating redundancies in activities and processes.
- Connecting valuable data and resources to where they are needed.
- Centralizing systems around communications buses or service brokers.
- Bringing direct process control and governance out to higher-level management.
- Bringing data and information about process, both real-time and historical, out to higher-level management.
The last motivation is the one particularly associated with BI. A number of software companies that develop workflow, database, and other enterprise software platforms also sell systems specifically designed to cull information about process and bring it out for centralized dashboards and reports. These systems may also facilitate the other five motivations. They may, for example, allow business rules to be defined, thereby adding new control features as well as streamlining process.
From the BI standpoint, complex, combined workflow automations can yield very rich and layered data, making it possible to skim the cream off the top of organizational process.
Operational Systems
We can enlarge upon the idea of business process to include all organizational operations that generate or operate with computer data. Large organizations will often have big systems handling major portions of their operations, such as ERP (enterprise resource planning), CRM (customer relationship management), supply chain management, sales systems, marketing systems, and so on. While these do fall under the rubric of “business process” as we discussed in the last section, we consider them separately here, as they are often large, self-contained systems. As such, they will often have their own centralized databases which can be accessed as sources of BI.
Thus we have a slightly different emphasis with these operational systems than with business processes. With the latter, we are looking at ways to pick up process data at key junctures. With the former, we are simply connecting into the system’s data repository.
Key Performance Indicators
Many BI specialists like to establish key performance indicators (KPIs) as a way of monitoring organizational performance in various areas. KPIs are often chosen in conjunction with a management framework such as the Balanced Scorecard developed by Kaplan and Norton, which has been a major tool for defining and driving organizational improvement in the last two decades. KPIs are a very carefully chosen set of metrics that provide a snapshot of the state of the whole organization. More than that, they are pressure points, chosen as strategic targets for the application of process improvements. Defining KPIs requires art, experience, and expertise – to achieve both an accurate representation as well as a tableau for quick and meaningful results.
KPIs are typically connected with the workflows or automated processes, which we discussed in the last section. We sometimes speak of “instrumenting” business processes, by introducing points of measurement that are “hooked up” in some manner to a central performance monitoring station. Think of it as wiring gauges to a central console.
To the extent that KPIs are set up through software – and are not simply instituted as “manual” procedures – they may piggyback on existing process automation. Alternatively, they may be created ad hoc, perhaps as very lightweight workflows that do little more than monitor and bring out data at key points.
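A very lightweight KPI monitor of the kind described above can be sketched in a few lines: each KPI has a current reading and a target threshold, and breaches produce alerts that a dashboard or escalation workflow could consume. The KPI names, values, and thresholds here are invented for illustration:

```python
# Illustrative KPI table: current reading plus a "max" or "min" target.
KPIS = {
    "avg_resolution_hours":       {"value": 6.2,  "max": 8.0},
    "first_call_resolution_rate": {"value": 0.68, "min": 0.75},
    "backlog_size":               {"value": 140,  "max": 100},
}

def breaches(kpis):
    """Return an alert message for every KPI outside its target range."""
    alerts = []
    for name, k in kpis.items():
        if "max" in k and k["value"] > k["max"]:
            alerts.append(f"{name} above target: {k['value']} > {k['max']}")
        if "min" in k and k["value"] < k["min"]:
            alerts.append(f"{name} below target: {k['value']} < {k['min']}")
    return alerts

for alert in breaches(KPIS):
    print(alert)
```

In practice the readings would be fed by the instrumented processes themselves, and the alerts would drive the reminders and escalations mentioned in the note below.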
Note: KPIs for BI are very similar to the metrics used to verify compliance – Sarbanes-Oxley, for example. Here the experts come in and look at the entirety of organizational processes and risks, and select a number of critical points that address the principal risks and collectively demonstrate compliance. These points (often referred to as “controls”) are then monitored on an ongoing or regular basis. Ideally the monitoring will be realized by software automation, and will include workflow features such as reminders, alerts, and escalations – to prevent any failures. Even though the immediate goal is compliance, these interventions can be included under the rubric of BI. The procedures will not only ensure compliance (a critical business goal), but may very well lead to other improvements.
Data Mining and Data Warehousing
Data mining is a catchall term for any form of digging through a lot of data to get the gold. In general, we can consider the “gold” to be BI. The big database where the gold is to be found might be, for example:
- An internal company database. For example, an auto company might have a database of all new cars purchased over the last twenty years with complete details on the models, the financial history of the transactions, profiles of the purchasers, warranties, dealer maintenance history, etc.
- An external database that is purchased, licensed, or made available by a business affiliate. For example, the same auto company may purchase a database from insurance companies, big auction houses, CarFax, and other sources, to track their cars after they go outside their dealer network.
- Public databases, or data collected by universities and research organizations. Some of these data may be available for free. Often they may be accessible only through specialized interfaces, perhaps over the web.
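To make the internal-database case concrete, here is a toy version of the auto-company example using Python's built-in sqlite3 module. The schema and rows are invented; the point is that one well-aimed query can pull a "nugget" (here, which models repeat buyers come back for) out of a transactional database:

```python
import sqlite3

# In-memory stand-in for an internal purchases database.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE purchases
               (buyer TEXT, model TEXT, year INTEGER, price REAL)""")
con.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
    ("alice", "sedan", 2008, 21000), ("alice", "suv",   2011, 34000),
    ("bob",   "sedan", 2010, 23000), ("carol", "coupe", 2011, 27000),
    ("carol", "suv",   2012, 36000), ("dave",  "sedan", 2012, 24000),
])

# Which models do repeat buyers purchase?
rows = con.execute("""
    SELECT model, COUNT(*) AS n FROM purchases
    WHERE buyer IN (SELECT buyer FROM purchases
                    GROUP BY buyer HAVING COUNT(*) > 1)
    GROUP BY model ORDER BY n DESC, model
""").fetchall()
print(rows)
```

Real data mining scales this idea up enormously, in data volume, in query sophistication, and in the statistics applied to the results.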
Data mining is often outsourced to companies that specialize in it. These companies provide interfaces allowing their customers to run queries and get results, while the maintenance and processing of the data stay with the data mining company. They also take on data mining projects under contract.
Typically we think of data mining as going along with standard business activities such as market research, product development, and customer relationship improvement. However, there are also organizations for which data mining is essentially a core process. Consider online search tools like Google or Yahoo. Even today’s large-scale online stores are fueled by continuous data mining. Consider Amazon.com, which surfaces related products, suggestions matched to user profiles, and so on.
Data warehousing refers to collecting data from various sources (from inside the organization and sometimes from external sources as well), and putting them into a large “warehouse” from where they can be mined. Data warehousing is particularly associated with BI, as one of the primary motivations for collecting data in a warehouse is to perform analysis, do forecasting, and provide decision support – based on the ability to look at the broad spectrum of information.
Data warehousing is a major area of IT, where there are numerous theories, methodologies, products, and services to be found – all well beyond the scope of this discussion.
Big Data
“Big data” is a recent term referring to a collection of data so large that it cannot effectively be handled by conventional database management systems. Data capture, processing, storage, search, sharing, analysis, and visualization become enormous challenges as the amount of data grows. Data are growing at a staggering rate. In my prior blog entry, Green IT: Why it is so important., I discussed some of the issues involved in the big data phenomenon.
In the mid-1990s I came up with an ad hoc principle: the degree of difficulty in managing data grows as the square of the amount of data. This was not meant to be a quantitative fact, but rather was based on observation of numerous situations. To me, it was actually counterintuitive; I would have thought that dealing with a larger amount of data would require only incrementally more work, but my experience proved to be very different. Big data challenges are like this as well. As data grow, the traditional hardware and software platforms cannot be scaled up to handle them. New platforms based on large-scale parallelism, such as Apache Hadoop, are being deployed and further developed. If you think about the processing involved in the Google search engine, sifting through gigantic quantities of data and returning query results almost instantly, you get a feel for big data.
Also there is something like the compound interest effect going on with big data. As we observed with business process, the automation of process produces more data, and the extraction of intelligence from automated process produces still more data. In fact data mining from any source produces more data.
Another factor that leads to data growth involves the inclusion of more specific information, such as timestamps, geographic locations, and source/trace paths. (This information is found in many social media, for example.) The drive in BI analytics to identify meaningful correlations makes it worthwhile to capture as much of this sort of contextual information as possible. As we have seen, not only does this lead to larger datasets, but the analytics themselves generate even more data.
Despite the challenges, big data is an important source of BI. Large organizations are now working with it and are creating huge repositories of useful information. Anyone who does a Google search, shops at a major online store, or uses a large social networking site, is the beneficiary of ever-evolving big data technology. Moreover, scientific research projects, such as weather modeling, astronomical surveys, and genome decoding, are making great strides thanks to new big data technologies. A great deal of research on big data is going on today, looking for new and better ways to understand it, handle it, and take advantage of it.
Rich Data
The concept of rich data goes back twenty years or so. Back then it was used by Lotus to describe the document-centric data model of its Notes® software. In the typical relational database model the individual datum is generally small – one cell in a row-column (record-field) structure. In contrast, rich data is more of a kitchen sink model. The “datum” is quite open-ended. It can include conventional data records, rich text documents, files, images, audio, video, and just about any other kind of digitized information.
For the sake of simplicity, we can consider rich data as covering the entire range of digital formats. A critical issue nowadays for both small and large companies is the ability to locate files and documents. A large company recently started a major initiative with the goal of saving each employee an average of five to ten minutes a day in searching for internal information. Of course Windows Explorer will search within text files and Office documents. SharePoint and other tools add PDF files to the searchable list. However, what happens when the organization has files on hundreds, perhaps thousands of servers? What if, in many cases, access permissions block out those who need the information? There are many issues here.
Of course not all files are Office documents or PDFs that can be searched as text. Many software applications generate unique file formats. Even if these are essentially textual, the text may not be retrievable by standard search tools outside the native applications. Even then, the native applications may not be very good at locating native files, much less at searching through them.
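The text-search half of the problem has a deceptively simple core, which desktop search tools generalize. The bare-bones sketch below walks a directory tree and checks each file for a phrase; it only handles plain text, and every native format noted above would need its own extractor plugged in where the file is read. The function and its behavior are illustrative, not any particular tool's:

```python
import os

def search(root, needle):
    """Return paths under `root` whose text contains `needle` (case-insensitive)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if needle.lower() in f.read().lower():
                        hits.append(path)
            except OSError:
                pass  # unreadable file: permissions, broken link, etc.
    return hits

# Example: search("/shared/docs", "quarterly forecast")
```

Scaling this from one directory to thousands of servers, with permissions, indexing, and non-text formats, is where the real engineering lies.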
Furthermore, searchability means more than just finding text. What if you have a whole bunch of image files (like JPGs or PNGs) and you want to find pictures with cars in them, using advanced pattern recognition? What if YouTube wants to scan video files to find which ones are violating copyrights? What if you want to find audio files matching a particular voiceprint, or containing the word “Hadoop”? What if you want to scan streaming content, rather than static files? For example, you want to scan all major cable network feeds to identify musical performances or to profile commercial content.
Rich data, like the other sources, are extremely important reservoirs of potential BI. The tools are becoming increasingly sophisticated, but demands are growing, and the challenges are daunting.
The Analysis and Processing of BI
This blog entry is limited to a brief discussion of some of the important sources of BI. The analysis and processing that turns the data into intelligent information, decision support, actionable results, or valid system inputs might involve many diverse approaches. These might include relatively simple analysis, more complex logic and algorithms, advanced calculations and statistics, and even artificial intelligence. The end result might be a set of reports, a dashboard, a huge console of many screens in a war room, or a major strategic initiative for organizational improvement – or the results may simply feed automatically into other systems that consume them.
To work effectively with BI requires a range of management consulting, IT, and math and statistics skills. To use BI to effect significant organizational change requires a rich and powerful methodology.
The Road to Business Intelligence
BI is a vast subject area, with topics that are far beyond our scope here. Our goal has been to shed some light on BI by distinguishing some of its sources. While most readers will be familiar with surveys and data mining, they may not be familiar with some of the other sources, at least not in terms of how they connect with BI.
There are a number of common features among the different BI sources, and there is of course some overlap. For example, social networking and rich data sources may turn into big data. Nevertheless, these different sources of BI represent some of the roads into BI which may be particularly useful for specific needs and applications. If you are an organizational leader, perhaps this discussion will stimulate your thinking about ways in which you can better exploit BI for profit, sustainability, quality, and consistency.
Copyright © 2012 Patrick D. Russell