Data Mining


What is Data Mining?

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both.

Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.


How Data Mining Works

How is data mining able to tell you important things that you didn't know, or what is going to happen next? The technique used to perform these feats is called modeling. Modeling is simply the act of building a model (a set of examples or a mathematical relationship) based on data from situations where the answer is known, and then applying the model to other situations where the answer isn't known. Modeling techniques have been around for centuries, of course, but only recently have two things become available: the data storage and communication capabilities required to collect and store huge amounts of data, and the computational power to automate modeling techniques so that they work directly on that data.
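The idea of a model as "a set of examples" can be sketched with a nearest-neighbour rule: keep the cases where the answer is known, then give a new case the answer of the most similar known case. This is only an illustrative sketch. The function names, the two features (age and income in thousands of dollars) and the sample records are all invented for the example and do not come from any particular data mining product.

```python
from math import dist

def fit(examples):
    """'Building' this kind of model just means storing the known cases."""
    return list(examples)

def predict(model, features):
    """Apply the model: return the answer of the most similar known case."""
    _, nearest_answer = min(model, key=lambda ex: dist(ex[0], features))
    return nearest_answer

# Hypothetical known cases: (age, income in $1000s) -> heavy user? (made-up data)
known = [
    ((25, 30), False),
    ((38, 65), True),
    ((40, 72), True),
    ((55, 28), False),
]

model = fit(known)

# Situations where the answer is not known yet.
print(predict(model, (36, 68)))
print(predict(model, (60, 25)))
```

Real tools use far more data and more robust techniques, but the two steps are the same: learn from cases with known answers, then apply what was learned to cases without them.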

As a simple example of building a model, consider the director of marketing for a telecommunications company. He would like to focus his marketing and sales efforts on the segments of the population most likely to become big users of long distance services. He knows a lot about his customers, but there are so many variables that it is impossible to discern the common characteristics of his best customers by inspection. His existing customer database contains information such as age, sex, credit history, income, zip code and occupation. Using data mining tools, such as neural networks, he can identify the characteristics of those customers who make lots of long distance calls. For instance, he might learn that his best customers are unmarried females between the ages of 34 and 42 who earn in excess of $60,000 per year. This, then, is his model for high value customers, and he would budget his marketing efforts accordingly.
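The kind of finding described above can be illustrated with something much simpler than a neural network: group customers into segments and compare the share of heavy long distance users in each. The sketch below swaps in that plain grouping technique purely for illustration. The field names, the segment boundaries and every customer record are hypothetical.

```python
from collections import defaultdict

def heavy_user_rate_by_segment(customers, segment_of):
    """Return {segment: fraction of customers in it who are heavy users}."""
    totals = defaultdict(int)
    heavy = defaultdict(int)
    for c in customers:
        seg = segment_of(c)
        totals[seg] += 1
        heavy[seg] += c["heavy_user"]  # True counts as 1, False as 0
    return {seg: heavy[seg] / totals[seg] for seg in totals}

def segment_of(c):
    """Hypothetical segmentation by sex, marital status, age band, income band."""
    age_band = "34-42" if 34 <= c["age"] <= 42 else "other age"
    income_band = "over 60k" if c["income"] > 60000 else "60k or less"
    return (c["sex"], c["married"], age_band, income_band)

# Made-up customer records, for illustration only.
customers = [
    {"age": 36, "sex": "F", "married": False, "income": 72000, "heavy_user": True},
    {"age": 40, "sex": "F", "married": False, "income": 65000, "heavy_user": True},
    {"age": 29, "sex": "M", "married": True, "income": 48000, "heavy_user": False},
    {"age": 51, "sex": "F", "married": True, "income": 55000, "heavy_user": False},
    {"age": 38, "sex": "M", "married": False, "income": 61000, "heavy_user": False},
    {"age": 45, "sex": "M", "married": True, "income": 90000, "heavy_user": False},
]

rates = heavy_user_rate_by_segment(customers, segment_of)
best = max(rates, key=rates.get)
print(best, rates[best])
```

With only a handful of records the result means nothing statistically. The point is the shape of the output: a segment description that marketing can act on.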


What is Data Mining Good For?

Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution; it is just a technology.


Data Mining Examples

A simple example of data mining, often called Market Basket Analysis, is its use in retail sales. If a clothing store records the purchases of its customers, a data mining system could identify those customers who favour silk shirts over cotton ones. Another example is that of a supermarket chain that, through analysis of transactions over a long period of time, found that beer and diapers were often bought together. Although explaining this relationship may be difficult, taking advantage of it is easier, for example by placing the high-profit diapers in the store close to the high-profit beers. (This example is questioned in "Beer and Nappies -- A Data Mining Urban Legend".)

The two examples above deal with association rules within transaction-based data. Not all data is transaction based, and logical or inexact rules may also be present within a database. In a manufacturing application, an inexact rule may state that 73% of products that have a specific defect or problem will develop a secondary problem within the next 6 months.
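Association rules such as "diapers are often bought with beer" are usually described with two standard measures. Support is the fraction of all transactions that contain a set of items. Confidence is the fraction of transactions containing the first item set that also contain the second. An inexact rule such as the 73% figure above is, in these terms, a rule with a confidence of 0.73. The sketch below computes both measures on a made-up list of baskets; the data and function names are for illustration only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets (made-up data).
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

print("support(diapers, beer):", support(baskets, {"diapers", "beer"}))
print("confidence(diapers -> beer):", confidence(baskets, {"diapers"}, {"beer"}))
```

Here two of the five baskets contain both items, and two of the three diaper baskets also contain beer. In practice a rule is only reported when both measures exceed thresholds chosen by the analyst.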

More Examples

One Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they bought only a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. They could also make sure beer and diapers were sold at full price on Thursdays.

Bass Brewers is the leading beer producer in the UK and has 23% of the market. The company has a reputation for great brands and good service, but realised the importance of information in maintaining its lead in the UK beer market.

"We've been brewing beer since 1777. With increased competition comes a demand to make faster, better informed decisions." - Mike Fisher, IS director, Bass Brewers

Bass decided to gather its data into a data warehouse so that the users, i.e. the decision-makers, could have consistent, reliable, online information. Prior to this, users could expect a turnaround of 24 hours, but with the new system the answers should be returned interactively.

"For the first time, people will be able to do data mining - ask questions we never dreamt we could get the answers to, look for patterns among the data we could never recognise before." - Nigel Rowley, Information Infrastructure manager

This commitment to data mining has given Bass a competitive edge when it comes to identifying market trends and taking advantage of them.

