Web Mining

What is Web Mining?

Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. (Mining means extracting something useful or valuable from a baser substance, such as mining gold from the earth.) Web mining is used to understand customer behavior, evaluate the effectiveness of a particular Web site, and help quantify the success of a marketing campaign.

Web usage mining

Web usage mining is the application of data mining techniques to discover interesting patterns in users' usage data on the web. The usage data records a user's behaviour as the user browses or makes transactions on a web site. The goal is to better understand and serve the needs of users and Web-based applications.

It is an activity that involves the automatic discovery of patterns from one or more Web servers. Organizations often generate and collect large volumes of data; most of this information is generated automatically by Web servers and collected in server logs. Analyzing such data can help these organizations determine the value of particular customers, devise cross-marketing strategies across products, and evaluate the effectiveness of promotional campaigns.

The first web analysis tools simply provided mechanisms to report user activity as recorded in the server logs. Using such tools, it was possible to determine information such as the number of accesses to the server, the times or time intervals of visits, and the domain names and URLs of users of the Web server. However, these tools generally provide little or no analysis of the data relationships among the accessed files and directories within the Web space. More sophisticated techniques for the discovery and analysis of patterns are now emerging. These tools fall into two main categories: pattern discovery tools and pattern analysis tools.
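The kind of reporting these early tools performed can be sketched in a few lines of Python. The log lines below are hypothetical examples in the NCSA Common Log Format; a real tool would read them from the server's access log file.

```python
import re
from collections import Counter

# Hypothetical sample entries in the Common Log Format; a real tool
# would read these from the Web server's access log file.
LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 4102',
    '198.51.100.7 - - [10/Oct/2023:14:02:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

# Extract the requesting host, the timestamp, and the requested URL.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)'
)

def summarise(lines):
    """Count total accesses, accesses per URL, and accesses per host."""
    urls, hosts = Counter(), Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            urls[m.group("url")] += 1
            hosts[m.group("host")] += 1
    return {"total": sum(urls.values()), "by_url": urls, "by_host": hosts}

report = summarise(LOG_LINES)
print(report["total"])                   # total number of accesses
print(report["by_url"].most_common(1))   # most frequently requested URL
```

This is exactly the level of analysis the text describes: counts of accesses, visitors, and URLs, with no insight into relationships among the accessed pages — which is what the more sophisticated pattern discovery and pattern analysis tools aim to provide.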

Application Areas of Web Mining

  • E-commerce
  • Search Engines
  • Personalisation
  • Website Design


Text mining models tend to be very large. A model that attempts to classify, for instance, news stories using Support Vector Machines or the Naïve Bayes algorithm will be very large — in the megabytes — and thus slow to load and evaluate. Concept mining models can be minute in comparison: hundreds of bytes.

For some applications, such as plagiarism detection, concept mining offers new possibilities. Where the plagiariser has been cunning enough to perform a thesaurus-based substitution that will fool text comparison algorithms, the concepts in a document remain relatively unchanged. So 'the cat sat on the mat' and 'the feline squatted on the rug' appear very different to text mining algorithms, but nearly identical to concept mining algorithms.
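The cat/feline example can be illustrated with a toy sketch. The synonym table here is hypothetical and hand-built for just these two sentences; a real concept mining system would map words to concepts using a large lexical resource such as WordNet.

```python
# Hypothetical word-to-concept table for illustration only; a real
# system would draw on a lexical resource such as WordNet.
CONCEPTS = {
    "cat": "CAT", "feline": "CAT",
    "sat": "SIT", "squatted": "SIT",
    "mat": "RUG", "rug": "RUG",
    "the": None, "on": None,  # stop words carry no concept
}

def to_concepts(sentence):
    """Map each word to its concept, dropping words with no concept."""
    return [c for w in sentence.lower().split()
            if (c := CONCEPTS.get(w)) is not None]

a = to_concepts("the cat sat on the mat")
b = to_concepts("the feline squatted on the rug")
print(a == b)  # True: both sentences reduce to the same concept sequence
```

A plain string comparison of the two sentences finds almost no overlap, while the concept sequences are identical — which is why thesaurus substitution defeats text comparison but not concept comparison.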

See also


  • Data warehousing
  • Data integration
  • Data mining
