Sequence Mining

What Is Sequence Mining?

Sequence mining is concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.
The values are usually presumed to be discrete, so time series mining is closely related but is usually considered a different activity. Sequence mining is a special case of structured data mining.

How Sequence Mining Works

Finding patterns in sequences is a challenging problem. In many domains, data are represented as sequences. In the medical domain, the symptoms exhibited by a patient can be ordered by their occurrence in time, and patterns can be found that relate a certain subsequence of symptoms to a particular disease. Likewise, genetic analysis must take into account the sequential nature of DNA. In the financial domain, the daily price of a stock over, say, a quarter or a year can be naturally represented as a sequence of values.

Finding patterns in stock sequences can be valuable for predicting stock market prices. In the market analysis domain, finding patterns in the sequence of items (e.g., books or supermarket products) that a person buys helps predict that person's buying behavior and target them for future product sales.

In the web domain, finding patterns in the sequence of web pages that a person visits helps predict which pages the person will visit next. This is useful for recommending links to the person as well as for organizing websites. These are just a few examples of domains in which sequences are the natural way of representing information and in which finding patterns in those sequences is of great importance.
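
The web example above can be made concrete with a small sketch. It assumes a pattern "occurs" in a sequence when its items appear in the same order, with gaps allowed, and that a pattern is "frequent" when it occurs in at least a chosen number of sequences. The function names (is_subsequence, support, frequent_pairs), the min_support threshold, and the sample sessions are all hypothetical and chosen for illustration. Only patterns of length two are mined here, which is far simpler than what a full sequence mining algorithm does.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if the items of pattern appear in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # Each membership test consumes the iterator, which enforces the ordering.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Count how many sequences contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def frequent_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a before b, found in at least min_support sequences."""
    counts = Counter()
    for s in sequences:
        # A set ensures each pair is counted at most once per sequence.
        pairs = {(s[i], s[j]) for i, j in combinations(range(len(s)), 2)}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical page-visit sessions, one list per visitor.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "search", "products", "cart"],
    ["home", "products", "reviews"],
    ["search", "products", "cart", "checkout"],
]

print(support(["products", "cart"], sessions))  # 3
print(frequent_pairs(sessions, min_support=3))
```

With these sessions, the two pairs that reach a support of 3 are ("home", "products") and ("products", "cart"), which is the kind of evidence a site could use to suggest the cart page to a visitor viewing products.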

Parallel Algorithms for Sequence Mining

The discovery of sequential patterns is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the potentially large number of mined patterns demand efficient and scalable algorithms.
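
One common way to scale pattern counting is to split the sequence database across workers, count candidate patterns in each partition independently, and then merge the partial counts. The sketch below illustrates only that general data-parallel idea and is not a specific published algorithm. The function names (count_pairs, parallel_frequent_pairs), the workers parameter, and the sample data are hypothetical. Threads are used to keep the example short and portable. In CPython they show the structure of the computation rather than a real speedup, and a process pool or a cluster framework would be the likelier choice for large datasets.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(chunk):
    """Count, for one partition, the sequences containing each ordered pair."""
    counts = Counter()
    for s in chunk:
        # A set ensures each pair is counted at most once per sequence.
        counts.update({(s[i], s[j]) for i, j in combinations(range(len(s)), 2)})
    return counts


def parallel_frequent_pairs(sequences, min_support, workers=2):
    """Partition the sequences, count pairs per partition, then merge the counts."""
    # Round-robin partitioning: every sequence lands in exactly one chunk.
    chunks = [sequences[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_pairs, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds the partial counts together
    return {pair: n for pair, n in total.items() if n >= min_support}


# Hypothetical page-visit sessions, one list per visitor.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "search", "products", "cart"],
    ["home", "products", "reviews"],
    ["search", "products", "cart", "checkout"],
]

print(parallel_frequent_pairs(sessions, min_support=3, workers=2))
```

Because each sequence belongs to exactly one partition and support is counted once per sequence, summing the partial counts gives the same result as a single pass over the whole dataset.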

