Tutorial at the 25th International WWW conference

12th of April 2016, Montreal

This tutorial outlines fundamental methods for studying categorical sequences on the Web. A categorical sequence can be any kind of transitional data between a set of states, for example human navigation (transitions) between Web sites (states). The presented methods focus on sequential pattern mining, modeling, and inference, with the aim of better understanding how sequences are produced. A core model used in this tutorial is the Markov chain model. We hope that this tutorial raises interest in and awareness of the field and provides participants with basic tools for analyzing sequential user behavior on the Web.
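To make the idea of transitional data concrete, the following is a minimal sketch (not taken from the tutorial notebooks) of how first-order Markov chain transition probabilities might be estimated by maximum likelihood, that is, by counting and normalizing observed transitions. The function name `fit_markov_chain` and the example navigation paths are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities by counting transitions.

    Returns a dict of dicts: probs[state][next_state] = P(next_state | state).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with its immediate successor.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for state, successors in counts.items():
        total = float(sum(successors.values()))
        # Normalize each row so outgoing probabilities sum to 1.
        probs[state] = {nxt: c / total for nxt, c in successors.items()}
    return probs


# Hypothetical navigation paths between three Web pages (states).
paths = [
    ["home", "search", "article"],
    ["home", "article", "home"],
    ["search", "article", "search"],
]

transitions = fit_markov_chain(paths)
for state in sorted(transitions):
    print(state, transitions[state])
```

States that never appear as the source of a transition have no row in the result; a fuller treatment would also handle smoothing and higher-order chains.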

The tutorial is structured into 4 parts:

In this tutorial, we provide Python code in the form of Jupyter notebooks. These notebooks are used throughout the tutorial and also give attendees the opportunity to try things out and recap the material later on.

The code can be found on GitHub.

Running the notebooks

You can run or study the notebooks in one of the following three ways:

Using the interactive notebook environment is probably the simplest way of running the notebooks. If you prefer to set up the notebooks on your own notebook server, you should use Python 2.7 and install the following packages (e.g., by using pip): numpy, scipy, and scikit-learn. We recommend using the Anaconda Python distribution, which already includes the necessary packages.
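As a quick sanity check for a local setup, a small script along the following lines could report whether the listed packages are importable. This is only a sketch: the helper name `check_packages` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

# Import names of the required packages; scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "scipy", "sklearn"]


def check_packages(names):
    """Map each import name to its version string, or None if it cannot be imported."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            status[name] = getattr(module, "__version__", "unknown")
    return status


if __name__ == "__main__":
    for name, version in sorted(check_packages(REQUIRED).items()):
        print("%-8s %s" % (name, version if version else "MISSING"))
```

Any package reported as MISSING can then be installed with pip or obtained through Anaconda.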

Sequential Pattern Mining (Part 2):

Markov Chain Modeling (Part 3):

Comparing Hypotheses about Sequential Data (HypTrails) (Part 4):

Key References: