Analyzing Sequential User Behavior on the Web

Tutorial at the 25th International WWW conference

12th of April 2016, Montreal

This tutorial outlines fundamental methods for studying categorical sequences on the Web. Categorical sequences can refer to any kind of transitional data between a set of states, for example human navigation (transitions) between Web sites (states). The methods presented focus on sequential pattern mining, modeling, and inference, with the aim of better understanding how sequences are produced. A core model used in this tutorial is the Markov chain model. We hope that this tutorial raises interest in and awareness of the field and provides participants with basic tools for analyzing sequential user behavior on the Web.
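
To make the idea of transitions between states concrete, the following minimal sketch estimates first-order Markov chain transition probabilities from a few navigation trails by counting observed transitions (the maximum-likelihood estimate). It is an illustration only and is not taken from the tutorial notebooks; the function name and the example trails are assumptions made up for this example.

```python
from collections import defaultdict


def estimate_transition_probabilities(sequences):
    """Estimate first-order transition probabilities by normalizing transition counts.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sequence in sequences:
        # Pair each state with its successor in the same sequence.
        for current_state, next_state in zip(sequence, sequence[1:]):
            counts[current_state][next_state] += 1

    probabilities = {}
    for state, successors in counts.items():
        total = float(sum(successors.values()))
        # Each row is normalized so that its probabilities sum to 1.
        probabilities[state] = {
            successor: count / total for successor, count in successors.items()
        }
    return probabilities


# Made-up navigation trails: each list is one user's sequence of visited pages.
trails = [
    ["home", "search", "article"],
    ["home", "article", "home"],
    ["search", "article", "search"],
]

transitions = estimate_transition_probabilities(trails)
for state in sorted(transitions):
    print(state, transitions[state])
```

States that never appear as the source of a transition get no row in the result. A fuller treatment would also need to handle unseen transitions, for example through smoothing or priors.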

The tutorial is structured into 4 parts:

In this tutorial, we provide Python code in the form of Jupyter notebooks. These notebooks are used throughout the tutorial and also give attendees the opportunity to try things out and review the material later on.

The code can be found on GitHub.

Running the notebooks

You can run or study the notebooks in one of the following three ways:

Interactive notebook environment on mybinder
Rendered HTML notebooks on nbviewer
Running code/notebooks on your own: Jupyter notebook server, Anaconda Python distribution

Using the interactive notebook environment is probably the simplest way of running the notebooks. If you prefer to set up the notebooks on your own notebook server, you should use Python 2.7 and install the following packages (e.g., by using pip): numpy, scipy and scikit-learn. We recommend using the Anaconda Python distribution, which already includes the necessary packages.
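
If you set things up yourself, a quick check like the following sketch can confirm that the required packages are importable before you open the notebooks. It is not part of the tutorial material; the helper name is hypothetical. Note that scikit-learn is imported under the module name sklearn.

```python
import importlib

# Module names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["numpy", "scipy", "sklearn"]


def find_missing_modules(module_names):
    """Return the module names that cannot be imported (hypothetical helper)."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable.")
```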

Sequential Pattern Mining (Part 2):

Jupyter Notebook: [github] [mybinder] [nbviewer]

Markov Chain Modeling (Part 3):

Jupyter Notebook: [github] [mybinder] [nbviewer]

Comparing Hypotheses about Sequential Data (HypTrails) (Part 4):

Jupyter Notebook: [github] [mybinder] [nbviewer]
Additional notebook: [nbviewer]

Key References:

Mooney, C. H., & Roddick, J. F. (2013). Sequential pattern mining--approaches and algorithms. ACM Computing Surveys (CSUR), 45(2), 19.
Singer, P., Helic, D., Taraghi, B., & Strohmaier, M. (2014). Detecting memory and structure in human navigation patterns using Markov chain models of varying order. PLoS ONE, 9(7), e102070.
Singer, P., Helic, D., Hotho, A., & Strohmaier, M. (2015, May). HypTrails: A Bayesian approach for comparing hypotheses about human trails on the Web. In Proceedings of the 24th International Conference on World Wide Web (pp. 1003-1013). International World Wide Web Conferences Steering Committee.

Philipp Singer
philipp.singer@gesis.org
http://www.philippsinger.info/
@ph_singer
Florian Lemmerich
florian.lemmerich@gesis.org
http://florian.lemmerich.net/
@f_lemmerich

CC image courtesy of user puliarfanita on Flickr

