Analyzing Sequential User Behavior on the Web
Tutorial at the 25th International World Wide Web Conference (WWW 2016)
12th of April 2016, Montreal
This tutorial aims to outline fundamental methods for studying categorical sequences on the Web. Categorical sequences can refer to any kind of transitional data between a set of states, for example, human navigation (transitions) between Web sites (states). The presented methods focus on sequential pattern mining, modeling, and inference, with the aim of better understanding how sequences are produced. A core model used throughout this tutorial is the Markov chain model. We hope that this tutorial raises interest in and awareness of the field and provides participants with basic tools for analyzing sequential user behavior on the Web.
Source Code and Notebooks
In this tutorial, we provide Python code in the form of Jupyter notebooks. These notebooks are used throughout the tutorial, but they should also give attendees the opportunity to try things out and to revisit the material later on.
The code can be found on GitHub.
Running the notebooks
There are three options for running or studying the notebooks:
- Interactive notebook environment on mybinder
- Rendered HTML notebooks on nbviewer
- Running the code/notebooks on your own machine: Jupyter Notebook server, Anaconda Python distribution
Using the interactive notebook environment is probably the simplest way to run the notebooks. If you prefer to set up the notebooks on your own notebook server, you should use Python 2.7 and install the following packages (e.g., with pip): numpy, scipy, and scikit-learn. We recommend using the Anaconda Python distribution, which already includes the necessary packages.
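A quick way to verify a local setup is to try importing each required package. This sketch assumes the import names `numpy`, `scipy`, and `sklearn` (the import name of scikit-learn); `check_environment` is a hypothetical helper and not part of the tutorial code.

```python
from __future__ import print_function

import importlib
import sys

# Import names of the required packages (scikit-learn is imported as "sklearn").
REQUIRED = ("numpy", "scipy", "sklearn")


def check_environment(required=REQUIRED):
    """Return a dict mapping each package to its version string, or None if missing."""
    found = {}
    for name in required:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if sys.version_info[:2] != (2, 7):
        print("Note: the notebooks were written for Python 2.7.")
    for name, version in sorted(check_environment().items()):
        print(name, version if version else "MISSING")
```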
Sequential Pattern Mining (Part 2):
- Jupyter Notebook: [github] [mybinder] [nbviewer]
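Sequential pattern mining searches for subsequences that occur in at least a minimum number of sequences (the minimum support). As one illustration, the sketch below implements the core idea of the PrefixSpan algorithm for sequences of single items (not itemsets): count item supports in a projected database, extend each frequent prefix, and recurse on the suffixes that follow it. Patterns need not be contiguous. The function name and toy data are hypothetical, and the tutorial notebook may use a different algorithm or library.

```python
from __future__ import print_function

from collections import Counter


def prefixspan(db, min_support, prefix=None, results=None):
    """Mine frequent (not necessarily contiguous) subsequences of single items.

    db:          list of sequences (lists of items), e.g. navigation trails
    min_support: minimum number of sequences that must contain a pattern
    Returns a list of (pattern, support) pairs.
    """
    prefix = [] if prefix is None else prefix
    results = [] if results is None else results
    # Support of each item = number of (projected) sequences containing it.
    support = Counter()
    for seq in db:
        support.update(set(seq))
    for item, count in sorted(support.items()):
        if count < min_support:
            continue
        pattern = prefix + [item]
        results.append((pattern, count))
        # Project the database: keep the suffix after the first occurrence of item.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        prefixspan([s for s in projected if s], min_support, pattern, results)
    return results


trails = [["A", "B", "C"], ["A", "B", "A", "C"], ["B", "C", "C"]]
for pattern, count in prefixspan(trails, min_support=2):
    print(" -> ".join(pattern), count)
```

With a minimum support of 2, the pattern A -> B -> C is reported because it occurs as a subsequence of the first two trails (in the second one, with an intermediate A).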
Markov Chain Modeling (Part 3):
- Jupyter Notebook: [github] [mybinder] [nbviewer]
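Markov chains can also be of higher order k, where the next state depends on the previous k states. One way to choose k, in the spirit of Singer et al. (2014), who compare several model-selection methods, is to fit chains of increasing order and compare an information criterion such as AIC, which trades off the log-likelihood against the number of free parameters, |S|^k (|S| - 1) for |S| states. This sketch assumes that every order is scored on the same positions (t >= max_order) so that the likelihoods are comparable; the function names and trails are hypothetical.

```python
from __future__ import division, print_function

import math
from collections import Counter, defaultdict


def log_likelihood(sequences, order, max_order):
    """Log-likelihood of an order-k Markov chain fitted by maximum likelihood.

    Every order is scored on the same targets, positions t >= max_order of each
    sequence, so that log-likelihoods of different orders are comparable.
    """
    counts = defaultdict(Counter)  # history tuple -> Counter of next states
    for seq in sequences:
        for t in range(max_order, len(seq)):
            history = tuple(seq[t - order:t])  # empty tuple for order 0
            counts[history][seq[t]] += 1
    ll = 0.0
    for nexts in counts.values():
        total = sum(nexts.values())
        for n in nexts.values():
            ll += n * math.log(n / total)  # MLE probability is n / total
    return ll


def select_order_aic(sequences, states, max_order):
    """Return (best order, {order: AIC}); lower AIC is better."""
    n_states = len(states)
    aic = {}
    for k in range(max_order + 1):
        n_params = n_states ** k * (n_states - 1)  # free parameters of an order-k chain
        aic[k] = 2 * n_params - 2 * log_likelihood(sequences, k, max_order)
    return min(aic, key=aic.get), aic


# Hypothetical trails that strictly alternate between two sites.
trails = [list("ABABABABAB"), list("BABABABA")]
best, aic = select_order_aic(trails, ["A", "B"], max_order=2)
print(best, aic)
```

Because the toy trails alternate strictly, a first-order chain predicts them perfectly, so AIC prefers order 1: the zero-order model fits worse, and the second-order model pays for extra parameters without improving the fit.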
Comparing Hypotheses about Sequential Data (HypTrails) (Part 4):
- Jupyter Notebook: [github] [mybinder] [nbviewer]
- Additional notebook: [nbviewer]
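HypTrails expresses hypotheses about transitions as Dirichlet priors over the rows of a first-order Markov chain and compares them via their marginal likelihood (evidence); the ratio of two evidences is a Bayes factor. The sketch below computes the log evidence with `scipy.special.gammaln`. The elicitation in the hypothetical helper `hypothesis_to_prior` (row-normalize a belief matrix, scale it by a concentration k, add 1) is a simplifying assumption, not necessarily the exact procedure of the HypTrails paper or notebooks; the counts and hypotheses are also hypothetical.

```python
from __future__ import division, print_function

import numpy as np
from scipy.special import gammaln


def log_evidence(counts, alpha):
    """Log marginal likelihood of transition counts under row-wise Dirichlet priors.

    log P(D | H) = sum_i [ lnG(sum_j a_ij) - lnG(sum_j (n_ij + a_ij))
                           + sum_j (lnG(n_ij + a_ij) - lnG(a_ij)) ]
    """
    return np.sum(
        gammaln(alpha.sum(axis=1)) - gammaln((counts + alpha).sum(axis=1))
        + (gammaln(counts + alpha) - gammaln(alpha)).sum(axis=1)
    )


def hypothesis_to_prior(belief, k):
    """Assumed simplified elicitation: row-normalize beliefs, scale by k, add 1."""
    belief = np.asarray(belief, dtype=float)
    row_sums = belief.sum(axis=1, keepdims=True)
    normalized = np.divide(belief, row_sums, out=np.zeros_like(belief), where=row_sums > 0)
    return 1.0 + k * normalized


def log_bayes_factor(counts, belief_a, belief_b, k):
    """Positive values favor hypothesis a over hypothesis b at concentration k."""
    return (log_evidence(counts, hypothesis_to_prior(belief_a, k))
            - log_evidence(counts, hypothesis_to_prior(belief_b, k)))


counts = np.array([[0, 2, 1],    # from A: A->B twice, A->C once
                   [1, 0, 2],    # from B: B->A once, B->C twice
                   [0, 0, 1]],   # from C: one self-transition C->C
                  dtype=float)
uniform = np.ones((3, 3))                # "any site is equally likely next"
different = np.ones((3, 3)) - np.eye(3)  # "users move to a different site"
for k in (0, 1, 3, 10):
    print(k, log_bayes_factor(counts, different, uniform, k))
```

At k = 0 both priors reduce to a uniform Dirichlet, so the log Bayes factor is 0; for the larger k values printed it is positive, because only one of the six transitions is a self-transition. For very large k, the "different site" hypothesis is eventually penalized for that single C->C transition, which it then deems nearly impossible.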
References
Key References:
- Mooney, C. H., & Roddick, J. F. (2013). Sequential pattern mining – approaches and algorithms. ACM Computing Surveys (CSUR), 45(2), 19.
- Singer, P., Helic, D., Taraghi, B., & Strohmaier, M. (2014). Detecting memory and structure in human navigation patterns using Markov chain models of varying order. PLoS ONE, 9(7), e102070.
- Singer, P., Helic, D., Hotho, A., & Strohmaier, M. (2015, May). HypTrails: A Bayesian approach for comparing hypotheses about human trails on the Web. In Proceedings of the 24th International Conference on World Wide Web (pp. 1003-1013). International World Wide Web Conferences Steering Committee.