A quantitative analysis of digital library user behaviour based on access logs
Abstract
This paper quantitatively analyzes the usage of Gallica, the web platform for accessing the digital library of the Bibliothèque nationale de France (BnF). Our approach relies on the access logs retrieved from the Apache HTTP servers of Gallica. These logs record every request processed by the server and thus contain the requested web pages and the timestamps of the requests, along with the IP addresses of the corresponding users. The access logs are augmented with additional structured data obtained via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) in order to store, where possible, the metadata of the consulted documents. Beyond straightforward statistics (such as the duration of a session, the number of documents consulted in each session, and the most popular types of documents across all Gallica users), our research aims to model user navigational behaviour with a mixture of continuous-time Markov chains. This model makes it possible to cluster users into classes, each characterized by typical navigation paths on Gallica. The results provide relevant insights into the way users interact with the Gallica interface, highlighting the mean duration of some actions, such as interacting with the search engine or consulting documents.
Although our approach requires additional information in order to properly interpret the models and the correlations they highlight, it allows all types of behaviour to be taken into account, including the most discreet ones that are hardest to capture in traditional surveys, and gives them their fair weight in terms of audience.
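As an illustration of the log-based statistics mentioned above, the following minimal sketch parses Apache access log lines and groups requests into sessions per IP address. It is not the processing pipeline used in this paper: the sample lines, the request paths, the function names and the 30-minute inactivity threshold used to split sessions are all assumptions made for the example, and the regular expression only covers the leading fields of the standard Apache Common Log Format.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Leading fields of the Apache Common Log Format:
# host ident user [timestamp] "METHOD path protocol" status size
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: a gap of more than 30 minutes starts a new session.
SESSION_GAP = timedelta(minutes=30)

# Hypothetical log lines, invented for this example only.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2016:13:55:36 +0200] "GET /search?q=test HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2016:13:58:36 +0200] "GET /doc/1 HTTP/1.1" 200 5120',
    '192.0.2.1 - - [10/Oct/2016:15:00:00 +0200] "GET /doc/2 HTTP/1.1" 200 4096',
    '192.0.2.7 - - [10/Oct/2016:14:00:00 +0200] "GET /doc/1 HTTP/1.1" 200 5120',
]


def parse_line(line):
    """Return (ip, timestamp, path), or None if the line does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("ip"), ts, match.group("path")


def sessionize(lines, gap=SESSION_GAP):
    """Group requests by IP and split them wherever inactivity exceeds gap."""
    by_ip = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ip, ts, path = parsed
            by_ip[ip].append((ts, path))

    sessions = []
    for ip, hits in by_ip.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > gap:
                sessions.append((ip, current))
                current = [hit]
            else:
                current.append(hit)
        sessions.append((ip, current))
    return sessions


def session_duration(session):
    """Seconds between the first and the last request of a session."""
    _, hits = session
    return (hits[-1][0] - hits[0][0]).total_seconds()


if __name__ == "__main__":
    for ip, hits in sessionize(SAMPLE_LINES):
        print(ip, len(hits), session_duration((ip, hits)))
```

Identifying a user by IP address alone is a simplification, since several users may share one address; the sketch only shows how session-level statistics such as duration and number of requests can be derived from raw log lines.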