Research

Impact of search latency on user engagement in web search
Users vary widely in how they perceive the response latency of a web service, since this perception depends on demographics, context, and potentially many other factors. Search results could therefore be served to each user at a custom latency, chosen according to the estimated behavioural impact of latency on that user, while minimising the usage of hardware resources. In this project, I have been researching mechanisms for accurately predicting user-perceived response latency, as well as the impact of latency on user experience. In follow-up work, I plan to develop models for personalising response latency on a per-user basis, with the eventual aim of reducing financial costs for the search engine company without hurting user engagement.

Ad retrieval
In this project, I developed methods for effective <query, ad> matching, to promote higher engagement with ads, improve CTR, and increase advertisement revenues in sponsored search. More specifically, I investigated latent associations between queries and ads using machine learning techniques and NLP features.

Predicting long-term user engagement
In this work, I investigated long-term engagement with a web site. In particular, I analysed the interactions of a large sample of users with the Yahoo News portal and characterised long-term user engagement by means of different metrics. I also investigated the feasibility of predicting long-term engagement, and of anticipating customer churn, by applying classification and regression techniques to short-term interaction signals. The findings of this work have important implications for Internet companies, where long-term user engagement is likely to lead to better monetisation and customer retention (i.e., reduced churn).

Modelling news article quality
The goal of this project is to identify proxies for characterising the editorial quality of news articles in an automatic and scalable way. In order to learn models that can accurately predict the quality of news articles, I generated a ground-truth dataset with the help of expert judges. To this end, I performed an editorial study and created an in-domain, annotated news corpus. Using this corpus, I modelled news article quality with the help of shallow, syntactic, probabilistic and more complex features. Currently, I am researching methods for computing the reputation of named entities that appear in news articles, as well as the degree of controversy in news.

Scalable mouse tracking analysis for inferring user intent
The measurement of within-content engagement remains a difficult and unsolved task, partly because of the lack of standardised, well-validated methods of measurement, especially in an online context. To address this, I performed a study in which I observed how users interact with online news in the presence or absence of interest. I collected mouse tracking data, which are known to correlate with visual attention, and examined how cursor behaviour can inform models of user engagement using unsupervised learning methods.
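As an illustration of the kind of low-level signal that mouse tracking data can yield, the following minimal sketch derives a few summary features from a cursor trace. The feature set, the `(t, x, y)` sample layout, and the function name `cursor_features` are assumptions made for this example, not the features used in the study.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sample layout: (timestamp in seconds, x in pixels, y in pixels).
Sample = Tuple[float, float, float]


def cursor_features(samples: List[Sample]) -> Dict[str, float]:
    """Summarise a cursor trace with a few simple, illustrative features."""
    features = {"distance": 0.0, "duration": 0.0, "mean_speed": 0.0, "max_pause": 0.0}
    if len(samples) < 2:
        return features

    distance = 0.0
    max_pause = 0.0
    # Walk over consecutive pairs of samples.
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        distance += step
        if step == 0.0:
            # The cursor did not move between these two samples.
            max_pause = max(max_pause, t1 - t0)

    duration = samples[-1][0] - samples[0][0]
    features["distance"] = distance
    features["duration"] = duration
    features["mean_speed"] = distance / duration if duration > 0 else 0.0
    features["max_pause"] = max_pause
    return features


trace = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (3.0, 3.0, 4.0), (4.0, 3.0, 10.0)]
summary = cursor_features(trace)
```

Feature vectors of this kind, computed per page visit, could then be passed to a clustering method to group similar interaction patterns.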

Knowledge Module engagement
Existing approaches to measuring user engagement with specific SERP modules are limited to traditional metrics like click-through rate or dwell time, which are not available or suitable in the context of the knowledge module. To address this gap, I examined what fraction of daily Yahoo search users notice the knowledge module, and to what extent they perceive it as a useful aid to their search activities. I also collected and analysed mouse cursor interactions over a number of SERPs, and investigated the feasibility of predicting the answers to the above questions by using implicit, low-cost, and scalable feedback signals.

Discovery and localisation of points of interest
This project involves accurately locating points of interest visible in photos, by exploiting the location information and compass orientation supplied by modern photo cameras. This is accomplished by analysing the fields of view of the cameras capturing the scenes, while taking the inaccuracy of the sensors providing these measurements into account. My contribution involves a rigorous evaluation of the models’ performance against different baselines, using machine learning and statistical analysis techniques.
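The underlying geometric idea can be sketched as follows, under strong simplifying assumptions: two cameras lie on a flat local plane (x pointing east, y pointing north, in metres), each reports a single compass bearing towards the same point of interest, and the sensors are treated as exact. The function names are hypothetical, and the sketch ignores both the width of the field of view and the sensor inaccuracy that the project's models account for.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # (east, north) in metres, local planar frame


def bearing_to_direction(bearing_deg: float) -> Point:
    """Convert a compass bearing (degrees clockwise from north) to a unit vector."""
    rad = math.radians(bearing_deg)
    return (math.sin(rad), math.cos(rad))


def intersect_bearings(p1: Point, b1: float, p2: Point, b2: float) -> Optional[Point]:
    """Intersect two viewing rays; return None if they are parallel or diverge."""
    d1 = bearing_to_direction(b1)
    d2 = bearing_to_direction(b2)
    # 2D cross product of the directions; zero means the rays are parallel.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Distances along each ray to the intersection point.
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    t2 = (dx * d1[1] - dy * d1[0]) / denom
    if t1 < 0 or t2 < 0:
        return None  # the lines cross behind at least one camera
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Two cameras 100 m apart, both looking towards a point between and ahead of them.
poi = intersect_bearings((0.0, 0.0), 45.0, (100.0, 0.0), 315.0)
```

With noisy position and compass readings, the rays from many photos of the same scene will not meet in a single point, which is why a realistic approach has to model the measurement error explicitly rather than rely on an exact intersection.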

Linking entities to past articles
One aspect of engagement deals with keeping users on the site longer, by allowing them to navigate through content via links that offer an enhanced click-through experience. So far, these links have been manually curated by professional editors and, because of the manual effort involved, their use has been limited. To address this issue, I participated in the design and evaluation of an automated approach to detecting newsworthy events and linking them to associated articles.

© Ioannis Arapakis 2016 with help from Bootstrap