We all know the drill. When we sign into Netflix, HBO or TV2 Play, we are presented with a wealth of options. The specific selection of titles we are offered is made by advanced and undisclosed algorithms.
However, within current research on the development of recommendation systems, an ongoing challenge has been the lack of data in which users have explicitly stated their preferences, such as “I don’t like romantic dramas” or “I love everything by Woody Allen”.
As part of their master’s thesis, a group of software students at Aalborg University have attempted to close exactly this gap. They have built MindReader, which is part recommendation platform and part data-gathering tool that allows other researchers to test entirely new strategies within machine learning and information processing:
- Recommendation systems are machine-learning models aimed at deducing a specific user’s preferences on the basis of data. But within research, the current systems are built on very large and widely used datasets that are only concerned with users’ ratings of movies. They do not, for instance, take into account what the user thinks of a certain director, actor or genre, says Anders Højlund Brams, originator of MindReader along with Anders Langballe Jakobsen and Theis Erik Jendal. He continues:
- And it is incredibly hard to predict whether someone will like a specific movie just from his or her ratings of other movies. The result is a lot of guesswork, and it means that the systems end up recommending movies that you might not actually like.
We know what the user thinks of Matt Damon
To build the system, the three students started from so-called knowledge graphs, which link huge amounts of information on movies through data such as director, actor and genre. This provides a comprehensive overview of relations that are often complex. They have also sought guidance from Professor Katja Hose, Assistant Professor Peter Dolog and postdoc Matteo Lissandrini, all of whom work with knowledge graphs and recommendation systems at the Department of Computer Science.
In practice, users in MindReader are asked, anonymously, to state their preferences on everything from children’s movies to Danish actors and English thrillers. The system then presents a number of recommendations: movies you might want to watch, and movies you should not waste your time on.
- With the dataset that our users have built, researchers for the very first time have access to a user’s specific preference with regard to, for instance, horror or action movies starring Matt Damon: not just what they think of the movies, but what they think of the genres and the actor separately. With this information, the system can provide better recommendations and at the same time save a lot of time, as exploring every single aspect of the movies a user likes is no longer necessary. Now, it is just a matter of letting researchers figure out how to use this new information most efficiently.
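For readers who want a concrete picture, the idea of rating genres and actors separately can be sketched in a few lines of Python. This is only a toy illustration and not the students' method: the movie titles, the ratings and the function names `score_movie` and `rank_movies` are hypothetical, and simply averaging the ratings of linked entities is an assumption made here for simplicity.

```python
# Toy knowledge graph: each movie is linked to the entities that describe it.
# All titles and ratings are hypothetical placeholders, not MindReader data.
MOVIE_ENTITIES = {
    "Movie A": ["Matt Damon", "Action"],
    "Movie B": ["Matt Damon", "Horror"],
    "Movie C": ["Horror"],
}

# One user's explicit answers about entities: +1 = like, -1 = dislike.
USER_RATINGS = {"Matt Damon": 1, "Action": 1, "Horror": -1}


def score_movie(entities, ratings):
    """Average the user's ratings over the entities linked to a movie."""
    known = [ratings[e] for e in entities if e in ratings]
    # A movie with no rated entities gets a neutral score (an assumption).
    return sum(known) / len(known) if known else 0.0


def rank_movies(movie_entities, ratings):
    """Return movie titles ordered from highest to lowest score."""
    return sorted(
        movie_entities,
        key=lambda title: score_movie(movie_entities[title], ratings),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank_movies(MOVIE_ENTITIES, USER_RATINGS))
```

In this sketch, a user who likes Matt Damon and action but dislikes horror sees the action movie ranked first and the pure horror movie last, without having rated a single movie.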
Most people have an opinion on genres
MindReader really shows its worth when a new user, whom the system does not yet know, needs recommendations. There are many ways to handle this. The system can, for instance, use metadata such as gender and social network, but another popular strategy is to carry out an interview in which the user is presented with a few questions in order to deduce their preferences as quickly as possible.
In this situation, it is important to keep the interview as short as possible so that the experience does not become unpleasant for the user. This is where machine learning can be applied to work out which questions to ask, so that a user’s preferences are explored with as few questions as possible.
- We are currently working on training models for creating this specific kind of interview. In the response statistics from MindReader, we can see that the items that most often generate a “don’t know” response are movies and to a certain degree actors, but on the other hand, users almost always have an opinion on the genres they are asked about.
To date, more than 2,000 users have used MindReader and rated more than 180,000 items, but the three students hope that even more people will want to try out the system:
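The observation about “don’t know” responses suggests one very simple way to keep an interview short: ask first about the items that users most often have an opinion on. The Python sketch below illustrates that heuristic only. It is not the trained model the students describe, and the item names, the answer logs and the functions `dont_know_rate` and `pick_questions` are hypothetical.

```python
# Hypothetical answer logs per item; "dont_know" means the user had no opinion.
RESPONSES = {
    "Genre: Thriller": ["like", "dislike", "like", "like"],
    "Actor: Example Actor": ["like", "dont_know", "dislike", "dont_know"],
    "Movie: Example Movie": ["dont_know", "dont_know", "like", "dont_know"],
}


def dont_know_rate(answers):
    """Share of answers that were 'dont_know' (1.0 if there are no answers)."""
    if not answers:
        return 1.0
    return answers.count("dont_know") / len(answers)


def pick_questions(responses, k):
    """Pick the k items that users most often have an opinion on."""
    ranked = sorted(responses, key=lambda item: dont_know_rate(responses[item]))
    return ranked[:k]


if __name__ == "__main__":
    print(pick_questions(RESPONSES, 2))
```

With these made-up logs, the genre question comes first and the movie question is dropped, which mirrors the pattern described in the quote above. A real interview model would also have to weigh how much each answer reveals about the user, which this sketch ignores.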
- Being able to offer well-functioning recommendation tools is crucial to streaming services. It would be really cool if the dataset that MindReader’s users have built could be utilized broadly by researchers, and if a tech giant such as Netflix could use it for developing better screening processes for their new users. The more people who try out the system, the better. Because the more data, the better the chance of finding an optimal solution with regard to the user’s experience and saved time, says Anders Højlund Brams.
- Try out MindReader and contribute to the development of the system
- Read the three students’ article outlining the construction of MindReader
- Want to know more about AAU’s software degree programmes?
- The full MindReader dataset is available for download
For more information, contact Anders Langballe Jakobsen: +45 2757 2038
Press: Nina Hermansen, email@example.com: +45 2090 1829