The use of Self-Organizing Maps in Recommender Systems

A survey of the Recommender Systems field and a presentation of a State of the Art Highly Interactive Visual Movie Recommender System

Sam Gabrielsson and Stefan Gabrielsson

A Uppsala Master's Thesis in Computer Science 20p , submitted on August 2006 to the Department of Information Technology at the Division of Computer Systems for the degree of Master of Science in Computer Science at Uppsala University .

Reviewer: Olle Gällmo
Supervisor: Filip Mican, Discshop Svenska Näthandel AB
Examiner: Anders Jansson

Abstract

Is one's entire psyche's most secret landscape really a fairly public thing, given just a few pieces of information?

Douglas R. Hofstadter

This is a thesis about recommender systems , and the different approaches recommender systems use to solve the information overload problem . Our focus lies on two common approaches, collaborative filtering and content based filtering . Both of these approaches have their weaknesses and strengths. To overcome the weaknesses of each approach, various hybrid filters have been developed. We will start by analyzing these three approaches based on previous research literature and will then proceed to implement different variants of these approaches, including our own filtering approach for the movie domain. These implementations will be done in Java and open sourced for further development by other researchers in this area. The results will be evaluated and compared against previous research in this area in order to validate our implementations. Evaluation will be done by using standard metrics that are commonly used for evaluating the accuracy of recommender systems.

Various algorithms from the machine learning community have been used in the effort to improve and solve some of the problems in the previously mentioned approaches. We will concentrate on one such algorithm, Kohonen's self-organizing map algorithm. The self-organizing map algorithm is an unsupervised learning algorithm which we believe is suitable for recommender systems in the movie domain. Our implementation of this algorithm will be used together with collaborative filtering approaches in the effort of designing a recommender system for movies. The result of this approach will be evaluated and compared against the results from our previous implementations and discussed in the context of previous results from the recommender systems research community.

Evaluating the effectiveness of recommender systems is often done by analyzing the accuracy of the recommendations produced by the techniques used to implement the different approaches. However, the goal for a recommender system is not only to give accurate recommendations but also to conceive to the user trust and encourage the user to explore the recommendations. This is more of an interface issue than an algorithmic issue, we have chosen to call this the recommendation interface problem . Similar conclusions have been drawn by other researchers and different attempts to solve this has been done. We will summarize and discuss proposed solutions. We will introduce and describe what we call visual recommendations , and show how this approach solves the recommendation interface problem by creating a visual recommender system called MOVSOM.

The testing and evaluation will be done on the well used MovieLens dataset, as well on a larger dataset taken from an e-commerce site selling DVDs, together with movie attributes provided by the IMDb.

Our empirical evaluation results shows that MOVSOM produces recommendations of movies that are comparable to state of the art techniques and with the combination of our solution to the recommendation interface problem we believe that this approach has a very promising future as a recommender system for movies.

I love deadlines. I like the whooshing sound they make as they fly by.

Douglas Adams

To all the makers of TV-series and movies in the world!

Thanks for the encouragement, Bill!

...thanks for introducing us to the SOM, Olle!

...thanks for the grammar checks, Linda!

and a thanks to Joss Whedon! ;)

Abbreviations

BMN: Best matching node
CBF: Content based filtering
CCA: Curvlinear component analysis
CF: Collaborative filtering
IDF: Inverse document frequency
MDS: Multidimensional scaling
PCA: Principal component analysis
SOM: Self-Organizing Map
VP: Vector projection
VQ: Vector quantization
VQ-P: Vector quantization-projection

Introduction

The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny...'

Isaac Asimov

Recommender systems are one solution to the information overload problem. Other solutions can for example be found in the field of information retrieval and information filtering, where search engines like Google and various text-retrieval applications are successful applications developed to deal with the problem. Recommender systems emerged as an independent research area around the mid 90's and has since then become a broad research topic with substantial integration of machine learning and information filtering. E-commerce sites use recommender systems for making recommendations that are tailored to a user's preferences while at the same time increasing their own sales. Although recommendations can be non-personalized , like "customers who bought this item also bought these items" or popularity based, e.g. top lists, resulting in the same recommendations to all customers, recommender systems try to make recommendations as personalized as possible.

The main goal for a recommender system is to provide the user with recommendations that reflect the user's personal taste and to convince the user to trust and explore the given recommendations. Applications of this has been successfully used on the Internet by e-commerce sites like Amazon.com, that offers millions of products to its customers, and by communities in the entertainment domain like MovieLens(*) , a research project that runs a website where people can become members and receive recommendations for movies.

Recommender systems that make personal recommendations achieve their goal by maintaining profiles for the users that consists of their preferences. The profiles are used as filters, only items that match user's preferences will slip through and be presented as recommendations. Depending on the domain, items can be books, movies, WebPages, Usenet news articles, etc.

Throughout the text we will be referring to the user that gets a recommendation as the active user and the item that is recommended as the active item .

Two main filtering approaches for making personalized recommendations that have emerged over the years are collaborative filtering (CF) and content based filtering (CBF). CF uses the opinions of other users, that are similar to the active user, as a filter. CBF only uses the preferences of the active user as a filter. In this thesis we will try out different established techniques that use both of the approaches, as well as some techniques suggested by ourselves. We will for example develop our own CBF for the movie domain where movies will be represented with attributes such as directors, actors, genres, etc. Movies will be matched against movies in a user's profile, where similarity is taken as a linear combination over the attributes, and the movies with highest similarity will be recommended.

The CF and CBF approach both have their weaknesses and strengths. With knowledge of issues related to one approach that aren't present in the other, research has been done on how to best combine these two approaches into hybrid filters .

Different methods have been tried to combine the two approaches into new techniques, many taken from the field of artificial intelligence. We will use a technique from the field of artificial neural networks called Kohonen's self-organizing map (SOM). The SOM is a mapping from high dimensional data onto e.g. a two dimensional grid, that preserves the neighborhoods of the original data. This allows us to cluster and form neighborhoods of users and items on large two dimensional maps. A user map will be used in combination with CF, i.e. users in the active users neighborhood on the map will be used as recommenders for the active user. An item map will be used in combination with CBF, i.e. movies that are similar to the movies in the active user's profile will be recommended to the user. Additionally a heuristic goodness function will be used that based on the recommendations from each map weights how safe and serendipitous the recommendations are.

We believe that a good recommender system should be able to give recommendations that are new, unexpected, trust inspiring and somewhat transparent, so that the users can see the logic behind why a particular item was recommended. To show the logic behind a recommendation we will show the users the maps produced by the SOM algorithm. We believe that if a user can zoom in on the recommenders profiles and the recommended movie's neighborhood, then this will be a sufficient explanation for why the recommendation was made.

Contributions

The main objective of this thesis is to evaluate the usability of Kohonen's SOM in the construction of a recommender system for movies. The reason for this is two-fold; some previously known problems with content based filtering and collaborative filtering can be solved, and secondly, the visualization ability allows the motivations behind recommendations to be made transparent to the users in a very simply and intuitive manner.

Our contributions are:

We will release new datasets(*) that can be used for testing recommender systems based on transaction data, rating data and attribute data.
We will give a detailed analyzis of the datasets we provide.
We will configure the MovieLens dataset so it can be used together with IMDb attribute data.
We will release a recommendation technique "framework"(*) in Java that can be used by others to test their similarity and prediction algorithms.
We will implement common recommendation techniques and provide detailed evaluations on clearly described comparable evaluation datasets.
We will implement our own public experimental recommender system(*) for movies based on the SOM, with focus on visual recommendations and new ways for users to explore recommendations.

Outline

In Part I, the first chapter "Historical development and related work" surveys the historical development that resulted in today's recommender systems by going through and presenting related work. Issues related to recommender systems will be discussed in the context of others research. The chapter ends with a presentation of a few well known recommender systems. In the next chapter "Implementation and evaluation of common recommendation techniques" we present and evaluate implementations of common recommendation techniques, as well as a few variations of our own. We provide a thorough statistical analysis of the data we use for evaluation. We provide a detailed explanation of various common evaluation metrics used for evaluating recommender systems, and we discuss how we are going to use them to evaluate the recommendation techniques we have implemented. We end this part with a presentation of the results from the evaluations.

Part II begins with the chapter "Artificial neural networks" , which serves as an introduction to the field. Commonly used learning algorithms will be discussed as well as different neural networks. The chapter ends with a brief historical survey of the field of artificial neural networks. This is followed by the chapter "Self-Organizing Maps" where we introduce Kohonen's SOM algorithm both as a biologically inspired algorithm and as a clustering and projection algorithm. Its visualization capabilities will be discussed as well as some other related issues regarding the SOM. The chapter ends with a presentation of some related work in the area of using the SOM in the recommendation process. In the final chapter of Part II, with the title "Implementation and evaluation of SOM based recommendation techniques" we will implement and evaluate a few recommendation techniques based on the SOM.

Some of these techniques from Part II will be used again in Part III which begins with the chapter "Trust in recommender systems" in which we will discuss the design and trust issues regarding recommender systems by presenting some results from the literature as well as our own solutions to these issues. This is followed by a chapter on what we refere to as "Visual recommendations" . Finally we come to the last chapter entitled "MOVSOM - State of the art" where we describe the implementation of a experimental recommender system called MOVSOM. MOVSOM relies heavily on visualizing important aspects of the recommendation process and by that conceiving to the user trust and encouraging the user to explore the recommendations further. MOVSOM uses the topological ordering property of the SOM as well as its clustering and projection capabilities, which are implemented and evaluated in Part II. The benefits of MOVSOM will be discussed and a few similar web-based solutions in other domains will also be examined and presented. Ideas for future work on the MOVSOM will then be presented. In the end we conclude.

Part I - Recommender Systems

Those people who think they know
everything are a great annoyance
to those of us who do.

Isaac Asimov

This part consists of two chapters. The first is theoretical, in which necessary concepts are explained regarding the field of recommender systems. We begin by presenting the development of the field, from its roots in information retrieval to where it is today, a field of its own, influenced by many other. Historical important events and work done by others are presented. A detailed analysis of various recommendation techniques are given and a brief discussion of problems and challenges regarding some of these techniques are presented. The chapter ends with a presentation of some well known websites and how they use recommender systems on their sites.

In chapter two, recommendation techniques that we have chosen to implement are evaluated and the results discussed. Different measures for evaluating recommender systems are presented and used. All recommendation techniques are presented in clearly written pseudo code.

Chapter 1 Historical development and related work

Chapter 2 Implementation and evaluation of common recommendation techniques

Part II - Self-Organization applied to Recommender Systems

Progress isn't made by early risers.
It's made by lazy men trying to find
easier ways to do something.

Robert Heinlein

This part consists of three chapters. The first two are theoretical, giving the necessary background for chapter three, in which we use the knowledge from the previous chapters in our attempt to implement and evaluate recommender systems based on the SOM paradigm.

The purpose of the first chapter is to give the reader a basic understanding in what artificial neural networks are. Motivations behind the development of artifical neural networks are discussed, the basic biological neuron and the artificial neuron are described, network architectures and learning processes are outlined, and a brief historical summary of some of the main events in the field of artifical neural networks is given.

The second chapter deals with self-organizing systems in which Kohonen's SOM is the main topic. Self-organization as a biological phenomena is discussed and the requirements for self-organization are described. Then, after a short introduction to the SOM paradigm, the incremental SOM algorithm is outlined in detail together with a presentation of how parameters for the algorithm should be selected, how to measure the quality of the SOM visualization and clustering issues in the SOM context. Some important properties regarding the final SOM is also presented and a discussion of the theoretical aspects of the SOM is given. Presentations of related work in the area of using the SOM paradigm in recommender systems are given at the end of the chapter.

Finally, in chapter three, we describes why we have chosen to use the SOM paradigm as a part of a recommender system by referring back to Part I and the issues regarding recommender systems that was presented there. We proceed by showing how we have implemented different recommender systems based on the SOM paradigm. This is done with pseudo code and descriptive commentaries of the code. The systems are then evaluated in the same manner as in Part I.

Chapter 3 Artificial neural networks

Chapter 4 Self-Organizing Maps

Chapter 5 Implementation and evaluation of SOM based recommendation techniques

Part III - The recommendation interface problem

Seeing, contrary to popular wisdom, isn't believing.
It's where belief stops, because it isn't needed any more.

Terry Pratchett

The first two chapters in this part deals with the recommendation interface problem in terms of trust and transparency , and how one way of solving these issues is with well explained recommendations. Interaction design for recommender systems are briefly discussed and the concept of visualization of recommendations are described. Previous studies done in the research community regarding these areas are discussed and presented. Next, follows a chapter presenting the MOVSOM, a highly interactive visual movie recommender system. We will describe the user interface of MOVSOM and its system architecture. Challenges concerning our approach will also be discussed. No empirical evaluation of the interface is made, in the sense of user interviews or tests, instead we highly encourage any reader of this thesis to visit MOVSOM's web-site(*) and judge for himself. In the end we conclude and present future work within the area of visual recommendations.

Chapter 6 Trust in recommender systems

The use of Self-Organizing Maps in Recommender Systems A survey of the Recommender Systems field and a presentation of a State of the Art Highly Interactive Visual Movie Recommender System