Challenge Rules


The RecSys Challenge 2018 is organized by Spotify, The University of Massachusetts, Amherst, and Johannes Kepler University, Linz. The topic of this year’s challenge is automatic playlist continuation. Given a user playlist containing some number of seed tracks, participants will generate a list of recommended tracks for continuing that playlist. For this purpose, Spotify has produced the Million Playlist Dataset, a dataset consisting of one million user-created playlists and associated metadata (to be used as a “training set” for participants’ systems). In the challenge, participants have the unique opportunity to explore the interesting characteristics of this new dataset, and to develop novel algorithms for the task of playlist continuation.

The challenge is split into two parallel challenge tracks. In the main track, teams can only use data that is provided through the Million Playlist Dataset, while in the creative track we encourage the participants to use external, public and freely available data sources to boost their system.

The task

The goal of the challenge is to develop a system for the task of automatic playlist continuation. Given a set of playlist features, participants’ systems shall generate a list of recommended tracks that can be added to that playlist, thereby ‘continuing’ the playlist. We define the task formally as follows:


A user-created playlist, represented by:


Note that the system should also be able to cope with playlists for which no initial seed tracks are given. To assess the performance of a submission, the output track predictions are compared to the ground truth tracks (“reference set”) from the original playlist.

Participation by non-academic researchers

We want to allow participation by all researchers. However, we find it challenging to release a comprehensive dataset while protecting both our users' privacy and our intellectual property in an extremely competitive environment. Restricting the release of the dataset to academic researchers is the best solution we could find that would allow us to conduct this challenge. To that end, we allow non-academic researchers to participate as advisors under the condition that the advisor does not get access to the full Million Playlist Dataset. Here are the details:

Challenge dataset and submission format

As part of the challenge, we release a separate challenge dataset (“test set”) that consists of 10,000 playlists with incomplete information. It has many of the same data fields and follows the same structure as the Million Playlist Dataset, but the playlists only include K tracks.

For each playlist in the challenge set, participants will submit a ranked list of 500 recommended track URIs. The file format should be a gzipped csv (.csv.gz) file. The order of the recommended tracks matters: more relevant recommendations should appear first in the list. Submissions should be made in the following comma-separated format:

Important notes about submissions:

A sample submission (sample_submission.csv) is included with the challenge set. The sample shows the expected format for your submission to the challenge. Also included with the challenge set is a Python program that you can use to verify that your submission is properly formatted. See the challenge set README file for more information on how to verify and submit your challenge results.
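As an illustration of the gzipped csv requirement, here is a minimal sketch of writing ranked recommendations to a .csv.gz file. The row layout used here (a playlist identifier followed by its 500 track URIs, most relevant first) is an assumption made for this example, as are the function name `write_submission` and its parameters; the authoritative layout is the one in sample_submission.csv and the challenge set README.

```python
import csv
import gzip


def write_submission(path, recommendations, list_length=500):
    """Write ranked recommendations to a gzipped CSV file.

    recommendations maps a playlist identifier to a list of track URIs,
    ordered from most to least relevant. The one-row-per-playlist layout
    is an assumption for illustration, not the official format.
    """
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for playlist_id, track_uris in recommendations.items():
            # The rules ask for a ranked list of 500 track URIs per playlist.
            if len(track_uris) != list_length:
                raise ValueError(
                    f"playlist {playlist_id}: expected {list_length} tracks, "
                    f"got {len(track_uris)}"
                )
            writer.writerow([playlist_id, *track_uris])
```

Whatever layout is used, the file should still be checked with the verification program shipped with the challenge set before it is submitted.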


Submissions will be evaluated using the following metrics. All metrics will be evaluated at both the track level (exact track must match) and the artist level (any track by that artist is a match). In the following, we denote the ground truth set of tracks by \( G \), and the ordered list of recommended tracks by \( R \). The size of a set or list is denoted by \( |\cdot| \), and we use from:to-subscripts to index a list. In the case of ties on individual metrics, earlier submissions are ranked higher.


R-precision

R-precision is the number of retrieved relevant tracks divided by the number of known relevant tracks (i.e., the number of withheld tracks):

\[ \text{R-precision} = \frac{\left| G \cap R_{1:|G|} \right|}{|G|}. \]

The metric is averaged across all playlists in the challenge set. This metric rewards the total number of retrieved relevant tracks (regardless of order).
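The R-precision formula can be sketched in a few lines of Python. This is an illustrative sketch, not the organizers' evaluation code; the function names and the representation of \( G \) as a set and \( R \) as a list are assumptions made for the example.

```python
from typing import Iterable, Sequence, Set, Tuple


def r_precision(recommended: Sequence[str], ground_truth: Set[str]) -> float:
    """Compute |G ∩ R_{1:|G|}| / |G| for a single playlist."""
    if not ground_truth:
        raise ValueError("ground truth must contain at least one track")
    # Only the first |G| recommendations are considered.
    top = recommended[: len(ground_truth)]
    return len(ground_truth.intersection(top)) / len(ground_truth)


def mean_r_precision(playlists: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average R-precision over (recommended, ground_truth) pairs."""
    scores = [r_precision(rec, truth) for rec, truth in playlists]
    return sum(scores) / len(scores)
```

For example, with three withheld tracks {"a", "c", "x"} and recommendations ["a", "b", "c", "d"], only the first three recommendations count, two of them are relevant, and the score is 2/3. For the artist-level variant, the same functions could be applied to artist identifiers instead of track URIs.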

Normalized discounted cumulative gain (NDCG)

Discounted cumulative gain (DCG) measures the ranking quality of the recommended tracks, increasing when relevant tracks are placed higher in the list. Normalized DCG (NDCG) is determined by calculating the DCG and dividing it by the ideal DCG in which the recommended tracks are perfectly ranked:

\[ DCG = rel_1 + \sum_{i=2}^{|R|} \frac{rel_i}{\log_2 (i + 1)}. \]

The ideal DCG or IDCG is, in our case, equal to:

\[ IDCG = 1 + \sum_{i=2}^{|G|} \frac{1}{\log_2 (i + 1)}. \]

If the intersection of \( G \) and \( R \) is empty, then the DCG is equal to 0. The NDCG metric is now calculated as:

\[ NDCG = \frac{DCG}{IDCG}. \]
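The following sketch implements the three formulas above for a single playlist. It assumes binary relevance (\( rel_i = 1 \) if the \( i \)-th recommended track is in \( G \), and 0 otherwise), which is what the IDCG formula implies; the function names are chosen for this example. Because \( \log_2(1 + 1) = 1 \), the first term \( rel_1 \) fits the same expression as the remaining terms, so a single sum suffices.

```python
import math
from typing import Sequence, Set


def dcg(relevances: Sequence[float]) -> float:
    """DCG = rel_1 + sum_{i>=2} rel_i / log2(i + 1), with 1-based positions."""
    # log2(1 + 1) == 1, so position 1 contributes rel_1 unchanged.
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(recommended: Sequence[str], ground_truth: Set[str]) -> float:
    """NDCG = DCG / IDCG for one playlist, assuming binary relevance."""
    if not ground_truth:
        raise ValueError("ground truth must contain at least one track")
    relevances = [1.0 if track in ground_truth else 0.0 for track in recommended]
    # Ideal ranking as defined above: |G| relevant tracks at the top of the list.
    ideal = dcg([1.0] * len(ground_truth))
    return dcg(relevances) / ideal
```

With \( G = \{a, c\} \) and \( R = (a, b, c) \), the DCG is \( 1 + 1/\log_2 4 = 1.5 \) and the IDCG is \( 1 + 1/\log_2 3 \), so the NDCG is roughly 0.92.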

Recommended Songs clicks

Recommended Songs is a Spotify feature that, given a set of tracks in a playlist, recommends 10 tracks to add to the playlist. The list can be refreshed to produce 10 more tracks. Recommended Songs clicks is the number of refreshes needed before a relevant track is encountered. It is calculated as follows:

\[ \text{clicks} = \left\lfloor \frac{ \arg\min_i \{ R_i\colon R_i \in G \} - 1}{10} \right\rfloor \]

If the metric does not exist (i.e. if there is no relevant track in \( R \)), a value of 51 is picked (which is 1 + the maximum number of clicks possible).
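A minimal sketch of this metric follows, assuming \( R \) is a list and \( G \) a set. The function and parameter names are illustrative. Since Python indexes from 0, the 1-based expression \( \lfloor (i - 1) / 10 \rfloor \) becomes an integer division of the 0-based index by 10. The fallback value of 51 is taken directly from the rule above.

```python
from typing import Sequence, Set


def recommended_songs_clicks(
    recommended: Sequence[str],
    ground_truth: Set[str],
    page_size: int = 10,
    no_hit_value: int = 51,
) -> int:
    """Number of refreshes needed before the first relevant track appears."""
    for index, track in enumerate(recommended):  # index is 0-based
        if track in ground_truth:
            # Equivalent to floor((i - 1) / 10) for the 1-based position i.
            return index // page_size
    # No relevant track anywhere in the list.
    return no_hit_value
```

A relevant track anywhere in positions 1 to 10 yields 0 clicks, and a first hit at position 11 yields 1 click.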

Rank aggregation

Final rankings will be computed by using the Borda Count election strategy. For each of the rankings of p participants according to R-precision, NDCG, and Recommended Songs clicks, the top ranked system receives p points, the second system receives p-1 points, and so on. The participant with the most total points wins. In the case of ties, we use top-down comparison: compare the number of 1st place positions between the systems, then 2nd place positions, and so on.
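The aggregation can be sketched as follows. The example assumes that each metric has already been turned into a strict ordering of teams, best first (for Recommended Songs clicks, fewer clicks is better), with ties within a metric already resolved by submission time. The function name and the data layout are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def borda_ranking(rankings: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Aggregate per-metric rankings (best team first) with a Borda count.

    Returns the final order of teams and each team's total points.
    """
    teams = next(iter(rankings.values()))
    p = len(teams)
    points = {team: 0 for team in teams}
    placements = {team: Counter() for team in teams}

    for order in rankings.values():
        for place, team in enumerate(order, start=1):
            points[team] += p - place + 1  # 1st place gets p points, 2nd gets p - 1
            placements[team][place] += 1

    def sort_key(team: str):
        # Most points first; ties broken top-down by the number of
        # 1st places, then 2nd places, and so on.
        top_down = tuple(-placements[team][place] for place in range(1, p + 1))
        return (-points[team], top_down)

    return sorted(teams, key=sort_key), points
```

For instance, with three teams ranked A, B, C on R-precision, B, A, C on NDCG, and C, A, B on clicks, team A collects 3 + 2 + 2 = 7 points, B collects 6, and C collects 5, so the final order is A, B, C.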

General rules

The following general challenge rules are applicable for both the main track and creative track. Please refer to the license agreement for further explanations.

  1. Each person participating in the challenge only has one account on the submission website and is part of only one team. If this rule is violated, all involved teams will be disqualified from the challenge.
  2. Each team can only make 1 submission per day per track. The last submission day is June 30, 2018.
  3. Team mergers are allowed and should be communicated with the challenge organizers. The merged teams form one team from then on.
  4. Privately and publicly sharing source code between teams is not allowed, unless the teams are merged.
  5. The leaderboard is calculated and updated daily based on 50% of the challenge dataset. When the challenge is finished, the entire leaderboard is recalculated using the complete challenge dataset and the final submission from each team. The winning teams are selected based on the final leaderboard.
  6. It is strictly prohibited to utilize Spotify search and playlist services (via the web API or any Spotify client) for any reason whatsoever (including but not limited to: getting additional data to train, cross-validate, tune, or test algorithms for the challenge).
  7. Open source code: at the end of the challenge, each team is required to open source the source code that was used to generate their final challenge solution.
  8. Selected top teams will be invited to submit a paper to the RecSys Challenge workshop describing the algorithms that they developed for this challenge. Teams without a paper submission will be removed from the final leaderboard and will not receive an award.
  9. Ask us: if you are not sure whether something is allowed or not, do not hesitate to contact us! We can quickly give you the information you need.

Track-specific Rules

This year we will organize two parallel challenge tracks, for which different rules apply.

Main track

Teams that participate in the main track can only use data that is provided through the Million Playlist Dataset. No external data sources can be consulted to complete the challenge. This includes pretrained models and public Spotify data that is not present in the dataset.

Creative track

In the creative track, participating teams are allowed to use external data sources to develop their systems. These data sources, however, should be public and freely accessible by all participants. Teams are obliged to release a complete list of all data sources and the purpose for which they have been used, accompanied by source code and documentation that describes how to obtain this data, if applicable.


For each track, the top three teams will be awarded a share of prize money. The top team will receive $4,000, the second $2,000, and the third $1,000.