Challenge Set Readme

Version 1, February 13, 2018

This is the challenge set for the RecSys Challenge 2018.

This challenge set contains 10,000 incomplete playlists. The challenge is to recommend tracks for each of these playlists. See recsys-challenge.com for challenge details.

Format

The challenge set consists of a single JSON dictionary with three fields:
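
Because the challenge set is a single JSON dictionary, its top-level fields can be listed directly. The following is a minimal sketch, not part of the shipped tools: the helper name top_level_fields is hypothetical, and it makes no assumption about what the fields are called.

```python
import json


def top_level_fields(path):
    """Load a JSON file and return its top-level field names, sorted."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    # The README describes the challenge set as one JSON dictionary,
    # so anything else at the top level is treated as an error here.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON dictionary at the top level")
    return sorted(data)


if __name__ == "__main__":
    # File name taken from this README; adjust the path as needed.
    print(top_level_fields("challenge_set.json"))
```

Note that json.load reads the whole file into memory, which should be acceptable for a file of 10,000 playlists.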

Playlist Challenge Categories

The 10,000 playlists are made up of 10 different challenge categories, with 1,000 playlists in each category:

  1. Predict tracks for a playlist given its title only
  2. Predict tracks for a playlist given its title and the first track
  3. Predict tracks for a playlist given its title and the first 5 tracks
  4. Predict tracks for a playlist given its first 5 tracks (no title)
  5. Predict tracks for a playlist given its title and the first 10 tracks
  6. Predict tracks for a playlist given its first 10 tracks (no title)
  7. Predict tracks for a playlist given its title and the first 25 tracks
  8. Predict tracks for a playlist given its title and 25 random tracks
  9. Predict tracks for a playlist given its title and the first 100 tracks
  10. Predict tracks for a playlist given its title and 100 random tracks

How the challenge set was built

The playlists in the challenge set are selected using the same criteria used to select playlists for the full Million Playlist Dataset (MPD). See the MPD README at https://recsys-challenge.spotify.com/readme for more details on how the playlists were selected. Additionally, playlists in the challenge set meet the following constraints:

Verifying the challenge set

To verify that you have an uncorrupted challenge set, you can check its md5 checksum, e.g.:

  % md5sum --check md5

  challenge_set.json: OK
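
Where md5sum is not available, the same check can be approximated in Python. This is a minimal sketch rather than a shipped tool: the function names are hypothetical, and it assumes the md5 file uses the usual md5sum layout of "<hex digest>  <file name>" on its first line.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(data_path, checksum_path):
    """Compare `data_path` with the first entry of an md5sum-style file.

    Assumption: the first line looks like "<hex digest>  <file name>".
    Only the digest is used; the file name on that line is ignored.
    """
    with open(checksum_path, "r", encoding="utf-8") as handle:
        expected = handle.readline().split()[0].lower()
    return md5_of_file(data_path) == expected


if __name__ == "__main__":
    # File names taken from this README; adjust the paths as needed.
    ok = matches_checksum_file("challenge_set.json", "md5")
    print("challenge_set.json:", "OK" if ok else "FAILED")
```

A mismatch usually means the download was truncated or altered, in which case the file should be fetched again.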

Use check.py to verify that the challenge set is internally consistent.

  % python check.py

  stats:
     tests: 4634003
     errors: 0

  challenge_set.json is OK

Sample Submission

Included in the challenge set is a sample challenge submission:

  sample_submission.csv

This sample shows the expected format for your submission to the challenge. Your submission should follow these rules:

'pid' is the playlist id of the challenge playlist

You can verify that your submission is in the proper format as follows:

  % python verify_submission.py challenge_set.json sample_submission.csv

Note that the verify_submission.py that was shipped with the challenge set has two errors:

  1. Submissions should not be zipped prior to uploading to the leaderboard.
  2. The track and team name fields in team_info were swapped.

A corrected version of the program can be found at verify_submission.py.