TIL: Pulling Puzzles from Lichess

by Graham Marlow

Lichess is an awesome website, made even more awesome by the fact that it is free and open source. Perhaps lesser known is that the entire Lichess puzzle database is available for free download under the Creative Commons CC0 license. Every puzzle that you normally find under lichess.org/training is available for your perusal.

This is a quick guide for pulling that CSV and seeding a SQLite database so you can do something cool with it. You will need zstd.

First, wget the file from Lichess.org open database and save it into a temporary directory. Run zstd to uncompress it into a CSV that we can read via Ruby.

wget https://database.lichess.org/lichess_db_puzzle.csv.zst -P tmp/
zstd -d tmp/lichess_db_puzzle.csv.zst

CSV pulled down and uncompressed, it's time to read it into the application. I'm using Ruby on Rails, so I generate a database model like so:

bin/rails g model Puzzle \
  puzzle_id:string fen:string moves:string rating:integer \
  rating_deviation:integer popularity:integer nb_plays:integer \
  themes:string game_url:string opening_tags:string

Which creates the following migration:

class CreatePuzzles < ActiveRecord::Migration
  def change
    create_table :puzzles do |t|
      t.string :puzzle_id
      t.string :fen
      t.string :moves
      t.integer :rating
      t.integer :rating_deviation
      t.integer :popularity
      t.integer :nb_plays
      t.string :themes
      t.string :game_url
      t.string :opening_tags

      t.timestamps
    end
  end
end

A separate seed script pulls items from the CSV and bulk-inserts them into SQLite. I have the following in my db/seeds.rb, with a few omitted additions that check whether or not the puzzles have already been migrated.

csv_path = Rails.root.join("tmp", "lichess_db_puzzle.csv")
raise "CSV not found" unless File.exist?(csv_path)

buffer = []
buffer_size = 500
flush = ->() do
  Puzzle.insert_all(buffer)
  buffer.clear
end

CSV.foreach(csv_path, headers: true) do |row|
  buffer << {
    puzzle_id: row["PuzzleId"],
    fen: row["FEN"],
    moves: row["Moves"],
    rating: row["Rating"],
    rating_deviation: row["RatingDeviation"],
    popularity: row["Popularity"],
    nb_plays: row["NbPlays"],
    themes: row["Themes"],
    game_url: row["GameUrl"],
    opening_tags: row["OpeningTags"]
  }

  if buffer.count >= buffer_size
    flush.()
  end
end

flush.()

And with that you have the entire Lichess puzzle database available at your fingertips. The whole process takes less than a minute.

Puzzle.where("rating < 1700").count
# => 3035233