Today I Learned


30 Mar, 2025: Ruby and RSS feeds

I've been digging into Ruby's stdlib RSS parser for a side project and am very impressed by the overall experience. Here's how easy it is to get started:

require "open-uri"
require "rss"

feed = URI.open("https://jvns.ca/atom.xml") do |raw|
  RSS::Parser.parse(raw)
end

That said, doing something interesting with the resulting feed is not quite so simple.

For one, you can't just support RSS. Atom is a more recent standard used by many blogs (though, as far as I can tell, it's irrelevant in the world of podcasts). The tiny list of feeds I follow splits roughly 50/50 between RSS and Atom, so a feed reader must handle both formats.

Adding Atom support introduces an extra branch to our snippet:

require "open-uri"
require "rss"

URI.open("https://jvns.ca/atom.xml") do |raw|
  feed = RSS::Parser.parse(raw)

  title = case feed
  when RSS::Rss
    feed.channel.title
  when RSS::Atom::Feed
    feed.title.content
  end
end

The need to handle both standards independently is kind of frustrating.

That said, it does make sense from a library perspective. The RSS gem is principally concerned with parsing XML per the RSS and Atom standards, returning objects that map one-to-one onto each format's structure. Any conveniences for general feed reading are left to the application.

Wrapping the RSS gem in another class helps encapsulate differences in standards:

require "open-uri"
require "rss"

class FeedReader
  attr_reader :title

  def initialize(url)
    @url = url
  end

  def fetch
    feed = URI.open(@url) { |r| RSS::Parser.parse(r) }

    case feed
    when RSS::Rss
      @title = feed.channel.title
    when RSS::Atom::Feed
      @title = feed.title.content
    end
  end
end
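
Usage then looks something like this (a minimal sketch, with no error handling):

reader = FeedReader.new("https://jvns.ca/atom.xml")
reader.fetch
reader.title # => the feed's title, whether the source is RSS or Atom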

Worse than dealing with competing standards is the fact that not everyone publishes the content of an article as part of their feed. Many bloggers only use RSS as a link aggregator that points subscribers to their webpage, omitting the content entirely:

<rss version="2.0">
  <channel>
    <title>Redacted Blog</title>
    <link>https://www.redacted.io</link>
    <description>This is my blog</description>
    <item>
      <title>Article title goes here</title>
      <link>https://www.redacted.io/this-is-my-blog</link>
      <pubDate>Thu, 25 Jul 2024 00:00:00 GMT</pubDate>
      <!-- No content! -->
    </item>
  </channel>
</rss>

How do RSS readers handle this situation? The solution varies based on the app.

The two I've tested, NetNewsWire and Readwise Reader, manage to include the entire article content in the app, despite the RSS feed omitting it (assuming no paywalls). My guess is these services make an HTTP request to the source, scraping the resulting HTML for the article content and ignoring everything else.

Firefox users are likely familiar with a feature called Reader View that transforms a webpage into its bare-minimum content. All of the layout elements are stripped away in favor of the text of the page. The JS library that Firefox uses is open source on their GitHub: mozilla/readability.

On the Ruby side of things there's a handy port called ruby-readability that we can use to extract omitted article content directly from the associated website:

require "open-uri"
require "rss"
require "ruby-readability"

URI.open("https://jvns.ca/atom.xml") do |raw|
  feed = RSS::Parser.parse(raw)

  url = case feed
  when RSS::Rss
    feed.items.first.link
  when RSS::Atom::Feed
    feed.entries.first.link.href
  end

  # Raw HTML content
  source = URI.parse(url).read
  # Just the article HTML content
  article_content = Readability::Document.new(source).content
end

So far the results are good, but I haven't tested it on many blogs.

26 Feb, 2025: Zod refinements are complicated

Today I found myself at the bottom of a rabbit hole, exploring how Zod's refine method interacts with form validations. As with most things in programming, reality is never as clear-cut as the types make it out to be.

Today's issue concerns zod/issues/479, where refine validations aren't executed until all fields in the associated object are present. Here's a reframing of the problem:

The setup:

  • I have a form with fields A and B. Both are required, with corresponding validations required_a and required_b.
  • I have a validation that depends on the values of both A and B, say complex_a_b.

The problem:

If one of A or B is not filled out, the form parses with errors: [required_a], not [required_a, complex_a_b]. In other words, complex_a_b only pops up as an error when both A and B are filled out.

Here's an example schema that demonstrates the problem:

const schema = z
  .object({
    a: z.string(),
    b: z.string(),
  })
  .refine((values) => !complexValidation(values.a, values.b), {
    message: 'complex_a_b error',
  })
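
To make that concrete, parsing a half-filled form skips the refinement entirely (a sketch; output abbreviated):

schema.safeParse({ a: 'hello' })
// => { success: false, error: ... }
// issues: [{ code: 'invalid_type', path: ['b'], message: 'Required' }]
// Note: no 'complex_a_b error' issue, even if complexValidation would fail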

This creates an experience where a user fills in A, submits, sees a validation error pointing at B, fills in B, and sees another validation error pointing at complex_a_b. The user has to play whack-a-mole with the form inputs to make sure all of the fields pass validation.

As programmers, we're well-acquainted with error messages that work like this. And we hate them! Imagine a compiler that suppresses certain errors until prerequisite ones are fixed.

If you dig deep into the aforementioned issue thread, you'll come across the following solution (credit to jedwards1211):

const base = z.object({
  a: z.string(),
  b: z.string(),
})

const schema = z.preprocess((input, ctx) => {
  const parsed = base.pick({ a: true, b: true }).safeParse(input)
  if (parsed.success) {
    const { a, b } = parsed.data
    if (complexValidation(a, b)) {
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
        path: ['a'],
        message: 'complex_a_b error',
      })
    }
  }
  return input
}, base)

Look at all of that extra logic! Tragic.

From a type perspective, I understand why Zod doesn't endeavor to fix this particular issue. How can the complex_a_b callback assert the types of A and B if it runs before those fields pass their own checks? Treating them as optional inside complex_a_b would defeat the very z.string() type that asserts each field is required.
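
Here's a sketch of that trade-off: loosening the fields so the refinement always runs means complexValidation now has to cope with possibly-missing values.

const loose = z
  .object({
    a: z.string().optional(),
    b: z.string().optional(),
  })
  // values.a and values.b are `string | undefined` here, so the
  // original complexValidation(string, string) no longer typechecks
  // without fallbacks like these:
  .refine((values) => !complexValidation(values.a ?? '', values.b ?? ''), {
    message: 'complex_a_b error',
  })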

How did I fix it for my app? I didn't. I instead turned to the form library, applying my special validation via the form API instead of the Zod API. I concede defeat.

16 Feb, 2025: Async IO in Emacs

Stumbled on the emacs-aio library today and its introduction post. What a great exploration of how async/await works under the hood! I'm not sure I totally grok the details, but I'm excited to dive further into Emacs generators and other concurrent programming techniques.

The article brings to mind Wiegley's async library, which is probably the more canonical library for handling async in Emacs. From a brief look at the README, async looks like it actually spawns independent processes, whereas emacs-aio is really just a construct for handling non-blocking I/O more conveniently.

Karthink comments on Reddit about the usability of generators in Emacs:

I've written small-medium sized packages -- 400 to 2400 lines of elisp -- that use generators and emacs-aio (async/await library built on generator.el) for their async capabilities. I've regretted it each time: generators in their current form in elisp are obfuscated, opaque and not introspectable -- you can't debug/edebug generator calls. Backtraces are impossible to read because of the continuation-passing macro code. Their memory overhead is large compared to using simple callbacks. I'm not sure about the CPU overhead.

That said, the simplicity of emacs-aio promises is very appealing:

(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))
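
Squint at that and a promise is just a three-slot record: the type tag, the eventual result, and a list of callbacks to run on resolution. aio-result simply reads the middle slot.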

03 Feb, 2025: Pulling Puzzles from Lichess

Lichess is an awesome website, made even more awesome by the fact that it is free and open source. Perhaps lesser known is that the entire Lichess puzzle database is available for free download under the Creative Commons CC0 license. Every puzzle that you normally find under lichess.org/training is available for your perusal.

This is a quick guide for pulling that CSV and seeding a SQLite database so you can do something cool with it. You will need zstd.

First, wget the file from the Lichess.org open database and save it into a temporary directory, then run zstd to decompress it into a CSV that we can read from Ruby.

wget https://database.lichess.org/lichess_db_puzzle.csv.zst -P tmp/
zstd -d tmp/lichess_db_puzzle.csv.zst

With the CSV pulled down and decompressed, it's time to read it into the application. I'm using Ruby on Rails, so I generate a database model like so:

bin/rails g model Puzzle \
  puzzle_id:string fen:string moves:string rating:integer \
  rating_deviation:integer popularity:integer nb_plays:integer \
  themes:string game_url:string opening_tags:string

Which creates the following migration:

class CreatePuzzles < ActiveRecord::Migration[8.0]
  def change
    create_table :puzzles do |t|
      t.string :puzzle_id
      t.string :fen
      t.string :moves
      t.integer :rating
      t.integer :rating_deviation
      t.integer :popularity
      t.integer :nb_plays
      t.string :themes
      t.string :game_url
      t.string :opening_tags

      t.timestamps
    end
  end
end

A separate seed script pulls items from the CSV and bulk-inserts them into SQLite. I have the following in my db/seeds.rb, with a few omitted additions that check whether or not the puzzles have already been migrated.

require "csv"

csv_path = Rails.root.join("tmp", "lichess_db_puzzle.csv")
raise "CSV not found" unless File.exist?(csv_path)

buffer = []
buffer_size = 500
flush = ->() do
  Puzzle.insert_all(buffer) unless buffer.empty?
  buffer.clear
end

CSV.foreach(csv_path, headers: true) do |row|
  buffer << {
    puzzle_id: row["PuzzleId"],
    fen: row["FEN"],
    moves: row["Moves"],
    rating: row["Rating"],
    rating_deviation: row["RatingDeviation"],
    popularity: row["Popularity"],
    nb_plays: row["NbPlays"],
    themes: row["Themes"],
    game_url: row["GameUrl"],
    opening_tags: row["OpeningTags"]
  }

  if buffer.count >= buffer_size
    flush.()
  end
end

flush.()

And with that you have the entire Lichess puzzle database available at your fingertips. The whole process takes less than a minute.

Puzzle.where("rating < 1700").count
# => 3035233

24 Dec, 2024: Automating Quick Notes with iOS Shortcuts

I've blogged before about why I really dislike apps like Notion for taking quick notes since they're so slow to open. The very act of opening the app to take said note often takes 10 or more seconds, typically with a whole bunch of JavaScript-inflicted loading states and blank screens. By the time I get to the note, I've already lost my train of thought.

As it turns out, this pain point is a perfect candidate for the iOS Shortcuts app. I can create an automated workflow that captures my text input instantly but pushes to Notion in the background, letting me benefit from Notion's database-like organization without suffering the app's pitiful performance.

Here's my Shortcut:

[Image: Notion Shortcut workflow]

Super simple but it gets the job done.

03 Dec, 2024: Type predicates to avoid casting

Type predicates have been around for a while, but today I found a particularly nice application for them. The situation is this: I have an interface with an optional field, where the presence of that field means I need to create a new object on the server, and its absence means the object has already been created and I'm just holding on to it for later. Here's what it looked like:

interface Thing {
  name: string
  blob?: File
}

const things: Thing[] = [
  /* ... */
]

const uploadNewThings = (things: (Thing & { blob: File })[]) =>
  Promise.all(things.map((thing) => createThing(thing.name, thing.blob)))

The intersection type Thing & { blob: File } means that uploadNewThings only accepts things that have the field blob. In other words, things that need to be created on the server because they have blob content.

However, TypeScript struggles if you try to simply filter the list of things before passing it into uploadNewThings:

uploadNewThings(things.filter((thing) => !!thing.blob))

The resulting error is this long stream of text:

Argument of type 'Thing[]' is not assignable to parameter of type '(Thing & { blob: File; })[]'.
  Type 'Thing' is not assignable to type 'Thing & { blob: File; }'.
    Type 'Thing' is not assignable to type '{ blob: File; }'.
      Types of property 'blob' are incompatible.
        Type 'File | undefined' is not assignable to type 'File'.
          Type 'undefined' is not assignable to type 'File'.

The tl;dr: despite filtering things by thing => !!thing.blob, TypeScript does not recognize that the resulting array is actually (Thing & { blob: File })[].

Now you could just cast it,

things.filter((thing) => !!thing.blob) as (Thing & { blob: File })[]

But casting is bad! It's error-prone and doesn't really solve the problem that TypeScript is hinting at. Instead, use a type predicate:

const hasBlob = (t: Thing): t is Thing & { blob: File } => !!t.blob

uploadNewThings(things.filter(hasBlob))

With the type predicate (t is Thing & ...) I can inform TypeScript that I do in fact know what I'm doing, and that the call to filter narrows the elements to the intersection type.
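
If you'd rather not name a helper, the same predicate can be written inline:

uploadNewThings(
  things.filter((thing): thing is Thing & { blob: File } => !!thing.blob)
)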

15 Nov, 2024: Running and writing

Most runners run not because they want to live longer, but because they want to live life to the fullest. If you're going to while away the years, it's far better to live them with clear goals and fully alive than in a fog, and I believe running helps you do that. Exerting yourself to the fullest within your individual limits: that's the essence of running, and a metaphor for life—and for me, writing as well. - Haruki Murakami

13 Nov, 2024: Data migrations with data-migrate

What I traditionally would've used Rake tasks for has been replaced with data-migrate, a little gem that handles data migrations in the same way as Rails schema migrations. It's the perfect way to automate data changes in production, offering a single pattern for handling data backfills, seed scripts, and the like.

The pros are numerous:

  • Data migrations are easily generated via CLI and are templated with up and down methods so folks think about rollbacks (see the sketch after this list).
  • Just like with Rails schema migrations, a migration ID is kept around that ensures data migrations run in order; old PRs that add migrations will surface merge conflicts rather than silently running out of order.
  • You can conditionally run data migrations alongside schema migrations with bin/rails db:migrate:with_data.
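
For reference, here's roughly what that workflow looks like. The migration name and body are hypothetical; I'm reusing the Puzzle model from the Lichess entry purely for illustration:

# bin/rails generate data_migration backfill_puzzle_slugs
# lands in db/data/, next to the schema migrations in db/migrate/
class BackfillPuzzleSlugs < ActiveRecord::Migration[8.0]
  def up
    # Hypothetical backfill: assumes a slug column exists on puzzles.
    Puzzle.where(slug: nil).find_each do |puzzle|
      puzzle.update!(slug: puzzle.puzzle_id.parameterize)
    end
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

Run pending data migrations with bin/rails data:migrate.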

It's a really neat gem. I'll probably still rely on the good ol' Rake task for my personal projects, but will doubtless keep data-migrate in the toolbox for teams.

09 Nov, 2024: Cool Rails concerns

There's something super elegant about Writebook's use of concerns. I especially like Book::Sluggable:

module Book::Sluggable
  extend ActiveSupport::Concern

  included do
    before_save :generate_slug, if: -> { slug.blank? }
  end

  def generate_slug
    self.slug = title.parameterize
  end
end
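
For context, the model opts in with a plain include (a sketch; inside the class body, Ruby resolves Sluggable to Book::Sluggable):

class Book < ApplicationRecord
  include Sluggable
end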

Here are a few reasons I like it:

  • Nesting concerns in a model folder is neat when that concern is an encapsulation of model-specific functionality: app/models/book/sluggable.rb.
  • Concerns don't have to be big. They do have to be single-purpose.
  • Reminds me of a great article by Jorge Manrubia: Vanilla Rails is plenty. Down with service objects!

26 Oct, 2024: Kafka on the Shore

On the inside cover of Kafka on the Shore, Murakami explains how his idea for the book started with its title. That's the opposite of how anything I've written has come about, though I recognize there's a notable difference between fiction and technical writing. But what a powerful idea: a simple phrase shapes the entire story.

I dug up this quote from an interview:

When I start to write, I don’t have any plan at all. I just wait for the story to come. I don’t choose what kind of story it is or what’s going to happen. I just wait.

I think that's pretty cool.