Today I Learned
I've been digging into Ruby's stdlib RSS parser for a side project and am very
impressed by the overall experience. Here's how easy it is to get started:
require "open-uri"
require "rss"
feed = URI.open("https://jvns.ca/atom.xml") do |raw|
RSS::Parser.parse(raw)
end
That said, doing something interesting with the resulting feed is not quite so
simple.
For one, you can't just support RSS. Atom is a more recent standard used by many
blogs (although, as far as I can tell, it's irrelevant in the world of podcasts).
There's roughly a 50/50 split between RSS and Atom in the tiny list of feeds that
I follow, so a feed reader must handle both formats.
Adding Atom support introduces an extra branch to our snippet:
require "open-uri"
require "rss"
URI.open("https://jvns.ca/atom.xml") do |raw|
feed = RSS::Parser.parse(raw)
title = case feed
when RSS::Rss
feed.channel.title
when RSS::Atom::Feed
feed.title.content
end
end
The need to handle both standards independently is kind of frustrating.
That said, it does make sense from a library perspective. The RSS gem is
principally concerned with parsing XML per the RSS and Atom standards,
returning objects that correspond one-to-one. Any conveniences for general
feed reading are left to the application.
Wrapping the RSS gem in another class helps encapsulate differences in
standards:
class FeedReader
  attr_reader :title

  def initialize(url)
    @url = url
  end

  def fetch
    feed = URI.open(@url) { |r| RSS::Parser.parse(r) }

    case feed
    when RSS::Rss
      @title = feed.channel.title
    when RSS::Atom::Feed
      @title = feed.title.content
    end
  end
end
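Usage then looks the same regardless of which standard the feed speaks:

reader = FeedReader.new("https://jvns.ca/atom.xml")
reader.fetch
reader.title # => the feed's title, whether it came from RSS or Atom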
Worse than dealing with competing standards is the fact that not everyone
publishes the content of an article as part of their feed. Many bloggers only
use RSS as a link aggregator that points subscribers to their webpage, omitting
the content entirely:
<rss version="2.0">
  <channel>
    <title>Redacted Blog</title>
    <link>https://www.redacted.io</link>
    <description>This is my blog</description>
    <item>
      <title>Article title goes here</title>
      <link>https://www.redacted.io/this-is-my-blog</link>
      <pubDate>Thu, 25 Jul 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
How do RSS readers handle this situation? The solution varies based on the app.
The two I've tested, NetNewsWire and Readwise Reader, manage to include the entire
article content in the app, despite the RSS feed omitting it (assuming no paywalls).
My guess is these services make an HTTP request to the source, scraping the resulting
HTML for the article content and ignoring everything else.
Firefox users are likely familiar with a feature called Reader View that
transforms a webpage into its bare-minimum content. All of the layout elements
are removed in favor of highlighting the text of the page. The JS library that
Firefox uses is open source on their GitHub: mozilla/readability.
On the Ruby side of things there's a handy port called ruby-readability that we
can use to extract omitted article content directly from the associated website:
require "open-uri"
require "rss"
require "ruby-readability"
URI.open("https://jvns.ca/atom.xml") do |raw|
feed = RSS::Parser.parse(raw)
url = case feed
when RSS::Rss
feed.items.first.link
when RSS::Atom::Feed
feed.entries.first.link.href
end
source = URI.parse(url).read
article_content = Readability::Document.new(source).content
end
So far the results are good, but I haven't tested it on many blogs.
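Putting the pieces together, here's a sketch of how the FeedReader wrapper from
above might absorb that readability step (untested beyond a couple of feeds, so
treat it as a starting point):

require "open-uri"
require "rss"
require "ruby-readability"

class FeedReader
  attr_reader :title, :latest_content

  def initialize(url)
    @url = url
  end

  def fetch
    feed = URI.open(@url) { |r| RSS::Parser.parse(r) }

    latest_url = case feed
                 when RSS::Rss
                   @title = feed.channel.title
                   feed.items.first.link
                 when RSS::Atom::Feed
                   @title = feed.title.content
                   feed.entries.first.link.href
                 end

    # Scrape the article page for content (needed when the feed omits it).
    source = URI.parse(latest_url).read
    @latest_content = Readability::Document.new(source).content
  end
end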
Today I found myself at the bottom of a rabbit hole, exploring how
Zod's refine method interacts with form validations. As with
most things in programming, reality is never as clear-cut as the types make it
out to be.
Today's issue concerns
zod/issues/479, where refine
validations aren't executed until all fields in the associated object are
present. Here's a reframing of the problem:
The setup:
- I have a form with fields A and B. Both are required fields, say required_a and required_b.
- I have a validation that depends on the values of both A and B, say complex_a_b.
The problem:
If one of A or B is not filled out, the form parses with errors: [required_a],
not [required_a, complex_a_b]. In other words, complex_a_b only pops up as an
error when both A and B are filled out.
Here's an example schema that demonstrates the problem:
const schema = z
  .object({
    a: z.string(),
    b: z.string(),
  })
  .refine((values) => !complexValidation(values.a, values.b), {
    message: 'complex_a_b error',
  })
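Concretely (continuing with the schema above, and assuming Zod v3's default
messages), a submission missing B reports only the required error; the
refinement never runs:

const result = schema.safeParse({ a: 'hello' })
if (!result.success) {
  console.log(result.error.issues.map((issue) => issue.message))
  // => ['Required']  (the 'complex_a_b error' never appears)
}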
This creates an experience where a user fills in A, submits, sees a validation
error pointing at B, fills in B, and sees another validation error pointing at
complex_a_b. The user has to play whack-a-mole with the form inputs to make
sure all of the fields pass validation.
As programmers, we're well-acquainted with error messages that work like this.
And we hate them! Imagine a compiler that suppresses certain errors until
prerequisite ones are fixed.
If you dig deep into the aforementioned issue thread, you'll come across the
following solution (credit to jedwards1211):
const base = z.object({
  a: z.string(),
  b: z.string(),
})

const schema = z.preprocess((input, ctx) => {
  const parsed = base.pick({ a: true, b: true }).safeParse(input)

  if (parsed.success) {
    const { a, b } = parsed.data
    if (complexValidation(a, b)) {
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
        path: ['a'],
        message: 'complex_a_b error',
      })
    }
  }

  return input
}, base)
Look at all of that extra logic! Tragic.
From a type perspective, I understand why Zod doesn't endeavor to fix this
particular issue. How can we assert the types of A or B when running the
complex_a_b validation if types A or B are implicitly optional? Evaluating them
as optional in complex_a_b would defeat the type, z.string(), that asserts the
field is required.
How did I fix it for my app? I didn't. I instead turned to the form library,
applying my special validation via the form API instead of the Zod API. I
concede defeat.
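For what it's worth, the shape of that fix is framework-agnostic, even if my
form library's API isn't worth reproducing here: let Zod report the per-field
errors, then always run the cross-field check yourself and merge the results. A
rough sketch (complexValidation is a placeholder for the real rule):

import { z } from 'zod'

// Same base schema as before; complexValidation stands in for the real rule.
const schema = z.object({ a: z.string(), b: z.string() })
const complexValidation = (a: string, b: string) => a === b // placeholder

type FormErrors = Record<string, string[]>

const validateForm = (input: { a?: string; b?: string }): FormErrors => {
  const errors: FormErrors = {}
  const push = (key: string, message: string) => {
    errors[key] = [...(errors[key] ?? []), message]
  }

  // Per-field "required" errors still come from Zod.
  const parsed = schema.safeParse(input)
  if (!parsed.success) {
    for (const issue of parsed.error.issues) {
      push(issue.path.join('.'), issue.message)
    }
  }

  // Unlike .refine, this check runs even when a or b is missing.
  if (complexValidation(input.a ?? '', input.b ?? '')) {
    push('a', 'complex_a_b error')
  }

  return errors
}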
Stumbled on the emacs-aio library today
and its introduction post. What a
great exploration into how async/await works under the hood! I'm not sure I
great exploration into how async/await works under the hood! I'm not sure I
totally grok the details, but I'm excited to dive more into Emacs generators and
different concurrent programming techniques.
The article brings to mind Wiegley's async library, which is probably the more
canonical library for handling async in Emacs. From a brief look at the README,
async looks like it actually spawns independent processes, whereas emacs-aio is
really just a construct for handling non-blocking I/O more conveniently.
Karthink on reddit comments on the usability of generators in Emacs:
I've written small-medium sized packages -- 400 to 2400 lines of elisp -- that
use generators and emacs-aio (async/await library built on generator.el) for
their async capabilities. I've regretted it each time: generators in their
current form in elisp are obfuscated, opaque and not introspectable -- you
can't debug/edebug generator calls. Backtraces are impossible to read because
of the continuation-passing macro code. Their memory overhead is large
compared to using simple callbacks. I'm not sure about the CPU overhead.
That said, the simplicity of emacs-aio promises is very appealing:
(defun aio-promise ()
  "Create a new promise object."
  (record 'aio-promise nil ()))

(defsubst aio-promise-p (object)
  (and (eq 'aio-promise (type-of object))
       (= 3 (length object))))

(defsubst aio-result (promise)
  (aref promise 1))
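For flavor, here's roughly what consuming those promises looks like with
aio-defun and aio-await (a from-memory sketch, not code from the post):

;; Sketch: an async function that awaits a non-blocking sleep, then continues.
(aio-defun my/delayed-greeting (name)
  (aio-await (aio-sleep 3))
  (message "Hello, %s" name))

;; Calling it returns a promise immediately; Emacs stays responsive meanwhile.
(my/delayed-greeting "world")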
Lichess is an awesome website, made even more awesome by
the fact that it is free and open source. Perhaps lesser known is that the
entire Lichess puzzle database is available for free download under the Creative
Commons CC0 license. Every puzzle that you normally find under
lichess.org/training is available for your
perusal.
This is a quick guide for pulling that CSV and seeding a SQLite database so you
can do something cool with it. You will need
zstd.
First, wget the file from the Lichess.org open database and save it into a
temporary directory. Run zstd to uncompress it into a CSV that we can read via Ruby.
wget https://database.lichess.org/lichess_db_puzzle.csv.zst -P tmp/
zstd -d tmp/lichess_db_puzzle.csv.zst
With the CSV pulled down and uncompressed, it's time to read it into the
application. I'm using Ruby on Rails, so I generate a database model like so:
bin/rails g model Puzzle \
  puzzle_id:string fen:string moves:string rating:integer \
  rating_deviation:integer popularity:integer nb_plays:integer \
  themes:string game_url:string opening_tags:string
Which creates the following migration:
class CreatePuzzles < ActiveRecord::Migration[7.1]
  def change
    create_table :puzzles do |t|
      t.string :puzzle_id
      t.string :fen
      t.string :moves
      t.integer :rating
      t.integer :rating_deviation
      t.integer :popularity
      t.integer :nb_plays
      t.string :themes
      t.string :game_url
      t.string :opening_tags

      t.timestamps
    end
  end
end
A separate seed script pulls items from the CSV and bulk-inserts them into
SQLite. I have the following in my db/seeds.rb, with a few omitted additions
that check whether or not the puzzles have already been migrated.
require "csv"

csv_path = Rails.root.join("tmp", "lichess_db_puzzle.csv")
raise "CSV not found" unless File.exist?(csv_path)

buffer = []
buffer_size = 500

flush = ->() do
  # Guard against an empty buffer: insert_all raises on an empty list.
  Puzzle.insert_all(buffer) unless buffer.empty?
  buffer.clear
end

CSV.foreach(csv_path, headers: true) do |row|
  buffer << {
    puzzle_id: row["PuzzleId"],
    fen: row["FEN"],
    moves: row["Moves"],
    rating: row["Rating"],
    rating_deviation: row["RatingDeviation"],
    popularity: row["Popularity"],
    nb_plays: row["NbPlays"],
    themes: row["Themes"],
    game_url: row["GameUrl"],
    opening_tags: row["OpeningTags"]
  }

  if buffer.count >= buffer_size
    flush.()
  end
end

flush.()
And with that you have the entire Lichess puzzle database available at your
fingertips. The whole process takes less than a minute. For example, counting
every puzzle rated below 1700:

Puzzle.where("rating < 1700").count
I've blogged before about why I really dislike apps like Notion for
taking quick notes since they're so slow to
open. The very act of opening the app to take said note often takes 10 or more
seconds, typically with a whole bunch of JavaScript-inflicted loading states and
blank screens. By the time I get to the note, I've already lost my train of
thought.
As it turns out, this pain point is a perfect candidate for the iOS Shortcuts
app. I can create an automated workflow that captures my text input instantly
but pushes to Notion in the background, allowing me to benefit from Notion's
database-like organization but without dealing with the pitiful app performance.
Here's my Shortcut:
[Screenshot of the Shortcut]
Super simple but it gets the job done.
Type predicates have been around for a while, but today I found a particularly
nice application. The
situation is this: I have an interface that has an optional field, where the
presence of that field means I need to create a new object on the server, and
the lack of the field means the object has already been created and I'm just
holding on to it for later. Here's what it looked like:
interface Thing {
  name: string
  blob?: File
}

const things: Thing[] = []

const uploadNewThings = (things: (Thing & { blob: File })[]) =>
  Promise.all(things.map((thing) => createThing(thing.name, thing.blob)))
The intersection type Thing & { blob: File } means that uploadNewThings only
accepts things that have the field blob. In other words, things that need to be
created on the server because they have blob content.
However, TypeScript struggles if you try to simply filter the list of things
before passing it into uploadNewThings:
uploadNewThings(things.filter((thing) => !!thing.blob))
The resulting error is this long stream of text:
Argument of type 'Thing[]' is not assignable to parameter of type '(Thing & { blob: File; })[]'.
  Type 'Thing' is not assignable to type 'Thing & { blob: File; }'.
    Type 'Thing' is not assignable to type '{ blob: File; }'.
      Types of property 'blob' are incompatible.
        Type 'File | undefined' is not assignable to type 'File'.
          Type 'undefined' is not assignable to type 'File'.
The tl;dr being that despite filtering things by thing => !!thing.blob,
TypeScript does not recognize that the return value is actually
Thing & { blob: File }.
Now you could just cast it:
things.filter((thing) => !!thing.blob) as (Thing & { blob: File })[]
But casting is bad! It's error-prone and doesn't really solve the problem that
TypeScript is hinting at. Instead, use a type predicate:
const hasBlob = (t: Thing): t is Thing & { blob: File } => !!t.blob
uploadNewThings(things.filter(hasBlob))
With the type predicate (t is Thing & ...) I can inform TypeScript that I do in
fact know what I'm doing, and that the call to filter results in a different
interface.
Most runners run not because they want to live longer, but because they want
to live life to the fullest. If you're going to while away the years, it's far
better to live them with clear goals and fully alive than in a fog, and I
believe running helps you do that. Exerting yourself to the fullest within
your individual limits: that's the essence of running, and a metaphor for
life—and for me, writing as well. - Haruki Murakami
What I traditionally would've used Rake tasks for has been replaced with
data-migrate, a little gem that
handles data migrations in the same way as Rails schema migrations. It's the
perfect way to automate data changes in production, offering a single pattern
for handling data backfills, seed scripts, and the like.
The pros are numerous:
- Data migrations are easily generated via CLI and are templated with up and down methods so folks think about rollbacks (see the sketch after this list).
- Just like with Rails schema migrations, there's a migration ID kept around that ensures data migrations are run in order. Old PRs will have merge conflicts.
- You can conditionally run data migrations alongside schema migrations with bin/rails db:migrate:with_data.
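To make the up and down template concrete, generating a migration with
bin/rails generate data_migration backfill_user_nicknames produces a file you
fill in roughly like this (hypothetical model and backfill, sketched from memory):

# db/data/20240801000000_backfill_user_nicknames.rb (hypothetical)
class BackfillUserNicknames < ActiveRecord::Migration[7.1]
  def up
    # Backfill in batches so the update doesn't lock the whole table.
    User.where(nickname: nil).in_batches do |batch|
      batch.update_all("nickname = name")
    end
  end

  def down
    # Rolling back a backfill is a conscious decision, not an afterthought.
    raise ActiveRecord::IrreversibleMigration
  end
end

Then bin/rails data:migrate (or db:migrate:with_data) runs it in timestamp
order, just like a schema migration.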
It's a really neat gem. I'll probably still rely on the good ol' Rake task for
my personal projects, but will doubtless keep data-migrate in the toolbox for
teams.
There's something super elegant about Writebook's use of concerns. I especially
like Book::Sluggable:
module Book::Sluggable
  extend ActiveSupport::Concern

  included do
    before_save :generate_slug, if: -> { slug.blank? }
  end

  def generate_slug
    self.slug = title.parameterize
  end
end
Here are a few reasons:
- Nesting concerns in a model folder is neat when that concern is an encapsulation of model-specific functionality: app/models/book/sluggable.rb (see the sketch after this list).
- Concerns don't have to be big. They do have to be single-purpose.
- Reminds me of a great article by Jorge Manrubia: Vanilla Rails is plenty. Down with service objects!
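For context, here's roughly how the concern gets pulled into the model; Ruby's
constant lookup resolves Sluggable to Book::Sluggable from inside the Book
class (a sketch, not Writebook's actual file):

# app/models/book.rb (sketch)
class Book < ApplicationRecord
  include Sluggable # resolves to Book::Sluggable

  # ...rest of the model
end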
On the inside cover of Kafka on the Shore, Murakami explains how his idea for
the book started with its title. This approach is the opposite of anything I've ever
written, though I recognize there's a notable difference between fiction and
technical writing. But what a powerful idea: a simple phrase shapes the entire
story.
I dug up this quote from an interview:
When I start to write, I don’t have any plan at all. I just wait for the story
to come. I don’t choose what kind of story it is or what’s going to happen. I
just wait.
I think that's pretty cool.