Sometime in the last ten years, while you were mourning the loss of Google Reader, we entered the golden age of content syndication. Our social media overlords hit the syndication Comstock Lode. For all their dystopic visioneering, I doubt the feed-accelerationists at Microsoft and Netscape in the mid ’90s foresaw these particular macroeconomics.
Arguably, though, we’re also in a golden age of nondystopic author-managed syndication. Free and nearly-free tools for hosting static sites are only outnumbered by static site generators; new ones are released every week. These tools, like blogosphere-era blogging platforms, can generate feeds as side effects of the routine publishing activity of their users; many do so by default. Even if it’s only a feed of content previews (to draw users onto the publisher’s site), each feed is a contribution to the digital commons.1
Syndicated feeds — for which RSS, Atom, and JSON Feed are specifications — are essentially different from the feeds turning social media users into blue-app-anxiety foie gras. Rather than an algorithmically ranked and collated series of texts from a variety of sources, a syndicated feed just lists items from a single source; the categorizing, collating, and display of those items are left up to the feed’s consumers. This has accessibility upsides, makes feeds easy to process programmatically, and provides a neat interface for users waiting on sparse updates (e.g. a blog that only updates once in a blue moon).
Providing a feed might mean content loses ad impressions to feed readers, but feeds generally align the interests of author-publishers who want their work read with the folks doing the reading.
The social challenge: maintaining and checking a feed reader will reward users only if their favorite sources of content provide feeds to be followed. For those sources of content, maintaining a feed (trickiest during site migrations!) is only worthwhile if it reaches readers who would not otherwise follow their work.
An introductory note on feed formats — RSS has the longest history and is the most widely known, but its XML specification is pretty deeply janky. I would not recommend writing code for working with RSS feeds. Atom, RSS’s successor in the XML feed tradition, is a strict improvement. Most feed readers support both.
JSON Feed is a relative newcomer, introduced by the authors of NetNewsWire and Micro.blog. It has less client support than Atom/RSS, but it’s a sweet format to tinker with. I find JSON easier to read than XML, and my languages of choice these days (Go, Python, TypeScript) have much nicer support for parsing and writing JSON objects than for XML (even with Python’s feedparser).
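Here’s how little it takes: a complete JSON Feed, built as a plain Python dict and serialized with the standard library. The URLs and content are placeholders; per the 1.1 spec, only "version", "title", and each item’s "id" are required.

```python
import json

# A minimal JSON Feed 1.1 document as a plain dict. Only "version",
# "title", and each item's "id" are strictly required by the spec.
feed = {
    "version": "https://jsonfeed.org/version/1.1",
    "title": "Example Blog",
    "home_page_url": "https://example.com/",
    "feed_url": "https://example.com/feed.json",
    "items": [
        {
            "id": "https://example.com/posts/hello",
            "url": "https://example.com/posts/hello",
            "content_text": "Hello, world.",
            "date_published": "2021-01-01T00:00:00Z",
        }
    ],
}

print(json.dumps(feed, indent=2))
```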
JSON feeds make syndication so simple that I’ve written a cluster of interrelated tools for working with them. Here’s a narrative breakdown of how they came to be and how I use them together.
Habitually collecting feeds makes one very aware of how many sites don’t (but should!) have them; how many have feeds but don’t prominently list links to them; and how many publications offer central aggregate feeds but not feeds broken down by category or author. I’ve built myself a few tools to help with this.
feedscan is a bash utility for discovering feeds by checking the routes that commonly host them: /feed, /atom.xml, and so on. If I find a sweet blog at lukasschwab.me/blog, I try feedscan https://lukasschwab.me/blog before digging for a link on the site itself. It’s totally disconnected from the other projects discussed here, but it has saved me a lot of frantic searches.
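The same idea fits in a few lines of Python. This is a sketch, not a port: the route list below is illustrative, and feedscan’s actual set is longer.

```python
import urllib.parse
import urllib.request

# Routes that commonly host feeds. Illustrative, not exhaustive.
COMMON_ROUTES = ["feed", "feed.json", "feed.xml", "atom.xml", "rss.xml", "index.xml"]

def scan(base_url: str) -> list[str]:
    """Return candidate feed URLs under base_url that respond with HTTP 200."""
    found = []
    for route in COMMON_ROUTES:
        candidate = urllib.parse.urljoin(base_url.rstrip("/") + "/", route)
        try:
            with urllib.request.urlopen(candidate, timeout=5) as resp:
                if resp.status == 200:
                    found.append(candidate)
        except OSError:
            continue  # 404, timeout, DNS failure: no feed at this route.
    return found

if __name__ == "__main__":
    print("\n".join(scan("https://lukasschwab.me/blog")))
```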
jsonfeed is a JSON feed parser and constructor package written in Python, the backbone of most of my other JSON feed tools. I wrote a Go equivalent, go-jsonfeed, but haven’t used it much. This very blog is generated with a fork of pandoc-blog, which generates a JSON feed using the jsonfeed package I authored.
I discovered a little running project for myself in building and hosting public feeds for sites that don’t offer them. I got my start with arxiv-feeds, which converts Atom to JSON using jsonfeed, but it’s a relatively boring wrapper. Blogs and news sites are more fun because they involve scraping feed items from the sites on demand. I wrote separate Python scraper/generator apps for a couple of sites, then realized those generators shared a certain procedural structure:
1. Fetch HTML from the source site.
2. Parse the HTML.
3. Extract feed items from the parsed page (the site-specific part).
4. Construct and serve a JSON feed.

Steps 1, 2, and 4 were essentially shared, so I factored them out into jsonfeed-wrapper, which takes the site-specific HTML-to-feed transform and wraps it with the standard fetching and feed-serving logic. I originally designed it for use with Google App Engine, but last weekend I rewrote it to expose a Google Cloud Function target.2 Cloud Functions save me a couple bucks a month.
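In sketch form, the pattern looks something like this. It illustrates the shape of the idea, not jsonfeed-wrapper’s actual API; make_handler and Transform are names invented for this sketch.

```python
import json
import urllib.request
from typing import Callable

# A site-specific transform turns raw HTML into a JSON Feed dict.
# (Illustrative sketch; not jsonfeed-wrapper's real interface.)
Transform = Callable[[str], dict]

def make_handler(source_url: str, transform: Transform):
    """Wrap a site-specific transform in the shared fetch/serve plumbing."""
    def handler(request):  # Cloud Functions hand Python targets a Flask-style request.
        with urllib.request.urlopen(source_url) as resp:
            html = resp.read().decode("utf-8")
        feed = transform(html)  # step 3: the only site-specific logic
        # Flask-style (body, status, headers) response tuple.
        return json.dumps(feed), 200, {"Content-Type": "application/feed+json"}
    return handler
```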
I generate and host feeds for It’s Nice That, Bandcamp artists, The Baffler, and Atlas of Places. Generating feeds from scraped HTML is somewhat brittle, but these have been reliable enough for the last few months. Adding a new site to the list takes about an hour of filling out a jsonfeed-wrapper template; shortening that time is the jsonfeed-wrapper project’s north star. Everything deserves a feed.
The next frontier: feed filters with CEL. cel-go works neatly with raw JSON, but the JSON Feed schema is well defined — why not create a CEL environment with types and macros for filtering feeds?
I have a Cloud Function that does nothing but parse Bruce Schneier’s RSS feed, filter out the feed items involving squid (Bruce’s hobby outside of security), and re-host the feed. There’s no reason this filtered feed should be re-hosted on its own when it could instead compile a CEL expression it receives from a client:
!item.tags.exists_one(t, t == "squid")
…and just return items where that expression returns true. User-defined CEL expressions are non-Turing-complete and safe to execute, so I can use them in lieu of parsing and documenting some feed-specific filter API. Different requests, passing different CEL expressions, can fetch differently filtered feeds from the same endpoint.
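A Python sketch makes the flow concrete. evaluate_cel below is a hypothetical stand-in; a real endpoint would compile the client’s expression once with a CEL implementation and evaluate it against each item.

```python
import json
import urllib.request

def evaluate_cel(expression: str, item: dict) -> bool:
    """Hypothetical hook: a real implementation would compile `expression`
    once with a CEL library and evaluate it per item."""
    raise NotImplementedError("plug in a CEL implementation here")

def filtered_feed(feed_url: str, expression: str) -> dict:
    # Fetch the upstream feed, then keep only the items the client's
    # CEL expression selects. Different expressions, same endpoint.
    with urllib.request.urlopen(feed_url) as resp:
        feed = json.load(resp)
    feed["items"] = [i for i in feed["items"] if evaluate_cel(expression, i)]
    return feed
```

A client would hit something like /feed?filter=!item.tags.exists_one(t, t == "squid") (URL-encoded) and get back the squidless feed.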
I will probably never convince anyone to host a feed that behaves this way, but that’s the neat thing about syndication: I can mirror or aggregate other feeds in a feed of my own that provides the interface I want. No need to ask anyone else to implement anything.