I really wanted to get this cache turned into an RSS feed so I can listen to episodes alongside my podcasts. Kevan's Fourble service can do this pretty easily, and in fact it already has, but what I'd really really like is an RSS feed that incorporates the information about broadcast dates and shaggy dog stories found in this particular item's carefully written description. That will dramatically improve the usability of the "podcast" and allow me to listen to the episodes in rough chronological order, rather than alphabetically according to the first vocabulary word lobbed at the panelists.
This is, in fact, a job for The Syndication Automat, a project I created in 2004, the semi-early days of RSS. Back then it was sometimes necessary to employ vigilante justice to make RSS feeds for websites that didn't have them. This was actually the original use case for Beautiful Soup!
Of course, hard times soon struck the Automat as every website got its own RSS feed, RSS feeds themselves were ditched in favor of Twitter and Facebook, and then Twitter and Facebook melted down, leaving us with nothing. (I'm extrapolating a little here.) 2009 was the last time any of the Automat's old feeds were updated. But podcasts still stand, the cockroaches of syndication, so it makes perfect sense to bring back the Automat one more time to host The Doubly-Unofficial, Partially Chronological "My Word!" Podcast Feed. Painstakingly hand-crafted by a script I painstakingly hand-crafted to deal with tons of edge cases like "two shows that use the same vocabulary word" and "shows where the filename doesn't precisely match the vocabulary word" and "shows where the general era of the show is known but not the exact broadcast date". I took care of all that stuff; all you have to do is listen.
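For a sense of what the "general era is known but not the exact broadcast date" case might involve, here is a minimal sketch of a partial chronological sort. It is not the script behind the feed: the `Episode` record, its field names, and the choice to park an era-only show at mid-year are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Episode:
    # Hypothetical record; these field names are illustrative only.
    filename: str
    broadcast: Optional[date] = None  # exact broadcast date, if known
    era_year: Optional[int] = None    # approximate year, if only the era is known


def sort_key(ep):
    """Sort dated shows first (by date), undated shows last (by filename)."""
    if ep.broadcast is not None:
        return (0, ep.broadcast, ep.filename)
    if ep.era_year is not None:
        # Assumption: treat an era-only show as if it aired mid-year.
        return (0, date(ep.era_year, 7, 1), ep.filename)
    # Nothing known about the date: fall back to alphabetical order at the end.
    return (1, date.max, ep.filename)


def chronological(episodes):
    """Return the episodes in rough chronological order."""
    return sorted(episodes, key=sort_key)
```

Two shows with the same date (or the same guessed year) fall back to filename order, so the result is stable from one run to the next.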
While producing this post I discovered that not only is there another, smaller, differently organized collection on the Internet Archive, but there's a significantly larger (but less well described) archive on RadioEchoes, which also has an even bigger archive of My Word!'s inevitable but lesser companion, My Music!.
Thu Dec 15 2022 18:31 My RSS!:
Since listening to KUSC in college I've been a fan of the old BBC radio program(me) My Word!, an ur-quiz show with a focus on chin-stroking erudition, shameless bluffing when erudition fails, and cornball shaggy dog stories. About ten years ago my fandom took a big hit when the BBC stopped pouring decades-old My Word! reruns down whatever transcontinental pipe eventually got it broadcast on American radio stations' streaming websites. But recently I discovered a large cache of episodes uploaded to the Internet Archive in 2020, including a bunch of episodes I'd never heard. Jackpot!
If you just want to make a podcast out of the MP3 files in an Internet Archive item, and not do any other processing, you can use my very tidy, edge-case-free Python script, which depends on the modules internetarchive, feedgen, and pytz.
from datetime import datetime
from feedgen.feed import FeedGenerator
from internetarchive import get_item
import pytz
import sys


class IACollectionFeed(object):
    """Builds a podcast RSS feed from the MP3 files in an Internet Archive item."""

    def __init__(self, ia_item, destination_url):
        self.item = self.fetch_item(ia_item)
        self.feed = FeedGenerator()
        self.feed.link(href=destination_url)
        self.feed.description(self.item.metadata['description'])
        self.feed.title(self.item.metadata['title'])
        for file in self.item.get_files():
            # Skip everything that isn't an MP3 (format 'VBR MP3').
            if file.format != 'VBR MP3':
                continue
            self.add_entry(file)

    def fetch_item(self, ia_item):
        return get_item(ia_item)

    def add_entry(self, file):
        entry = self.feed.add_entry(order='append')
        entry.id(file.url)
        entry.title(file.metadata['name'])
        # mtime is a Unix timestamp. Convert it directly to an aware UTC
        # datetime so the result doesn't depend on the local timezone.
        mtime = datetime.fromtimestamp(int(file.metadata['mtime']), tz=pytz.utc)
        entry.updated(mtime)
        entry.enclosure(file.url, str(file.size), "audio/mpeg")
        return entry

    def __str__(self):
        return self.feed.rss_str(pretty=True).decode("utf8")


if __name__ == '__main__':
    # Usage: python roughdraft.py <Internet Archive item ID> <feed URL>
    print(IACollectionFeed(sys.argv[1], sys.argv[2]))

To generate a fast and cheap version of my DUPCMW!PF, I'd invoke the script with this command line:

$ python roughdraft.py bbcmyword https://www.crummy.com/automat/feeds/myword.xml
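The script prints the feed to standard output, so a quick sanity check is to save that output and confirm every episode came through with an enclosure. Here is a minimal sketch using only the standard library. The `list_enclosures` helper is a name invented for this example, and it assumes ordinary RSS 2.0 output with un-namespaced `item`, `title`, and `enclosure` elements.

```python
import xml.etree.ElementTree as ET


def list_enclosures(rss_text):
    """Return (title, enclosure URL) pairs for each item in an RSS 2.0 document."""
    # Parse from bytes so an XML declaration that names an encoding is
    # handled the same way whether we were given str or bytes.
    if isinstance(rss_text, str):
        rss_text = rss_text.encode("utf-8")
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        if enclosure is not None:
            results.append((title, enclosure.get("url")))
    return results
```

If you redirected the script's output to a file (say `myword.xml`, a filename chosen here for illustration), then `list_enclosures(open("myword.xml", "rb").read())` should give one pair per MP3 in the item.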