A plan for podcasts

25 July 2020

Addendum June 2023

This is now up on GitHub, written in Go: https://github.com/algrt-hm/gopodder

Previously I used bashpodder for podcasts. It is simple, effective software: it checks the RSS feeds and downloads any files it has not already downloaded and logged.

But it was written in 2008, and the internet and podcasting have both become more complex since then. In particular, it assumes URLs will be in the traditional, sensible format, an assumption that many RSS feeds violate these days.

I made a couple of tweaks to it to work around edge cases, but this is not a particularly satisfying solution. Further, many podcasts have good data in the RSS feed (episode and series names etc.) but poor metadata in the mp3 file itself, so a replacement could write improved metadata to the mp3 at the time of download.

An outline of what a replacement might do:

Parse the configuration file, which is a list of RSS URLs. Fetch the RSS feeds, parse them for the URLs of the podcast episode mp3s, and build a list of those URLs
For each URL in list of URLs

Hash the URL string with md5, e.g. 006a7e228032bdbf046e6a29638b7c1a
Check whether the matching file, e.g. 006a7e228032bdbf046e6a29638b7c1a.mp3, exists in the podcast download directory; if not, download the mp3 and save it using the hash as the filename
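
The steps so far could be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not the code in gopodder: it assumes each episode is carried in an RSS 2.0 <enclosure url="..."> element, and the names rssFeed, enclosureURLs, hashedName, needsDownload and the sample feed are all made up for the example. Fetching the feed and downloading the file are left out.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/xml"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// rssFeed mirrors only the part of an RSS 2.0 document needed here.
// Assumption: each <item> carries its audio file in an <enclosure url="...">.
type rssFeed struct {
	Items []struct {
		Enclosure struct {
			URL string `xml:"url,attr"`
		} `xml:"enclosure"`
	} `xml:"channel>item"`
}

// enclosureURLs returns the enclosure URL of every item that has one.
func enclosureURLs(rss []byte) ([]string, error) {
	var feed rssFeed
	if err := xml.Unmarshal(rss, &feed); err != nil {
		return nil, err
	}
	var urls []string
	for _, item := range feed.Items {
		if item.Enclosure.URL != "" {
			urls = append(urls, item.Enclosure.URL)
		}
	}
	return urls, nil
}

// hashedName returns the md5 hex digest of the URL string plus ".mp3".
func hashedName(url string) string {
	sum := md5.Sum([]byte(url))
	return hex.EncodeToString(sum[:]) + ".mp3"
}

// needsDownload reports whether the hashed filename is absent from dir.
func needsDownload(dir, url string) bool {
	_, err := os.Stat(filepath.Join(dir, hashedName(url)))
	return errors.Is(err, os.ErrNotExist)
}

// sampleFeed is an invented feed used only for this example.
const sampleFeed = `<rss version="2.0"><channel><title>Example</title>
<item><title>Episode 1</title><enclosure url="https://example.com/ep1.mp3?src=feed" type="audio/mpeg"/></item>
<item><title>Episode 2</title><enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>`

func main() {
	urls, err := enclosureURLs([]byte(sampleFeed))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, u := range urls {
		// Print the hashed filename, whether it still needs downloading, and the URL.
		fmt.Println(hashedName(u), needsDownload(os.TempDir(), u), u)
	}
}
```

Because the filename depends only on the URL string, odd-looking URLs (query strings, redirects through tracking hosts) never have to be turned into a sensible filename.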

For each downloaded mp3

Hash the file and retain this hash
Extract the metadata relating to the mp3 from the RSS feed and write it to the mp3
Do some logging. It would be nice to be able to do aggregate stats or to look up a filename to see where it was originally from etc, so while we’re at it let’s log:
- The URL of the RSS feed
- The URL of the podcast episode
- The hash of the URL
- The hash of the file before the addition of metadata (retained from the first step); and
- A fresh hash of the file with updated metadata
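
A rough Go sketch of the per-file steps and the log record, again under assumptions of my own: the file hash is md5 (the outline does not say which hash to use), the log is one JSON object per episode, and LogEntry, hashFile, processEpisode and runExample are hypothetical names. Writing real ID3 tags needs a tagging library, so the tag step is passed in as a function and the example uses a stand-in that just appends bytes.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// LogEntry holds the five logged fields; the struct and JSON keys are hypothetical.
type LogEntry struct {
	FeedURL        string `json:"feed_url"`
	EpisodeURL     string `json:"episode_url"`
	URLHash        string `json:"url_hash"`
	FileHashBefore string `json:"file_hash_before"`
	FileHashAfter  string `json:"file_hash_after"`
}

// hashFile streams the file at path through md5 and returns the hex digest.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// processEpisode hashes the file, lets tag rewrite its metadata, hashes it
// again, and returns the record to log.
func processEpisode(feedURL, episodeURL, path string, tag func(string) error) (LogEntry, error) {
	before, err := hashFile(path)
	if err != nil {
		return LogEntry{}, err
	}
	if err := tag(path); err != nil {
		return LogEntry{}, err
	}
	after, err := hashFile(path)
	if err != nil {
		return LogEntry{}, err
	}
	urlSum := md5.Sum([]byte(episodeURL))
	return LogEntry{
		FeedURL:        feedURL,
		EpisodeURL:     episodeURL,
		URLHash:        hex.EncodeToString(urlSum[:]),
		FileHashBefore: before,
		FileHashAfter:  after,
	}, nil
}

// runExample runs the steps on a temporary file containing "hello".
func runExample() (LogEntry, error) {
	f, err := os.CreateTemp("", "episode-*.mp3")
	if err != nil {
		return LogEntry{}, err
	}
	defer os.Remove(f.Name())
	_, werr := f.WriteString("hello")
	if cerr := f.Close(); werr == nil {
		werr = cerr
	}
	if werr != nil {
		return LogEntry{}, werr
	}
	// Stand-in for real tagging: append bytes so the file content changes.
	fakeTag := func(path string) error {
		out, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0o644)
		if err != nil {
			return err
		}
		defer out.Close()
		_, err = out.WriteString("placeholder-metadata")
		return err
	}
	return processEpisode("https://example.com/feed.xml", "https://example.com/ep1.mp3", f.Name(), fakeTag)
}

func main() {
	entry, err := runExample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	line, _ := json.Marshal(entry) // one JSON object per episode
	fmt.Println(string(line))
}
```

Keeping both file hashes means a file can be traced back to its feed whether it is found in its as-downloaded state or after the metadata has been rewritten.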