A plan for podcasts

25 July 2020

Addendum June 2023

This is now up on GitHub, written in Go: https://github.com/algrt-hm/gopodder

Previously I used bashpodder for podcasts. It is simple, effective software: it checks the RSS feeds and downloads any files it has not already downloaded and logged.

But it was written in 2008 and the world of the internet and podcasting has become more complex since then. In particular it assumes URLs will be in the traditional sensible format, which is an assumption violated by many RSS feeds these days.

I made a couple of tweaks to it to work around edge cases, but this is not a particularly satisfying solution. Further, many podcasts have good data in the RSS feed (episode and series names etc.) but poor metadata in the mp3 file itself, so improved metadata could be written to the mp3 at the time of download.

An outline of what a replacement might do:

  1. Parse the configuration file, which is a list of RSS URLs. Fetch each RSS feed, parse it for the URLs of the mp3s of the podcast episodes, and build a list of those URLs
  2. For each URL in the list of URLs
    • Hash the URL string with md5 e.g. 006a7e228032bdbf046e6a29638b7c1a
    • See if e.g. 006a7e228032bdbf046e6a29638b7c1a.mp3 exists in the podcast download directory; if not, download the mp3 and save it using the hash as the filename
  3. For each downloaded mp3
    • Hash the file and retain this hash
    • Extract the metadata from the RSS feed which relates to the mp3 and write this metadata to the mp3
    • Do some logging. It would be nice to be able to do aggregate stats or to look up a filename to see where it was originally from etc, so while we’re at it let’s log:
      • The URL of the RSS feed
      • The URL of the podcast episode
      • The hash of the URL
      • The hash of the file before the addition of metadata (retained from the first bullet of this step); and
      • A fresh hash of the file with updated metadata
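
The core of the outline above might look roughly like the following in Go. This is only a sketch under some assumptions and is not taken from gopodder: the names rss, enclosureURLs, filenameFor, needsDownload and hashFile are made up for illustration, episode URLs are assumed to live in each item's enclosure element, and SHA-256 is an arbitrary choice for the file hash since the outline does not name one. Downloading, metadata writing and logging are left out.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/hex"
	"encoding/xml"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// rss mirrors only the parts of a feed this sketch needs (hypothetical type).
type rss struct {
	Items []struct {
		Title     string `xml:"title"`
		Enclosure struct {
			URL string `xml:"url,attr"`
		} `xml:"enclosure"`
	} `xml:"channel>item"`
}

// enclosureURLs returns the enclosure URL of every item that has one.
func enclosureURLs(feed []byte) ([]string, error) {
	var doc rss
	if err := xml.Unmarshal(feed, &doc); err != nil {
		return nil, err
	}
	var urls []string
	for _, item := range doc.Items {
		if item.Enclosure.URL != "" {
			urls = append(urls, item.Enclosure.URL)
		}
	}
	return urls, nil
}

// filenameFor maps an episode URL to "<md5 of the URL string>.mp3".
func filenameFor(url string) string {
	sum := md5.Sum([]byte(url))
	return hex.EncodeToString(sum[:]) + ".mp3"
}

// needsDownload reports whether the hashed filename is absent from dir.
// Any other stat error (e.g. permissions) is treated as "do not download".
func needsDownload(dir, url string) bool {
	_, err := os.Stat(filepath.Join(dir, filenameFor(url)))
	return os.IsNotExist(err)
}

// hashFile returns the hex SHA-256 of a file's contents.
// SHA-256 is an assumption; the outline does not say which hash to use.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// Inline stand-in for a fetched feed; example.com is a placeholder.
	feed := []byte(`<rss><channel><item><title>Episode 1</title>
<enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/></item></channel></rss>`)
	urls, err := enclosureURLs(feed)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	dir := os.TempDir() // stand-in for the podcast download directory
	for _, u := range urls {
		fmt.Println(u, "->", filenameFor(u), "needs download:", needsDownload(dir, u))
	}
}
```

Hashing the URL rather than using its last path segment means the filename never depends on how the feed chooses to format its URLs. Calling hashFile once before and once after writing metadata would give the two file hashes mentioned in the logging list.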