Addendum September 2022
Since I wrote this, things have changed considerably and it looks like I will have to completely eat my words on this one. It seems like Catalina was a blip, and the move to Apple silicon (the M1 chip, etc.) has been smooth, both in terms of MacOS itself and third-party libraries. At least at the time of writing, on a 2020 M1 MacBook Air running MacOS Monterey (12.1), using MacOS for development is great, thanks in no small part to the amazing efforts of the maintainers of Homebrew.
The rise of Google Colab and other offerings also means that hosted notebooks are now an attractive option too. Wrong on both counts …
Mac laptops are standard issue for technologists, for the understandable reasons that they offer very impressive OS/hardware coupling and build quality that is second to none. For development, though, there is a downside: a lot can break when you upgrade to the latest MacOS version. If you’re working with a cloud platform like AWS/GCP/Azure this is all fine, because you are only controlling things remotely, but if you’re trying to do things locally it is a pain.
In part this represents the divergence between Apple’s current and historic strategic priorities. OSX rose like a beautiful phoenix from the ashes of NeXT, where Unix with extra knobs on was married to exquisitely expensive and powerful hardware. When OSX (as MacOS used to be called) came out, it was genuinely revolutionary, offering a fantastic GUI look and feel with real BSD Unix underpinnings. An operating system which appealed to technical/developer users as well as designers and creatives was at that time a strategic advantage to Apple. And of course the open-source world has helped Apple a lot; for example, the package manager Homebrew makes installing open-source software on MacOS fantastically straightforward.
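To give a flavour of that straightforwardness (the package names here are arbitrary examples, not recommendations), a typical Homebrew session looks something like this:

    # Install common development tools via Homebrew (example packages)
    brew install git postgresql python

    # Run PostgreSQL as a background service managed by Homebrew
    brew services start postgresql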
Over time though, we have come to the current situation, which is that Apple has other products to focus on: iPhones and iPads, Apple Music, Apple TV, not to mention the App Store and its attendant economies of scale. As a result the priorities of power users and developers are less well catered for, and the orientation seems to be more and more towards the median consumer. A lot of ink has been spilt on this on Hacker News and elsewhere.[1]
For general data/ML or related development work it probably makes sense to have another machine to SSH into and/or serve up Jupyter Notebooks, run PostgreSQL, etc., in order to keep the development environment separate and isolated. Small, cheap and quiet servers mass-produced for the small-business market, like the HP Microserver line, are perfect for this as long as a decent amount of RAM can be shoe-horned in.[2]
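As a rough sketch of what this looks like in practice (the user and hostname homeserver are invented placeholders), you might start Jupyter on the server and reach it from the Mac through an SSH tunnel:

    # On the server: start Jupyter without opening a browser, bound to localhost
    jupyter notebook --no-browser --port 8888

    # On the Mac: forward local port 8888 to the server's port 8888
    ssh -L 8888:localhost:8888 user@homeserver

    # Now http://localhost:8888 on the Mac reaches the notebook on the server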
Another alternative is to just fire up an EC2 instance on AWS and use that, but the beauty of having the machine on the LAN is that it sits behind your router’s NAT, so less care needs to be put into securing it. Further, in most environments, moving large amounts of data over a wired LAN is still faster than moving them over the internet.[3]
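For example (the hostname and paths are again invented placeholders), shifting a dataset onto the LAN box is a single rsync invocation, and a wired gigabit link will typically sustain on the order of 100 MB/s:

    # Copy a local dataset directory to the server over the LAN
    rsync -av --progress ~/datasets/ user@homeserver:/srv/datasets/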
Still a further alternative is to develop inside e.g. Docker containers on the MacOS machine itself, which requires less hardware at the cost of additional configuration complexity, and potentially lower performance as well.
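As a sketch of this approach (the image tag and password are arbitrary choices of mine), a disposable PostgreSQL instance under Docker looks like:

    # Run PostgreSQL in a throwaway container, exposed on the usual port
    docker run -d --name dev-postgres \
      -e POSTGRES_PASSWORD=devpassword \
      -p 5432:5432 \
      postgres:14

    # Connect with any local client, e.g.:
    psql -h localhost -U postgres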
1. See e.g. Greg Hurrell, ‘Grieving for Apple’, or Riccardo Mori, ‘Mac OS Catalina: more trouble than it’s worth (Part 2)’.↩
2. As for the choice of OS, I would say that Ubuntu is probably the obvious choice here. For local work with neural networks and GPU use, Nvidia hardware and Ubuntu are dominant, so if you ever intend to do this it makes sense to have some familiarity with it. Ubuntu is also one of the most widely used Linux distributions.↩
3. NFS makes this quite easy, as does manually editing /etc/hosts on a small number of machines in order to avoid setting up proper DNS on the LAN; a sketch of both follows below.↩
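As a minimal sketch of that footnote (the hostname, IP addresses and paths are invented for illustration), the relevant entries look something like this:

    # /etc/hosts on each client: map a friendly name to the server's LAN IP
    192.168.1.50    homeserver

    # /etc/exports on the server: share a directory read-only with the LAN subnet
    /srv/datasets   192.168.1.0/24(ro,sync,no_subtree_check)

    # On the client, mount the share (after sudo exportfs -ra on the server)
    sudo mount -t nfs homeserver:/srv/datasets /mnt/datasets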