Caching a function with unhashable arguments

2 April 2022

Caching is very useful when working in a Jupyter notebook: you can call the same function repeatedly and, after the first call, get near-instantaneous results.

Consider a function that queries the API of some third-party data provider:

  • API calls may be logged, in which case you may wish to minimise them to comply with a fair use policy, or simply out of politeness
  • Even trivial API calls may take a moment or so to return, which can add up over multiple calls

The answer here is to cache the function with functools.cache (Python 3.9+) or its older sibling functools.lru_cache. functools is part of the standard library and provides a number of helpful decorators, so its documentation is well worth a look for anyone less familiar with the module.
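A minimal sketch of the basic pattern, assuming a hypothetical fetch_quote function that stands in for a slow API call. It only counts how often its body runs instead of making a real request, which makes the cache hits visible:

```python
import functools

call_count = 0


@functools.cache  # Python 3.9+; on older versions use functools.lru_cache(maxsize=None)
def fetch_quote(ticker):
    """Hypothetical stand-in for a slow third-party API call."""
    global call_count
    call_count += 1  # count how often the body actually runs
    return f"quote for {ticker}"  # placeholder instead of a real HTTP request


fetch_quote("ABC")  # cache miss: the body runs
fetch_quote("ABC")  # cache hit: the stored result is returned immediately
fetch_quote("XYZ")  # different argument, so another miss

print(call_count)                # 2
print(fetch_quote.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
```

cache_info() is available on both functools.cache and functools.lru_cache, and is a handy way to check in a notebook that repeated calls really are being served from the cache.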

This does not always work out of the box, though. Try calling a function decorated with functools.lru_cache with a dict as its argument, for instance:

import functools

@functools.lru_cache(maxsize=None)
def cant_cache_me(x):
    return x

cant_cache_me(dict(x='y'))
# TypeError: unhashable type: 'dict'

This fails because lru_cache stores results keyed on the function's arguments, so every argument must be hashable, and a dict is not hashable because it is mutable. We can get around this by passing an immutable type in its place. frozendict is just a pip install frozendict away, and for a list whose order does not matter we can use the built-in frozenset (bearing in mind that it also discards duplicates).
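A quick standard-library illustration of the error and of a few hashable substitutes. Using frozenset(d.items()) as a stand-in for frozendict is a substitution rather than the approach used in this post, and it only works when the dict's values are themselves hashable; tuple is included as the usual choice when list order does matter:

```python
d = {"key": "value"}

try:
    hash(d)
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'

items = ["b", "a", "a"]
frozen_list = frozenset(items)       # order and the duplicate "a" are discarded
ordered = tuple(items)               # keeps both order and duplicates
frozen_dict = frozenset(d.items())   # stdlib stand-in for frozendict; values must be hashable

# All three immutable versions hash without a TypeError, so all can be cache keys.
for value in (frozen_list, ordered, frozen_dict):
    hash(value)

# The mutable originals can be rebuilt wherever the function needs them.
print(dict(frozen_dict) == d)  # True
print(list(ordered) == items)  # True
```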

And if the function we want to cache insists on being given lists and dicts, we can freeze the arguments before they reach the cache and coerce them back to those types inside it:

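import functools
from frozendict import frozendict  # third-party: pip install frozendict


def uncachable(*args, **kwargs):
    # The function we actually want to cache; it expects plain lists and dicts.
    print("In uncachable()")
    print(type(args), args)
    print(type(kwargs), kwargs)

    return "cache_me"


@functools.lru_cache(maxsize=None)
def cached(*args, **kwargs):
    # Only runs on a cache miss; every argument here is already hashable.
    print("In cached()")

    def unfreezeit(x):
        # Convert frozen types back to the mutable types uncachable() expects.
        # A frozenset has no order, so the rebuilt list's order is arbitrary.
        if isinstance(x, frozenset):
            return list(x)

        if isinstance(x, frozendict):
            return dict(x)

        return x

    print(type(args), args)
    print(type(kwargs), kwargs)

    args = tuple(unfreezeit(x) for x in args)
    kwargs = {k: unfreezeit(v) for k, v in kwargs.items()}
    return uncachable(*args, **kwargs)


def test(*args, **kwargs):
    # Public entry point: freeze unhashable arguments, then go through the cache.
    print("In test()")

    def freezeit(x):
        # Lists become frozensets (order and duplicates are lost),
        # dicts become frozendicts.
        if isinstance(x, list):
            return frozenset(x)

        if isinstance(x, dict):
            return frozendict(x)

        return x

    print(type(args), args)
    print(type(kwargs), kwargs)

    args = tuple(freezeit(x) for x in args)
    kwargs = {k: freezeit(v) for k, v in kwargs.items()}
    return cached(*args, **kwargs)


test(['arg_a'], arg_b={'key': 'value'}, arg_c='x')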

The first time this is run the cache is empty, so all three functions execute and print the output below before 'cache_me' is returned.

In test()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
In cached()
<class 'tuple'> (frozenset({'arg_a'}),)
<class 'dict'> {'arg_b': frozendict.frozendict({'key': 'value'}), 'arg_c': 'x'}
In uncachable()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}

On subsequent calls with the same arguments, however, the cache works as expected: the body of cached() no longer runs, so 'cache_me' is returned straight from the cache and only the output from test() appears.

In test()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
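The freeze/unfreeze pair can also be folded into a reusable decorator. The sketch below is one possible approach, not the one used in this post: freezing_cache, _Frozen, _freeze and _unfreeze are hypothetical names, and instead of frozenset/frozendict it recursively wraps lists and dicts in a small frozen dataclass. That way list order and duplicates survive the round trip, and no third-party package is needed.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class _Frozen:
    """Hypothetical hashable stand-in for a list or dict."""
    kind: str     # "list" or "dict"
    items: tuple  # list elements, or (key, value) pairs for a dict


def _freeze(x):
    # Recursively replace lists and dicts with hashable _Frozen wrappers.
    if isinstance(x, list):
        return _Frozen("list", tuple(_freeze(v) for v in x))
    if isinstance(x, dict):
        return _Frozen("dict", tuple((k, _freeze(v)) for k, v in x.items()))
    if type(x) is tuple:
        return tuple(_freeze(v) for v in x)
    return x


def _unfreeze(x):
    # Reverse _freeze, rebuilding fresh lists and dicts.
    if isinstance(x, _Frozen):
        if x.kind == "list":
            return [_unfreeze(v) for v in x.items]
        return {k: _unfreeze(v) for k, v in x.items}
    if type(x) is tuple:
        return tuple(_unfreeze(v) for v in x)
    return x


def freezing_cache(func):
    """Like functools.cache, but also accepts (nested) list and dict arguments."""

    @functools.lru_cache(maxsize=None)
    def cached(*args, **kwargs):
        # Only runs on a cache miss: thaw the arguments and call the real function.
        return func(*_unfreeze(args), **{k: _unfreeze(v) for k, v in kwargs.items()})

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Freeze every argument so lru_cache can hash it.
        return cached(*_freeze(args), **{k: _freeze(v) for k, v in kwargs.items()})

    wrapper.cache_info = cached.cache_info
    wrapper.cache_clear = cached.cache_clear
    return wrapper


calls = []


@freezing_cache
def total(values, weights=None):
    calls.append(values)  # record each real execution
    weights = weights or {}
    return sum(v * weights.get(i, 1) for i, v in enumerate(values))


total([1, 2, 3], weights={0: 10})  # cache miss: 1*10 + 2 + 3 = 15
total([1, 2, 3], weights={0: 10})  # cache hit: the body does not run again
```

Because each wrapper keeps a "list" or "dict" tag alongside its contents, a list argument and a tuple with the same contents get separate cache entries. As with any lru_cache, the cached return value is shared between calls, so mutating it in place changes what later calls receive.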