It’s very useful to cache data when working in a Jupyter notebook, so that you can repeatedly call a function and get near-instantaneous results.
Consider the case where you have a function which is used to query an API from some third-party data provider:
- API calls may be logged, in which case you may wish to minimise them in order to conform to a fair use policy, or simply out of politeness
- Even trivial API calls may take a moment or so to return, which can add up over multiple calls
The answer here is to cache the function with the help of functools.cache or its older sibling functools.lru_cache. functools provides a number of helpful decorators and is part of the standard library, so its documentation is well worth a quick look for those less familiar with the module.
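As a minimal sketch of what this looks like in practice (the function name and endpoint URL here are made up for illustration), caching such a query function is a one-line change:

import functools
import urllib.request

@functools.cache  # Python 3.9+; on older versions use functools.lru_cache(maxsize=None)
def query_api(endpoint):
    # Hypothetical third-party endpoint; the URL is a placeholder.
    with urllib.request.urlopen(f"https://api.example.com/{endpoint}") as response:
        return response.read()

# The first call hits the network; repeated calls with the same endpoint
# return the cached result near-instantaneously.
# query_api("prices/latest")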
There are some cases where this will not work out of the box, though: for instance, if you call a function decorated with functools.lru_cache and pass it a dict as an argument …
import functools

@functools.lru_cache(maxsize=None)
def cant_cache_me(x):
    return x

cant_cache_me(dict(x='y'))

TypeError: unhashable type: 'dict'

The reason a dict, for example, is not hashable is that it is mutable. We can of course get around this by using an immutable type in its place. A frozendict is a quick pip install frozendict away, and for a list where the order does not matter we can use the built-in type frozenset.
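Continuing from the snippet above (a minimal sketch, assuming frozendict is installed; can_cache_me is just an illustrative name), an equivalent function caches happily once given immutable arguments:

from frozendict import frozendict

@functools.lru_cache(maxsize=None)
def can_cache_me(x):
    return x

can_cache_me(frozendict({'x': 'y'}))   # fine: frozendict is hashable
can_cache_me(frozenset(['a', 'b']))    # fine: frozenset is hashable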
And if the function we want to cache insists on taking e.g. lists and dicts, we can freeze them for the cache and then coerce them back to those types before the real call …
import functools

from frozendict import frozendict


def uncachable(*args, **kwargs):
    # Stands in for the real function we want to cache (e.g. the API query),
    # which insists on receiving plain lists and dicts.
    print("In uncachable()")
    print(type(args), args)
    print(type(kwargs), kwargs)
    return "cache_me"


@functools.lru_cache(maxsize=None)
def cached(*args, **kwargs):
    # Only ever sees hashable arguments, so lru_cache can key on them;
    # thaws them back into lists and dicts before delegating.
    print("In cached()")

    def unfreezeit(x):
        if isinstance(x, frozenset):
            return list(x)
        if isinstance(x, frozendict):
            return dict(x)
        return x

    print(type(args), args)
    print(type(kwargs), kwargs)
    args = tuple(unfreezeit(x) for x in args)
    kwargs = {k: unfreezeit(v) for k, v in kwargs.items()}
    return uncachable(*args, **kwargs)


def test(*args, **kwargs):
    # Entry point: freezes any lists and dicts so that cached() below
    # receives only hashable arguments.
    print("In test()")

    def freezeit(x):
        if isinstance(x, list):
            return frozenset(x)
        if isinstance(x, dict):
            return frozendict(x)
        return x

    print(type(args), args)
    print(type(kwargs), kwargs)
    args = tuple(freezeit(x) for x in args)
    kwargs = {k: freezeit(v) for k, v in kwargs.items()}
    return cached(*args, **kwargs)
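For reference, the output below comes from a call along these lines (the argument values are illustrative, reconstructed from the printed output):

test(['arg_a'], arg_b={'key': 'value'}, arg_c='x')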
The first time this is run there is no cache yet, so all three functions execute and we get the output below, with 'cache_me' being returned.
In test()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
In cached()
<class 'tuple'> (frozenset({'arg_a'}),)
<class 'dict'> {'arg_b': frozendict.frozendict({'key': 'value'}), 'arg_c': 'x'}
In uncachable()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}

Subsequent times, however, we can see the cache works as expected, with 'cache_me' being returned and only the output below.
In test()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
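As one further check (not shown in the run above, so take the exact numbers as illustrative), any function wrapped with functools.lru_cache exposes a cache_info() method reporting hits and misses. After one uncached call and one cached call it would look roughly like this:

cached.cache_info()
CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)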