It’s very useful to cache data when working in a Jupyter notebook so that you can repeatedly call a function and get near-instantaneous results.
Consider the case where you have a function which is used to query an API from some third-party data provider:
- API calls may be logged, in which case you may wish to minimise these in order to conform to a fair use policy, or simply for politeness
- Even trivial API calls may take a moment or so to return, which can add up over multiple calls
The answer here is to cache the function with the help of functools.cache (added in Python 3.9) or its older sibling functools.lru_cache. functools provides a number of helpful decorators and is part of the standard library, so its documentation is well worth a quick look for those less familiar with this module.
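As a minimal sketch of the idea, the snippet below caches a stand-in for a slow API call. The function fetch_quote, its symbol parameter and the call_count counter are hypothetical and exist only for illustration; on Python versions before 3.9, functools.lru_cache(maxsize=None) does the same job as functools.cache.

```python
import functools
import time

call_count = 0  # hypothetical counter, only here to show when the body runs


@functools.cache  # Python 3.9+; use functools.lru_cache(maxsize=None) before that
def fetch_quote(symbol):
    """Stand-in for a slow third-party API call (hypothetical)."""
    global call_count
    call_count += 1
    time.sleep(0.1)  # simulate network latency
    return f"quote for {symbol}"


first = fetch_quote("ABC")   # runs the function body
second = fetch_quote("ABC")  # served from the cache, the body is skipped
```

After both calls call_count is still 1, since the second call never reaches the function body.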
There are some cases where this will not work out of the box though. For instance, if you try to call a function decorated with functools.lru_cache and pass it an argument which is a dict …
import functools

@functools.lru_cache(maxsize=None)
def cant_cache_me(x):
    return x

cant_cache_me(dict(x='y'))
TypeError: unhashable type: 'dict'
A dict is not hashable because it is mutable, and lru_cache needs to hash the arguments to build its cache key. We can of course get around this by using an immutable type in its place. The frozendict is a quick pip install frozendict away, and for a list where neither the order nor duplicate entries matter we can use the built-in type frozenset.
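To make the distinction concrete, here is a small sketch using only built-in types (frozendict is left out so that the snippet has no third-party dependency). The helper is_hashable is a hypothetical name introduced for illustration.

```python
def is_hashable(value):
    """Return True if value can be used as a dict key or cache key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Mutable containers cannot be hashed ...
list_ok = is_hashable(["a", "b"])        # False
dict_ok = is_hashable({"key": "value"})  # False

# ... but their immutable counterparts can.
tuple_ok = is_hashable(("a", "b"))                 # True
frozenset_ok = is_hashable(frozenset(["a", "b"]))  # True

# A frozenset ignores order and duplicates, so these two compare equal
# and would therefore share a single cache entry.
same = frozenset(["a", "b", "a"]) == frozenset(["b", "a"])  # True
```

If the order of a list does matter, a tuple is the safer hashable stand-in.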
And if the function we want to cache insists on taking e.g. lists and dicts, we can always coerce them back to those types …
import functools
from frozendict import frozendict

def uncachable(*args, **kwargs):
    # The original function, which insists on lists and dicts.
    print("In uncachable()")
    print(type(args), args)
    print(type(kwargs), kwargs)
    return "cache_me"

@functools.lru_cache(maxsize=None)
def cached(*args, **kwargs):
    # Only ever called with hashable arguments, so lru_cache can key on them.
    print("In cached()")

    def unfreezeit(x):
        # Note: a list rebuilt from a frozenset has no guaranteed order.
        if isinstance(x, frozenset):
            return list(x)
        if isinstance(x, frozendict):
            return dict(x)
        return x

    print(type(args), args)
    print(type(kwargs), kwargs)
    # Convert the arguments back to the types uncachable() expects.
    args = tuple(unfreezeit(x) for x in args)
    kwargs = {k: unfreezeit(v) for k, v in kwargs.items()}
    return uncachable(*args, **kwargs)

def test(*args, **kwargs):
    # Entry point: freeze the arguments, then go through the cache.
    print("In test()")

    def freezeit(x):
        if isinstance(x, list):
            return frozenset(x)
        if isinstance(x, dict):
            return frozendict(x)
        return x

    print(type(args), args)
    print(type(kwargs), kwargs)
    args = tuple(freezeit(x) for x in args)
    kwargs = {k: freezeit(v) for k, v in kwargs.items()}
    return cached(*args, **kwargs)
The first time this is run, with a call such as test(['arg_a'], arg_b={'key': 'value'}, arg_c='x'), there is nothing in the cache yet and we get the output below, with 'cache_me' being returned.
In test()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
In cached()
<class 'tuple'> (frozenset({'arg_a'}),)
<class 'dict'> {'arg_b': frozendict.frozendict({'key': 'value'}), 'arg_c': 'x'}
In uncachable()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
On subsequent calls with the same arguments, however, we can see the cache working as expected: 'cache_me' is still returned, but only test() prints anything, because neither cached() nor uncachable() is entered.
In test()
<class 'tuple'> (['arg_a'],)
<class 'dict'> {'arg_b': {'key': 'value'}, 'arg_c': 'x'}
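Rather than relying on print statements, the cache can also be inspected directly: functions wrapped with functools.lru_cache expose cache_info() and cache_clear(). The sketch below uses a hypothetical function slow_lookup and freezes its list argument by hand, so it runs without frozendict.

```python
import functools


@functools.lru_cache(maxsize=None)
def slow_lookup(keys):
    """Hypothetical cached function; keys must be hashable."""
    return sorted(keys)


slow_lookup(frozenset(["b", "a"]))  # miss: the body runs
slow_lookup(frozenset(["a", "b"]))  # hit: an equal frozenset maps to the same entry

info = slow_lookup.cache_info()
# info.hits, info.misses and info.currsize report how the cache has been used

slow_lookup.cache_clear()  # empty the cache, e.g. to force a fresh API call
```

In a notebook, cache_clear() is handy when the underlying data has changed and the cached result is known to be stale.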