A reentrant context manager in Python

A Python context manager takes care of the boilerplate around a resource, offering safety and convenient (re-)use. The protocol ensures that once the context is initialised, it will be torn down whatever happens. Examples of resource handling are input/output operations, session management, thread locking, etc. In this article, we will take a focused peek at one "honking great idea": a reentrant context manager.

There have been many articles and pages of documentation explaining why context managers are a good thing, and why you would want to always handle your resources with them.

Drawing from classic computer science principles, we propose here to extend the context manager with the reentrant principle to achieve simple yet powerful resource management.

Our case study: a data store

Suppose you are writing a library for a (remote) service. It could be that you want to offer a nice Python interface for your own service, or simply that you want to abstract the service itself from your business code — a sound idea.

Our example service is a data store. A user can push, and then pull, arbitrary data from it. Any data stored there is associated with a unique identifier. Later, on request, the identifier can be used to pull the data out of the store once and for all: the store will then forget about that piece of data.

# datastore.py
from abc import ABC, abstractmethod


Content = bytes
Identifier = str


class Datastore(ABC):

    @abstractmethod
    def push(self, content: Content) -> Identifier:
        ...

    @abstractmethod
    def pull(self, identifier: Identifier) -> Content:
        ...

The interface is simple and clean. Datastore is an abstract class with two public (abstract) methods.

Let's create an in-memory datastore to mock the service and trace the interface calls:

# mock_datastore.py
import logging
from uuid import uuid4

from datastore import Content, Datastore, Identifier


class InMemoryDatastore(Datastore):

    _log = logging.getLogger('InMemoryDatastore')
    _store = {}  # maps identifier -> content

    def push(self, content):
        identifier = str(uuid4())
        self._log.debug('PUSH: %s with %s bytes', identifier, len(content))
        self._store[identifier] = content
        return identifier

    def pull(self, identifier):
        if identifier not in self._store:
            raise KeyError(f"Unknown identifier {identifier!r}")
        self._log.debug('PULL: %s', identifier)
        content = self._store.pop(identifier)
        return content

We can now store and retrieve content from a datastore:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> from mock_datastore import InMemoryDatastore as MockDatastore
>>> client = MockDatastore()
>>> i = client.push(b'an entry')
DEBUG:InMemoryDatastore:PUSH: 53d387c9-8d96-4ca8-a1c5-f6d5efffc572 with 8 bytes
>>> client.pull(i)
DEBUG:InMemoryDatastore:PULL: 53d387c9-8d96-4ca8-a1c5-f6d5efffc572
b'an entry'

… for authenticated users

Say there is a new requirement: all actions undertaken against the data store must be authenticated. The client interface should handle this and provide a construct to deal with sessions. The interface consumer should not have to log in and out manually, so we will not change the Datastore interface. The library we are writing is supposed to help the consumer, not clutter its interface.

We will define the authentication methods as "private/protected" and use them to push and pull data to the data store, from the implementation classes themselves. To avoid repeating ourselves, we simply define a new interface:

# authenticated_datastore.py

from abc import abstractmethod
from contextlib import contextmanager

from datastore import Datastore


AuthToken = str


class AuthenticatedDatastore(Datastore):

    @abstractmethod
    def push(self, content):
        ...

    @abstractmethod
    def pull(self, identifier):
        ...

    @abstractmethod
    def _login(self, credentials) -> AuthToken:
        ...

    @abstractmethod
    def _logout(self, token: AuthToken):
        ...

Good programmers are lazy, they say. Programmers always want to make things silly simple and efficient. I could not agree more, especially if there is an exposed interface at play.

The implementation classes have to remember to call _logout() every time they call _login(). What happens to us when we burden ourselves with hand-woven resource management? Things break, memory leaks, and we become sad.

The authentication as we just defined it is a resource that we manage: we have to create (to log in) and destroy (to log out) each instance we handle (a session). Let's take a look at a nifty solution Python offers us: the context manager.

What a context manager is, by example

The Python documentation describes a context manager as:

[…] an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code.

Said differently, the scope of a context manager is tied to an object life-cycle. But instead of the regular initialisation and destruction of the object, a different protocol is used (aptly named… Context Manager). It defines the interface for entry and exit of the scope.
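
To make the entry and exit protocol concrete, here is a minimal sketch of a class-based context manager. The Resource class and its events list are illustrative names, not part of the datastore library:

```python
class Resource:
    """A toy context manager that records its life-cycle events."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Called when the with block is entered.
        self.events.append('enter')
        return self  # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the with block is left, with or without an exception.
        self.events.append('exit')
        return False  # do not swallow exceptions


resource = Resource()
with resource as r:
    r.events.append('body')
```

After the with block, resource.events holds 'enter', 'body' and 'exit', in that order.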

We described our resource earlier as an authentication session whose actions are to log in and to log out. Translated to Python, this gives a new mock:

# mock_datastore.py (continued)
from contextlib import contextmanager

from authenticated_datastore import AuthToken, AuthenticatedDatastore


class InMemoryAuthenticatedDatastore(InMemoryDatastore, AuthenticatedDatastore):

    _log = logging.getLogger('InMemoryAuthenticatedDatastore')
    credentials = 'foo:bar'

    @contextmanager
    def _session(self):
        session_token = self._login(self.credentials)
        try:
            yield
        finally:
            self._logout(session_token)

    def _login(self, credentials) -> AuthToken:
        # A stand-in for a real authentication call: derive a fake token.
        token = str(hash(credentials))
        self._log.debug('LOGIN: %s', token)
        return token

    def _logout(self, token: AuthToken):
        self._log.debug('LOGOUT: %s', token)

    def push(self, content):
        with self._session():
            return super().push(content)

    def pull(self, identifier):
        with self._session():
            return super().pull(identifier)

The session context manager is built so that once we are successfully logged in, that is, when we enter the scope of the with block, we will always log out no matter what happens. Indeed, thanks to the generator-based approach of our context manager, we yield within the scope of the try block. Execution resumes after the yield only when the with block is left in the calling code. If the block raised, the exception is re-raised at the yield, and the finally clause still logs out.

Basically, that means that even if an exception is raised when we are within the scope of the session context manager, the session will be torn down before the exception ripples up:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> from mock_datastore import InMemoryAuthenticatedDatastore as MockDatastore
>>> client = MockDatastore()
>>> i = client.push(b'an entry')
DEBUG:InMemoryAuthenticatedDatastore:LOGIN: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:PUSH: f0d4dc6b-38fa-485d-b96d-0ced4c10c382 with 8 bytes
DEBUG:InMemoryAuthenticatedDatastore:LOGOUT: 5444710130385190768
>>> c = client.pull(i)
DEBUG:InMemoryAuthenticatedDatastore:LOGIN: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:PULL: f0d4dc6b-38fa-485d-b96d-0ced4c10c382
DEBUG:InMemoryAuthenticatedDatastore:LOGOUT: 5444710130385190768
>>> print(c)
b'an entry'
>>> client.pull(i)
DEBUG:InMemoryAuthenticatedDatastore:LOGIN: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:LOGOUT: 5444710130385190768
Traceback (most recent call last):
  ...
    raise KeyError(f"Unknown identifier {identifier!r}")
KeyError: "Unknown identifier 'f0d4dc6b-38fa-485d-b96d-0ced4c10c382'"
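
The same guarantee can be checked without any datastore. The sketch below uses a hypothetical session() function and a calls list, both made up for the illustration, to record the order of events when the body of the with block raises:

```python
from contextlib import contextmanager

calls = []


@contextmanager
def session():
    calls.append('login')
    try:
        yield
    finally:
        # Runs whether the with block completed or raised.
        calls.append('logout')


try:
    with session():
        calls.append('work')
        raise KeyError('boom')
except KeyError:
    # The exception reaches the caller only after the teardown.
    calls.append('handled')
```

The logout is recorded before the exception handler runs, which matches the trace above.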

Performing many operations

As we just saw, each operation handles its own session scope. From an interface point of view, this is great: the datastore library is responsible for its own success, and the library clients do not have to know about the session details. This also allows us to configure a datastore service (Datastore and its children) at instantiation time and then forget about the service details. Perfect for dependency injection.

But calling many operations sequentially is far from ideal. Each operation will initiate a new session, perform its action, and then close the session. In our single-threaded example, we take the network hit every single time we request a service operation.

Conversely, when a consumer wants to perform batch operations, it would be beneficial to open only one session, perform all operations, and only then close the session.

Let us see what trace we get if we try this with our current session context manager:

>>> from mock_datastore import InMemoryAuthenticatedDatastore as MockDatastore
>>> client = MockDatastore()
>>> with client._session():
...     i = client.push(b'another entry')
...     client.pull(i)
... 
DEBUG:InMemoryAuthenticatedDatastore:LOGIN: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:LOGIN: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:PUSH: 3034f5e6-3ee8-42cb-b22e-48901afb1097 with 13 bytes
DEBUG:InMemoryAuthenticatedDatastore:LOGOUT: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:LOGIN: 5444710130385190768
DEBUG:InMemoryAuthenticatedDatastore:PULL: 3034f5e6-3ee8-42cb-b22e-48901afb1097
DEBUG:InMemoryAuthenticatedDatastore:LOGOUT: 5444710130385190768
b'another entry'
DEBUG:InMemoryAuthenticatedDatastore:LOGOUT: 5444710130385190768

The current behaviour is clearly not what we want: we now even have one more session wrapping everything else!

In a lucky scenario, the remote service behaves nicely and re-uses the existing session whenever we try to authenticate ourselves again. But unless this is an explicit service feature, we really should cater for this locally, in the library.

Re-using the existing session, or the reentrant context manager

The idea driving our use-case is that we keep a session open until it has served its purpose. We can only achieve this by remembering that there is an open session already. And when a request for a session pops up, through a call to the session context manager, we can safely re-use the existing session instead of asking for a new one.

In computer science, this concept is famously applied to locks. A simple lock can be acquired only once. Before it can be acquired again, even by its current holder, the simple lock must be released. A reentrant lock, on the other hand, can be acquired time and again by its current holder.
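
Python's standard library ships both kinds of lock, which makes the difference easy to observe. This is a small sketch using threading.Lock and threading.RLock from a single thread. The variable names are only illustrative:

```python
import threading

simple = threading.Lock()
reentrant = threading.RLock()

with simple:
    # A second, non-blocking attempt by the same thread fails.
    second_simple = simple.acquire(blocking=False)

with reentrant:
    # The current holder can acquire again.
    second_reentrant = reentrant.acquire(blocking=False)
    if second_reentrant:
        # Each successful acquire needs its own release.
        reentrant.release()
```

Here second_simple is False while second_reentrant is True. The non-blocking acquire matters for the simple lock: a blocking one would deadlock the thread on itself.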

Our dummy in-memory datastore needs an update:

from typing import Optional


class InMemoryReentrantDatastore(InMemoryAuthenticatedDatastore):

    _log = logging.getLogger('InMemoryReentrantDatastore')
    _session_token: Optional[AuthToken] = None

    @contextmanager
    def _session(self):
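        if self._session_token is not None:
            # Already inside a session: re-use it and let its owner log out.
            yield
            return

        self._session_token = self._login(self.credentials)
        try:
            yield
        finally:
            # Forget the token first so the next outermost scope logs in again.
            token, self._session_token = self._session_token, None
            self._logout(token)

If a session token exists, it means that we are already within the scope of the session context manager. Thus we simply yield to allow the caller to complete, and then return from the context manager. The context manager that initiated the session token is the only one allowed to log out. When it does, it also forgets the token; otherwise every later call would believe a session is still open and never log in again.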

>>> from mock_datastore import InMemoryReentrantDatastore as MockDatastore
>>> client = MockDatastore()
>>> with client._session():
...     i = client.push(b'another entry')
...     client.pull(i)
...
DEBUG:InMemoryReentrantDatastore:LOGIN: -1271783380944662680
DEBUG:InMemoryReentrantDatastore:PUSH: db578045-674b-463f-9475-156315fcce29 with 13 bytes
DEBUG:InMemoryReentrantDatastore:PULL: db578045-674b-463f-9475-156315fcce29
b'another entry'
DEBUG:InMemoryReentrantDatastore:LOGOUT: -1271783380944662680

The session scope is now respected: only one session is created for the InMemoryReentrantDatastore._session() context manager scope, no matter how many methods requiring a session are called within it.
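
The pattern can also be exercised in isolation. The following is a minimal sketch, assuming a hypothetical SessionOwner class with simple counters in place of a real login. It checks that nested scopes share one session and that a later, separate scope opens a fresh one:

```python
from contextlib import contextmanager


class SessionOwner:
    """Hypothetical stand-in for a datastore client."""

    def __init__(self):
        self.token = None
        self.logins = 0
        self.logouts = 0

    @contextmanager
    def session(self):
        if self.token is not None:
            # Already inside a session: re-use it, leave teardown to its owner.
            yield
            return
        self.logins += 1
        self.token = f'token-{self.logins}'
        try:
            yield
        finally:
            self.logouts += 1
            self.token = None  # the next outermost scope must log in again


owner = SessionOwner()
with owner.session():        # logs in
    with owner.session():    # re-uses the open session
        pass
    with owner.session():    # re-uses it again
        pass
with owner.session():        # a new outermost scope: logs in once more
    pass
```

Four scopes are entered, yet owner.logins and owner.logouts both end at 2. Note that this sketch, like the mock, keeps a single token per instance and makes no attempt at thread safety.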

We should note that the current interface does not expose the session context manager as public to the library clients. Considering our contrived example, we can simply flatten the whole hierarchy and keep only one interface and one in-memory implementation. This is left as an exercise.

Conclusion

Python context managers are an easy way to abstract resource management in libraries. Their simple form reads well and suffices as long as the scopes remain independent. Upgrading a context manager to be reentrant is a simple and efficient way to allow a scope to be shared, without changing the context semantics.

Decorators are another powerful concept in Python. Our example uses the contextlib.contextmanager decorator to wrap a function and make it into a context manager.

Would you like to use the session context manager as a decorator too? Go on, try it! Try both approaches and see which one reads and writes better from the library client's point of view.
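
As a starting point, context managers built with contextlib.contextmanager can already be applied as decorators. Here is a small sketch with a hypothetical module-level session() function and a trace list, both invented for the example:

```python
from contextlib import contextmanager

trace = []


@contextmanager
def session():
    trace.append('login')
    try:
        yield
    finally:
        trace.append('logout')


@session()  # note the call: the decorator is the context manager object
def batch():
    # The whole function body runs inside one session.
    trace.append('push')
    trace.append('pull')
    return 'done'


result = batch()
```

The call to batch() records 'login', 'push', 'pull' and 'logout', and still returns 'done'. Adapting this to a method-based _session() is the interesting part of the exercise, since the decorator needs access to the instance.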