Singleton Fails with Multiprocessing in Python

A singleton is a class designed to permit only a single instance. Singletons have a bad reputation, but they do have (limited) valid uses. They also cause plenty of headaches, and they can raise errors when used with multiprocessing in Python. This article explains why, and what you can do to work around it.

A Singleton in the Wild

Singleton usage is exceedingly rare in Python. I’ve been writing Python code for 5 years and never came across one until last week. Having never studied the singleton design pattern, I was perplexed by the convoluted logic. I was also frustrated by the errors that kept popping up when I was forced to use it.

The most frustrating aspect of using a singleton for me came when I tried to run some code in parallel with joblib. Inside the parallel processes, the singleton always acted like it hadn’t been instantiated yet. My parallel code only worked when I added another instantiation of the singleton inside the function called by the process. It took me a long time to figure out why.

Why Singletons Fail with Multiprocessing

The best explanation I have found for why singletons raise errors with multiprocessing in Python is this answer from Stack Overflow:

Each of your child processes runs its own instance of the Python interpreter, hence the singleton in one process doesn’t share its state with those in another process.

https://stackoverflow.com/questions/45077043/make-singleton-class-in-multiprocessing

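In other words, your singleton instance won’t be shared across processes. A worker that starts in a fresh interpreter sees the class as if it had never been instantiated, which is exactly the symptom I ran into with joblib.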
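The isolation between processes can be observed directly with the standard library. The following is a minimal sketch, not the code from my project: `Registry`, `child`, and `run_demo` are hypothetical names, and it assumes a POSIX system where the "fork" start method is available. With fork, the child starts with a copy of the parent's state, but anything the child changes stays in the child.

```python
import multiprocessing as mp


class Registry:
    """Hypothetical singleton-style holder: state lives on the class."""
    instance = None


def child(conn):
    # Runs in the child process: report what the child sees, then mutate it.
    before = Registry.instance
    Registry.instance = "set in child"
    conn.send((before, Registry.instance))
    conn.close()


def run_demo():
    # Assumption: a POSIX system where the "fork" start method exists.
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    Registry.instance = "set in parent"
    proc = ctx.Process(target=child, args=(child_conn,))
    proc.start()
    child_before, child_after = parent_conn.recv()
    proc.join()
    # The third value is what the parent sees after the child has finished.
    return child_before, child_after, Registry.instance


if __name__ == "__main__":
    print(run_demo())
```

The parent still sees "set in parent" after the child has overwritten its own copy. When workers are launched as fresh interpreters instead of forked (the "spawn" start method, and as far as I understand joblib's default backend as well), the child would not even start with the parent's value, which matches the "never instantiated" behavior described above.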

Working Around Singleton Errors in Multiprocessing

There are several ways to work around this problem. Let’s start with a basic singleton class and see how a simple parallel process will fail.

import time
from joblib import Parallel, delayed


class OnlyOne:
    """Singleton Class, inspired by
    https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Singleton.html"""

    class __OnlyOne:
        def __init__(self, arg):
            if arg is None:
                raise ValueError("Pretend empty instantiation breaks code")
            self.val = arg

        def __str__(self):
            # val may not be a string, so convert before concatenating
            return repr(self) + str(self.val)

    instance = None

    def __init__(self, arg=None):
        if not self.instance:
            self.instance = self.__OnlyOne(arg)
        else:
            self.instance.val = arg

    def __getattr__(self, name):
        # Delegate unknown attribute lookups to the wrapped instance
        return getattr(self.instance, name)


def worker(num):
    """Single worker function to run in parallel.
    Assume that this function has to do an empty
    instantiation of the singleton.
    """
    one = OnlyOne()
    time.sleep(0.1)
    one.val += num
    return one.val


# Instantiate singleton
one = OnlyOne(0)
print(one.val)

# Try to run in parallel
# Will hit the ValueError that raises with
# empty instantiation
res = Parallel(n_jobs=-1, verbose=10)(
    delayed(worker)(i) for i in range(10)
)
print(res)

In this example, the worker function has to instantiate the singleton with no argument because it wants to read an attribute stored in the singleton. The worker doesn’t know what value to instantiate it with, because that value is the very thing it is trying to read.

Environment Variables

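Here’s a simple solution I came up with that worked for me, and might for you as well. It stores the state in an environment variable, which child processes inherit from the parent when they start. Environment variables only hold strings, so this works best for simple values.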

import time
from joblib import Parallel, delayed
import os


class OnlyOne:
    """Singleton Class, inspired by
    https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Singleton.html
    Modified to work with parallel processes using environment
    variables to store state across processes.
    """

    class __OnlyOne:
        def __init__(self, arg):
            if arg is None:
                raise ValueError("Pretend empty instantiation breaks code")
            self.val = arg

        def __str__(self):
            # val may not be a string, so convert before concatenating
            return repr(self) + str(self.val)

    instance = None

    def __init__(self, arg=None):
        if not self.instance:
            if arg is None:
                # look up val from env var; env values are always strings,
                # so convert back (this example assumes val is an int)
                env_val = os.getenv('SINGLETON_VAL')
                arg = int(env_val) if env_val is not None else None
            else:
                # set env var so all workers use the same val
                # (os.environ only accepts strings)
                os.environ['SINGLETON_VAL'] = str(arg)
            self.instance = self.__OnlyOne(arg)
        else:
            self.instance.val = arg

    def __getattr__(self, name):
        # Delegate unknown attribute lookups to the wrapped instance
        return getattr(self.instance, name)


def worker(num):
    """Single worker function to run in parallel.
    Assume that this function has to do an empty
    instantiation of the singleton.
    """
    one = OnlyOne()
    time.sleep(0.1)
    one.val += num
    return one.val


# Instantiate singleton
one = OnlyOne(0)
print(one.val)

# Run in parallel worry-free
res = Parallel(n_jobs=-1, verbose=10)(
    delayed(worker)(i) for i in range(10)
)
print(res)
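
Because an environment variable can only hold a string, anything richer than a simple value has to be serialized first. The following is a rough sketch of one way to do that with JSON, assuming the state is JSON-compatible. The variable name `SINGLETON_STATE` and the helpers `save_state` and `load_state` are hypothetical and not part of the singleton above.

```python
import json
import os

ENV_KEY = "SINGLETON_STATE"  # hypothetical variable name


def save_state(state):
    """Serialize a JSON-compatible object into an environment variable."""
    os.environ[ENV_KEY] = json.dumps(state)


def load_state():
    """Read the state back; returns None if the variable was never set."""
    raw = os.getenv(ENV_KEY)
    return None if raw is None else json.loads(raw)


save_state({"val": 0, "name": "demo"})
restored = load_state()
print(restored["val"] + 1)  # the int survives the round trip, prints 1
```

Keep in mind that a child process receives its environment when it is launched, so the variable needs to be set before the workers start. Changes made in the parent afterwards are not seen by workers that are already running.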

Pass Singleton as Argument

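Another solution is to simply pass the instantiated singleton as an argument to the worker function. Each worker receives its own serialized copy of the object, so changes made inside a worker are not reflected in the parent’s instance.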

import time
from joblib import Parallel, delayed


class OnlyOne:
    """Singleton Class, inspired by
    https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Singleton.html"""

    class __OnlyOne:
        def __init__(self, arg):
            if arg is None:
                raise ValueError("Pretend empty instantiation breaks code")
            self.val = arg

        def __str__(self):
            # val may not be a string, so convert before concatenating
            return repr(self) + str(self.val)

    instance = None

    def __init__(self, arg=None):
        if not self.instance:
            self.instance = self.__OnlyOne(arg)
        else:
            self.instance.val = arg

    def __getattr__(self, name):
        # Delegate unknown attribute lookups to the wrapped instance
        return getattr(self.instance, name)


def worker(num, one):
    """Single worker function to run in parallel.
    """
    time.sleep(0.1)
    one.val += num
    return one.val


# Instantiate singleton
one = OnlyOne(0)
print(one.val)

# Run in parallel succeeds when one is passed
# as arg to worker
res = Parallel(n_jobs=-1, verbose=10)(
    delayed(worker)(i, one) for i in range(10)
)
print(res)
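
One thing to be aware of with this approach is that a process-based backend has to serialize each argument to send it to a worker, so the worker operates on a copy. The sketch below imitates that step in a single process with `pickle`, using a plain dict as a stand-in for the singleton's state. It is an illustration of the copy semantics under that assumption, not what joblib does internally line for line.

```python
import pickle


def worker(num, state):
    """Mutate the state it was handed and return the new value."""
    state["val"] += num
    return state["val"]


# Plain dict standing in for the singleton's state (an assumption
# made to keep the example self-contained).
parent_state = {"val": 0}

results = []
for i in range(3):
    # Serialize in the "parent", deserialize in the "worker":
    # the worker only ever touches its own copy.
    worker_copy = pickle.loads(pickle.dumps(parent_state))
    results.append(worker(i, worker_copy))

print(results)       # [0, 1, 2]
print(parent_state)  # {'val': 0}
```

Every call starts from the parent's original value, and the parent's state is untouched afterwards. If the workers need to accumulate into shared state, the results have to be returned and combined in the parent.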

By Jared Rand

Jared Rand is a data scientist specializing in natural language processing. He also has an MBA and is a serial entrepreneur. He is a Principal NLP Data Scientist at Everstream Analytics and founder of Skillenai. Connect with Jared on LinkedIn.
