Wrapping your workflows in __main__ protection
26 Feb 2025 - Ben Clifford
Soon, you will probably need to start wrapping the main body of your Parsl workflow scripts in a test: if __name__ == "__main__". I’ll show what I mean first, and then explain why later in this post.
Many people do this already, and many users get it done for them by a higher-level framework. I expect the affected users are mostly those who write scripts directly.
Here’s an example code change:
Before:

    import parsl

    @parsl.python_app
    def my_app(x):
        return x + 1

    with parsl.load():
        print(my_app(10).result())

Becomes:

    import parsl

    @parsl.python_app
    def my_app(x):
        return x + 1

    if __name__ == "__main__":
        with parsl.load():
            print(my_app(10).result())
Roughly speaking, imports and definitions should live outside the protected block. Actions should live inside the protected block.
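As a slightly fuller sketch of that split (the HighThroughputExecutor configuration here is just an illustrative choice, not something this change requires):

    import parsl
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor

    # Imports and definitions: safe to run on every import, including
    # re-imports inside Parsl's helper processes.
    @parsl.python_app
    def my_app(x):
        return x + 1

    config = Config(executors=[HighThroughputExecutor()])

    # Actions: should only run in the original process.
    if __name__ == "__main__":
        with parsl.load(config):
            print(my_app(10).result())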
I opened issue #3723 for discussion of this change a few weeks ago, after generally positive feedback on Parsl’s Slack.
What will happen if I don’t make this change?
As we make non-backward-compatible changes to Parsl, you will see your workflow mysteriously starting multiple times, inside the various helper processes that Parsl launches.
Why?
Parsl makes extensive use of the Python multiprocessing module, which helps you (or rather, Parsl) run Python code in multiple operating system processes: for example, there are processes to support task submission (the interchange in the High Throughput Executor, and the Work Queue submit process) and several parts of the monitoring system that manage message routing and database access.
Python contains multiple implementations of the multiprocessing primitives (processes, queues, locks, etc.), with each implementation having a different mechanism for starting a new Python process that looks “right”. The options are detailed in the multiprocessing section of the Python manual.
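As a quick, Parsl-free illustration of those options, you can list the start methods your platform supports and pick one explicitly through a multiprocessing context:

    import multiprocessing

    def child():
        print("hello from the child process")

    if __name__ == "__main__":
        # e.g. ['fork', 'spawn', 'forkserver'] on Linux
        print(multiprocessing.get_all_start_methods())

        # Pick a start method explicitly rather than relying on the
        # platform default.
        ctx = multiprocessing.get_context("spawn")
        p = ctx.Process(target=child)
        p.start()
        p.join()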
The traditional default for Python on Linux has been the fork start method, and a lot of Parsl was built assuming that is the case. PR #2099 made this more explicit in the codebase.
Unfortunately, the fork start method doesn’t work very well in the situations Parsl wants it to, and it is one of the leading causes of “mysterious” hangs in our test system (and so probably for users who silently endure those hangs in the real world).
I’m not alone in this opinion: Python 3.14 moves the Linux default away from fork (to forkserver, which avoids forking a multi-threaded parent); macOS has used spawn as the default for a long time, and Parsl specifically overrides this to get fork behaviour; Windows does not support fork at all (and this is one of the immediate blockers if you try to use Parsl on Windows).
Moving Parsl away from fork multiprocessing
Parsl has already moved away from fork multiprocessing in a couple of places: PR #2983 switched the High Throughput Executor worker pool to use the spawn context internally, and PR #3463 starts the interchange as a new command-line process, avoiding multiprocessing entirely.
Those were the relatively easy pieces.
What remains is more challenging to switch, primarily because those processes are started from sections of the code where switching needs users to add the boilerplate __name__ == "__main__" test that I discussed at the start of this post. That requirement comes down to how the different multiprocessing start methods bring new processes into existence.
The fork method uses the Unix fork() system call to make a quasi-duplicate of the currently running process: for example, all of the Python objects in memory and all imported modules are duplicated in the new process.
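Here’s a small sketch of that duplication (it needs a platform with fork, such as Linux): state mutated before the child processes start is visible to a fork child, but not to a spawn child, which re-imports the module from scratch.

    import multiprocessing

    state = {"value": 0}

    def show(method):
        # A fork child sees the parent's mutation (1); a spawn child
        # re-imports this module and so sees the fresh value (0).
        print(method, state["value"])

    if __name__ == "__main__":
        state["value"] = 1
        for method in ("fork", "spawn"):
            ctx = multiprocessing.get_context(method)
            p = ctx.Process(target=show, args=(method,))
            p.start()
            p.join()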
This duplication does not compose well with threads.
A common hang in Parsl happens when one thread is logging a message using Python’s logging module at the moment that another thread forks a new multiprocessing process: the new process launches with a copy of the logging locks, locked because some code was in the middle of logging. Any log statement in the new process will then hang, waiting for that logging lock to be released; but the thread that was doing the logging isn’t running in the new process, and so nothing will ever unlock the copied lock.
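Here is a minimal sketch of that failure mode, using os.fork() directly rather than anything Parsl-specific. It’s timing-dependent, so it won’t hang on every run, but when the fork lands while the background thread holds the logging lock, the child blocks forever:

    import logging
    import os
    import threading
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger()

    def spam():
        # Log constantly so the logging lock is frequently held.
        while True:
            logger.info("logging from a background thread")

    if __name__ == "__main__":
        threading.Thread(target=spam, daemon=True).start()
        time.sleep(0.1)  # let the thread get going

        pid = os.fork()  # fork while the other thread may hold the lock
        if pid == 0:
            # Child: if the lock was held at fork time, this never returns.
            logger.info("logging from the child process")
            os._exit(0)
        os.waitpid(pid, 0)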
I’ve also seen this lock-related behaviour at the libc level, with name service resolution, and it is a fundamental architectural property (or flaw) of trying to use multiprocessing and threads at the same time.
Since Python 3.12, Python has raised a DeprecationWarning when a fork happens in a process that also has multiple threads; this has always been a problem, but it was previously reported less aggressively.
The spawn method does not have this behaviour: it starts a fresh Python process and initialises everything from scratch, so there’s a completely new logging system instance with fresh, clean locks. But in order to do that, everything needs to be reloaded: modules need to be re-imported and, most relevantly here, the original workflow script needs to be reloaded in that new process.
And so, when using spawn, the original workflow script must not always run the workflow: in the original process it should run the workflow, but when loaded into other multiprocessing processes it should only do imports and definitions.
That’s what if __name__ == "__main__": asks: are we running in the original process (where you would expect this code to run), or are we being re-imported somewhere else?
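You can watch that re-import happen. Save something like this (the filename doesn’t matter) and run it: under spawn, the top-level print runs twice, once in the original process, where __name__ is "__main__", and once in the child, where it is "__mp_main__":

    import multiprocessing

    print("top-level code running, __name__ =", __name__)

    def work():
        print("hello from the child process")

    if __name__ == "__main__":
        ctx = multiprocessing.get_context("spawn")
        p = ctx.Process(target=work)
        p.start()
        p.join()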
Conclusion
Add that if statement. It won’t hurt with fork multiprocessing, and it will reduce the surprise as Parsl moves to spawn multiprocessing, something that will probably happen in order to smash our last remaining known race-condition hangs.