Spooky Python File Descriptors
After a recent bug hunt I wanted to share what I found, because it caught me thinking too much
in Python abstractions and not enough in what's actually happening. Most people I explained it
to were also kind of surprised, so I thought it might be of interest more broadly. It also
helped me understand that context blocks (`with open(…) as file:`) are generally unnecessary
when handling open files. Also, since it's Halloween here in Canada, I couldn't help but pick a
silly title for this post.
When you open a file in Python, it returns an object. That object proxies between Python's and your
operating system's concept of a file. On POSIX-like operating systems, that takes the form of an
`open(2)` call through libc. That function returns an integer, the so-called file descriptor. It's
an ID the system gives your code to index into your process's file descriptor table within the
kernel, which holds all the file-specific metadata the kernel needs to operate on that file for you.
```
>>> open("/dev/zero")
<_io.TextIOWrapper name='/dev/zero' mode='r' encoding='UTF-8'>
```
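To see the integer hiding behind the object, you can ask the object for it directly. A minimal sketch, assuming a POSIX system where `/dev/zero` exists:

```python
import os

f = open("/dev/zero")        # the Python-level file object...
fd = f.fileno()              # ...backed by an integer file descriptor
print(fd)                    # usually 3: 0, 1 and 2 are stdin/stdout/stderr
mode = os.fstat(fd).st_mode  # the kernel holds the real metadata for this fd
f.close()
```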
Nothing new there for anyone who's used Python to read files before. The spooky thing is what happens if you close the file descriptor and reuse the variable.
```python
import os

file = open("/dev/zero")
file.read(5)
os.close(file.fileno())

file = open("/dev/zero")
file.read(5)
os.close(file.fileno())
```
Take a second to think about what happens if you run this in the Python REPL. Once you've thought
about what should happen, go ahead and run it line by line. If you don't have a `/dev/zero` (for
example, if you're running on Windows), you can replace those filenames with any file on your
system.
The first block does pretty much what you'd expect. It opens `/dev/zero` and then reads up to five
characters from it. The REPL shows us the return value of `file.read(5)`, which is a string of five
null characters. The next line is also not too special. `file.fileno()` returns the integer file
descriptor, and `os.close()` calls the operating system's `close(2)` function, passing that integer
to it. This is something you probably shouldn't do in Python. But why?
Well, the next block, identical to the first, shows us what can happen. If you ran the code above,
when you reached the second `file.read(5)`, you probably saw it raise an exception.
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
```
Why? In Python, file-like objects' destructors automatically close their file descriptor. The bug
wasn't actually on the line where we read the file, but on the line above, where we open a new file
and simultaneously release the last reference to the previous file object by assigning the new file
to the variable holding the old reference.
What happens is you first asked the operating system for a file descriptor in the first block. The
operating system returns some integer for the file, let's say 3. Python creates a new object and
stores that number as a property so it can perform operations on the file using the normal libc
file function calls.

When you close the file descriptor directly, the Python object doesn't know the file is now closed.
Not only that, but the operating system is now free to reuse that file descriptor. That's exactly
what happens when you open the file the second time. The operating system sees the lowest available
file descriptor is 3 and issues it to Python, which creates another object and stores it as a
property. Unfortunately for us, there are now two objects that both think they own file
descriptor 3.
Now the assignment takes place. Python decrements the reference count on the old object and assigns
the new one to the variable `file`. At this point, the runtime can garbage collect the old object.
To help prevent resource leaks, Python's file objects automatically call the `close(2)` function on
the descriptor for us, since the object thinks it's still open. One benefit is this means you can't
leak file descriptors by skipping the context block on files (`with open(…) as file:`). When the
function that opened a file returns, the file object will be destructed and its descriptor closed
for you.
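That no-leak behavior is easy to observe. A small sketch, assuming CPython, where reference counting destroys the object as soon as the function returns (`read_head` is a hypothetical helper, not a standard function):

```python
import os
import tempfile

def read_head(path):
    f = open(path)                 # no `with` block on purpose
    return f.read(5), f.fileno()   # f's last reference dies on return

# Set up a throwaway file to read.
tmp = tempfile.NamedTemporaryFile("w", delete=False)
tmp.write("hello world")
tmp.close()

data, fd = read_head(tmp.name)
try:
    os.fstat(fd)                   # poke the descriptor f was using
    leaked = True
except OSError:
    leaked = False                 # EBADF: the destructor closed it for us
print(data, leaked)
os.unlink(tmp.name)
```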
The stage is set. The problem is our new object has suddenly had its descriptor closed by a
different object, because both objects think they own the same resource. At this point, when you try
to read from the new file object, the operating system returns an error because the descriptor is
already closed.
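The whole haunted sequence can be reproduced in a few lines. A sketch, assuming CPython (where `del` collects the object immediately) on a POSIX system with `/dev/zero`:

```python
import os

a = open("/dev/zero")
fd_a = a.fileno()
os.close(fd_a)             # close the descriptor behind the object's back

b = open("/dev/zero")      # the kernel hands out the lowest free number...
fd_b = b.fileno()
same = fd_a == fd_b        # ...the very number `a` still thinks it owns
print(same)                # True

del a                      # a's destructor calls close(2) on the shared number
try:
    b.read(5)
    err = None
except OSError as e:
    err = e.errno          # 9 (EBADF): b's descriptor vanished out from under it
print(err)
```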
Creepy Crawly Bugs
In the above it's pretty simple to avoid the issue. Honestly, it's kind of a weird set of operations
to both manually close the file descriptor (instead of calling `file.close()`) and reuse a variable
with an unrelated reference like this. I've never run into this issue before.
What I did have to figure out, though, shares the same file descriptor aliasing problem. It's async!
Python async has become all the rage, it seems. There are a bunch of gotchas with Python's asyncio
that often trip people up, and one of them is that forking and async don't mix well. First, it's
often not you who's forking: some other library or tool is doing it for you. Second, many people
seem to think Python's event loop is automatically managed for them, that there's always an event
loop available to schedule work. Third, many people writing libraries want to support both async and
sync code, and they often do so by shipping the code as async with a set of thin wrapper functions
around it.
```python
import os
import asyncio

async def foo():
    print("bar")

loop = asyncio.new_event_loop()
loop.run_until_complete(foo())

if os.fork():
    exit()

loop.run_until_complete(foo())
```
If you copy this to a file and run it, you'll see you end up with the same `OSError: [Errno 9] Bad file descriptor` exception. What's going on?
Well, asyncio is built on top of your operating system's socket file descriptor event queue. On
BSD-based systems (like macOS) that's `kqueue(2)`. On Linux, it's `epoll(2)`. For everything else,
there's `select(2)`. You can see how it all works by reading the source of CPython's `select`
module.
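You can peek at which mechanism your platform got. A sketch that relies on `_selector`, a private CPython attribute rather than public API, so it assumes a selector-based loop (Linux or macOS):

```python
import asyncio
import selectors

loop = asyncio.new_event_loop()
try:
    selector = loop._selector           # private CPython attribute, not public API
    name = type(selector).__name__
    print(name)                         # e.g. EpollSelector on Linux, KqueueSelector on macOS
    if hasattr(selector, "fileno"):     # epoll and kqueue queues are themselves fds
        print(selector.fileno())
finally:
    loop.close()
```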
What this means, though, is that the Python async loop is built on top of a file descriptor, one
that can be closed. In fact, the `select` module explicitly sets it up to be closed when the program
forks. It likely does this to keep unaware children from receiving events they don't expect, and to
prevent resource leaks. The problem arises when libraries that previously weren't async suddenly
start using it inside a synchronous codebase that forks.
Often I see these libraries follow a pattern that in its simplest form looks something like this:
```python
import asyncio

async def async_foo():
    print("bar")

def foo():
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(async_foo())
```
The problem is that the default event loop may not exist. One way to prevent this as a library
author is to not assume there's a global event loop. Definitely don't use `asyncio.run()`: that
function tears down the event loop it used when it finishes. Instead, create a new loop of your own.
```python
import asyncio

async def async_foo():
    print("bar")

def foo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(async_foo())
    finally:
        loop.close()
```
A bit of a tip if you're maintaining both sync and async classes or submodules: generate the entire
synchronous set at import time so you don't have to maintain two copies of the same setup. For
example:
```python
import asyncio
import inspect
import functools

class AsyncFoo:
    """Main way to interact with the Foo."""

    async def foo(self):
        print("bar")

class Foo:
    """Synchronous wrapper class for interacting with the Foo."""

    def __init__(self):
        self._async = AsyncFoo()

    @staticmethod
    def _setup_async_proxy():
        """
        Statically proxy async methods of AsyncFoo in Foo at import so
        unittest.mock.patch() can use autospec.
        """
        def async_proxy(name, method):
            @functools.wraps(method)
            def wrapper(self, *args, **kwargs):
                loop = asyncio.new_event_loop()
                try:
                    func = getattr(self._async, name)
                    return loop.run_until_complete(func(*args, **kwargs))
                finally:
                    loop.close()
            return wrapper

        for name, method in vars(AsyncFoo).items():
            if not name.startswith("_") and inspect.iscoroutinefunction(method):
                setattr(Foo, name, async_proxy(name, method))

Foo._setup_async_proxy()
```
It'd be really great if coroutine objects could get a method like
.run_sync() that made
it easier for synchronous code to execute asynchronous code, possibly bypassing all the asynchronous
queuing altogether and just blocking. Then you could run
val = foo().run_sync() in
synchronous code and we could skip all this.
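In the meantime, the closest thing is a plain helper function. A sketch under that assumption (`run_sync` is hypothetical, not part of asyncio, since user code can't add methods to coroutine objects):

```python
import asyncio

def run_sync(coro):
    """Run a coroutine to completion on a private, throwaway loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()  # never touches any global loop, safe around forks

async def foo():
    return "bar"

val = run_sync(foo())
print(val)  # bar
```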
Anyway, all the best. I hope this helped you understand more of what's happening inside asyncio,
avoid a puzzling situation, and maybe even simplify your work maintaining dual ecosystems.