Getting to the bottom of the Python import system
Mon Jan 16 2023
E.W. Ayers
What happens when you type import foo.bar.baz in Python? The answer is really complicated! Read this if you've ever found yourself asking:
"Oh my goodness why can't python find my project?"
"Argh how do I import stuff in test files?"
"Why am I getting inscrutible import errors?"
The complexity comes from:
Modules don't have to be backed by a python file.
Modules can have names that are different to their path on disk.
The same module can be broken across multiple packages.
There is no standard way of thinking about python environments.
There is no standard way to package python projects into reusable libraries.
A lot of the implementation details of the module importing system have changed between different versions of python. All of the deprecated constructs are still in there, cluttering up importlib and the docs. In this guide I'm going to pretend the deprecated stuff doesn't exist.
Recommended reading is Chapter 5 of the Python Language Reference.
0.1. What is a module?
A python module is a python object with type ModuleType. Every module has a __name__ attribute. Modules live in a dictionary called sys.modules.
0.2. What is a package?
A package is a module with a __path__ attribute. The idea is that a package is a module that can contain other modules. If a module m is a member of a package p, then m.__package__ == p.__name__.
1. What happens when you import?
(We'll come back to relative imports later.)
When you type import foo.bar.baz as x, this is syntactic sugar for x = importlib.import_module('foo.bar.baz'). If we were to reimplement import_module, it would look something like this:
1. Check the sys.modules cache to see if it's already there.
2. Resolve the module by calling importlib.util.find_spec(name), which returns a thing called a ModuleSpec. A module spec is a load of metadata about the module, plus a Loader object that decides how the module object is created and initialised.
3. Create the module using the given Loader object.
4. Add metadata attributes like __name__ to the module.
5. Add it to sys.modules.
6. Initialise the module.
7. Return the module.
In pseudo-python (a sketch; the real logic lives in importlib._bootstrap._find_and_load, which also handles import locks, parent packages and legacy loaders):
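```python
import sys
import importlib.util

def import_module_sketch(name: str):
    # 1. Check the sys.modules cache.
    if name in sys.modules:
        module = sys.modules[name]
        if module is None:
            raise ModuleNotFoundError(f"import of {name} halted; None in sys.modules")
        return module
    # 2. Resolve the name to a ModuleSpec.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # 3.-4. Create the module and set metadata attributes (__name__, __spec__, ...).
    module = importlib.util.module_from_spec(spec)
    # 5. Add it to sys.modules *before* running it, so circular imports can see it.
    sys.modules[name] = module
    # 6. Initialise the module by executing its code.
    try:
        spec.loader.exec_module(module)
    except BaseException:
        del sys.modules[name]
        raise
    # 7. Return the module.
    return sys.modules[name]
```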
Caveats:
- If sys.modules.get(name) is None (ie the cache maps the name to None), it will always throw a ModuleNotFoundError.
- If spec.loader.exec_module(m) raises an exception, we delete the module from sys.modules before reraising.
- It's possible to make a spec without a loader, or without the loader having create_module (eg legacy loaders use load_module). There is some omitted logic for dealing with these cases. If you need to create a module from a spec (ie everything before the sys.modules[name] = module line), you should use importlib.util.module_from_spec (source).
1.1. What is importlib.util.find_spec doing?
How this works is really complicated. The basic task is to take a module name and spit out a ModuleSpec, which is all of the information needed to load a module into the python runtime.
1.1.1. Summary
Let's start by stating the usual path that find_spec takes:
1. Start with the module name "foo.bar.baz".
2. Make sure the parent modules foo and foo.bar are imported.
3. If there is a parent module, set paths = foo.bar.__path__; otherwise use sys.path. The paths are directories that the import system should look in to find modules. Eg for me numpy.__path__ = ['~/.pyenv/versions/3.10.6/lib/python3.10/site-packages/numpy']. sys.path includes your site-packages directory and the paths of any folders you have run pip install -e on.
4. The system looks in all of the paths directories for either baz.py or baz/__init__.py.
5. If it finds one of those, it returns a ModuleSpec with the loader being a SourceFileLoader.
6. In the case of __init__.py, the module is a package (ie the module's __path__ attribute is set to the directory containing the file).
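You can poke at this from a REPL; a quick illustration (the origin path will depend on your Python installation):

```python
import importlib.util

spec = importlib.util.find_spec("json.decoder")
print(spec.name)                         # 'json.decoder'
print(spec.origin)                       # the path to json/decoder.py in your stdlib
print(spec.loader)                       # a SourceFileLoader instance
print(spec.submodule_search_locations)   # None: json.decoder is not a package

pkg_spec = importlib.util.find_spec("json")
print(pkg_spec.submodule_search_locations)  # the directory that becomes json.__path__
```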
1.1.2. Longer Summary
1. Start with the module name "foo.bar.baz".
2. Make sure the parent modules foo and foo.bar are imported.
3. If there is a parent module, set paths = foo.bar.__path__; otherwise use sys.path.
4. For each 'meta finder' in sys.meta_path, run find_spec("foo.bar.baz", paths).
5. Usually this falls through to the last finder in the sys.meta_path list, called PathFinder.
6. PathFinder runs hook(p).find_spec("foo.bar.baz") for each p in paths and each hook in sys.path_hooks, and returns the first result that doesn't throw an ImportError or return None.
7. Usually this falls through to a FileFinder(p).find_spec('foo.bar.baz'), which does the following.
8. Get the tail module: "baz". We succeed if any of the following exist in the p directory: baz.py, baz/__init__.py, or a plain directory baz/ (called a 'namespace package'; we'll come back to this case).
9. A ModuleSpec is returned with the loader being a SourceFileLoader. If the file found was a .pyc, a SourcelessFileLoader is used instead.
1.1.3. The Gory Details
There is a list of MetaPathFinder objects living in sys.meta_path. You can modify sys.meta_path to include your own things. A MetaPathFinder has one method, find_spec, that returns a module spec given a module name and an optional list of filepaths to look at to find the module.
importlib.util.find_spec will run through all of the finders in sys.meta_path, making sure that parent packages (ie, modules with a __path__ attribute) are imported first. If there is a parent module (eg foo is the parent package of foo.bar), foo.__path__ is passed as the path argument to the finder. In pseudo-python (a sketch of importlib._bootstrap._find_spec, minus caching and error handling):
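```python
import sys
import importlib

def find_spec_sketch(name: str):
    # Make sure the parent package (if any) is imported, and use its __path__.
    parent_name = name.rpartition('.')[0]
    if parent_name:
        parent = importlib.import_module(parent_name)
        path = parent.__path__
    else:
        path = None   # top-level import: PathFinder will fall back to sys.path
    # Ask each meta path finder in turn; the first non-None spec wins.
    for finder in sys.meta_path:
        spec = finder.find_spec(name, path)
        if spec is not None:
            return spec
    return None
```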
There are lots of MetaPathFinders in sys.meta_path that do various things, and libraries like to add their own too. The main, fallback finder is called PathFinder (source), and it essentially does the following (plus caching, error handling, legacy support and 'namespaces'):
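```python
import sys

def path_finder_sketch(name: str, paths=None):
    # Roughly PathFinder.find_spec, ignoring the finder cache
    # (sys.path_importer_cache), namespace packages and error handling.
    if paths is None:
        paths = sys.path
    for p in paths:
        for hook in sys.path_hooks:
            try:
                entry_finder = hook(p)   # eg a zip importer, or FileFinder(p)
            except ImportError:
                continue                 # this hook can't handle this kind of path entry
            spec = entry_finder.find_spec(name)
            if spec is not None:
                return spec
    return None
```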
So, there is a list of functions called sys.path_hooks of type List[Callable[[str], PathEntryFinder]], where each returned PathEntryFinder is yet another abstract class that you have to call find_spec on, this time with no path argument.
In sys.path_hooks, the default two of these 'path hooks' are a zip importer and a FileFinder (source). FileFinder is the main one. A FileFinder is initialised with a path : str, which is the directory that the finder is in charge of searching. FileFinder is also initialised with a list of extension suffixes (x = ".py", ".pyc") and corresponding loaders (SourceFileLoader, SourcelessFileLoader). FileFinder looks for a file p/baz.x or p/baz/__init__.x and returns a ModuleSpec with the relevant loader.
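You can build one by hand to see what it does (the directory here is made up; if it doesn't exist, find_spec just returns None):

```python
import importlib.machinery as machinery

# A FileFinder responsible for one (made-up) directory, wired up with the
# standard loader for ".py" source files.
finder = machinery.FileFinder(
    "/tmp/example_dir",
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
)
print(finder.find_spec("baz"))   # a ModuleSpec if /tmp/example_dir/baz.py exists, else None
```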
1.1.4. How to extend find_spec?
So, if you want to extend the module loading system with your own stuff, you can:
- Add to sys.path_hooks to use your own PathEntryFinders. Do this when you want to be given a path p to the package but do some extra logic beyond looking for baz/__init__.py or baz.py, or if you want to return custom loaders for your own fancy extension.
- Add to sys.meta_path to use your own MetaPathFinder. Do this when you want to add custom logic for finding modules, eg if you wanted to make a finder that downloaded modules from URLs instead of files (see the sketch after this list).
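As a concrete (if useless) example of the second option, here's a toy MetaPathFinder that logs every module lookup and then defers to the rest of the machinery; a real finder would return a ModuleSpec for the modules it owns, eg built with importlib.util.spec_from_loader:

```python
import sys
import importlib.abc

class NoisyFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        # Log the lookup, then return None to let the other finders handle it.
        print(f"looking for {fullname!r} with path={path!r}")
        return None

sys.meta_path.insert(0, NoisyFinder())
import csv   # prints the lookup (assuming csv wasn't already in sys.modules)
```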
1.2. Why is this so complicated?
- Caching: each of the stages I outlined above also has a caching stage. Additionally, you need mechanisms to invalidate the cache so you can do live-reload operations.
- Legacy: there used to just be one finder class called Finder, but this wasn't good enough, because you need to be able to use different finders for different cases, so an extra layer of meta-finders was added to find the finders.
- Nitpicky edge cases:
  - namespace modules
  - packages
  - loading modules from non-python sources
  - loading modules directly from archives
  - lots of different places where packages can be stored: environments, conda, the internet, etc.
2. How does the import system decide to add __path__?
Given any module, you can make it a package by simply adding a __path__ attribute. However, if your module comes from an __init__.py file, the import system will automatically set __path__ to the directory containing that file.
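You can see both cases in the standard library (paths will differ on your machine):

```python
import json, json.decoder

print(json.__path__)    # eg ['/usr/lib/python3.10/json']: json/__init__.py makes it a package
print(hasattr(json.decoder, "__path__"))   # False: a plain module, not a package
```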
3. What about relative imports?
A relative import is an import where the module name being imported starts with a dot, for example from . import foo or from .foo import bar (plain import .foo is a syntax error; relative imports only work with the from form).
In the above case, you take the current module m that is running the relative import; you take its parent package name, m.__package__ (caveats); and you prepend that to foo and do an absolute import. If there are multiple dots, as in from ..foo import bar, you repeat the parent-finding process once for each extra dot.
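The same resolution is exposed as a helper function, which is handy for checking what a relative name turns into (mypkg is a made-up package name here; resolve_name only does the string manipulation and doesn't import anything):

```python
import importlib.util

print(importlib.util.resolve_name(".config", package="mypkg"))            # 'mypkg.config'
print(importlib.util.resolve_name("..other.thing", package="mypkg.sub"))  # 'mypkg.other.thing'
```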
This definition of relative import sucks, because it means that your python files need to be inside a package in order to import from each other. The shortcut way to do this is to just add __init__.py files everywhere.
I recommend never using relative imports except inside __init__.py files. It's just not worth it.
4. What are namespace packages?
A namespace package is a python package that doesn't have an associated module (ie no __init__.py). The idea is that you can split one package across multiple directories. See this Stack Overflow answer for more detail. Adding namespace packages complicates the logic for find_spec.
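For example, given two directories on sys.path that each contain a plugins/ directory with no __init__.py (made-up paths), the two halves get merged into one package:

```python
import sys

# /opt/app_a/plugins/foo.py and /opt/app_b/plugins/bar.py (illustrative)
sys.path += ["/opt/app_a", "/opt/app_b"]

import plugins
print(plugins.__path__)          # a _NamespacePath spanning both plugins/ directories
import plugins.foo, plugins.bar  # both importable from the single 'plugins' package
```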
5. Sadly, __main__.
When you execute a python file with python foo.py, the given file is not loaded as the module foo. Instead, it is loaded as a special module called __main__. The main problem this causes is that it breaks relative imports, since the __main__ module does not have a parent package set in __package__, so any relative import fails with "attempted relative import with no known parent package". The usual recommendation seems to be that you should just avoid using relative imports.
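A quick way to see this (foo.py here is any throwaway file):

```python
# foo.py
print(__name__, __package__)
# `python foo.py` -> prints '__main__' and no parent package, so a line like
#                    `from . import bar` here would raise
#                    ImportError: attempted relative import with no known parent package
# `import foo`    -> prints 'foo' and its parent package (empty for a top-level module)
```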
6. Importing resources
[todo] this section is still under construction [todo]
Another cool thing that you can do with the Python import system is 'import' files that are not Python files. You can import data files or executable binaries.
Usually, if you want to get a file from a Python script you will call open('path/to/file'), but this assumes that you know where the file is on disk. By 'importing' files, you can ensure that the files are present wherever your Python package is called from, even if it is downloaded from PyPI.
There are two sites that told me this existed:
importlib-resources which looks semi-official. I think what happened is it used to be its own library that got integrated into core.
I'll try to stick with the example given in 'importlib-resources'. Suppose we have a folder structure like this (the names here are my own illustration):
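```
mypkg/
    __init__.py
    foo.py
    data.txt
```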
Now in foo.py I can write something like this (a sketch; resources.files is the importlib.resources API available since Python 3.9):
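```python
# mypkg/foo.py
from importlib import resources

def read_data() -> str:
    # resources.files() returns a Traversable rooted at the installed package,
    # wherever it actually lives (source checkout, site-packages, inside a zip, ...).
    return resources.files("mypkg").joinpath("data.txt").read_text()
```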
7. Module resolution failures that always get me
7.1. Basic importing from a directory is broken
Suppose our working directory looks something like this (a.py imports from the asdf directory, and b.py imports its sibling c.py):
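```
a.py        # from asdf.b import Y
asdf/
    b.py    # from asdf.c import X  (and defines Y)
    c.py    # defines X
```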
If I run python asdf/b.py, it will refuse to resolve c.py (no module named asdf).
If I run python a.py, it will be ok!
One answer is to replace the import in b.py with from c import X. Then you can run python asdf/b.py and it's ok. But now, if a.py has a line from asdf.b import Y, we will get "no module named c".
I can't see how this is anything other than a flaw in Python. There is no way to import between the directories that doesn't break.
I usually get around this by making the root project folder an installable package with a pyproject.toml, and then running pip install -e . from the project root. But it's so miserable that I have to do that.
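For reference, the minimal version of that fix is something like this (a sketch; the project and package names come from the example above, and asdf/ needs an __init__.py for setuptools to treat it as a package):

```toml
# pyproject.toml at the project root (next to a.py and asdf/)
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "asdf"
version = "0.1.0"

[tool.setuptools]
packages = ["asdf"]   # be explicit rather than relying on auto-discovery
```

After pip install -e ., from asdf.c import X works no matter which directory you run python from.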