Getting to the bottom of the Python import system
Mon Jan 16 2023 · E.W.Ayers
What happens when you type `import foo.bar.baz` in Python? The answer is really complicated! Read this if you've ever found yourself asking:
- "Oh my goodness, why can't Python find my project?"
- "Argh, how do I import stuff in test files?"
- "Why am I getting inscrutable import errors?"
The complexity comes from:
- Modules don't have to be backed by a Python file.
- Modules can have names that are different from their path on disk.
- The same package can be spread across multiple directories.
- There is no standard way of thinking about Python environments.
- There is no standard way to package Python projects into reusable libraries.
A lot of the implementation details of the module importing system have changed between different versions of Python. All of the deprecated constructs are still in there, cluttering up `importlib` and the docs. In this guide I'm going to pretend the deprecated stuff doesn't exist.
Recommended reading is Chapter 5 of the Python Language Reference.
0.1. What is a module?
A Python module is a Python object with type `ModuleType`. Every module has a `__name__` attribute. Loaded modules live in a dictionary called `sys.modules`, keyed by name.
0.2. What is a package?
A package is a module with a `__path__` attribute. The idea is that a package is a module that can contain other modules. If a module `m` is a member of a package `p`, we have `m.__package__ == p.__name__` (note that `__package__` stores the parent package's name as a string).
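These definitions are easy to check directly; here is a quick sanity check using the stdlib `json` package:

```python
import importlib
import types

m = importlib.import_module("json.decoder")
assert isinstance(m, types.ModuleType)  # modules are ModuleType objects
assert m.__name__ == "json.decoder"     # every module has a __name__
assert m.__package__ == "json"          # the name of its parent package

import json
assert hasattr(json, "__path__")        # json is a package...
assert not hasattr(m, "__path__")       # ...but json.decoder is a plain module
```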
1. What happens when you import?
We'll come back to relative imports.
When you type `import foo.bar.baz as x`, this is syntactic sugar for `x = importlib.import_module('foo.bar.baz')`. If we were to reimplement `import_module`, it would look something like this:
1. Check the `sys.modules` cache to see if the module is already there.
2. Resolve the module by calling `importlib.util.find_spec(name)`, which returns a thing called a `ModuleSpec`. A module spec is a load of metadata about the module, plus a `Loader` object that decides how the module object is created and initialised.
3. Create the module using the given loader.
4. Add metadata attributes like `__name__` to the module.
5. Add it to `sys.modules`.
6. Initialise the module by running its code with `spec.loader.exec_module(module)`.
7. Return the module.
Two quirks: if `name` is present in `sys.modules` but `sys.modules.get(name) is None`, then importing it will always throw an `ImportError`. And if `spec.loader.exec_module(m)` raises an exception, we delete the half-initialised module from `sys.modules`.
It's possible to make a spec without a loader, or with a loader that doesn't implement `create_module`. E.g. legacy loaders use `load_module`. There is some omitted logic for dealing with these cases. If you need to create a module from a spec (i.e. everything before the `sys.modules[name] = module` line), you should use `importlib.util.module_from_spec(spec)`.
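Putting those steps together, a stripped-down reimplementation (a sketch only: it ignores parent packages, import locks, and legacy loaders) might look like:

```python
import importlib.util
import sys


def import_module_sketch(name: str):
    # 1. Check the sys.modules cache.
    if name in sys.modules:
        module = sys.modules[name]
        if module is None:
            raise ImportError(f"import of {name} halted; None in sys.modules")
        return module
    # 2. Resolve the name to a ModuleSpec.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # 3-4. Create the module object and set __name__, __spec__, etc.
    module = importlib.util.module_from_spec(spec)
    # 5. Cache it *before* executing, so circular imports can see it.
    sys.modules[name] = module
    # 6. Initialise the module by executing its code.
    try:
        spec.loader.exec_module(module)
    except BaseException:
        del sys.modules[name]
        raise
    # 7. Return the module.
    return sys.modules[name]
```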
1.1. What is `find_spec` doing?
How this works is really complicated. The basic task is to take a module name and spit out a `ModuleSpec`, which is all of the information needed to load a module into the Python runtime. Let's start by stating the usual path that `find_spec` takes.
1.1.1. Short Summary
- Start with the module name, e.g. `foo.bar.baz`.
- Make sure the parent modules `foo` and `foo.bar` are imported first.
- If there is a parent module, set `paths = foo.bar.__path__`, or use `sys.path` otherwise. The paths are directories that the import system should look in to find modules. E.g. for me, `numpy.__path__ == ['~/.pyenv/versions/3.10.6/lib/python3.10/site-packages/numpy']`.
Typically `sys.path` contains your site-packages directory, the directory of the script you are running, and the paths of any projects you have installed with `pip install -e`.
- The system looks in all of the `paths` directories for either `baz.py` or `baz/__init__.py`.
- If it finds one of those, it returns a `ModuleSpec` with the loader being a `SourceFileLoader`.
- In the case of `baz/__init__.py`, the module is a package (i.e. the module's `__path__` attribute is set to be the directory of the file).
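You can watch this happen with `importlib.util.find_spec`:

```python
import importlib.util

spec = importlib.util.find_spec("logging")
print(spec.name)                        # logging
print(spec.origin)                      # .../logging/__init__.py
print(spec.submodule_search_locations)  # this becomes logging.__path__
```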
1.1.2. Longer Summary
- Start with the module name, e.g. `foo.bar.baz`.
- Make sure the parent modules `foo` and `foo.bar` are imported first.
- If there is a parent module, set `paths = foo.bar.__path__`, or `sys.path` otherwise.
- For each 'meta finder' in `sys.meta_path`, call its `find_spec` method and return the first spec that isn't `None`.
- Usually, this falls through to the last finder in `sys.meta_path`: `PathFinder`.
- `PathFinder` runs, for each `p in paths` and each 'hook' `hook in sys.path_hooks`, `hook(p).find_spec("foo.bar.baz")`, and returns the first result from a hook that doesn't throw an `ImportError`.
- Usually, this falls through to a `FileFinder(p).find_spec('foo.bar.baz')`, which does the following:
  - Get the tail module name: `"baz"`. We succeed if any of the following exist in the directory `p`: a file `baz.py`, a file `baz/__init__.py`, or a bare directory `baz/` (called a 'namespace package'; we'll come back to this case).
  - A `ModuleSpec` is returned with the loader being a `SourceFileLoader`. If the matched suffix was `.pyc` or a native extension (`.so`/`.pyd`), a different loader (`SourcelessFileLoader` or `ExtensionFileLoader`) is used instead.
1.1.3. The Gory Details
There is a list of `MetaPathFinder` objects living in `sys.meta_path`. You can modify `sys.meta_path` to include your own things. A `MetaPathFinder` has one method, `find_spec`, that returns a module spec given a module name and an optional list of filepaths to look at to find the module. `importlib.util.find_spec` will run through all of the finders in `sys.meta_path`, making sure that parent packages (i.e. modules with a `__path__` attribute) are imported first. If there is a parent module (e.g. `foo` is the parent package of `foo.bar`), then `foo.__path__` is passed as the `path` argument to the finder. The pseudocode for this is below.
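Roughly (eliding error handling and some parent-import corner cases):

```python
import importlib
import sys


def find_spec_sketch(name: str, path=None):
    # If the module has a parent package, import the parent first and
    # use its __path__ as the list of places to search for the child.
    parent_name = name.rpartition(".")[0]
    if parent_name:
        parent = importlib.import_module(parent_name)
        path = parent.__path__
    # Ask each meta path finder in turn; the first non-None spec wins.
    for finder in sys.meta_path:
        spec = finder.find_spec(name, path)
        if spec is not None:
            return spec
    return None
```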
There are lots of finders in `sys.meta_path` that do various things, and libraries like to add their own too. The main, fallback finder is called `PathFinder` (source) and essentially does the following (plus caching, error handling, legacy code, and 'namespaces'):
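In sketch form:

```python
import sys


def path_finder_sketch(name: str, paths):
    # Try every directory on the search path with every registered hook.
    for p in paths:
        for hook in sys.path_hooks:
            try:
                entry_finder = hook(p)
            except ImportError:
                continue  # this hook doesn't know how to handle p
            spec = entry_finder.find_spec(name)
            if spec is not None:
                return spec
    return None
```

(The real `PathFinder` also caches `hook(p)` results in `sys.path_importer_cache` so hooks don't rerun on every import.)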
So, there is a list of functions called `sys.path_hooks` of type `List[Callable[[str], PathEntryFinder]]`, where each returned `PathEntryFinder` is yet another abstract class that you have to call `find_spec` on, this time with no `path` argument. In `sys.path_hooks`, the default two of these 'path hooks' are a zip importer and a `FileFinder` hook; `FileFinder` is the main one. A `FileFinder` is initialised with a `path : str`, which is the directory that the finder is in charge of searching. A `FileFinder` is also initialised with a list of extension suffixes (e.g. `".py"`, `".pyc"`, `".so"`) and the loader to use for each. `FileFinder` looks for a file `p/baz.x` or `p/baz/__init__.x` for each suffix `x`, and returns a `ModuleSpec` with the relevant loader.
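You can drive a `FileFinder` by hand to see this:

```python
import importlib.machinery as machinery
import sysconfig

stdlib_dir = sysconfig.get_paths()["stdlib"]
# A FileFinder that knows about plain .py sources in the stdlib directory.
finder = machinery.FileFinder(
    stdlib_dir,
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
)
spec = finder.find_spec("json")  # finds the stdlib json/__init__.py
print(spec.origin)
```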
1.1.4. How to extend it
So, if you want to extend the module loading system with your own stuff, you can:
- Modify `sys.path_hooks` to use your own `PathEntryFinder`s. Do this when you want to be given a path `p` to the package, but do some extra logic beyond looking for `baz.py`. Or if you want to return custom loaders for your own fancy extension.
- Modify `sys.meta_path` to use your own `MetaPathFinder`. Do this when you want to add custom logic for finding modules, e.g. if you wanted to make a finder that downloaded modules from URLs instead of reading files.
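As a toy example of the second option, here is a `MetaPathFinder`/`Loader` pair (all the names here are made up) that serves modules from an in-memory dict of source strings:

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical registry: module name -> source code.
VIRTUAL_MODULES = {
    "greeting": "def hello():\n    return 'hi from a virtual module'\n",
}


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name in VIRTUAL_MODULES:
            return importlib.util.spec_from_loader(name, self)
        return None  # not ours; let the next finder try

    def create_module(self, spec):
        return None  # None means "use the default module creation"

    def exec_module(self, module):
        exec(VIRTUAL_MODULES[module.__name__], module.__dict__)


sys.meta_path.insert(0, DictFinder())

import greeting
print(greeting.hello())  # hi from a virtual module
```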
1.2. Why is this so complicated?
Caching: Each of the stages I outlined above also has a caching stage. Additionally, you need mechanisms to invalidate the cache so you can do live-reload operations.
Legacy: there used to just be one finder class called
Finder, but this wasn't good enough because you need to be able to use different finders for different cases, so an extra layer of meta-finders was added to find the finders.
Nitpicky edge cases:
- loading modules from non-Python source
- loading modules direct from archives
- lots of different places where packages can be stored: environments, conda, the internet, etc.
2. How does the import system decide to add `__path__`?
Given any module, you can make it a package by simply adding a `__path__` attribute. However, if your module is backed by an `__init__.py` file, the import system will automatically set `__path__` to the directory containing that file.
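For example, the stdlib `email` package is backed by an `__init__.py`, and its `__path__` points at the containing directory:

```python
import email
import os

assert os.path.basename(email.__spec__.origin) == "__init__.py"
# __path__ was set to the directory holding that __init__.py:
assert email.__path__[0] == os.path.dirname(email.__spec__.origin)
print(email.__path__)
```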
3. What about relative imports?
A relative import is an import where the module name being imported starts with a dot, for example `from . import foo`. In that case, you take the current module `m` that is running the import; you take its parent package name, `m.__package__` (caveats apply); and you prepend that to `foo` and do an absolute import.
If there are multiple dots, as in `from .. import foo`, you repeat the parent-finding process once for each extra dot.
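The standard library exposes this name arithmetic as `importlib.util.resolve_name` (the package `pkg.sub` here is hypothetical; `resolve_name` is pure string manipulation and doesn't import anything):

```python
import importlib.util

# `from . import foo` inside the package pkg.sub:
print(importlib.util.resolve_name(".foo", package="pkg.sub"))   # pkg.sub.foo
# `from .. import foo`: each extra dot strips one more package level:
print(importlib.util.resolve_name("..foo", package="pkg.sub"))  # pkg.foo
```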
This definition of relative import sucks, because it means that your Python files need to be inside a package in order to import from each other. The shortcut way to do this is to just add `__init__.py` files everywhere.
I recommend never using relative imports except inside `__init__.py` files. It's just not worth it.
4. What are namespace packages?
A namespace package is a Python package that doesn't have an associated module (i.e. no `__init__.py`). The idea is that you can split a package across multiple directories. See this Stack Overflow answer for more detail. Namespace packages complicate the finder logic sketched above.
5. What about `python foo.py`?
When you execute a Python file with `python foo.py`, the given file is not loaded as the module `foo`. Instead, it is loaded as a special module called `__main__`. The main problem that this causes is that it breaks relative imports, since the `__main__` module does not have a `__package__` attribute set. The main recommendation seems to be that you should just avoid using relative imports in files you intend to run directly.
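You can see this by writing a throwaway script and running it the way you'd run any script:

```python
import pathlib
import subprocess
import sys
import tempfile

# A script that reports which module it was loaded as.
script = pathlib.Path(tempfile.mkdtemp()) / "foo.py"
script.write_text("print(__name__)\n")

out = subprocess.run([sys.executable, str(script)],
                     capture_output=True, text=True)
print(out.stdout.strip())  # __main__  (not "foo"!)
```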
6. Importing resources
[todo] this section is still under construction [todo]
Another cool thing that you can do with the Python import system is 'import' files that are not Python files. You can import data files or executable binaries.
Usually, if you want to get a file from a Python script you will call
open('path/to/file'), but this assumes that you know where the file is on disk. By 'importing' files, you can ensure that the files are present wherever your Python package is called from, even if it is downloaded from PyPI.
There are two sites that told me this existed:
importlib-resources, which looks semi-official. I think what happened is that it used to be its own library which then got integrated into the standard library as `importlib.resources`.
I'll try to keep with the example given in 'importlib-resources': a package containing a data file alongside its Python modules, which the package's own code reads at runtime.
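A sketch of the idea, building the example package on the fly (the `mypkg` layout here is invented for illustration; in a real project it would just live in your source tree):

```python
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Fake up the folder structure:  mypkg/__init__.py, mypkg/data.txt
root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
pkg.mkdir()
(pkg / "__init__.py").touch()
(pkg / "data.txt").write_text("hello resources")
sys.path.insert(0, str(root))

# The actual API: look up data.txt relative to the package, wherever
# the installed package happens to live (wheel, zip, editable install...).
text = resources.files("mypkg").joinpath("data.txt").read_text()
print(text)  # hello resources
```

`resources.files` (Python 3.9+) is the point: unlike `open("mypkg/data.txt")`, it resolves the file relative to wherever the package was actually installed.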
7. Module resolution failures that always get me
7.1. Basic importing from a directory is broken
Suppose our working directory contains a file `a.py` at the top level and a folder `asdf/` containing `b.py` and `c.py`, where `b.py` contains the line `from asdf.c import X`.
- If I run `python asdf/b.py`, it will refuse to resolve `c.py` (no module named `asdf`).
- If I run `python a.py` (which imports `asdf.b`), it will be ok!
One answer is to replace the import in `b.py` with `from c import X`. Then you can run `python asdf/b.py` and it's ok. But now, if I add a line `from asdf.b import Y` to `a.py`, we will get "no module named `c`".
I can't see how this is anything other than a flaw in Python. There is no way to import between the directories that doesn't break.
I usually get around this by making the root project folder a package with a `pyproject.toml`, and then running `pip install -e .`. But it's so miserable that I have to do that.
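For reference, the `pyproject.toml` can be nearly empty; a minimal sketch (the project name is invented) using setuptools:

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "myproject"   # invented name
version = "0.1.0"
```

After `pip install -e .`, the project is importable from anywhere in that environment, so absolute imports like `from asdf.c import X` resolve no matter which file you run.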