Details
Type: RFC
Status: Adopted
Resolution: Unresolved
Component/s: DM
Labels: None
Description
When a package-level __init__.py file lifts all symbols from a subpackage (not just a simple, one-file module) with from .subpackage import *, the lifted symbols include not just those explicitly imported in the subpackage's __init__.py, but also any imported modules in the subpackage whose names do not start with a leading underscore.
(See a recent discussion on Slack for an example.)
This is rarely a serious problem, but I'd maintain that it's always at least a minor one: our publicly-exported symbols include many that were intended to be private, and most of our classes and functions have at least two (often more) valid import paths, with no clear indication which is preferred (e.g. lsst.foo.SomeThing or lsst.foo.someThing.SomeThing). The result is a proliferation of names that refer to the same thing, and more fragile code as names that include implementation details are used instead of their intended public counterparts. It's possible this also causes duplication in Sphinx, which we've worked around by using automodapi at a lower level than we otherwise might be able to, though I'm not certain of that.
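The lifting behaviour described above can be reproduced with a minimal sketch. The package name leak_demo and its layout are hypothetical, built in a temporary directory purely for illustration; only SomeThing and someThing echo the names used above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, written to a temporary directory:
#   leak_demo/__init__.py          from .sub import *
#   leak_demo/sub/__init__.py      from .someThing import *
#   leak_demo/sub/someThing.py     class SomeThing
root = Path(tempfile.mkdtemp())
sub_dir = root / "leak_demo" / "sub"
sub_dir.mkdir(parents=True)
(root / "leak_demo" / "__init__.py").write_text("from .sub import *\n")
(sub_dir / "__init__.py").write_text("from .someThing import *\n")
(sub_dir / "someThing.py").write_text("class SomeThing:\n    pass\n")

sys.path.insert(0, str(root))
leak_demo = importlib.import_module("leak_demo")

# Public (non-underscore) names now visible at package scope.
# The module name "someThing" is among them, not only the class "SomeThing".
leaked_names = sorted(n for n in vars(leak_demo) if not n.startswith("_"))
print(leaked_names)

# Both import paths resolve to the same class object.
same_object = leak_demo.SomeThing is leak_demo.someThing.SomeThing
print(same_object)
```

The module object someThing ends up at package scope because importing a submodule binds it as an attribute of its parent package, and import * then copies every non-underscore name upward.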
The recommended way to mark symbols as private in Python is clear: give them a leading underscore. What our conventions currently miss is that this should also apply to any module or subpackage whose symbols are fully lifted to package scope via from X import *. Symbols with leading underscores are skipped automatically by import *, and while it's unfortunately more difficult (and probably not worth the effort) to remove private modules from <type>.__module__ strings, the presence of underscores provides a visual cue to humans that some modules should not be considered part of the public package path of that symbol.
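As a rough sketch of the proposed convention, the hypothetical package private_demo below (names and layout are assumptions for illustration only) gives both the subpackage and the module a leading underscore. Only the class is lifted to package scope, while the __module__ string still records the private path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout following the proposed convention:
#   private_demo/__init__.py            from ._sub import *
#   private_demo/_sub/__init__.py       from ._someThing import *
#   private_demo/_sub/_someThing.py     class SomeThing
root = Path(tempfile.mkdtemp())
sub_dir = root / "private_demo" / "_sub"
sub_dir.mkdir(parents=True)
(root / "private_demo" / "__init__.py").write_text("from ._sub import *\n")
(sub_dir / "__init__.py").write_text("from ._someThing import *\n")
(sub_dir / "_someThing.py").write_text("class SomeThing:\n    pass\n")

sys.path.insert(0, str(root))
private_demo = importlib.import_module("private_demo")

# Only the class is public at package scope; _sub and _someThing are skipped.
public_names = sorted(n for n in vars(private_demo) if not n.startswith("_"))
print(public_names)

# The private path is still visible here, but the underscores flag it as such.
module_path = private_demo.SomeThing.__module__
print(module_path)
```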
Because the damage has largely already been done in existing modules that should have, but lack, a leading underscore, this RFC proposes that we add leading underscores to modules and subpackages whose symbols are lifted in __init__.py, in new and significantly refactored code only. Adding underscores to lots of old files is unnecessary churn that hurts the productivity even of developers who are not involved in the "repair" work, but this is something we've clearly been doing wrong all along, and we should start doing it right.
I don't like renaming the files with leading underscores. I now have to start looking in two places when I'm scanning a list for the file I want.
Where do we deliberately use from .submodule import * to lift symbols and elide where they are defined (as opposed to lifting them unintentionally because the code was developed in a get-it-to-work mode and never fully cleaned up)?
The __all__ mechanism seems a much clearer way of documenting what we assert the module's public API to be. I see a tension here between
1. documenting in place: in the name of the file or the name of the method
2. documenting clearly at the module level: in the __all__ of __init__.py.
(1) is more likely to result in code that is consistent with itself (whether or not it's consistent with what the developer intended), while (2) is much more discoverable to someone new to the code.
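For comparison, option (2) might look like the following sketch. The package all_demo, its layout, and the helper function are hypothetical and exist only for illustration. The subpackage keeps un-prefixed file names but declares __all__ in its __init__.py, which limits what a star import lifts.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout using __all__ instead of underscore-prefixed files:
#   all_demo/__init__.py          from .sub import *
#   all_demo/sub/__init__.py      from .someThing import *
#                                 __all__ = ["SomeThing"]
#   all_demo/sub/someThing.py     class SomeThing, def helper
root = Path(tempfile.mkdtemp())
sub_dir = root / "all_demo" / "sub"
sub_dir.mkdir(parents=True)
(root / "all_demo" / "__init__.py").write_text("from .sub import *\n")
(sub_dir / "__init__.py").write_text(
    'from .someThing import *\n__all__ = ["SomeThing"]\n'
)
(sub_dir / "someThing.py").write_text(
    "class SomeThing:\n    pass\n\ndef helper():\n    pass\n"
)

sys.path.insert(0, str(root))
all_demo = importlib.import_module("all_demo")

# Names lifted to package scope; "someThing" and "helper" are not among them.
# The subpackage "sub" itself still appears because the import system binds it.
exported = sorted(n for n in vars(all_demo) if not n.startswith("_"))
print(exported)
```

Note that all_demo.sub.someThing.SomeThing remains a valid import path in this sketch: __all__ controls what import * lifts, not which modules can be reached by name.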