Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-653

Use underscores in module names whose symbols are entirely lifted in new code

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Adopted
    • Resolution: Unresolved
    • Component/s: DM
    • Labels:
      None

      Description

      When a package-level __init__.py file lifts all symbols from a subpackage (not just a simple, one-file module) with from .subpackage import * , the symbols lifted will include not just symbols explicitly imported in the subpackage's __init__.py, but any modules in the subpackage that do not start with a leading underscore.

      (See a recent discussion on slack as an example).

      This is rarely a serious problem, but I'd maintain that it's always at least a minor one: our publicly-exported symbols include many that were intended to be private, and most of our classes and functions have at least two (often more) valid import paths, with no clear indication which is preferred (e.g. lsst.foo.SomeThing or lsst.foo.someThing.SomeThing).  The result is a proliferation of names that refer to the same thing, and more fragile code as names that include implementation details are used instead of their intended public counterparts.  It's possible this also causes duplication in Sphinx, which we've worked around by using automodapi at a lower level than we otherwise might be able to, though I'm not certain of that.

      The recommended way to do this in Python is clear: one should use a leading underscore for symbols that are intended to be private.  What our conventions currently miss is that this should apply to any module or subpackage whose symbols are fully lifted to package scope via a from X import *. Symbols with leading underscores will be automatically skipped by import *, and while it's unfortunately more difficult (and probably not worth the effort) to remove private modules from <type>.__module__ strings, the presence of underscores provide a visual cue to humans that some modules should not be considered part of the public package path of that symbol.{{}}

      Because the damage has largely already been done in existing modules that should have, but lack, a leading underscore, this RFC proposes that we add leading underscores to modules and subpackages whose symbols are lifted with __init__.py in new and significantly refactored code only.  Trying to add underscores to lots of old files is unnecessary churn that hurts the productivity of even developers who are not involved in the "repair" work, but this is something we've clearly been doing wrong all along, and we should start doing it right.

        Attachments

          Issue Links

            Activity

            Hide
            wmwood-vasey Michael Wood-Vasey added a comment - - edited

            I don't like renaming the files with leading underscores. I now have to start looking two places when I'm scanning a list to figure out the file I'm looking for.

            What are the places where we intentionally use from .submodule import * to intentionally lift symbols to elide where they are defined? (as opposed to unintentionally lifting the symbols because the code was developed in a get-it-to-work mode and never fully cleaned up)

            The _all__ mechanism seems a much clearer way of documenting what we are asserting the public API to the module is. I see the tension here between
            1. documenting in place: in the name of the file or the name of the method
            2. documenting clearly at the module level: in _init.py all_.

            (1) is more likely to result in code that is consistent with itself (whether or not it's consistent with what the developer intended), while (2) is much more discoverable to someone new to the code.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - - edited I don't like renaming the files with leading underscores. I now have to start looking two places when I'm scanning a list to figure out the file I'm looking for. What are the places where we intentionally use from .submodule import * to intentionally lift symbols to elide where they are defined? (as opposed to unintentionally lifting the symbols because the code was developed in a get-it-to-work mode and never fully cleaned up) The _ all __ mechanism seems a much clearer way of documenting what we are asserting the public API to the module is. I see the tension here between 1. documenting in place: in the name of the file or the name of the method 2. documenting clearly at the module level: in _ init.py all _ . (1) is more likely to result in code that is consistent with itself (whether or not it's consistent with what the developer intended ), while (2) is much more discoverable to someone new to the code.
            Hide
            jbosch Jim Bosch added a comment -

            What are the places where we intentionally use from .submodule import * to intentionally lift symbols to elide where they are defined? (as opposed to unintentionally lifting the symbols because the code was developed in a get-it-to-work mode and never fully cleaned up)

            The only major package that I can think of where we don't mostly lift packages to module scope is pipe_tasks. And I don't think any of that was the result of being in a hurry and not following through; our policy has always been to lift most symbols to common level (usually lsst.x.y) after implementing them at a level (or, rarely, two) below that.

            If I was starting from scratch, I might want to give using __all__ in __init__.py a try instead (or in addition) to using more leading underscores. It'd certainly make us think more about what symbols we consider public to a module or package's interface and what symbols we consider private, and I think that's a good thing. But there's essentially no way to do that incrementally in any existing package: any new __all__ in an existing __init__.py would have to start by listing all of the symbols in all of the existing modules that package does from X import * on.

            Show
            jbosch Jim Bosch added a comment - What are the places where we intentionally use from .submodule import * to intentionally lift symbols to elide where they are defined? (as opposed to unintentionally lifting the symbols because the code was developed in a get-it-to-work mode and never fully cleaned up) The only major package that I can think of where we don't mostly lift packages to module scope is pipe_tasks. And I don't think any of that was the result of being in a hurry and not following through; our policy has always been to lift most symbols to common level (usually lsst.x.y) after implementing them at a level (or, rarely, two) below that. If I was starting from scratch, I might want to give using __all__ in __init__.py a try instead (or in addition) to using more leading underscores. It'd certainly make us think more about what symbols we consider public to a module or package's interface and what symbols we consider private, and I think that's a good thing. But there's essentially no way to do that incrementally in any existing package: any new __all__ in an existing __init__.py would have to start by listing all of the symbols in all of the existing modules that package does from X import * on.
            Hide
            ktl Kian-Tat Lim added a comment -

            I think Jim has made reasonable arguments here, although I still think the vast majority of the Python ecosystem either lives with the duplication that he can't stand or avoids it by other means. In any case, I think we're ready for concrete changes to the Dev Guide.

            Show
            ktl Kian-Tat Lim added a comment - I think Jim has made reasonable arguments here, although I still think the vast majority of the Python ecosystem either lives with the duplication that he can't stand or avoids it by other means. In any case, I think we're ready for concrete changes to the Dev Guide.
            Hide
            jbosch Jim Bosch added a comment -

            While I still like this idea (for new code!), the lukewarm reception here has made me wary of just adopting this RFC.

            How about a dev guide section that explains the problem, notes both the leading-underscore solution and the __all__ in __init__.py solution, and says developers MAY apply one of those if they wish to discourage direct imports from a lower-level module.

            Show
            jbosch Jim Bosch added a comment - While I still like this idea (for new code!), the lukewarm reception here has made me wary of just adopting this RFC. How about a dev guide section that explains the problem, notes both the leading-underscore solution and the __all__ in __init__.py solution, and says developers MAY apply one of those if they wish to discourage direct imports from a lower-level module.
            Hide
            jbosch Jim Bosch added a comment -

            Adopted with an intent to describe the problem and solutions in the dev guide, while only making "MAY"-level recommendations.

            Show
            jbosch Jim Bosch added a comment - Adopted with an intent to describe the problem and solutions in the dev guide, while only making "MAY"-level recommendations.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Gabriele Comoretto [X] (Inactive), Jim Bosch, John Swinbank, Jonathan Sick, Kian-Tat Lim, Michael Wood-Vasey, Russell Owen, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

                Dates

                Created:
                Updated:
                Planned End:

                  Jenkins

                  No builds found.