Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-795

Reorganize general purpose python utility code in utils/daf_butler/pipe_base

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM, TCT
    • Labels:
      None

      Description

      To simplify dependencies of daf_butler and pipe_base on C++ packages we propose:

      • Move the C++ code out of utils and into a cpputils package.
      • Have utils depend on cpputils from an EUPS perspective for backwards compatibility during transition but have no code in utils depending on cpputils. utils itself will not depend on any other lsst packages (not base or pex_exceptions).
      • Write python version of getPackageDir to be retained in utils package (only afw uses the C++ version)
      • Move the utility code out of daf_butler and pipe_base into utils. Leave forwarding and deprecation messages behind in pipe_base – the lsst.daf.butler.core module is not meant to be public anyhow so no forwarding/deprecation is required.
      • Change license of utils to BSD three clause.
      • Move ButlerURI to entirely new (BSD3) package. Name not yet certain but likely to be butleruri, nor whether it has an "lsst." module prefix (astro_metadata_translator does not).

      Original Text

      daf_butler and pipe_base include general purpose utility code that would benefit from being factored out into separate packages. daf_butler also uses one routine from utils (it was originally in daf_butler).

      utils is not a good dependency fit for pipe_base or daf_butler because it has C++ code and therefore also brings in pex_exceptions and base. Splitting utils into a C++ package and a python package would help with this but more specific packages would be better.

      • Put logging into log_utils. Now that Task has a special LoggerAdapter it would be nice if anyone could have access to .verbose and .trace without needing pipe_base. daf_butler currently defines its own VERBOSE constant. There are also the timer log utilities in pipe_base and separate ones in daf_butler.
      • test_utils – there are a number of test routines in utils and daf_butler that could be combined.

      Should these be imported as lsst.testutils and lsst.logutils – there are issues with namespace clashes for lsst.log.utils and lsst.utils.test.

      There has also been a proposal for a pybind11_utils package from Jim Bosch.

      The doImport routine in utils also needs a place outside of utils. I would welcome suggestions for a name.

      If there is resistance to multiple small packages being created, one option is to have a new python-only utils package (pyutils ? Although import lsst.pyutils seems redundant) containing the python-only code from utils and the utility code from butler and pipe_base.

      Additionally, I would like to extract the general purpose ButlerURI code from daf_butler and put it in its own package. This code forms the basis of all the datastore abstractions in butler but is a general purpose class for reading from and writing to object stores and local files without the caller code caring about the details (it supports WebDAV, S3, local files, pkg_resources). There are places here it would be convenient to be able to use this functionality without having to depend on daf_butler. By default I would call the package butlerurifrom lsst.butleruri import ButlerURI. It should be standalone (not part of another utils package) and I intend to put it on PyPI. I would also like permission to relicense it as BSD 3 clause.

      Any functions that exist in a formal release would have deprecated wrappers added in the original location that would forward to the new name.

        Attachments

          Issue Links

            Activity

            Hide
            gpdf Gregory Dubois-Felsmann added a comment -

            Tatiana Goldina is away, but I will take a look at this from our outside-reuse viewpoint. Superficially it is certainly going in the right direction!

            Show
            gpdf Gregory Dubois-Felsmann added a comment - Tatiana Goldina is away, but I will take a look at this from our outside-reuse viewpoint. Superficially it is certainly going in the right direction!
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Can you be more specific about what would be moved where (preferably in the form of a list)? utils is kind of a mixed bag, and I don't know if anybody knows about everything that's in there, and the references to utility code in the higher-level packages are likewise ambiguous.

            I'm not sure a pybind11_utils would be worth the effort, given that our pybind11 support at the moment is 4 C++ headers that have not been expanded in several years.

            Show
            krzys Krzysztof Findeisen added a comment - - edited Can you be more specific about what would be moved where (preferably in the form of a list)? utils is kind of a mixed bag, and I don't know if anybody knows about everything that's in there, and the references to utility code in the higher-level packages are likewise ambiguous. I'm not sure a pybind11_utils would be worth the effort, given that our pybind11 support at the moment is 4 C++ headers that have not been expanded in several years.
            Hide
            tjenness Tim Jenness added a comment - - edited

            This is the content that I am talking about specifically:

            utils

            • Python
            • deprecated (deprecate_pybind11 and suppress_deprecations)
            • doImport
            • get_caller_name
            • inheritDoc
            • test support code
            • C++ (pulls in base and pex_exceptions)
            • getPackageDir
            • demangle
            • backtrace

            daf_butler

            lsst.daf.butler.core.ButlerURI

            Abstracted access to file resources. Automatically understands what to do with file:, s3:, https: and also python pkg_resources.
            Would be useful outside of butler and would be even more useful if it was on pypi and had BSD license.

            lsst.daf.butler.core.utils

            • safeMakeDir (no longer needed now that os.makedirs takes exist_ok parameter)
            • iterable (turn argument into an iterable, protecting str)
            • allSlots (combines all slots in all parents)
            • getFullTypeName
            • getClassOf
            • getInstanceOf
            • Singleton (force class to be a singleton)
            • transactional (decorator to call `with self.transaction`) – may be too specific
            • stripIfNotNone (for stripping strings that might be None)
            • immutable (decorator for class)
            • cached_getter
            • globToRegex
            • isplit (str.split but using iterator to minimize memory footprint)

            lsst.daf.butler.core.config

            Dict-like class that supports hierarchical keys. Supports yaml and json serialization, and merging of configs.

            lsst.daf.butler.core.time_utils

            Code for converting to and from Astropy Time to integer nanoseconds and also supporting serialization of Astropy Time to YAML.

            lsst.daf.butler.core.progress

            Support for adding progress bars to command line applications.

            lsst.daf.butler.core.logging

            Logging support for Butler commands.

            • VERBOSE log level
            • ButlerMDC (allow additional content to be inserted into log records)
            • ButlerLogRecord (a LogRecord-like class that accepts MDC)
            • ButlerLogRecords (a collection of ButlerLogRecord)
            • JsonLogFormatter (log formatter to write out JSON log records)
            • ButlerLogRecordHandler (logging handler to accumulate ButlerLogRecord records)
            • PrecisionLogFormatter (log formatter that reports timestamps in milliseconds)

            pipe_base

            lsst.pipe.base.task_logging

            • Constants for TRACE and VERBOSE Logging
            • TaskLogAdapter – logger for Task that supports .trace and .verbose
            • `getLogger` equivalent that returns `TaskLogAdapter` rather than `Logger`

            lsst.pipe.base.timer

            The timer functions are completely generic and not Task-specific.

            So one option is something like:

            • Split out python from utils package to new package (name unknown: lsst.rubin.utils?).
            • Have utils forward for a time for backwards compatibility.
            • Move most of the code discussed above into that new pyutils package. (including lsst.rubin.utils.logging?)
            • Pull ButlerURI into a brand new package (not sure what people would want to call that).
            • Put lsst.rubin.utils onto pypi and allow BSD 3 clause license to allow it to be used by others.
            Show
            tjenness Tim Jenness added a comment - - edited This is the content that I am talking about specifically: utils Python deprecated (deprecate_pybind11 and suppress_deprecations) doImport get_caller_name inheritDoc test support code C++ (pulls in base and pex_exceptions) getPackageDir demangle backtrace daf_butler lsst.daf.butler.core.ButlerURI Abstracted access to file resources. Automatically understands what to do with file: , s3:, https: and also python pkg_resources. Would be useful outside of butler and would be even more useful if it was on pypi and had BSD license. lsst.daf.butler.core.utils safeMakeDir (no longer needed now that os.makedirs takes exist_ok parameter) iterable (turn argument into an iterable, protecting str) allSlots (combines all slots in all parents) getFullTypeName getClassOf getInstanceOf Singleton (force class to be a singleton) transactional (decorator to call `with self.transaction`) – may be too specific stripIfNotNone (for stripping strings that might be None) immutable (decorator for class) cached_getter globToRegex isplit (str.split but using iterator to minimize memory footprint) lsst.daf.butler.core.config Dict-like class that supports hierarchical keys. Supports yaml and json serialization, and merging of configs. lsst.daf.butler.core.time_utils Code for converting to and from Astropy Time to integer nanoseconds and also supporting serialization of Astropy Time to YAML. lsst.daf.butler.core.progress Support for adding progress bars to command line applications. lsst.daf.butler.core.logging Logging support for Butler commands. VERBOSE log level ButlerMDC (allow additional content to be inserted into log records) ButlerLogRecord (a LogRecord-like class that accepts MDC) ButlerLogRecords (a collection of ButlerLogRecord) JsonLogFormatter (log formatter to write out JSON log records) ButlerLogRecordHandler (logging handler to accumulate ButlerLogRecord records) PrecisionLogFormatter (log formatter that reports timestamps in milliseconds) pipe_base lsst.pipe.base.task_logging Constants for TRACE and VERBOSE Logging TaskLogAdapter – logger for Task that supports .trace and .verbose `getLogger` equivalent that returns `TaskLogAdapter` rather than `Logger` lsst.pipe.base.timer The timer functions are completely generic and not Task-specific. So one option is something like: Split out python from utils package to new package (name unknown: lsst.rubin.utils?). Have utils forward for a time for backwards compatibility. Move most of the code discussed above into that new pyutils package. (including lsst.rubin.utils.logging?) Pull ButlerURI into a brand new package (not sure what people would want to call that). Put lsst.rubin.utils onto pypi and allow BSD 3 clause license to allow it to be used by others.
            Hide
            mfisherlevine Merlin Fisher-Levine added a comment -

            Given discussions starting here it sounds like we should have a nice interface for easily configuring tasks instantiated in notebooks/standalone scripts to work again, as right now you need to do something along the lines of

            import logging
            logformat = "[%(levelname)s] %(name)s: %(message)s"
            logging.basicConfig(format=logformat, level=logging.INFO)

            before logging levels in the task and its subtasks are respected, and this is quite a mouthful to expect users to remember.

            Show
            mfisherlevine Merlin Fisher-Levine added a comment - Given discussions starting here it sounds like we should have a nice interface for easily configuring tasks instantiated in notebooks/standalone scripts to work again, as right now you need to do something along the lines of import logging logformat = "[%(levelname)s] %(name)s: %(message)s" logging.basicConfig(format=logformat, level=logging.INFO) before logging levels in the task and its subtasks are respected, and this is quite a mouthful to expect users to remember.
            Hide
            ktl Kian-Tat Lim added a comment -

            I dislike the name rubin_utils, especially after an initial lsst., as it seems to be claiming greater scope than is warranted.

            I suggest that it might be possible to move the C++ code in the lsst.utils package out into a new package, maybe lsst.cpputils, reserving lsst.utils for Python-only utilities. Some C++ code like Demangle and Backtrace also might belong in base instead.

            lsst.butleruri is not great as a name for a package intended for uses outside the Butler that is also about more than just URI manipulation but also retrieval. lsst.uridata might be better, but I understand that it might require significant code modification to implement.

            Show
            ktl Kian-Tat Lim added a comment - I dislike the name rubin_utils , especially after an initial lsst. , as it seems to be claiming greater scope than is warranted. I suggest that it might be possible to move the C++ code in the lsst.utils package out into a new package, maybe lsst.cpputils , reserving lsst.utils for Python-only utilities. Some C++ code like Demangle and Backtrace also might belong in base instead. lsst.butleruri is not great as a name for a package intended for uses outside the Butler that is also about more than just URI manipulation but also retrieval. lsst.uridata might be better, but I understand that it might require significant code modification to implement.
            Hide
            tjenness Tim Jenness added a comment -

            I was not thinking of putting a `lsst` in front of the butleruri package name.

            Looking at getPackageDir – the only place it is used in C++ is for some afw tests and some examples in order to find afw and afwdata. It might be worth moving that code to a private helper in afw and rewriting it in python so it could stay in utils.

            utils currently has some C++ functions for unit conversion for magnitudes (jointcal uses them) and some pybind11 helpers.

            Show
            tjenness Tim Jenness added a comment - I was not thinking of putting a `lsst` in front of the butleruri package name. Looking at getPackageDir – the only place it is used in C++ is for some afw tests and some examples in order to find afw and afwdata. It might be worth moving that code to a private helper in afw and rewriting it in python so it could stay in utils. utils currently has some C++ functions for unit conversion for magnitudes (jointcal uses them) and some pybind11 helpers.
            Hide
            tjenness Tim Jenness added a comment -

            I think we are ending up with the following proposal:

            • Move the C++ code out of utils and into a cpputils package.
            • Have utils depend on cpputils from an EUPS perspective for backwards compatibility during transition but have no code in utils depending on cpputils. utils itself will not depend on any other lsst packages (not base or pex_exceptions).
            • Write python version of getPackageDir to be retained in utils package (only afw uses the C++ version)
            • Move the utility code out of daf_butler and pipe_base into utils. Leave forwarding and deprecation messages behind in pipe_base – the lsst.daf.butler.core module is not meant to be public anyhow so no forwarding/deprecation is required.
            • Change license of utils to BSD three clause.
            • Move ButlerURI to entirely new (BSD3) package. Name not yet certain, nor whether it has an "lsst." module prefix (astro_metadata_translator does not).

            We also need to talk about lsst.base.Packages – it is used by pipe_base and ctrl_mpexec for package provenance but because it's part of base it brings in unwanted compilers/lsst dependencies. Can it be moved to lsst.utils as part of this RFC or require a different RFC?

            Show
            tjenness Tim Jenness added a comment - I think we are ending up with the following proposal: Move the C++ code out of utils and into a cpputils package. Have utils depend on cpputils from an EUPS perspective for backwards compatibility during transition but have no code in utils depending on cpputils. utils itself will not depend on any other lsst packages (not base or pex_exceptions). Write python version of getPackageDir to be retained in utils package (only afw uses the C++ version) Move the utility code out of daf_butler and pipe_base into utils. Leave forwarding and deprecation messages behind in pipe_base – the lsst.daf.butler.core module is not meant to be public anyhow so no forwarding/deprecation is required. Change license of utils to BSD three clause. Move ButlerURI to entirely new (BSD3) package. Name not yet certain, nor whether it has an "lsst." module prefix (astro_metadata_translator does not). We also need to talk about lsst.base.Packages – it is used by pipe_base and ctrl_mpexec for package provenance but because it's part of base it brings in unwanted compilers/lsst dependencies. Can it be moved to lsst.utils as part of this RFC or require a different RFC?

              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Watchers:
              Colin Slater, Gregory Dubois-Felsmann, Jim Bosch, John Parejko, Kian-Tat Lim, Krzysztof Findeisen, Merlin Fisher-Levine, Tatiana Goldina, Tim Jenness, Wil O'Mullane
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:
                Planned End:

                  Jenkins

                  No builds found.