  Data Management / DM-33622

Add support for numexpr to disable implicit threading


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: utils
    • Labels:
      None
    • Story Points:
      2
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      Running `pipetask run -j 24` on an interactive node gave a warning saying

      numexpr.utils ()(utils.py:147) - Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

      It's not clear whether that is a problem: it suggests 8 threads per process, so with `-j 24` the node could be asked to run up to 24 × 8 = 192 threads on 32 cores.

      We should check that this isn't happening and, either way, set the value so that the average user doesn't see a scary warning.


            Activity

            Tim Jenness added a comment:

            Looking at the numexpr code, it looks like this warning is issued and then numexpr falls back to OMP_NUM_THREADS, which we explicitly set to 1.

            The fix is to also set NUMEXPR_MAX_THREADS in base's disableImplicitThreading.
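A minimal sketch of that fix, assuming the goal is simply to export NUMEXPR_MAX_THREADS alongside OMP_NUM_THREADS before numexpr is first imported. The helper name `set_single_thread_env` is hypothetical (it is not the base or utils API), and it deliberately leaves any value the user has already set untouched.

```python
import os
from typing import List, MutableMapping, Optional

# Variables to pin to a single thread. OMP_NUM_THREADS is the one base
# already sets; NUMEXPR_MAX_THREADS is the one numexpr complains about.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "NUMEXPR_MAX_THREADS")


def set_single_thread_env(env: Optional[MutableMapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: set each variable to "1" unless already set.

    Returns the names of the variables that were changed.
    """
    if env is None:
        env = os.environ
    changed = []
    for name in THREAD_ENV_VARS:
        # Treat a missing or empty value as unset; respect anything else,
        # e.g. a value the user exported in their shell.
        if not env.get(name):
            env[name] = "1"
            changed.append(name)
    return changed
```

For example, `set_single_thread_env({"OMP_NUM_THREADS": "4"})` only adds NUMEXPR_MAX_THREADS and keeps the user's OMP setting.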

            Tim Jenness added a comment:

            The numexpr warning can be triggered just by importing pandas (e.g. from meas_base):

            >>> import lsst.meas.base
              File "/Users/timj/work/lsstsw3/stack/lsst-scipipe-1.0.0/Darwin/meas_base/gfc624380f7+2d59205392/python/lsst/meas/base/__init__.py", line 55, in <module>
                from .diaCalculation import *
              File "/Users/timj/work/lsstsw3/stack/lsst-scipipe-1.0.0/Darwin/meas_base/gfc624380f7+2d59205392/python/lsst/meas/base/diaCalculation.py", line 24, in <module>
                import pandas as pd
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/__init__.py", line 48, in <module>
                from pandas.core.api import (
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/api.py", line 29, in <module>
                from pandas.core.arrays import Categorical
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/arrays/__init__.py", line 1, in <module>
                from pandas.core.arrays.base import (
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/arrays/base.py", line 68, in <module>
                from pandas.core import (
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/arraylike.py", line 21, in <module>
                from pandas.core.ops.common import unpack_zerodim_and_defer
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 33, in <module>
                from pandas.core.ops.array_ops import (  # noqa:F401
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 48, in <module>
                import pandas.core.computation.expressions as expressions
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/computation/expressions.py", line 19, in <module>
                from pandas.core.computation.check import NUMEXPR_INSTALLED
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/core/computation/check.py", line 3, in <module>
                ne = import_optional_dependency("numexpr", errors="warn")
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/pandas/compat/_optional.py", line 126, in import_optional_dependency
                module = importlib.import_module(name)
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/importlib/__init__.py", line 127, in import_module
                return _bootstrap._gcd_import(name[level:], package, level)
              File "/Users/timj/work/lsstsw3/miniconda/envs/lsst-scipipe-1.0.0/lib/python3.8/site-packages/numexpr/__init__.py", line 44, in <module>
                nthreads = _init_num_threads()
            

            It seems that lsst.base.disableImplicitThreading() does nothing on my laptop (it returns False), so I can't tell whether setting OMP_NUM_THREADS would work.

            Regardless, we would have to set the environment variable before pandas is imported.
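The traceback above ends in `_init_num_threads()` running at numexpr's module import, which is why the ordering matters. A sketch of how a helper could enforce that ordering, assuming a check on `sys.modules` is an acceptable signal that it is already too late; `set_numexpr_max_threads` is a hypothetical name for illustration.

```python
import os
import sys
import warnings


def set_numexpr_max_threads(n: int = 1) -> bool:
    """Hypothetical helper: export NUMEXPR_MAX_THREADS if it can still help.

    numexpr reads its thread settings at import time, and importing pandas
    imports numexpr when it is installed, so this must run before either.
    Returns True if the variable was set in time.
    """
    if "numexpr" in sys.modules:
        warnings.warn(
            "numexpr is already imported; NUMEXPR_MAX_THREADS will have no effect",
            RuntimeWarning,
        )
        return False
    os.environ["NUMEXPR_MAX_THREADS"] = str(n)
    return True


# Intended call order:
# set_numexpr_max_threads(1)
# import pandas as pd
```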

            Tim Jenness added a comment:

            Also, there is an API to set the number of threads, but simply importing that API triggers the thread-setting code, which issues the INFO message.
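Presumably the API meant here is `numexpr.set_num_threads()`, which returns the previous thread count. A guarded sketch (the wrapper name is hypothetical) shows why it cannot avoid the message: the first `import numexpr` is what runs the thread initialisation and logs the note, before `set_num_threads` is ever reached.

```python
import importlib.util
from typing import Optional


def limit_numexpr_threads(n: int = 1) -> Optional[int]:
    """Hypothetical wrapper around numexpr.set_num_threads.

    Returns the previous thread count, or None if numexpr is not installed.
    """
    if importlib.util.find_spec("numexpr") is None:
        return None
    # The first import of numexpr triggers its thread setup and INFO note,
    # so by the time set_num_threads runs the message has already appeared.
    import numexpr

    return numexpr.set_num_threads(n)
```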

            Tim Jenness added a comment:

            I've added some code to utils.threads that can make this go away if it is called in the right place. The problem is that I think ctrl_mpexec has to call it very early. Currently it is called just before a multiprocessing job is run, which is far too late: pandas has already been imported by then. It seems wrong to disable threading when we are only building a qgraph without threads, but importing pipelines that use pandas will trigger the problem anyway.

            Maybe the command line tools should force the numexpr logger to WARNING by default?
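A sketch of that suggestion, assuming the message comes from the `numexpr.utils` logger named in the warning quoted in the description, and that numexpr does not reset the level itself. Because `logging.getLogger` returns the same logger object whether or not numexpr has been imported yet, a command-line tool could raise the level before any pipeline code is loaded. The function name is hypothetical.

```python
import logging


def quiet_numexpr_logger(level: int = logging.WARNING) -> logging.Logger:
    """Hypothetical helper: hide numexpr's INFO-level thread notes.

    Safe to call before numexpr is imported: numexpr will later fetch
    this same logger object by name, so the level set here still applies.
    """
    logger = logging.getLogger("numexpr.utils")
    logger.setLevel(level)
    return logger
```

Unlike the environment-variable approach, this only hides the note; it does not change how many threads numexpr uses.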

            Tim Jenness added a comment:

            This creates a new disable_implicit_threading function in utils that uses threadpoolctl to force the thread count to 1 and also explicitly forces numexpr to do the same.
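The threadpoolctl part of that could look roughly like the sketch below. `threadpool_limits` is threadpoolctl's entry point; called as a plain function rather than as a context manager, the limit stays in place for the rest of the process. The wrapper name and the guard for a missing threadpoolctl are assumptions for illustration, not the code that went into utils.

```python
import importlib.util


def limit_native_threadpools(n: int = 1) -> bool:
    """Hypothetical wrapper: cap BLAS/OpenMP thread pools at n threads.

    Returns False if threadpoolctl is not installed.
    """
    if importlib.util.find_spec("threadpoolctl") is None:
        return False
    from threadpoolctl import threadpool_limits

    # Not used as a context manager, so the limit is not restored later.
    # threadpoolctl only acts on native libraries already loaded, which is
    # presumably why environment variables are still needed for later imports.
    threadpool_limits(limits=n)
    return True
```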

            I have modified ctrl_mpexec to use the new routine (removing the base dependency).

            Andy Salnikov added a comment:

            Looks good.


              People

              Assignee:
              Tim Jenness
              Reporter:
              Merlin Fisher-Levine
              Reviewers:
              Andy Salnikov
              Watchers:
              Andy Salnikov, Jim Bosch, Lee Kelvin, Merlin Fisher-Levine, Tim Jenness
