Data Management / DM-13299

Make pipe_tasks/tests/nopytest_test_coadds.py valgrind-clean

    Details

    • Type: Story
    • Status: Invalid
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Team: External

      Description

      I've been getting segfaults when running pipe_tasks/tests/nopytest_test_coadds.py on OSX (with clang). The lldb backtrace points to a Python destructor, so I suspected memory problems. I ran valgrind on Linux (yes: different OS, different compiler, but I thought it might help anyway) and identified several problems in LSST code. There are also many Python issues ("Invalid read", "Conditional jump or move depends on uninitialised value", etc.), but there are LSST-specific problems in ast, lsst::afw::fits, lsst::afw::table, and lsst::meas::base::GaussianCentroid.

            Activity

            price Paul Price added a comment -

            Valgrind command, for posterity:

            pprice@lsst-dev01:/scratch/pprice/lsstsw/build/pipe_tasks[tickets/DM-12995] $ valgrind --log-file=valgrind.log --suppressions=$HOME/Software/valgrind-python.supp --leak-check=no python tests/nopytest_test_coadds.py 
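
For anyone rerunning this, here is a minimal sketch of a wrapper that assembles the same valgrind invocation and, as an extra assumption not in the original command, sets PYTHONMALLOC=malloc so that Python (3.6 or later) allocates through the C library's malloc instead of its own small-object allocator. That generally makes the Python-side reports easier to interpret. The helper name build_valgrind_command is hypothetical, and the suppressions path is simply the one used above.

```python
import os
import shlex


def build_valgrind_command(test_script, log_file="valgrind.log", suppressions=None):
    """Return (argv, env) for running a Python test script under valgrind.

    Hypothetical helper for illustration; the valgrind flags are the ones
    from the command above.
    """
    argv = ["valgrind", f"--log-file={log_file}", "--leak-check=no"]
    if suppressions:
        argv.append(f"--suppressions={suppressions}")
    argv += ["python", test_script]

    env = dict(os.environ)
    # Assumption: Python 3.6+. This routes Python allocations through the
    # system malloc rather than Python's own small-object allocator.
    env["PYTHONMALLOC"] = "malloc"
    return argv, env


argv, env = build_valgrind_command(
    "tests/nopytest_test_coadds.py",
    suppressions=os.path.expanduser("~/Software/valgrind-python.supp"),
)
# Print the equivalent shell command line for inspection.
print("PYTHONMALLOC=malloc " + " ".join(shlex.quote(a) for a in argv))
# To actually run it: subprocess.run(argv, env=env, check=False)
```

The sketch only builds the command; running it still requires valgrind and the built pipe_tasks package, as in the original session.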
            

            price Paul Price added a comment -

            Kian-Tat Lim points out that this isn't as bad as I first feared, since many of the references to lsst are in the Python memory manager. They may be listed because I've used the py2 suppressions file with py3. There are some legitimate entries:

            ==1013148== Invalid read of size 8
            ==1013148==    at 0x4C2E02E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
            ==1013148==    by 0x920EEF72: astStore_ (memory.c:3690)
            ==1013148==    by 0x9223DEEC: Copy (unitnormmap.c:1053)
            ==1013148==    by 0x920F450D: astCopy_ (object.c:1435)
            ==1013148==    by 0x91E48A58: Copy (cmpmap.c:3921)
            ==1013148==    by 0x920F450D: astCopy_ (object.c:1435)
            ==1013148==    by 0x91BDC2D8: std::shared_ptr<ast::Mapping> ast::Object::fromAstObject<ast::Mapping>(AstObject*, bool) (Object.cc:134)
            ==1013148==    by 0x91179E64: ast::Mapping::simplify() const (Mapping.h:250)
            ==1013148==    by 0x91198814: lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint>::Transform(ast::Mapping const&, bool) (Transform.cc:48)
            ==1013148==    by 0x912373EA: construct<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint>, ast::Mapping&> (new_allocator.h:120)
            ==1013148==    by 0x912373EA: construct<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint>, ast::Mapping&> (alloc_traits.h:475)
            ==1013148==    by 0x912373EA: _Sp_counted_ptr_inplace<ast::Mapping&> (shared_ptr_base.h:520)
            ==1013148==    by 0x912373EA: __shared_count<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint>, std::allocator<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> >, ast::Mapping&> (shared_ptr_base.h:615)
            ==1013148==    by 0x912373EA: __shared_ptr<std::allocator<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> >, ast::Mapping&> (shared_ptr_base.h:1100)
            ==1013148==    by 0x912373EA: shared_ptr<std::allocator<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> >, ast::Mapping&> (shared_ptr.h:319)
            ==1013148==    by 0x912373EA: allocate_shared<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint>, std::allocator<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> >, ast::Mapping&> (shared_ptr.h:620)
            ==1013148==    by 0x912373EA: make_shared<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint>, ast::Mapping&> (shared_ptr.h:636)
            ==1013148==    by 0x912373EA: lsst::afw::geom::makeRadialTransform(std::vector<double, std::allocator<double> > const&) (transformFactory.cc:177)
            ==1013148==    by 0xA1815F3F: void pybind11::cpp_function::initialize<std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> > (*&)(std::vector<double, std::allocator<double> > const&), std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> >, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg>(std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> > (*&)(std::vector<double, std::allocator<double> > const&), std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> > (*)(std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (cast.h:1411)
            ==1013148==    by 0xA1812C11: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:558)
            ==1013148==  Address 0x8b34fbc8 is 0 bytes after a block of size 40 alloc'd
            ==1013148==    at 0x4C29B83: malloc (vg_replace_malloc.c:299)
            ==1013148==    by 0x920EDCE5: astMalloc_ (memory.c:2508)
            ==1013148==    by 0x920EEF8C: astStore_ (memory.c:3687)
            ==1013148==    by 0x9223DEEC: Copy (unitnormmap.c:1053)
            ==1013148==    by 0x920F450D: astCopy_ (object.c:1435)
            ==1013148==    by 0x91BCEEFB: ast::Mapping::getInverse() const (Mapping.cc:42)
            ==1013148==    by 0x91BE045D: ast::makeRadialMapping(std::vector<double, std::allocator<double> > const&, ast::Mapping const&) (functional.cc:60)
            ==1013148==    by 0x9123739B: lsst::afw::geom::makeRadialTransform(std::vector<double, std::allocator<double> > const&) (transformFactory.cc:177)
            ==1013148==    by 0xA1815F3F: void pybind11::cpp_function::initialize<std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> > (*&)(std::vector<double, std::allocator<double> > const&), std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> >, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg>(std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> > (*&)(std::vector<double, std::allocator<double> > const&), std::shared_ptr<lsst::afw::geom::Transform<lsst::afw::geom::Point2Endpoint, lsst::afw::geom::Point2Endpoint> > (*)(std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (cast.h:1411)
            ==1013148==    by 0xA1812C11: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:558)
            ==1013148==    by 0x4EF8301: _PyCFunction_FastCallDict (methodobject.c:231)
            ==1013148==    by 0x4F7DB8B: call_function (ceval.c:4809)
            

            and a bunch in lsst::meas::base::GaussianCentroid.
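
To separate reports like the one above from the Python memory manager noise, a rough triage script can help. The following is a sketch only: it splits a valgrind log into records and labels each one by its stack frames. The frame-name lists (PyObject_Free and PyObject_Realloc as allocator frames, names starting with ast or containing lsst:: as project frames) are assumptions chosen for illustration and would need tuning against a real log.

```python
import re

# Every valgrind log line starts with "==PID=="; an otherwise empty
# "==PID==" line separates one record from the next.
PREFIX = re.compile(r"^==\d+==\s*")
# Stack frame lines look like "at 0xADDR: function (file:line)".
FRAME = re.compile(r"^(?:at|by) 0x[0-9A-Fa-f]+: (.*)$")

# Assumed patterns, for illustration only.
ALLOCATOR_FRAMES = ("PyObject_Free", "PyObject_Realloc")
PROJECT = re.compile(r"\b(?:ast[A-Z_:]|lsst::)")


def split_records(log_text):
    """Split raw log text into records, each a list of prefix-stripped lines."""
    records, current = [], []
    for line in log_text.splitlines():
        line = line.strip()
        if not PREFIX.match(line):
            continue  # not a valgrind line
        body = PREFIX.sub("", line).strip()
        if body:
            current.append(body)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def frames(record):
    """Return the frame descriptions of a record, innermost frame first."""
    return [m.group(1) for m in map(FRAME.match, record) if m]


def classify(record):
    """Label a record 'allocator', 'project', or 'other'."""
    stack = frames(record)
    if stack and stack[0].startswith(ALLOCATOR_FRAMES):
        return "allocator"
    if any(PROJECT.search(f) for f in stack):
        return "project"
    return "other"


# Usage on the first lines of the report quoted above.
sample = """\
==1013148== Invalid read of size 8
==1013148==    at 0x4C2E02E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
==1013148==    by 0x920EEF72: astStore_ (memory.c:3690)
==1013148==    by 0x9223DEEC: Copy (unitnormmap.c:1053)
"""
for record in split_records(sample):
    print(classify(record), "|", record[0])
```

Checking only the innermost frame for allocator names is a deliberate simplification: a report raised inside the Python allocator is treated as noise even when LSST code appears further up the stack.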

            rowen Russell Owen added a comment -

            AST performs its own memory management, so I wonder whether most or all of the AST-related "errors" are false alarms. Any idea how to determine that?

            rowen Russell Owen added a comment -

            The AST unitnormmap problem should be fixed in starlink_ast as of the merge of DM-13322.

            price Paul Price added a comment -

            GaussianCentroid was removed (DM-13395), and Russell Owen has fixed ast independently.


              People

              Assignee:
              price Paul Price
              Reporter:
              price Paul Price
              Watchers:
              Jim Bosch, Paul Price, Russell Owen, Tim Jenness