Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-5105

new conda 'mkl' dependent packages break meas_base tests

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Continuous Integration
    • Labels:
      None

      Description

      Continuum release/rebuilt a number of packages last friday to depend on the the Intel MKL library.

      https://www.continuum.io/blog/developer-blog/anaconda-25-release-now-mkl-optimizations

      There are [new feature named] versions that continue to use openblas but the MKL versions appear to be installed by default. This causes at least multiple meas_base tests to fail.After extensive testing, I have confirmed that the meas_base tests do not fail with the equivalent 'nomkl' package.
      In addition, mkl is closed source software that requires you to accept and download a license file or it is time-bombed to stop working after a trial period.

          docker-centos-7: [ 36/36 ]  meas_base 2015_10.0-9-g6daf04b+7 ...
          docker-centos-7:
          docker-centos-7: ***** error: from /opt/lsst/software/stack/EupsBuildDir/Linux64/meas_base-2015_10.0-9-g6daf04b+7/build.log:
          docker-centos-7: tests/sincPhotSums.py
          docker-centos-7:
          docker-centos-7: tests/measureSources.py
          docker-centos-7:
          docker-centos-7: tests/testApertureFlux.py
          docker-centos-7:
          docker-centos-7: tests/testJacobian.py
          docker-centos-7:
          docker-centos-7: tests/testScaledApertureFlux.py
          docker-centos-7:
          docker-centos-7: The following tests failed:
          docker-centos-7: /opt/lsst/software/stack/EupsBuildDir/Linux64/meas_base-2015_10.0-9-g6daf04b+7/meas_base-2015_10.0-9-g6daf04b+7/tests/.tests/sincPhotSums.py.failed
          docker-centos-7: /opt/lsst/software/stack/EupsBuildDir/Linux64/meas_base-2015_10.0-9-g6daf04b+7/meas_base-2015_10.0-9-g6daf04b+7/tests/.tests/measureSources.py.failed
          docker-centos-7: /opt/lsst/software/stack/EupsBuildDir/Linux64/meas_base-2015_10.0-9-g6daf04b+7/meas_base-2015_10.0-9-g6daf04b+7/tests/.tests/testApertureFlux.py.failed
          docker-centos-7: /opt/lsst/software/stack/EupsBuildDir/Linux64/meas_base-2015_10.0-9-g6daf04b+7/meas_base-2015_10.0-9-g6daf04b+7/tests/.tests/testJacobian.py.failed
          docker-centos-7: /opt/lsst/software/stack/EupsBuildDir/Linux64/meas_base-2015_10.0-9-g6daf04b+7/meas_base-2015_10.0-9-g6daf04b+7/tests/.tests/testScaledApertureFlux.py.failed
          docker-centos-7: 5 tests failed
      

      (the exact cause of the test failures was not investigated as this should not have happened)

      This change has also broken the ability to import an existing conda env from 2016-02-05 or earlier that uses scipy due to some sort of package version resolution problem. Explicit declaring it as the scipy package without mkl fixes the resolution problem.

      There is a new 'nomkl' package, when installed, any subsequent package installations will default to versions without mkl. However, this does not fix any already installed packages.

      I am traumatized by the lack of reproducible build envs even within a few days of each other. After discussion in the Tucson office, I'm going to pin the lsstsw and newinstall.sh conda package versions with a commitment from square to update them on a monthly basis. I already have a test version of lsstsw/bin/deploy that defaults to a bundled package but with a option flag to use bleeding edge.

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            Continuum releasing their speed optimizations to the public seems like a great thing to me Why wouldn't we want our code to be faster? At least they indicate how to disable threading (and I guarantee you that end users will be using MKL anaconda so we really should be supporting it).

            Show
            tjenness Tim Jenness added a comment - Continuum releasing their speed optimizations to the public seems like a great thing to me Why wouldn't we want our code to be faster? At least they indicate how to disable threading (and I guarantee you that end users will be using MKL anaconda so we really should be supporting it).
            Hide
            mjuric Mario Juric added a comment -

            Josh, +1 on the pinning for reproducibility, but the fact our code crashes with a differently built numpy may indicate a problem on our end. Just like when two compilers reveal nasty bugs and unwarranted assumptions, this may be telling us something similar.

            Show
            mjuric Mario Juric added a comment - Josh, +1 on the pinning for reproducibility, but the fact our code crashes with a differently built numpy may indicate a problem on our end. Just like when two compilers reveal nasty bugs and unwarranted assumptions, this may be telling us something similar.
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            Mario Juric I'm still internally debating if we should pin just for the build slaves/binary release builds and let end users act as a canary or pin everything by default. I'm leaning towards the later, with a -b flag (bleed) to disable it, at least initially, because I fear user frustration. What do you think?

            Show
            jhoblitt Joshua Hoblitt added a comment - Mario Juric I'm still internally debating if we should pin just for the build slaves/binary release builds and let end users act as a canary or pin everything by default. I'm leaning towards the later, with a -b flag (bleed) to disable it, at least initially, because I fear user frustration. What do you think?
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            I'm not dismissing that it may be a coding error that's being exposed but this isn't time I've been bitten by builds breaking on the scale of a few days. I don't even want to think about trying to build something from a year ago at this point...

            Show
            jhoblitt Joshua Hoblitt added a comment - I'm not dismissing that it may be a coding error that's being exposed but this isn't time I've been bitten by builds breaking on the scale of a few days. I don't even want to think about trying to build something from a year ago at this point...
            Hide
            tjenness Tim Jenness added a comment -

            Just in case it wasn't clear, I've actually been using MKL Anaconda for months now on my Mac (it never actually expired, it just kept telling me it was about to).

            Show
            tjenness Tim Jenness added a comment - Just in case it wasn't clear, I've actually been using MKL Anaconda for months now on my Mac (it never actually expired, it just kept telling me it was about to).
            Hide
            mjuric Mario Juric added a comment -

            Re pinning, sounds OK to me (in a perfect world, we'd have both). Is there a (not exceedingly difficult) way to pin to a particular set of versions that a some release of Anaconda uses? Most users just download Anaconda and never bother to 'conda update', so that may cover a large number of people out there.

            (off to teach & then some meetings, will disappear for a few hours).

            Show
            mjuric Mario Juric added a comment - Re pinning, sounds OK to me (in a perfect world, we'd have both). Is there a (not exceedingly difficult) way to pin to a particular set of versions that a some release of Anaconda uses? Most users just download Anaconda and never bother to 'conda update', so that may cover a large number of people out there. (off to teach & then some meetings, will disappear for a few hours).
            Hide
            frossie Frossie Economou added a comment -

            We've reached a compromise, we'll pin for the CI, and let float up for the monthlies. We'll try it for a bit see how it works.

            Show
            frossie Frossie Economou added a comment - We've reached a compromise, we'll pin for the CI, and let float up for the monthlies. We'll try it for a bit see how it works.
            Hide
            tjenness Tim Jenness added a comment - - edited

            Dominique Boutigny reports the same problem but he has tracked down the segv:

            Program received signal SIGSEGV, Segmentation fault.
            0x00007ffff4703e00 in fftw_execute ()
               from /home/boutigny/LSST/lsstsw/miniconda/lib/python2.7/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
            

            so maybe an interaction between MKL and our FFTW library.

            Does Linux anaconda ship its own libfftw? (it does not on Mac but on Mac my meas_base builds are fine).

            Show
            tjenness Tim Jenness added a comment - - edited Dominique Boutigny reports the same problem but he has tracked down the segv: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff4703e00 in fftw_execute ()    from /home/boutigny/LSST/lsstsw/miniconda/lib/python2.7/site-packages/numpy/core/../../../../libmkl_intel_lp64.so so maybe an interaction between MKL and our FFTW library. Does Linux anaconda ship its own libfftw ? (it does not on Mac but on Mac my meas_base builds are fine).
            Hide
            jhoblitt Joshua Hoblitt added a comment - - edited

            Tim JennessThere's no conda package named fftw that gets pulled in as a dep and find doesn't turn up anything that looks like a fftw shared object.

            Show
            jhoblitt Joshua Hoblitt added a comment - - edited Tim Jenness There's no conda package named fftw that gets pulled in as a dep and find doesn't turn up anything that looks like a fftw shared object.
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            I've opened a PR on lsstsw that pins the conda/pip package versions along with a -b flag (bleed...) that disables the pinning. The travis config has been updated to run deploy both with and without this flag and to displace a diff on the conda package spec.

            newinstall.sh still needs to be modified but I wanted to get feedback on this approach before continuing.

            Show
            jhoblitt Joshua Hoblitt added a comment - I've opened a PR on lsstsw that pins the conda/pip package versions along with a -b flag (bleed...) that disables the pinning. The travis config has been updated to run deploy both with and without this flag and to displace a diff on the conda package spec. newinstall.sh still needs to be modified but I wanted to get feedback on this approach before continuing.
            Hide
            tjenness Tim Jenness added a comment - - edited

            To summarize the discussion on DM-5123: MKL has its own implementations of fftw/lapack/blas and the linker gets very confused when a separate libfftw has also been loaded.

            Show
            tjenness Tim Jenness added a comment - - edited To summarize the discussion on DM-5123 : MKL has its own implementations of fftw/lapack/blas and the linker gets very confused when a separate libfftw has also been loaded.
            Hide
            tjenness Tim Jenness added a comment - - edited

            Optionally pinning like this seems to be fine. Minor comments on PR.

            Show
            tjenness Tim Jenness added a comment - - edited Optionally pinning like this seems to be fine. Minor comments on PR.
            Hide
            jhoblitt Joshua Hoblitt added a comment - - edited

            Tim Jenness's comments on the lsstsw PR have been addressed. A new PR has been opened on miniconda2 which downloads and applies the conda package spec from the lsstsw repo. https://github.com/lsst/miniconda2/pull/3 Note that the download URL needs to be updated before being merged. There is a published miniconda2 package tagged as 'newinstall-testing'.

            Show
            jhoblitt Joshua Hoblitt added a comment - - edited Tim Jenness 's comments on the lsstsw PR have been addressed. A new PR has been opened on miniconda2 which downloads and applies the conda package spec from the lsstsw repo. https://github.com/lsst/miniconda2/pull/3 Note that the download URL needs to be updated before being merged. There is a published miniconda2 package tagged as 'newinstall-testing'.
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            I've updated the miniconda2 PR to be have the correct URL for merge to master. Tim Jenness am I clear to go ahead and merge? I want to get this out before an end-user gets caught.

            Show
            jhoblitt Joshua Hoblitt added a comment - I've updated the miniconda2 PR to be have the correct URL for merge to master. Tim Jenness am I clear to go ahead and merge? I want to get this out before an end-user gets caught.
            Hide
            tjenness Tim Jenness added a comment -

            Ship it.

            Show
            tjenness Tim Jenness added a comment - Ship it.
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            Merged. A 3.19.0.lsst1 tag has been added to miniconda2.

            Show
            jhoblitt Joshua Hoblitt added a comment - Merged. A 3.19.0.lsst1 tag has been added to miniconda2 .
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            miniconda2-3.19.0.lsst1 has been published from b1913.

            Show
            jhoblitt Joshua Hoblitt added a comment - miniconda2-3.19.0.lsst1 has been published from b1913 .
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            I'm going to leave this ticket open until a sanity check build of lsst_distrib via newinstall.sh from master has completed.

            Show
            jhoblitt Joshua Hoblitt added a comment - I'm going to leave this ticket open until a sanity check build of lsst_distrib via newinstall.sh from master has completed.
            Hide
            jhoblitt Joshua Hoblitt added a comment - - edited

            I've opened a PR to bump the version of miniconda2 in newinstall.sh; will self merge as soon as travis finishes.

            Show
            jhoblitt Joshua Hoblitt added a comment - - edited I've opened a PR to bump the version of miniconda2 in newinstall.sh ; will self merge as soon as travis finishes.
            Hide
            jhoblitt Joshua Hoblitt added a comment - - edited

            Grrrrr. Looks like I broke the PATH construction in miniconda2. (I have no idea why this worked in the pre-merge testing)

            Show
            jhoblitt Joshua Hoblitt added a comment - - edited Grrrrr. Looks like I broke the PATH construction in miniconda2 . (I have no idea why this worked in the pre-merge testing)
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            Published miniconda2-3.19.0.lsst2 from b1914 with a one liner fix for the path construction.

            Show
            jhoblitt Joshua Hoblitt added a comment - Published miniconda2-3.19.0.lsst2 from b1914 with a one liner fix for the path construction.
            Hide
            jhoblitt Joshua Hoblitt added a comment -

            It's confirmed, w_2016_05 now builds from a clean sandbox with newinstall.sh.

            Show
            jhoblitt Joshua Hoblitt added a comment - It's confirmed, w_2016_05 now builds from a clean sandbox with newinstall.sh .

              People

              Assignee:
              jhoblitt Joshua Hoblitt
              Reporter:
              jhoblitt Joshua Hoblitt
              Reviewers:
              Mario Juric, Tim Jenness
              Watchers:
              Frossie Economou, Jonathan Sick, Joshua Hoblitt, Kian-Tat Lim, Mario Juric, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: