  Data Management / DM-25434

Restructure tarball paths with goal to remove miniconda version


Details

    • Story
    • Status: To Do
    • Resolution: Unresolved
    • None
    • conda, lsst, lsstsw
    • None
    • Architecture
    • No

    Description

      At a minimum, the miniconda reference in the tarball path is not useful, so it should be removed.

      Currently, the paths look like this:
      https://eups.lsst.codes/stack/osx/10.9/conda-system/miniconda3-4.7.12-1a1d771/

      This corresponds to the following template:

      stack/{os_family}/{kernel_equivalent}/{compiler}/{miniconda_version}-{environment hash}/
      

      Since an environment effectively encodes the kernel equivalent and the compiler, and the miniconda version is irrelevant, I'd suggest dropping all three from the path to simplify things. At a minimum, miniconda_version should be dropped, because that version has no impact on binary or API compatibility.
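      As a minimal sketch of the minimum change, assuming a regular expression written only for the example path above (the miniconda segment is assumed to look like `miniconda3-<version>` and the environment hash like a short hex string), the current path can be split into its template fields and rebuilt without `miniconda_version`. The function names are hypothetical.

```python
import re

# Matches the current template:
# stack/{os_family}/{kernel_equivalent}/{compiler}/{miniconda_version}-{environment_hash}/
# Assumption: the miniconda segment is "miniconda<N>-<dotted version>" and the
# environment hash is the hex string after it.
CURRENT_PATH = re.compile(
    r"stack/(?P<os_family>[^/]+)/(?P<kernel_equivalent>[^/]+)/(?P<compiler>[^/]+)/"
    r"(?P<miniconda_version>miniconda\d+-[\d.]+)-(?P<environment_hash>[0-9a-f]+)/"
)


def parse_current_path(url: str) -> dict:
    """Split a current EUPS_PKGROOT URL into its template fields."""
    match = CURRENT_PATH.search(url)
    if match is None:
        raise ValueError(f"not a recognised stack path: {url}")
    return match.groupdict()


def drop_miniconda_version(fields: dict) -> str:
    """Rebuild the path with only the miniconda version removed."""
    return "stack/{os_family}/{kernel_equivalent}/{compiler}/{environment_hash}/".format(**fields)


fields = parse_current_path(
    "https://eups.lsst.codes/stack/osx/10.9/conda-system/miniconda3-4.7.12-1a1d771/"
)
print(fields["miniconda_version"], fields["environment_hash"])  # miniconda3-4.7.12 1a1d771
print(drop_miniconda_version(fields))  # stack/osx/10.9/conda-system/1a1d771/
```

      The environment hash survives the rewrite, so builds against different environments still land in different directories.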

      Further restructuring

      If we were to simplify this to rely more on the environment repo and hash, we might restructure the path into the following format:

      stack/{arch}/{environment_repo}@{environment_hash}/
      

      e.g.:

      stack/osx-64/github.com/lsst/scipipe_conda_env@1234abc/
      

      This form is more explicit: the URL shows more information about the environment, such as the repository where that environment is located.

      Using {arch} (linux-64, osx-64) is more consistent with how conda/conda-forge describes systems as well.
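      A sketch of building the proposed `stack/{arch}/{environment_repo}@{environment_hash}/` path, assuming the arch is derived from `platform.system()`/`platform.machine()`-style values. The mapping table below is a hypothetical subset of conda's platform names, not an exhaustive list.

```python
# Hypothetical mapping from (system, machine) pairs, as reported by
# platform.system() and platform.machine(), to conda platform names.
CONDA_ARCHES = {
    ("Linux", "x86_64"): "linux-64",
    ("Darwin", "x86_64"): "osx-64",
    ("Linux", "ppc64le"): "linux-ppc64le",
    ("Linux", "aarch64"): "linux-aarch64",
}


def conda_arch(system: str, machine: str) -> str:
    try:
        return CONDA_ARCHES[(system, machine)]
    except KeyError:
        raise ValueError(f"no conda arch known for {system}/{machine}") from None


def env_path(system: str, machine: str, environment_repo: str, environment_hash: str) -> str:
    """Build stack/{arch}/{environment_repo}@{environment_hash}/."""
    # Drop any URL scheme and trailing slash so the repo reads like github.com/org/name.
    repo = environment_repo.split("://", 1)[-1].rstrip("/")
    return f"stack/{conda_arch(system, machine)}/{repo}@{environment_hash}/"


print(env_path("Darwin", "x86_64", "https://github.com/lsst/scipipe_conda_env", "1234abc"))
# stack/osx-64/github.com/lsst/scipipe_conda_env@1234abc/
```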

      Restructuring according to system

      If we would like to go further, note that conda-forge will be packaging newer versions of glibc. If we want to account for that in the path, the kernel equivalent should also change to one of 10.9 (OS X), cos6 (CentOS 6), or cos7 (CentOS 7), which roughly describes the system/glibc features available to packages. Like the compiler, this can effectively be namespaced by the environment hash as well.

      Currently, the compiler is just "conda-system", which is also fine if we leave the environment hash in.

      Based on this, if we were to reimagine the path so that binaries might be reused across conda environments, we might choose something similar to the following:

      /stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/
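      One way such reuse could work, sketched under assumptions: CentOS 6 ships glibc 2.12 and CentOS 7 ships glibc 2.17, and binaries linked against an older glibc generally run on a newer one. A client could then list every `/stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/` directory its host can consume, newest first. The helper names and the `comp7` placeholder are illustrative only.

```python
# Assumed minimum glibc for each Linux kernel_equivalent
# (CentOS 6 ships glibc 2.12, CentOS 7 ships glibc 2.17).
LINUX_KERNEL_EQUIVALENTS = {"cos6": (2, 12), "cos7": (2, 17)}


def stack_dir(arch: str, kernel_equivalent: str, compiler_equivalent: str) -> str:
    return f"/stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/"


def usable_linux_dirs(host_glibc: tuple, compiler_equivalent: str) -> list:
    """Directories whose binaries should load on a host with this glibc, newest first.

    A cos7 host can also consume cos6 builds, but not the other way round.
    """
    compatible = [
        (min_glibc, name)
        for name, min_glibc in LINUX_KERNEL_EQUIVALENTS.items()
        if host_glibc >= min_glibc
    ]
    return [
        stack_dir("linux-64", name, compiler_equivalent)
        for _, name in sorted(compatible, reverse=True)
    ]


print(usable_linux_dirs((2, 17), "comp7"))
# ['/stack/linux-64/cos7/comp7/', '/stack/linux-64/cos6/comp7/']
```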
      


          Activity

            Brian Van Klaveren added a comment:

            ktl: This is basically saying we will modify the paths for EUPS_PKGROOT for now, with newinstall as the prime consumer (and hopefully lsst_build at some point in the future). Sorry if that wasn't clear.

            tjenness: The hash was in there historically, supposedly for the purpose of matching binary builds (probably more for Python versioning reasons). By not breaking it down systematically, we end up with duplicate binary builds. I think the hash was added in DM-9526.

            gcomoretto: One option is similar to the conda style; this is close, but with a bit more information in the URL:

            /stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/

            I'd also note that conda has noarch for non-specific packages (e.g. pure Python); otherwise, I think conda tends to rely on the channel's metadata files to ensure things are compatible.

            Gabriele Comoretto added a comment:

            Compatibilities should be defined as dependency information.

            Thinking of the Anaconda approach, I would expect the package name in the directory structure.

            I wonder if this is a limitation of using eups.

            I would expect something like:

            /stack/{pkg_name}/{version or build unique id}/{arch}/

            But at this point, I would suggest something like this (close to what Maven/Nexus does):

            /stack/DM/{SWProduct}/{pkg_name}/{version or build unique id}/{arch}/

            but this implies that everybody has agreed on/accepted the products in the product tree.

            As a configuration engineer, I find it quite disturbing to see all packages and all versions in a single folder.
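            A sketch of what the per-package layout `/stack/{pkg_name}/{version or build unique id}/{arch}/` would mean for an existing artifact name. It assumes product names contain no `-` (EUPS product names such as meas_base use `_`), so the first `-` separates the name from the version; the helper names are hypothetical.

```python
def parse_artifact(filename: str) -> tuple:
    """Split 'afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz' into (name, version, flavor).

    Assumes the product name itself contains no '-'.
    """
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"unexpected artifact name: {filename}")
    base, flavor = filename[: -len(suffix)].rsplit("@", 1)
    name, version = base.split("-", 1)
    return name, version, flavor


def per_package_path(filename: str, arch: str) -> str:
    """Place an artifact under /stack/{pkg_name}/{version}/{arch}/."""
    name, version, _flavor = parse_artifact(filename)
    return f"/stack/{name}/{version}/{arch}/{filename}"


print(per_package_path("afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz", "linux-64"))
# /stack/afw/19.0.0-26-g4476391b4+1/linux-64/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
```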

            Brian Van Klaveren added a comment:

            I realize I may have forgotten to mention that the names of the artifacts will carry the same information they currently do, e.g.:

            https://eups.lsst.codes/stack/redhat/el7/conda-system/miniconda3-4.7.12-1a1d771/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx/10.9/conda-system/miniconda3-4.7.12-1a1d771/afw-19.0.0-26-g4476391b4+1@DarwinX86.tar.gz

            These names are roughly based on commit/version. That's more or less how conda does it as well, for example:

            https://conda.anaconda.org/conda-forge/linux-64/pytest-5.4.2-py37hc8dfbb8_0.tar.bz2

            Compare to the yum format:
            http://mirror.centos.org/centos/7/os/x86_64/Packages/gcc-4.8.5-39.el7.x86_64.rpm
            (with a noarch)
            http://mirror.centos.org/centos/7/os/x86_64/Packages/pytest-2.7.0-2.el7.noarch.rpm

            Alt arch (ppc64le):
            http://mirror.centos.org/altarch/7/os/ppc64le/Packages/gcc-4.8.5-39.el7.ppc64le.rpm
            (with a noarch)
            http://mirror.centos.org/altarch/7/os/ppc64le/Packages/pytest-2.7.0-2.el7.noarch.rpm

            Looking at the full eups URLs for the current paths, you can see how much information is effectively duplicated.

            To propose a few variants:

            Variant 1:
            https://eups.lsst.codes/stack/linux-64/1a1d771/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx-64/1a1d771/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            Variant 2 (without compiler toolset):
            https://eups.lsst.codes/stack/linux-64/cos6/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx-64/10.9/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            Variant 3 (with conda toolset internal name):
            https://eups.lsst.codes/stack/linux-64/cos7/comp7/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx-64/10.9/comp7/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            I'm happy with Variant 1 for now, but I think Variant 3 is the most future-proof.

            I'm not opposed to a Maven-style layout, but I'd note that we don't define our packages with any equivalent of a groupId (namespace). While it could almost always be lsst, or possibly lsst.meas, you are proposing that the groupId be based on the SWProduct. I think that has two issues. First, in Maven there is nearly always a 1:1 relationship between the groupId and the package import path/namespace in the code. Second, it would require defining a new metadata file in repositories to declare which software product they belong to (a pom.xml equivalent), plus some tooling built around that file, so this ticket would be blocked by that work.
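            The three variants differ only in which segments sit between the arch and the artifact name. A sketch of one builder that produces all of them, assuming the base URL stays https://eups.lsst.codes/stack; the function and parameter names are hypothetical.

```python
from typing import Optional

BASE = "https://eups.lsst.codes/stack"


def variant_url(
    arch: str,
    artifact: str,
    env_hash: Optional[str] = None,
    kernel: Optional[str] = None,
    toolset: Optional[str] = None,
) -> str:
    """Join the base, the arch, whichever optional segments a variant uses, and the artifact."""
    segments = [arch] + [s for s in (env_hash, kernel, toolset) if s]
    return "/".join([BASE] + segments + [artifact])


linux_afw = "afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz"
osx_afw = "afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz"

print(variant_url("linux-64", linux_afw, env_hash="1a1d771"))          # Variant 1
print(variant_url("linux-64", linux_afw, kernel="cos6"))               # Variant 2
print(variant_url("osx-64", osx_afw, kernel="10.9", toolset="comp7"))  # Variant 3
```

            Because Variant 3 only adds segments, moving from Variant 1 or 2 to it later would change the directory part of the URL but not the artifact names.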

            Gabriele Comoretto added a comment:

            Having namespaces is probably premature, since we still don't have a consolidated SW product tree.

            However, the package itself is uniquely defined. What are the technical issues/problems with having a path like this?

            https://eups.lsst.codes/stack/afw/osx-64/10.9/comp7/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            Brian Van Klaveren added a comment:

            Doing that will break EUPS_PKGROOT.
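            A deliberately simplified model of why, assuming a client forms each tarball URL by appending the artifact filename to a single EUPS_PKGROOT (as the URLs quoted in this ticket suggest). This is not eups's actual implementation; the helper names are hypothetical.

```python
def tarball_url(pkgroot: str, artifact: str) -> str:
    """Simplified model: one fixed root with the artifact filename appended."""
    return f"{pkgroot.rstrip('/')}/{artifact}"


def per_product_root(name: str) -> str:
    """What the 'root' would have to become under the per-package proposal."""
    return f"https://eups.lsst.codes/stack/{name}/osx-64/10.9/comp7"


artifact = "afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz"
proposed = "https://eups.lsst.codes/stack/afw/osx-64/10.9/comp7/" + artifact

# A single root shared by every product cannot produce the proposed URL ...
shared_root = "https://eups.lsst.codes/stack/osx-64/10.9/comp7"
print(tarball_url(shared_root, artifact) == proposed)  # False

# ... because the package name sits in front of the arch segments, inside the root,
# so the root stops being one setting and becomes a per-product value.
print(tarball_url(per_product_root("afw"), artifact) == proposed)  # True
```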

            People

              Assignee: Unassigned
              Reporter: Brian Van Klaveren (bvan)
              Votes: 0
              Watchers (6): Brian Van Klaveren, Chris Walter, Gabriele Comoretto, Heather Kelly, Kian-Tat Lim, Tim Jenness
