  1. Data Management
  2. DM-25434

Restructure tarball paths with goal to remove miniconda version

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: conda, lsst, lsstsw
    • Labels:
      None
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      At a minimum, the miniconda reference in the tarball path is not useful, so it should be removed.

      Currently, the paths look like this:
      https://eups.lsst.codes/stack/osx/10.9/conda-system/miniconda3-4.7.12-1a1d771/

      Which corresponds to the following template:

      stack/{os_family}/{kernel_equivalent}/{compiler}/{miniconda_version}-{environment hash}/
      

      Since an environment effectively encodes the kernel equivalent and the compiler, and the miniconda version is irrelevant, I'd suggest dropping all three from the paths to simplify things. At a minimum, miniconda_version should be dropped, as that version has no impact on binary or API compatibility.
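
To make the template concrete, here is a minimal sketch that splits a path in the current layout into its fields and rebuilds it without the miniconda version. The function names and the regular expression are illustrative assumptions, not part of newinstall or any existing tooling, and the rebuilt layout is only one reading of the minimal change proposed above.

```python
import re

# Current layout, as described in the ticket:
# stack/{os_family}/{kernel_equivalent}/{compiler}/{miniconda_version}-{environment_hash}/
# Assumption: the environment hash is the trailing hex string after the last "-".
CURRENT_LAYOUT = re.compile(
    r"^stack/(?P<os_family>[^/]+)/(?P<kernel_equivalent>[^/]+)/"
    r"(?P<compiler>[^/]+)/"
    r"(?P<miniconda_version>[^/]+)-(?P<environment_hash>[0-9a-f]+)/?$"
)


def parse_current_path(path):
    """Split a path in the current layout into its named fields."""
    match = CURRENT_LAYOUT.match(path)
    if match is None:
        raise ValueError(f"not a path in the current layout: {path!r}")
    return match.groupdict()


def drop_miniconda_version(path):
    """Rebuild the path with every field except the miniconda version."""
    parts = parse_current_path(path)
    return (
        "stack/{os_family}/{kernel_equivalent}/{compiler}/{environment_hash}/"
    ).format(**parts)


example = "stack/osx/10.9/conda-system/miniconda3-4.7.12-1a1d771/"
parts = parse_current_path(example)
shortened = drop_miniconda_version(example)
```

For the example path, `parts["miniconda_version"]` is `miniconda3-4.7.12` and `parts["environment_hash"]` is `1a1d771`, so the shortened path keeps the hash and loses only the miniconda prefix.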

      Further restructuring

      If we were to simplify this to rely more on the environment repo and hash, we might restructure the path to use the following format:

      stack/{arch}/{environment_repo}@{environment_hash}/
      

      e.g.:

      stack/osx-64/github.com/lsst/scipipe_conda_env@1234abc/
      

      This form is more explicit: the URL shows more information about an environment, such as the repo where that environment is located.

      Using {arch} (linux-64, osx-64) is more consistent with how conda/conda-forge describes systems as well.
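
A minimal sketch of how such a path could be assembled, assuming only the two {arch} values mentioned above. The function name and the validation are hypothetical and do not come from existing tooling.

```python
# Assumption: only the two conda-style arch names mentioned in this ticket.
KNOWN_ARCHES = ("linux-64", "osx-64")


def environment_path(arch, environment_repo, environment_hash):
    """Build stack/{arch}/{environment_repo}@{environment_hash}/."""
    if arch not in KNOWN_ARCHES:
        raise ValueError(f"unknown arch: {arch!r}")
    return f"stack/{arch}/{environment_repo}@{environment_hash}/"


path = environment_path("osx-64", "github.com/lsst/scipipe_conda_env", "1234abc")
```

With these inputs, `path` matches the example given above, `stack/osx-64/github.com/lsst/scipipe_conda_env@1234abc/`.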

      Restructuring according to system

      If we would like to proceed further, it should be noted that conda-forge will be packaging newer versions of glibc. If we want to reflect that in the path, the kernel equivalent should also change to one of 10.9 (OS X), cos6 (CentOS 6), or cos7 (CentOS 7); it roughly refers to the system/glibc features that are available to packages. Like the compiler, this can effectively be namespaced by the environment hash as well.

      Currently, the compiler is just "conda-system", which is also fine if we are leaving in the environment hash.

      Based on this, if we were to reimagine the path so that binaries might be reused across conda environments, we might choose something similar to the following:

      /stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/
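
As a rough sketch, the current {os_family}/{kernel_equivalent} pair could be translated into the conda-style {arch} and kernel equivalent like this. The lookup table is an assumption pieced together from the names that appear in this ticket (redhat/el7, osx/10.9, cos6, cos7); in particular the el6 entry is a guess, and nothing here is agreed or implemented.

```python
# Hypothetical mapping from the current (os_family, kernel_equivalent) pair
# to the conda-style (arch, kernel_equivalent) pair.
SYSTEM_MAP = {
    ("redhat", "el6"): ("linux-64", "cos6"),  # assumed by analogy with el7
    ("redhat", "el7"): ("linux-64", "cos7"),
    ("osx", "10.9"): ("osx-64", "10.9"),
}


def reusable_path(os_family, kernel_equivalent, compiler_equivalent):
    """Build /stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/."""
    arch, kernel = SYSTEM_MAP[(os_family, kernel_equivalent)]
    return f"/stack/{arch}/{kernel}/{compiler_equivalent}/"


linux_path = reusable_path("redhat", "el7", "conda-system")
osx_path = reusable_path("osx", "10.9", "conda-system")
```

Because neither path contains the environment hash, two environments that share the same system and compiler would resolve to the same directory, which is what makes reuse of binaries possible.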
      

            Activity

            bvan Brian Van Klaveren added a comment -

            Kian-Tat Lim This is basically saying we will modify the paths for EUPS_PKGROOT for now, with newinstall the prime consumer (and hopefully lsst_build at some time in the future). Sorry if that wasn't clear.

            Tim Jenness The hash was in there historically, supposedly for the reason of matching binary builds (probably more for python versioning reasons). By not breaking it down systematically we end up with duplication of binary builds. I think the hash was added in DM-9526.

            Gabriele Comoretto [X] One of the options is similar to conda-style; it is close, but with a bit more information in the URL:

            /stack/{arch}/{kernel_equivalent}/{compiler_equivalent}/
            

            I'd also note that conda has noarch for non-specific packages (e.g. pure python). I think conda otherwise tends to rely on the channel's metadata files to ensure things are compatible.

            gcomoretto Gabriele Comoretto [X] (Inactive) added a comment -

            Compatibilities should be defined as dependency information.

            If I think of the anaconda approach, I would expect the package name in the directory structure.

            I wonder if this is a limitation of using eups.

            I would expect something like:

            /stack/{pkg_name}/{version or build unique id}/{arch}/

            But at this point, I would suggest something like this (close to what maven/nexus does):

            /stack/DM/{SWProduct}/{pkg_name}/{version or build unique id}/{arch}/

            but this implies everybody has agreed on and accepted the products in the product tree.

            It is quite disturbing to me, as a configuration engineer, to see all packages and all versions in a single folder.

            bvan Brian Van Klaveren added a comment -

            I realize I may have forgotten to mention that the names of the artifacts will have the same information they currently have,

            e.g.
            https://eups.lsst.codes/stack/redhat/el7/conda-system/miniconda3-4.7.12-1a1d771/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx/10.9/conda-system/miniconda3-4.7.12-1a1d771/afw-19.0.0-26-g4476391b4+1@DarwinX86.tar.gz

            Which is roughly based on commit/version. That's more or less how conda does it as well, for example:

            https://conda.anaconda.org/conda-forge/linux-64/pytest-5.4.2-py37hc8dfbb8_0.tar.bz2

            Compare to the yum format:
            http://mirror.centos.org/centos/7/os/x86_64/Packages/gcc-4.8.5-39.el7.x86_64.rpm
            (with a noarch)
            http://mirror.centos.org/centos/7/os/x86_64/Packages/pytest-2.7.0-2.el7.noarch.rpm

            Alt arch (ppc64le):
            http://mirror.centos.org/altarch/7/os/ppc64le/Packages/gcc-4.8.5-39.el7.ppc64le.rpm
            (with a noarch)
            http://mirror.centos.org/altarch/7/os/ppc64le/Packages/pytest-2.7.0-2.el7.noarch.rpm

            In examining the full eups URL, for the current paths, you can see how much information is effectively duplicated.

            To propose a few variants:

            Variant 1:
            https://eups.lsst.codes/stack/linux-64/1a1d771/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx-64/1a1d771/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            Variant 2 (without compiler toolset):
            https://eups.lsst.codes/stack/linux-64/cos6/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx-64/10.9/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            Variant 3 (with conda toolset internal name):
            https://eups.lsst.codes/stack/linux-64/cos7/comp7/afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz
            https://eups.lsst.codes/stack/osx-64/10.9/comp7/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

            I'm happy with Variant 1 for now, but I think variant 3 is the most future-proof.
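
To compare the three variants side by side, here is a small sketch that builds each URL from its parts. The function names are illustrative only; the base URL, field values, and artifact name are copied from the examples above, and the artifact name is left unescaped, as in those examples.

```python
BASE_URL = "https://eups.lsst.codes/stack"


def variant1(arch, environment_hash, artifact):
    """Variant 1: {arch}/{environment_hash}/{artifact}."""
    return f"{BASE_URL}/{arch}/{environment_hash}/{artifact}"


def variant2(arch, kernel_equivalent, artifact):
    """Variant 2: {arch}/{kernel_equivalent}/{artifact}, no compiler toolset."""
    return f"{BASE_URL}/{arch}/{kernel_equivalent}/{artifact}"


def variant3(arch, kernel_equivalent, toolset, artifact):
    """Variant 3: {arch}/{kernel_equivalent}/{toolset}/{artifact}."""
    return f"{BASE_URL}/{arch}/{kernel_equivalent}/{toolset}/{artifact}"


artifact = "afw-19.0.0-26-g4476391b4+1@Linux64.tar.gz"
url1 = variant1("linux-64", "1a1d771", artifact)
url2 = variant2("linux-64", "cos6", artifact)
url3 = variant3("linux-64", "cos7", "comp7", artifact)
```

Variant 1 is keyed only on the environment hash, while Variants 2 and 3 replace the hash with system-level fields, which is why they could in principle be shared across environments.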

            I'm not opposed to a maven-style layout, but I'd note that we don't define our packages with any kind of equivalent groupId (namespace). While it could almost always be lsst or possibly lsst.meas, you are proposing that the groupId should be based on the SWProduct. I think that has two issues. First, in maven there's nearly always a 1:1 relationship between the groupId and the package import path/namespace in the code. Second, I think it would require defining a new metadata file in repositories to declare which software product they are part of (a pom.xml equivalent), plus some tooling built around that file, so this ticket would be blocked by that work.

            gcomoretto Gabriele Comoretto [X] (Inactive) added a comment -

            Having namespaces is probably premature, since we still don't have a consolidated SW product tree.

            However, the package itself is uniquely defined. What are the technical issues or problems with having a path like this:

            https://eups.lsst.codes/stack/afw/osx-64/10.9/comp7/afw-19.0.0-26-g4476391b4+1@Darwin64.tar.gz

             

            bvan Brian Van Klaveren added a comment -

            Doing that will break EUPS_PKGROOT


              People

              Assignee:
              Unassigned
              Reporter:
              bvan Brian Van Klaveren
              Watchers:
              Brian Van Klaveren, Chris Walter, Gabriele Comoretto [X] (Inactive), Heather Kelly, Kian-Tat Lim, Tim Jenness
              Votes:
              0
