DM-24709 (Data Management)

Cleanup leftover conda indexes/caches (pkgs directory) for smaller container images

    Details

    • Type: Story
    • Status: In Progress
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: conda
    • Labels:
      None
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      Moving more packages to conda has increased the footprint of our container images, because the conda installation itself is large.

      Conda uses the pkgs directory (effectively "$(dirname $(dirname $CONDA_EXE))/pkgs") as both a cache and a place to download binaries.
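The path expression above can be wrapped in a small helper. This is a minimal sketch; the function name `conda_pkgs_dir` is hypothetical, and it assumes the default layout in which the conda executable lives in `bin/` (or `condabin/`) directly under the installation root. A custom `pkgs_dirs` setting would not be picked up.

```shell
# Print the package cache directory for a given conda executable path.
# Mirrors the expression in the description: go two levels up from the
# executable, then append "pkgs".
conda_pkgs_dir() {
    conda_exe=$1
    printf '%s\n' "$(dirname "$(dirname "$conda_exe")")/pkgs"
}

# Example (only meaningful where conda is activated and CONDA_EXE is set):
#   du -sh "$(conda_pkgs_dir "$CONDA_EXE")"
```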

      conda clean -t cleans up the tarballs, which is a good first step. The tarballs can unequivocally be cleaned up - they shouldn't really be needed ever again.

      conda clean -a goes a step further and removes additional cached items, but it keeps the extracted package files that environments hard-link to.

      The entire pkgs directory could be wiped out to save 1.5GB, but any additional environments installed on top of that installation may then cost more disk space, because there is nothing left in the cache to hard-link against. In the context of a Docker container this is unlikely to happen - I don't believe it's even possible to create your own conda environment on nublado due to permissions, though I might be wrong.
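To get a feel for how much of an environment is shared with the cache through hard links, one can count files whose link count is greater than one. This is only a rough sketch: the function name `count_hardlinked_files` is hypothetical, and a link count above one does not prove that the other link lives in pkgs.

```shell
# Count regular files under a directory that have more than one hard link.
# In a conda environment these are typically files shared with pkgs/.
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Example (path is a placeholder):
#   count_hardlinked_files "$CONDA_PREFIX"
```

Deleting the cache copy does not break the environment, since the file data stays alive through the remaining link; it only means that a later environment cannot share those files.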

      Here is a comparison of the two strategies:

      [lsst@62ff872ebb79 miniconda3-4.7.12]$ pwd
      /opt/lsst/software/stack/python/miniconda3-4.7.12
      [lsst@62ff872ebb79 miniconda3-4.7.12]$ du -hs . 
      3.9G	.
      [lsst@62ff872ebb79 miniconda3-4.7.12]$ source bin/activate 
      (base) [lsst@62ff872ebb79 miniconda3-4.7.12]$ conda clean -a
      Cache location: /opt/lsst/software/stack/python/miniconda3-4.7.12/pkgs
      Will remove the following tarballs:
      ...
      ...
      removing minuit2-6.18.00-minuit2_standalone
      (base) [lsst@62ff872ebb79 miniconda3-4.7.12]$ 
      LICENSE.txt  bin  compiler_compat  conda-meta  condabin  envs  etc  include  lib  pkgs  share  shell  ssl  x86_64-conda_cos6-linux-gnu
      (base) [lsst@62ff872ebb79 miniconda3-4.7.12]$ du -hs .
      3.1G	.
      

      [lsst@62ff872ebb79 miniconda3-4.7.12]$ pwd
      /opt/lsst/software/stack/python/miniconda3-4.7.12
      [lsst@62ff872ebb79 miniconda3-4.7.12]$ source bin/activate 
      (base) [lsst@62ff872ebb79 miniconda3-4.7.12]$ rm -rf pkgs/*
      (base) [lsst@62ff872ebb79 miniconda3-4.7.12]$ du -hs .
      2.6G	.
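The before/after measurement in the two transcripts could be scripted roughly as follows. This is a sketch under assumptions: `dir_size_kb` and `measure_cleanup` are hypothetical helper names, and `$root` in the examples stands for the miniconda installation directory.

```shell
# Size of a directory tree in kilobytes.
dir_size_kb() {
    du -sk "$1" | cut -f1
}

# Run a cleanup command and report how much smaller the tree became.
# Usage: measure_cleanup DIR COMMAND [ARGS...]
measure_cleanup() {
    prefix=$1
    shift
    before=$(dir_size_kb "$prefix")
    "$@"                                  # the cleanup strategy under test
    after=$(dir_size_kb "$prefix")
    echo "before=${before}K after=${after}K saved=$((before - after))K"
}

# Examples matching the strategies discussed above (not run here):
#   measure_cleanup "$root" conda clean -t -y
#   measure_cleanup "$root" conda clean -a -y
#   measure_cleanup "$root" rm -rf "$root"/pkgs/*
```

Each strategy should be measured on a fresh copy of the installation, since the cleanups are destructive.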
      

          Activity

          ktl Kian-Tat Lim added a comment -

          You can create your own environment in nublado:

          conda create -n mystack --clone lsst-scipipe-984c9f7
          conda activate mystack
          conda install -c conda-forge eups
          

          bvan Brian Van Klaveren added a comment -

          I did some testing on this. Cleaning tarballs is fine. Cleaning all removes more than you want (such as the compiler package).

          Some packages need to be re-downloaded when doing a clone - but not all of them - about 34 MB as of today.

          I suspect the reason is related to this comment:
          https://github.com/conda/conda/issues/7398#issuecomment-395893407

          Here is what needs to be re-downloaded on a clone:

          (base) bvan@PC97504:pkgs$ conda create -y -n clone_test --clone lsst-scipipe-973126a
          Source:      /tmp/newinstall/python/miniconda3-4.7.12/envs/lsst-scipipe-973126a
          Destination: /tmp/newinstall/python/miniconda3-4.7.12/envs/clone_test
          Packages: 245
          Files: 317
           
          Downloading and Extracting Packages
          mpi-1.0              | ###...### | 100% 
          pthread-stubs-0.4    | ###...### | 100% 
          doxygen-1.8.18       | ###...### | 100% 
          libllvm9-9.0.1       | ###...### | 100% 
          libopenblas-0.3.9    | ###...### | 100% 
          minuit2-6.18.00      | ###...### | 100% 
          apr-1.6.5            | ###...### | 100% 
          clangxx-9.0.1        | ###...### | 100% 
          libblas-3.8.0        | ###...### | 100% 
          binutils-meta-1.0.4  | ###...### | 100% 
          libcblas-3.8.0       | ###...### | 100% 
          liblapack-3.8.0      | ###...### | 100% 
          backports-1.0        | ###...### | 100% 
          curl-7.69.1          | ###...### | 100% 
          python_abi-3.7       | ###...### | 100% 
          fortran-compiler-1.0 | ###...### | 100% 
          c-compiler-1.0.4     | ###...### | 100% 
          importlib_metadata-1 | ###...### | 100% 
          cxx-compiler-1.0.4   | ###...### | 100% 
          parquet-cpp-1.5.1    | ###...### | 100% 
          compilers-1.0.4      | ###...### | 100% 
          matplotlib-3.0.3     | ###...### | 100% 
          Preparing transaction: done
          Verifying transaction: done
          Executing transaction: done
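One way to investigate which packages are missing from the cache is to compare an environment's metadata against the pkgs directory. The sketch below assumes conda's usual layout, where each installed package has a `conda-meta/<name>-<version>-<build>.json` record and its extracted copy sits in a directory of the same name under pkgs. The function name `missing_from_cache` is hypothetical, and whether its output matches the list conda actually re-downloads on a clone is an untested assumption.

```shell
# List packages recorded in ENV/conda-meta that have no extracted
# directory of the same name under PKGS.
# Usage: missing_from_cache ENV PKGS
missing_from_cache() {
    env_dir=$1
    pkgs_dir=$2
    for meta in "$env_dir"/conda-meta/*.json; do
        [ -e "$meta" ] || continue          # no metadata records at all
        name=$(basename "$meta" .json)      # name-version-build
        [ -d "$pkgs_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example (paths are placeholders):
#   missing_from_cache "$root/envs/lsst-scipipe-973126a" "$root/pkgs"
```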
          


            People

            Assignee:
            bvan Brian Van Klaveren
            Reporter:
            bvan Brian Van Klaveren
            Watchers:
            Adam Thornton, Brian Van Klaveren, Kian-Tat Lim