DM-12029 (Data Management)

recompress jointcal's testdata zeroed images with fpack

Details

    Description

      Now that the stack supports tile compression of FITS files and a number of bugs in the handling of FITS image metadata have been fixed (DM-11332), I can recompress the images I zeroed out in testdata_jointcal (really just holders of the metadata) with fpack instead of leaving them as gzipped files. Once that is done, I can use the new butler calls (DM-9060, DM-9153) to access that metadata directly, which should drastically speed up file ingestion for jointcal.

      The compression step is probably best done via a short bash script that runs gunzip and then fpack, with some carefully chosen parameters (the files are zeroed, so it probably doesn't matter much which compression scheme is selected).
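The two steps described above (gunzip, then fpack) could be sketched as follows. The ticket suggests a bash script; this is the same idea in Python, offered as an illustration and not as the script that was actually used. The function names and the `run_fpack` switch are hypothetical, the `fpack` executable is assumed to be on the PATH, and it is called with no options, on the assumption that the default compression scheme is acceptable for zeroed images.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def gunzip_to_fits(gz_path: Path) -> Path:
    """Decompress e.g. calexp.fits.gz to calexp.fits and return the new path."""
    fits_path = gz_path.with_suffix("")  # drops the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(fits_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return fits_path


def fpack_command(fits_path: Path) -> list:
    """Build the fpack call; by default fpack writes <name>.fits.fz."""
    return ["fpack", str(fits_path)]


def recompress(gz_path: Path, run_fpack: bool = True) -> Path:
    """Gunzip one file, optionally fpack it, and return the expected .fz path.

    With run_fpack=False only the gunzip step runs, which is handy for
    checking file names on a machine without fpack installed.
    """
    fits_path = gunzip_to_fits(gz_path)
    if run_fpack:
        subprocess.run(fpack_command(fits_path), check=True)
        # Keep only the tile-compressed copy.
        fits_path.unlink()
        gz_path.unlink()
    return fits_path.with_name(fits_path.name + ".fz")
```

To convert a whole repository, one could loop over `Path(repo).rglob("*.fits.gz")` and call `recompress` on each result, where `repo` stands for the local testdata_jointcal checkout.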

      If this works, we can close DM-6911 as well, as that probably got fixed "for free" as part of DM-11332.

          Activity

            Parejkoj John Parejko added a comment -

            Doing this should speed up the work on DM-11783.
            Parejkoj John Parejko added a comment -

            Looks like this might chop a third or so out of jointcal's test suite runtimes. I'll do a test run with some larger datasets and do a quick profiling before sending it for review.

            Here's the result of running time pytest -n8 tests with the pre-ticket versions on lsst-dev:

            ============= 58 passed, 2 skipped, 24 warnings in 158.72 seconds ==============
             
            real	2m40.733s
            user	1m35.845s
            sys	0m13.759s
            

            And here's the same run with the post-ticket versions on lsst-dev:

            ============== 58 passed, 2 skipped, 8 warnings in 113.85 seconds ==============
             
            real	1m55.833s
            user	1m17.925s
            sys	0m11.900s
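For reference, the relative saving implied by the two runs above can be worked out directly from the quoted numbers. This is a small arithmetic sketch; the helper names are made up for illustration, and a single pair of runs says nothing about run-to-run variation.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert the 'XmY.YYYs' format printed by the shell's time into seconds."""
    return 60 * minutes + seconds


def percent_saved(before: float, after: float) -> float:
    """Reduction in runtime, as a percentage of the original runtime."""
    return 100.0 * (before - after) / before


# Durations reported by pytest itself (seconds).
pytest_saved = percent_saved(158.72, 113.85)

# Wall-clock ("real") times reported by time.
real_saved = percent_saved(to_seconds(2, 40.733), to_seconds(1, 55.833))

print(f"pytest-reported duration: {pytest_saved:.1f}% shorter")
print(f"wall-clock time:          {real_saved:.1f}% shorter")
```

Both figures come out at roughly 28%, a little under the "third or so" estimated above.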
            

            Parejkoj John Parejko added a comment -

            Looks like this took about 25% off the data load time when processing validation_data_hsc, which is a good start.

            Parejkoj John Parejko added a comment -

            price Do you mind tackling this small review? It's a speed improvement for both running the tests, and loading other data.

            The fpacked calexps take up a few MB more space in total than the gzipped images, but the whole repo is about 1GB, so I'd say it's a good trade.

            Parejkoj John Parejko added a comment - Jenkins run: https://ci.lsst.codes/job/stack-os-matrix/26905/
            price Paul Price added a comment -

            Looks good!

            Trivial comments on the GitHub PRs.

            Parejkoj John Parejko added a comment -

            Thanks, I fixed those two issues.

            Merged and done.


            People

              Parejkoj John Parejko
              Parejkoj John Parejko
              Paul Price
              John Parejko, John Swinbank, Paul Price