DM-12029

recompress jointcal's testdata zeroed images with fpack

    Details

      Description

      Now that the stack supports tile compression of FITS files and a number of bugs in the handling of FITS image metadata have been fixed (DM-11332), I can recompress the images I zeroed out in testdata_jointcal (which are really just holders of the metadata) so that they are no longer plain gzipped files. Once that is done, I can use the new butler calls (DM-9060, DM-9153) to access that metadata directly. This should drastically speed up file ingestion for jointcal.

      The compression step is probably best done via a short bash script that runs gunzip and then fpack, with some carefully chosen parameters (the files are zeroed, so it probably doesn't matter much which compression scheme is selected).
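The gunzip-then-fpack step described above could look roughly like the sketch below. It is written in Python rather than bash so that the command plan can be checked without touching real data. The function names and the directory walk are hypothetical, and fpack is run with its default settings because the ticket does not say which parameters were finally chosen.

```python
import subprocess
from pathlib import Path


def plan_commands(gz_path):
    """Return the commands that turn name.fits.gz into name.fits.fz."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    fits_path = gz_path.with_suffix("")  # strip the trailing .gz
    return [
        ["gunzip", str(gz_path)],   # replaces name.fits.gz with name.fits
        ["fpack", str(fits_path)],  # default settings; writes name.fits.fz
    ]


def recompress(gz_path, run=subprocess.run):
    """Run the planned commands, then drop the uncompressed intermediate."""
    for command in plan_commands(gz_path):
        run(command, check=True)
    # fpack keeps its input by default, so remove name.fits ourselves.
    Path(gz_path).with_suffix("").unlink()


def recompress_tree(root):
    """Recompress every *.fits.gz below root (hypothetical layout)."""
    for gz_path in sorted(Path(root).rglob("*.fits.gz")):
        recompress(gz_path)
```

Passing `run` as a parameter is only a convenience for dry runs: a stand-in that records the commands lets the plan be inspected before any file is changed.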

      If this works, we can close DM-6911 as well, as that probably got fixed "for free" as part of DM-11332.

            Activity

            Parejkoj John Parejko added a comment -

            Doing this should speed up the work on DM-11783.

            Parejkoj John Parejko added a comment -

            Looks like this might chop a third or so out of jointcal's test suite runtimes. I'll do a test run with some larger datasets and do a quick profiling before sending it for review.

            Here's the result of running "time pytest -n8 tests" with the pre-ticket versions on lsst-dev:

            ============= 58 passed, 2 skipped, 24 warnings in 158.72 seconds ==============
             
            real	2m40.733s
            user	1m35.845s
            sys	0m13.759s
            

            And here are the results with the post-ticket versions on lsst-dev:

            ============== 58 passed, 2 skipped, 8 warnings in 113.85 seconds ==============
             
            real	1m55.833s
            user	1m17.925s
            sys	0m11.900s
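
As a quick check on the two runs above, the fractional saving can be worked out directly from the reported numbers. This is only an arithmetic sketch and the helper names are made up for illustration. Both the pytest-reported time and the wall-clock time come out at roughly 28% shorter.

```python
def parse_time(text):
    """Convert a `time`-style string such as '2m40.733s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return 60 * int(minutes) + float(seconds)


def percent_saved(before, after):
    """Percentage of the original runtime that was removed."""
    return 100 * (before - after) / before


# Durations reported by pytest itself, in seconds.
pytest_saving = percent_saved(158.72, 113.85)

# Wall-clock ("real") times reported by the shell's time builtin.
wall_saving = percent_saved(parse_time("2m40.733s"), parse_time("1m55.833s"))

print(f"pytest-reported saving: {pytest_saving:.1f}%")
print(f"wall-clock saving:      {wall_saving:.1f}%")
```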
            

            Parejkoj John Parejko added a comment -

            Looks like this took about 25% off the data load time when processing validation_data_hsc, which is a good start.

            Parejkoj John Parejko added a comment -

            Paul Price, do you mind tackling this small review? It's a speed improvement both for running the tests and for loading other data.

            The fpacked calexps take a total of a few MB more space than the gzipped images, but the whole repo is about 1 GB, so I'd say it's a good trade.

            Parejkoj John Parejko added a comment -

            Jenkins run: https://ci.lsst.codes/job/stack-os-matrix/26905/

            price Paul Price added a comment -

            Looks good!

            Trivial comments on the GitHub PRs.

            Parejkoj John Parejko added a comment -

            Thanks, I fixed those two issues.

            Merged and done.


              People

              • Assignee:
                Parejkoj John Parejko
                Reporter:
                Parejkoj John Parejko
                Reviewers:
                Paul Price
                Watchers:
                John Parejko, John Swinbank, Paul Price