# recompress jointcal's testdata zeroed images with fpack


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 2
• Sprint:
• Team:

#### Description

Now that the stack supports tile compression of FITS files, and a number of bugs related to handling of FITS image metadata were fixed (DM-11332), I can recompress the images (really just holders of the metadata) I zeroed out in testdata_jointcal so that they aren't just gzipped files. Once done I can use the new butler calls (DM-9060, DM-9153) to directly access that metadata. This should drastically speed up file ingestion for jointcal.

The compression step is probably best done via a short bash script that runs gunzip and then fpack, with some carefully chosen parameters (the files are zeroed, so it probably doesn't matter much which compression scheme is selected).
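Here is a minimal sketch of that script. It is written in Python (via `subprocess`) rather than bash so that it can be dry-run and checked before anything is touched. It assumes the cfitsio `fpack` binary is on `PATH` and that the inputs are named `*.fits.gz`. `recompress` and `build_commands` are hypothetical names. `-D` (delete the input after compressing) and `-Y` (don't prompt to confirm `-D`) are standard fpack options. The compression scheme is left at fpack's default, on the reasoning above that it hardly matters for zeroed pixels. Check `fpack -h` on your install before relying on any of this.

```python
from __future__ import annotations

import pathlib
import subprocess


def build_commands(gz_path: pathlib.Path) -> list[list[str]]:
    """Commands that turn one ``*.fits.gz`` file into a ``*.fits.fz`` file."""
    fits_path = gz_path.with_suffix("")  # "x.fits.gz" -> "x.fits"
    return [
        # gunzip writes x.fits and removes x.fits.gz.
        ["gunzip", str(gz_path)],
        # fpack writes x.fits.fz; -D deletes x.fits afterwards and -Y skips
        # the confirmation prompt for -D. Compression scheme: fpack's default.
        ["fpack", "-D", "-Y", str(fits_path)],
    ]


def recompress(root: pathlib.Path, dry_run: bool = True) -> list[list[str]]:
    """Find every *.fits.gz under root and gunzip + fpack it.

    With dry_run=True nothing is executed; the planned commands are returned.
    """
    commands: list[list[str]] = []
    for gz_path in sorted(root.rglob("*.fits.gz")):
        for cmd in build_commands(gz_path):
            commands.append(cmd)
            if not dry_run:
                # check=True stops at the first failure instead of carrying on.
                subprocess.run(cmd, check=True)
    return commands
```

Pointed at a local checkout of testdata_jointcal, the default `dry_run=True` only lists the commands so they can be reviewed. Rerun with `dry_run=False` to actually rewrite the files.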

If this works, we can close DM-6911 as well, as that probably got fixed "for free" as part of DM-11332.

#### Activity

John Parejko added a comment -

Doing this should speed up the work on DM-11783.

John Parejko added a comment -

Looks like this might chop a third or so out of jointcal's test suite runtimes. I'll do a test run with some larger datasets and some quick profiling before sending it for review.

Here's the result of running `time pytest -n8 tests` with the pre-ticket versions on lsst-dev:

`============= 58 passed, 2 skipped, 24 warnings in 158.72 seconds ==============`

`real 2m40.733s   user 1m35.845s   sys 0m13.759s`

And here are the results with the post-ticket versions on lsst-dev:

`============== 58 passed, 2 skipped, 8 warnings in 113.85 seconds ==============`

`real 1m55.833s   user 1m17.925s   sys 0m11.900s`
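For reference, the relative changes implied by the two runs can be computed directly. This snippet only reuses the numbers quoted above and parses bash's `time` output format (`XmY.YYYs`). `seconds` is a hypothetical helper name.

```python
import re


def seconds(t: str) -> float:
    """Convert a bash `time` field such as '2m40.733s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Figures copied from the two pytest runs above (pytest total, then bash `time`).
before = {"pytest": 158.72, "real": seconds("2m40.733s"),
          "user": seconds("1m35.845s"), "sys": seconds("0m13.759s")}
after = {"pytest": 113.85, "real": seconds("1m55.833s"),
         "user": seconds("1m17.925s"), "sys": seconds("0m11.900s")}

# Fractional reduction for each measurement.
reduction = {k: 1 - after[k] / before[k] for k in before}
for k, r in reduction.items():
    print(f"{k:>6}: {before[k]:7.2f}s -> {after[k]:7.2f}s ({r:.0%} less)")
```

This works out to roughly 28% less wall-clock time (both the pytest total and `real`), about 19% less user CPU time, and about 14% less system time.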

John Parejko added a comment -

Looks like this took about 25% off the data load time when processing validation_data_hsc, which is a good start.

John Parejko added a comment -

Paul Price, do you mind tackling this small review? It speeds up both running the tests and loading other data.

The fpacked calexps take a few MB more space in total than the gzipped images, but the whole repo is about 1GB, so I'd say it's a good trade.

John Parejko added a comment -

Jenkins run: https://ci.lsst.codes/job/stack-os-matrix/26905/
Hide
Paul Price added a comment -

Looks good!

Trivial comments on the GitHub PRs.

John Parejko added a comment -

Thanks, I fixed those two issues.

Merged and done.


#### People

Assignee:
John Parejko
Reporter:
John Parejko
Reviewers:
Paul Price
Watchers:
John Parejko, John Swinbank, Paul Price