Data Management: DM-15544

ExposureCatalog should support new PhotoCalib objects

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw
    • Labels: None

      Description

      Currently, ExposureCatalog supports only the old Calib calibration objects.  However, for use with jointcal or fgcmcal it needs to support the new PhotoCalib calibration objects.  This will also allow multiple PhotoCalib objects to be bundled into one persistence file.  For example, bundling per visit would reduce the number of required inodes by a factor of 100 for HSC data for fgcmcal outputs.

            Activity

            erykoff Eli Rykoff added a comment -

            Some further notes that might be relevant.  Currently, making individual PhotoCalib outputs for each visit/ccd for HSC RC2 takes the information contained in a ~7 MB table and blows it up to 3.5 GB made up of ~28k tiny files.  Each of these files, even with the FITS overhead, seems to be smaller than a block size (or minimum file size) on lsst-dev01, which is a large part of the overhead.  Tarring up all the files drops the total from 3.5 GB to ~700 MB, and gzipping that gets it down to ~7 MB, which is the actual information content.  Therefore, being able to persist groups of PhotoCalib objects will result in substantial savings in both storage space and number of inodes.  My guess is that the overhead of having to read in (at least) a full visit just to get the calibration information for 1 ccd would still be minimal (responding to Jim Bosch's comments from a meeting), but this would of course have to be tested.
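
            As a rough illustration of the block-size effect described above, the sketch below rounds each file up to a whole number of blocks and compares many small files with one bundled file.  The per-file size and the block size are assumed values chosen only to be loosely in line with the quoted totals; they are not measurements from lsst-dev01.

```python
def allocated_bytes(file_sizes, block_size):
    """Space consumed when every file occupies a whole number of blocks."""
    # -(-a // b) is integer ceiling division.
    return sum(-(-size // block_size) * block_size for size in file_sizes)


# Hypothetical inputs (assumptions, not measured values).
n_files = 28_000
file_size = 25_000        # bytes per small file, roughly 700 MB / 28k files
block_size = 128 * 1024   # bytes; assumed minimum allocation unit

# Each small file is rounded up to one full block.
separate = allocated_bytes([file_size] * n_files, block_size)
# The same bytes in a single file waste at most one partial block.
bundled = allocated_bytes([file_size * n_files], block_size)
overhead_ratio = separate / bundled

print(f"separate: {separate / 1e9:.2f} GB")
print(f"bundled:  {bundled / 1e9:.2f} GB")
print(f"ratio:    {overhead_ratio:.1f}x")
```

            Under these assumptions the per-file rounding alone accounts for a several-fold difference, before any compression is applied.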

            erykoff Eli Rykoff added a comment -

            I assume this is sitting in the not-urgent queue, but I noticed something that I thought would be interesting to John Parejko (if he hasn't noticed already).  As with the fgcmcal output above, in the jointcal output for, e.g., DM-15603-jointcal, the jointcal-results directory takes up 11 GB and tens of thousands of inodes. Taking one directory at random (/datasets/hsc/repo/rerun/RC/w_2018_36/DM-15603-jointcal/jointcal-results/HSC-R/9615), the disk usage is 439 MB. Simply tarring the files in this directory reduces the usage to 112 MB; that's 3x overhead just because (as I assumed above) the files are much smaller than the block size. Gzipping this tar file gets it down to 13 MB, which is the information content (as above).
            Being able to store arrays of these would save tremendous overhead.
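
            A comparison like the one above can be reproduced with a short script.  This is a minimal sketch, assuming a POSIX system where os.stat reports st_blocks in 512-byte units; directory_usage is a hypothetical helper name, and results on network or parallel filesystems may differ from what du reports.

```python
import os


def directory_usage(root):
    """Return (apparent_bytes, allocated_bytes) for all files under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size            # bytes of actual content
            allocated += st.st_blocks * 512   # bytes reserved on disk (POSIX)
    return apparent, allocated


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    apparent, allocated = directory_usage(root)
    print(f"apparent:  {apparent / 1e6:.1f} MB")
    print(f"allocated: {allocated / 1e6:.1f} MB")
```

            A large gap between the two numbers suggests that per-file allocation, rather than the data itself, dominates the disk usage.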

            jbosch Jim Bosch added a comment -

            On the contrary, I think this needs to be in the queue of things we get done before we declare meas_mosaic->jointcal complete (at least the support in ExposureCatalog, maybe not the usage in jointcal outputs). We can't retire Calib until PhotoCalib is everywhere Calib was.  John Swinbank, John Parejko, I may have missed this one when last we spoke.

            Parejkoj John Parejko added a comment -

            Why do you say that, Jim Bosch? jointcal isn't doing anything different from what meas_mosaic did in this regard. Replacing Calib with PhotoCalib is also independent of that (though I hope to tackle that soon).

            I do agree this is a good idea, and there are some things I'd like to do to persist the jointcal results more efficiently, but those are longer-term goals.

            Parejkoj John Parejko added a comment -

            And now that I look into it, what exactly is an ExposureCatalog? The header file isn't very instructive.

            jbosch Jim Bosch added a comment -

            John Parejko wrote: "jointcal isn't doing anything different from what meas_mosaic did in this regard. Replacing Calib with PhotoCalib is also independent of that (though I hope to tackle that soon)."

            I suppose that's fair.  I guess I had just envisioned Calib->PhotoCalib happening at the same time.

            John Parejko wrote: "And now that I look into it, what exactly is an ExposureCatalog? The header file isn't very instructive."

            ExposureRecord is basically a version of ExposureInfo that you can make a catalog out of.  We use them extensively in coadds to be able to store metadata about what went into the coadd, including some things that are necessary for coadd functionality (like CoaddPsf).  And like any other kind of record, you can add other fields to them, too.  There's a sad tale of why ExposureRecord and ExposureInfo can't easily be unified that I think involves BBox ownership, but I forget the details.

            Parejkoj John Parejko added a comment -

            Jim Bosch wrote: "There's a sad tale of why ExposureRecord and ExposureInfo can't easily be unified that I think involves BBox ownership, but I forget the details."

            So, DM-7565 then?

            jbosch Jim Bosch added a comment -

            That's at least some of it; might be all of it.

            Parejkoj John Parejko added a comment -

            Done as part of DM-10156.

            Parejkoj John Parejko added a comment -

            Now that this is done, Eli Rykoff: you and I should talk about better ways to persist PhotoCalibs from jointcal/fgcm runs to save a bundle of disk space.


              People

              • Assignee: John Parejko (Parejkoj)
              • Reporter: Eli Rykoff (erykoff)
              • Watchers: Eli Rykoff, Jim Bosch, John Parejko, John Swinbank