  1. Data Management
  2. DM-10904

analyze the storage usage of the output butler repos from S17B reprocessing

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The S17B HSC reprocessing (DM-10404) generated some butler repos with processed data:

      /datasets/hsc/repo/rerun/DM-10404/SFM
      /datasets/hsc/repo/rerun/DM-10404/DEEP
      /datasets/hsc/repo/rerun/DM-10404/UDEEP
      /datasets/hsc/repo/rerun/DM-10404/WIDE
      

      We would like to know, for example, how many files there are, how large the files are (mostly small files?), which data products are taking up the space (likely images, but which ones?), and so on. Write up a brief summary.
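One way to approach the "mostly small files?" question is to walk a repo and bin the file sizes by powers of two. The following is only a minimal sketch, not the analysis performed for this ticket: the function names (`collect_sizes`, `size_histogram`, `small_file_fraction`) and the 1 MiB default threshold for a "small" file are assumptions made for illustration.

```python
import os
from collections import Counter


def collect_sizes(root):
    """Return the size in bytes of every regular (non-symlink) file under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append(os.path.getsize(path))
    return sizes


def size_histogram(sizes):
    """Count files per power-of-two bin.

    Key k covers sizes in [2**k, 2**(k+1)); empty files are counted under -1.
    """
    hist = Counter()
    for size in sizes:
        # For size > 0, bit_length() - 1 equals floor(log2(size)).
        hist[size.bit_length() - 1 if size > 0 else -1] += 1
    return hist


def small_file_fraction(sizes, threshold=1 << 20):
    """Fraction of files smaller than threshold bytes (assumed default: 1 MiB)."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s < threshold) / len(sizes)
```

Calling `collect_sizes` on one of the rerun directories listed above and passing the result to the other two functions would give a file count (`len(sizes)`), a coarse size distribution, and the share of small files.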

            Activity

            sthrush Samantha Thrush added a comment - - edited

            I have finally finished the histograms on the space taken up by various butler datasets. As you will see below, SFM is fundamentally different from the other three directories (DEEP, UDEEP, WIDE).
            Let's first consider the SFM butler space graph below:

            Here, I found the paths to these files partly through butler and partly just by snooping around the SFM directory. The locations of these files are listed in the table below, but please take note of my shorthand: a word in angle brackets is a wildcard for the directory or file name parts that fit that descriptor. For example, a path given as /SFM/<number> means that the file resides in one of the many subdirectories of SFM whose name is a number.

            Butler file              Path
            CORR                     SFM/<number>/<filter>/corr/CORR-<number>-<number>.fits
            SRC                      SFM/<number>/<filter>/output/SRC-<number>-<number>.fits
            SRCMATCH                 SFM/<number>/<filter>/output/SRCMATCH-<number>-<number>.fits
            SRCMATCHFULL             SFM/<number>/<filter>/output/SRCMATCHFULL-<number>-<number>.fits
            deepCoadd-skyMap         SFM/deepCoadd/skyMap.pickle
            BKGD                     SFM/<number>/<filter>/corr/BKGD-<number>-<number>.fits
            boost files              SFM/<number>/<filter>/singleFrameDriver_metadata/<number>.boost
            flattened files          SFM/<number>/<filter>/thumbs/flattened-<number>-<number>.png
            oss files                SFM/<number>/<filter>/thumbs/oss-<number>-<number>.png
            deep_makeSkyMap.boost    SFM/metadata/deep_makeSkyMap.boost
            icSrc                    SFM/schema/icSrc.fits
            src                      SFM/schema/src.fits
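
The path shorthand in this table can be turned into regular expressions that attribute each file in a listing to a butler dataset. The sketch below is illustrative only and is not the script used for the plot. It assumes that a "number" wildcard is a run of digits and that a "filter" wildcard is any single path component; the names `SFM_PATTERNS`, `classify`, and `bytes_per_dataset` are hypothetical, and only some of the table rows are included.

```python
import re
from collections import defaultdict

NUM = r"\d+"      # assumption: a "number" wildcard is a run of digits
FILT = r"[^/]+"   # assumption: a "filter" wildcard is one path component

# A subset of the table rows, written as regular expressions.
SFM_PATTERNS = {
    "CORR": rf"SFM/{NUM}/{FILT}/corr/CORR-{NUM}-{NUM}\.fits",
    "BKGD": rf"SFM/{NUM}/{FILT}/corr/BKGD-{NUM}-{NUM}\.fits",
    "SRC": rf"SFM/{NUM}/{FILT}/output/SRC-{NUM}-{NUM}\.fits",
    "SRCMATCH": rf"SFM/{NUM}/{FILT}/output/SRCMATCH-{NUM}-{NUM}\.fits",
    "SRCMATCHFULL": rf"SFM/{NUM}/{FILT}/output/SRCMATCHFULL-{NUM}-{NUM}\.fits",
    "deepCoadd-skyMap": r"SFM/deepCoadd/skyMap\.pickle",
}

# Anchor each pattern at the end of the path so absolute paths also match.
COMPILED = {name: re.compile(pat + r"$") for name, pat in SFM_PATTERNS.items()}


def classify(path):
    """Return the dataset name whose pattern matches the end of path, else None."""
    for name, regex in COMPILED.items():
        if regex.search(path):
            return name
    return None


def bytes_per_dataset(path_sizes):
    """Sum sizes in bytes per dataset for an iterable of (path, size) pairs."""
    totals = defaultdict(int)
    for path, size in path_sizes:
        totals[classify(path) or "unclassified"] += size
    return dict(totals)
```

Feeding `bytes_per_dataset` with (path, size) pairs from a file listing gives per-dataset totals of the kind a space-usage histogram needs; anything that matches no row ends up under "unclassified", which is a quick way to spot missing dataset types.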

            In a separate comment, I will discuss the other three directories since their structures are all so similar.

            sthrush Samantha Thrush added a comment - - edited

            As stated previously, DEEP, UDEEP, and WIDE all have very similar file structures and similar relations between the relative sizes of most of their butler datasets, with the exception of their schema files, which take up the same space in all three directories.

            To better understand where each of these files comes from, here is a table. In this table, wildcards are enclosed in angle brackets. To be more succinct, unlike above, please assume that all of these paths are relative to DEEP, UDEEP, or WIDE.

            Butler file     Path
            FORCEDSRC       <number>/<filter>/tract<number>/FORCEDSRC-<number>-<number>.fits
            warp            deepCoadd/<filter>/<number>/<number>,<number>/warp-<filter>-<number>-<number>,<number>-<number>.fits
            mergeDet        deepCoadd-results/merged/<number>/<number>,<number>/mergeDet-<filter>-<number>-<number>,<number>.fits
            ref             deepCoadd-results/merged/<number>/<number>,<number>/ref-<filter>-<number>-<number>,<number>.fits
            calexp          deepCoadd-results/<filter>/<number>/<number>,<number>/calexp-<filter>-<number>-<number>,<number>.fits
            det_bkgd        deepCoadd-results/<filter>/<number>/<number>,<number>/det_bkgd-<filter>-<number>-<number>,<number>.fits
            det             deepCoadd-results/<filter>/<number>/<number>,<number>/det-<filter>-<number>-<number>,<number>.fits
            forced_src      deepCoadd-results/<filter>/<number>/<number>,<number>/forced_src-<filter>-<number>-<number>,<number>.fits
            meas            deepCoadd-results/<filter>/<number>/<number>,<number>/meas-<filter>-<number>-<number>,<number>.fits
            srcMatchFull    deepCoadd-results/<filter>/<number>/<number>,<number>/srcMatchFull-<filter>-<number>-<number>,<number>.fits
            srcMatch        deepCoadd-results/<filter>/<number>/<number>,<number>/srcMatch-<filter>-<number>-<number>,<number>.fits
            fcr             jointcal-results/<number>/fcr-<number>-<number>.fits
            wcs             jointcal-results/<number>/wcs-<number>-<number>.fits
            schema files    schema/*
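
Because these paths are relative to the repo root, each row maps naturally onto a glob in which every wildcard becomes `*`. The following is a rough sketch under stated assumptions, not the script behind the plots: the globs were written by hand from a few of the rows, and `COADD_GLOBS` and `usage_by_dataset` are hypothetical names.

```python
from pathlib import Path

# Globs written by hand from a few table rows; every wildcard is replaced by "*".
# With Path.glob, "*" does not cross directory separators.
COADD_GLOBS = {
    "calexp": "deepCoadd-results/*/*/*,*/calexp-*-*-*,*.fits",
    "meas": "deepCoadd-results/*/*/*,*/meas-*-*-*,*.fits",
    "warp": "deepCoadd/*/*/*,*/warp-*-*-*,*-*.fits",
    "wcs": "jointcal-results/*/wcs-*-*.fits",
}


def usage_by_dataset(repo_root, globs=None):
    """Return {dataset: (file_count, total_bytes)} for files under repo_root."""
    globs = COADD_GLOBS if globs is None else globs
    root = Path(repo_root)
    usage = {}
    for name, pattern in globs.items():
        # Collect the size of every regular file that matches this dataset's glob.
        sizes = [p.stat().st_size for p in root.glob(pattern) if p.is_file()]
        usage[name] = (len(sizes), sum(sizes))
    return usage
```

Running `usage_by_dataset` once per directory (DEEP, UDEEP, WIDE) would yield comparable per-dataset counts and byte totals, since the three share the same layout.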
            hchiang2 Hsin-Fang Chiang added a comment -

            Thank you Samantha Thrush, these look great. Feel free to close the ticket.

            Pinging Paul Domagala, Andrew Loftus, and Greg Daues, as they may be interested in this summary too and may have suggestions on what to look at in future tickets.

            sthrush Samantha Thrush added a comment -

            Just as a small update, I was reviewing the Butler SFM plot and noticed that I forgot to include two different butler file types. I have since amended the plot and the table.

            sthrush Samantha Thrush added a comment -

            The two scripts I used to gather information for the graphs above can be found at the following links:
            https://github.com/Samantha-Thrush/LSST_codes/blob/master/butlerfind.sh
            https://github.com/Samantha-Thrush/LSST_codes/blob/master/repostats.sh


              People

              • Assignee:
                sthrush Samantha Thrush
                Reporter:
                hchiang2 Hsin-Fang Chiang
                Watchers:
                Hsin-Fang Chiang, Michelle Gower, Samantha Thrush