Data Management / DM-34623 AP Performance sprint / DM-34827

Identify larger DC2 dataset for timing tests

Details

    • Technical task
    • Status: Done
    • Resolution: Done
    • AP S22-6 (May)
    • Alert Production

    Description

      Identify a larger dataset from DC2 for bulk timing tests.

      Choose these from the 5.5 year depth DC2 tract 4431 used by Kenneth Herner (e.g. in https://jira.lsstcorp.org/browse/PREOPS-631).

      We should focus on many visits instead of many detectors. Simon's work creating the rc2_subset HSC CI package could be a useful reference.

      Activity

            mrawls Meredith Rawls added a comment -

            Excellent, let's adopt those 272 datasets as our larger dataset. The fact that they all fully reside inside those four patches gives us good spatial overlap for testing association.

            We should think about where we want your "find calexps in patches" script to land, because it's very useful for AP.

            Since the 272 visit+detectors are quite the hodgepodge due to all the dithering, we should probably manually create a TAGGED collection in /repo/dc2 once instead of defining a long list of datasets each time. This will require an appropriate and/or clever name. I don't know how to create a TAGGED collection like this offhand, but I know it's possible; for example, Jim created "hits2014" and "hits2015" tagged collections in /repo/main/DECam for me.

            mrawls Meredith Rawls added a comment -

            I created a TAGGED collection u/mrawls/DM-34827/raw/4patch_4431 with the datasets in the attached csv file.

            I created a CHAINED collection u/mrawls/DM-34827/defaults/4patch_4431 containing the above as well as calibs, skymaps, and refcats.

            The script I used to do this is saved in /project/mrawls/ap_profile/tagged_chained_collections_create.py.
            Parejkoj John Parejko added a comment -

            It doesn't really matter now, but I'd have opened the file with `astropy.table.Table.read()`; then you wouldn't have to parse the columns and could access them by name. Why did you use `u/parejkoj/profiling-DM-34825` as the collection to draw the raws from in your queryDatasets call, instead of Ken's `u/kherner/2.2i/runs/tract4431-w40`?

            For the record, the idea is "make a list of dataRefs, pass them to butler.registry.associate(collections, dataRefs) to tag them, call butler collection-chain to chain them with the various calibs etc.".
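            That recipe could be sketched roughly as follows. This is a hedged illustration, not the actual tagged_chained_collections_create.py script: it assumes the lsst.daf.butler Registry API (registerCollection, associate, setCollectionChain), and the ``where`` clause and calib/skymap/refcat collection names are invented placeholders. Only the small pure helper ordering the chain members is exercised here.

```python
def chain_members(tagged_raws, extra_inputs):
    """Return the ordered member list for the CHAINED 'defaults' collection."""
    return [tagged_raws, *extra_inputs]


def tag_and_chain():  # not run here: requires an LSST stack and /repo/dc2
    from lsst.daf.butler import Butler, CollectionType

    butler = Butler("/repo/dc2", writeable=True)

    # 1. Register a TAGGED collection and associate the chosen raws into it.
    tagged = "u/mrawls/DM-34827/raw/4patch_4431"
    butler.registry.registerCollection(tagged, CollectionType.TAGGED)
    refs = set(
        butler.registry.queryDatasets(
            "raw",
            collections="u/kherner/2.2i/runs/tract4431-w40",
            where="instrument='LSSTCam-imSim'",  # illustrative selection only
        )
    )
    butler.registry.associate(tagged, refs)

    # 2. Chain the tagged raws with calibs, skymaps, and refcats
    #    (member collection names below are assumptions, not from the ticket).
    chained = "u/mrawls/DM-34827/defaults/4patch_4431"
    butler.registry.registerCollection(chained, CollectionType.CHAINED)
    butler.registry.setCollectionChain(
        chained,
        chain_members(tagged, ["2.2i/calib", "skymaps", "refcats"]),
    )
```

            The same chaining step can equivalently be done from the command line with butler collection-chain, as suggested above.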

            Want to set off a run in the morning on DM-34828, with the ordering fix Krzysztof gave you, just to see how it goes?


            mrawls Meredith Rawls added a comment -

            I did basic string parsing since it was what I could remember how to do offhand, but good point re Astropy for next time. I used that collection since it chained along Ken's larger collection, so it was functionally equivalent. Thanks!
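            For next time, a minimal stdlib-only sketch of reading such a file by column name instead of manual string splitting (the column names and sample values here are invented, not the actual attachment's schema):

```python
import csv
import io


def read_visit_detector(fobj):
    """Return (visit, detector) int pairs from a csv file with a header row."""
    reader = csv.DictReader(fobj)
    return [(int(row["visit"]), int(row["detector"])) for row in reader]


# Stand-in for the attached file, with made-up values:
sample = io.StringIO("visit,detector\n943296,42\n943296,43\n")
pairs = read_visit_detector(sample)
# pairs == [(943296, 42), (943296, 43)]
```

            astropy.table.Table.read() offers the same by-name access plus type inference, but the csv module avoids the extra dependency.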

            mrawls Meredith Rawls added a comment -

            Bonus afterthought - because there are zillions of miscellaneous datasets in Ken's run, I made a second TAGGED collection that contains just the goodSeeingCoadds we want to use as templates here. The procedure I followed is also in the tagged_chained_collections_create.py script mentioned previously.

            This collection is called u/mrawls/DM-34827/coadd/4patch_4431 and is intentionally NOT chained into the defaults, in case we ever want to build better templates.

            The idea is that an ApPipe run can use just "defaults" and "coadd" as the two input collections.
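            As a small illustration of that two-collection setup, a hypothetical helper (not from this ticket) composing the input-collection string a pipetask invocation would take; the pipetask usage shown in the comment is an assumption:

```python
def appipe_inputs(ticket="DM-34827", variant="4patch_4431"):
    """Compose the two input collections for an ApPipe run on this dataset."""
    defaults = f"u/mrawls/{ticket}/defaults/{variant}"
    coadds = f"u/mrawls/{ticket}/coadd/{variant}"
    return f"{defaults},{coadds}"


# e.g.: pipetask run -b /repo/dc2 -i <appipe_inputs()> -o <output> ...
```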

            People

              Assignee: mrawls Meredith Rawls
              Reporter: ebellm Eric Bellm
              Reviewers: John Parejko
              Watchers (3): Eric Bellm, John Parejko, Meredith Rawls
              Votes: 0

              Dates

                Created:
                Updated:
                Resolved:

                Jenkins

                  No builds found.