Data Management / DM-34623 AP Performance sprint / DM-34827

Identify larger DC2 dataset for timing tests


    Details

    • Type: Technical task
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Sprint: AP S22-6 (May)
    • Team: Alert Production

      Description

      Identify a larger dataset from DC2 for bulk timing tests.

      Choose these from the 5.5-year-depth DC2 tract 4431 used by Kenneth Herner (e.g., in https://jira.lsstcorp.org/browse/PREOPS-631).

      We should focus on many visits instead of many detectors. Simon's work creating the rc2_subset HSC CI package could be a useful reference.

        Attachments

          Issue Links

            Activity

            Meredith Rawls added a comment -

            Excellent, let's adopt those 272 datasets as our larger, uh, dataset. The fact that they all fully reside inside those four patches is great because it gives us good spatial overlap for testing association.

            We should think about where we want your "find calexps in patches" script to land, because it's very useful for AP.
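            A hedged sketch of what the core of such a "find calexps in patches" query might look like with the Gen3 butler; the repo path and collection come from this thread, but the patch IDs are placeholders:

```python
# Possible shape for a "find calexps in patches" query: when patch
# dimensions appear in the where clause, the butler registry performs
# the spatial join between detector regions and patches itself.
# Patch IDs here are illustrative placeholders.
from lsst.daf.butler import Butler

butler = Butler("/repo/dc2")
refs = set(butler.registry.queryDatasets(
    "calexp",
    collections="u/kherner/2.2i/runs/tract4431-w40",
    where="skymap = 'DC2' AND tract = 4431 AND patch IN (24, 25)",
))
for ref in sorted(refs, key=lambda r: (r.dataId["visit"], r.dataId["detector"])):
    print(ref.dataId["visit"], ref.dataId["detector"])
```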

            Since the 272 visit+detectors are quite the hodgepodge due to all the dithering, we should probably manually create a TAGGED collection in /repo/dc2 once instead of defining a long list of datasets each time. This will require an appropriate and/or clever name. I don't know how to create a TAGGED collection like this offhand, but I know it's possible; for example, Jim created "hits2014" and "hits2015" tagged collections in /repo/main/DECam for me.

            Meredith Rawls added a comment -

            I created a TAGGED collection u/mrawls/DM-34827/raw/4patch_4431 with the datasets in the attached csv file.

            I created a CHAINED collection u/mrawls/DM-34827/defaults/4patch_4431 containing the above as well as calibs, skymaps, and refcats.

            The script I used to do this is saved in /project/mrawls/ap_profile/tagged_chained_collections_create.py.

             

            John Parejko added a comment -

            It doesn't really matter now, but I'd have opened the file with `astropy.table.Table.read()`; then you wouldn't have to parse the columns and could access them by name. Why did you use `u/parejkoj/profiling-DM-34825` as the collection to draw the raws from in your queryDatasets call, instead of Ken's `u/kherner/2.2i/runs/tract4431-w40`?

            For the record, the idea is "make a list of dataRefs, pass them to butler.registry.associate(collections, dataRefs) to tag them, call butler collection-chain to chain them with the various calibs etc.".
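            The three steps of that recipe might look like the following sketch, assuming a writeable butler and the LSST stack; the exposure IDs and the calib/skymap/refcat child collection names are placeholders, not the actual values used on this ticket:

```python
# Sketch of the tag-then-chain recipe: query refs, tag them into a
# TAGGED collection, then chain that collection with calibs etc.
from lsst.daf.butler import Butler, CollectionType

butler = Butler("/repo/dc2", writeable=True)
tagged = "u/mrawls/DM-34827/raw/4patch_4431"
chained = "u/mrawls/DM-34827/defaults/4patch_4431"

# 1. Make a list of dataset refs for the chosen visit+detector pairs.
refs = list(butler.registry.queryDatasets(
    "raw",
    collections="u/kherner/2.2i/runs/tract4431-w40",
    where="instrument = 'LSSTCam-imSim' AND exposure IN (943296)",  # placeholder IDs
))

# 2. Tag them into a new TAGGED collection.
butler.registry.registerCollection(tagged, CollectionType.TAGGED)
butler.registry.associate(tagged, refs)

# 3. Chain the tagged raws with calibs, skymaps, and refcats
#    (the CLI equivalent is `butler collection-chain`).
butler.registry.registerCollection(chained, CollectionType.CHAINED)
butler.registry.setCollectionChain(chained, [tagged, "2.2i/calib", "skymaps", "refcats"])
```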

            Want to set off a run in the morning on DM-34828, with the ordering fix Krzysztof gave you, just to see how it goes?

            Meredith Rawls added a comment -

            I did basic string parsing since it was what I could remember how to do offhand, but good point re Astropy for next time. I used that collection since it chained along Ken's larger collection, so it was functionally equivalent. Thanks!
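            For next time: name-based column access removes the manual parsing entirely. `astropy.table.Table.read()` provides it directly; the stdlib `csv.DictReader` below is a minimal stand-in sketch, and the column names ("visit", "detector") are assumptions rather than the actual headers of the attached file:

```python
# Minimal sketch: reading a visit+detector CSV with name-based column
# access instead of manual string splitting. astropy.table.Table.read()
# offers the same convenience; csv.DictReader is the stdlib analogue.
import csv
import io

csv_text = """visit,detector
943296,42
943296,77
982985,13
"""

data_ids = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Columns are accessed by name, so reordering the file is harmless.
    data_ids.append({"visit": int(row["visit"]), "detector": int(row["detector"])})

print(data_ids[0])  # {'visit': 943296, 'detector': 42}
```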

             

            Meredith Rawls added a comment -

            Bonus afterthought - because there are zillions of miscellaneous datasets in Ken's run, I made a second TAGGED collection that contains just the goodSeeingCoadds we want to use as templates here. The procedure I followed is also in the tagged_chained_collections_create.py script mentioned previously.

            This collection is called u/mrawls/DM-34827/coadd/4patch_4431 and is intentionally NOT chained into the defaults, in case we ever want to build better templates.

            The idea is that an ApPipe run can use just "defaults" and "coadd" as the two input collections.
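            Only the two collection names below come from this ticket; the repo path matches the thread, and the `pipetask` invocation in the comment is a sketch with a placeholder pipeline path and output collection:

```python
# Consuming the two collections as the only inputs for an ApPipe run.
from lsst.daf.butler import Butler

inputs = [
    "u/mrawls/DM-34827/defaults/4patch_4431",
    "u/mrawls/DM-34827/coadd/4patch_4431",
]
butler = Butler("/repo/dc2", collections=inputs)
# Rough CLI equivalent (pipeline path and output are placeholders):
#   pipetask run -b /repo/dc2 \
#       -i u/mrawls/DM-34827/defaults/4patch_4431,u/mrawls/DM-34827/coadd/4patch_4431 \
#       -p $AP_PIPE_DIR/pipelines/ApPipe.yaml -o u/example/ap_run
```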


              People

              Assignee:
              Meredith Rawls
              Reporter:
              Eric Bellm
              Reviewers:
              John Parejko
              Watchers:
              Eric Bellm, John Parejko, Meredith Rawls

