DM-29543

Gen3 refcat converter

Details

    • 10
    • AP F21-4 (September)
    • Alert Production
    • No

    Description

      Currently, the only way to get a gen3 refcat is to ingest it into a gen2 repo and then convert it to gen3. IngestIndexedReferenceTask is a CmdLineTask, and we need a PipelineTask refcat ingester. Much of the internals are the same, so we can probably do it with a shared base class, but gen3 loads additional information about the shards, etc. into the registry, and handling that will take some care. The multiprocessing manager code, IngestIndexManager, may not be compatible with the gen3 multiprocessing system: it's designed to do all of its own multiprocessing internally, without an external controller.

      These docs about how to ingest refcat data will probably be useful in putting together the gen3 code: https://pipelines.lsst.io/modules/lsst.meas.algorithms/creating-a-reference-catalog.html

      I also wrote a custom Gaia ingester (IngestGaiaManager and IngestGaiaReferenceTask) to handle the Gaia fluxes, which might be able to work "for free" with the new system.

          Activity

            tjenness Tim Jenness added a comment -

            Is the sprint label for this accurate (July)? I'm really looking forward to being able to stop relying on gen2 for gen3.

            tjenness Tim Jenness added a comment -

            Parejkoj will this be done as a new `butler` subcommand for gen3, much like `ingest-raws` and `make-discrete-skymaps` subcommands use `Task` underneath? pipe_tasks already has the necessary infrastructure for registering a butler subcommand but meas_algorithms does not.

            Parejkoj John Parejko added a comment -

            tjenness: Good question. I hadn't considered that aspect of it; I was just looking at how the underlying IngestIndexedReferenceTask code would have to change. It would probably make sense to have it as a butler subcommand, since it functions much the same as `ingest-raws`. I'm not sure even the existing code was a good fit for `meas_algorithms`: the only real meas_alg dependency is LoadReferenceObjectsTask.makeMinimalSchema. I'll have to think about whether to move it as part of this ticket, or whether to create the necessary butler subcommand infrastructure in pipe_tasks and just call things in meas_algorithms.

            tjenness Tim Jenness added a comment -

            Parejkoj and I discussed this during PCW last week. The outcome was the following plan:

            • Remove butler completely from the command that takes an external catalogue and converts it to pipeline-native refcats.
            • During that conversion, create a CSV index file mapping file name to gen3 data ID.
            • Use the `butler register-dataset-type` command to create the new refcat dataset type.
            • Run `butler ingest-files` to ingest the refcat using the CSV index file.

            Currently it seems that we put all refcats in a refcat collection and use a new dataset type (with version in name) per catalog.
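
The second step of the plan (writing an index file during conversion) could look roughly like the sketch below. It is only an illustration under assumptions that the ticket does not state: the data ID is taken to be a single HTM pixel index, the column names `filename` and `htm7`, the helper name `write_refcat_index`, and the example shard paths are all hypothetical, and the exact table format that `butler ingest-files` expects should be checked against its documentation.

```python
import csv


def write_refcat_index(shard_files, index_path, htm_level=7):
    """Write a CSV index mapping each refcat shard file to its HTM pixel id.

    ``shard_files`` maps an integer HTM pixel id to the path of the shard
    file holding that pixel. The column names written here ("filename" and
    "htm<level>") are assumptions made for illustration only.
    """
    dimension = f"htm{htm_level}"
    with open(index_path, "w", newline="") as stream:
        writer = csv.writer(stream)
        # Header row: file path column, then the data ID dimension column.
        writer.writerow(["filename", dimension])
        # One row per shard, sorted by pixel id for reproducible output.
        for pixel_id in sorted(shard_files):
            writer.writerow([str(shard_files[pixel_id]), pixel_id])
    return index_path
```

A call such as `write_refcat_index({131072: "refcats/131072.fits"}, "filename_to_htm.csv")` (made-up pixel id and paths) would produce the file that the later `butler ingest-files` step consumes, keeping the conversion command itself free of any butler dependency.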

            Parejkoj John Parejko added a comment - Jenkins run (pre-squashing my many commits): https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34914/pipeline

            Parejkoj John Parejko added a comment -

            erykoff: I hope you're willing to tackle this large review. It's several hundred lines, but that includes extensive updates to the documentation. Doing this review should help familiarize you with how we expect refcats to work in gen3, so you can see whether you'll need to make changes to fgcmcal (I expect that if there are any, they'll be small).

            I've uploaded the built docs, because Jenkins won't build the full pipelines docs on a non-pipelines.lsst.io ticket branch.

            https://lsst.ncsa.illinois.edu/~parejkoj/DM-29543/html/lsst.meas.algorithms/creating-a-reference-catalog.html

            Parejkoj John Parejko added a comment (edited) - Post-review Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34983/pipeline/

            People

              Parejkoj John Parejko
              Parejkoj John Parejko
              Eli Rykoff
              Dan Taranu, Eli Rykoff, Gregory Dubois-Felsmann, Ian Sullivan, James Chiang, Jim Bosch, John Parejko, Krzysztof Findeisen, Lee Kelvin, Meredith Rawls, Tim Jenness