DM-29543

Gen3 refcat converter

      Description

      Currently, the only way to get a gen3 refcat is to ingest it into a gen2 repo and then convert it to gen3. IngestIndexedReferenceTask is a CmdLineTask, and we need a PipelineTask refcat ingester. Much of the internals are the same, so we can probably do it with a shared base class, but gen3 loads additional information about the shards, etc. into the registry, and that will take some care. The multiprocessing manager code, IngestIndexManager, may not be compatible with the gen3 multiprocessing system: it's designed to do all of its own multiprocessing internally, without an external controller.

      These docs about how to ingest refcat data will probably be useful in putting together the gen3 code: https://pipelines.lsst.io/modules/lsst.meas.algorithms/creating-a-reference-catalog.html

      I also wrote a custom Gaia ingester (IngestGaiaManager and IngestGaiaReferenceTask) to handle the Gaia fluxes, which might be able to work "for free" with the new system.

            Activity

            tjenness Tim Jenness added a comment -

            Is the sprint label for this accurate (July)? I'm really looking forward to being able to stop relying on gen2 for gen3.

            tjenness Tim Jenness added a comment -

            John Parejko will this be done as a new `butler` subcommand for gen3, much like `ingest-raws` and `make-discrete-skymaps` subcommands use `Task` underneath? pipe_tasks already has the necessary infrastructure for registering a butler subcommand but meas_algorithms does not.

            Parejkoj John Parejko added a comment -

            Tim Jenness: Good question. I hadn't considered that aspect of it; I was just looking at how the underlying IngestIndexedReferenceTask code would have to change. It would probably make sense to have it as a butler subcommand, since it functions much the same as `ingest-raws`. I'm not sure even the existing code was a good fit for `meas_algorithms`: the only real meas_algorithms dependency is LoadReferenceObjectsTask.makeMinimalSchema. I'll have to think about whether to move it as part of this ticket, or whether to create the necessary butler subcommand infrastructure in pipe_tasks and just call things in meas_algorithms.

            tjenness Tim Jenness added a comment -

            John Parejko and I discussed this during PCW last week. The outcome of that was a plan of:

            • Remove butler completely from the command that takes an external catalogue and converts it to pipeline-native refcats.
            • During that conversion create a CSV index file mapping file name to gen3 data Id.
            • Use butler register-dataset-type command to create the new refcat dataset type.
            • Run butler ingest-files to ingest the refcat using the CSV index file.

            Currently it seems that we put all refcats in a refcat collection and use a new dataset type (with version in name) per catalog.
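
            The second step of the plan above (writing an index that maps each converted file to its gen3 data ID) could look roughly like the sketch below. This is only an illustration using the Python standard library: the function name `write_refcat_index`, the column names `filename` and `htm7`, and the example paths are all assumptions, not the format that `butler ingest-files` necessarily expects.

```python
import csv
import io


def write_refcat_index(shards, out):
    """Write one row per refcat shard: the file name and its pixel ID.

    `shards` maps a pixel ID to the file holding that shard. The column
    names used here ("filename", "htm7") are hypothetical placeholders.
    """
    writer = csv.writer(out)
    writer.writerow(["filename", "htm7"])
    # Sort by pixel ID so that the index is reproducible between runs.
    for pixel_id, filename in sorted(shards.items()):
        writer.writerow([filename, pixel_id])


# Hypothetical shard files produced by the conversion step.
shards = {131073: "refcat/131073.fits", 131072: "refcat/131072.fits"}

buffer = io.StringIO()
write_refcat_index(shards, buffer)
index_text = buffer.getvalue()
```

            In a real conversion the output would go to a file on disk rather than an in-memory buffer, and that file would then be handed to the ingest step.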

            Parejkoj John Parejko added a comment -

            Jenkins run (pre-squashing my many commits): https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34914/pipeline

            Parejkoj John Parejko added a comment -

            Eli Rykoff: I hope you're willing to tackle this large review? It's several hundred lines, but that includes extensive updates to the documentation. Doing this review should help familiarize you with how we expect refcats to work in gen3, so you can see whether you'll need to make changes to fgcmcal (I expect that if there are any, they'll be small).

            I've uploaded the built docs, because jenkins won't build the full pipelines docs on a non-pipelines.lsst.io ticket branch.

            https://lsst.ncsa.illinois.edu/~parejkoj/DM-29543/html/lsst.meas.algorithms/creating-a-reference-catalog.html

            Parejkoj John Parejko added a comment - edited

            Post-review Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34983/pipeline/

              People

              Assignee:
              Parejkoj John Parejko
              Reporter:
              Parejkoj John Parejko
              Reviewers:
              Eli Rykoff
              Watchers:
              Dan Taranu, Eli Rykoff, Gregory Dubois-Felsmann, Ian Sullivan, James Chiang, Jim Bosch, John Parejko, Krzysztof Findeisen, Lee Kelvin, Meredith Rawls, Tim Jenness