# Gen3 refcat converter

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 10
• Sprint: AP F21-4 (September)
• Team:
• Urgent?: No

#### Description

Currently, the only way to get a gen3 refcat is to ingest it into a gen2 repo and then convert it to gen3. IngestIndexedReferenceTask is a CmdLineTask, and we need a PipelineTask refcat ingester. Much of the internals are the same, so we can probably do it with a shared base class, but gen3 also loads additional information about the shards, etc. into the registry, which will take some care. The multiprocessing manager code, IngestIndexManager, may not be compatible with the gen3 multiprocessing system: it's designed to do all of its own multiprocessing internally, without an external controller.

These docs about how to ingest refcat data will probably be useful in putting together the gen3 code: https://pipelines.lsst.io/modules/lsst.meas.algorithms/creating-a-reference-catalog.html

I also wrote a custom Gaia ingester (IngestGaiaManager and IngestGaiaReferenceTask) to handle the gaia fluxes, which might be able to work "for free" with the new system.

#### Activity

Tim Jenness added a comment -

Is the sprint label for this accurate (July)? I'm really looking forward to being able to stop relying on gen2 for gen3.

Tim Jenness added a comment -

John Parejko, will this be done as a new butler subcommand for gen3, in the same way that the ingest-raws and make-discrete-skymaps subcommands use a Task underneath? pipe_tasks already has the necessary infrastructure for registering a butler subcommand, but meas_algorithms does not.

John Parejko added a comment -

Tim Jenness: Good question. I hadn't considered that aspect of it; I was only looking at how the underlying IngestIndexedReferenceTask code would have to change. It would probably make sense to have it as a butler subcommand, since it functions much like ingest-raws. I'm not sure the existing code was even a good fit for meas_algorithms: the only real meas_algorithms dependency is LoadReferenceObjectsTask.makeMinimalSchema. I'll have to think about whether to move it as part of this ticket, or to create the necessary butler subcommand infrastructure in pipe_tasks and just call things in meas_algorithms.

Tim Jenness added a comment -

John Parejko and I discussed this during PCW last week. The outcome was the following plan:

• Remove butler completely from the command that takes an external catalogue and converts it to pipeline-native refcats.
• During that conversion create a CSV index file mapping file name to gen3 data Id.
• Use butler register-dataset-type command to create the new refcat dataset type.
• Run butler ingest-files to ingest the refcat using the CSV index file.

Currently it seems that we put all refcats in a refcat collection and use a new dataset type (with the version in its name) for each catalog.
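The CSV index file from the plan above could be produced with a short script along these lines. This is a minimal sketch, not the code written for this ticket: it assumes (hypothetically) that the converter writes one FITS file per HTM trixel named `<index>.fits`, and that the gen3 data ID dimension is `htm7`. The `write_index_csv` function and the `filename`/`htm7` column names are illustrative only.

```python
import csv
import os
import re

# Hypothetical shard naming: one FITS file per HTM trixel, named "<index>.fits".
SHARD_PATTERN = re.compile(r"^(\d+)\.fits$")


def write_index_csv(shard_dir, csv_path, dimension="htm7"):
    """Write a CSV mapping each shard file to its gen3 data ID value.

    Returns the number of shard rows written (the header is not counted).
    """
    rows = []
    for name in os.listdir(shard_dir):
        match = SHARD_PATTERN.match(name)
        if match is None:
            continue  # skip config files, logs, and anything else that is not a shard
        path = os.path.abspath(os.path.join(shard_dir, name))
        rows.append((path, int(match.group(1))))
    # Sort numerically by trixel index so the output is deterministic.
    rows.sort(key=lambda row: row[1])
    with open(csv_path, "w", newline="") as stream:
        writer = csv.writer(stream)
        writer.writerow(["filename", dimension])
        writer.writerows(rows)
    return len(rows)
```

For example, `write_index_csv("converted_refcat", "filename_to_htm.csv")` would index every shard in a (hypothetical) `converted_refcat` directory. Keeping the file path in the first column and the data ID value in the next matches the kind of table that `butler ingest-files` reads, but the exact layout it expects should be checked against the daf_butler command-line documentation.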

John Parejko added a comment -

Jenkins run (pre-squashing my many commits): https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34914/pipeline

John Parejko added a comment -

Eli Rykoff: I hope you're willing to tackle this large review. It's several hundred lines, but that includes extensive updates to the documentation. Doing this review should help familiarize you with how we expect refcats to work in gen3, so you can see whether you'll need to make changes to fgcmcal (I expect that if there are any, they'll be small).

I've uploaded the built docs, because Jenkins won't build the full pipelines docs on a non-pipelines.lsst.io ticket branch.

https://lsst.ncsa.illinois.edu/~parejkoj/DM-29543/html/lsst.meas.algorithms/creating-a-reference-catalog.html

John Parejko added a comment (edited) -

Post-review Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34983/pipeline/

#### People

• Assignee: John Parejko
• Reporter: John Parejko
• Reviewers: Eli Rykoff
• Watchers: Dan Taranu, Eli Rykoff, Gregory Dubois-Felsmann, Ian Sullivan, James Chiang, Jim Bosch, John Parejko, Krzysztof Findeisen, Lee Kelvin, Meredith Rawls, Tim Jenness