# Convert RC2 w_2020_38 to gen3 with w_2020_42 stack


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels: None
• Team: Data Facility
• Urgent?: No

#### Description

The conversion was originally attempted with w_2020_40, but it failed. A schema was renamed from rc2w38_ssw40 to rc2w38_ssw42, and another attempt was made with the w_2020_42 stack.

#### Attachments

1. rc2w38ssw42_att3_success.log (299 kB)
2. rc2w38ssw42.log (269 kB)

#### Activity

Monika Adamow added a comment -

Error during the conversion:

```
INFO  2020-10-08T13:57:22.332-0500 convertRepo - Ingesting 207 ps1_pv3_3pi_20170110 datasets into run refcats.
Traceback (most recent call last):
  File "./gen3-hsc-rc2/bootstrap.py", line 354, in <module>
    main()
  File "./gen3-hsc-rc2/bootstrap.py", line 350, in main
    continue_=options.continue_, reruns=reruns)
  File "./gen3-hsc-rc2/bootstrap.py", line 300, in run
    visits=makeVisitList(tracts, filters)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-52-g73d9071+9bf1eb8e0a/python/lsst/obs/base/gen2to3/convertRepo.py", line 559, in run
    converter.ingest()
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-52-g73d9071+9bf1eb8e0a/python/lsst/obs/base/gen2to3/repoConverter.py", line 493, in ingest
    run = self.getRun(datasetType.name, calibDate)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-52-g73d9071+9bf1eb8e0a/python/lsst/obs/base/gen2to3/standardRepoConverter.py", line 202, in getRun
    raise ValueError(f"No default run for repo at {self.root}, and no "
ValueError: No default run for repo at /datasets/hsc/repo, and no override for dataset fgcmLookUpTable.
```

Tim Jenness added a comment -

In ci_hsc_gen2 we have a config file for the conversion which sets a default collection for specific datasets:

```python
# This file contains overrides for obs.base.gen2to3.ConvertRepoTask to export
# the jointcal_* datasets that are in the root of the Gen2 repo into a special
# "HSC/external" RUN collection, since it doesn't make sense to put them in any
# of the other RUNs generated from that conversion. This doesn't go in the
# obs_subaru config overrides because having those datasets in the root is
# unique to ci_hsc_gen2.
from lsst.obs.subaru import HyperSuprimeCam

collection = HyperSuprimeCam.makeCollectionName("external")
config.runs["jointcal_wcs"] = collection
config.runs["jointcal_photoCalib"] = collection
```

so maybe we need something like that for fgcmLookUpTable?
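A direct adaptation might read as follows. This is a sketch only, not a tested fix, and the target collection here is an assumption for illustration (the override actually proposed later in the thread uses a calibration run instead):

```python
# Hypothetical ConvertRepoTask config override, modeled on the jointcal_*
# example above; routing fgcmLookUpTable to the "external" collection is
# an assumption, not the merged solution.
from lsst.obs.subaru import HyperSuprimeCam

config.runs["fgcmLookUpTable"] = HyperSuprimeCam.makeCollectionName("external")
```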

Jim Bosch added a comment -

Eli Rykoff, I think the question we need answered here is basically how we'd expect the FGCM lookup table to get to Gen3 repos in a pure-Gen3 world, because that tells us what collection we should use in the converter to make it look more-or-less like it was produced in a pure-Gen3 world.

fgcmLookUpTable is something that will be produced by a PipelineTask running on as much of a survey as we can, but is then useful for calibrating even observations that were not in that set, right?

Is there one dataset per instrument [per FGCM run]?

Or is this something produced by modtran-ish things to make a file that some human has to manually ingest?

Eli Rykoff added a comment -

Ah, so this error is the same thing I mentioned on slack: https://lsstc.slack.com/archives/C2JPT1KB7/p1601913036259400 where this used to be a WARN and now is an exception. I'm still not 100% behind that change...

Anyway, looking forward specifically: fgcmLookUpTable is a "curated calibration". There should be one per instrument, and though it could be updated over time, that is not a matter of a validity range; it's a choice for a given processing run.

In the case of RC2, there is actually a default lookup table in the root repo, but that assumes we have r2 and i2, which we don't have for RC2. So that's why the RC2 instructions have it regenerated within the rerun. I'm not sure what the best thing to do about this is except wait for RC3.

Jim Bosch added a comment -

Interesting. So it's a curated calibration, in the sense that we should put what we think is the best one for an instrument in an obs_x_data package, even? But it doesn't have a validity range, either because it has temporal dependence inside it or because it doesn't actually depend on anything temporal?

Eli Rykoff added a comment -

It's quite large (a few GB); is that okay for an obs_x_data package? But I don't know how these things are carried around. Something to note, though, is that it can be regenerated for a particular repo without too much trouble; it just takes some time.

As for validity range, the only temporal dependence is implied (e.g., there would be an r and r2 filter which were installed at different times), but that is handled by the fgcm code itself.

What might change is that (say) we replace a filter (as in HSC); or we get better measurements of the mirror reflectivity or CCD QEs, etc.

In the future, though, it is possible that explicit temporal dependence is added, especially with regards to mirror reflectivity (I hope the CCD QE and filter throughputs don't change!) but that is not currently supported.

Tim Jenness added a comment -

Monika Adamow is blocked on this ticket. Is the simplest possible answer to add a line to the conversion configuration that declares a collection to use like we do with jointcal_wcs above?

Eli Rykoff added a comment - edited

All dataset types need to be listed explicitly in the conversion config as of w40, as I mentioned above. So this needs to be added to the conversion config, and anything else that wasn't explicitly mentioned. There will be more than just this, I'm sure this is just the first.

It used to be that it would warn and continue anyway.

I think that whether or not this is a curated calibration is just a red herring. Right now it's just a dataset, and it must be configured.

Jim Bosch added a comment -

So, the best solution I can think of now is to put this in the "unbounded" calibration run collection, and revisit this in the future (at least on DM-27147), via an obs_subaru config override:

```python
from lsst.obs.subaru import HyperSuprimeCam

config.runs["fgcmLookUpTable"] = HyperSuprimeCam.makeUnboundedCalibrationRunName()
```

That's an awkward location to expect pipetask invocations to find this, and we should probably fix that by finding a way to

a) mark this as a calibration when we register the dataset type (currently I think ConvertRepoTask does that IFF it finds a dataset in a calibration repo), and

b) certify it into CALIBRATION collections like HSC/calib with an unbounded validity range (currently something only done by writeCuratedCalibrations).

But we don't need any of those other steps to unblock RC2 Gen3 conversion, so maybe they should be another ticket; this does get cleaner if we decide to make this a regular curated calibration.  Treating it the same way as the HSC yBackground datasets might be another option, but I confess I don't actually remember exactly how we handle that.

Tim Jenness added a comment -

Jim Bosch so to be completely clear, you want those two lines to be added to the end of obs_subaru/config/hsc/convertRepo.py?

Jim Bosch added a comment -

Yes, that's my proposal.

Michelle Gower added a comment -

Since this started as an RC2 gen3 conversion ticket, please make a separate ticket for the other steps mentioned in the above comment.  We'll reassign this ticket to Monika Adamow.   She will make a ticket branch of obs_subaru and make this change and any other similar simple change that pops up while trying to run the conversion.

Eli Rykoff added a comment -

The script itself patches the config as well: https://github.com/lsst-dm/gen3-hsc-rc2/blob/master/bootstrap.py#L236-L252

And this excludes the yBackground as best as I can tell.

I think there will be other hiccups. If I'm reading the code and configs correctly, the datasetIncludePatterns only fire if a rerun isn't specified, and if a rerun isn't specified we're going to have to explicitly include or exclude all dataset types, configs, etc., that are in the RC2 repo.
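As a toy model of that reading (pure illustration; the function name and gating logic here are assumptions about the converter's behavior, not its actual code):

```python
import fnmatch


def select_dataset_types(all_types, include_patterns, rerun=None):
    """Toy sketch: include patterns only apply when no rerun is given.

    When a rerun is specified, everything passes through unchanged
    (the assumed behavior); otherwise only dataset types matching one
    of the glob-style include patterns survive.
    """
    if rerun is not None:
        return set(all_types)
    return {t for t in all_types
            if any(fnmatch.fnmatch(t, pat) for pat in include_patterns)}
```

Under this model, a root-repo conversion without a rerun would have to name every dataset type explicitly, which matches the behavior described above.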

Jim Bosch added a comment -

I was originally thinking that obs_subaru was a better place for this config override than bootstrap.py because it'd be more general.  But now I see that bootstrap.py ignores yBackground in the configs precisely because it has other logic to add it later, and that logic is exactly the "extra steps" I referenced above.  So maybe with that as our model, we should apply the config override for FGCM there, too, and just get it all done now.  I take Michelle Gower's point that we don't want scope creep on a ticket like this, but now that I see that established pattern I think it's barely more difficult to just get it all done.  If there are no objections, I'll just put that on a branch of gen3-hsc-rc2 today (probably in about an hour and a half) and ask Monika Adamow to test it.

> I think that there will be other hiccups, if I'm reading the code and configs correctly, then the datasetIncludePatterns only fires if a rerun isn't specified. And if a rerun isn't specified we're going to have to explicitly include or exclude all dataset types, configs, etc, that are in the RC2 repo.

This should only be true of datasets that are in a root Gen2 repo, and right now I still consider it a feature rather than a bug that we're forced to figure out what to do with any datasets people are putting there.  I might recant if we discover that the state of Gen2 repos in the wild is even more varied than I suspect, but I think there are a finite number of weird testdata repos in git LFS (which we've just about worked our way through), plus the big shared ones (NCSA, Princeton, NAOJ, NERSC, CCIN2P3) that are most standard and where non-standard datasets are a bigger problem anyway.

Jim Bosch added a comment -

Monika Adamow, I've just pushed the most minimal change that should fix this (just ignoring this dataset) to branch u/jbosch/DM-27113 of gen3-hsc-rc2, and I think it's ready for testing again.

I'll open a new ticket for a more complete fix; after looking a bit more, doing that well will take more work than belongs on this ticket.

Monika Adamow added a comment -

The fix in u/jbosch/DM-27113 worked, but the conversion then failed with another error. After discussing it with Michelle Gower, we decided to try the w_2020_42 stack (the schema was renamed). It failed with the same error. The log file is attached to this ticket (rc2w38_ssw42.log).

Tim Jenness added a comment -

There is an error about a missing run name at the top but I assume that's not important.

This is the other error:

```
INFO 2020-10-20T16:39:02.262-0500 convertRepo - Ingesting 1 deepCoadd_skyMap dataset into run skymaps.
Traceback (most recent call last):
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dataset_tags_0008000000_unq_dataset_type_id_collection_5cf81e3b"
DETAIL: Key (dataset_type_id, collection_id, skymap)=(16, 238, hsc_rings_v1) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/_registry.py", line 671, in insertDatasets
    refs = list(storage.insert(runRecord, expandedDataIds))
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/datasets/byDimensions/_storage.py", line 83, in insert
    self._db.insert(self._tags, *tagsRows)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/interfaces/_database.py", line 1222, in insert
    self._connection.execute(table.insert(), *rows)
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
    return meth(self, multiparams, params)
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
    distilled_params,
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1318, in _execute_context
    e, statement, parameters, cursor, context
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "dataset_tags_0008000000_unq_dataset_type_id_collection_5cf81e3b"
DETAIL: Key (dataset_type_id, collection_id, skymap)=(16, 238, hsc_rings_v1) already exists.
[SQL: INSERT INTO rc2w38_ssw42.dataset_tags_0008000000 (dataset_type_id, dataset_id, collection_id, skymap) VALUES (%(dataset_type_id)s, %(dataset_id)s, %(collection_id)s, %(skymap)s)]
[parameters: {'dataset_type_id': 16, 'dataset_id': 439257, 'collection_id': 238, 'skymap': 'hsc_rings_v1'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./gen3-hsc-rc2/bootstrap.py", line 356, in <module>
    main()
  File "./gen3-hsc-rc2/bootstrap.py", line 352, in main
    continue_=options.continue_, reruns=reruns)
  File "./gen3-hsc-rc2/bootstrap.py", line 302, in run
    visits=makeVisitList(tracts, filters)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-58-g47b63df+0e9af1ef10/python/lsst/obs/base/gen2to3/convertRepo.py", line 570, in run
    converter.ingest()
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-58-g47b63df+0e9af1ef10/python/lsst/obs/base/gen2to3/repoConverter.py", line 505, in ingest
    run=run)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/core/utils.py", line 261, in inner
    return func(self, *args, **kwargs)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/_butler.py", line 1239, in ingest
    run=run)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/core/utils.py", line 261, in inner
    return func(self, *args, **kwargs)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/_registry.py", line 678, in insertDatasets
    f"dimension row is missing.") from err
lsst.daf.butler.registry._exceptions.ConflictingDefinitionError: A database constraint failure was triggered by inserting one or more datasets of type DatasetType('deepCoadd_skyMap', {skymap}, SkyMap) into collection 'skymaps'. This probably means a dataset with the same data ID and dataset type already exists, but it may also mean a dimension row is missing.
+ ./query_results.py postgresql://madamow@lsst-pg-prod1.ncsa.illinois.edu:5432/lsstdb1 rc2w38_ssw42 select count(*), c.name from rc2w38_ssw42.run r, rc2w38_ssw42.collection c, rc2w38_ssw42.dataset ds where ds.run_id=r.collection_id and r.collection_id=c.collection_id group by c.name order by c.name
```
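The failure mode itself can be reproduced in miniature with a standalone unique constraint. In this sketch sqlite stands in for the PostgreSQL registry, and the table is a simplified stand-in for dataset_tags_0008000000 (column types and the second row's values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified analogue of the registry's tag table: the unique key is
# (dataset_type_id, collection_id, skymap), NOT the dataset_id.
con.execute(
    "CREATE TABLE dataset_tags ("
    "  dataset_type_id INTEGER, dataset_id INTEGER,"
    "  collection_id INTEGER, skymap TEXT,"
    "  UNIQUE (dataset_type_id, collection_id, skymap))"
)
con.execute("INSERT INTO dataset_tags VALUES (?, ?, ?, ?)",
            (16, 439257, 238, "hsc_rings_v1"))
try:
    # A second dataset with the same (type, collection, skymap) key is
    # rejected regardless of its dataset_id -- the situation hit when
    # equivalent deepCoadd_skyMap datasets are ingested twice.
    con.execute("INSERT INTO dataset_tags VALUES (?, ?, ?, ?)",
                (16, 999999, 238, "hsc_rings_v1"))
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```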

Show
Tim Jenness added a comment - There is an error about a missing run name at the top but I assume that's not important. This is the other error: INFO 2020-10-20T16:39:02.262-0500 convertRepo - Ingesting 1 deepCoadd_skyMap dataset into run skymaps. Traceback (most recent call last): File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context cursor, statement, parameters, context File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute cursor.execute(statement, parameters) psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dataset_tags_0008000000_unq_dataset_type_id_collection_5cf81e3b" DETAIL: Key (dataset_type_id, collection_id, skymap)=(16, 238, hsc_rings_v1) already exists.     The above exception was the direct cause of the following exception:   Traceback (most recent call last): File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/_registry.py", line 671, in insertDatasets refs = list(storage.insert(runRecord, expandedDataIds)) File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/datasets/byDimensions/_storage.py", line 83, in insert self._db.insert(self._tags, *tagsRows) File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/interfaces/_database.py", line 1222, in insert self._connection.execute(table.insert(), *rows) File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1014, in execute return meth(self, multiparams, 
    params)
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
    distilled_params,
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1318, in _execute_context
    e, statement, parameters, cursor, context
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "/software/lsstsw/stack_20200922/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "dataset_tags_0008000000_unq_dataset_type_id_collection_5cf81e3b"
DETAIL: Key (dataset_type_id, collection_id, skymap)=(16, 238, hsc_rings_v1) already exists.

[SQL: INSERT INTO rc2w38_ssw42.dataset_tags_0008000000 (dataset_type_id, dataset_id, collection_id, skymap) VALUES (%(dataset_type_id)s, %(dataset_id)s, %(collection_id)s, %(skymap)s)]
[parameters: {'dataset_type_id': 16, 'dataset_id': 439257, 'collection_id': 238, 'skymap': 'hsc_rings_v1'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./gen3-hsc-rc2/bootstrap.py", line 356, in <module>
    main()
  File "./gen3-hsc-rc2/bootstrap.py", line 352, in main
    continue_=options.continue_, reruns=reruns)
  File "./gen3-hsc-rc2/bootstrap.py", line 302, in run
    visits=makeVisitList(tracts, filters)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-58-g47b63df+0e9af1ef10/python/lsst/obs/base/gen2to3/convertRepo.py", line 570, in run
    converter.ingest()
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/obs_base/20.0.0-58-g47b63df+0e9af1ef10/python/lsst/obs/base/gen2to3/repoConverter.py", line 505, in ingest
    run=run)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/core/utils.py", line 261, in inner
    return func(self, *args, **kwargs)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/_butler.py", line 1239, in ingest
    run=run)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/core/utils.py", line 261, in inner
    return func(self, *args, **kwargs)
  File "/software/lsstsw/stack_20200922/stack/miniconda3-py37_4.8.2-cb4e2dc/Linux64/daf_butler/19.0.0-174-g77e5f269+ff10c6d78d/python/lsst/daf/butler/registry/_registry.py", line 678, in insertDatasets
    f"dimension row is missing.") from err
lsst.daf.butler.registry._exceptions.ConflictingDefinitionError: A database constraint failure was triggered by inserting one or more datasets of type DatasetType('deepCoadd_skyMap', {skymap}, SkyMap) into collection 'skymaps'. This probably means a dataset with the same data ID and dataset type already exists, but it may also mean a dimension row is missing.

+ ./query_results.py postgresql://madamow@lsst-pg-prod1.ncsa.illinois.edu:5432/lsstdb1 rc2w38_ssw42 select count(*), c.name from rc2w38_ssw42.run r, rc2w38_ssw42.collection c, rc2w38_ssw42.dataset ds where ds.run_id=r.collection_id and r.collection_id=c.collection_id group by c.name order by c.name
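For context, the UniqueViolation above means the same (dataset_type_id, collection_id, data ID) triple was inserted twice into the tagged-collection table. A minimal sketch of the kind of deduplication that would avoid it (illustrative only, not the obs_base or daf_butler code):

```python
# Hypothetical illustration (not the actual converter code): the unique
# constraint allows a (dataset_type_id, collection_id, data ID) triple to
# appear only once per collection, so repeated copies of the same skymap
# dataset must be deduplicated before insertion.

def dedupe_datasets(datasets):
    """Keep only the first dataset seen for each
    (dataset_type_id, collection_id, data ID) key."""
    seen = set()
    unique = []
    for ds in datasets:
        key = (ds["dataset_type_id"], ds["collection_id"], ds["skymap"])
        if key not in seen:
            seen.add(key)
            unique.append(ds)
    return unique

rows = [
    {"dataset_type_id": 16, "dataset_id": 439256, "collection_id": 238, "skymap": "hsc_rings_v1"},
    # A second copy of the same skymap from another Gen2 repo -- the kind
    # of row that triggered the UniqueViolation above.
    {"dataset_type_id": 16, "dataset_id": 439257, "collection_id": 238, "skymap": "hsc_rings_v1"},
]
print(len(dedupe_datasets(rows)))  # -> 1
```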
Jim Bosch added a comment -

I have a theory: there are equivalent deepCoadd_skyMap datasets in multiple Gen2 repos, we have no code dedicated to deduplicating them, and a config.runs["deepCoadd_skyMap"] override is telling the converter to put them all in one Gen3 collection. I think we want some of those to land in rerun-named collections instead, which means that override is active somewhere it shouldn't be in the converter code. But I'd like to poke around the Gen2 repos and conversion code to test that theory to be sure.
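A minimal sketch of the routing behaviour described in that theory (hypothetical names, not the real ConvertRepoConfig API): a per-dataset-type runs override takes precedence over the rerun-named default, so every repo's copy of deepCoadd_skyMap lands in the same collection and collides.

```python
# Hypothetical sketch of the collection-routing behaviour described above;
# names are illustrative, not the actual obs_base ConvertRepoConfig code.

def target_collection(dataset_type, rerun_name, runs_override):
    """Pick the Gen3 run collection for a converted dataset.

    If a dataset type appears in the runs override, every repo's copy is
    routed to that one collection -- which collides when several Gen2
    repos hold equivalent datasets.  Otherwise the dataset lands in a
    collection named after its source rerun."""
    if dataset_type in runs_override:
        return runs_override[dataset_type]
    return rerun_name

runs_override = {"deepCoadd_skyMap": "skymaps"}
# Two reruns both carrying a deepCoadd_skyMap: both map to 'skymaps'.
print(target_collection("deepCoadd_skyMap", "rerun/A", runs_override))  # -> skymaps
print(target_collection("deepCoadd_skyMap", "rerun/B", runs_override))  # -> skymaps
print(target_collection("calexp", "rerun/A", runs_override))            # -> rerun/A
```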
Jim Bosch added a comment -

I have a potential fix on branch u/jbosch/DM-27113 of obs_base. Monika Adamow, could you test that? If it works, we should make sure it gets through Jenkins ci_hsc before merging it.

Monika Adamow added a comment -

Jim Bosch, one problem fixed, another popped out.

INFO  2020-10-22T16:29:42.040-0500 convertRepo - Calibration validity gap closed from 2017-09-04 00:00:00.000 to 2017-09-05 00:00:00.000
Traceback (most recent call last):
  File "./gen3-hsc-rc2/bootstrap.py", line 356, in <module>
    main()
  File "./gen3-hsc-rc2/bootstrap.py", line 352, in main
    continue_=options.continue_, reruns=reruns)
  File "./gen3-hsc-rc2/bootstrap.py", line 302, in run
    visits=makeVisitList(tracts, filters)
  File "/scratch/madamow/rc2w38_convert_w42/obs_base/python/lsst/obs/base/gen2to3/convertRepo.py", line 583, in run
    chain.append(spec.parent)
AttributeError: 'Rerun' object has no attribute 'parent'
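The AttributeError suggests the chain-building code assumed every rerun spec carries a parent. A defensive pattern for that situation (illustrative only, not the actual fix on u/jbosch/DM-27113) is to guard the attribute access:

```python
# Illustrative only -- not the actual u/jbosch/DM-27113 fix.  The failing
# line assumed every rerun spec has a .parent attribute; specs without one
# raise AttributeError, so guard the access with getattr().

class Rerun:
    """Stand-in for the converter's rerun spec (hypothetical fields)."""
    def __init__(self, path, parent=None):
        self.path = path
        if parent is not None:
            self.parent = parent  # attribute set only when a parent exists

def build_chain(specs):
    """Collect parent collections, skipping specs that have none."""
    chain = []
    for spec in specs:
        parent = getattr(spec, "parent", None)  # avoids AttributeError
        if parent is not None:
            chain.append(parent)
    return chain

specs = [Rerun("rerun/A", parent="rerun/base"), Rerun("rerun/B")]
print(build_chain(specs))  # -> ['rerun/base']
```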

Jim Bosch added a comment -

I've pushed another commit to the u/jbosch/DM-27113 branch of obs_base that will hopefully fix that one.

Monika Adamow added a comment -

Thanks, Jim Bosch! The conversion is complete.


#### People

Assignee:
Reporter:
Watchers:
Eli Rykoff, Jim Bosch, Michelle Gower, Monika Adamow, Tim Jenness