# Move patch/tract and config mapping definitions to daf_butlerUtils

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
4
• Sprint:
DRP F16-2, DRP F16-3, DRP F16-4
• Team:
Data Release Production

#### Description

Implement RFC-204 by adding new entries for all patch/tract and config mapping definitions to .yaml files in daf_butlerUtils, and removing any such entries that are identical to the common ones from .paf files in obs* packages.

I think the "common" entry can usually be defined by consensus between any two of obs_cfht, obs_decam, and obs_subaru (and frequently all three). If there are any patch/tract or config datasets for which no two cameras agree, I think we should use obs_subaru's definitions (but I doubt there are any such cases).

Entries that are not identical to the common ones should not be removed on this issue (that should make this change entirely backwards compatible), but should be documented in new per-camera issues for later standardization.

#### Attachments

1. changes.summary
3 kB
2. hscdiffs.2.tar
100 kB
3. hscdiffs.tar
90 kB
4. mapper.diffs
5 kB
5. movekeys.yaml
1 kB
6. obskeys.yaml
0.7 kB
7. potential.mod.lists.tar
40 kB

#### Activity

Hide
Jim Bosch added a comment -

To verify that these changes are correct (since unit test coverage of the mappers is not good), it'd be a good idea to introspect each of the mappers to get a list of all of their datasets and their template paths, dump this to a file, and then check that the list is the same after the changes have been made (note that we probably don't care about sort order in this comparison, so it probably makes sense to sort the datasets somehow before dumping them to files). I'm not entirely sure what the interface is for doing that introspection, but I think doing DM-6858 first (since that blocks this) will provide at least a hint for that.

Show
Jim Bosch added a comment - To verify that these changes are correct (since unit test coverage of the mappers is not good), it'd be a good idea to introspect each of the mappers to get a list of all of their datasets and their template paths, dump this to a file, and then check that the list is the same after the changes have been made (note that we probably don't care about sort order in this comparison, so it probably makes sense to sort the datasets somehow before dumping them to files). I'm not entirely sure what the interface is for doing that introspection, but I think doing DM-6858 first (since that blocks this) will provide at least a hint for that.
Hide
Perry Gee added a comment -

Nate Pease [X] Is dafButlerUtils ready for this work to be done?

Show
Perry Gee added a comment - Nate Pease [X] Is dafButlerUtils ready for this work to be done?
Hide
Jim Bosch added a comment -

Perry Gee I think the only blocker is DM-6858. Nate Pease [X] may provide some code to make this work more convenient on another issue (I'm thinking of code that reads in .paf and dumps .yaml), but I don't think that's a requirement, and I'm not sure how helpful it would actually be since most of the work is probably in inspecting the various obs* packages to see whether they're consistent.

Show
Jim Bosch added a comment - Perry Gee I think the only blocker is DM-6858 . Nate Pease [X] may provide some code to make this work more convenient on another issue (I'm thinking of code that reads in .paf and dumps .yaml), but I don't think that's a requirement, and I'm not sure how helpful it would actually be since most of the work is probably in inspecting the various obs* packages to see whether they're consistent.
Hide
Perry Gee added a comment -

Nate told me at the conference that he was working on the yaml file conversion. I was trying to find out what his status was.

I am also waiting on further explanation from you about DM-6858. I will write more there.

Show
Perry Gee added a comment - Nate told me at the conference that he was working on the yaml file conversion. I was trying to find out what his status was. I am also waiting on further explanation from you about DM-6858 . I will write more there.
Hide
Perry Gee added a comment -

I have looked over this ticket, but it is somewhat unclear to me what I should change and what I should just document

I decided to restrict myself to datasets, as that seems like the intent of this ticket. I also think that similar changes to other mapper properties should be done on separate tickets to avoid too massive a change.

I also took HSCMapper as the point of comparison, as it seems to be the most up-to-date.
---------------------------------------------------------------------------------
Given these qualifications, I still want to know what to do with datasets which differ from one mapper to another. These differences might include:

1. Not existing in one or more of the mappers (so adding a common dataset will add datasets to the missing ones.
2. Having a different storage because the template is different (but the replacement names are the same)
3. Having a different storage location because the replacement variable are different
4. In many cases, there is a difference because both raw and raw_skyTile are included in the table list

2,3 will produce different storage locations, so possibly are not something we should do on this ticket?

Show
Perry Gee added a comment - I have looked over this ticket, but it is somewhat unclear to me what I should change and what I should just document I decided to restrict myself to datasets, as that seems like the intent of this ticket. I also think that similar changes to other mapper properties should be done on separate tickets to avoid too massive a change. I also took HSCMapper as the point of comparison, as it seems to be the most up-to-date. --------------------------------------------------------------------------------- Given these qualifications, I still want to know what to do with datasets which differ from one mapper to another. These differences might include: 1. Not existing in one or more of the mappers (so adding a common dataset will add datasets to the missing ones. 2. Having a different storage because the template is different (but the replacement names are the same) 3. Having a different storage location because the replacement variable are different 4. In many cases, there is a difference because both raw and raw_skyTile are included in the table list 2,3 will produce different storage locations, so possibly are not something we should do on this ticket?
Hide
Perry Gee added a comment -

Nate, do you think it is safe to make changes to the use the definitions in obs_test/policy/datasets.yaml? They include the things I need, though I am not sure if they are tested.

As a first run, I propose to include the following in my changes:

1. datasets ONLY which contain either (patch) or (tract)
2. either have identical definitions or are not included in some obs_* policies
3. include raw_skyTIle but don't use (skyTile)

That would include the following:

I will wait until Monday to do this to give you both a chance to comment.

Show
Hide
Perry Gee added a comment -

Nate Pease [X] Actually, most of these are not contained in the yaml files in obs_test. But I can follow the same style. Or you can send me your conversion program.

Show
Perry Gee added a comment - Nate Pease [X] Actually, most of these are not contained in the yaml files in obs_test. But I can follow the same style. Or you can send me your conversion program.
Hide
Perry Gee added a comment - - edited

Jim Bosch, This last list does not actually contain very many interesting items. There are many (such as the chiSquared coadds) which are not contained in very many of the obs_* paf files. There are also a lot of datasets not in this list which are shared by almost all of the mappers, but which differ from mapper to mapper due to slight differences in the template. These could be standardized, but not without changing the locations in existing repos. This would turn out to be a very large job.

Show
Perry Gee added a comment - - edited Jim Bosch , This last list does not actually contain very many interesting items. There are many (such as the chiSquared coadds) which are not contained in very many of the obs_* paf files. There are also a lot of datasets not in this list which are shared by almost all of the mappers, but which differ from mapper to mapper due to slight differences in the template. These could be standardized, but not without changing the locations in existing repos. This would turn out to be a very large job.
Hide
Jim Bosch added a comment -

In your numbered list, I'd remove (3) - skyType and raw_skyTile are not actually related to skymap, and are essentially irrelevant for this issue (I think one or both may even be deprecated, but let's ignore that for now).

 *_config deepCoadd deepCoadd_calexp[Background] deepCoadd_det deepCoadd_mergeDet deepCoadd_meas deepCoadd_ref deepCoadd_forced_src deep_detectCoaddSources_metadata deep_mergeCoaddDetections_metadata deep_measureCoaddSources_metadata deep_mergeCoaddMeasurements_metadata deep_forcedPhotCoadd_metadata deep_multiBandDriver_metadata 

We can then decide whether to duplicate these for e.g. "goodSeeing" instead of "deep" or just drop "goodSeeing", etc., and see more clearly whether the remaining "deepCoadd*" data products are worth keeping.

It's possible the above metadata entries don't exist exactly as I've written them above. If the names are inconsistently related to the task names or don't have "deep" in them, we should probably open an RFC to make them uniform.

If the above entries don't have the same templates across all cameras, we should (as you've already guessed) go with the HSC one as the one we put in daf_butlerUtils, remove the ones in all the obs* packages that agree with it, and make a list of the obs_* packages and datasets that disagree.

Show
Hide
Nate Pease [X] (Inactive) added a comment -

Perry Gee, I think most of the dataset types you're talking about changing are in .paf files, right? I think you should be able to move the common dataset types to base paf files (If I'm understanding correctly). I wouldn't bother converting to yaml just yet. (I'm finishing some cleanup work on the butler Policy handling of yaml policy files. I'll probably do a bulk convert after that, hopefully this week).

Show
Nate Pease [X] (Inactive) added a comment - Perry Gee , I think most of the dataset types you're talking about changing are in .paf files, right? I think you should be able to move the common dataset types to base paf files (If I'm understanding correctly). I wouldn't bother converting to yaml just yet. (I'm finishing some cleanup work on the butler Policy handling of yaml policy files. I'll probably do a bulk convert after that, hopefully this week).
Hide
Jim Bosch added a comment -

Nate Pease [X], if you'll be ready for a bulk convert, you may well be done before Perry Gee, and we may instead want to delay his work (beyond inspecting and comparing the existing obs_* package files, which I think is most of the work he has to do). Perry Gee, do you agree?

Show
Jim Bosch added a comment - Nate Pease [X] , if you'll be ready for a bulk convert, you may well be done before Perry Gee , and we may instead want to delay his work (beyond inspecting and comparing the existing obs_* package files, which I think is most of the work he has to do). Perry Gee , do you agree?
Hide
Perry Gee added a comment -

Nate Pease [X] I have already converted all of the entries I meant to change in the paf files to yaml, and stuffed them into datasets.yaml. If that isn't the right way to do it right now, I can put them in a paf format easily enough. Where would you like them to go? I would prefer to check this work in (in whatever form) rather than hold on to it and have it get bit-rot. I can put it wherever you like, and you can convert it when you please.

Jim Bosch, Are you in agreement that I should only do the datasets which have the same templates across all obs_* packages for which they have been defined? That will mean new datasets in some of the derived classes, since they will get them from the base mapper class. But the ones actually being used will be the same as before.

Show
Perry Gee added a comment - Nate Pease [X] I have already converted all of the entries I meant to change in the paf files to yaml, and stuffed them into datasets.yaml. If that isn't the right way to do it right now, I can put them in a paf format easily enough. Where would you like them to go? I would prefer to check this work in (in whatever form) rather than hold on to it and have it get bit-rot. I can put it wherever you like, and you can convert it when you please. Jim Bosch , Are you in agreement that I should only do the datasets which have the same templates across all obs_* packages for which they have been defined? That will mean new datasets in some of the derived classes, since they will get them from the base mapper class. But the ones actually being used will be the same as before.
Hide
Nate Pease [X] (Inactive) added a comment -

Perry Gee did the entries come from files named policy.paf?

Show
Nate Pease [X] (Inactive) added a comment - Perry Gee did the entries come from files named policy.paf?
Hide
Perry Gee added a comment -

They came from the 7 files named obs_*/policy/*Mapper.paf

I took the union of all of the dataset names as my starting point, and looked at which entries were identical or missing in all 7.

Is there a paf file in daf_butlerUtils/policy similar to dataset.yaml which I could transfer them to?

Show
Perry Gee added a comment - They came from the 7 files named obs_*/policy/*Mapper.paf I took the union of all of the dataset names as my starting point, and looked at which entries were identical or missing in all 7. Is there a paf file in daf_butlerUtils/policy similar to dataset.yaml which I could transfer them to?
Hide
Nate Pease [X] (Inactive) added a comment - - edited

ok, I think I'm starting to understand. Jim Bosch you added datasets.yaml and the code that handles it in CameraMapper, right?
If so: If you agree that what Perry Gee is the correct use of that code, then I'm ok with him proceeding with what he's got (I'm not suggesting he's incorrect, but rather that I don't feel confident saying it's right or not). When he comes to a stopping point I can do DM-7363.

Show
Nate Pease [X] (Inactive) added a comment - - edited ok, I think I'm starting to understand. Jim Bosch you added datasets.yaml and the code that handles it in CameraMapper, right? If so: If you agree that what Perry Gee is the correct use of that code, then I'm ok with him proceeding with what he's got (I'm not suggesting he's incorrect, but rather that I don't feel confident saying it's right or not). When he comes to a stopping point I can do DM-7363 .
Hide
Paul Price added a comment -

I'm not sure datasets.yaml is the right place for everything. I see that obs_subaru has various entries (e.g., deepCoadd, deepCoadd_calexp) under exposures, so they should go in exposures.yaml.

Show
Paul Price added a comment - I'm not sure datasets.yaml is the right place for everything. I see that obs_subaru has various entries (e.g., deepCoadd , deepCoadd_calexp ) under exposures , so they should go in exposures.yaml .
Hide
Nate Pease [X] (Inactive) added a comment -

Perry Gee I'm ready to merge DM-7255. It should not affect the work you're doing, but will certainly require you to merge daf_butlerUitls (and of course could break something). Would you like me to hold off, or are you ok taking that hit?

Show
Nate Pease [X] (Inactive) added a comment - Perry Gee I'm ready to merge DM-7255 . It should not affect the work you're doing, but will certainly require you to merge daf_butlerUitls (and of course could break something). Would you like me to hold off, or are you ok taking that hit?
Hide
Jim Bosch added a comment -

Jim Bosch you added datasets.yaml and the code that handles it in CameraMapper, right?

Since you (Nate Pease [X]) apparently didn't, I think Paul Price must have added datasets.yaml and the mechanism to read it, though I've added some entries to it it, and may now be responsible for most of its contents.

Paul Price says:

I'm not sure datasets.yaml is the right place for everything. I see that obs_subaru has various entries (e.g., deepCoadd, deepCoadd_calexp) under exposures, so they should go in exposures.yaml.

That's consistent with my understanding of how these files worked, but I have to admit I don't understand the distinction between these categories. Are they actually meaningful? Should we just move everything into the "datasets" category?

Jim Bosch, Are you in agreement that I should only do the datasets which have the same templates across all obs_* packages for which they have been defined? That will mean new datasets in some of the derived classes, since they will get them from the base mapper class. But the ones actually being used will be the same as before.

If the templates have the same keys but use them to construct a different path, we should use the HSC definitions and then just not remove the definitions from the obs_* packages that differ from the HSC ones (those definitions will then override the ones in daf_butlerUtils for now, and we'll instead make a list of these differences and RFC changing them later). If that effectively adds datasets to some mappers, that's desirable for at least my explicit list of datasets above. I suspect anything else you find is likely to be vestigial, but it's possible I've missed something. In any case, adding/removing these datasets to/from mappers to make them consistent is definitely desirable (it's just a question of which datasets are in active use and which are bitrot).

If the templates have completely different keys (as they would for e.g. calexp or processCcd_metadata), we can ignore them entirely on this issue.

Show
Hide
Paul Price added a comment - - edited

The exposures allow use of the _sub dataset suffix, e.g., deepCoadd_sub. That suggests that any image type should be in exposures and not datasets.

Show
Paul Price added a comment - - edited The exposures allow use of the _sub dataset suffix, e.g., deepCoadd_sub . That suggests that any image type should be in exposures and not datasets .
Hide
Nate Pease [X] (Inactive) added a comment -

I"m getting that this is still going to take some time to settle... I'm going to merge DM-7255.

Show
Nate Pease [X] (Inactive) added a comment - I"m getting that this is still going to take some time to settle... I'm going to merge DM-7255 .
Hide
Perry Gee added a comment -

Paul Price, I am only currently doing datasets, so I just assumed they would go in datasets.yaml. I think it makes sens to break to work up into different tickets.

Nate Pease [X] The changes I have been making don't involve any code – just moving things from the obs_/policy paf files and into a central location. But I have recorded the original complete resultant collection of datasets for each obs_ mapper, and would prefer not to have to do that again. Will you changes have any affect on the datasets visible through the mapper?

Jim Bosch, I didn't realize that there were large numbers of datasets which were no longer in use. I can certaintly do a subset of what I have.

Show
Perry Gee added a comment - Paul Price , I am only currently doing datasets, so I just assumed they would go in datasets.yaml. I think it makes sens to break to work up into different tickets. Nate Pease [X] The changes I have been making don't involve any code – just moving things from the obs_ /policy paf files and into a central location. But I have recorded the original complete resultant collection of datasets for each obs_ mapper, and would prefer not to have to do that again. Will you changes have any affect on the datasets visible through the mapper? Jim Bosch , I didn't realize that there were large numbers of datasets which were no longer in use. I can certaintly do a subset of what I have.
Hide
Nate Pease [X] (Inactive) added a comment -

Perry Gee there shouldn't be any overlap with your dataset work. The changes are merged into master so you'll have rebase your branch onto master & test at some point; let me know if you encounter any new issues as a result of doing that. (I don't expect anything, but you know how it goes.)

Show
Nate Pease [X] (Inactive) added a comment - Perry Gee there shouldn't be any overlap with your dataset work. The changes are merged into master so you'll have rebase your branch onto master & test at some point; let me know if you encounter any new issues as a result of doing that. (I don't expect anything, but you know how it goes.)
Hide
Paul Price added a comment -

It doesn't make sense to do it as a separate ticket. Please do it correctly on this ticket.

Show
Paul Price added a comment - It doesn't make sense to do it as a separate ticket. Please do it correctly on this ticket.
Hide
Perry Gee added a comment -

Paul Price, Jim Bosch I don't really understand what the issue is here. Paul, are you suggesting that I move all image types to exposures?

I think I am being asked in this ticket to move things to a central location, but not to change any of the mapper structures while I do that. I am not sure what the effect of moving things to "exposures" is, so I would really like a clarification from one or both of you.

Show
Perry Gee added a comment - Paul Price , Jim Bosch I don't really understand what the issue is here. Paul, are you suggesting that I move all image types to exposures? I think I am being asked in this ticket to move things to a central location, but not to change any of the mapper structures while I do that. I am not sure what the effect of moving things to "exposures" is, so I would really like a clarification from one or both of you.
Hide
Perry Gee added a comment -

Jim Bosch, I think we need a short talk on the phone about this ticket. Remember, I have had no connection the last 2 years with what has been going on in HSC land. So I really don't know how things need to be organized. Two of your changes are in the exposures section of HscMapper (deepCoadd and deepCoadd_calexp). Should I alter those?

I'm not sure what multiBandDriver_metadata refers to in the HscMapper. Most of the deepCoadd.*_metadata items you suggest don't exist, but they do exist without the "deepCoadd"

Show
Hide
Perry Gee added a comment -

I think I have located the policy items you asked me to move except for multiBandDriver_metadata. No such entry in HscMapper.paf. There is only a singleFrameDriver_metadata.

I assume that by the 4 metadata items you named as "deep_*_metadata", you reall meant

I find the following in the exposures section, and will move them to the base class:

This issue with moving things from datasets to exposures does not come up with these items, so I will proceed to make these specific changes on this ticket, but note the need to do this in the future with the other changes I am supposed to document.

Show
Hide
Jim Bosch added a comment -

I'm a little bothered by the coadd-processing tasks not having "deep" in their metadata names, but since they're consistent about not doing it, I suppose we shouldn't change it now.

The same is true of the missing multiBandDriver_metadata entries - if they're not there, this probably isn't the ticket to add them, though I may open a new one to make sure things are consistent later.

For the entries in the exposures section, I think the appropriate thing to do is to move them to an "exposures.yaml" file next to the "datasets.yaml" file in daf_butlerUtils; I think they should then be picked up automatically. As I understand it, the "datasets" and "exposures" sections of the .paf files are split out into separate files in daf_butlerUtils, for reasons I don't entirely understand. Paul Price, can you confirm that this is how it's supposed to work?

Show
Jim Bosch added a comment - I'm a little bothered by the coadd-processing tasks not having "deep" in their metadata names, but since they're consistent about not doing it, I suppose we shouldn't change it now. The same is true of the missing multiBandDriver_metadata entries - if they're not there, this probably isn't the ticket to add them, though I may open a new one to make sure things are consistent later. For the entries in the exposures section, I think the appropriate thing to do is to move them to an "exposures.yaml" file next to the "datasets.yaml" file in daf_butlerUtils; I think they should then be picked up automatically. As I understand it, the "datasets" and "exposures" sections of the .paf files are split out into separate files in daf_butlerUtils, for reasons I don't entirely understand. Paul Price , can you confirm that this is how it's supposed to work?
Hide
Paul Price added a comment -

It's not about "moving" things from datasets to exposures, but keeping the image types as exposures — notice that, e.g., deepCoadd is under exposures in HscMapper.paf.

Show
Paul Price added a comment - It's not about "moving" things from datasets to exposures , but keeping the image types as exposures — notice that, e.g., deepCoadd is under exposures in HscMapper.paf .
Hide
Paul Price added a comment -

For the entries in the exposures section, I think the appropriate thing to do is to move them to an "exposures.yaml" file next to the "datasets.yaml" file in daf_butlerUtils; I think they should then be picked up automatically. As I understand it, the "datasets" and "exposures" sections of the .paf files are split out into separate files in daf_butlerUtils, for reasons I don't entirely understand. Paul Price, can you confirm that this is how it's supposed to work?

Yes, that's it exactly.

Show
Paul Price added a comment - For the entries in the exposures section, I think the appropriate thing to do is to move them to an "exposures.yaml" file next to the "datasets.yaml" file in daf_butlerUtils; I think they should then be picked up automatically. As I understand it, the "datasets" and "exposures" sections of the .paf files are split out into separate files in daf_butlerUtils, for reasons I don't entirely understand. Paul Price, can you confirm that this is how it's supposed to work? Yes, that's it exactly.
Hide
Perry Gee added a comment -

Yes, I've made that addition to daf_butlerUtils/policy. Seemed to work correctly for both datasets and exposures.

I will not change the obs_* paf files at this time, except for HscMapper.py. But the other mappers (decam, megacam, sdss, test, lsstSim, and sumprimecam) willl now have entries they did not have before.

I will add a document describing the work which has to be done for each of the other mappers.

Show
Perry Gee added a comment - Yes, I've made that addition to daf_butlerUtils/policy. Seemed to work correctly for both datasets and exposures. I will not change the obs_* paf files at this time, except for HscMapper.py. But the other mappers (decam, megacam, sdss, test, lsstSim, and sumprimecam) willl now have entries they did not have before. I will add a document describing the work which has to be done for each of the other mappers.
Hide
Jim Bosch added a comment -

Ok. I think we will probably want to make some changes to the other mappers on this issue eventually, but waiting until you've written the document describing the differences makes sense.

Show
Jim Bosch added a comment - Ok. I think we will probably want to make some changes to the other mappers on this issue eventually, but waiting until you've written the document describing the differences makes sense.
Hide
Perry Gee added a comment -

This issue has never been very well defined. I've never really understood the scope of the work, but at least long term, it seems like the more common entries, the better. But identifying which policy items should be made "common" from the hundreds which exist is difficult. Your request for specific keys makes it much easier, but the problem of going through all the others still remains.

I do think it is probably a slippery slope modifying all the other mappers, because the differences I've seen indicate a need to change both policy keys and templates in all the other mappers. I'm not sure that there is a good stopping point once we try to achieve consistency. But some questions to ask are:

1. Which of the entries are trash, and need to be removed?
2. Do we want to rename keys if they are inconsistent from mapper to mapper? In some cases, I can't even tell if two similarly named entries are supposed to be the same or different, especially when comparing very old mappers to very new mappers.
3. Do we want to change storage locations (templates) even when if that means that existing repos will have to be converted?

I see this as work that should be done mapper by mapper, and with the input of people who know the code for the other obs_* drivers.

Show
Perry Gee added a comment - This issue has never been very well defined. I've never really understood the scope of the work, but at least long term, it seems like the more common entries, the better. But identifying which policy items should be made "common" from the hundreds which exist is difficult. Your request for specific keys makes it much easier, but the problem of going through all the others still remains. I do think it is probably a slippery slope modifying all the other mappers, because the differences I've seen indicate a need to change both policy keys and templates in all the other mappers. I'm not sure that there is a good stopping point once we try to achieve consistency. But some questions to ask are: 1. Which of the entries are trash, and need to be removed? 2. Do we want to rename keys if they are inconsistent from mapper to mapper? In some cases, I can't even tell if two similarly named entries are supposed to be the same or different, especially when comparing very old mappers to very new mappers. 3. Do we want to change storage locations (templates) even when if that means that existing repos will have to be converted? I see this as work that should be done mapper by mapper, and with the input of people who know the code for the other obs_* drivers.
Hide
Perry Gee added a comment -

All of the mappers except "testMapper" have ExposureF or ImageF entries in their "datasets" sections. Those are the ones which I thought Paul was saying needed to be moved. If we do need to move them, I'm not really sure what change this will make to the code which uses the policies, but I assume it will be a similar problem to making changes to individual templates.

Show
Perry Gee added a comment - All of the mappers except "testMapper" have ExposureF or ImageF entries in their "datasets" sections. Those are the ones which I thought Paul was saying needed to be moved. If we do need to move them, I'm not really sure what change this will make to the code which uses the policies, but I assume it will be a similar problem to making changes to individual templates.
Hide
Jim Bosch added a comment -

My intent for the scope of this issue is:

1. Add definitions for all active datasets that could be defined to be the same for all mappers (because they have the same keys) to a common location. I believe the list of datasets I've provided above is this list.
2. Remove definitions in obs* packages that are exactly redundant with these from existing mappers.
3. Remove at least some definitions in obs* packages that are no longer in use. We don't need to be exhaustive on this here, but I think we can e.g. at least delete anything that starts with "deepCoadd" and isn't in the list I put together, or any _config datasets that don't correspond to an existing CmdLineTask.
4. Create new JIRA issues (but do not do the work) to adjust keys or templates in obs_* packages that disagree with the common-location ones.
Show
Jim Bosch added a comment - My intent for the scope of this issue is: Add definitions for all active datasets that could be defined to be the same for all mappers (because they have the same keys) to a common location. I believe the list of datasets I've provided above is this list. Remove definitions in obs* packages that are exactly redundant with these from existing mappers. Remove at least some definitions in obs* packages that are no longer in use. We don't need to be exhaustive on this here, but I think we can e.g. at least delete anything that starts with "deepCoadd" and isn't in the list I put together, or any _config datasets that don't correspond to an existing CmdLineTask. Create new JIRA issues (but do not do the work) to adjust keys or templates in obs_* packages that disagree with the common-location ones.
Hide
Jim Bosch added a comment -

All of the mappers except "testMapper" have ExposureF or ImageF entries in their "datasets" sections.

I suspect Paul Price is right that these shouldn't be in "datasets", but it's probably best to just include them in your document for now and see what they actually are. We may want to remove them instead of moving them to "exposures".

Show
Jim Bosch added a comment - All of the mappers except "testMapper" have ExposureF or ImageF entries in their "datasets" sections. I suspect Paul Price is right that these shouldn't be in "datasets", but it's probably best to just include them in your document for now and see what they actually are. We may want to remove them instead of moving them to "exposures".
Hide
Perry Gee added a comment -

This is a request for a review of the change on tickets/DM-7049 to the yaml files in dafButlerUtils/policy. As there are other things that still have to be done for this ticket, I am asking for a review of just this part to be sure that the yaml additions are correct.

I just want your feedback. Do NOT mark this issue as Reviewed when you are done, as other changes still have to be made.

Note that this only involves the policy entries asked for by @jbosch in this ticket. Specific keys were added to exposures.yaml and datasets.yaml and removed from HscMapper.paf in obs_subaru.

The attached file mapper.diffs shows how these changes affect the various mappers. Hsc is unchanged, but the others have new entries which they did not have before as a result of the inheritance from dafButlerUtils

Show
Perry Gee added a comment - This is a request for a review of the change on tickets/ DM-7049 to the yaml files in dafButlerUtils/policy. As there are other things that still have to be done for this ticket, I am asking for a review of just this part to be sure that the yaml additions are correct. I just want your feedback. Do NOT mark this issue as Reviewed when you are done, as other changes still have to be made. Note that this only involves the policy entries asked for by @jbosch in this ticket. Specific keys were added to exposures.yaml and datasets.yaml and removed from HscMapper.paf in obs_subaru. The attached file mapper.diffs shows how these changes affect the various mappers. Hsc is unchanged, but the others have new entries which they did not have before as a result of the inheritance from dafButlerUtils
Hide
Perry Gee added a comment -

I have attached a summary of the differences between the hsc mapper and each of the other mappers.

not present – the key is in the hsc policy but not in this mapper
different – there is some difference in the content of the policy item in this mapper as shown
does not appear in hsc - the key is in this mapper but not in hsc.

Show
Perry Gee added a comment - I have attached a summary of the differences between the hsc mapper and each of the other mappers. not present – the key is in the hsc policy but not in this mapper different – there is some difference in the content of the policy item in this mapper as shown does not appear in hsc - the key is in this mapper but not in hsc.
Hide
Perry Gee added a comment -

Added many more items to exposures.yaml and datasets.yaml. All items in HSC except those with replacement values othere hand patch, tract, and filter. This seems overly agressive to me, so someone should probably look through these two files for HSC specific items and remvoe them.

Also attached a new tar file with possible modifications for each mapper, where mapper=lsstSim, decam, megacam, suprimecam, test, or sdss. Each file has 3 sections:

1. Keys in mapper.paf not found in hsc: these are keys which might be unique to the mapper.paf by intention. But they could also be keys which either need to be removed or renamed to match one of the hsc keys

2. Keys in original hsc not found in the mapper.paf: these are keys which were originally in hsc, are not now shared, and do not exist in the mapper.paf file

3. Keys which were moved to butlerUtils and differ between hsc and decam: these are keys which have been moved to butlerUtils, so they could be shared between mappers. If they appear in this list, the mapper.paf file has an entry which differs from hsc. The differences are shown.

Show
Perry Gee added a comment - Added many more items to exposures.yaml and datasets.yaml. All items in HSC except those with replacement values othere hand patch, tract, and filter. This seems overly agressive to me, so someone should probably look through these two files for HSC specific items and remvoe them. Also attached a new tar file with possible modifications for each mapper, where mapper=lsstSim, decam, megacam, suprimecam, test, or sdss. Each file has 3 sections: 1. Keys in mapper.paf not found in hsc: these are keys which might be unique to the mapper.paf by intention. But they could also be keys which either need to be removed or renamed to match one of the hsc keys 2. Keys in original hsc not found in the mapper.paf: these are keys which were originally in hsc, are not now shared, and do not exist in the mapper.paf file 3. Keys which were moved to butlerUtils and differ between hsc and decam: these are keys which have been moved to butlerUtils, so they could be shared between mappers. If they appear in this list, the mapper.paf file has an entry which differs from hsc. The differences are shown.
Hide
Perry Gee added a comment -

@jbosch See the tickets/DM-7049 branches on daf_butlerUtils and obs_subaru.

Show
Perry Gee added a comment - @jbosch See the tickets/ DM-7049 branches on daf_butlerUtils and obs_subaru.
Hide
Jim Bosch added a comment -

I'll take a look at your lists.

As a side note, I think we have a difference in nomenclature that's probably led to some of the confusion on this issue. When I say "key" in this context, I'm referring to the keys of the data ID dicts; things like "visit", "filter", or "patch", which you've most recently called "replacement values". I think when you say "key" you mean something like "calexp" or "src", which I think are usually called datasets or dataset types, but would comprise the keys of a dict version of the mapper paf/yaml files. I think I understand what you've done here, but I wanted to make sure there wasn't any more confusion on this subject.

Show
Jim Bosch added a comment - I'll take a look at your lists. As a side note, I think we have a difference in nomenclature that's probably led to some of the confusion on this issue. When I say "key" in this context, I'm referring to the keys of the data ID dicts; things like "visit", "filter", or "patch", which you've most recently called "replacement values". I think when you say "key" you mean something like "calexp" or "src", which I think are usually called datasets or dataset types, but would comprise the keys of a dict version of the mapper paf/yaml files. I think I understand what you've done here, but I wanted to make sure there wasn't any more confusion on this subject.
Hide
Jim Bosch added a comment -

Here are my notes on everything you've moved to daf_butlerUtils.

If there's no comment on a dataset, it's fine.

For anything that's marked "obsolete; can be removed", I'm fairly certain we can get rid of it, and hence we should do that for these datasets across ALL mappers. In fact, for any "remove" datasets that start with "deep", we can also remove any "goodSeeing" variants, too.

For several others, I've asked you to get a bit more information; could you do that, and then put together a community post asking other people for input on these, along with the results of your investigation? (Usually the investigation just involves something like a regexp search for the dataset name in a couple of packages).

After this cleanup, I'm guessing the diffs with the other mappers will get a lot smaller; I'd prefer not to look too closely at those until then.

Here's the list:

• diffsources_schema: investigate; please see if we're using a "diffsources" dataset in ip_diffim or pipe_tasks
• eups_versions: I think this is obsoleted by "products". Maybe Paul Price knows?
• deepCoadd_src(_schema): obsolete; can be removed.
• deepCoadd_icSrc(_schema): obsolete; can be removed.
• deepCoadd_psf: obsolete; can be removed.
• processStack_config: obsolete; can be removed.
• stackExposureId(_bits): obsolete; can be removed.
• deepCoadd_apCorr: obsolete; can be removed.
• solvetansip_config: obsolete; can be removed.
• deep_processCoadd_config: obsolete; can be removed.
• deep_coadd_config: obsolete; can be removed.
• processFocus_config: keep in HSC, do not move to common.
• processFocusSweep_config: keep in HSC, do not move to common.
• deepCoadd_multibandReprocessing: probably obsolete, even though it's new, but maybe Paul Price wants it kept around?
• deepCoadd_icSrc: obsolete; can be removed.
• forcedCoadd_config: obsolete; can be removed.
• deepCoadd_initPsf: obsolete; can be removed.
Show
Hide
Jim Bosch added a comment -

Also note my recent community posts on some general questions that I'm not sure how to resolve here:

https://community.lsst.org/t/appropriate-use-of-raw-and-raw-skytile-tables-in-mappers/1111

and

You may want to wait for some response to those before doing much more work on diffs between mappers, as they may lead to more cleanup we can do before trying to do those comparisons.

Show
Jim Bosch added a comment - Also note my recent community posts on some general questions that I'm not sure how to resolve here: https://community.lsst.org/t/appropriate-use-of-raw-and-raw-skytile-tables-in-mappers/1111 and https://community.lsst.org/t/possibly-bitrotted-coadd-names/1110 You may want to wait for some response to those before doing much more work on diffs between mappers, as they may lead to more cleanup we can do before trying to do those comparisons.
Hide
Perry Gee added a comment -

Yes, I can try to take over the investigation. This is going to be a process, I can see.

One thing I noticed while doing some further testing on the datasets and exposures formed using the new butlerUtils yamls files and the old mapper.paf files. The expected effect is that if mapper.paf does not contain something in the new yaml file, a new policy entry will appear inherited from the base classe.

But there can be a rather strange thing can happen if the mapper.paf DOES contain the entry, but does NOT contain all of the subkeys (like template, persistable, tables, etc.) The resulting entry will then be a hybrid of what is in the yaml file and what is in the mapper.paf, as the substitution is actually done at individually for each path in the policy. This appears only 4 times among the mappers.

Show
Perry Gee added a comment - Yes, I can try to take over the investigation. This is going to be a process, I can see. One thing I noticed while doing some further testing on the datasets and exposures formed using the new butlerUtils yamls files and the old mapper.paf files. The expected effect is that if mapper.paf does not contain something in the new yaml file, a new policy entry will appear inherited from the base classe. But there can be a rather strange thing can happen if the mapper.paf DOES contain the entry, but does NOT contain all of the subkeys (like template, persistable, tables, etc.) The resulting entry will then be a hybrid of what is in the yaml file and what is in the mapper.paf, as the substitution is actually done at individually for each path in the policy. This appears only 4 times among the mappers.
Hide
Jim Bosch added a comment -

But there can be a rather strange thing can happen if the mapper.paf DOES contain the entry, but does NOT contain all of the subkeys (like template, persistable, tables, etc.) The resulting entry will then be a hybrid of what is in the yaml file and what is in the mapper.paf, as the substitution is actually done at individually for each path in the policy. This appears only 4 times among the mappers.

That is definitely weird. I'm hoping we can just remove these inconsistencies (the only things that should be different are the templates, I think), so this is a moot point, but we'll have to keep an eye on it. I'm glad you noticed it!

Show
Jim Bosch added a comment - But there can be a rather strange thing can happen if the mapper.paf DOES contain the entry, but does NOT contain all of the subkeys (like template, persistable, tables, etc.) The resulting entry will then be a hybrid of what is in the yaml file and what is in the mapper.paf, as the substitution is actually done at individually for each path in the policy. This appears only 4 times among the mappers. That is definitely weird. I'm hoping we can just remove these inconsistencies (the only things that should be different are the templates, I think), so this is a moot point, but we'll have to keep an eye on it. I'm glad you noticed it!
Hide
Paul Price added a comment -

eups_versions and deepCoadd_multibandReprocessing can both be removed.

Show
Paul Price added a comment - eups_versions and deepCoadd_multibandReprocessing can both be removed.
Hide
Jim Bosch added a comment -

Based on the responses (and lack thereof) to my community posts, I think we can:

• Remove any "table: raw" or "table: raw_skyTile" entries in any of the dataset definitions we're working with here (some datasets should continue to have "table: raw", but all of these have data IDs that would not allow them to be shared across cameras). I'm hoping this will remove the conflicts you saw previously where some dataset definitons differed across cameras in something other than their template string.
• Just ignore (do not move or remove) any coadd datasets that aren't "deep" ones (e.g. "goodSeeing", "chiSquared"). I'll probably propose removing them or redefining them to match the "deep" definitions in a future RFC, but that will be a different issue.
Show
Jim Bosch added a comment - Based on the responses (and lack thereof) to my community posts, I think we can: Remove any "table: raw" or "table: raw_skyTile" entries in any of the dataset definitions we're working with here (some datasets should continue to have "table: raw", but all of these have data IDs that would not allow them to be shared across cameras). I'm hoping this will remove the conflicts you saw previously where some dataset definitons differed across cameras in something other than their template string. Just ignore (do not move or remove) any coadd datasets that aren't "deep" ones (e.g. "goodSeeing", "chiSquared"). I'll probably propose removing them or redefining them to match the "deep" definitions in a future RFC, but that will be a different issue.
Hide
Perry Gee added a comment -

I've looked for all these strings as literals in the stack, including doing matches goodSeeing, chiSquared appended to the "deep" ones. Here are the only matches which I found. There appear to be no butler calls (like get, set, or datasetExists) but these are probably unnecessary mapper definitions which could be removed. I'm not sure if there are other syntaxes which I need to worry about, other than the string itself and the string with quotes around it. I searched all .py, .h, and .cc files.

/obs_cfht/python/lsst/obs/cfht/megacamMapper.py:192: def bypass_stackExposureId(self, datasetType, pythonType, location, dataId):
/obs_cfht/python/lsst/obs/cfht/megacamMapper.py:196: def bypass_stackExposureId_bits(self, datasetType, pythonType, location, dataId):
/obs_subaru/python/lsst/obs/suprimecam/suprimecamMapper.py:256: def bypass_stackExposureId(self, datasetType, pythonType, location, dataId):
/obs_subaru/python/lsst/obs/suprimecam/suprimecamMapper.py:259: def bypass_stackExposureId_bits(self, datasetType, pythonType, location, dataId):

The are also a set call goodSeeingDiff which do not appear to be used.

Show
Hide
Perry Gee added a comment -

I assume that if I any of the exposures or datasets which you ask me to investigate should be removed if I do not find any place where they are used?

Show
Perry Gee added a comment - I assume that if I any of the exposures or datasets which you ask me to investigate should be removed if I do not find any place where they are used?
Hide
Paul Price added a comment - - edited

You don't need to look in C++ files since the butler only works in python.

Show
Paul Price added a comment - - edited You don't need to look in C++ files since the butler only works in python. Things like deepCoadd_calexp are more often written as self.config.coaddName + "Coadd_calexp" . There's also the CoaddBaseTask.writeCoaddOutput method, which only takes the suffix (e.g., "calexp" ).
Hide
Jim Bosch added a comment -

I assume that if I any of the exposures or datasets which you ask me to investigate should be removed if I do not find any place where they are used?

I don't think we necessarily want to remove them on this issue, but I'd like to able to create lists of things that can probably be removed when we create the "recommended changes" issues for the various obs* packages.

Show
Jim Bosch added a comment - I assume that if I any of the exposures or datasets which you ask me to investigate should be removed if I do not find any place where they are used? I don't think we necessarily want to remove them on this issue, but I'd like to able to create lists of things that can probably be removed when we create the "recommended changes" issues for the various obs* packages.
Hide
Perry Gee added a comment -

The recommended changes to the obs* paf files can be found in "hscdiffs.2.tar". These are inconsistencies between the model HscMapper.paf file and the other mappers which should be examined eventually.

A general discussion of what changes will be done on this ticket and what should be done in the future is in "changes.summary".

Please note the issue with raw_visit the the HscMapper.paf which we should probably addressed on this ticket.

Show
Perry Gee added a comment - The recommended changes to the obs* paf files can be found in "hscdiffs.2.tar". These are inconsistencies between the model HscMapper.paf file and the other mappers which should be examined eventually. A general discussion of what changes will be done on this ticket and what should be done in the future is in "changes.summary". Please note the issue with raw_visit the the HscMapper.paf which we should probably addressed on this ticket.
Hide
Perry Gee added a comment -

Changes have been checked in to datasets.yaml and exposures.yaml. These are mapper items which have been identified as common across mappers and should be centralized.

obs_subaru has been changed so the "obsolete" items have been removed. SuprimecamMapper.paf is otherwise the same, but HscMapper.paf has also had items which were moved to daf_ButlerUtils removed.

See movekeys.yaml and obskeys.yaml attached

Show
Perry Gee added a comment - Changes have been checked in to datasets.yaml and exposures.yaml. These are mapper items which have been identified as common across mappers and should be centralized. obs_subaru has been changed so the "obsolete" items have been removed. SuprimecamMapper.paf is otherwise the same, but HscMapper.paf has also had items which were moved to daf_ButlerUtils removed. See movekeys.yaml and obskeys.yaml attached
Hide
Perry Gee added a comment -

Note: the attachments from Aug 30, 31 and Sept 1 are now obsolete.

Show
Perry Gee added a comment - Note: the attachments from Aug 30, 31 and Sept 1 are now obsolete.
Hide
Jim Bosch added a comment - - edited

Thanks for the detailed summary. I've started looking at them, but probably won't get it done today.

So far, there's one aspect of your notation I don't understand. In the hscdiffs files, what is the distinction between these two categories?

 Keys missing from mapper news:decam which appear in news:hsc 

and

 Keys not in news:decam which exist in reference news:hsc 

Show
Jim Bosch added a comment - - edited Thanks for the detailed summary. I've started looking at them, but probably won't get it done today. So far, there's one aspect of your notation I don't understand. In the hscdiffs files, what is the distinction between these two categories? Keys missing from mapper news:decam which appear in news:hsc and Keys not in news:decam which exist in reference news:hsc
Hide
Jim Bosch added a comment -

We can go ahead and remove all of the "tables:" lines from the entries that have been moved to daf_butlerUtils. We can also remove the "tables:" lines from those same entries in all of the obs* packages. I think that will reduce the number of conflicts in those items considerably. (No need to make a new set of diffs or a summary after this; I'll just keep looking at your existing diffs and ignore differences in "tables:" in those entries).

Show
Jim Bosch added a comment - We can go ahead and remove all of the "tables:" lines from the entries that have been moved to daf_butlerUtils. We can also remove the "tables:" lines from those same entries in all of the obs* packages. I think that will reduce the number of conflicts in those items considerably. (No need to make a new set of diffs or a summary after this; I'll just keep looking at your existing diffs and ignore differences in "tables:" in those entries).
Hide
Jim Bosch added a comment -

I think we shouldn't move isr_config to daf_butlerUtils; unlike all of the other _config entries, this one is generally going to have a different template for every camera, so there's no point in trying to standardize it.

Show
Jim Bosch added a comment - I think we shouldn't move isr_config to daf_butlerUtils; unlike all of the other _config entries, this one is generally going to have a different template for every camera, so there's no point in trying to standardize it.
Hide
Perry Gee added a comment -

Just noticed that there is a typo in the output files: Should say "Keys in news:decam do not exit in reference news:hsc

Show
Perry Gee added a comment - Just noticed that there is a typo in the output files: Should say "Keys in news:decam do not exit in reference news:hsc
Hide
Perry Gee added a comment -

Oh, I see you caught that typo already.

I am presently also writing a script to locate any of these keys which are not in hscMapper.paf and also do not have a butler.get() associated with them.

Show
Perry Gee added a comment - Oh, I see you caught that typo already. I am presently also writing a script to locate any of these keys which are not in hscMapper.paf and also do not have a butler.get() associated with them.
Hide
Jim Bosch added a comment -

I've finished going through all of the .hscdiff files, and I have just a few more changes to request. The changes in daf_butlerUtils and obs_subaru look fine as well, with just the "isr_config" caveat above and the "deepCoadd_calexp_background" caveat below.

• I think we should use the deepCoadd_calexp_background template defined by obs_lsstSim, obs_decam, and obs_cfht as the base one we put in daf_butlerUtils (which would entail moving the HSC-specific one back to obs_subaru), since HSC seems to be the outlier on that one (and its template looks historical).
• The conflicting "python" entries for deep_safeClipAssembleCoadd_config should just be deleted from the non-HSC mappers; that discrepancy is definitely a bug that has been fixed on the HSC side.
• For the entries you've moved to daf_butlerUtils, we can go ahead and make the suprimecam mapper consistent with HSC now, I think, instead of making a separate issue for it. I think all of the moved entries that differ from HSC are cases where HSC is correct and suprimecam is simply incorrect (aside from isr_config, as discussed above). The differences in templates for the non-moved entries are all legitimate, and I don't think we want to worry about the differences in "python" and "persistable" for the calibration entries.

Once you've made those changes, I think you can go ahead and remove any moved entries from all of the obs packages where you haven't noted a difference in the definitions; everything with an (M) that doesn't have an (ABCDE!) looks fine to me.

The moved (M) entries where you have noted a difference (with your ABCDE options) are exactly the ones I think we should focus on for the per-obs package standardization issues. As far as I'm concerned, you're welcome to start creating those issues now, listing those entries as the differences that should be considered for standardization. That still leaves a lot of entries that you didn't move that differ, which we should ignore for now (virtually all of these are, as you noted, expected differences due to different raw data IDs), and the entries that don't exist in all mappers. Most of the things in that last case (which were present in one or two mappers, generally, but not all) are candidates for removal, but we should RFC that, and see if any of them are things I'm just not aware of being in active use. Could you compile a single list of such entries, and maybe start putting together text for that RFC? The only things that jumped out to me as definitely needed in only one mapper were the processFocus-related entries for HSC. The RFC list would include the "goodSeeingDiff" entry you asked specifically about; I just don't know if it's in use. The script you're working on now sounds like it would also help in weeding out any entries on that list that are in fact in use somewhere.

Also, I agree that we can now remove the bypass_stackExposureId and bypass_stackExposureId_bits methods from any mappers that have them.

Show
Hide
Perry Gee added a comment -

I notice that deepCoadd: and related exposures have "level:"
set in conflicting ways. Is this entry needed?

Show
Perry Gee added a comment - I notice that deepCoadd: and related exposures have "level:" set in conflicting ways. Is this entry needed?
Hide
Jim Bosch added a comment -

I notice that deepCoadd: and related exposures have "level:" set in conflicting ways. Is this entry needed?

The fact that they all work despite having different definitions suggests that it's not needed, but I don't actually understand what it does. Could you ask this on community?

Show
Jim Bosch added a comment - I notice that deepCoadd: and related exposures have "level:" set in conflicting ways. Is this entry needed? The fact that they all work despite having different definitions suggests that it's not needed, but I don't actually understand what it does. Could you ask this on community?
Hide
Perry Gee added a comment -

This came up because I was ending up with deepCoadd and deepCoad_calexp in two places for the non-HSC mappers, and I had to then choose whether to delete these entries from the other mappers (violating our previous assumption that we would not make any material changes to the other mappers on this ticket), or to add an override by those same names in the other mapper.paf files.

Since K-T is not willing to vouch that changing the level won't affect things (and in fact, he wants to remove this entirely in later code), I vote that we punt on this issue and retain compatibility with the previous paf files for now by moving these two entries from datasets: to exposures: in all the paf files.

Show
Perry Gee added a comment - This came up because I was ending up with deepCoadd and deepCoad_calexp in two places for the non-HSC mappers, and I had to then choose whether to delete these entries from the other mappers (violating our previous assumption that we would not make any material changes to the other mappers on this ticket), or to add an override by those same names in the other mapper.paf files. Since K-T is not willing to vouch that changing the level won't affect things (and in fact, he wants to remove this entirely in later code), I vote that we punt on this issue and retain compatibility with the previous paf files for now by moving these two entries from datasets: to exposures: in all the paf files.
Hide
Perry Gee added a comment -

The differerence logs for these changes will now be moved to individual RFC documents for each of the mappers.

Show
Perry Gee added a comment - The differerence logs for these changes will now be moved to individual RFC documents for each of the mappers.
Hide
John Parejko added a comment -

This could possibly live in obs_base.

Show
John Parejko added a comment - This could possibly live in obs_base.

#### People

Assignee:
Perry Gee
Reporter:
Jim Bosch
Reviewers:
Jim Bosch
Watchers:
Hsin-Fang Chiang, Jim Bosch, John Parejko, John Swinbank, Nate Pease [X] (Inactive), Paul Price, Perry Gee