# Error in flux/instFlux alias setting following DM-10302

XMLWordPrintable

#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
4
• Sprint:
AP F18-4, AP F18-5
• Team:

#### Description

In updating the pipe_analysis scripts following the changes on DM-10302, I noticed that when reading an old (I think it would be version 2) catalog, some of the aliases aren't getting set properly.  A few examples are:

 'base_SdssShape_flux_xxinstFlux'->'base_SdssShape_flux_xx_Cov' 'base_SdssShape_flux_xyinstFlux'->'base_SdssShape_flux_xy_Cov' 'base_SdssShape_flux_yyinstFlux'->'base_SdssShape_flux_yy_Cov' 'base_Blendedness_abs_flux_cinstFlux'->'base_Blendedness_abs_flux_child' 'base_Blendedness_abs_flux_painstFlux'->'base_Blendedness_abs_flux_parent' 'base_Blendedness_raw_flux_cinstFlux'->'base_Blendedness_raw_flux_child' 'base_Blendedness_raw_flux_painstFlux'->'base_Blendedness_raw_flux_parent'

It seem that if "flux" is in the string, the characters beyond the last "" are replaced with "instFlux", regardless of what they are nor where the "_flux" occurred in the original string.

#### Activity

No builds found.
Lauren MacArthur created issue -
Field Original Value New Value
Link This issue relates to DM-10302 [ DM-10302 ]
 Risk Score 0
Hide
John Parejko added a comment -

Ugh. I understand exactly how this happened, but I'm not sure I know what the best solution is. The same thing probably happens with the v1->v2 aliases, but those were only used for a month before I introduced v3.

And we don't have any catalogs in afwdata that could be used for testing, either.

Show
John Parejko added a comment - Ugh. I understand exactly how this happened, but I'm not sure I know what the best solution is. The same thing probably happens with the v1->v2 aliases, but those were only used for a month before I introduced v3. And we don't have any catalogs in afwdata that could be used for testing, either.
 Priority Undefined [ 10000 ] Major [ 3 ]
 Labels SciencePipelines
Hide
Lauren MacArthur added a comment -

You could always use any pre-w_2018_38 catalog from the RC2 reprocessing for testing (if you're on lsst-dev, that is).

Show
Lauren MacArthur added a comment - You could always use any pre- w_2018_38 catalog from the RC2 reprocessing for testing (if you're on lsst-dev, that is).
Hide
Lauren MacArthur added a comment -

Quick & dirty, this'll show you what you need to know:

 import lsst.daf.persistence as dafPersist butlerDir = "/datasets/hsc/repo/rerun/RC/w_2018_36/DM-15603/" butler = dafPersist.Butler(butlerDir) dataId = {'filter': "HSC-I", 'ccd': 49, 'visit': 1228} cat = butler.get("src", dataId) print(cat.schema)

Show
Lauren MacArthur added a comment - Quick & dirty, this'll show you what you need to know: import lsst.daf.persistence as dafPersist   butlerDir = "/datasets/hsc/repo/rerun/RC/w_2018_36/DM-15603/" butler = dafPersist.Butler(butlerDir) dataId = { 'filter' : "HSC-I" , 'ccd' : 49 , 'visit' : 1228 } cat = butler.get( "src" , dataId) print (cat.schema)
Hide
John Parejko added a comment -

Yes, but we need to have something in a test data repository so we can write tests for them.

Show
John Parejko added a comment - Yes, but we need to have something in a test data repository so we can write tests for them.
Hide
Lauren MacArthur added a comment -

Oh, that kind of testing...indeed.

Show
Lauren MacArthur added a comment - Oh, that kind of testing...indeed.
 Link This issue relates to DM-15857 [ DM-15857 ]
Hide
John Swinbank added a comment -

(For the record, we said “adding some small catalogues to afwdata would be fine”).

Show
John Swinbank added a comment - John Parejko , given that we just talked about this, are you already working on it? (For the record, we said “adding some small catalogues to afwdata would be fine”).
 Assignee John Parejko [ parejkoj ]
 Team Data Release Production [ 10301 ] Alert Production [ 10300 ]
 Epic Link DM-14447 [ 80385 ]
 Story Points 1
Hide
Lauren MacArthur added a comment -

Ok, I just noticed some behavior I don't think is correct.  If I run the above snippet with w_2018_34 setup, then:

 In [3]: cat.schema.VERSION Out[3]: 2

However, with w_2018_38 (i.e. the current weekly) setup, I get:

 In [3]: cat.schema.VERSION Out[3]: 3

I don't think the stack is meant to change the version of a schema (but I may be missing something subtle here)!

Show
Lauren MacArthur added a comment - Ok, I just noticed some behavior I don't think is correct.  If I run the above snippet with w_2018_34 setup, then: In [3]: cat.schema.VERSION Out[3]: 2 However, with w_2018_38 (i.e. the current weekly) setup, I get: In [3]: cat.schema.VERSION Out[3]: 3 I don't think the stack is meant to change the version of a schema (but I may be missing something subtle here)!
 Story Points 1 4
Hide
John Parejko added a comment -

I believe the schema version changes (Jim Bosch can confirm), because when you save it it saves all the new aliases. As I just discovered when trying to make a stripped down catalog to put in afw/tests/data. Which, incidentally, is where all afw test data more recent than ~2014 lives. Not in afwdata, which we should think about just removing (once we figure out whether the existing tests that use it actually matter).

I'm working on fixing DM-15857 first: I don't think the above is actually a breaking change, just a bunch of unnecessary aliases. Or is it? What exactly is base_SdssShape_flux_xx_Cov, anyway? I don't think it's a "flux" field at all, but I'm not sure.

Show
John Parejko added a comment - I believe the schema version changes ( Jim Bosch can confirm), because when you save it it saves all the new aliases. As I just discovered when trying to make a stripped down catalog to put in afw/tests/data . Which, incidentally, is where all afw test data more recent than ~2014 lives. Not in afwdata , which we should think about just removing (once we figure out whether the existing tests that use it actually matter). I'm working on fixing DM-15857 first: I don't think the above is actually a breaking change, just a bunch of unnecessary aliases. Or is it? What exactly is base_SdssShape_flux_xx_Cov , anyway? I don't think it's a "flux" field at all, but I'm not sure.
Hide
Lauren MacArthur added a comment -

 name="base_SdssShape_flux_xx_Cov", doc="uncertainty covariance between base_SdssShape_flux and base_SdssShape_xx", units="count*pixel^2"

Whatever that means...

Show
Lauren MacArthur added a comment - name="base_SdssShape_flux_xx_Cov", doc="uncertainty covariance between base_SdssShape_flux and base_SdssShape_xx", units="count*pixel^2" Whatever that means...
Hide
Lauren MacArthur added a comment - - edited

So, if I'm reading a catalog that was written as VERSION 2, how can I tell that upon reading it with a current stack (Jim Bosch)?

Show
Lauren MacArthur added a comment - - edited So, if I'm reading a catalog that was written as VERSION 2, how can I tell that upon reading it with a current stack ( Jim Bosch )?
Hide
Jim Bosch added a comment -

cat.schema.VERSION is actually a constant, not something specific to cat, and it just says what version new schemas are now created with.  It's there primarily so the I/O code can notice when it's reading something old so it can modify it to look like something new.  I wouldn't expect anything outside the I/O code to need to know what version a particular on-disk schema has; do you think you do?

Show
Jim Bosch added a comment - cat.schema.VERSION is actually a constant, not something specific to cat , and it just says what version new schemas are now created with.  It's there primarily so the I/O code can notice when it's reading something old so it can modify it to look like something new.  I wouldn't expect anything outside the I/O code to need to know what version a particular on-disk schema has; do you think you do?
Hide
Lauren MacArthur added a comment -

Ah, ok.  Yeah, I think I do (but that may be a red herring that I'm doing something silly...).  Basically, I have a function I use to look up all "flux" keys so that I can loop through these and perform an operation on all fluxes in the catalog (and their errors, which I find by just appending Err to the key names in my fluxKeys list).  I identify them with re.search(r"^(\w+_flux)$" of the schema keys. If I just replace "_flux" with "_instFlux" in my search, I get no keys found for an older (VERSION < 3) catalog (I guess the alias mapping of _instFlux --> _flux doesn't work as the key name in the schema is still just _flux). We can take this offline (apologies for the noise on this ticket!), but I'd love to hear if there's a better way I should be doing this! Show Lauren MacArthur added a comment - Ah, ok. Yeah, I think I do (but that may be a red herring that I'm doing something silly...). Basically, I have a function I use to look up all "flux" keys so that I can loop through these and perform an operation on all fluxes in the catalog (and their errors, which I find by just appending Err to the key names in my fluxKeys list). I identify them with re.search(r"^(\w+_flux)$" of the schema keys.  If I just replace "_flux" with "_instFlux" in my search, I get no keys found for an older (VERSION < 3) catalog (I guess the alias mapping of _instFlux --> _flux doesn't work as the key name in the schema is still just _flux ).  We can take this offline (apologies for the noise on this ticket!), but I'd love to hear if there's a better way I should be doing this!
 Description In updating the pipe_analysis scripts following the changes on DM-10302, I noticed that when reading an old (I think it would be version 2) catalog, some of the aliases aren't getting set properly.  A few examples are: {noformat} 'base_SdssShape_flux_xxinstFlux'->'base_SdssShape_flux_xx_Cov' 'base_SdssShape_flux_xyinstFlux'->'base_SdssShape_flux_xy_Cov' 'base_SdssShape_flux_yyinstFlux'->'base_SdssShape_flux_yy_Cov' 'base_Blendedness_abs_flux_cinstFlux'->'base_Blendedness_abs_flux_child' 'base_Blendedness_abs_flux_painstFlux'->'base_Blendedness_abs_flux_parent' 'base_Blendedness_raw_flux_cinstFlux'->'base_Blendedness_raw_flux_child' 'base_Blendedness_raw_flux_painstFlux'->'base_Blendedness_raw_flux_parent' {noformat} It seem that if "\_flux" is in the string, the characters beyond the last "\_" are replaced with "instFlux", regardless of what they are nor where the "_flux" occurred in the original string. In updating the pipe_analysis scripts following the changes on DM-10302, I noticed that when reading an old (I think it would be version 2) catalog, some of the aliases aren't getting set properly.  A few examples are: {noformat}'base_SdssShape_flux_xxinstFlux'->'base_SdssShape_flux_xx_Cov' 'base_SdssShape_flux_xyinstFlux'->'base_SdssShape_flux_xy_Cov' 'base_SdssShape_flux_yyinstFlux'->'base_SdssShape_flux_yy_Cov' 'base_Blendedness_abs_flux_cinstFlux'->'base_Blendedness_abs_flux_child' 'base_Blendedness_abs_flux_painstFlux'->'base_Blendedness_abs_flux_parent' 'base_Blendedness_raw_flux_cinstFlux'->'base_Blendedness_raw_flux_child' 'base_Blendedness_raw_flux_painstFlux'->'base_Blendedness_raw_flux_parent' {noformat} It seem that if "_flux" is in the string, the characters beyond the last "_" are replaced with "instFlux", regardless of what they are nor where the "_flux" occurred in the original string.
Hide
Jim Bosch added a comment -

Does

schema.extract("*_instFlux")

find them all?

Show
Jim Bosch added a comment - Does schema.extract("*_instFlux") find them all?
 Status To Do [ 10001 ] In Progress [ 3 ]
 Sprint AP F18-4 [ 749 ]
Hide
Lauren MacArthur added a comment -

Jim Bosch, yes, that works, thanks.  I also realized I could get the table version from the catalog FITS header:

 HIERARCH AFW_TABLE_VERSION

Is there any reason to prefer one over the other?

Show
Lauren MacArthur added a comment - Jim Bosch , yes, that works, thanks.  I also realized I could get the table version from the catalog FITS header: HIERARCH AFW_TABLE_VERSION Is there any reason to prefer one over the other?
Hide
Jim Bosch added a comment -

Definitely better to use schema.extract; it's designed to combine regular fields and aliases.  It's also best in general to avoid reading FITS header entries directly, though that may be harder to avoid in your line of work than most others.

Show
Jim Bosch added a comment - Definitely better to use schema.extract ; it's designed to combine regular fields and aliases.  It's also best in general to avoid reading FITS header entries directly, though that may be harder to avoid in your line of work than most others.
Hide
Lauren MacArthur added a comment -

Awesome, thanks.  As you will have just seen, I have just discovered that I will have to use this trick in meas_mosaic as well (for backwards compatibility with older catalogs, DM-15916), so I wanted to double check.

Show
Lauren MacArthur added a comment - Awesome, thanks.  As you will have just seen, I have just discovered that I will have to use this trick in meas_mosaic as well (for backwards compatibility with older catalogs, DM-15916 ), so I wanted to double check.
 Assignee John Parejko [ parejkoj ] Jim Bosch [ jbosch ]
Hide
John Parejko added a comment -

Jim Bosch is fixing this as part of DM-15857.

Show
John Parejko added a comment - Jim Bosch is fixing this as part of DM-15857 .
 Sprint AP F18-4 [ 749 ] AP F18-4, AP F18-5 [ 749, 750 ]
Hide
Lauren MacArthur added a comment -

I've just discovered another issue related to the alias setting in regards to backwards compatibility in the context of joining reference matches to catalogs.  The matchesToCatalog function in afw (here) is a handy function that joins the matches found with the associated catalog entry into a single "match" catalog, putting a ref_ or src_ prefix on the field names for the reference and source fields, respectively. However, the aliases don't get propagated, so this function is currently not backwards compatible with old catalogs. I may be the only one who makes use of this function and I could easily hack a fix for this in the pipe_analysis scripts, but it does live in afw, so I wonder if the function itself should be updated to also include the aliases. What do you think? (Also pinging Paul Price on this one as I believe he was the original author of matchesToCatalog)

Show
Lauren MacArthur added a comment - I've just discovered another issue related to the alias setting in regards to backwards compatibility in the context of joining reference matches to catalogs.  The  matchesToCatalog function in afw ( here ) is a handy function that joins the matches found with the associated catalog entry into a single "match" catalog, putting a ref_ or src_ prefix on the field names for the reference and source fields, respectively. However, the aliases don't get propagated, so this function is currently not backwards compatible with old catalogs. I may be the only one who makes use of this function and I could easily hack a fix for this in the pipe_analysis scripts, but it does live in afw , so I wonder if the function itself should be updated to also include the aliases. What do you think? (Also pinging Paul Price on this one as I believe he was the original author of  matchesToCatalog)
Hide
Paul Price added a comment -

If you're looking at lsst.afw.table.matchesToCatalog, you should also look at last.meas.astrom.denormalizeMatches, which is incredibly similar and more widely used.

Show
Paul Price added a comment - If you're looking at lsst.afw.table.matchesToCatalog , you should also look at last.meas.astrom.denormalizeMatches , which is incredibly similar and more widely used.
Hide
John Parejko added a comment -

Lauren MacArthur: I'm marking this done, as the original bug you described should be fixed in DM-15857, which I just merged. If the bug you describe in your last comment remains, please file a new ticket (and try Paul Price's suggestion too).

Show
John Parejko added a comment - Lauren MacArthur : I'm marking this done, as the original bug you described should be fixed in DM-15857 , which I just merged. If the bug you describe in your last comment remains, please file a new ticket (and try Paul Price 's suggestion too).
 Resolution Done [ 10000 ] Status In Progress [ 3 ] Done [ 10002 ]
Hide
Lauren MacArthur added a comment -

Thanks...doing it now: DM-16023

Show
Lauren MacArthur added a comment - Thanks...doing it now: DM-16023
 Assignee Jim Bosch [ jbosch ] John Parejko [ parejkoj ]

#### People

Assignee:
John Parejko
Reporter:
Lauren MacArthur
Watchers:
Hsin-Fang Chiang, Jim Bosch, John Parejko, John Swinbank, Lauren MacArthur, Paul Price