Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-15891

Error in flux/instFlux alias setting following DM-10302

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw
    • Story Points:
      4
    • Sprint:
      AP F18-4, AP F18-5
    • Team:
      Alert Production

      Description

      In updating the pipe_analysis scripts following the changes on DM-10302, I noticed that when reading an old (I think it would be version 2) catalog, some of the aliases aren't getting set properly.  A few examples are:

      'base_SdssShape_flux_xxinstFlux'->'base_SdssShape_flux_xx_Cov'
      'base_SdssShape_flux_xyinstFlux'->'base_SdssShape_flux_xy_Cov'
      'base_SdssShape_flux_yyinstFlux'->'base_SdssShape_flux_yy_Cov'
      'base_Blendedness_abs_flux_cinstFlux'->'base_Blendedness_abs_flux_child'
      'base_Blendedness_abs_flux_painstFlux'->'base_Blendedness_abs_flux_parent'
      'base_Blendedness_raw_flux_cinstFlux'->'base_Blendedness_raw_flux_child'
      'base_Blendedness_raw_flux_painstFlux'->'base_Blendedness_raw_flux_parent'
      

      It seem that if "flux" is in the string, the characters beyond the last "" are replaced with "instFlux", regardless of what they are nor where the "_flux" occurred in the original string.

        Attachments

          Issue Links

            Activity

            No builds found.
            lauren Lauren MacArthur created issue -
            lauren Lauren MacArthur made changes -
            Field Original Value New Value
            Link This issue relates to DM-10302 [ DM-10302 ]
            lauren Lauren MacArthur made changes -
            Risk Score 0
            Hide
            Parejkoj John Parejko added a comment -

            Ugh. I understand exactly how this happened, but I'm not sure I know what the best solution is. The same thing probably happens with the v1->v2 aliases, but those were only used for a month before I introduced v3.

            And we don't have any catalogs in afwdata that could be used for testing, either.

            Show
            Parejkoj John Parejko added a comment - Ugh. I understand exactly how this happened, but I'm not sure I know what the best solution is. The same thing probably happens with the v1->v2 aliases, but those were only used for a month before I introduced v3. And we don't have any catalogs in afwdata that could be used for testing, either.
            Parejkoj John Parejko made changes -
            Priority Undefined [ 10000 ] Major [ 3 ]
            Parejkoj John Parejko made changes -
            Labels SciencePipelines
            Hide
            lauren Lauren MacArthur added a comment -

            You could always use any pre-w_2018_38 catalog from the RC2 reprocessing for testing (if you're on lsst-dev, that is).

            Show
            lauren Lauren MacArthur added a comment - You could always use any pre- w_2018_38 catalog from the RC2 reprocessing for testing (if you're on lsst-dev, that is).
            Hide
            lauren Lauren MacArthur added a comment -

            Quick & dirty, this'll show you what you need to know:

            import lsst.daf.persistence as dafPersist
             
            butlerDir = "/datasets/hsc/repo/rerun/RC/w_2018_36/DM-15603/"
            butler = dafPersist.Butler(butlerDir)
            dataId = {'filter': "HSC-I", 'ccd': 49, 'visit': 1228}
            cat = butler.get("src", dataId)
            print(cat.schema)
            

            Show
            lauren Lauren MacArthur added a comment - Quick & dirty, this'll show you what you need to know: import lsst.daf.persistence as dafPersist   butlerDir = "/datasets/hsc/repo/rerun/RC/w_2018_36/DM-15603/" butler = dafPersist.Butler(butlerDir) dataId = { 'filter' : "HSC-I" , 'ccd' : 49 , 'visit' : 1228 } cat = butler.get( "src" , dataId) print (cat.schema)
            Hide
            Parejkoj John Parejko added a comment -

            Yes, but we need to have something in a test data repository so we can write tests for them.

            Show
            Parejkoj John Parejko added a comment - Yes, but we need to have something in a test data repository so we can write tests for them.
            Hide
            lauren Lauren MacArthur added a comment -

            Oh, that kind of testing...indeed.

            Show
            lauren Lauren MacArthur added a comment - Oh, that kind of testing...indeed.
            Parejkoj John Parejko made changes -
            Link This issue relates to DM-15857 [ DM-15857 ]
            Hide
            swinbank John Swinbank added a comment -

            John Parejko, given that we just talked about this, are you already working on it?

            (For the record, we said “adding some small catalogues to afwdata would be fine”).

            Show
            swinbank John Swinbank added a comment - John Parejko , given that we just talked about this, are you already working on it? (For the record, we said “adding some small catalogues to afwdata would be fine”).
            swinbank John Swinbank made changes -
            Assignee John Parejko [ parejkoj ]
            swinbank John Swinbank made changes -
            Team Data Release Production [ 10301 ] Alert Production [ 10300 ]
            swinbank John Swinbank made changes -
            Epic Link DM-14447 [ 80385 ]
            swinbank John Swinbank made changes -
            Story Points 1
            Hide
            lauren Lauren MacArthur added a comment -

            Ok, I just noticed some behavior I don't think is correct.  If I run the above snippet with w_2018_34 setup, then:

            In [3]: cat.schema.VERSION
            Out[3]: 2
            

            However, with w_2018_38 (i.e. the current weekly) setup, I get:

            In [3]: cat.schema.VERSION
            Out[3]: 3
            

            I don't think the stack is meant to change the version of a schema (but I may be missing something subtle here)!

            Show
            lauren Lauren MacArthur added a comment - Ok, I just noticed some behavior I don't think is correct.  If I run the above snippet with w_2018_34 setup, then: In [3]: cat.schema.VERSION Out[3]: 2 However, with w_2018_38 (i.e. the current weekly) setup, I get: In [3]: cat.schema.VERSION Out[3]: 3 I don't think the stack is meant to change the version of a schema (but I may be missing something subtle here)!
            Parejkoj John Parejko made changes -
            Story Points 1 4
            Hide
            Parejkoj John Parejko added a comment -

            I believe the schema version changes (Jim Bosch can confirm), because when you save it it saves all the new aliases. As I just discovered when trying to make a stripped down catalog to put in afw/tests/data. Which, incidentally, is where all afw test data more recent than ~2014 lives. Not in afwdata, which we should think about just removing (once we figure out whether the existing tests that use it actually matter).

            I'm working on fixing DM-15857 first: I don't think the above is actually a breaking change, just a bunch of unnecessary aliases. Or is it? What exactly is base_SdssShape_flux_xx_Cov, anyway? I don't think it's a "flux" field at all, but I'm not sure.

            Show
            Parejkoj John Parejko added a comment - I believe the schema version changes ( Jim Bosch can confirm), because when you save it it saves all the new aliases. As I just discovered when trying to make a stripped down catalog to put in afw/tests/data . Which, incidentally, is where all afw test data more recent than ~2014 lives. Not in afwdata , which we should think about just removing (once we figure out whether the existing tests that use it actually matter). I'm working on fixing DM-15857 first: I don't think the above is actually a breaking change, just a bunch of unnecessary aliases. Or is it? What exactly is base_SdssShape_flux_xx_Cov , anyway? I don't think it's a "flux" field at all, but I'm not sure.
            Hide
            lauren Lauren MacArthur added a comment -

            name="base_SdssShape_flux_xx_Cov", doc="uncertainty covariance between base_SdssShape_flux and base_SdssShape_xx", units="count*pixel^2"
            

            Whatever that means...

            Show
            lauren Lauren MacArthur added a comment - name="base_SdssShape_flux_xx_Cov", doc="uncertainty covariance between base_SdssShape_flux and base_SdssShape_xx", units="count*pixel^2" Whatever that means...
            Hide
            lauren Lauren MacArthur added a comment - - edited

            So, if I'm reading a catalog that was written as VERSION 2, how can I tell that upon reading it with a current stack (Jim Bosch)?

            Show
            lauren Lauren MacArthur added a comment - - edited So, if I'm reading a catalog that was written as VERSION 2, how can I tell that upon reading it with a current stack ( Jim Bosch )?
            Hide
            jbosch Jim Bosch added a comment -

            cat.schema.VERSION is actually a constant, not something specific to cat, and it just says what version new schemas are now created with.  It's there primarily so the I/O code can notice when it's reading something old so it can modify it to look like something new.  I wouldn't expect anything outside the I/O code to need to know what version a particular on-disk schema has; do you think you do?

            Show
            jbosch Jim Bosch added a comment - cat.schema.VERSION is actually a constant, not something specific to cat , and it just says what version new schemas are now created with.  It's there primarily so the I/O code can notice when it's reading something old so it can modify it to look like something new.  I wouldn't expect anything outside the I/O code to need to know what version a particular on-disk schema has; do you think you do?
            Hide
            lauren Lauren MacArthur added a comment -

            Ah, ok.  Yeah, I think I do (but that may be a red herring that I'm doing something silly...).  Basically, I have a function I use to look up all "flux" keys so that I can loop through these and perform an operation on all fluxes in the catalog (and their errors, which I find by just appending Err to the key names in my fluxKeys list).  I identify them with re.search(r"^(\w+_flux)$" of the schema keys.  If I just replace "_flux" with "_instFlux" in my search, I get no keys found for an older (VERSION < 3) catalog (I guess the alias mapping of _instFlux --> _flux doesn't work as the key name in the schema is still just _flux).  We can take this offline (apologies for the noise on this ticket!), but I'd love to hear if there's a better way I should be doing this!

            Show
            lauren Lauren MacArthur added a comment - Ah, ok.  Yeah, I think I do (but that may be a red herring that I'm doing something silly...).  Basically, I have a function I use to look up all "flux" keys so that I can loop through these and perform an operation on all fluxes in the catalog (and their errors, which I find by just appending Err to the key names in my fluxKeys list).  I identify them with re.search(r"^(\w+_flux)$" of the schema keys.  If I just replace "_flux" with "_instFlux" in my search, I get no keys found for an older (VERSION < 3) catalog (I guess the alias mapping of _instFlux --> _flux doesn't work as the key name in the schema is still just _flux ).  We can take this offline (apologies for the noise on this ticket!), but I'd love to hear if there's a better way I should be doing this!
            lauren Lauren MacArthur made changes -
            Description In updating the `pipe_analysis` scripts following the changes on DM-10302, I noticed that when reading an old (I think it would be version 2) catalog, some of the aliases aren't getting set properly.  A few examples are:

            {noformat}
            'base_SdssShape_flux_xxinstFlux'->'base_SdssShape_flux_xx_Cov'
            'base_SdssShape_flux_xyinstFlux'->'base_SdssShape_flux_xy_Cov'
            'base_SdssShape_flux_yyinstFlux'->'base_SdssShape_flux_yy_Cov'
            'base_Blendedness_abs_flux_cinstFlux'->'base_Blendedness_abs_flux_child'
            'base_Blendedness_abs_flux_painstFlux'->'base_Blendedness_abs_flux_parent'
            'base_Blendedness_raw_flux_cinstFlux'->'base_Blendedness_raw_flux_child'
            'base_Blendedness_raw_flux_painstFlux'->'base_Blendedness_raw_flux_parent'
            {noformat}

            It seem that if "\_flux" is in the string, the characters beyond the last "\_" are replaced with "instFlux", regardless of what they are nor where the "_flux" occurred in the original string.
            In updating the pipe_analysis scripts following the changes on DM-10302, I noticed that when reading an old (I think it would be version 2) catalog, some of the aliases aren't getting set properly.  A few examples are:
            {noformat}'base_SdssShape_flux_xxinstFlux'->'base_SdssShape_flux_xx_Cov'
            'base_SdssShape_flux_xyinstFlux'->'base_SdssShape_flux_xy_Cov'
            'base_SdssShape_flux_yyinstFlux'->'base_SdssShape_flux_yy_Cov'
            'base_Blendedness_abs_flux_cinstFlux'->'base_Blendedness_abs_flux_child'
            'base_Blendedness_abs_flux_painstFlux'->'base_Blendedness_abs_flux_parent'
            'base_Blendedness_raw_flux_cinstFlux'->'base_Blendedness_raw_flux_child'
            'base_Blendedness_raw_flux_painstFlux'->'base_Blendedness_raw_flux_parent'
            {noformat}
            It seem that if "_flux" is in the string, the characters beyond the last "_" are replaced with "instFlux", regardless of what they are nor where the "_flux" occurred in the original string.
            Hide
            jbosch Jim Bosch added a comment -

            Does

            schema.extract("*_instFlux")

            find them all?

            Show
            jbosch Jim Bosch added a comment - Does schema.extract("*_instFlux") find them all?
            Parejkoj John Parejko made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            Parejkoj John Parejko made changes -
            Sprint AP F18-4 [ 749 ]
            Hide
            lauren Lauren MacArthur added a comment -

            Jim Bosch, yes, that works, thanks.  I also realized I could get the table version from the catalog FITS header:

            HIERARCH AFW_TABLE_VERSION
            

            Is there any reason to prefer one over the other?

            Show
            lauren Lauren MacArthur added a comment - Jim Bosch , yes, that works, thanks.  I also realized I could get the table version from the catalog FITS header: HIERARCH AFW_TABLE_VERSION Is there any reason to prefer one over the other?
            Hide
            jbosch Jim Bosch added a comment -

            Definitely better to use schema.extract; it's designed to combine regular fields and aliases.  It's also best in general to avoid reading FITS header entries directly, though that may be harder to avoid in your line of work than most others.

            Show
            jbosch Jim Bosch added a comment - Definitely better to use schema.extract ; it's designed to combine regular fields and aliases.  It's also best in general to avoid reading FITS header entries directly, though that may be harder to avoid in your line of work than most others.
            Hide
            lauren Lauren MacArthur added a comment -

            Awesome, thanks.  As you will have just seen, I have just discovered that I will have to use this trick in meas_mosaic as well (for backwards compatibility with older catalogs, DM-15916), so I wanted to double check.

            Show
            lauren Lauren MacArthur added a comment - Awesome, thanks.  As you will have just seen, I have just discovered that I will have to use this trick in meas_mosaic as well (for backwards compatibility with older catalogs, DM-15916 ), so I wanted to double check.
            Parejkoj John Parejko made changes -
            Assignee John Parejko [ parejkoj ] Jim Bosch [ jbosch ]
            Hide
            Parejkoj John Parejko added a comment -

            Jim Bosch is fixing this as part of DM-15857.

            Show
            Parejkoj John Parejko added a comment - Jim Bosch is fixing this as part of DM-15857 .
            swinbank John Swinbank made changes -
            Sprint AP F18-4 [ 749 ] AP F18-4, AP F18-5 [ 749, 750 ]
            Hide
            lauren Lauren MacArthur added a comment -

            I've just discovered another issue related to the alias setting in regards to backwards compatibility in the context of joining reference matches to catalogs.  The matchesToCatalog function in afw (here) is a handy function that joins the matches found with the associated catalog entry into a single "match" catalog, putting a ref_ or src_ prefix on the field names for the reference and source fields, respectively. However, the aliases don't get propagated, so this function is currently not backwards compatible with old catalogs. I may be the only one who makes use of this function and I could easily hack a fix for this in the pipe_analysis scripts, but it does live in afw, so I wonder if the function itself should be updated to also include the aliases. What do you think? (Also pinging Paul Price on this one as I believe he was the original author of matchesToCatalog)

            Show
            lauren Lauren MacArthur added a comment - I've just discovered another issue related to the alias setting in regards to backwards compatibility in the context of joining reference matches to catalogs.  The  matchesToCatalog function in afw ( here ) is a handy function that joins the matches found with the associated catalog entry into a single "match" catalog, putting a ref_ or src_ prefix on the field names for the reference and source fields, respectively. However, the aliases don't get propagated, so this function is currently not backwards compatible with old catalogs. I may be the only one who makes use of this function and I could easily hack a fix for this in the pipe_analysis scripts, but it does live in afw , so I wonder if the function itself should be updated to also include the aliases. What do you think? (Also pinging Paul Price on this one as I believe he was the original author of  matchesToCatalog)
            Hide
            price Paul Price added a comment -

            If you're looking at lsst.afw.table.matchesToCatalog, you should also look at last.meas.astrom.denormalizeMatches, which is incredibly similar and more widely used.

            Show
            price Paul Price added a comment - If you're looking at lsst.afw.table.matchesToCatalog , you should also look at last.meas.astrom.denormalizeMatches , which is incredibly similar and more widely used.
            Hide
            Parejkoj John Parejko added a comment -

            Lauren MacArthur: I'm marking this done, as the original bug you described should be fixed in DM-15857, which I just merged. If the bug you describe in your last comment remains, please file a new ticket (and try Paul Price's suggestion too).

            Show
            Parejkoj John Parejko added a comment - Lauren MacArthur : I'm marking this done, as the original bug you described should be fixed in DM-15857 , which I just merged. If the bug you describe in your last comment remains, please file a new ticket (and try Paul Price 's suggestion too).
            Parejkoj John Parejko made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Done [ 10002 ]
            Hide
            lauren Lauren MacArthur added a comment -

            Thanks...doing it now: DM-16023

            Show
            lauren Lauren MacArthur added a comment - Thanks...doing it now: DM-16023
            swinbank John Swinbank made changes -
            Assignee Jim Bosch [ jbosch ] John Parejko [ parejkoj ]

              People

              Assignee:
              Parejkoj John Parejko
              Reporter:
              lauren Lauren MacArthur
              Watchers:
              Hsin-Fang Chiang, Jim Bosch, John Parejko, John Swinbank, Lauren MacArthur, Paul Price
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins Builds

                  No builds found.