Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-1130 refactor C++ Algorithm classes
  3. DM-1523

fix interaction of slots with SafeCentroidExtractor

    XMLWordPrintable

    Details

    • Sprint:
      Science Pipelines DM-W15-2
    • Team:
      Data Release Production

      Description

      SafeCentroidExtractor doesn't work properly for algorithms that are centroiders themselves (especially true for anything that could be used in the centroid slot).

      Fixing this also requires setting up the slots when the Schema is defined (in the measurement task's constructor), not when the SourceTable object is created, which is something we probably should have done back when slot definitions were moved from SourceTable to Schema.

        Attachments

          Activity

          Hide
          jbosch Jim Bosch added a comment -

          Ready for review on branch u/jbosch/DM-1523. Here's the summary from git of what changed:

          meas_base:u/jbosch/DM-1523 % git --no-pager log --stat --reverse LSST/master..  
          commit 0f38633d5f9d3c77f74251b8edf269daf3244e7e
          Author: Jim Bosch <jbosch@astro.princeton.edu>
          Date:   Mon Nov 17 12:22:40 2014 -0500
           
              Ensure slot centroid is run first in forced mode, just as it in in SFM.
              
              Also improved code comment for the same logic in SFM.
           
           python/lsst/meas/base/forcedMeasurement.py |    6 ++++++
           python/lsst/meas/base/sfm.py               |    4 +++-
           2 files changed, 9 insertions(+), 1 deletion(-)
           
          commit 554f2bd57b4c305994a8f4dcebe5a653fef0a4ea
          Author: Jim Bosch <jbosch@astro.princeton.edu>
          Date:   Mon Nov 17 12:24:49 2014 -0500
           
              Move slot setup from Task run() methods to constructors.
              
              It may be useful to algorithms to know what the slots point to before they're
              initialized, and now that the slots are managed by Schema aliases instead of
              SourceTable, we can do that.
           
           python/lsst/meas/base/base.py              |   18 +++++++++---------
           python/lsst/meas/base/forcedMeasurement.py |    2 +-
           python/lsst/meas/base/sfm.py               |    2 +-
           3 files changed, 11 insertions(+), 11 deletions(-)
           
          commit 4916b9fb3c0f3d03ee9e4ce37030acd6ba9811d7
          Author: Jim Bosch <jbosch@astro.princeton.edu>
          Date:   Mon Nov 17 13:04:06 2014 -0500
           
              Fixups to SFM test to adapt to changes in slot definition.
              
              The previous state of this test seemed a little incorrect - the task was initialized
              twice, but only used once - probably due to a merge conflict resolution problem.
              But that didn't cause a failure until the preceding commit that moves slot definition
              from the run() method of the task to the ctor.
           
           tests/testSingleFrameMeasurement.py |   10 +++++-----
           1 file changed, 5 insertions(+), 5 deletions(-)
           
          commit 1d684a1af292dbb66ef35f1f1cee35731e1410ad
          Author: Jim Bosch <jbosch@astro.princeton.edu>
          Date:   Mon Nov 17 13:47:52 2014 -0500
           
              Make flag aliases in safe extractors direct
              
              Instead of using the slot flag alias as the target, we want to resolve the slot flag
              alias, and use its target as the target, in order to keep things consistent if slots are
              modified after measurement.
           
           include/lsst/meas/base/InputUtilities.h |   10 ++++------
           src/InputUtilities.cc                   |   16 ++++++++++++----
           tests/testPsfFlux.py                    |    2 +-
           3 files changed, 17 insertions(+), 11 deletions(-)
           
          commit 6c83c3f612e950f421518956a54f49fa607fa285
          Author: Jim Bosch <jbosch@astro.princeton.edu>
          Date:   Mon Nov 17 14:43:45 2014 -0500
           
              Add special handling for centroiders in SafeCentroidExtractor
              
              Unlike other algorithms, Centroiders shouldn't consider it a failure if we can't feed
              them anything better than the Peak, because that's how they'd need to be run if
              they were the first one.  And we need to check that they aren't the slot centroider
              before we try to make an alias back to the slot centroider.
           
           include/lsst/meas/base/InputUtilities.h |    8 ++-
           src/InputUtilities.cc                   |  101 ++++++++++++++++++++-----------
           2 files changed, 74 insertions(+), 35 deletions(-)

          Show
          jbosch Jim Bosch added a comment - Ready for review on branch u/jbosch/ DM-1523 . Here's the summary from git of what changed: meas_base:u/jbosch/DM-1523 % git --no-pager log --stat --reverse LSST/master.. commit 0f38633d5f9d3c77f74251b8edf269daf3244e7e Author: Jim Bosch <jbosch@astro.princeton.edu> Date: Mon Nov 17 12:22:40 2014 -0500   Ensure slot centroid is run first in forced mode, just as it in in SFM. Also improved code comment for the same logic in SFM.   python/lsst/meas/base/forcedMeasurement.py | 6 ++++++ python/lsst/meas/base/sfm.py | 4 +++- 2 files changed, 9 insertions(+), 1 deletion(-)   commit 554f2bd57b4c305994a8f4dcebe5a653fef0a4ea Author: Jim Bosch <jbosch@astro.princeton.edu> Date: Mon Nov 17 12:24:49 2014 -0500   Move slot setup from Task run() methods to constructors. It may be useful to algorithms to know what the slots point to before they're initialized, and now that the slots are managed by Schema aliases instead of SourceTable, we can do that.   python/lsst/meas/base/base.py | 18 +++++++++--------- python/lsst/meas/base/forcedMeasurement.py | 2 +- python/lsst/meas/base/sfm.py | 2 +- 3 files changed, 11 insertions(+), 11 deletions(-)   commit 4916b9fb3c0f3d03ee9e4ce37030acd6ba9811d7 Author: Jim Bosch <jbosch@astro.princeton.edu> Date: Mon Nov 17 13:04:06 2014 -0500   Fixups to SFM test to adapt to changes in slot definition. The previous state of this test seemed a little incorrect - the task was initialized twice, but only used once - probably due to a merge conflict resolution problem. But that didn't cause a failure until the preceding commit that moves slot definition from the run() method of the task to the ctor.   tests/testSingleFrameMeasurement.py | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)   commit 1d684a1af292dbb66ef35f1f1cee35731e1410ad Author: Jim Bosch <jbosch@astro.princeton.edu> Date: Mon Nov 17 13:47:52 2014 -0500   Make flag aliases in safe extractors direct Instead of using the slot flag alias as the target, we want to resolve the slot flag alias, and use its target as the target, in order to keep things consistent if slots are modified after measurement.   include/lsst/meas/base/InputUtilities.h | 10 ++++------ src/InputUtilities.cc | 16 ++++++++++++---- tests/testPsfFlux.py | 2 +- 3 files changed, 17 insertions(+), 11 deletions(-)   commit 6c83c3f612e950f421518956a54f49fa607fa285 Author: Jim Bosch <jbosch@astro.princeton.edu> Date: Mon Nov 17 14:43:45 2014 -0500   Add special handling for centroiders in SafeCentroidExtractor Unlike other algorithms, Centroiders shouldn't consider it a failure if we can't feed them anything better than the Peak, because that's how they'd need to be run if they were the first one. And we need to check that they aren't the slot centroider before we try to make an alias back to the slot centroider.   include/lsst/meas/base/InputUtilities.h | 8 ++- src/InputUtilities.cc | 101 ++++++++++++++++++++----------- 2 files changed, 74 insertions(+), 35 deletions(-)
          Hide
          pgee Perry Gee added a comment -

          Looking through the code in InputUtilities.cc brings up a question I have meant to ask. Why are we using an alias here at all? I assume that you use this kind of "referential integrity", for lack of a better term, because is robust to changes which could occur later in the schema. For example, the centroid slot could get changed later. But even if you are careful to be sure that the badCentroid flag points to the right Centroid failure flag, isn't it possible that the flag could get altered after the measurement it depends on is done? Or is that never anticipated?

          Show
          pgee Perry Gee added a comment - Looking through the code in InputUtilities.cc brings up a question I have meant to ask. Why are we using an alias here at all? I assume that you use this kind of "referential integrity", for lack of a better term, because is robust to changes which could occur later in the schema. For example, the centroid slot could get changed later. But even if you are careful to be sure that the badCentroid flag points to the right Centroid failure flag, isn't it possible that the flag could get altered after the measurement it depends on is done? Or is that never anticipated?
          Hide
          pgee Perry Gee added a comment -

          InputUtilities.cc

          See previous comment

          Also, extractPeak should not throw the message "Centroid slot value is NaN" – it should pass the message up to the calling routine, which actually knows why extractPeak is being called, or it should print something like "Centroid peak is needed for this record, but Footprint has no Peaks".

          testSingleFrameMeasurement.py – this is really a comment about schema.disconnectAliases. I thought that any change in the schema would cause a copy to get made, and wouldn't that disconnect the two?

          Show
          pgee Perry Gee added a comment - InputUtilities.cc See previous comment Also, extractPeak should not throw the message "Centroid slot value is NaN" – it should pass the message up to the calling routine, which actually knows why extractPeak is being called, or it should print something like "Centroid peak is needed for this record, but Footprint has no Peaks". testSingleFrameMeasurement.py – this is really a comment about schema.disconnectAliases. I thought that any change in the schema would cause a copy to get made, and wouldn't that disconnect the two?
          Hide
          pgee Perry Gee added a comment -

          Comments for review in the previous comments. Otherwise, looks good.

          Show
          pgee Perry Gee added a comment - Comments for review in the previous comments. Otherwise, looks good.
          Hide
          jbosch Jim Bosch added a comment -

          Why are we using an alias here at all?

          As opposed to just having a separate field that we just copy the same flag value into, I think the main reason I prefer using an alias is that it's somewhat self-documenting - the fact that it is an alias conveys the useful information that the badCentroid flag's value is determined strictly by the centroid slot's failure flag, rather than giving the user the (erroneous) impression that there's some other logic involved that could allow them to differ. It also saves a bit per record, and while still I don't think that's enough to matter, we will have many algorithms that depend on the algorithm, so it probably would add up to a byte or two per record.

          But even if you are careful to be sure that the badCentroid flag points to the right Centroid failure flag, isn't it possible that the flag could get altered after the measurement it depends on is done? Or is that never anticipated?

          It's just not anticipated. Or, rather, if the user wants to change field values that "belong" to an algorithm after that algorithm has been run, it's their fault the catalog doesn't make sense anymore.

          Also, extractPeak should not throw the message "Centroid slot value is NaN" – it should pass the message up to the calling routine, which actually knows why extractPeak is being called, or it should print something like "Centroid peak is needed for this record, but Footprint has no Peaks".

          I tried to make it so SafeCentroidExtractor only throws when a very unusual error condition occurs, one that indicates what the measurement framework would consider a bug in a previous algorithm or in detection/deblending. That includes having no peak (having a peak with non-NaN values should be guaranteed by detection/deblender), and when the centroider outputs NaNs but does not set its flag. In those cases, I'm throwing an exception because I wanted that exception to propagate up and generate a log message. But it's a good point that the log message should explain better that the current algorithm is encountering this problem because it needs the centroid as input; I'll fix that.

          testSingleFrameMeasurement.py – this is really a comment about schema.disconnectAliases. I thought that any change in the schema would cause a copy to get made, and wouldn't that disconnect the two?

          You're mostly right, but that's actually not true of the Schema's aliases. I agonized about that decision quite a bit, as I couldn't find a solution that I really liked without having to change a lot of afw::table in backwards-incompatible ways. Making the aliases not copy-on-write (admittedly confusing, but something I thought would rarely come up) was the best compromise I could come up with. The problem is with code like this:

          catalog.getSchema().getAliasMap()->set("foo", "bar");

          That needs to set an alias on the catalog's internal schema in order to allow slots to be modified after a catalog has been constructed, but if aliases were copy-on-write, it'd only set the alias on a copy. What we'd need to make this work is some sort of Schema reference type that allowed aliases to be changed without letting other parts of the schema be changed.

          Show
          jbosch Jim Bosch added a comment - Why are we using an alias here at all? As opposed to just having a separate field that we just copy the same flag value into, I think the main reason I prefer using an alias is that it's somewhat self-documenting - the fact that it is an alias conveys the useful information that the badCentroid flag's value is determined strictly by the centroid slot's failure flag, rather than giving the user the (erroneous) impression that there's some other logic involved that could allow them to differ. It also saves a bit per record, and while still I don't think that's enough to matter, we will have many algorithms that depend on the algorithm, so it probably would add up to a byte or two per record. But even if you are careful to be sure that the badCentroid flag points to the right Centroid failure flag, isn't it possible that the flag could get altered after the measurement it depends on is done? Or is that never anticipated? It's just not anticipated. Or, rather, if the user wants to change field values that "belong" to an algorithm after that algorithm has been run, it's their fault the catalog doesn't make sense anymore. Also, extractPeak should not throw the message "Centroid slot value is NaN" – it should pass the message up to the calling routine, which actually knows why extractPeak is being called, or it should print something like "Centroid peak is needed for this record, but Footprint has no Peaks". I tried to make it so SafeCentroidExtractor only throws when a very unusual error condition occurs, one that indicates what the measurement framework would consider a bug in a previous algorithm or in detection/deblending. That includes having no peak (having a peak with non-NaN values should be guaranteed by detection/deblender), and when the centroider outputs NaNs but does not set its flag. In those cases, I'm throwing an exception because I wanted that exception to propagate up and generate a log message. But it's a good point that the log message should explain better that the current algorithm is encountering this problem because it needs the centroid as input; I'll fix that. testSingleFrameMeasurement.py – this is really a comment about schema.disconnectAliases. I thought that any change in the schema would cause a copy to get made, and wouldn't that disconnect the two? You're mostly right, but that's actually not true of the Schema's aliases. I agonized about that decision quite a bit, as I couldn't find a solution that I really liked without having to change a lot of afw::table in backwards-incompatible ways. Making the aliases not copy-on-write (admittedly confusing, but something I thought would rarely come up) was the best compromise I could come up with. The problem is with code like this: catalog.getSchema().getAliasMap()->set("foo", "bar"); That needs to set an alias on the catalog's internal schema in order to allow slots to be modified after a catalog has been constructed, but if aliases were copy-on-write, it'd only set the alias on a copy. What we'd need to make this work is some sort of Schema reference type that allowed aliases to be changed without letting other parts of the schema be changed.
          Hide
          pgee Perry Gee added a comment -

          Actually, my issue with extractCentroid is that it doesn't know why it is being called, and normal protocol with exceptions is to throw the exception up the line if you don't know what the right error message is. The caller of extractCentroid knows what is wrong with the available Centroid.

          Show
          pgee Perry Gee added a comment - Actually, my issue with extractCentroid is that it doesn't know why it is being called, and normal protocol with exceptions is to throw the exception up the line if you don't know what the right error message is. The caller of extractCentroid knows what is wrong with the available Centroid.
          Hide
          jbosch Jim Bosch added a comment -

          Do you mean extractPeak or SafeCentroidExtractor::operator()? I guess in either case, my point would be that the real problem isn't something the caller knows much about - the problem is in the state the record was in before it got to the caller (i.e. something went wrong with the centroider, or in detect/deblend, or in the executionOrder of the plugins) and I do think these functions know enough to create a sensible error messages about those conditions.

          Show
          jbosch Jim Bosch added a comment - Do you mean extractPeak or SafeCentroidExtractor::operator() ? I guess in either case, my point would be that the real problem isn't something the caller knows much about - the problem is in the state the record was in before it got to the caller (i.e. something went wrong with the centroider, or in detect/deblend, or in the executionOrder of the plugins) and I do think these functions know enough to create a sensible error messages about those conditions.
          Hide
          jbosch Jim Bosch added a comment -

          I've just added another commit to u/jbosch/DM-1523 with what I think are improved error messages (but no change in where they're generated).

          Show
          jbosch Jim Bosch added a comment - I've just added another commit to u/jbosch/ DM-1523 with what I think are improved error messages (but no change in where they're generated).

            People

            Assignee:
            jbosch Jim Bosch
            Reporter:
            jbosch Jim Bosch
            Reviewers:
            Perry Gee
            Watchers:
            Jim Bosch, Perry Gee
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Jenkins

                No builds found.