XMLWordPrintable

#### Details

• Type: Story
• Status: To Do
• Resolution: Unresolved
• Fix Version/s: None
• Component/s:
• Labels:
None
• Team:
Data Release Production
• Urgent?:
No

#### Description

In moving towards reading in the "src"-equivalent parquet tables now being persisted in pipe_tasks's postprocessing.py script, a performance hit was encountered when trying to apply an external photoCalib to the visit-level source catalogs. The correction is being made using photoCalib.instFluxToNanojansky which, currently, only includes overloads involving SourceCatalog}}s, or single entry {{SourceRecord or scalar values for the flux[fluxErr][position]. When reading in the parquet tables, we do not have a SourceCatalog to pass in, so this correction must be done by looping over each source to pass in the individual flux/fluxErr/position values as scalars, which is very slow (and sort of defeats the performance-boosting motivation for reading the parquet tables in the first place). Having an override of this function that accepts arrays of flux/fluxError/position that is efficient would solve this issue.

The hint in the summary that there may be more methods on photoCalib that would benefit from having array-based overloads refers to, e.g. instFluxToMagnitude, magnitudeToInstFlux, calibrateCatalog, ....?

#### Activity

Hide
John Parejko added a comment -

If we want just array access, what should the API for this look like? Should it return arrays or astropy Quantities since these things have units? Do we include units in our existing Parquet or Pandas data or not (is it even possible)? A short worked example data flow would help sort out the appropriate API. The first example that comes to my mind might look like:

 array = photoCalib.instFluxToFluxAndMagnitude(x, y, instFlux, instFluxErr) # array is an astropy.Table with array['flux'] , array['fluxErr'], array['mag'], array['magErr'] each with appropriate units. 

Show
John Parejko added a comment - If we want just array access, what should the API for this look like? Should it return arrays or astropy Quantities since these things have units? Do we include units in our existing Parquet or Pandas data or not (is it even possible)? A short worked example data flow would help sort out the appropriate API. The first example that comes to my mind might look like: array = photoCalib.instFluxToFluxAndMagnitude(x, y, instFlux, instFluxErr) # array is an astropy.Table with array['flux'] , array['fluxErr'], array['mag'], array['magErr'] each with appropriate units.
Hide
Jim Bosch added a comment -

I think

computeLocalCalibration(x: np.ndarray, y: np.ndarray) -> np:ndarray

is the highest priority; there will be code that wants just that, and from it anything else can be easily written in Python without subverting or duplicating PhotoCalib's role.  I would probably be content with just that, as it sidesteps a lot of design questions for which I have only much less confident answers.

But I have no problem with adding instFluxToFlux and instFluxToMagnitude variants with and without errors for convenience, taking and returning vanilla np.ndarray objects (via regular tuple returns if more than one is returned).  I don't think using astropy for unit safety here is necessarily a bad idea (especially if it's in addition to a np.ndarray interface that might then be named/marked as "advanced" or "use with care"), but it's not something I'd want to do just here without a larger (RFC) discussion about when we should and should not use units.  It's a tradeoff between "program defensively" and "don't pay for what you don't need", and neither is always right.

Show
Jim Bosch added a comment - I think computeLocalCalibration(x: np.ndarray, y: np.ndarray) -> np:ndarray is the highest priority; there will be code that wants just that, and from it anything else can be easily written in Python without subverting or duplicating PhotoCalib's role.  I would probably be content with just that, as it sidesteps a lot of design questions for which I have only much less confident answers. But I have no problem with adding instFluxToFlux and instFluxToMagnitude variants with and without errors for convenience, taking and returning vanilla  np.ndarray objects (via regular tuple returns if more than one is returned).  I don't think using astropy for unit safety here is necessarily a bad idea (especially if it's in addition to a np.ndarray interface that might then be named/marked as "advanced" or "use with care"), but it's not something I'd want to do just here without a larger (RFC) discussion about when we should and should not use units.  It's a tradeoff between "program defensively" and "don't pay for what you don't need", and neither is always right.
Hide
John Parejko added a comment -

So, we're basically dropping code for the uncertainty on flux and magnitude at this point?

Show
John Parejko added a comment - So, we're basically dropping code for the uncertainty on flux and magnitude at this point?
Hide
Lauren MacArthur added a comment -

I'd really like the errors!  The placeholder function I'm working on now is fetching them via photoCalib.getCalibrationErr().  On that note (but perhaps a detour from the real discussion here...), when I implemented this, I figured that that was giving me the error on the spatially constant value, so I should scale that by the spatially varying component (if present), so

 variableScaleErrArray = photoCalib.getCalibrationErr()*variableScaleArray 

This gives me very slightly (1000th of a percent-ish!) different errors than the current methods and the difference is in that scaling. Any thoughts (or cares!) on which is more "correct" (and, if the scaled version, is it worth the bother to update the stack computation)?

Show
Lauren MacArthur added a comment - I'd really like the errors!  The placeholder function I'm working on now is fetching them via photoCalib.getCalibrationErr().  On that note (but perhaps a detour from the real discussion here...), when I implemented this, I figured that that was giving me the error on the spatially constant value, so I should scale that by the spatially varying component (if present), so variableScaleErrArray =  photoCalib.getCalibrationErr() * variableScaleArray This gives me very slightly (1000th of a percent-ish!) different errors than the current methods and the difference is in that scaling. Any thoughts (or cares!) on which is more "correct" (and, if the scaled version, is it worth the bother to update the stack computation)?
Hide
Jim Bosch added a comment -

I don't have a problem with keeping code for propagating instFlux uncertainties through the conversion to nJy or AB mag (I did say "with and without errors" above.  I do think we should have that code somewhere in afw because it's complex enough to easily get wrong (even if it's still simple enough that I also don't think we should be bothered if alternate implementations pop up elsewhere if there's a good reason they can't use the centralized one).  And given that PhotoCalib already does a lot of that stuff it's the best place to put such things now, and I should try harder not to sidetrack practical conversations musing about what-I-might-do-differently-if-I-was-starting-from-scratch-isms

The question of what to do about propagating uncertainties in the photometric calibration is a totally different bag of cats.  I think I'd eventually be comfortable just never propagating that information at all, because I think we've learned that it's basically impossible to do correctly in the limit where it matters, without propagating a lot more about how the calibration was fit.  But my memory of those discussions is fuzzy, and I definitely wouldn't be comfortable removing that functionality if it's in active use (as Lauren MacArthur's example strongly implies) in order to satisfy API concerns; that should be a separate science-oriented discussion involving very different kinds of arguments.

So that's one more argument in favor having a suite of methods on PhotoCalib that do all of this, and making this ticket just about add np.ndarray versions of several of them (including, but not limited to, computeLocalCalibration).

Show
Jim Bosch added a comment - I don't have a problem with keeping code for propagating instFlux uncertainties through the conversion to nJy or AB mag (I did say "with and without errors" above.  I do think we should have that code somewhere in afw because it's complex enough to easily get wrong (even if it's still simple enough that I also don't think we should be bothered if alternate implementations pop up elsewhere if there's a good reason they can't use the centralized one).  And given that PhotoCalib already does a lot of that stuff it's the best place to put such things now, and I should try harder not to sidetrack practical conversations musing about what-I-might-do-differently-if-I-was-starting-from-scratch-isms The question of what to do about propagating uncertainties in the photometric calibration is a totally different bag of cats.  I think I'd eventually be comfortable just never propagating that information at all, because I think we've learned that it's basically impossible to do correctly in the limit where it matters, without propagating a lot more about how the calibration was fit.  But my memory of those discussions is fuzzy, and I definitely wouldn't be comfortable removing that functionality if it's in active use (as Lauren MacArthur 's example strongly implies) in order to satisfy API concerns; that should be a separate science-oriented discussion involving very different kinds of arguments. So that's one more argument in favor having a suite of methods on PhotoCalib that do all of this, and making this ticket just about add np.ndarray versions of several of them (including, but not limited to, computeLocalCalibration ).

#### People

Assignee:
Unassigned
Reporter:
Lauren MacArthur
Watchers:
Eli Rykoff, Jim Bosch, John Parejko, Lauren MacArthur, Yusra AlSayyad