 # PTC task should produce a linearity model


## Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 12
• Team: Data Release Production

## Description

Goal: The PTC task should calculate a linearity model, in whatever form is necessary for isrTask's linearity correction to be able to run and use it.

This ticket captures the thinking and algorithmic work necessary to generate linearity models, but isn't concerned with how they are persisted, or passed from task to task.

This ticket used to say:

In talking with Andrés, it was decided that the de facto title for this ticket is actually "Write a task that creates a linearity model, which could either be the PTC task itself, a wrapper around the PTC task, or a separate task which uses the output of the PTC task", but that's a bit long to actually change it to, so keeping the title the same, but this encodes the essence of the ticket.

## Activity

Andrés Alejandro Plazas Malagón added a comment:

How exactly should the linearity model be calculated given the input to the PTC task (i.e., a set of flat pairs at different exposure times)?

Should a quadratic fit be performed to the signal vs. time curve? If the signal is modeled as S = C0 + C1*t + C2*(C1*t)^2, with C1*t ≡ Q(t) the charge and C1 the flux in e-/s, then Q(t) = (1/(2*C2)) * (−1 + sqrt(1 − 4*C2*(C0 − S))).

Alternatively, a look-up table per amplifier (something like "i j ADU (measured) ADU (linear)") could be estimated. How can a table like this be calculated from the input to the PTC task? Don't we need information from a calibration sensor/diode?
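The quadratic model and its inversion above can be sketched in a few lines of numpy; the coefficient values below are made up purely for the round-trip check:

```python
import numpy as np

def quadratic_signal(t, c0, c1, c2):
    """Model the measured signal as S = c0 + c1*t + c2*(c1*t)**2."""
    q = c1 * t  # charge Q(t) in e-, with c1 the flux in e-/s
    return c0 + q + c2 * q**2

def invert_to_charge(s, c0, c2):
    """Recover the charge Q from a measured signal S by solving
    c2*Q**2 + Q + (c0 - s) = 0 for the physical root."""
    return (-1.0 + np.sqrt(1.0 - 4.0 * c2 * (c0 - s))) / (2.0 * c2)

# Round-trip check with invented coefficients
t = np.linspace(0.1, 10.0, 50)
c0, c1, c2 = 5.0, 1000.0, -1e-6
s = quadratic_signal(t, c0, c1, c2)
q = invert_to_charge(s, c0, c2)
assert np.allclose(q, c1 * t)
```

The inversion is just the quadratic formula applied to c2*Q^2 + Q + (c0 − S) = 0, taking the root that reduces to Q = S − c0 as c2 → 0.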

Merlin Fisher-Levine added a comment:

It needs to be able to output in both formats.

And no, we don't need any info from a calibration diode - that would just be for calibration/refinement of the values. We might end up needing it in practice, and even needing to add shutter time corrections, but from an algorithmic/dimensionality perspective, it's totally unnecessary.

Does that all make sense/sound reasonable?

Andrés Alejandro Plazas Malagón added a comment:

In general terms, it makes sense. I understand that the PTC task needs to produce (or "persist", which seems to be synonymous with "save in a file on disk", correct?) whatever "LinearizeSquared" and "LinearizeLookupTable" need. In order to understand exactly what those two classes need, I looked at the docstrings in

https://github.com/lsst/ip_isr/blob/master/python/lsst/ip/isr/linearize.py

but I still have questions:

1) For LinearizeSquared, what exactly is c0? (Yes, it says that "c0 is linearity coefficient 0 in the AmpInfoCatalog of the detector", but how is it calculated exactly, and how do you get it from the input to the PTC task?)

2) For LinearizeLookupTable, it seems that what the PTC task needs to produce is a 2D table, with #rows = number of amplifiers and #columns = the corrected linearity value (one per amplifier), if I understand correctly. Looking at the docstring, I see: corrImage[i,j] = uncorrImage[i,j] + table[rowInd, colInd]. How exactly is the value table[rowInd, colInd] calculated from the input to the PTC task? I guess one option is just to fit the mean vs. time curve to a polynomial and define the linear part as the linearity-corrected value. Is that the case? Are there perhaps examples somewhere of these tables so I can see exactly what the format is? Are there examples of how the values for the tables are calculated?

Merlin Fisher-Levine added a comment:

> (or "persist", which seems to be synonymous with "save in a file on disk", correct?)

Correct in essence; the subtlety here is that persisting something usually means doing a butler.put() to write it, rather than writing a file as you normally might.

> 1) For LinearizeSquared, what exactly is c0? (Yes, it says that "c0 is linearity coefficient 0 in the AmpInfoCatalog of the detector", but how is it calculated exactly, and how do you get it from the input to the PTC task?)

You know as much as I do. I can work it out, but I imagine you can too, and me working it out just to explain it to you seems inefficient. If you try, and struggle, let me know and I'm happy to help, of course.

You ask about getting that from the "input for the PTC task" though - is that really what you meant? Please confirm.

> How exactly is the value table[rowInd, colInd] calculated from the input to the PTC task?

Again, is this really what you mean? Either way, again, you know as much as I do about this stuff, I think.

> I guess one option is just to fit the mean vs. time curve to a polynomial and define the linear part as the linearity-corrected value. Is that the case?

And again, I think this is up to you to solve, but that sounds about right. I guess I'd phrase it differently, but I imagine you're thinking the right things (I'd probably focus on the non-linear parts being the ones that form the correction, rather than the linear part being the corrected value, because that's much closer to what's actually going on there, I think).

> Are there perhaps examples somewhere of these tables so I can see exactly what the format is?

Not that I know of, but maybe.

> Are there examples of how the values for the tables are calculated?

Again, not that I know of, but maybe.

Andrés Alejandro Plazas Malagón added a comment (edited):

> You know as much as I do. I can work it out, but I imagine you can too, and me working it out just to explain it to you seems inefficient.

> And again, I think this is up to you to solve, but that sounds about right.

I wanted to clarify whether you had an exact way of correcting for linearity and for producing a look-up table, but if not, no problem; I can think more about it or consult with Robert.

> You ask about getting that from the "input for the PTC task" though - is that really what you meant? Please confirm.

> Again, is this really what you mean?

Yes, that is really what I mean. This ticket says "Goal: The PTC task should persist a linearity model, ...". This indicates to me that after running the PTC task we should have a model that will be read by the linearizer to correct for signal-chain non-linearity. This implies, to me, that whatever information the PTC task receives (pairs of flats, exposure times) should be used to produce what this ticket asks for. That's also why I asked whether we also needed the info from the calibration diode, or not (and you said no). Could you please confirm/clarify?

Merlin Fisher-Levine added a comment:

OK, I see. I was just checking, because I think I'd have described that as the output of the current task, but yes, it is of course also a function of the inputs, so that's not wrong. So yes, confirming that "whatever information the PTC task receives (pairs of flats, exposure times), it should be used to produce what this ticket asks", and that no, calibration diode data is not required to produce the output (though we will quite possibly need to add a way of adding that as a correction later).

Does that answer all the questions?

Andrés Alejandro Plazas Malagón added a comment:

I had a brief chat with Robert Lupton about this, and he also thinks that an image that has been corrected for signal chain non-linearity should make use of a calibration diode. Robert wrote:

"I think that your corrImage is gain*flux0*time where flux0 is the measured average brightness of the light source as measured using a calibrated photodiode."

So, I am still confused about what this ticket is asking for. Is it asking to produce a non-linearity correction based on a quadratic fit to the mean of (difference image of pair of flats) vs. exposure time curve, even though we know that, in the end, in LSST we are going to correct for signal chain non-linearity with a look-up table produced by using a calibrated diode?

Apologies for the confusion. Robert said that he will comment on this when he has some time.

Merlin Fisher-Levine added a comment:

> I had a brief chat with Robert Lupton about this, and he also thinks that an image that has been corrected for signal chain non-linearity should make use of a calibration diode. Robert wrote:
>
> "I think that your corrImage is gain*flux0*time where flux0 is the measured average brightness of the light source as measured using a calibrated photodiode."

This is indeed formally correct. It should be noted that the time is in fact the effective time, not the stated exposure time, i.e. it includes a term for shutter-time jitter/calibration.

However, for now, for the purposes of the software/algorithmic development, we're assuming both an ideal lamp and an ideal shutter. This is for several reasons: 1) we don't have the data; 2) we don't know how dominant these contributions are; 3) we don't want to wait.

> So, I am still confused about what this ticket is asking for. Is it asking to produce a non-linearity correction based on a quadratic fit to the mean of (difference image of pair of flats) vs. exposure time curve, even though we know that, in the end, in LSST we are going to correct for signal chain non-linearity with a look-up table produced by using a calibrated diode?

In a word, yes. That's exactly what we are looking for, for reasons 1, 2, and 3 above. If this is still hard to grasp, just imagine that you do have the diode data, and it's just an array of 1's (if you're assuming it's in current units) or an array with values equal to the exposure times (if you're assuming it's in units of integrated charge), i.e. you can apply some fake diode data which does nothing, if this helps you think about the problem.

Does that make sense?
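The "fake diode data which does nothing" suggested above amounts to the following; the exposure times here are invented purely for illustration:

```python
import numpy as np

# Hypothetical exposure times for a set of flat pairs
exp_times = np.array([1.0, 2.0, 5.0, 10.0, 20.0])

# Ideal diode "data" that carries no information, in either convention:
diode_current = np.ones_like(exp_times)  # constant lamp, current units
diode_charge = exp_times.copy()          # same lamp, integrated-charge units

# Either way, the diode-weighted time axis reduces to the exposure time itself,
# so the PTC-task algorithm is unchanged by plugging it in.
assert np.allclose(diode_current * exp_times, diode_charge)
```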

Robert Lupton added a comment:

Sorry, I think I made a mistake while discussing this with you, and Merlin is correct. Let us assume that the light source and shutter are perfect for now; in that case time is proportional to the true signal. This means that the measured signal vs. time curve has exactly the correct shape to recover the non-linearity. If we knew the number of incident photons (which is proportional to time) then the curve would be identical, except scaled by a factor of intensity*QE/gain. Fortunately we don't need to measure that factor; it's the linear term in the fit.
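Robert's point that the intensity*QE/gain factor lands entirely in the linear term can be checked with a small synthetic experiment; the model, fluxes, and coefficient values below are made up for illustration, and the combination -k2/k1^2 is the LinearizeSquared-style coefficient discussed later in this thread:

```python
import numpy as np

def measured_signal(t, flux, c2=-1e-6):
    """True signal Q = flux*t, distorted by a quadratic nonlinearity in Q."""
    q = flux * t
    return q + c2 * q**2

t = np.linspace(0.5, 10.0, 40)
coeffs = []
for flux in (500.0, 2000.0):  # two lamp intensities (arbitrary units)
    k2, k1, k0 = np.polyfit(t, measured_signal(t, flux), 2)
    coeffs.append(-k2 / k1**2)  # nonlinearity coefficient from the fit

# The recovered nonlinearity coefficient is the same for both fluxes:
# only the linear term absorbs the intensity*QE/gain scale factor.
assert np.allclose(coeffs[0], coeffs[1])
```

Algebraically: with S = F*t + c2*(F*t)^2, the fit gives k1 = F and k2 = c2*F^2, so -k2/k1^2 = -c2 independently of the flux F.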

John Swinbank added a comment:

We discussed this ticket at our meeting of 2020-01-13.

We agreed to break it down into smaller parts, so that they can be independently implemented and reviewed. Specifically, we suggested:

• One ticket to capture the algorithmic part of the work (i.e., figuring out what needs to be calculated, and calculating it).
• One ticket to capture the data management / persistence part of the work (how do you write the results of the above to storage, and make them available to downstream processing).
• One ticket to capture testing and validation of the results of the above, spawning further tickets to improve the algorithm as necessary.

Andrés Alejandro Plazas Malagón and Merlin Fisher-Levine, as the assignee and reporter respectively, does this work for you?

I propose that we repurpose this ticket as the first of the above bullet points. I'll write further tickets to capture the second two points.

Merlin Fisher-Levine added a comment:

Sounds good to me, including using this ticket for the algorithmic part, as the start of that discussion is already on here.

John Swinbank added a comment:

This ticket's description has been updated, and DM-23044 and DM-23045 created, to put the above plan into practice.

Andrés Alejandro Plazas Malagón: I understood from our discussion earlier that you reckon you already have the algorithmic work mostly done here, so it'd be great if you can push that and get it out for review shortly.

Andrés Alejandro Plazas Malagón added a comment (edited):

I have pushed the code. I found two linearizers in linearize.py: LinearizeSquared and LinearizeLookupTable. Question: do we need to implement both in this code? It seems that LinearizeSquared was written for a specific case (Subaru); I would think that this case and other general cases could be covered by the table.

In any case, LinearizeSquared requires a coefficient c0, which we agreed would be given by c0 = -k2/k1^2 for a fit of the form "mean_signal = k0 + k1*time + k2*time^2" (question: should k0 be set to zero?). This is implemented at the top of "calculateLinearityResidualAndLinearizers".
The "c0" coefficient per amplifier is saved for the moment as a dictionary in the output PhotonTransferCurveDataset.

LinearizeLookupTable is a mapping from ADU values to the corrections that should be added to those values. The array should have nrows = number of amplifiers and ncolumns = range of ADU values. An example of converting DECam linearity tables is in https://github.com/lsst/obs_decam/blob/master/decam/makeLinearizer.py, where the range of ADUs goes up to 2^16. In this case, I have set the maximum size of the ADU range to 2^18. For the moment, in order to populate the array with the corrections, I first fit an n-degree polynomial to the "mean_signal vs time" curve (new parameter: self.config.polynomialFitDegreeNl). I use the linear part of that polynomial to obtain a tMax where the linear signal is ADUMax = 2^18 ADU. I then evaluate the linear part of the polynomial over the range of times [0, tMax] ("signalIdeal") and the full polynomial ("signalUncorrected"). The difference between the two is the correction for the table. All of this is per amplifier, and it is also in the function "calculateLinearityResidualAndLinearizers".
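The table-building procedure described above can be sketched roughly with numpy. This is not the actual task code: the function and variable names are hypothetical, and a production version would interpolate the correction onto the integer ADU grid more carefully and handle non-monotonic fits.

```python
import numpy as np

def make_lut_row(times, means, fit_degree=3, max_adu=2**18):
    """Build one amplifier's row of an additive linearity look-up table.

    Fits means vs. times with a polynomial, takes the linear part as the
    ideal signal, and tabulates (ideal - measured) indexed by measured ADU.
    """
    coeffs = np.polyfit(times, means, fit_degree)  # highest degree first
    k1, k0 = coeffs[-2], coeffs[-1]
    t_max = (max_adu - k0) / k1  # time where the linear signal reaches max_adu
    t_grid = np.linspace(0.0, t_max, max_adu)
    signal_ideal = k0 + k1 * t_grid                  # "signalIdeal"
    signal_uncorrected = np.polyval(coeffs, t_grid)  # "signalUncorrected"
    correction = signal_ideal - signal_uncorrected
    # Crudely index the correction by the truncated measured ADU value
    lut = np.zeros(max_adu)
    idx = np.clip(signal_uncorrected.astype(int), 0, max_adu - 1)
    lut[idx] = correction
    return lut
```

Stacking one such row per amplifier gives the nrows x ncolumns array that LinearizeLookupTable expects, under the assumptions above.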

Merlin Fisher-Levine added a comment:

You're right that the LUT covers the other case quite well, but on the other hand, if it's not too much effort, it seems a shame not to be able to have analytic ones, as they're so much more efficient. That said, if it's any significant effort, I think we could go with only LUTs for now, and revisit if necessary.

Robert Lupton added a comment:

I thought that there was a generic polynomial lineariser. In all cases, we should be using

I_{lin} = I_{raw} + f(I_{raw})

where f can be analytic or a lookup table. Note that I_{raw} can be floating point, so the lookup table needs to work on int(I_{raw}), but using the additive form means that we don't lose precision.
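A minimal sketch of the additive form described above, assuming a 1D per-amplifier table indexed by the truncated ADU value (the table values here are a toy example):

```python
import numpy as np

def apply_lut(image, table):
    """Apply an additive linearity correction: I_lin = I_raw + f(int(I_raw)).

    The table is indexed with int(I_raw), but the correction is *added* to
    the floating-point raw values, so their fractional part is preserved.
    """
    idx = np.clip(image.astype(int), 0, table.size - 1)
    return image + table[idx]

# Toy table: correction grows quadratically with signal
table = 1e-6 * np.arange(1000, dtype=float)**2
raw = np.array([[10.5, 100.25], [500.75, 999.9]])
corrected = apply_lut(raw, table)
# The fractional parts of the raw pixel values survive the correction
assert np.allclose(corrected - table[raw.astype(int)], raw)
```

Had the table instead replaced values (I_lin = table[int(I_raw)]), everything below one ADU would be quantised away, which is exactly the precision loss the additive form avoids.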

Christopher Waters added a comment:

There will be a generic polynomial of that form after DM-23023. I think that's the solution, if that's what the best fit is. That said, the target that I'm working towards on DM-23023 (please comment soon on RFC-665 if there are issues) is that a particular linearity correction only contains one type. If the code will be fitting multiple models, they will need to be persisted separately.

## People

• Assignee: Andrés Alejandro Plazas Malagón
• Reporter: Merlin Fisher-Levine
• Reviewers: Christopher Waters
• Watchers: Andrés Alejandro Plazas Malagón, Christopher Waters, John Swinbank, Merlin Fisher-Levine, Robert Lupton