Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-333

Correctly use "Sigma" and "Err" everywhere

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM, TCT
    • Labels:

      Description

      As I recently learned in John Swinbank's review of PhotoCalib, the DPDD says this about "Err" and "Sigma":

      For all catalog data, we will employ a convention where estimates of standard errors have the suffix Err, while the estimates of inherent widths of distribution (or functions in general) have the suffix Sigma

      On the other hand, we use Sigma in many places where we really mean Err. For example, our source tables are littered with _fluxSigma and _xySigma, which are documented as "1-sigma uncertainty". Our database schema is mixed in this regard as well (e.g. raSigma, an "uncertainty" and uPSFluxSigma a "Standard deviation of the distribution" in the DiaSource table).

      This RFC is to gather consensus about how best to sanitize our usage of these terms in the stack and related software and interfaces. The concrete propose here is to replace all incorrect uses of Sigma with Err, and vice versa in all code below lsst_distrib and in cat.

        Attachments

          Issue Links

            Activity

            Parejkoj John Parejko created issue -
            Hide
            krzys Krzysztof Findeisen added a comment -

            You might also want to have a plan for how to make sure people keep using these terms as intended going forward. Otherwise, you'll just need to do another cleanup pass in a year or two.

            Show
            krzys Krzysztof Findeisen added a comment - You might also want to have a plan for how to make sure people keep using these terms as intended going forward. Otherwise, you'll just need to do another cleanup pass in a year or two.
            Hide
            rhl Robert Lupton added a comment -

            What are we going to call covariances? We can't just take a square root...

            Show
            rhl Robert Lupton added a comment - What are we going to call covariances? We can't just take a square root...
            Hide
            mjuric Mario Juric added a comment -

            John Parejko: Thanks for catching this. +1 on my part for this RFC (esp. for the end-user visible database tables).

            Krzysztof Findeisen: I think these should be in the coding standards (and probably the DMSR).

            Show
            mjuric Mario Juric added a comment - John Parejko : Thanks for catching this. +1 on my part for this RFC (esp. for the end-user visible database tables). Krzysztof Findeisen : I think these should be in the coding standards (and probably the DMSR ).
            Hide
            mtpatter Maria Patterson [X] (Inactive) added a comment -

            There is relevant discussion also here: https://jira.lsstcorp.org/browse/DM-8050

            Show
            mtpatter Maria Patterson [X] (Inactive) added a comment - There is relevant discussion also here: https://jira.lsstcorp.org/browse/DM-8050
            tjenness Tim Jenness made changes -
            Field Original Value New Value
            Link This issue relates to DM-8050 [ DM-8050 ]
            Hide
            krzys Krzysztof Findeisen added a comment -

            Mario Juric, I suspect that won't be enough – even if this is a rule in the coding standards, it's an easy one for both developers and reviewers to forget. For Gaia we were cleaning up Sigma vs. Err (and no, I don't remember what the convention was) all the way through DR1.

            Show
            krzys Krzysztof Findeisen added a comment - Mario Juric , I suspect that won't be enough – even if this is a rule in the coding standards, it's an easy one for both developers and reviewers to forget. For Gaia we were cleaning up Sigma vs. Err (and no, I don't remember what the convention was) all the way through DR1.
            Parejkoj John Parejko made changes -
            Watchers Jim Bosch, John Parejko, John Swinbank, Krzysztof Findeisen, Maria Patterson, Mario Juric, Paul Price, Robert Lupton, Simon Krughoff, Tim Jenness [ Jim Bosch, John Parejko, John Swinbank, Krzysztof Findeisen, Maria Patterson, Mario Juric, Paul Price, Robert Lupton, Simon Krughoff, Tim Jenness ] Andy Salnikov, Jim Bosch, John Parejko, John Swinbank, Krzysztof Findeisen, Maria Patterson, Mario Juric, Paul Price, Robert Lupton, Simon Krughoff, Tim Jenness [ Andy Salnikov, Jim Bosch, John Parejko, John Swinbank, Krzysztof Findeisen, Maria Patterson, Mario Juric, Paul Price, Robert Lupton, Simon Krughoff, Tim Jenness ]
            Hide
            zivezic Zeljko Ivezic added a comment -

            DPDD uses Sigma/Err convention in 1-D cases (e.g. psFluxMean, psFluxMeanErr,
            psFluxSigma), where one can recall Err ~ Sigma/sqrt(N). As for covariances, whenever
            we have a multi-parameter fit, DPDD introduces a covariance array, e.g. pmParallaxCov.
            Both examples are from the DIAObject table.

            Show
            zivezic Zeljko Ivezic added a comment - DPDD uses Sigma/Err convention in 1-D cases (e.g. psFluxMean, psFluxMeanErr, psFluxSigma), where one can recall Err ~ Sigma/sqrt(N). As for covariances, whenever we have a multi-parameter fit, DPDD introduces a covariance array, e.g. pmParallaxCov. Both examples are from the DIAObject table.
            Hide
            mjuric Mario Juric added a comment -

            Krzysztof Findeisen, yeah, it's likely we'll make mistakes too, but the more we try & internalize the conventions the better we'll do.

            Also, if it's in the DMSR, it will give rise to a formal verification test that we'll have verify before the system is delivered.

            Show
            mjuric Mario Juric added a comment - Krzysztof Findeisen , yeah, it's likely we'll make mistakes too, but the more we try & internalize the conventions the better we'll do. Also, if it's in the DMSR, it will give rise to a formal verification test that we'll have verify before the system is delivered.
            Show
            ktl Kian-Tat Lim added a comment - For implementation of this, things to look at include https://developer.lsst.io/coding/cpp_style_guide.html#uncertainty-values-associated-with-a-variable-should-be-suffixed-by-one-of-var-cov-sigma and porting https://dev.lsstcorp.org/trac/wiki/dbNamingConv into the Developer Guide.
            Hide
            Parejkoj John Parejko added a comment -

            Ugh, so our current C++ and database style guides are backwards, too?

            Show
            Parejkoj John Parejko added a comment - Ugh, so our current C++ and database style guides are backwards, too?
            Hide
            ktl Kian-Tat Lim added a comment -

            Not backwards, exactly, since I believe neither prescribes the use of "Err" anywhere.

            Show
            ktl Kian-Tat Lim added a comment - Not backwards, exactly, since I believe neither prescribes the use of "Err" anywhere.
            Hide
            swinbank John Swinbank added a comment -

            We discussed this as the DMLT telecon of 2017-05-01. There was broad support for this proposal, but also a desire to understand exactly how far down the rabbit hole it goes. We think the DPDD is consistent in its usage (but somebody should check); we suspect that some data products which the DPDD refers to have been materialized in the cat package in a way which isn't consistent. We'd like to see an implementation ticket here which involves understanding just which documents and/or tools will need to be modified.

            We also wondered where the best place to record this policy is. Kian-Tat Lim's comment above already speaks to that in terms of the code, but do we have a place where we could record it as a more general policy, mandate its use in documentation, etc?

            Show
            swinbank John Swinbank added a comment - We discussed this as the DMLT telecon of 2017-05-01. There was broad support for this proposal, but also a desire to understand exactly how far down the rabbit hole it goes. We think the DPDD is consistent in its usage (but somebody should check); we suspect that some data products which the DPDD refers to have been materialized in the cat package in a way which isn't consistent. We'd like to see an implementation ticket here which involves understanding just which documents and/or tools will need to be modified. We also wondered where the best place to record this policy is. Kian-Tat Lim 's comment above already speaks to that in terms of the code, but do we have a place where we could record it as a more general policy, mandate its use in documentation, etc?
            Hide
            gpdf Gregory Dubois-Felsmann added a comment -

            For easier reference:

            The database document says:

            • put column keeping error for another column one after another, and append Sigma at the end, for example raSigma column should be immediately after ra column
            • for covariances, use _ to separate names of the columns that are involved, and append Cov at the end, for example a covariance of lan and aop should be named lan_aop_Cov.

            And the coding standards say:

            Uncertainty values associated with a variable SHOULD be suffixed by one of Var, Cov, Sigma.
            There is no universal suffix for uncertainties; i.e. no Err suffix will be used. The cases that we have identified, and their appropriate suffixes, are:

            • Standard deviation: Sigma (not Rms, as RMS doesn’t imply that the mean’s subtracted)
            • Covariance: Cov
            • Variance: Var

            The postfix Err can easily be misinterpreted as error flags. Use the full Sigma since Sig can easily be misinterpreted as Signal.

            Show
            gpdf Gregory Dubois-Felsmann added a comment - For easier reference: The database document says: put column keeping error for another column one after another, and append Sigma at the end, for example raSigma column should be immediately after ra column for covariances, use _ to separate names of the columns that are involved, and append Cov at the end, for example a covariance of lan and aop should be named lan_aop_Cov . And the coding standards say: Uncertainty values associated with a variable SHOULD be suffixed by one of Var , Cov , Sigma . There is no universal suffix for uncertainties; i.e. no Err suffix will be used. The cases that we have identified, and their appropriate suffixes, are: Standard deviation: Sigma (not Rms , as RMS doesn’t imply that the mean’s subtracted) Covariance: Cov Variance: Var The postfix Err can easily be misinterpreted as error flags. Use the full Sigma since Sig can easily be misinterpreted as Signal .
            tjenness Tim Jenness made changes -
            Remote Link This issue links to "Page (Confluence)" [ 15028 ]
            Hide
            tjenness Tim Jenness added a comment -

            John Parejko what's your status on this?

            Show
            tjenness Tim Jenness added a comment - John Parejko what's your status on this?
            Parejkoj John Parejko made changes -
            Planned End 27/Apr/17 8:16 PM 12/May/17 8:16 PM
            Hide
            Parejkoj John Parejko added a comment -

            I have extended it to the end of the week: given the confusion between the DPDD and our database/coding standards, I feel like this should maybe be escalated up to the DMLT.

            That said, I personally agree with the DPDD, with the caveat that we should clearly document our use of "Err" as the standard error in docstrings, descriptors, standards, etc.

            Show
            Parejkoj John Parejko added a comment - I have extended it to the end of the week: given the confusion between the DPDD and our database/coding standards, I feel like this should maybe be escalated up to the DMLT. That said, I personally agree with the DPDD, with the caveat that we should clearly document our use of "Err" as the standard error in docstrings, descriptors, standards, etc.
            mjuric Mario Juric made changes -
            Labels dm-sst
            lpetrick Libby Petrick [X] (Inactive) made changes -
            Remote Link This issue links to "Page (Confluence)" [ 15070 ]
            tjenness Tim Jenness made changes -
            Remote Link This issue links to "Page (Confluence)" [ 15070 ]
            Hide
            tjenness Tim Jenness added a comment -

            If you feel you need DMLT to call this RFC please escalate. I imagine Mario Juric should lead this.

            Show
            tjenness Tim Jenness added a comment - If you feel you need DMLT to call this RFC please escalate. I imagine Mario Juric should lead this.
            Hide
            Parejkoj John Parejko added a comment -

            Given the confusion between the DPDD and our database/coding standards, this needs a decision from the DMLT as to which takes precedence. I personally think the DPDD is correct here, but see comments above.

            Show
            Parejkoj John Parejko added a comment - Given the confusion between the DPDD and our database/coding standards, this needs a decision from the DMLT as to which takes precedence. I personally think the DPDD is correct here, but see comments above.
            Parejkoj John Parejko made changes -
            Status Proposed [ 10805 ] Flagged [ 10606 ]
            Hide
            krughoff Simon Krughoff added a comment -

            Tim Jenness can we pick this up on the DMLT next Monday?

            Show
            krughoff Simon Krughoff added a comment - Tim Jenness can we pick this up on the DMLT next Monday?
            Hide
            gpdf Gregory Dubois-Felsmann added a comment -

            No actual objection to discussing this at DMLT, but isn't the DM-CCB (formerly TCT) formally the escalation path for RFCs?

            Show
            gpdf Gregory Dubois-Felsmann added a comment - No actual objection to discussing this at DMLT, but isn't the DM-CCB (formerly TCT) formally the escalation path for RFCs?
            Hide
            krughoff Simon Krughoff added a comment -

            If that's the correct path, I'm happy with that too, I just don't know how to urge the DM-CCB to pick it up.

            Show
            krughoff Simon Krughoff added a comment - If that's the correct path, I'm happy with that too, I just don't know how to urge the DM-CCB to pick it up.
            Hide
            swinbank John Swinbank added a comment - - edited

            Current draft of LDM-294 is actually a little vague about RFC escalation; we should tighten it up before it gets baselined.

            Some proposed resolutions may require changes to one or more of the baselined, change- controlled documents [...] (those in DocuShare with an LDM- handle or marked as change-controlled in Confluence) [...] In this case only, the DM Configuration Control Board (DMCCB) (Section 7.4) may empanel an ad hoc commit- tee including the lead author of the document and other relevant experts. This committee or the TCT (sic) itself must explicitly approve the change.

            ...

            If the DM team can’t converge on a resolution to an RFC that has no serious objections but the requestor still feel that something must be done, the request will be escalated. In most non-trivial cases, they will, with the advice of the SA, empanel a group of experts to which they will delegate the right to make the decision, by voting if need be.

            (where "they" is not clearly defined in the second paragraph above — I think it just means "the DM team in general").

            Given that LSE-163 is change controlled (albeit not an LDM), I think the DM-CCB does have jurisdiction in this case.

            Show
            swinbank John Swinbank added a comment - - edited Current draft of LDM-294 is actually a little vague about RFC escalation; we should tighten it up before it gets baselined. Some proposed resolutions may require changes to one or more of the baselined, change- controlled documents [...] (those in DocuShare with an LDM- handle or marked as change-controlled in Confluence) [...] In this case only, the DM Configuration Control Board (DMCCB) (Section 7.4) may empanel an ad hoc commit- tee including the lead author of the document and other relevant experts. This committee or the TCT (sic) itself must explicitly approve the change. ... If the DM team can’t converge on a resolution to an RFC that has no serious objections but the requestor still feel that something must be done, the request will be escalated. In most non-trivial cases, they will, with the advice of the SA, empanel a group of experts to which they will delegate the right to make the decision, by voting if need be. (where "they" is not clearly defined in the second paragraph above — I think it just means "the DM team in general"). Given that LSE-163 is change controlled (albeit not an LDM), I think the DM-CCB does have jurisdiction in this case.
            Hide
            krzys Krzysztof Findeisen added a comment -

            If you change the wording in LDM-294, please also change the wording at https://developer.lsst.io/processes/decision_process.html#appeals-process to match.

            Show
            krzys Krzysztof Findeisen added a comment - If you change the wording in LDM-294, please also change the wording at https://developer.lsst.io/processes/decision_process.html#appeals-process to match.
            Hide
            swinbank John Swinbank added a comment -

            Yes, absolutely — the Dev Guide will need several updates when the new LDM-294 is baselined, I imagine.

            Show
            swinbank John Swinbank added a comment - Yes, absolutely — the Dev Guide will need several updates when the new LDM-294 is baselined, I imagine.
            Hide
            tjenness Tim Jenness added a comment -

            Historically we've had two types of RFC flagging. "Flagged" can refer to "this RFC has not converged and we need Kian-Tat Lim to make a decision" and also "I am proposing a change to the baseline, can the CCB please take a look". I thought this RFC was more of the former but given DPDD involvement I had proposed Mario Juric to lead rather than Kian-Tat Lim. The latter type of flagging usually also includes a "TCT" component.

            Show
            tjenness Tim Jenness added a comment - Historically we've had two types of RFC flagging. "Flagged" can refer to "this RFC has not converged and we need Kian-Tat Lim to make a decision" and also "I am proposing a change to the baseline, can the CCB please take a look". I thought this RFC was more of the former but given DPDD involvement I had proposed Mario Juric to lead rather than Kian-Tat Lim . The latter type of flagging usually also includes a "TCT" component.
            krughoff Simon Krughoff made changes -
            Component/s TCT [ 13618 ]
            Hide
            krughoff Simon Krughoff added a comment -

            TCT added.

            Show
            krughoff Simon Krughoff added a comment - TCT added.
            Hide
            tjenness Tim Jenness added a comment -

            I'm not entirely sure I know what CCB is being asked to approve. From what I understand we all agree that DPDD has defined the right convention. What we aren't entirely clear on is what changes we need to make to documents and code to make things consistent with DPDD. Once changes are made to documents to match DPDD then CCB will be tasked with approving those changes. What am I missing?

            Show
            tjenness Tim Jenness added a comment - I'm not entirely sure I know what CCB is being asked to approve. From what I understand we all agree that DPDD has defined the right convention. What we aren't entirely clear on is what changes we need to make to documents and code to make things consistent with DPDD. Once changes are made to documents to match DPDD then CCB will be tasked with approving those changes. What am I missing?
            Hide
            swinbank John Swinbank added a comment -

            Re-reading the above discussion, I think Tim is correct. I don't see any requirement for a CCB or DMLT sign off on this RFC. The issue has been raised in DMLT meetings, and there were no serious objections beyond concern at the potential amount of work involved. I think what remains to be done here is to file tickets capturing that work, then this can be marked as "adopted"; transitioning it to "implemented" then becomes part of our regular scheduling process.

            Show
            swinbank John Swinbank added a comment - Re-reading the above discussion, I think Tim is correct. I don't see any requirement for a CCB or DMLT sign off on this RFC. The issue has been raised in DMLT meetings, and there were no serious objections beyond concern at the potential amount of work involved. I think what remains to be done here is to file tickets capturing that work, then this can be marked as "adopted"; transitioning it to "implemented" then becomes part of our regular scheduling process.
            Hide
            Parejkoj John Parejko added a comment -

            Thanks for clarification, Tim Jenness and John Swinbank. I will work with Simon Krughoff tomorrow to file the appropriate tickets.

            Show
            Parejkoj John Parejko added a comment - Thanks for clarification, Tim Jenness and John Swinbank . I will work with Simon Krughoff tomorrow to file the appropriate tickets.
            Hide
            Parejkoj John Parejko added a comment -

            Implementation ticket filed as Epic DM-10932, with associated stories.

            NOTE! When the schema and code tickets (DM-10935, DM-10936) are implemented, this will be a breaking change for backwards compatibility, including old reference catalogs and previously processed data. We recommend that publicly provided reference catalogs be copied and marked "deprecated: for use with LSST stack < v??", and that updated catalogs be provided.

            Show
            Parejkoj John Parejko added a comment - Implementation ticket filed as Epic DM-10932 , with associated stories. NOTE! When the schema and code tickets ( DM-10935 , DM-10936 ) are implemented, this will be a breaking change for backwards compatibility, including old reference catalogs and previously processed data. We recommend that publicly provided reference catalogs be copied and marked "deprecated: for use with LSST stack < v??", and that updated catalogs be provided.
            Parejkoj John Parejko made changes -
            Status Flagged [ 10606 ] Adopted [ 10806 ]
            tjenness Tim Jenness made changes -
            Link This issue is triggering DM-10932 [ DM-10932 ]
            rowen Russell Owen made changes -
            Link This issue relates to RFC-368 [ RFC-368 ]
            rowen Russell Owen made changes -
            Link This issue relates to RFC-271 [ RFC-271 ]
            rowen Russell Owen made changes -
            Link This issue is triggering DM-15244 [ DM-15244 ]
            Hide
            rowen Russell Owen added a comment -

            The code has been fixed. Now I'm looking at our documentation. I think the only thing that needs changing is C++ style guide entry 3-34 says to use "Sigma" not "Err" for error. Could somebody please suggest new wording for this entry? I'm really not sure what to say...for instance do we still recommend "Var" as a suffix for variance?

            Show
            rowen Russell Owen added a comment - The code has been fixed. Now I'm looking at our documentation. I think the only thing that needs changing is C++ style guide entry 3-34 says to use "Sigma" not "Err" for error. Could somebody please suggest new wording for this entry? I'm really not sure what to say...for instance do we still recommend "Var" as a suffix for variance?
            rowen Russell Owen made changes -
            Risk Score 0
            Hide
            Parejkoj John Parejko added a comment -

            Russell Owen: that was already fixed in DM-10933, it just needs to be merged (the agreement to merge happened just before the all hands and I missed it).

            Lets keep discussion about remaining work on the relevant epic: DM-10932

            Show
            Parejkoj John Parejko added a comment - Russell Owen : that was already fixed in DM-10933 , it just needs to be merged (the agreement to merge happened just before the all hands and I missed it). Lets keep discussion about remaining work on the relevant epic: DM-10932
            swinbank John Swinbank made changes -
            Link This issue is triggering DM-10933 [ DM-10933 ]
            swinbank John Swinbank made changes -
            Link This issue is triggering DM-10935 [ DM-10935 ]
            swinbank John Swinbank made changes -
            Link This issue is triggering DM-10934 [ DM-10934 ]
            swinbank John Swinbank made changes -
            Link This issue is triggering DM-10932 [ DM-10932 ]
            Parejkoj John Parejko made changes -
            Link This issue relates to RFC-535 [ RFC-535 ]
            gcomoretto Gabriele Comoretto made changes -
            Remote Link This issue links to "Page (Confluence)" [ 24379 ]
            gcomoretto Gabriele Comoretto made changes -
            Remote Link This issue links to "Page (Confluence)" [ 24428 ]
            gcomoretto Gabriele Comoretto made changes -
            Remote Link This issue links to "Page (Confluence)" [ 24525 ]
            Hide
            swinbank John Swinbank added a comment -

            Fully implemented by the completion of DM-10935.

            Show
            swinbank John Swinbank added a comment - Fully implemented by the completion of DM-10935 .
            swinbank John Swinbank made changes -
            Resolution Done [ 10000 ]
            Status Adopted [ 10806 ] Implemented [ 11105 ]
            gcomoretto Gabriele Comoretto made changes -
            Remote Link This issue links to "Page (Confluence)" [ 24720 ]

              People

              • Assignee:
                Parejkoj John Parejko
                Reporter:
                Parejkoj John Parejko
                Watchers:
                Andy Salnikov, Gregory Dubois-Felsmann, Jim Bosch, John Parejko, John Swinbank, Kian-Tat Lim, Krzysztof Findeisen, Paul Price, Robert Lupton, Russell Owen, Simon Krughoff, Tim Jenness, Zeljko Ivezic
              • Votes:
                1 Vote for this issue
                Watchers:
                13 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Planned End:

                  Summary Panel