Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-8050

Reconcile L1DB schema with latest DPDD

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: cat, L1 Database
    • Labels:
      None
    • Sprint:
      DB_F16_10
    • Team:
      Data Access and Database

      Description

      The level 1 db prototype schema from here: https://github.com/lsst-dm/l1dbproto/blob/master/python/lsst/l1dbproto/l1db.py and https://github.com/lsst/cat/blob/master/sql/baselineSchema.sql has some inconsistencies with the latest version of the DPDD (from 9/26/2016).

      I've just looked through diaSource and diaObject, mainly. The biggest things for the diaSource table are that dipole measurement fields in the DPDD are not in the schema and it is unclear how some fields like the second moments of intensity and some covariance terms match up between the schema vs DPDD.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment -

            Maria, could you please review my changes to cat baseline schema when you have a minute. The list of changes is significant so I don't want to summarize it here on the ticket, but they are all on github. I tried to resolve sigma/error/variance confusion where I could, still we have a mess with matrix storage, we'll have to think about it again. I have also updated lsst-dm/l1dbproto package (on the same tickets/DM-8050 branch) but this is my test code, no need to review that.

            Show
            salnikov Andy Salnikov added a comment - Maria, could you please review my changes to cat baseline schema when you have a minute. The list of changes is significant so I don't want to summarize it here on the ticket, but they are all on github. I tried to resolve sigma/error/variance confusion where I could, still we have a mess with matrix storage, we'll have to think about it again. I have also updated lsst-dm/l1dbproto package (on the same tickets/ DM-8050 branch) but this is my test code, no need to review that.
            Hide
            mtpatter Maria Patterson [X] (Inactive) added a comment -

            Sure, I can do that.

            Show
            mtpatter Maria Patterson [X] (Inactive) added a comment - Sure, I can do that.
            Hide
            mtpatter Maria Patterson [X] (Inactive) added a comment -

            I put comments in the github cat pull request conversation.

            Show
            mtpatter Maria Patterson [X] (Inactive) added a comment - I put comments in the github cat pull request conversation.
            Hide
            salnikov Andy Salnikov added a comment -

            Maria, thanks a lot for review and all comments! I want to continue the discussion here so other interested people can join (also JIRA is probably easier to search for info later).

            So here it goes:

            OK, I wrote out all the covariance matrices and think that all the fields that should be here are here now, with the exception that I think (just noting for the future) - apFlux* in DIASource table might need many fields at some point for additional aperture sizes.

            I am already worrying about the size of the data in the L1 database, this recent update to DPDD added a lot more columns than it removed. This will definitely have an impact on sizing model, I hope someone is looking into it.

            I think everywhere *Sigma should really be *Err and the UCD types were correct before, according to naming conventions

            Here I'm not so sure which one is better. I can try to explain my reasoning when I changes "stat.error" to "stat.stdev" - DPDD states that we save covariance matrices for parameter estimates. For off-diagonals we do exactly that and we call columns "x_y_Cov" with UCD="stat.covariance". But for diagonals we do not store them directly, instead I assume we store square root of that; the diagonals in the matrix are variances and their square roots are standard deviations frequently called sigma. So if you say that covariance matrix is stored in this strange way then it's natural to call those columns xyzSigma and their UCD "stat.stdev". Of course those sigmas should be understood as related to estimates of the parameters (error of the mean) and not to the original distribution (if there is any).

            I do think we need a better and less confusing way to store covariance matrices in database. I hope I can propose something in the future. For now let me return all UCDs back to stat.error but keep column names (basically revert everything back).

            totFluxErr’s UCD should be phot.count;stat.error

            Thanks, changed it to "stat.error;phot.count".

            If there is confusion/disagreement on this generally (I think there might be because I am also confused), then I would suggest that at least the UCD types revert back to what they were originally and a new ticket to possibly change/reconsider field names in the change-controlled LDM-153 be started, if people are happier with that order of decisions.

            Let me think for some time on what I said above about covariances, I'll open RFC when I have some idea.

            Not an edit made on this PR, but for the UCD combinations special word combinations that are currently written as e.g., “stat.error;pos.eq.ra”, for anything in combination with a stat* or meta* qualifier, it is suggested to write the qualifier after the quantity. So “pos.eq.ra;stat.error”.

            According to http://www.ivoa.net/documents/REC/UCD/UCDlist-20070402.html#_Toc163276347 stat.error, stat.covariance, and stat.stdev keywords are primary and have to appear first, so I think our current approach is consistent with UCD recommendations.

            Small capitalization things - In DIAObject table- For the PSFlux and FPFlux etc entries, this reference for naming conventions prefers camel case...

            Well, I'm not quite sure, that convention does not say anything about camel case for acronyms. I guess we need more detailed convention if we want to worry about these cases. I'm OK with anything, so I'll keep it as it is for now, for MOID I just follow whatever is written in DPDD.

            In the DIAObject table- small typo: *fpFluxMean says “fliter” in the description instead of “filter"

            Indeed, thanks.

            parentDiaSourceId description should be “Id of the parent diaSource this diaSource has been deblended from”

            Changed, thanks.

            In the naming conventions here, it suggests that we use row and col instead of x and y

            I'd suggest a new ticket to check for all naming conventions (and maybe improving conventions themselves).

            apFluxErr in the UCD field should be phot.count;stat.error

            Changed to "stat.error;phot.count".

            I have pushed the fixes to UCDs and comments, there is still chance to look at them and complain if something is wrong. Will merge with master on ~Monday.

            Show
            salnikov Andy Salnikov added a comment - Maria, thanks a lot for review and all comments! I want to continue the discussion here so other interested people can join (also JIRA is probably easier to search for info later). So here it goes: OK, I wrote out all the covariance matrices and think that all the fields that should be here are here now, with the exception that I think (just noting for the future) - apFlux* in DIASource table might need many fields at some point for additional aperture sizes. I am already worrying about the size of the data in the L1 database, this recent update to DPDD added a lot more columns than it removed. This will definitely have an impact on sizing model, I hope someone is looking into it. I think everywhere *Sigma should really be *Err and the UCD types were correct before, according to naming conventions Here I'm not so sure which one is better. I can try to explain my reasoning when I changes "stat.error" to "stat.stdev" - DPDD states that we save covariance matrices for parameter estimates. For off-diagonals we do exactly that and we call columns "x_y_Cov" with UCD="stat.covariance". But for diagonals we do not store them directly, instead I assume we store square root of that; the diagonals in the matrix are variances and their square roots are standard deviations frequently called sigma. So if you say that covariance matrix is stored in this strange way then it's natural to call those columns xyzSigma and their UCD "stat.stdev". Of course those sigmas should be understood as related to estimates of the parameters (error of the mean) and not to the original distribution (if there is any). I do think we need a better and less confusing way to store covariance matrices in database. I hope I can propose something in the future. For now let me return all UCDs back to stat.error but keep column names (basically revert everything back). totFluxErr’s UCD should be phot.count;stat.error Thanks, changed it to "stat.error;phot.count". If there is confusion/disagreement on this generally (I think there might be because I am also confused), then I would suggest that at least the UCD types revert back to what they were originally and a new ticket to possibly change/reconsider field names in the change-controlled LDM-153 be started, if people are happier with that order of decisions. Let me think for some time on what I said above about covariances, I'll open RFC when I have some idea. Not an edit made on this PR, but for the UCD combinations special word combinations that are currently written as e.g., “stat.error;pos.eq.ra”, for anything in combination with a stat* or meta* qualifier, it is suggested to write the qualifier after the quantity. So “pos.eq.ra;stat.error”. According to http://www.ivoa.net/documents/REC/UCD/UCDlist-20070402.html#_Toc163276347 stat.error, stat.covariance, and stat.stdev keywords are primary and have to appear first, so I think our current approach is consistent with UCD recommendations. Small capitalization things - In DIAObject table- For the PSFlux and FPFlux etc entries, this reference for naming conventions prefers camel case... Well, I'm not quite sure, that convention does not say anything about camel case for acronyms. I guess we need more detailed convention if we want to worry about these cases. I'm OK with anything, so I'll keep it as it is for now, for MOID I just follow whatever is written in DPDD. In the DIAObject table- small typo: *fpFluxMean says “fliter” in the description instead of “filter" Indeed, thanks. parentDiaSourceId description should be “Id of the parent diaSource this diaSource has been deblended from” Changed, thanks. In the naming conventions here, it suggests that we use row and col instead of x and y I'd suggest a new ticket to check for all naming conventions (and maybe improving conventions themselves). apFluxErr in the UCD field should be phot.count;stat.error Changed to "stat.error;phot.count". I have pushed the fixes to UCDs and comments, there is still chance to look at them and complain if something is wrong. Will merge with master on ~Monday.
            Hide
            salnikov Andy Salnikov added a comment -

            No further comments received. Merged with master, done.

            Show
            salnikov Andy Salnikov added a comment - No further comments received. Merged with master, done.

              People

              • Assignee:
                salnikov Andy Salnikov
                Reporter:
                mtpatter Maria Patterson [X] (Inactive)
                Reviewers:
                Maria Patterson [X] (Inactive)
                Watchers:
                Andy Salnikov, Colin Slater, Darko Jevremovic, Gregory Dubois-Felsmann, John Swinbank, Kian-Tat Lim, Maria Patterson [X] (Inactive), Scott Daniel, Tim Jenness, Veljko Vujcic
              • Votes:
                0 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel