  1. Data Management
  2. DM-13375

DPDD does not mention DRP-generated DIAObject/DIASource


    Details

    • Team:
      DM Science

      Description

      The DPDD currently lists schemas for Object, Source, and ForcedSource under §4.3 - "The Level 2 Catalogs". This omits the DIASource and DIAObject tables, which LDM-151 says will also be generated during data release production. 

      A discussion of these tables should be added to the DRP section of the DPDD so that their rationale and usage can be explained, particularly in relation to other characterizations of variable objects in the Object and prompt-processing-generated DIAObject tables. Alternatively, if either of these tables is meant for internal use only (e.g. solely as inputs to Object), LDM-151 should make that clear.

      Assigning to Mario Juric with the intention that it will be reassigned to the next subsystem scientist.

        Attachments

          Issue Links

            Activity

            ctslater Colin Slater created issue -
            swinbank John Swinbank added a comment -

            Reassigning to Zeljko Ivezic, on the basis that he is acting DMSS until Mario's successor is appointed.

            swinbank John Swinbank made changes -
            Field Original Value New Value
            Watchers Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, Jim Bosch [ Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, Jim Bosch ] Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, Jim Bosch, Mario Juric [ Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, Jim Bosch, Mario Juric ]
            Assignee Mario Juric [ mjuric ] Zeljko Ivezic [ zivezic ]
            zivezic Zeljko Ivezic added a comment -

            I need more clarification from the reporter (Colin) and Jim. In DPDD, Section 4.2 about DRP says in bullet 5:

            "The next stage in the pipeline, which we will for simplicity just call the deblender, will synthesize a list of unique objects. In doing so it will consider the catalogs of CoaddSources, catalogs of DIASources, DIAObjects and SSObjects detected on difference images, and objects from external catalogs." 

            It's not clear from this text if DIASources, DIAObjects and SSObjects come from the real-time AP processing, or if the AP code is rerun during DRP. 

            I need Jim to provide more details about what is planned in this context, and what he would like to be added to DPDD to address Colin's comments. 

             

            jbosch Jim Bosch added a comment -

            No DRP data products will come from the real-time AP processing.  SSObject would be the one worth considering, but I believe Mario Juric wants to make the DRP SSObjects completely independent so they can be used to estimate LSST-only selection functions.

            My best guess right now is that the quantities we measure specifically on DRP DIAObjects will be essentially the same as those measured on AP DIAObjects, but each DIAObject will also be associated with an Object, and some Object measurements may scientifically supersede some DIAObject measurements; I haven't really thought much about this in detail.

            The low-level code to generate DRP DIASources will be a variant of the AP code.  The processing steps that precede it will be quite different, but not in ways that should significantly affect the DIASource detections or measurements.  I don't really know how much the DRP DIAObject association code can have in common with the AP DIAObject association code, because AP DIAObjects are built incrementally and DRP DIAObjects are built all at once.

            mjuric Mario Juric added a comment -

            "SSObject would be the one worth considering, but I believe Mario Juric wants to make the DRP SSObjects completely independent so they can be used to estimate LSST-only selection functions."

            Correct – the changes we're looking at wrt. SSObjects (the orbit catalog) in AP would make them a product we import from the Minor Planet Center; in DRP, however, the proposal will be to still build our own orbit catalog. See http://ls.st/presentation-582 for more detail (an RFC will follow once the details are finalized).

            jbosch Jim Bosch added a comment - - edited

            Just took a closer look at the DPDD section on DIAObject, and here's a more thorough report on how I think DRP DIAObjects ought to differ from Prompt ones:

            diaObjectId: could and probably should just be an objectId (all DIAObjects will be Objects, but not all Objects will be DIAObjects).

            radec, parallax, pm*, psFlux*: Object.ps* fields should usually provide better-measured versions of all of these quantities (more optimal estimators, at least in uncrowded regions), but I'm sure we'll find cases (crowded fields?) where these are more robust.  And having two very different ways to measure these quantities is good for diagnostic purposes.

            fpFlux*: DRP is currently slated to do both difference image forced photometry and direct forced photometry (though I believe Robert Lupton thinks the former will almost certainly be uniformly better, and I tentatively agree).  As long as these are a mean of the difference image forced photometry, they're quite useful (see https://community.lsst.org/t/data-model-for-variable-sources-at-time-of-data-releases/2695/5) as an indication of the offset between the flux in the difference image template and the flux measured in all epochs.  We should probably document this as its main purpose in DRP (I think this is not a role it can play in Prompt, where the forced photometry doesn't extend to all epochs and there are no deep fluxes to relate to anyway).  I don't think a version of this that averages direct forced photometry would be useful, unless it turns out to be a diagnostic on how well we can "project" coadd deblending results to single-epoch images when we do the direct forced photometry.  We should also definitely move these columns to Object in DRP, so that Object includes all quantities derived from ForcedSources and DIAObject includes only quantities derived from DIASources, and so we make these measurements for Objects that are not DIAObjects.

            lcPeriodic, lcNonPeriodic: Unless the DIAObject quantities are measured from DIASources instead of difference-image ForcedSources (not clear from the text, but I think these should almost certainly be measured on ForcedSources), these are exact duplicates of the quantities in the Object table and need not be repeated here.

            nearbyObj*: if we just use objectId instead of diaObjectId above, these are not needed at all.  If we still have a separate diaObjectId as the primary key, we should only need one objectId here because the association will be unique, and we should just call it "objectId" or maybe "derivedObjectId" instead of "nearbyObjectId".  In either case I don't think the distance or association probability is meaningful, as the Object will have been created from the DIAObject.
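
            The fpFlux* point above can be illustrated with a small numeric sketch. It assumes an idealized case in which each difference-image forced flux is exactly the per-epoch flux minus the template flux; all values and variable names are hypothetical and are not taken from the DPDD.

```python
from statistics import mean

# Hypothetical per-epoch fluxes of one variable Object (arbitrary units).
direct_fluxes = [10.0, 12.0, 8.0, 14.0]

# Hypothetical flux of the same Object in the difference-image template.
template_flux = 9.0

# Idealized difference-image forced photometry: each epoch minus the template.
diff_fluxes = [flux - template_flux for flux in direct_fluxes]

# Mean of the difference-image forced fluxes (the role proposed for fpFlux*).
fp_flux_mean = mean(diff_fluxes)

# Offset between the flux averaged over all epochs and the template flux.
offset = mean(direct_fluxes) - template_flux

print(fp_flux_mean, offset)
```

            Under that assumption the two numbers agree, which is why the mean is only informative when the forced photometry covers all epochs.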

             

            zivezic Zeljko Ivezic made changes -
            Assignee Zeljko Ivezic [ zivezic ] Leanne Guy [ lguy ]
            zivezic Zeljko Ivezic made changes -
            Priority Undefined [ 10000 ] Major [ 3 ]
            zivezic Zeljko Ivezic made changes -
            Due Date 01/Aug/18
            zivezic Zeljko Ivezic made changes -
            Status To Do [ 10001 ] To Do [ 10001 ]
            zivezic Zeljko Ivezic made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            zivezic Zeljko Ivezic made changes -
            Link This issue is duplicated by DM-13607 [ DM-13607 ]
            swinbank John Swinbank added a comment -

            Note that LDM-151 is not normative in this regard: it has to reflect the DPDD, rather than vice versa.

            However, my reading of the DPDD (§3.3.5), the DMSR (DMS-REQ-0325) and the OSS (OSS-REQ-0135) is that we are required to regenerate and publish all prompt products (except alerts) as part of the data release. Since the table schemas should be ~the same, duplicating them in the DPDD doesn't seem useful, but perhaps simply adding a note that the prompt schemas are included in the data release section by reference would help clarify?

            lguy Leanne Guy made changes -
            Component/s Requirements Documents [ 12815 ]
            Component/s Design Documents [ 12816 ]
            lguy Leanne Guy made changes -
            Risk Score 0
            lguy Leanne Guy made changes -
            Component/s DM Subsystem Science [ 12208 ]
            lguy Leanne Guy added a comment -

            In addressing this ticket, I notice that the DPDD also does not mention the DIAForcedSource table in Section 3: Prompt Data Products.  Section 4: Data Release Data Products does describe the ForcedSource table. I would propose to expand the scope of this ticket to include adding the schema for DIAForcedSource (described in LDM-153) to Section 3.3 and referencing it in Section 4, as for the DIASource and DIAObject tables.

            jbosch Jim Bosch added a comment - - edited

            The inclusion of a DIAForcedSource table in LDM-153 looks to me like it might be a mistake; it at least merits some discussion.  I just glanced at that doc now, but a diagram of table relationships at the beginning suggests that DIAForcedSource means "forced photometry on difference images at the positions of DIAObjects", while a description later clearly states that it's "forced photometry on difference images at the positions of Objects".

            We have definitely decided (though possibly not recorded) that we will do "forced photometry on difference images at the positions of Objects", but that doesn't need a new table - we could just have columns for both direct-image and difference-image forced photometry in the ForcedSource table.  That seems (to my naive eyes) more efficient, since it doesn't duplicate the ID fields the way two separate tables would, but of course a DB expert should weigh in on that.

            Doing forced photometry at the positions of DIAObjects in DRP should be unnecessary, as the set of DIAObjects will be a subset of the set of Objects, and I believe we can and should settle on a single best (per-Object) centroid to use for all kinds of forced photometry.
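
            A minimal sketch of the single-table layout described above, using an in-memory SQLite database. The column names (visitId, directFlux, diffFlux, decl) and all values are hypothetical illustrations, not the baseline schema; the sketch also assumes DIAObject reuses objectId as its key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Object (objectId INTEGER PRIMARY KEY, ra REAL, decl REAL);
-- Assumption: every DIAObject is also an Object and shares its objectId.
CREATE TABLE DIAObject (objectId INTEGER PRIMARY KEY REFERENCES Object(objectId));
-- One row per (Object, visit), holding both kinds of forced photometry,
-- so the ID columns are stored once rather than in two separate tables.
CREATE TABLE ForcedSource (
    objectId   INTEGER REFERENCES Object(objectId),
    visitId    INTEGER,
    directFlux REAL,  -- forced photometry on the direct image
    diffFlux   REAL,  -- forced photometry on the difference image
    PRIMARY KEY (objectId, visitId)
);
""")

conn.executemany("INSERT INTO Object VALUES (?, ?, ?)",
                 [(1, 10.0, -5.0), (2, 10.1, -5.1)])
conn.execute("INSERT INTO DIAObject VALUES (1)")  # only Object 1 is variable
conn.executemany("INSERT INTO ForcedSource VALUES (?, ?, ?, ?)",
                 [(1, 100, 11.0, 2.0), (1, 101, 8.0, -1.0),
                  (2, 100, 5.0, 0.0), (2, 101, 5.1, 0.1)])

# Forced photometry for DIAObjects is a join, not a separate measurement.
rows = conn.execute("""
SELECT f.objectId, f.visitId, f.directFlux, f.diffFlux
FROM ForcedSource AS f
JOIN DIAObject AS d ON d.objectId = f.objectId
ORDER BY f.visitId
""").fetchall()
print(rows)
```

            Whether this is actually more efficient than two tables is the open question for a database expert; the sketch only shows that the DIAObject subset can be recovered by a join.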

            swinbank John Swinbank added a comment -

            The issue of DIAForcedSources came up last week at the PCW.

            Note that LSE-163 does provide for DIAForcedSources in prompt processing in footnote 26 on page 12:

            "For the purposes of this document, we’re treating the DIASources generated by forced photometry or precovery measurements to be the same as DIASources detected in difference images (but flagged appropriately). In the logical schema, these may be divided into two separate tables."

            In other words, even though some DIASources may be forced, LSE-163 at least suggests that they need not be stored in a separate table. Given that, it's arguably fine not to add a DIAForcedSource table for prompt processing: that seems like an implementation detail.

            The baseline database schema does contain a DIAForcedSource table. It describes it as:

            Forced-photometry source measurement on an individual difference Exposure based on a Multifit shape model derived from a deep detection

            That's clearly something that would happen in data release, rather than prompt, processing: per the DPDD, in prompt processing we will only perform forced point-source photometry at the position of DIAObjects. I'll defer to Jim Bosch on whether this DIAForcedSource table is actually necessary in DRP.

            ctslater Colin Slater added a comment -

            I agree with Jim Bosch's statements about DRP. This is another case where our time-domain data model differs in DRP from AP, which is why I've been particularly unhappy with statements saying that DRP will "reprocess" the formerly-called-Level 1 dataset. Having a clear discussion about the DRP time-domain model was the original purpose for this ticket.

            [paragraph deleted as John Swinbank beat me to the discussion of DIASource vs DIAForcedSource in AP]. I agree that it would be very useful to have a decision made and recorded in the DPDD on where AP forced photometry lives and what measurements need to be made, and we can flow down physical schemas from there (our forced photometry tables are typically very narrow, but could merit a wider set of columns in AP?). Eric Bellm might want to propose a resolution? A new ticket for the AP issue may help organize the separate questions here.

             

             

            swinbank John Swinbank added a comment - - edited

            "...it would be very useful to have a decision made and recorded in the DPDD on where AP forced photometry lives... Eric Bellm might want to propose a resolution?"

            Eric may disagree, but this reads to me more like a question of database implementation than of scientific utility. I'd start by asking the DAX team to weigh in on performance (or other) implications.

            ctslater Colin Slater added a comment -

            Sorry, to be more precise, the question is: do AP forced photometry records mostly "look like" full DIASource records, or do they look more like (as an example) ForcedSource records with only one or two fluxes? Similarly, should queries for DIASources also potentially include forced photometry (filterable with a flag if the user doesn't want it), or should the user need to run a separate query for forced photometry?

            ebellm Eric Bellm added a comment -

            I think the question of where to put AP forced photometry in the PPDB depends in part on whether past forced photometry records are included in alerts. It's clear that precovery and forced photometry measurements on the current visit are not. But if forced sources are just rows in the DIASource table with forced=True, it would be trivial to include any that exist in the past 12 months of lightcurve history. This would be scientifically far better than only reporting crude upper limits (see https://jira.lsstcorp.org/browse/RFC-348, which I think Leanne Guy is also shepherding to the CCB...), and since we'd be replacing an upper limit with a measurement the packet size would not be strongly affected. If folks are amenable I'll RFC this.

            I agree with John Swinbank that whether the DIAForcedSource table exists separately is likely an implementation detail. Since we are only forcing PSF fluxes the table will be narrower.
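
            As a rough sketch of the "rows in DIASource with forced=True" option: the record layout, field names, and the 365-day stand-in for "12 months" below are all assumptions made for illustration, not the PPDB schema or the alert format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DiaSourceRow:
    """Hypothetical DIASource record; forced rows share the table and carry a flag."""
    dia_object_id: int
    midpoint: datetime
    ps_flux: float
    forced: bool


def alert_history(rows, dia_object_id, alert_time,
                  include_forced=True, window=timedelta(days=365)):
    """Return prior records for one DIAObject within the window before alert_time.

    Records at or after alert_time are excluded, so forced measurements made
    on the current visit never appear in the history.
    """
    start = alert_time - window
    selected = (
        r for r in rows
        if r.dia_object_id == dia_object_id
        and start <= r.midpoint < alert_time
        and (include_forced or not r.forced)
    )
    return sorted(selected, key=lambda r: r.midpoint)


alert_time = datetime(2018, 8, 24)
rows = [
    DiaSourceRow(1, datetime(2017, 6, 1), 3.0, False),   # older than the window
    DiaSourceRow(1, datetime(2018, 1, 10), 0.4, True),   # forced measurement
    DiaSourceRow(1, datetime(2018, 3, 5), 5.0, False),   # detection
    DiaSourceRow(2, datetime(2018, 3, 5), 7.0, False),   # different DIAObject
]

history = alert_history(rows, 1, alert_time)
detections_only = alert_history(rows, 1, alert_time, include_forced=False)
print(len(history), len(detections_only))
```

            With a shared table, including or excluding forced measurements is a single filter on the flag; with a separate DIAForcedSource table the same history would need a second lookup.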

            ebellm Eric Bellm added a comment -

            My answer to Colin Slater's questions would be:

            DIAForcedSource records don't look like DIASource records (they don't include dipole or trailed source fits/fluxes, spuriousness, etc.)

            It would be convenient for users to be able to retrieve both DIASource and forced photometry at once, but I don't think it's essential.

            swinbank John Swinbank added a comment -

            ... whether past forced photometry records are included in alerts ...

            Given that the DPDD as currently written:

            • Includes forced photometry in the DIASource table;
            • States that alert packets include the “previous 12 months of DIASource records”, without qualification;

            I'd suggest that the naïve answer to that is that they should be, and I'm not even sure it needs an RFC (although I support making one, for clarity).

            lguy Leanne Guy made changes -
            Remote Link This issue links to "Page (Confluence)" [ 17855 ]
            lguy Leanne Guy made changes -
            Remote Link This issue links to "Page (Confluence)" [ 17855 ]
            lguy Leanne Guy added a comment - - edited

            This topic was discussed at length at the dm-sst meeting 2018-08-24. A separate RFC will be created to address the PPDB tables DIASource/DIAForcedSource. 

            lguy Leanne Guy made changes -
            Due Date 01/Aug/18 14/Sep/18
            frossie Frossie Economou made changes -
            Status Admin Review [ 3 ] In Progress [ 11605 ]
            frossie Frossie Economou made changes -
            Status Review [ 11605 ] In Progress [ 3 ]
            lguy Leanne Guy made changes -
            Labels LSE-163 dm-sst LSE-163
            lguy Leanne Guy made changes -
            Status In Progress [ 3 ] To Do [ 10001 ]
            lguy Leanne Guy made changes -
            Rank Ranked higher
            lguy Leanne Guy made changes -
            Remote Link This issue links to "Page (Confluence)" [ 28326 ]

              People

              Assignee:
              lguy Leanne Guy
              Reporter:
              ctslater Colin Slater
              Watchers:
              Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, Jim Bosch, John Swinbank, Leanne Guy, Zeljko Ivezic
              Votes:
              0 Vote for this issue

                Dates

                Due:
                Created:
                Updated:
