Details
Type: Story
Status: To Do
Resolution: Unresolved
Fix Version/s: None
Component/s: DM Subsystem Science, Requirements Documents
Labels:
Team: DM Science
Description
The DPDD currently lists schemas for Object, Source, and ForcedSource under §4.3 - "The Level 2 Catalogs". This omits the DIASource and DIAObject tables, which LDM-151 says will also be generated during data release production.
A discussion of these tables should be added to the DRP section of the DPDD so that their rationale and usage can be explained, particularly in relation to other characterizations of variable objects in the Object and prompt-processing-generated DIAObject tables. Alternatively, if either of these tables is meant for internal use only (e.g. solely as inputs to Object), LDM-151 should make that clear.
Assigning to Mario Juric with the intention that it will be reassigned to the next subsystem scientist.
Activity
I need more clarification from the reporter (Colin) and Jim. In the DPDD, bullet 5 of Section 4.2 (on DRP) says:
"The next stage in the pipeline, which we will for simplicity just call the deblender, will synthesize a list of unique objects. In doing so it will consider the catalogs of CoaddSources, catalogs of DIASources, DIAObjects and SSObjects detected on difference images, and objects from external catalogs."
It's not clear from this text if DIASources, DIAObjects and SSObjects come from the real-time AP processing, or if the AP code is rerun during DRP.
I need Jim to provide more details about what is planned in this context, and what he would like added to the DPDD to address Colin's comments.
No DRP data products will come from the real-time AP processing. SSObject would be the one worth considering, but I believe Mario Juric wants to make the DRP SSObjects completely independent so they can be used to estimate LSST-only selection functions.
My best guess right now is that the quantities we measure specifically on DRP DIAObjects will be essentially the same as those measured on AP DIAObjects, but each DIAObject will also be associated with an Object, and some Object measurements may scientifically supersede some DIAObject measurements; I haven't really thought much about this in detail.
The low-level code to generate DRP DIASources will be a variant of the AP code. The processing steps that precede it will be quite different, but not in ways that should significantly affect the DIASource detections or measurements. I don't really know how much the DRP DIAObject association code can have in common with the AP DIAObject association code, because AP DIAObjects are built incrementally and DRP DIAObjects are built all at once.
"SSObject would be the one worth considering, but I believe Mario Juric wants to make the DRP SSObjects completely independent so they can be used to estimate LSST-only selection functions."
Correct – the changes we're looking at wrt. SSObjects (the orbit catalog) in AP would make them a product we import from the Minor Planet Center; in DRP, however, the proposal will be to still build our own orbit catalog. See http://ls.st/presentation-582 for more detail (an RFC will follow once the details are finalized).
Just took a closer look at the DPDD section on DIAObject, and here's a more thorough report on how I think DRP DIAObjects ought to differ from Prompt ones:
diaObjectId: could and probably should just be an objectId (all DIAObjects will be Objects, but not all Objects will be DIAObjects).
radec, parallax, pm*, psFlux*: Object.ps* fields should usually provide better-measured versions of all of these quantities (more optimal estimators, at least in uncrowded regions), but I'm sure we'll find cases (crowded fields?) where these are more robust. And having two very different ways to measure these quantities is good for diagnostic purposes.
fpFlux*: DRP is currently slated to do both difference image forced photometry and direct forced photometry (though I believe Robert Lupton thinks the former will almost certainly be uniformly better, and I tentatively agree). As long as these are a mean of the difference image forced photometry, they're quite useful (see https://community.lsst.org/t/data-model-for-variable-sources-at-time-of-data-releases/2695/5) as an indication of the offset between the flux in the difference image template and the flux measured in all epochs. We should probably document this as its main purpose in DRP (I think this is not a role it can play in Prompt, where the forced photometry doesn't extend to all epochs and there are no deep fluxes to relate to anyway). I don't think a version of this that averages direct forced photometry would be useful, unless it turns out to be a diagnostic on how well we can "project" coadd deblending results to single-epoch images when we do the direct forced photometry. We should also definitely move these columns to Object in DRP, so that Object includes all quantities derived from ForcedSources and DIAObject includes only quantities derived from DIASources, and so we make these measurements for Objects that are not DIAObjects.
lcPeriodic, lcNonPeriodic: Unless the DIAObject quantities are measured from DIASources instead of difference-image ForcedSources (not clear from the text, but I think these should almost certainly be measured on ForcedSources), these are exact duplicates of the quantities in the Object table and need not be repeated here.
nearbyObj*: if we just use objectId instead of diaObjectId above, these are not needed at all. If we still have a separate diaObjectId as the primary key, we should only need one objectId here because the association will be unique, and we should just call it "objectId" or maybe "derivedObjectId" instead of "nearbyObjectId". In either case I don't think the distance or association probability is meaningful, as the Object will have been created from the DIAObject.
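To make the fpFlux* point above concrete, here is a minimal sketch of why the mean difference-image forced flux can be read as an offset between the template flux and the flux measured over all epochs. It assumes an idealized difference image in which each forced difference flux is exactly the direct flux minus the template flux; the numbers and the function name `mean_diff_forced_flux` are made up for illustration and are not taken from the DPDD or the pipeline code.

```python
from statistics import fmean


def mean_diff_forced_flux(diff_fluxes):
    """Mean of the difference-image forced fluxes over all epochs."""
    return fmean(diff_fluxes)


# Hypothetical values in arbitrary flux units (assumptions, not real data).
template_flux = 100.0
direct_fluxes = [98.0, 103.0, 105.0, 102.0]

# Idealized assumption: difference flux = direct flux - template flux.
diff_fluxes = [f - template_flux for f in direct_fluxes]

offset = mean_diff_forced_flux(diff_fluxes)
mean_direct_flux = fmean(direct_fluxes)

# Under that assumption, template + offset recovers the all-epoch mean flux.
recovered_mean = template_flux + offset
```

Under these assumptions the offset is only meaningful when the forced photometry covers all epochs, which is the DRP case described above.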
Note that LDM-151 is not normative in this regard: it has to reflect the DPDD, rather than vice versa.
However, my reading of the DPDD (§3.3.5), the DMSR (DMS-REQ-0325) and the OSS (OSS-REQ-0135) is that we are required to regenerate and publish all prompt products (except alerts) as part of the data release. Since the table schemas should be ~the same, duplicating them in the DPDD doesn't seem useful, but perhaps simply adding a note that the prompt schemas are included in the data release section by reference would help clarify?
In addressing this ticket, I notice that the DPDD also does not mention the DIAForcedSource table in Section 3: Prompt Data Products. Section 4: Data Release Data Products does describe the ForcedSource table. I would propose to expand the scope of this ticket to include adding the schema for the DIAForcedSource table (described in LDM-153) to Section 3.3 and referencing it in Section 4, as for the DIASource and DIAObject tables.
The inclusion of a DIAForcedSource table in LDM-153 looks to me like it might be a mistake; it at least merits some discussion. I just glanced at that doc now, but a diagram of table relationships at the beginning suggests that DIAForcedSource means "forced photometry on difference images at the positions of DIAObjects", while a description later clearly states that it's "forced photometry on difference images at the positions of Objects".
We have definitely decided (though possibly not recorded) that we will do "forced photometry on difference images at the positions of Objects", but that doesn't need a new table - we could just have columns for both direct-image and difference-image forced photometry in the ForcedSource table. That seems (to my naive eyes) more efficient, since it doesn't duplicate the ID fields the way two separate tables would, but of course a DB expert should weigh in on that.
Doing forced photometry at the positions of DIAObjects in DRP should be unnecessary, as the set of DIAObjects will be a subset of the set of Objects, and I believe we can and should settle on a single best (per-Object) centroid to use for all kinds of forced photometry.
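As an illustration of the single-table option described above, here is a minimal sketch using an in-memory SQLite database. All table layouts and column names (`isDiaObject`, `visitId`, `directPsFlux`, `diffPsFlux`) are hypothetical and chosen only for the example; they are not the baseline schema, and this says nothing about performance at scale, which is for a DB expert to judge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Every DIAObject is an Object; a flag marks the subset (hypothetical column).
    CREATE TABLE Object (
        objectId    INTEGER PRIMARY KEY,
        isDiaObject INTEGER NOT NULL
    );
    -- One row per (Object, visit) holding both kinds of forced photometry,
    -- so the ID columns are stored once rather than in two tables.
    CREATE TABLE ForcedSource (
        objectId     INTEGER NOT NULL REFERENCES Object(objectId),
        visitId      INTEGER NOT NULL,
        directPsFlux REAL,
        diffPsFlux   REAL,
        PRIMARY KEY (objectId, visitId)
    );
    """
)
conn.executemany("INSERT INTO Object VALUES (?, ?)", [(1, 1), (2, 0)])
conn.executemany(
    "INSERT INTO ForcedSource VALUES (?, ?, ?, ?)",
    [(1, 10, 105.0, 5.0), (1, 11, 98.0, -2.0), (2, 10, 50.0, 0.1)],
)

# Forced light curve for Objects that are also DIAObjects: no separate
# DIAForcedSource table is needed, only a join on the shared objectId.
rows = conn.execute(
    """
    SELECT f.visitId, f.directPsFlux, f.diffPsFlux
    FROM ForcedSource AS f
    JOIN Object AS o USING (objectId)
    WHERE o.isDiaObject = 1
    ORDER BY f.visitId
    """
).fetchall()
```

The sketch relies on the assumption stated above that a single per-Object centroid is used for all forced photometry, so one row per (Object, visit) is enough.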
The issue of DIAForcedSources came up last week at the PCW.
Note that LSE-163 does provide for DIAForcedSources in prompt processing in footnote 26 on page 12:
"For the purposes of this document, we’re treating the DIASources generated by forced photometry or precovery measurements to be the same as DIASources detected in difference images (but flagged appropriately). In the logical schema, these may be divided into two separate tables."
In other words, even though some DIASources may be forced, LSE-163 at least suggests that they need not be stored in a separate table. Given that, it's arguably fine not to add a DIAForcedSource table for prompt processing: that seems like an implementation detail.
The baseline database schema does contain a DIAForcedSource table. It describes it as:
Forced-photometry source measurement on an individual difference Exposure based on a Multifit shape model derived from a deep detection
That's clearly something that would happen in data release, rather than prompt, processing: per the DPDD, in prompt processing we will only perform forced point-source photometry at the position of DIAObjects. I'll defer to Jim Bosch on whether this DIAForcedSource table is actually necessary in DRP.
I agree with Jim Bosch's statements about DRP. This is another case where our time-domain data model differs in DRP from AP, which is why I've been particularly unhappy with statements saying that DRP will "reprocess" the formerly-called-Level 1 dataset. Having a clear discussion about the DRP time-domain model was the original purpose for this ticket.
[paragraph deleted as John Swinbank beat me to the discussion of DIASource vs DIAForcedSource in AP]. I agree that it would be very useful to have a decision made and recorded in the DPDD on where AP forced photometry lives and what measurements need to be made, and we can flow down physical schemas from there (our forced photometry tables are typically very narrow, but could merit a wider set of columns in AP?). Eric Bellm might want to propose a resolution? A new ticket for the AP issue may help organize the separate questions here.
...it would be very useful to have a decision made and recorded in the DPDD on where AP forced photometry lives... Eric Bellm might want to propose a resolution?
Eric may disagree, but this reads to me more like a question of database implementation than of scientific utility. I'd start by asking the DAX team to weigh in on performance (or other) implications.
Sorry, to be more precise, the question is: do AP forced photometry records mostly "look like" full DIASource records, or do they look more like (as an example) ForcedSource records with only one or two fluxes? Similarly, should queries for DIASources also potentially include forced photometry (filterable with a flag if the user doesn't want it), or should the user need to run a separate query for forced photometry?
I think the question of where to put AP forced photometry in the PPDB depends in part on whether past forced photometry records are included in alerts. It's clear that precovery and forced photometry measurements on the current visit are not. But if forced sources are just rows in the DIASource table with forced=True, it would be trivial to include any that exist in the past 12 months of lightcurve history. This would be scientifically far better than only reporting crude upper limits (see https://jira.lsstcorp.org/browse/RFC-348, which I think Leanne Guy is also shepherding to the CCB...), and since we'd be replacing an upper limit with a measurement the packet size would not be strongly affected. If folks are amenable I'll RFC this.
I agree with John Swinbank that whether the DIAForcedSource table exists separately is likely an implementation detail. Since we are only forcing PSF fluxes the table will be narrower.
My answer to Colin Slater's questions would be:
DIAForcedSource records don't look like DIASource records (they don't include dipole or trailed source fits/fluxes, spuriousness, etc.)
It would be convenient for users to be able to retrieve both DIASource and forced photometry at once, but I don't think it's essential.
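For the flagged-row option discussed above (forced measurements stored as DIASource rows with forced=True), a minimal sketch of how the lightcurve history for an alert could be selected is shown below. The `DiaSourceRow` class, its field names, the `history_for_alert` function, and the approximation of "12 months" as 365 days are all assumptions made for illustration; this is not the PPDB or alert-packaging code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DiaSourceRow:
    diaObjectId: int
    midpoint: datetime  # time of the observation
    psFlux: float
    forced: bool  # True for forced photometry, False for a detection


def history_for_alert(rows, dia_object_id, now, include_forced=True, window_days=365):
    """Return prior rows for one DIAObject within the window, oldest first."""
    cutoff = now - timedelta(days=window_days)
    selected = [
        r
        for r in rows
        if r.diaObjectId == dia_object_id
        and cutoff <= r.midpoint < now  # strictly before the current visit
        and (include_forced or not r.forced)
    ]
    return sorted(selected, key=lambda r: r.midpoint)


# Hypothetical rows (made-up values).
rows = [
    DiaSourceRow(1, datetime(2018, 1, 10), 5.0, False),
    DiaSourceRow(1, datetime(2018, 3, 1), 0.4, True),
    DiaSourceRow(1, datetime(2016, 12, 1), 3.0, False),  # outside the window
    DiaSourceRow(2, datetime(2018, 2, 1), 7.0, False),  # different DIAObject
]
now = datetime(2018, 8, 24)

with_forced = history_for_alert(rows, 1, now)
detections_only = history_for_alert(rows, 1, now, include_forced=False)
```

With a single table and a flag, including or excluding forced measurements is one predicate; with a separate DIAForcedSource table the same result would need a second query or a union.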
... whether past forced photometry records are included in alerts ...
Given that the DPDD as currently written:
- Includes forced photometry in the DIASource table;
- States that alert packets include the “previous 12 months of DIASource records”, without qualification;
I'd suggest that the naïve answer to that is that they should be, and I'm not even sure it needs an RFC (although I support making one, for clarity).
This topic was discussed at length at the dm-sst meeting 2018-08-24. A separate RFC will be created to address the PPDB tables DIASource/DIAForcedSource.
Reassigning to Zeljko Ivezic, on the basis that he is acting DMSS until Mario's successor is appointed.