  Data Management / DM-15307

Include Sources in Prompt Products Database


    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Design Documents
    • Labels:
    • Team:
      DM Science

      Description

      Right now, the DPDD does not say that we will save Sources from science images during alert production; we only save DIASources and DIAObjects, so the direct-image results are not guaranteed to be in a database until DRP.

      I'm opening this ticket to track potential use cases where Sources are important in AP, or potentially to track any reasons not to save Sources.


            Activity

            ctslater Colin Slater created issue -
            ctslater Colin Slater made changes -
            Field: Risk Score | Original Value: (none) | New Value: 0
            ctslater Colin Slater added a comment -

            One case from a science collaboration member: they would like to (as I understand it) forward model supernova light curves on the direct images, not the difference images. To do so they need astrometric/photometric transformations between various pairs of images, which their past experience suggests can be more precise if the transformations are computed directly between the images rather than each being calibrated to an absolute frame.
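As a rough illustration of computing a transformation directly between two images, the sketch below fits a six-parameter affine map from pixel positions in image A to pixel positions in image B, using sources already matched between the two exposures. The affine model, the prior matching step, and the names `fit_affine`/`apply_affine` are assumptions for illustration only; the collaboration member's actual method is not described in this ticket and may include higher-order or photometric terms. The inputs are per-image Source positions, which is why this use case depends on direct-image Sources being available.

```python
import numpy as np


def fit_affine(xy_a, xy_b):
    """Fit xy_b ~= M @ xy_a + t by linear least squares (hypothetical helper).

    xy_a, xy_b: (N, 2) arrays of matched source positions, N >= 3.
    Returns (M, t): a 2x2 matrix and a length-2 offset.
    """
    xy_a = np.asarray(xy_a, dtype=float)
    xy_b = np.asarray(xy_b, dtype=float)
    # Design matrix [x, y, 1], shared by both output coordinates.
    design = np.column_stack([xy_a, np.ones(len(xy_a))])
    # Solve for both output columns at once; coeffs has shape (3, 2).
    coeffs, *_ = np.linalg.lstsq(design, xy_b, rcond=None)
    M = coeffs[:2].T  # linear part (rotation, scale, shear)
    t = coeffs[2]     # translation
    return M, t


def apply_affine(M, t, xy):
    """Map positions from image A's pixel frame into image B's."""
    return np.asarray(xy, dtype=float) @ M.T + t
```

A pairwise fit like this never passes through an absolute reference frame, which is the property the comment above describes.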

             

            Totally switching gears: there are almost certainly dwarf galaxies that will be discoverable in a single epoch of LSST imaging. The standard way to find them is by making appropriate color cuts on the source catalogs and finding overdensities of main-sequence turn-off stars or giants. Because there are strong incentives to be the first person to discover the dwarfs, it is very likely that users will not wait for a data release and will instead try to download and photometer images on their own if source catalogs are not available (this happened with DES).
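The "color cuts plus overdensities" search described above can be sketched as a toy example under stated assumptions: a flat ra/dec grid over a small field, a simple g-r color window plus a magnitude limit, and a Poisson-style threshold above the median cell count. The thresholds and the function name `find_overdensities` are hypothetical; this is not a published dwarf-galaxy search algorithm.

```python
import math
from collections import Counter
from statistics import median


def find_overdensities(sources, color_min, color_max, mag_max, cell_deg, n_sigma=5.0):
    """Return grid cells whose count of color-selected stars is a Poisson outlier.

    sources: iterable of dicts with 'ra', 'dec' (degrees) and 'g', 'r' (mags).
    The flat ra/dec grid is only reasonable over a small patch of sky.
    """
    counts = Counter()
    for s in sources:
        color = s["g"] - s["r"]
        # Color-magnitude cut, e.g. to isolate candidate turn-off stars.
        if color_min <= color <= color_max and s["r"] <= mag_max:
            cell = (math.floor(s["ra"] / cell_deg), math.floor(s["dec"] / cell_deg))
            counts[cell] += 1
    if not counts:
        return []
    # Background: median count over occupied cells (empty cells are ignored,
    # which is a crude simplification).
    background = median(counts.values())
    threshold = background + n_sigma * math.sqrt(background)
    return sorted(cell for cell, n in counts.items() if n > threshold)
```

Note that the search only needs positions and two magnitudes per Source, not the full Source schema.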

             

             

             

            ebellm Eric Bellm added a comment (edited) -

            Do we (or could we) store direct-image Source catalogs on disk? This would enable both of those use cases without requiring additions to the PPDB. The number of Sources is much larger than the number of DIASources because most objects are not variable at LSST precision, so I am concerned about the implications for the PPDB sizing.

            ctslater Colin Slater added a comment -

            We could definitely just store catalogs in files on disk instead of in the database, and in many ways that might be the better solution. The one problem is that we have never specified a DPDD-level data product that is only delivered in this way; everything is generally a table in a database or an image (or attached to an image). Going that route would make the storage side easier but requires more work on the story for how users will access the data. I'll do some asking around at the all-hands and see what that might look like on the technical side.

            gpdf Gregory Dubois-Felsmann made changes -
            Labels: dm-sst lse-163 → LSE-163 dm-sst
            ktl Kian-Tat Lim added a comment -

            Sources, even for only a year (or 18 months) until the next DR comes out, are still over a PB including various overheads.  The addition of extra data products of this size needs careful consideration.

            If we could somehow keep only the first Source at any given position, that would satisfy the dwarf galaxy use case without needing as much storage.
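One way to read "keep only the first Source at any given position" is as a first-detection filter: process Sources in time order and discard any that fall within a match radius of a Source already kept. The sketch below assumes a flat-sky, small-angle distance, ignores RA wraparound at 0/360 degrees, and uses an illustrative function name (`keep_first_detections`); a production version would presumably use a proper sky pixelization (for example HEALPix or HTM) rather than this ad hoc grid hash.

```python
import math


def keep_first_detections(sources, radius_deg):
    """Keep a Source only if no earlier kept Source lies within radius_deg.

    sources: iterable of dicts with 'ra', 'dec' (degrees) and 'mjd'.
    Uses a grid hash with cell size radius_deg, so only the 3x3
    neighbouring cells can contain a match.
    """
    grid = {}
    kept = []
    for s in sorted(sources, key=lambda s: s["mjd"]):
        # Scale ra by cos(dec) so grid cells are roughly square on the sky.
        x = s["ra"] * math.cos(math.radians(s["dec"]))
        y = s["dec"]
        ix, iy = math.floor(x / radius_deg), math.floor(y / radius_deg)
        neighbours = (
            p
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for p in grid.get((ix + dx, iy + dy), ())
        )
        if not any(math.hypot(x - px, y - py) <= radius_deg for px, py in neighbours):
            grid.setdefault((ix, iy), []).append((x, y))
            kept.append(s)
    return kept
```

Because the earliest detection wins, the kept catalog inherits whatever conditions that first visit had, which is the coverage-quality concern raised in the next comment.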

            gpdf Gregory Dubois-Felsmann added a comment -

            Pretty worried about erratic coverage quality if we literally did that, with some parts of the sky poorly covered because their first observation was on a bad night. But maybe that's OK given that uniform coverage is available relatively soon thereafter - the early discoveries might be misleading because of non-uniform sky coverage, but still interesting individually, and then the picture would be clarified at the next DR.

            ebellm Eric Bellm added a comment -

            Regarding Colin Slater's initial use case of forward-modeling lightcurves, I would be more than happy to let that be in user-generated processing--unlike the dwarf galaxy case there are a finite number of supernovae (and hence images) you'd need to do it for. It's not even clear to me that there's urgency to do it before the Data Release, which would yield the highest-precision results anyway.

            lguy Leanne Guy made changes -
            Due Date 30/Nov/18
            ctslater Colin Slater added a comment -

            That's a fair point on the data volume; I had not looked up the numbers myself. I could imagine another minimal solution would be to store ~14 or 30 days of Sources. (One could reasonably argue that if you really need Sources from 6 months ago, you should just wait for the DR.) There might also be diagnostic merit in having that data, at least in the early stages of the survey. 
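A ~14- or 30-day retention policy amounts to a rolling window over observation time. The sketch below is a minimal in-memory illustration, assuming Sources arrive in non-decreasing time order (as they would when produced night by night); the class name `RollingSourceStore` and its methods are hypothetical and say nothing about how the PPDB or a file store would actually implement expiry.

```python
from collections import deque


class RollingSourceStore:
    """Hold Sources from only the most recent `window_days` days (hypothetical)."""

    def __init__(self, window_days):
        self.window_days = window_days
        self._entries = deque()  # (mjd, source) pairs in arrival order

    def add(self, mjd, source):
        self._entries.append((mjd, source))

    def expire(self, now_mjd):
        """Drop every Source observed more than window_days before now_mjd."""
        cutoff = now_mjd - self.window_days
        # Time-ordered arrival means expired entries are always at the left.
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()

    def sources(self):
        return [source for _, source in self._entries]

    def __len__(self):
        return len(self._entries)
```

Under this kind of policy the storage footprint scales with the window length rather than the time since the last DR.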

            lguy Leanne Guy made changes -
            Labels: LSE-163 dm-sst → LSE-163
            lguy Leanne Guy made changes -
            Remote Link This issue links to "Page (Confluence)" [ 28320 ]
            gpdf Gregory Dubois-Felsmann added a comment -

            Repeating a point from the SST meeting today: this question is also tightly bound up with the "what payload are we actually running on new data in LOY1?" question. For sky regions where we don't have templates yet, will there be any catalog data products at all, or just calibrated images (which we only keep for 30 days before dropping them and retaining only lossy-compressed versions)? In some of these cases, wouldn't there be even more demand for Source-like data?

            Some of this was already discussed in DMTN-107, Options for Alert Production in LSST Operations Year 1.

            mjuric Mario Juric added a comment -

            This was raised at the 2021 PCW today, so I wanted to chime in that for some science cases (e.g., searching for Solar System objects where there are no templates) even severely pared-down source catalogs (just ra, dec, psfMag) would be extremely useful, even if delivered as a series of FITS files. More generally, if data volume is an issue, I suspect a subset of columns may satisfy a number of the use cases discussed here.

            lguy Leanne Guy added a comment -

            I'll add this to a DM-SST meeting in September for discussion.

            lguy Leanne Guy made changes -
            Remote Link This issue links to "Page (Confluence)" [ 29434 ]
            zivezic Zeljko Ivezic added a comment -

            Just in case I miss the DM-SST meeting: it seems to me that it would be very useful, for a variety of reasons, to have Source catalogs for the last k months (k~1) available in some form. As for the data volume, we can pare down the columns as Mario already said, and also play rounding tricks, similar to those used with early versions of the SDSS quasar catalog (e.g., don't store mag=12.3456789 but rather 12.35 or 12.346).
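To put rough numbers on pared-down columns plus rounding, here is a sketch of a hypothetical packed row holding only ra, dec and psfMag as fixed-point integers: ra and dec in units of 1e-7 degree (about 0.36 mas) and psfMag in millimag, which reproduces the "12.346" example above. The layout and the helper names are assumptions, not an LSST schema. It takes 10 bytes per row versus 24 for the same three columns stored as float64, before any file-format or index overhead, and the rounding error is at most half a quantization step (about 0.18 mas in position, 0.0005 mag).

```python
import struct

# Hypothetical packed layout (little-endian, no padding):
#   ra     : uint32, units of 1e-7 deg (0..360 deg fits below 2**32)
#   dec    : int32,  units of 1e-7 deg (-90..90 deg)
#   psfMag : int16,  units of 1e-3 mag (covers -32.768..32.767 mag)
PACKED = struct.Struct("<Iih")
UNPACKED = struct.Struct("<ddd")  # the same three columns as float64


def pack_row(ra, dec, psf_mag):
    """Round each column to its fixed-point step and pack into 10 bytes."""
    return PACKED.pack(round(ra * 1e7), round(dec * 1e7), round(psf_mag * 1e3))


def unpack_row(buf):
    """Recover (ra, dec, psfMag) in degrees and magnitudes."""
    ra_q, dec_q, mag_q = PACKED.unpack(buf)
    return ra_q / 1e7, dec_q / 1e7, mag_q / 1e3
```

NaN magnitudes (e.g. non-detections) would need a sentinel integer, which this sketch does not handle.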

            lguy Leanne Guy made changes -
            Remote Link This issue links to "Page (Confluence)" [ 30402 ]

              People

              Assignee:
              ctslater Colin Slater
              Reporter:
              ctslater Colin Slater
              Watchers:
              Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, Kian-Tat Lim, Leanne Guy, Mario Juric, Zeljko Ivezic
              Votes:
              0
              Watchers:
              7

                Dates

                Due:
                Created:
                Updated:
