  1. Data Management
  2. DM-15307

Include Sources in Prompt Products Database

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Design Documents
    • Labels:
    • Team:
      DM Science

      Description

      Right now, the DPDD does not say that we will save Sources from science images during alert production; we only save DIASources and DIAObjects, so the direct-image results are not guaranteed to be in a database until DRP.

      I'm opening this ticket to track potential use cases where Sources are important in AP, or potentially to track any reasons not to save Sources.

          Activity

          ctslater Colin Slater added a comment -

          One case from a science collaboration member: they would like to (as I understand it) forward model supernova light curves on the direct images, not the difference images. To do so they need astrometric/photometric transformations between various pairs of images, which their past experience suggests can be more precise if the transformations are computed directly between the images rather than each being calibrated to an absolute frame.

           

          Totally switching gears: there are almost certainly dwarf galaxies that will be discoverable in a single epoch of LSST imaging. The standard way to find them is by making appropriate color cuts on the source catalogs and finding overdensities of main sequence turn-off stars or giants. Because there are strong incentives to be the first person to discover the dwarfs, it is very likely that users will not wait for a data release and will instead try to download and photometer images on their own if source catalogs are not available (this happened with DES).

          ebellm Eric Bellm added a comment - - edited

          Are we storing (or could we store) direct-image Source catalogs on disk? This would enable both of those use cases without requiring additions to the PPDB. The number of Sources is much larger than the number of DIASources because most objects are not variable at LSST precision, so I am concerned about the implications for the PPDB sizing.

          ctslater Colin Slater added a comment -

          We could definitely just store catalogs in files on disk instead of in the database, and in many ways that might be the better solution. The one problem is that we have never specified a DPDD-level data product that is only delivered in this way; everything is generally a table in a database or an image (or attached to an image). Going that route would make the storage side easier but requires more work on the story for how users will access the data. I'll do some asking-around at all hands and see what that might look like on the technical side.

          ktl Kian-Tat Lim added a comment -

          Sources, even for only a year (or 18 months) until the next DR comes out, are still over a PB including various overheads.  The addition of extra data products of this size needs careful consideration.

          If we could somehow keep only the first Source at any given position, that would satisfy the dwarf galaxy use case without needing as much storage.
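
          For illustration only, the "first Source at any given position" idea could be sketched as below. This is a minimal sketch and not a proposed design: the `Source` record, the `cell_index` and `keep_first_sources` helpers, and the 1 arcsecond grid are all assumptions made for the example, and a real implementation would more likely use a proper sky pixelization or spatial cross-match.

```python
import math
from typing import Dict, Iterable, NamedTuple, Tuple


class Source(NamedTuple):
    """Hypothetical minimal Source record (not the DPDD schema)."""
    ra_deg: float
    dec_deg: float
    mjd: float
    flux: float


def cell_index(ra_deg: float, dec_deg: float,
               cell_arcsec: float = 1.0) -> Tuple[int, int]:
    """Map a sky position to a crude grid cell (assumed scheme)."""
    cell_deg = cell_arcsec / 3600.0
    dec_bin = math.floor((dec_deg + 90.0) / cell_deg)
    # Scale RA by cos(dec) at the bin centre so cells are roughly square.
    dec_center = (dec_bin + 0.5) * cell_deg - 90.0
    cos_dec = max(math.cos(math.radians(dec_center)), 1e-6)
    ra_bin = math.floor((ra_deg % 360.0) * cos_dec / cell_deg)
    return dec_bin, ra_bin


def keep_first_sources(sources: Iterable[Source],
                       cell_arcsec: float = 1.0) -> Dict[Tuple[int, int], Source]:
    """Keep only the earliest Source (by MJD) seen in each grid cell."""
    first: Dict[Tuple[int, int], Source] = {}
    for src in sources:
        key = cell_index(src.ra_deg, src.dec_deg, cell_arcsec)
        kept = first.get(key)
        if kept is None or src.mjd < kept.mjd:
            first[key] = src
    return first
```

          Positions that fall near a cell boundary can land in neighbouring cells on different visits, so a fixed grid like this only approximates "one Source per object".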

          gpdf Gregory Dubois-Felsmann added a comment -

          Pretty worried about erratic coverage quality if we literally did that, with some parts of the sky poorly covered because their first observation was on a bad night. But maybe that's OK given that uniform coverage is available relatively soon thereafter - the early discoveries might be misleading because of non-uniform sky coverage, but still interesting individually, and then the picture would be clarified at the next DR.

          ebellm Eric Bellm added a comment -

          Regarding Colin Slater's initial use case of forward-modeling lightcurves, I would be more than happy to let that be in user-generated processing: unlike the dwarf galaxy case, there is a finite number of supernovae (and hence images) you'd need to do it for. It's not even clear to me that there's urgency to do it before the Data Release, which would yield the highest-precision results anyway.

          ctslater Colin Slater added a comment -

          That's a fair point on the data volume; I had not looked up the numbers myself. I could imagine another minimal solution would be to store ~14 or 30 days of Sources. (One could reasonably argue that if you really need Sources from 6 months ago, you should just wait for the DR.) There might also be diagnostic merit in having that data, at least in the early stages of the survey. 
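
          As a rough illustration of what a fixed retention window could look like for catalogs stored as files, here is a minimal sketch. The `SourceFile` record, the `split_by_retention` helper, and the file paths are hypothetical names invented for this example; nothing here reflects an existing LSST interface.

```python
from typing import Iterable, List, NamedTuple, Tuple


class SourceFile(NamedTuple):
    """Hypothetical per-visit Source catalog file (assumed layout)."""
    path: str
    obs_mjd: float  # observation date of the visit, in MJD


def split_by_retention(files: Iterable[SourceFile], now_mjd: float,
                       retention_days: float = 30.0
                       ) -> Tuple[List[SourceFile], List[SourceFile]]:
    """Split catalogs into (keep, expire) for a rolling retention window."""
    cutoff = now_mjd - retention_days
    keep: List[SourceFile] = []
    expire: List[SourceFile] = []
    for f in files:
        # Files observed on or after the cutoff are still inside the window.
        (keep if f.obs_mjd >= cutoff else expire).append(f)
    return keep, expire
```

          Calling `split_by_retention(files, now_mjd, retention_days=14.0)` instead of the default 30 days gives the shorter of the two windows mentioned above.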


            People

            • Assignee:
              ctslater Colin Slater
              Reporter:
              ctslater Colin Slater
              Watchers:
              Colin Slater, Eric Bellm, Gregory Dubois-Felsmann, John Swinbank, Kian-Tat Lim, Leanne Guy