  1. Data Management
  2. DM-14555

Compare Gen3 Butler schema with CAOM

    Details

    • Team:
      Architecture

      Description

      Write up a comment for RFC-484 that compares Jim's Gen3 Butler schema (dmtn-073.lsst.io) with CAOM and documents anything that should be changed to align them.

      Due 2018-06-19, but any major inconsistencies should be identified by 2018-06-01.

            Activity

            bvan Brian Van Klaveren added a comment -

            For CAOM2, the main sort-of problem I see is that Exposures can't (easily) be grouped as a full focal plane observation without a Visit defined, so there appears to be no way to logically group a "snap". That said, that just means an Exposure is not analogous to an Observation in CAOM2. In particular, since a Visit may consist of a single snap, a Visit as defined in the Visit table can be either a CompositeObservation (e.g. 2x15s Snaps) or a SimpleObservation (e.g. 1x30s Snap). That's probably fine, though there seems to be no easy way of determining which kind of Observation a Visit is without joining to the Exposure table and counting the unique Snaps. You have to join through the Exposure table in any case (plus a few more tables) to determine the equivalent of the Collection field in the CAOM2 Observation. One major question is intent, though I think "intent" in CAOM2-speak is really implicit in the Gen3 definition of a Dataset, as are type and metaRelease.
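
            A minimal sketch of the "join to Exposure and count the unique Snaps" check, assuming heavily simplified stand-ins for the Visit and Exposure tables. The column names `visit_id` and `snap` and the helper `classify_visits` are hypothetical and are not taken from the Gen3 schema document:

```python
import sqlite3

# Hypothetical, simplified tables; the real Gen3 schema has more columns
# and different keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Visit (visit_id INTEGER PRIMARY KEY);
CREATE TABLE Exposure (
    exposure_id INTEGER PRIMARY KEY,
    visit_id INTEGER REFERENCES Visit(visit_id),
    snap INTEGER               -- snap index within the visit
);
INSERT INTO Visit VALUES (1), (2);
-- Visit 1 has two snaps (e.g. 2x15s); visit 2 has one (e.g. 1x30s).
INSERT INTO Exposure (visit_id, snap) VALUES (1, 0), (1, 1), (2, 0);
""")


def classify_visits(connection):
    """Map each visit_id to the CAOM2 observation kind it would correspond to."""
    rows = connection.execute("""
        SELECT v.visit_id, COUNT(DISTINCT e.snap) AS n_snaps
        FROM Visit v
        JOIN Exposure e ON e.visit_id = v.visit_id
        GROUP BY v.visit_id
    """).fetchall()
    return {
        visit_id: "CompositeObservation" if n_snaps > 1 else "SimpleObservation"
        for visit_id, n_snaps in rows
    }


kinds = classify_visits(conn)
print(kinds)
```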

            In terms of defining an Observation in CAOM via a Gen3 schema, we need to approximately execute the following:

            Create View Observation as Visit JOIN Exposure JOIN Dataset JOIN DatasetCollection (+ a few extras)
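
            Spelled out, the join chain might look roughly like the following. This is only a sketch: the foreign keys (`visit_id`, `exposure_id`, `dataset_id`), the `camera` and `collection` columns, and the output column names are assumptions made for illustration, and the "few extras" are omitted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for the Gen3 tables.
CREATE TABLE Visit (visit_id INTEGER PRIMARY KEY, camera TEXT);
CREATE TABLE Exposure (exposure_id INTEGER PRIMARY KEY, visit_id INTEGER, snap INTEGER);
CREATE TABLE Dataset (dataset_id INTEGER PRIMARY KEY, exposure_id INTEGER, dataset_type TEXT);
CREATE TABLE DatasetCollection (dataset_id INTEGER, collection TEXT);

-- One Observation row per (collection, visit); DISTINCT collapses the
-- per-exposure and per-dataset rows produced by the joins.
CREATE VIEW Observation AS
SELECT DISTINCT
    dc.collection AS collection,
    v.visit_id    AS observationID,
    v.camera      AS instrument_name
FROM Visit v
JOIN Exposure e           ON e.visit_id = v.visit_id
JOIN Dataset d            ON d.exposure_id = e.exposure_id
JOIN DatasetCollection dc ON dc.dataset_id = d.dataset_id;

INSERT INTO Visit VALUES (1, 'HypotheticalCam');
INSERT INTO Exposure VALUES (10, 1, 0), (11, 1, 1);
INSERT INTO Dataset VALUES (100, 10, 'raw'), (101, 11, 'raw');
INSERT INTO DatasetCollection VALUES (100, 'run1'), (101, 'run1');
""")

rows = conn.execute("SELECT * FROM Observation").fetchall()
print(rows)
```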

            We don't have, within the Gen3 Schema, a way of determining the algorithm used to pick the Snap (scheduler?), nor do we have the environmental information (e.g. Ambient Temperature). It would be necessary to define how the Visits/Snaps join to the EFD to determine this.

            CAOM2 definition for Instrument is roughly equivalent to our definition for Camera. Again, a few joins would be necessary to represent this.

            I think a Plane is roughly equivalent to a combination of Visit, Exposure, and Dataset, though filter information would need to be converted into Energy information about the Observation/Visit. It may also be applicable to coadded images.

            I'm not quite sure how coadded images/multi-camera observations are easily represented in either Gen3 (are they a logical visit?) or CAOM2 (I think they are modeled as multiple planes for an observation, but then the 1:1 relation of attributes of Observation, e.g. Environment, appears to fall apart). A full sky image might just be a single CompositeObservation as well.

            The artifacts attribute of a Plane is roughly equivalent to the Dataset. One tricky thing is that the "access URL" would likely need to be computed from attributes of a Dataset and translated to an imgserv URL, for example, if we were modeling this as a View. That's not so bad, but it might mean we have multiple views in a database defining CAOM2 Tables (views) for each service instance. That's also not terrible: implementation-wise, we'd probably want to create a user/tablespace for each individual imgserv service we deploy, each of which materializes the access URLs that are relevant for that particular service.
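
            As a rough illustration of computing a per-service access URL from Dataset attributes: the host names, the `/image` path, and the query parameter names below are all made up for the sketch and do not describe the actual imgserv API.

```python
from urllib.parse import urlencode


def access_url(service_base, dataset):
    """Build a hypothetical access URL for one Dataset on one service instance."""
    query = urlencode({
        "dataset_type": dataset["dataset_type"],
        "dataset_id": dataset["dataset_id"],
    })
    return f"{service_base.rstrip('/')}/image?{query}"


# Two hypothetical service deployments; each would materialize its own URLs.
services = ["https://imgserv-a.example.org/", "https://imgserv-b.example.org"]
dataset = {"dataset_type": "calexp", "dataset_id": 42}

urls = {base: access_url(base, dataset) for base in services}
for url in urls.values():
    print(url)
```

            The same Dataset yields a different URL per service instance, which is why one view (or materialized table) per deployed service would be needed.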

            In short, I think Gen3 is fine for representing PVIs in CAOM2, and the mapping can mostly be done by performing joins. Implementation-wise, we'll want those joins to be fast, but I believe the foreign key relations will ensure there are indices for most of them, so that should be fine. I'm not fully sure how we model coadds/full sky images/multi-camera/etc., except to model them as a single composite observation and drop most of the attributes of an "Observation" on the floor (we need to investigate prior art, if any). I think we'd potentially need another table in order to represent that. It's probably not easy to wholly adopt CAOM2 for Butler Gen3, especially as CAOM2 nearly requires a materialized URL for accessing the data, and we don't have a single service defined to implement that.

            ktl Kian-Tat Lim added a comment -

            "Intent" in terms of which scheduler "proposal" "asked" for the image to be taken will come from the EFD, as will a bevy of ambient temperature readings.  If there's one particular temperature that we should provide, we can arrange for that to be generated by the EFD Transformation.

            I don't think coadds are or need to be Observations for CAOM at all.

            jbosch Jim Bosch added a comment -

            Create View Observation as Visit JOIN Exposure JOIN Dataset JOIN DatasetCollection

            Do I understand correctly that to create a view for SimpleObservation, you'd have no GROUP BY here, while for CompositeObservation, you would GROUP BY Visit and somehow aggregate snap-level quantities from Exposure?  If so, I don't actually understand why Exposure can't be an Observation, but I'm willing to accept that if it means I don't need to educate myself about CAOM2 in detail now.
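
            To make the question concrete, here is a sketch of the two query shapes being contrasted. The `exposure_time` and `start_mjd` columns and the choice of SUM/MIN as aggregates are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified Exposure table: one row per snap.
CREATE TABLE Exposure (
    exposure_id INTEGER PRIMARY KEY,
    visit_id INTEGER,
    snap INTEGER,
    exposure_time REAL,
    start_mjd REAL
);
INSERT INTO Exposure VALUES
    (1, 7, 0, 15.0, 58000.0),
    (2, 7, 1, 15.0, 58000.0002);

-- Composite case: GROUP BY visit and aggregate snap-level quantities.
CREATE VIEW CompositeObservation AS
SELECT visit_id,
       COUNT(*)           AS n_snaps,
       SUM(exposure_time) AS total_exposure_time,
       MIN(start_mjd)     AS start_mjd
FROM Exposure
GROUP BY visit_id;
""")

# Simple case: no GROUP BY, one row per Exposure.
simple = conn.execute(
    "SELECT exposure_id, exposure_time FROM Exposure ORDER BY exposure_id"
).fetchall()
composite = conn.execute("SELECT * FROM CompositeObservation").fetchall()
print(simple)
print(composite)
```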

            Would it be a problem if we didn't have a CAOM2 representation of calibration Exposures?  I was intending to create Visits for those.

            It would be necessary to define how the Visits/Snaps join to the EFD to determine this.

            The Gen3 schema is permitted to also include additional per-Visit or per-Exposure tables that are specific to a particular Camera.  I think I'd advocate for putting LSST-specific values there, and populating those tables from the EFD at or around raw data ingest.

             

            tjenness Tim Jenness added a comment -

            I'm a little confused by the conversation here and I suggest that Brian Van Klaveren contact Pat Dowler to clarify some concepts. At my previous telescope we used CAOM2 and it was fully able to represent multiple exposures within a single observation (even if they were at different wavelengths), data products derived from a single observation (e.g. a PVI of a visit), and coadds combining multiple observations.

            tjenness Tim Jenness added a comment -

            Gregory Dubois-Felsmann you may have missed this ticket in your ObsCore musings.

            tjenness Tim Jenness added a comment -

            Gregory Dubois-Felsmann do you feel that the ObsCore compliance of raws in gen3 registry is sufficient to allow this ticket to be shut down?


              People

              Assignee:
              gpdf Gregory Dubois-Felsmann
              Reporter:
              ktl Kian-Tat Lim
              Watchers:
              Brian Van Klaveren, Gregory Dubois-Felsmann, Jim Bosch, Kian-Tat Lim, Tim Jenness
              Votes:
              0
