Data Management / DM-13690

Write up Gen3 Butler / obs_* package interface sketch

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: butler
    • Labels:
    • Story Points:
      2
    • Sprint:
      BG3_S18_02, BG3_S18_03, BG3_S18_04, BG3_S18_05, BG3_F18_06, BG3_F18_07
    • Team:
      Data Release Production

      Description

      Output will be a Confluence page that describes what the Gen3 Butler will expect from obs_* packages.  The intended audience is Simon Krughoff and the obs_* package working group.

       

      Page is at https://confluence.lsstcorp.org/display/DM/Gen3+Middleware+Camera+Specialization+Interfaces

          Activity

          jbosch Jim Bosch added a comment -

          Simon Krughoff, you represent the intended audience for this page, so I think it makes sense to ask you to review it.  Shall we say that the review includes making sure the document is clear, but does not involve actually resolving the actions within it?

          jbosch Jim Bosch added a comment -

          Simon Krughoff: ping on a review request.  I'm not sure it still makes sense to think of the obs WG as the target audience for this page, but I imagine you're still the right person to take a first look at it.

           

          krughoff Simon Krughoff added a comment -

          Jim Bosch will do. Sorry for dragging my feet on getting to this.

          krughoff Simon Krughoff added a comment -

          Jim Bosch very sorry for taking so long to get to this. Comments below:

          • You talk about limiting keys for the DataUnits. Is there a system for validating acceptable keys?
          • Is it true that every DataUnit key can have a tuple of values like patch, or will the keys that accept tuples need to be configured somehow?
          • I'm a little worried about doing away with templates completely since different datasets look different by construction, but I don't fully comprehend the implementation you are suggesting, so I'm ok with seeing how things shape up.
          • I note the tasks for the obs WG. I do think it would be good to convene enough of a WG to, at least, look through these documents.
          • I'm a little worried that it sounds like you are suggesting adding camera info to the raw data repository. I'd hoped any camera information other than the camera name would go in a calibration repository.
          • Re: monotonically increasing exposure ids, I know this would be nice, but it seems like a really easy way to introduce bugs. That is, how do you check that this requirement is met by a particular dataset? A better requirement seems to be that exposures must supply the observation date, and that we have utilities to convert that date to a monotonically increasing id.
          • Master calibrations section: second bullet, the first sentence ends in what seems like a fragment.
          • It would be good to make sure acronyms are spelled out in the first usage.
          • I'm a little worried that the concept of a visit is special solely because of the current LSST survey design. Are there other similar concepts used by other surveys that should be called out as special concepts?
          • I made some typographic changes to the page. Hopefully that's ok.
          • I do think it would be good for more than just me to think about the tasks called out in that page. Maybe we could take a couple of hours at the all hands to get the obs WG together to consider those issues and then disband the WG in favor of a more focused implementation group.
          jbosch Jim Bosch added a comment -

           

          You talk about limiting keys for the DataUnits. Is there a system for validating acceptable keys?

          At what point?  The set of keys is decided at design time (it's effectively RFC-484).  We'll certainly validate keys (and values) when they're passed to get and put.  I could imagine it being useful for programs to be able to check that a set of keys is valid for a particular DatasetType at other points, but I haven't come across any concrete use cases for that so I'm not sure what the API ought to be. I imagine anything along those lines would be easy to do.
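
          A minimal sketch of what validation at get/put time could look like. Everything here is an assumption for illustration: the SCHEMA mapping, the DataUnit key names, and validate_data_id are hypothetical and are not taken from the Butler API or from RFC-484.

```python
# Hypothetical schema: DatasetType name -> {DataUnit key: expected Python type}.
# The key names and DatasetTypes are illustrative assumptions only.
SCHEMA = {
    "raw": {"camera": str, "exposure": int, "sensor": int},
    "deepCoadd": {"skymap": str, "tract": int, "patch": int, "abstract_filter": str},
}


def validate_data_id(dataset_type, data_id):
    """Check that data_id has exactly the keys the DatasetType expects,
    and that each value has the expected (scalar) type."""
    if dataset_type not in SCHEMA:
        raise LookupError(f"unknown DatasetType {dataset_type!r}")
    expected = SCHEMA[dataset_type]
    missing = set(expected) - set(data_id)
    unexpected = set(data_id) - set(expected)
    if missing or unexpected:
        raise ValueError(
            f"missing keys {sorted(missing)}, unexpected keys {sorted(unexpected)}"
        )
    for key, value in data_id.items():
        if not isinstance(value, expected[key]):
            raise TypeError(
                f"{key!r} must be {expected[key].__name__}, got {type(value).__name__}"
            )
    return True
```

          The same function could be called by a program that wants to check a set of keys ahead of time, which is the "other points" case mentioned above.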

           

          Is it true that every DataUnit key can have a tuple of values like patch, or will the keys that accept tuples need to be configured somehow?

          Actually, none of them will be allowed to have tuple values.  Patches will be switched to a single sequential integer (à la RFC-365).  In fact, the value for a particular key will at some level be strongly typed, because it will appear in a SQL schema.  But we will also have separate cell_x and cell_y integer fields in the Patch table that would allow something very much like the old indexes to be used in expressions.
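
          To illustrate the idea, here is a sketch of a Patch table with a sequential integer key plus cell_x and cell_y columns. The table layout, the row-major numbering, and the helper names are assumptions made for this example; the actual schema and the numbering convention of RFC-365 may differ.

```python
import sqlite3


def patch_index(cell_x, cell_y, nx):
    """Assumed row-major numbering: one sequential integer per (cell_x, cell_y)."""
    return cell_y * nx + cell_x


def build_patch_table(nx, ny):
    """Create an in-memory Patch table for a single tract of nx * ny patches."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Patch (patch INTEGER PRIMARY KEY, cell_x INTEGER, cell_y INTEGER)"
    )
    rows = [(patch_index(x, y, nx), x, y) for y in range(ny) for x in range(nx)]
    db.executemany("INSERT INTO Patch VALUES (?, ?, ?)", rows)
    return db


def patches_in_column(db, cell_x, y_min, y_max):
    """Select patches with an expression on the old-style (x, y) indexes."""
    cursor = db.execute(
        "SELECT patch FROM Patch WHERE cell_x = ? AND cell_y BETWEEN ? AND ? ORDER BY patch",
        (cell_x, y_min, y_max),
    )
    return [row[0] for row in cursor]
```

          With a 3 x 3 tract, patches_in_column(build_patch_table(3, 3), 1, 0, 1) selects the patches at (1, 0) and (1, 1) while the data ID itself only ever carries the single integer.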

           

          I'm a little worried about doing away with templates completely since different datasets look different by construction, but I don't fully comprehend the implementation you are suggesting, so I'm ok with seeing how things shape up.

          We aren't getting rid of them completely - the main (POSIX) Datastore will still use templates in put; we'll just then put the full filename in the database and use a query to look it up from the data ID when reading, so there will be no re-insertion of the data ID values into the template in get.
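
          A toy sketch of that put/get split, assuming a single file template and an SQLite table standing in for the database. The class name, the template string, and the column names are hypothetical and do not reflect the actual Datastore or Registry interfaces.

```python
import sqlite3

# Hypothetical file template; used only when writing.
TEMPLATE = "{dataset_type}/{exposure:08d}/{dataset_type}_{exposure:08d}_{sensor:03d}.fits"


class ToyDatastore:
    """Expands the template in put, then records the resulting path."""

    def __init__(self, template=TEMPLATE):
        self.template = template
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE dataset ("
            "dataset_type TEXT, exposure INTEGER, sensor INTEGER, path TEXT, "
            "PRIMARY KEY (dataset_type, exposure, sensor))"
        )

    def put(self, dataset_type, exposure, sensor):
        # The template is consulted exactly once, at write time.
        path = self.template.format(
            dataset_type=dataset_type, exposure=exposure, sensor=sensor
        )
        self.db.execute(
            "INSERT INTO dataset VALUES (?, ?, ?, ?)",
            (dataset_type, exposure, sensor, path),
        )
        return path

    def get_path(self, dataset_type, exposure, sensor):
        # Reading is a query on the data ID; the template is never re-expanded.
        row = self.db.execute(
            "SELECT path FROM dataset "
            "WHERE dataset_type = ? AND exposure = ? AND sensor = ?",
            (dataset_type, exposure, sensor),
        ).fetchone()
        if row is None:
            raise LookupError("no dataset recorded for this data ID")
        return row[0]
```

          One consequence of this design is that the template can change later without breaking reads of files that were already written, since their full paths are stored.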

           

          I note the tasks for the obs WG. I do think it would be good to convene enough of a WG to, at least, look through these documents.

          I do think it would be good for more than just me to think about the tasks called out in that page. Maybe we could take a couple of hours at the all hands to get the obs WG together to consider those issues and then disband the WG in favor of a more focused implementation group.

          Agreed, though I need to get moving on at least some of these tasks before then, so it's likely there will be at least placeholder interfaces/objects in the works by the time the WG has a chance to take a look.  And the various "review DMTN-073" tasks may be better done under the auspices of the more structured review of that document that I believe Fritz Mueller will be organizing, though of course the Obs WG very much includes many of the most important people to look at that.

           

          I'm a little worried that it sounds like you are suggesting adding camera info to the raw data repository. I'd hoped any camera information other than the camera name would go in a calibration repository.

          The camera information would definitely go into the Gen3 equivalent of a calibration repository, but there won't be such a strong boundary between calibration repositories and raw data repositories in Gen3; instead there will be one master repository that includes raw data, multiple sets of calibrations, and many sets of processing outputs.  Essentially, because Gen3 has a more general system for expressing many-to-many relationships between datasets, the relationship between calibrations and raw data is no longer as special as it once was, so declaring the set of calibrations you want to use in a processing run will be very similar to declaring a set of intermediate main-pipeline outputs you want to start from.

           

          Re: monotonically increasing exposure ids, I know this would be nice, but it seems like a really easy way to introduce bugs. I.e. how do you check that this requirement is met by a particular dataset. It seems like a better requirement that exposures must supply the observation date and that we have utilities to convert that to a monotonically increasing id.

          We will definitely have the observation date, too, but I'd very much like to minimize having multiple integer IDs for the same exposure (i.e. by inventing our own in addition to having an externally meaningful one).  So I think the alternative would be to just use timestamps directly for the raw->calibration lookup.  That feels less "clean" than using monotonic integer ranges, but I don't have a really concrete argument for why it'd be worse - though Robert Lupton might; this was originally his idea.
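
          For concreteness, a sketch of the two pieces under discussion: a range-based raw->calibration lookup, and a check that exposure IDs increase with observation date. The validity ranges, labels, and function names are invented for illustration; the same lookup works unchanged if the range boundaries are timestamps instead of integer IDs, as long as they sort correctly.

```python
import bisect

# Hypothetical validity ranges: (first exposure ID covered, calibration label),
# sorted by the first ID; each entry is assumed valid until the next one starts.
CALIB_RANGES = [(0, "bias_A"), (1000, "bias_B"), (5000, "bias_C")]
_STARTS = [start for start, _ in CALIB_RANGES]


def find_calibration(exposure_id):
    """Return the label of the calibration whose range contains exposure_id."""
    i = bisect.bisect_right(_STARTS, exposure_id) - 1
    if i < 0:
        raise LookupError(f"no calibration covers exposure {exposure_id}")
    return CALIB_RANGES[i][1]


def ids_increase_with_time(exposures):
    """Check the monotonicity requirement for (obs_date, exposure_id) pairs.

    obs_date is assumed to be an ISO 8601 string, so plain sorting orders
    the pairs by time.
    """
    ids = [exposure_id for _, exposure_id in sorted(exposures)]
    return all(a < b for a, b in zip(ids, ids[1:]))
```

          A check like ids_increase_with_time could be run at ingest time to catch datasets that violate the requirement.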

           

           

          I'm a little worried that the concept of a visit is special solely because of the current LSST survey design. Are there other similar concepts used by other surveys that should be called out as special concepts?

          Visit very much is special because of the current LSST survey design, but happily it just represents a generalization of what other similar surveys are doing.  The only other surveys I can think of at the moment that involve core concepts that I think our data model doesn't capture well are SDSS and Gaia, but I think those really are rather unusual features (drift-scan in very long stripes, simultaneous observations with a rigid angle between them), and that it's not in our interest to try to include them in our data model.  The same goes for spectrographic data - I don't think PFS will be able to get away with using our set of DataUnits, for example.  But it should be possible to use the rest of the system with our own DataUnit schema, and that's what I'd expect any effort to apply our code to fundamentally different observations to do.

           

          I made some typographic changes to the page. Hopefully that's ok.

           

          Certainly, thanks!

           

          I'm closing this ticket now, but I'm certainly happy to continue the conversation (either at LSST2018 or earlier).

          krughoff Simon Krughoff added a comment -

          Great. Sounds good. I suspect LSST2018 is the earliest you can expect
          focused effort from me.


            People

            Assignee:
            jbosch Jim Bosch
            Reporter:
            jbosch Jim Bosch
            Reviewers:
            Simon Krughoff
            Watchers:
            Iain Goodenow, Jim Bosch, Simon Krughoff
