  1. Data Management
  2. DM-33702

Add group dimension to butler schema

    Details

    • Team:
      Architecture
    • Urgent?:
      No

      Description

      A "visit" is defined as being an on-sky science observation that includes a sky region in its definition. This is a very restricted definition of grouping and a new concept is needed.

      During observing, a "visit" is defined by a single call to the takeImages command. Independently, the script queue allocates a group name to each observing script. This group name is stored in the headers and is (effectively) fixed for all exposures taken by that single execution of the script, although there is an API that lets a script modify the group name to make subgroups.

      A group dimension would let us collect DDF observations where we take multiple visits in a single script, and would allow calibration observations to be grouped (currently they cannot be grouped, and people must try to remember sequence number ranges).

      It may be that "group_system" is needed to allow people to modify grouping schemes.

            Activity

            jbosch Jim Bosch added a comment - - edited

            I am not at all convinced that we want this.  We can already use groups to collect DDF observations and group calibration observations (or will be able to once group_id ceases to imply visit), and we have not identified a need for a group data ID key to represent output datasets that aggregate those kinds of groups (and if we did, I think a group of visits is more likely to be useful than a group of exposures).

            tjenness Tim Jenness added a comment -

            I think the issue is how we want to deal with grouping when processing data. If you've done a focus sweep then surely you want a way to ensure that the 5, say, exposures at different focus offsets are given to the relevant task as one set of inputs. If we want to re-process all the focus data from a night, you'd want to use a query like "where day_obs = YYYYMMDD and observation_type = focus" and have it do the right thing. You don't want to have to go through and work out that observations 10 to 14 are one focus sweep and 15 to 19 are another, and submit them as separate jobs. I'm coming at this from my experience processing data from other telescopes. We already have sufficient information in the headers to do the right thing, so I don't really understand why we wouldn't want to use it. Are we really proposing an external tool that goes through the exposure records itself, calculates all the grouping, and then submits distinct jobs?
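The grouping described in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: `ExposureRecord`, its field names, and `group_exposures` are hypothetical stand-ins, not the butler's own classes or query API, and the group names and date are made up. It shows how a header-derived group name would split the focus exposures of one night into separate input sets without anyone tracking sequence number ranges by hand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureRecord:
    """Hypothetical stand-in for an exposure record (not a butler class)."""

    seq_num: int
    day_obs: int
    observation_type: str
    group_name: str  # assumed to be copied from the header group name


def group_exposures(records, day_obs, observation_type):
    """Select records by day_obs and observation_type, then bucket by group."""
    groups = defaultdict(list)
    for rec in records:
        if rec.day_obs == day_obs and rec.observation_type == observation_type:
            groups[rec.group_name].append(rec.seq_num)
    # Sort sequence numbers so each group is a stable, ordered input set.
    return {name: sorted(seqs) for name, seqs in groups.items()}


# Two illustrative focus sweeps (observations 10-14 and 15-19) plus one bias.
records = [ExposureRecord(seq, 20220215, "focus", "sweep-A") for seq in range(10, 15)]
records += [ExposureRecord(seq, 20220215, "focus", "sweep-B") for seq in range(15, 20)]
records.append(ExposureRecord(20, 20220215, "bias", "calib-C"))

focus_groups = group_exposures(records, 20220215, "focus")
for name, seq_nums in focus_groups.items():
    print(name, seq_nums)
```

Each dictionary entry corresponds to one set of inputs that would be handed to the relevant task as a unit.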

            jbosch Jim Bosch added a comment -

            I am wary of defining a large number of different group_systems for all the possible groups a visit might belong to, and storing those in the database in dimension tables.  I think that works for snap/visit membership because in practice the number of possible membership definitions is extremely small, and it's really useful for visit membership definitions to mean the same thing across collections.

            For other kinds of groups, like focus sweeps, I think the tendency is in the opposite direction: we don't want somebody to have to insert dimension records to define a new group just so they can combine a bunch of observations that they want to try processing together one time.  You're right that we don't have a good way to give the outputs of that kind of processing a data ID, but I'd like to solve that via new functionality related to DM-33751 (more flexible data IDs) + DM-33621 (reading a list of data IDs from file). Basically, we'd have the user pass group memberships in via a file or something else external, and we'd use that in QG gen without trying to make the new data ID key for the outputs mean anything consistent outside that particular collection.

            That, of course, is how Robert Lupton has always wished we had done visits, and while I'm still confident that we are doing visits the right way (I think we'd go crazy if a visit ID could mean different things in different collections, so I'm glad the DB enforces that it can't), I think he's right for most non-visit groups.  And that'll be a big subject in that campaign-management technote I need to write.
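As a rough illustration of the "pass group memberships in via a file" idea from this comment, the sketch below reads a small JSON document that maps group labels to exposure IDs and derives one output data ID per group. Everything here is an assumption made for illustration: the JSON layout, the function names, the `group` key, and the instrument name `MyCam` are hypothetical, and none of this reflects what DM-33751 or DM-33621 actually implement.

```python
import io
import json


def load_group_memberships(stream):
    """Read {group label: [exposure IDs]} from a JSON stream (assumed layout)."""
    raw = json.load(stream)
    return {group: sorted(exposures) for group, exposures in raw.items()}


def output_data_ids(memberships, instrument):
    """Build one illustrative output data ID per user-defined group.

    The "group" key is hypothetical and only meaningful for this one
    processing run; nothing is registered in a database.
    """
    return [{"instrument": instrument, "group": group} for group in sorted(memberships)]


# An in-memory stand-in for a user-supplied membership file.
membership_file = io.StringIO(
    '{"focus-1": [14, 10, 12, 11, 13], "focus-2": [15, 16, 17, 18, 19]}'
)

memberships = load_group_memberships(membership_file)
data_ids = output_data_ids(memberships, instrument="MyCam")
print(data_ids)
```

Because the memberships live in an external file rather than in dimension tables, a one-off grouping needs no new dimension records, at the cost of the group label meaning nothing outside the run that used it.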


              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              tjenness Tim Jenness
              Watchers:
              Jim Bosch, Kian-Tat Lim, Tim Jenness

                Dates

                Created:
                Updated:
