  Data Management / DM-12549

ap_pipe must call AssociationTask in a reproducible order

    Details

      Description

      For ap_verify to produce reproducible results, ap.pipe.doAssociation must always submit visits to the association database in the same order. The specific order should not affect the scientific validity of the associations (Chris Morrison, priv. comm.), so it can be chosen to suit the code (e.g., for simplicity or speed).

      Note that this ticket is moot until DM-11372 or DM-12314 is resolved, since the order in which visits are handled currently has to be imposed externally (see DM-12535).
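A minimal sketch of the requirement, assuming visits are identified by integer IDs and that sorting by visit ID is an acceptable canonical order (the description leaves the choice open). `associate_in_fixed_order` and the `associate` callback are hypothetical stand-ins, not the actual ap_pipe API:

```python
from typing import Callable, Iterable, List


def associate_in_fixed_order(
    visit_ids: Iterable[int],
    associate: Callable[[int], None],
) -> List[int]:
    """Submit visits to association in a canonical order.

    Sorting by visit ID is an assumed choice; any deterministic key works,
    since the order is not expected to affect scientific validity.
    """
    order = sorted(visit_ids)  # same result for any permutation of the input
    for visit in order:
        associate(visit)  # hypothetical per-visit association call
    return order


# Two different input orders produce the same association sequence.
log_a: List[int] = []
log_b: List[int] = []
associate_in_fixed_order([412, 410, 411], log_a.append)
associate_in_fixed_order([411, 412, 410], log_b.append)
print(log_a == log_b)  # True
```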


            Activity

            ctslater Colin Slater added a comment -

            You can get both parallelism and reproducibility by parallelizing over CCDs within each visit while processing the visits themselves sequentially, in time order.
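A sketch of this scheme using only the standard library. The `Visit` record, its `mjd` field, and the `process_ccd`/`associate` callbacks are hypothetical; the point is that CCDs within a visit run concurrently, while association sees visits one at a time in time order (ties broken by visit ID) and receives per-CCD results in a fixed order regardless of which CCD finishes first:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Visit:
    visit_id: int
    mjd: float              # observation time (hypothetical field name)
    ccds: Tuple[int, ...]   # detector IDs in this visit


def process_visits(
    visits: List[Visit],
    process_ccd: Callable[[int, int], str],
    associate: Callable[[int, List[str]], None],
    max_workers: int = 4,
) -> None:
    # Visits run strictly one after another, in time order; ties broken by ID.
    for visit in sorted(visits, key=lambda v: (v.mjd, v.visit_id)):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # CCDs of one visit run concurrently; map() yields results in
            # input order, so completion order does not leak into the output.
            results = list(pool.map(partial(process_ccd, visit.visit_id), sorted(visit.ccds)))
        # Association only starts after every CCD of this visit is finished.
        associate(visit.visit_id, results)
```

Threads are used only to keep the sketch self-contained; a real pipeline would more likely distribute CCDs across processes or nodes.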

            krzys Krzysztof Findeisen added a comment -

            I assume "visit" was being used broadly in the sense of "observation", and that a nondeterministic CCD order will still give minor catalog changes?

            krzys Krzysztof Findeisen added a comment - edited

            Possibly relevant are the new DM requirements DMS-REQ-0388, DMS-REQ-0389, and DMS-REQ-0390. These imply that we may need to worry about processing order in operations as well as in CI; see DM-18684.

            krzys Krzysztof Findeisen added a comment - edited

            Following discussion on #dm-middleware, we have the following options for doing this in Gen 3:

            1. Have the Jenkins script execute pipetask separately for each visit (can be done now; requires hard-coding the visit IDs for each dataset)
            2. Generate a quantum graph for the full dataset, then use command-line tools to break it up into units that can be fed into serial pipetask calls (requires complex scripting, but avoids option 1's need for a "playlist" for each dataset)
            3. Implement our own activator that organizes work by visit (Andy Salnikov says this is doable by overriding the scheduler, but that the current activator code isn't quite extensible enough; see DM-24370)
            4. Instrument the pipeline (DM-21939) with a virtual dataset and/or task that forces association to see the data as "ready" in a specific order
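Option 2 might look roughly like the following, assuming the quantum graph has already been flattened into a mapping from quantum name to (visit, dependencies) and that no quantum depends on another visit's quanta. The `Graph` representation and `split_by_visit` are hypothetical, not the middleware's actual QuantumGraph API; the sketch requires Python 3.9+ for `graphlib`:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical flattened graph: quantum name -> (visit, names it depends on).
Graph = Dict[str, Tuple[int, Set[str]]]


def split_by_visit(graph: Graph) -> List[Tuple[int, List[str]]]:
    """Break a full-dataset graph into per-visit units to run one after another."""
    units = []
    for visit in sorted({v for v, _ in graph.values()}):
        members = {q for q, (v, _) in graph.items() if v == visit}
        for q in members:
            outside = graph[q][1] - members
            if outside:
                raise ValueError(f"{q} depends on other visits' quanta: {sorted(outside)}")
        # Sorting nodes and predecessors makes the emitted order independent
        # of set iteration order, so each unit is reproducible.
        sorter = TopologicalSorter({q: sorted(graph[q][1]) for q in sorted(members)})
        units.append((visit, list(sorter.static_order())))
    return units
```

Each unit would then be handed to its own serial pipetask invocation, in the order returned.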

            We only need to constrain the order in which DiaPipelineTask's quanta are called, and not any of their dependencies; options 3 and 4 may be able to take advantage of this.
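To illustrate why constraining only DiaPipelineTask's quanta is enough, here is a toy scheduler that runs ready quanta in a random order but admits an association quantum only when it is next in ascending visit order. The graph representation and all names are hypothetical, not the pipetask activator or the DM-24370 interface:

```python
import random
from typing import Dict, List, Optional, Set


def run_with_ordered_association(
    deps: Dict[str, Set[str]],
    assoc_visit: Dict[str, int],
    seed: Optional[int] = None,
) -> List[str]:
    """Simulate a scheduler that runs ready quanta in arbitrary order, except
    that association quanta must run in ascending visit order.

    deps: every quantum -> the quanta it depends on.
    assoc_visit: association quantum -> its visit; other quanta are unconstrained.
    """
    rng = random.Random(seed)
    pending_assoc = sorted(assoc_visit, key=lambda q: assoc_visit[q])  # required order
    done: Set[str] = set()
    executed: List[str] = []
    remaining = set(deps)
    while remaining:
        ready = [q for q in remaining if deps[q] <= done]
        # An association quantum may run only if it is next in visit order.
        ready = [q for q in ready if q not in assoc_visit or q == pending_assoc[0]]
        if not ready:
            raise RuntimeError("ordering constraint cannot be satisfied")
        q = rng.choice(sorted(ready))  # stand-in for nondeterministic scheduling
        if q in assoc_visit:
            pending_assoc.pop(0)
        done.add(q)
        remaining.remove(q)
        executed.append(q)
    return executed
```

Whatever seed drives the rest of the schedule, the association quanta execute in the same order, while upstream quanta remain free to run in any valid order.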

            krzys Krzysztof Findeisen added a comment -

            DM-24370 provides a post-processing script for the quanta; all we should need to do is create a subclass that's configured appropriately for ap_verify, then pass it to pipetask.


              People

              • Assignee: Unassigned
              • Reporter: krzys Krzysztof Findeisen
              • Watchers: Chris Morrison, Colin Slater, Eric Bellm, John Swinbank, Krzysztof Findeisen
