Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-8703

Decouple makeCoaddTempExpTask.createTempExp() from Butler

    XMLWordPrintable

    Details

      Description

      makeCoaddTempExp's lowest-level work method createTempExp interacts with disk via the Butler. When building coadds in Spark, or example, this required us to re-write an implementation of createTempExp instead of reusing the stack. createTempExp currently takes a list of dataRefs corresponding to the calexps that overlap the patch for a particular visit. These reads could be pushed to the higher-level calling method at the expense of memory.

      Making createTempExp more reusable would require either:
      a) Making createTempExp able to accept either a list of dataRefs or just a list of calexps. This actually appears to be the intent of the docstring writer). <-- naively, I suspect this will win.
      b) Holding 4 full calexps in memory (or 4 partial calexps in memory)
      c) Easily reproducible container that can interact with the disk, database etc...

      This ticket involve investigating the computational feasibility, and implementing best option.

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            Coaddition is one of several high-level algorithms we'll have for which the I/O is somewhat intrinsic to the algorithm, and I worry that trying to make these usable without a butler really amounts to circumventing the butler as the abstraction layer for all I/O. It seems like it might be better to write an alternate butler backend, or if the butler isn't mature enough to support that yet, mock up some DataRef-like objects that we can pass to the task (maybe this is something like your option C).

            Show
            jbosch Jim Bosch added a comment - Coaddition is one of several high-level algorithms we'll have for which the I/O is somewhat intrinsic to the algorithm, and I worry that trying to make these usable without a butler really amounts to circumventing the butler as the abstraction layer for all I/O. It seems like it might be better to write an alternate butler backend, or if the butler isn't mature enough to support that yet, mock up some DataRef -like objects that we can pass to the task (maybe this is something like your option C).
            Hide
            rhl Robert Lupton added a comment -

            I made the same pragmatic comment to Yusra AlSayyad when we chatted about this. In some cases (astrometric catalogues; coadds) we may need to provide some sort of proxy object that can optionally be passed to the run method. It is possible that we'll come up with some examples where we need to permit an exception to the "no butlers/dataRefs in run() calls" rule (upon decision by competent authorities)

            Show
            rhl Robert Lupton added a comment - I made the same pragmatic comment to Yusra AlSayyad when we chatted about this. In some cases (astrometric catalogues; coadds) we may need to provide some sort of proxy object that can optionally be passed to the run method. It is possible that we'll come up with some examples where we need to permit an exception to the "no butlers/dataRefs in run() calls" rule (upon decision by competent authorities)
            Hide
            jbosch Jim Bosch added a comment -

            I believe that's actually exactly what we do in the new reference catalog system Simon Krughoff wrote: we pass a proxy object down into regular Tasks that's currently implemented by holding onto a Butler, but the interface of that object doesn't mandate an implementation that uses the Butler. Of course, because we don't yet have an implementation that doesn't use the Butler, Tasks that need a reference catalog still can't be used in a context where there's no Butler available at all.

            Show
            jbosch Jim Bosch added a comment - I believe that's actually exactly what we do in the new reference catalog system Simon Krughoff wrote: we pass a proxy object down into regular Tasks that's currently implemented by holding onto a Butler, but the interface of that object doesn't mandate an implementation that uses the Butler. Of course, because we don't yet have an implementation that doesn't use the Butler, Tasks that need a reference catalog still can't be used in a context where there's no Butler available at all.
            Hide
            krughoff Simon Krughoff added a comment -

            That's right. The calibration tasks need to read data from disk. Our mechanism for I/O abstraction is the butler. Thus, the calibration tasks need an object that can interact with the butler.

            Show
            krughoff Simon Krughoff added a comment - That's right. The calibration tasks need to read data from disk. Our mechanism for I/O abstraction is the butler. Thus, the calibration tasks need an object that can interact with the butler.
            Hide
            price Paul Price added a comment -

            makeCoaddTempExp and assembleCoadd are both on my list for rewrites. In particular, I'm concerned about the front-end (choosing inputs for a particular patch), but there's a lot of cleanup to do all over. I think makeCoaddTempExp suffers a bit from the attempt to make it like coaddition (inherits from CoaddBaseTask). Note that coaddition simply cannot take a list of calexps because they won't all fit in memory, but you can get away with that in makeCoaddTempExp because the number of inputs is limited and small.

            Show
            price Paul Price added a comment - makeCoaddTempExp and assembleCoadd are both on my list for rewrites. In particular, I'm concerned about the front-end (choosing inputs for a particular patch), but there's a lot of cleanup to do all over. I think makeCoaddTempExp suffers a bit from the attempt to make it like coaddition (inherits from CoaddBaseTask ). Note that coaddition simply cannot take a list of calexps because they won't all fit in memory, but you can get away with that in makeCoaddTempExp because the number of inputs is limited and small.
            Show
            yusra Yusra AlSayyad added a comment - - edited Superseded by DM-17208 : https://github.com/lsst/pipe_tasks/commit/62e839c2e8686beb4b9883a710c8e7fb8af9ab5b

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              yusra Yusra AlSayyad
              Watchers:
              Jim Bosch, John Swinbank, Paul Price, Robert Lupton, Simon Krughoff, Yusra AlSayyad
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.