Fix Version/s: None
makeCoaddTempExp's lowest-level work method createTempExp interacts with disk via the Butler. When building coadds in Spark, or example, this required us to re-write an implementation of createTempExp instead of reusing the stack. createTempExp currently takes a list of dataRefs corresponding to the calexps that overlap the patch for a particular visit. These reads could be pushed to the higher-level calling method at the expense of memory.
Making createTempExp more reusable would require either:
a) Making createTempExp able to accept either a list of dataRefs or just a list of calexps. This actually appears to be the intent of the docstring writer). <-- naively, I suspect this will win.
b) Holding 4 full calexps in memory (or 4 partial calexps in memory)
c) Easily reproducible container that can interact with the disk, database etc...
This ticket involve investigating the computational feasibility, and implementing best option.
I made the same pragmatic comment to Yusra AlSayyad when we chatted about this. In some cases (astrometric catalogues; coadds) we may need to provide some sort of proxy object that can optionally be passed to the run method. It is possible that we'll come up with some examples where we need to permit an exception to the "no butlers/dataRefs in run() calls" rule (upon decision by competent authorities)
I believe that's actually exactly what we do in the new reference catalog system Simon Krughoff wrote: we pass a proxy object down into regular Tasks that's currently implemented by holding onto a Butler, but the interface of that object doesn't mandate an implementation that uses the Butler. Of course, because we don't yet have an implementation that doesn't use the Butler, Tasks that need a reference catalog still can't be used in a context where there's no Butler available at all.
That's right. The calibration tasks need to read data from disk. Our mechanism for I/O abstraction is the butler. Thus, the calibration tasks need an object that can interact with the butler.
makeCoaddTempExp and assembleCoadd are both on my list for rewrites. In particular, I'm concerned about the front-end (choosing inputs for a particular patch), but there's a lot of cleanup to do all over. I think makeCoaddTempExp suffers a bit from the attempt to make it like coaddition (inherits from CoaddBaseTask). Note that coaddition simply cannot take a list of calexps because they won't all fit in memory, but you can get away with that in makeCoaddTempExp because the number of inputs is limited and small.
Superseded by DM-17208 : https://github.com/lsst/pipe_tasks/commit/62e839c2e8686beb4b9883a710c8e7fb8af9ab5b
Coaddition is one of several high-level algorithms we'll have for which the I/O is somewhat intrinsic to the algorithm, and I worry that trying to make these usable without a butler really amounts to circumventing the butler as the abstraction layer for all I/O. It seems like it might be better to write an alternate butler backend, or if the butler isn't mature enough to support that yet, mock up some DataRef-like objects that we can pass to the task (maybe this is something like your option C).