Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-2191

Define command line tasks for pre-ingest transformation

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      DM-1903 provided a command line task which would transform a src catalogue into calibrated form. Here, we build on that to provide command line tasks for all source catalogues which will need to be ingested; will include at least deepCoadd_src, goodSeeingCoadd_src, chiSquaredCoadd_src.

        Attachments

          Issue Links

            Activity

            Hide
            swinbank John Swinbank added a comment -

            Jim, I am assigning this review to you because I'd actually appreciate a sanity check on whether this provides the functionality required. I'm worried about overloading you with too much reviewing though, so if you'd like to just give the design a quick check and then, if you're happy, hand it back to me to find somebody to do a detailed review, that would be fine. (That said, it's not an enormous amount of code, so you might prefer to just do it all yourself.)

            The goal here was to provide a means of transforming "all" the different types of source. I did a brief audit of some obs_ repositories, and discovered quite a variety of different _src types. I am reluctant to attempt to hard-code transformation command line tasks for all of them: partly it seems like a lot of copy & paste effort, and partly because I suspect that several of them are very infrequently (...maybe never) used.

            What I've done instead is provide a function that can generate an appropriate task on demand and demonstrated its use with obs_test as a proof of concept. It is then easy to provide tasks for other cameras as & when needed.

            Show
            swinbank John Swinbank added a comment - Jim, I am assigning this review to you because I'd actually appreciate a sanity check on whether this provides the functionality required. I'm worried about overloading you with too much reviewing though, so if you'd like to just give the design a quick check and then, if you're happy, hand it back to me to find somebody to do a detailed review, that would be fine. (That said, it's not an enormous amount of code, so you might prefer to just do it all yourself.) The goal here was to provide a means of transforming "all" the different types of source. I did a brief audit of some obs_ repositories, and discovered quite a variety of different _src types. I am reluctant to attempt to hard-code transformation command line tasks for all of them: partly it seems like a lot of copy & paste effort, and partly because I suspect that several of them are very infrequently (...maybe never) used. What I've done instead is provide a function that can generate an appropriate task on demand and demonstrated its use with obs_test as a proof of concept. It is then easy to provide tasks for other cameras as & when needed.
            Hide
            swinbank John Swinbank added a comment -

            Should have added – changes are on tickets/DM-2191 in obs_test and pipe_tasks.

            Show
            swinbank John Swinbank added a comment - Should have added – changes are on tickets/ DM-2191 in obs_test and pipe_tasks .
            Hide
            swinbank John Swinbank added a comment -

            Actually, wait – I want to give this a little bit more thought before it goes for review. I'll get back to you shortly.

            Show
            swinbank John Swinbank added a comment - Actually, wait – I want to give this a little bit more thought before it goes for review. I'll get back to you shortly.
            Hide
            swinbank John Swinbank added a comment -

            Right, would you mind having a look now, please? Same comments as above still apply.

            Show
            swinbank John Swinbank added a comment - Right, would you mind having a look now, please? Same comments as above still apply.
            Hide
            jbosch Jim Bosch added a comment -

            I've completed my review on the GitHub PR for pipe_tasks. As you've probably seen, I'm afraid I've requested some rather major changes.

            Show
            jbosch Jim Bosch added a comment - I've completed my review on the GitHub PR for pipe_tasks. As you've probably seen, I'm afraid I've requested some rather major changes.
            Hide
            swinbank John Swinbank added a comment - - edited

            Thanks for the review!

            In fact, I don't think the changes you've suggested are too onerous – at least, it means reworking almost all the (non-test) lines of code here, but there weren't very many lines to begin with, so it could be worse.

            However, it makes me worry a bit that this story is still under-specified. I've not yet carefully digested all your comments, but quite a lot seem to be along the lines of "this does not take account of ${MAGIC} that happens in this dark corner of ...", and at least in my head the various dark corners are not clearly enumerated: there are a lot of variations of XXXcoadd lurking in the codebase which aren't documented anywhere.

            I'll try reworking quickly following your suggestions and see how it goes, but I think doing this properly might involve a considerably wider rethink than the one you are suggesting.

            Show
            swinbank John Swinbank added a comment - - edited Thanks for the review! In fact, I don't think the changes you've suggested are too onerous – at least, it means reworking almost all the (non-test) lines of code here, but there weren't very many lines to begin with, so it could be worse. However, it makes me worry a bit that this story is still under-specified. I've not yet carefully digested all your comments, but quite a lot seem to be along the lines of "this does not take account of ${MAGIC} that happens in this dark corner of ...", and at least in my head the various dark corners are not clearly enumerated: there are a lot of variations of XXXcoadd lurking in the codebase which aren't documented anywhere. I'll try reworking quickly following your suggestions and see how it goes, but I think doing this properly might involve a considerably wider rethink than the one you are suggesting.
            Hide
            jbosch Jim Bosch added a comment - - edited

            The measurement-inside-calibrate is the only one that really has me worried, but let's ignore that for now. It's enough of a special case that I'd be okay writing a completely separate CmdLineTask for that.

            For everything else, if the derived class task has the opportunity to redefine the various strings via an instance method that has access to the input task's config object (as in my proposal), I think that gives us the flexibility we need to handle essentially all the other weirdness that I know of, while still maximizing code reuse.

            Show
            jbosch Jim Bosch added a comment - - edited The measurement-inside-calibrate is the only one that really has me worried, but let's ignore that for now. It's enough of a special case that I'd be okay writing a completely separate CmdLineTask for that. For everything else, if the derived class task has the opportunity to redefine the various strings via an instance method that has access to the input task's config object (as in my proposal), I think that gives us the flexibility we need to handle essentially all the other weirdness that I know of, while still maximizing code reuse.
            Hide
            swinbank John Swinbank added a comment -

            Jim Bosch – I fixed things up to be more like your recommendations on my flight yesterday. Unfortunately I won't get chance to finish polishing this up for the next few days, so I'm not putting it out for review, but if you have chance to comment as to whether tickets/DM-2191a on pipe_tasks would be more to your taste that would be very useful.

            On a related note: I would like to produce some tests which actually demonstrate the operation of this code on coadds. However, the contents of obs_test don't seem up to the task, and pipe_task's own testCoadds.py does a lot of magic with its own custom mapper classes – if it's worthwhile investing the time to learn what's going on there I will, but you've warned me off sinking too much time into understanding Butler-related issues in the past. I wondered if you could suggest any alternative routes to quickly put together the equivalent of RunTransformTestCase for coadds.

            Show
            swinbank John Swinbank added a comment - Jim Bosch – I fixed things up to be more like your recommendations on my flight yesterday. Unfortunately I won't get chance to finish polishing this up for the next few days, so I'm not putting it out for review, but if you have chance to comment as to whether tickets/DM-2191a on pipe_tasks would be more to your taste that would be very useful. On a related note: I would like to produce some tests which actually demonstrate the operation of this code on coadds. However, the contents of obs_test don't seem up to the task, and pipe_task 's own testCoadds.py does a lot of magic with its own custom mapper classes – if it's worthwhile investing the time to learn what's going on there I will, but you've warned me off sinking too much time into understanding Butler-related issues in the past. I wondered if you could suggest any alternative routes to quickly put together the equivalent of RunTransformTestCase for coadds.
            Hide
            swinbank John Swinbank added a comment -

            With apologies for the delay, a reworked version of this is now available on tickets/DM-2191 in obs_test and pipe_tasks. Are you happier with this approach?

            Show
            swinbank John Swinbank added a comment - With apologies for the delay, a reworked version of this is now available on tickets/ DM-2191 in obs_test and pipe_tasks . Are you happier with this approach?
            Hide
            jbosch Jim Bosch added a comment -

            This approach looks great, and modulo the minor comments on the new GitHub PR, I think it's ready to merge.

            There are a couple of remaining concerns I don't think should delay this issue, but are worth bringing up now as we may want to spawn some new issues now:

            • We might need a top-level transform task for the "icSrc" dataset, which is produced by CalibrateTask. I'm worried that the tasks here aren't sufficiently general to handle that, but I don't think we should generalize them further to handle it - if that would be necessary, it'd probably be better to create a new CmdLineTask from scratch, as "icSrc" is a very weird bird that involves a SchemaMapper between measurement and actually writing the files to disk. I'm also anticipating a CalibrateTask rewrite at some point, and we might just want to wait to write a transform CmdLineTask until that happens.
            • The multiband stuff Lauren MacArthur is transferring from the HSC side will need some new transform {{CmdLineTask}}s. I'm pretty confident this system will work for those, too, so it's just a matter of writing a few more specializations. We should create issues in the HSC deblender epic for these.
            Show
            jbosch Jim Bosch added a comment - This approach looks great, and modulo the minor comments on the new GitHub PR, I think it's ready to merge. There are a couple of remaining concerns I don't think should delay this issue, but are worth bringing up now as we may want to spawn some new issues now: We might need a top-level transform task for the "icSrc" dataset, which is produced by CalibrateTask . I'm worried that the tasks here aren't sufficiently general to handle that, but I don't think we should generalize them further to handle it - if that would be necessary, it'd probably be better to create a new CmdLineTask from scratch, as "icSrc" is a very weird bird that involves a SchemaMapper between measurement and actually writing the files to disk. I'm also anticipating a CalibrateTask rewrite at some point, and we might just want to wait to write a transform CmdLineTask until that happens. The multiband stuff Lauren MacArthur is transferring from the HSC side will need some new transform {{CmdLineTask}}s. I'm pretty confident this system will work for those, too, so it's just a matter of writing a few more specializations. We should create issues in the HSC deblender epic for these.
            Hide
            swinbank John Swinbank added a comment -

            Thanks for the review. I've addressed your comments and merged to master.

            I've also created the issues DM-2879 and DM-2880 to cover the points you raise above.

            Show
            swinbank John Swinbank added a comment - Thanks for the review. I've addressed your comments and merged to master. I've also created the issues DM-2879 and DM-2880 to cover the points you raise above.
            Hide
            swinbank John Swinbank added a comment -

            Also bumping SPs to reflect the time spent on this issue.

            Show
            swinbank John Swinbank added a comment - Also bumping SPs to reflect the time spent on this issue.

              People

              Assignee:
              swinbank John Swinbank
              Reporter:
              swinbank John Swinbank
              Reviewers:
              Jim Bosch
              Watchers:
              Jim Bosch, John Swinbank
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: