DM-9447 (Data Management)

Process HSC RC data using pybind11 prototype

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      5
    • Sprint:
      DRP S17-4
    • Team:
      Data Release Production

      Description

      Using the prototype pybind11-based stack, process the HSC RC dataset. Work with Pim Schellart [X] to identify and resolve errors.

            Activity

            lauren Lauren MacArthur added a comment - - edited

            One caveat: meas_mosaic has not been pybind11-wrapped (and does include SWIGed C++ code), so this part of the processing run will be skipped.

            lauren Lauren MacArthur added a comment -

            Another caveat: meas_modelfit has not yet been pybind11-wrapped so will not be exercised here.

            lauren Lauren MacArthur added a comment -

            First bump in coaddDriver for HSC-Y COSMOS subset:

            SystemError on tiger-r8c2n1:10633 in reduce: Negative size passed to PyString_FromStringAndSize
            Traceback (most recent call last):
              File "/tigress/HSC/users/lauren/pybind11/lsstsw/stack/Linux64/ctrl_pool/12.1-7-gb57f33e+6/python/lsst/ctrl/pool/pool.py", line 113, in wrapper
                return func(*args, **kwargs)
              File "/tigress/HSC/users/lauren/pybind11/lsstsw/stack/Linux64/ctrl_pool/12.1-7-gb57f33e+6/python/lsst/ctrl/pool/pool.py", line 237, in wrapper
                return func(*args, **kwargs)
              File "/tigress/HSC/users/lauren/pybind11/lsstsw/stack/Linux64/ctrl_pool/12.1-7-gb57f33e+6/python/lsst/ctrl/pool/pool.py", line 747, in reduce
                results = self.comm.gather(None, root=self.root)
              File "MPI/Comm.pyx", line 1281, in mpi4py.MPI.Comm.gather (src/mpi4py.MPI.c:108949)
              File "MPI/msgpickle.pxi", line 664, in mpi4py.MPI.PyMPI_gather (src/mpi4py.MPI.c:47643)
              File "MPI/msgpickle.pxi", line 179, in mpi4py.MPI.Pickle.allocv (src/mpi4py.MPI.c:41800)
              File "MPI/msgpickle.pxi", line 127, in mpi4py.MPI.Pickle.alloc (src/mpi4py.MPI.c:40945)
            SystemError: Negative size passed to PyString_FromStringAndSize
            application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
            

            Does this look like it could be pybind11-related Pim Schellart [X]? Full log is at:
            /tigress/HSC/users/lauren/DM-9447/DM-9447-cosmos_noJunk-y-coaddDriver.o2977141

            pschella Pim Schellart [X] (Inactive) added a comment -

            Yes, this looks like pickling is broken for some type. We should find out which type it is and add a ticket for it.

            pschella Pim Schellart [X] (Inactive) added a comment -

            Actually that is probably not it. This `reduce` seems more like it is part of `map-reduce`.

            lauren Lauren MacArthur added a comment -

            Indeed, the above may have been a false alarm due to my neglecting to turn off applying uberCal (which we can't do here since we are not running meas_mosaic). I'm rerunning coaddDriver now.

            lauren Lauren MacArthur added a comment - - edited

            I have run the COSMOS subset of the HSC RC dataset with the pybind11 branch (DM-8467) through singleFrameDriver.py. All went smoothly and the outputs look good. Just for interest's sake, I've attached some plots comparing the Gaussian flux measurements of matched catalogs between this run and a recent one of the same dataset on a recent "swig-stack".

            There are stochastic differences that appear when the FLUXMAG0 zeropoint is applied, and the sky-stars.png plot reveals they are CCD-specific (e.g. 14, 19, 53, 82, 85, 87, 102). While I'm not sure of the cause [UPDATE: I'm now fairly convinced that this is an artifact of running with processCcd.calibrate.measurePsf.reserveFraction>0.0, and 0.2 is the default for obs_subaru. I have since seen a case where these CCD-to-CCD zeropoint variations disappear when setting the reserve fraction to 0.0 in both runs under comparison], I don't think it's pybind11-related (and it would almost certainly be calibrated out with a meas_mosaic/jointcal run).

            In all, I would say this is a success in that the pybind11 port has been exercised thoroughly through single frame processing and passed with flying colours.
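
The per-CCD zeropoint comparison described in this comment can be sketched roughly as follows. This is only an illustration, not the script used to make the attached plots: the tuple layout of `matches`, the function names, and the numbers are all hypothetical, and the only thing taken as given is the usual relation mag = -2.5 log10(flux / FLUXMAG0).

```python
import math
from collections import defaultdict
from statistics import median


def flux_to_mag(flux, flux_mag0):
    """Convert an instrumental flux to a magnitude using the zero-point flux."""
    return -2.5 * math.log10(flux / flux_mag0)


def per_ccd_mag_offsets(matches):
    """Return the median magnitude difference (run A minus run B) per CCD.

    `matches` is an iterable of (ccd, flux_a, flux_mag0_a, flux_b, flux_mag0_b)
    tuples, one per matched source. This layout is hypothetical.
    """
    diffs = defaultdict(list)
    for ccd, flux_a, fm0_a, flux_b, fm0_b in matches:
        diffs[ccd].append(flux_to_mag(flux_a, fm0_a) - flux_to_mag(flux_b, fm0_b))
    return {ccd: median(values) for ccd, values in diffs.items()}


# Made-up numbers: CCD 14 has a slightly different zero point in run B,
# while CCD 20 is identical in both runs.
matches = [
    (14, 1000.0, 1.0e12, 1000.0, 1.01e12),
    (14, 2000.0, 1.0e12, 2000.0, 1.01e12),
    (20, 1500.0, 1.0e12, 1500.0, 1.0e12),
]
offsets = per_ccd_mag_offsets(matches)
```

A CCD whose median offset stands out from the rest would correspond to the kind of CCD-specific shift seen in sky-stars.png; identical raw fluxes with a nonzero offset point at the zeropoint rather than the measurement.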

            lauren Lauren MacArthur added a comment -

            Back to the MPI pickling error noted above, I just tried running the same dataset with the current weekly set up on tiger here at Princeton, w_2017_8. It is indeed bombing with the same error, so it seems this is an issue related to the main (swigged) stack (and inherited by the pybind11 branch).
            (Note that I tried running coaddDriver.py with the "swig" stack against SFM outputs from the current pybind11 stack as well as those from the "swig" stack run of DM-6816...both bombed.)
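
For anyone chasing the "Negative size" failure, one possible explanation (an assumption, not something established in this thread) is that the combined pickled payload gathered at the root exceeded what a signed 32-bit byte count can hold. A minimal standard-library sketch for checking payload sizes against that limit is below; the helper names and the example sizes are hypothetical and nothing here calls mpi4py or ctrl_pool.

```python
import pickle

# Largest byte count representable in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def pickled_size(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return the number of bytes `obj` occupies once pickled."""
    return len(pickle.dumps(obj, protocol=protocol))


def fits_in_int32_count(sizes, limit=INT32_MAX):
    """Return True if the summed payload sizes stay within `limit` bytes."""
    return sum(sizes) <= limit


# A small result object, as a worker might return for one patch (hypothetical).
small = pickled_size({"patch": "8,7", "result": None})

# Four hypothetical workers each returning a 600 MiB payload: the total
# is about 2.5e9 bytes, which no longer fits in a signed 32-bit count.
worker_sizes = [600 * 1024**2] * 4
ok = fits_in_int32_count(worker_sizes)
```

Logging `pickled_size` for each worker's return value before the gather would show whether the HSC-Y run is anywhere near this limit.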

            lauren Lauren MacArthur added a comment -

            Pim Schellart [X] Shall I go ahead and run multiBandDriver.py but omitting the HSC-Y band data, or would you prefer that the above issue gets resolved first?

            pschella Pim Schellart [X] (Inactive) added a comment -

            Yes, please proceed. What we are looking for is no regressions from Swig, not general code problems. Of course please file a separate ticket for the issue you encountered.

            lauren Lauren MacArthur added a comment -

            See DM-9541.

            lauren Lauren MacArthur added a comment -

            The COSMOS subset of the HSC RC dataset has successfully run through multiBandDriver.py (omitting the HSC-Y band). The results look sensible. Direct comparisons are moot at this point due to the lack of meas_mosaic calibration and CModel numbers, but I did look at the distributions of Gaussian vs. PSF fluxes and they look very similar.

            The WIDE subset has run through coaddDriver.py. multiBandDriver.py is running now.

            lauren Lauren MacArthur added a comment -

            The WIDE subset has now run successfully through multiBandDriver.py.

            lauren Lauren MacArthur added a comment - - edited

            Modulo the caveats noted above about meas_mosaic, meas_modelfit, and the COSMOS HSC-Y subset, the HSC RC dataset has been fully processed on the pybind11 stack with no issues (kudos, Pim Schellart [X] et al.!). Pim Schellart [X], are you satisfied, and do you consider this ticket complete?

            jbosch Jim Bosch added a comment - - edited

            Lauren MacArthur, from the above comments it sounds like setting reserveFraction > 0 can make results nonreproducible, I'm guessing because it's choosing different random subsets on different runs. Is that right? (If so, we should definitely fix that by making the random seed deterministic.)
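
To make the idea of a deterministic seed concrete, here is a minimal sketch of reserving a fraction of candidates reproducibly. It is not the measurePsf implementation: `split_reserved`, its arguments, and the use of Python's `random.Random` are assumptions for illustration, with `seed` standing in for something stable per exposure (such as an exposure ID).

```python
import random


def split_reserved(candidates, reserve_fraction, seed):
    """Split candidates into (used, reserved) lists reproducibly.

    `seed` should be stable across runs for the same input, for example
    an exposure ID (hypothetical choice for this sketch).
    """
    rng = random.Random(seed)  # private generator; global state is untouched
    n_reserve = int(round(reserve_fraction * len(candidates)))
    reserved_idx = set(rng.sample(range(len(candidates)), n_reserve))
    used = [c for i, c in enumerate(candidates) if i not in reserved_idx]
    reserved = [c for i, c in enumerate(candidates) if i in reserved_idx]
    return used, reserved


stars = ["star%d" % i for i in range(10)]

# The same seed gives the same split on every call.
first = split_reserved(stars, 0.2, seed=12345)
second = split_reserved(stars, 0.2, seed=12345)
```

With a fixed per-exposure seed, two runs over the same data reserve the same stars, so any remaining run-to-run difference must come from elsewhere.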

            lauren Lauren MacArthur added a comment -

            Yeah, I thought that should've been deterministic too, but I definitely saw the ccd-to-ccd variation in calibration go away when I tried setting it to 0.

            lauren Lauren MacArthur added a comment -

            Hmmm...it does look like it should be: https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/measurePsf.py#L280

            pschella Pim Schellart [X] (Inactive) added a comment -

            Yes! Although meas_modelfit is now also wrapped (see DM-8465), I think that if ci_hsc doesn't show any issues with it, we don't have to run this again. John Swinbank, agreed?

            lauren Lauren MacArthur added a comment -

            Jim Bosch, should we be passing in an exposure ID here? https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/measurePsf.py#L280

            swinbank John Swinbank added a comment -

            John Swinbank agreed?

            lauren Lauren MacArthur added a comment -

            Thanks both!

            jbosch Jim Bosch added a comment -

            I created DM-9579 for the nondeterministic behavior. I have approximately half of a theory as to what's going wrong and what I hope is a way to fix it.


              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              swinbank John Swinbank
              Reviewers:
              Pim Schellart [X] (Inactive)
              Watchers:
              Jim Bosch, John Swinbank, Lauren MacArthur, Pim Schellart [X] (Inactive)
