DM-4368

Duration for various ShapeletPsfApprox Models

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This is a report of the amount of time it takes to run ShapeletPsfApprox and CModel over 10000 galaxies from GalSim.

        Attachments

        1. DM-4368.odt
          20 kB
        2. DoubleShapelet.max.1000.logs
          11 kB
        3. Full.max.1000.logs
          7 kB
        4. Full.max.2000.logs
          24 kB

          Activity

          pgee Perry Gee added a comment -

          These runs show that the run times of the higher-order ShapeletPsfApprox fits vary widely. I will plot a distribution next, but the large and inconsistent stdevs and averages are probably caused by a few very large outliers.

          psfs_0.5.fits
          SingleGaussian: average for 306 psfs: 0.0023 stdev: 0.003821
          DoubleGaussian: average for 306 psfs: 0.0118 stdev: 0.117214
          DoubleShapelet: average for 306 psfs: 0.0300 stdev: 0.235367
          Full: average for 306 psfs: 0.2188 stdev: 1.849481
          Test1: average for 306 psfs: 0.5648 stdev: 4.341135
          Test2: average for 306 psfs: 0.2538 stdev: 0.484847
          psfs_0.7.fits
          SingleGaussian: average for 349 psfs: 0.0051 stdev: 0.043088
          DoubleGaussian: average for 349 psfs: 0.0053 stdev: 0.004594
          DoubleShapelet: average for 349 psfs: 0.0347 stdev: 0.343586
          Full: average for 349 psfs: 0.2773 stdev: 2.568652
          Test1: average for 349 psfs: 0.6854 stdev: 6.723544
          Test2: average for 349 psfs: 0.5440 stdev: 5.086115
          psfs_0.9.fits
          SingleGaussian: average for 354 psfs: 0.0046 stdev: 0.040344
          DoubleGaussian: average for 354 psfs: 0.0084 stdev: 0.043921
          DoubleShapelet: average for 354 psfs: 0.0158 stdev: 0.022809
          Full: average for 354 psfs: 0.2617 stdev: 2.471886
          Test1: average for 354 psfs: 0.7728 stdev: 5.927591
          Test2: average for 354 psfs: 0.3068 stdev: 0.967577

          -------------------------------------------------
          Here is a second set of runs with different PSF libraries.

          psfs_0.5.fits
          SingleGaussian: average for 387 psfs: 0.0052 stdev: 0.043604
          DoubleGaussian: average for 387 psfs: 0.0095 stdev: 0.064276
          DoubleShapelet: average for 387 psfs: 0.0217 stdev: 0.102503
          Full: average for 387 psfs: 0.1971 stdev: 0.993619
          Test1: average for 387 psfs: 0.4779 stdev: 1.878722
          Test2: average for 387 psfs: 0.4675 stdev: 3.582227
          psfs_0.7.fits
          SingleGaussian: average for 369 psfs: 0.0105 stdev: 0.080098
          DoubleGaussian: average for 369 psfs: 0.0115 stdev: 0.105081
          DoubleShapelet: average for 369 psfs: 0.0605 stdev: 0.515485
          Full: average for 369 psfs: 0.2194 stdev: 1.803057
          Test1: average for 369 psfs: 0.6539 stdev: 6.036394
          Test2: average for 369 psfs: 0.2457 stdev: 0.352249
          psfs_0.9.fits
          SingleGaussian: average for 352 psfs: 0.0029 stdev: 0.010248
          DoubleGaussian: average for 352 psfs: 0.0169 stdev: 0.156496
          DoubleShapelet: average for 352 psfs: 0.0364 stdev: 0.288886
          Full: average for 352 psfs: 0.1539 stdev: 0.543054
          Test1: average for 352 psfs: 0.7054 stdev: 6.704170
          Test2: average for 352 psfs: 0.5094 stdev: 4.921248

          pgee Perry Gee added a comment -

          The following study was done with two rounds of 7-sigma clipping, which should remove only extreme outliers. It shows that the large standard deviations in the previous runs were caused by a few extreme outliers.

          psfs_0.5.fits
          SingleGaussian: average for 387 psfs: 0.0024 stdev: 0.004687
          clipped values > 0.1075: [0.28041911125183105, 0.8109719753265381]
          DoubleGaussian: average for 387 psfs: 0.0051 stdev: 0.000779
          clipped values > 0.0176: [1.1540520191192627, 0.03627610206604004, 0.5256130695343018]
          DoubleShapelet: average for 387 psfs: 0.0141 stdev: 0.001438
          clipped values > 0.0609: [1.7314538955688477, 1.0571620464324951, 0.06220102310180664, 0.13170194625854492]
          Full: average for 387 psfs: 0.1134 stdev: 0.084518
          clipped values > 1.7492: [15.541382074356079, 3.824735164642334, 2.155457019805908, 11.185655117034912]
          Test1: average for 387 psfs: 0.2929 stdev: 0.097276
          clipped values > 2.7615: [17.389684915542603, 4.001487970352173, 5.731444835662842, 22.650147199630737, 23.304425954818726]
          Test2: average for 387 psfs: 0.2554 stdev: 0.348485
          clipped values > 5.7946: [69.13292789459229, 14.105606079101562]
          psfs_0.7.fits
          SingleGaussian: average for 369 psfs: 0.0021 stdev: 0.000040
          clipped values > 0.0024: [0.002516031265258789, 0.7539308071136475, 0.7510437965393066, 0.7741520404815674, 0.8092930316925049]
          DoubleGaussian: average for 369 psfs: 0.0051 stdev: 0.001552
          clipped values > 0.0962: [2.000887155532837, 0.21076488494873047, 0.13927197456359863]
          DoubleShapelet: average for 369 psfs: 0.0141 stdev: 0.000160
          clipped values > 0.0360: [0.0733940601348877, 6.129197835922241, 6.105515956878662, 4.8287341594696045]
          Full: average for 369 psfs: 0.1141 stdev: 0.099120
          clipped values > 1.9049: [4.603094100952148, 34.23759579658508]
          Test1: average for 369 psfs: 0.2943 stdev: 0.153182
          clipped values > 5.2502: [11.893396139144897, 6.470481872558594, 115.38624501228333]
          Test2: average for 369 psfs: 0.2175 stdev: 0.020017
          clipped values > 1.1176: [5.853317975997925, 3.2104039192199707, 2.6327810287475586]
          psfs_0.9.fits
          SingleGaussian: average for 352 psfs: 0.0021 stdev: 0.000094
          clipped values > 0.0244: [0.03729820251464844, 0.18493080139160156, 0.0496518611907959]
          DoubleGaussian: average for 352 psfs: 0.0050 stdev: 0.000046
          clipped values > 0.0083: [2.0939059257507324, 0.00985407829284668, 2.063598871231079, 0.01220703125]
          DoubleShapelet: average for 352 psfs: 0.0141 stdev: 0.000775
          clipped values > 0.1152: [3.2652430534362793, 4.3427510261535645, 0.2819490432739258]
          Full: average for 352 psfs: 0.1079 stdev: 0.015193
          clipped values > 1.0762: [8.40607213973999, 2.660454034805298, 5.388958930969238]
          Test1: average for 352 psfs: 0.3000 stdev: 0.252734
          clipped values > 5.3710: [8.429428100585938, 125.1890299320221, 9.923501014709473]
          Test2: average for 352 psfs: 0.2238 stdev: 0.131330
          clipped values > 2.9361: [5.458249092102051, 93.49560594558716, 4.508358001708984]
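
          The clipping described above can be sketched roughly as follows. This is only an illustration, not the script used for these runs: it assumes one-sided clipping of values above mean + 7 * stdev (suggested by the "clipped values >" lines), uses the population standard deviation, and runs on synthetic timings. The function name sigma_clip is hypothetical.

```python
import statistics


def sigma_clip(times, nsigma=7.0, rounds=2):
    """Drop values above mean + nsigma * stdev, repeating for a fixed number of rounds.

    Assumptions (not taken from the ticket): clipping is one-sided and the
    population standard deviation is used.
    """
    kept = list(times)
    clipped = []
    for _ in range(rounds):
        if len(kept) < 2:
            break
        mean = statistics.fmean(kept)
        stdev = statistics.pstdev(kept)
        threshold = mean + nsigma * stdev
        clipped.extend(t for t in kept if t > threshold)
        kept = [t for t in kept if t <= threshold]
    return kept, clipped


# Synthetic timings: 300 fast fits plus two slow ones (made-up values).
times = [0.002 + 0.0001 * (i % 5) for i in range(300)] + [0.8, 0.05]
kept, clipped = sigma_clip(times)
print(f"average for {len(times)} psfs: {statistics.fmean(kept):.4f} "
      f"stdev: {statistics.pstdev(kept):.6f}")
print("clipped values:", clipped)
```

          In this synthetic case the first round removes only the largest value, because that value inflates the standard deviation enough to hide the smaller outlier; the second round then removes the smaller one. That is the reason for running more than one round.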

          pgee Perry Gee added a comment -

          This is the result of a comparison of the time ShapeletPsfApprox takes vs. CModel for 100 galaxies with 0.7 arcsec seeing. The galaxies are randomly selected, as are the PSFs.

          Note that the CModel time does not increase greatly with the PSF model, while the ShapeletPsfApprox (SPA) time does. However, it would probably be best to look at the previous comment for SPA, which was done with outlier rejection.

          SingleGaussian: average for 100 exps: 0.0252 stdev: 0.010615
          CModel: average for 100 exps: 0.0124 stdev: 0.003567
          DoubleGaussian: average for 100 exps: 0.1466 stdev: 0.019766
          CModel: average for 100 exps: 0.0230 stdev: 0.008743
          DoubleShapelet: average for 100 exps: 3.0702 stdev: 1.194606
          CModel: average for 100 exps: 0.0302 stdev: 0.007098
          Full: average for 100 exps: 48.8366 stdev: 2.655909
          CModel: average for 100 exps: 0.0848 stdev: 0.021048
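
          To make the scaling in the numbers above easier to read, the averages can be normalized to the SingleGaussian case. The snippet below only divides the averages listed in this comment; the helper name relative_to_baseline is made up for illustration.

```python
# Average times copied from the comparison above (100 exposures, 0.7 arcsec seeing).
spa = {
    "SingleGaussian": 0.0252,
    "DoubleGaussian": 0.1466,
    "DoubleShapelet": 3.0702,
    "Full": 48.8366,
}
cmodel = {
    "SingleGaussian": 0.0124,
    "DoubleGaussian": 0.0230,
    "DoubleShapelet": 0.0302,
    "Full": 0.0848,
}


def relative_to_baseline(times, baseline="SingleGaussian"):
    """Express each average as a multiple of the baseline model's average."""
    base = times[baseline]
    return {model: t / base for model, t in times.items()}


spa_ratio = relative_to_baseline(spa)
cmodel_ratio = relative_to_baseline(cmodel)
for model in spa:
    print(f"{model:15s} SPA x{spa_ratio[model]:8.1f}   CModel x{cmodel_ratio[model]:5.1f}")
```

          With these averages, ShapeletPsfApprox with Full takes roughly 1900 times as long as with SingleGaussian, while CModel takes about 7 times as long.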

          jbosch Jim Bosch added a comment -

          Assuming all of these numbers are seconds per galaxy or PSF image (rather than seconds for a group), these numbers are indeed way too large, and it's clear I need to do some work on ShapeletPsfApprox to fix that.

          I know you already added code to only run ShapeletPsfApprox once per subfield; I'm hoping that, because you only run ShapeletPsfApprox once for every 10k galaxies, it's still not a large fraction of the overall time, even when it's horribly slow. If that's the case, I suggest you just proceed as-is, while I work on speeding it up. I'm reasonably confident I can do that in a way that doesn't adversely affect the fitting results.

          I'm more concerned that the fits that run incredibly slowly might also converge to a bad fit. To explore that possibility, it'd be useful to look for cases where a simple ShapeletPsfApprox model (e.g. DoubleGaussian) runs quite quickly, but a more complex one that's strictly a superset of the simple one (e.g. Full) is extremely slow. If we could then just look at the models compared to the original image, we might learn something more about what's going on. Just looking at the parameters could be enlightening as well - we may be spending a lot of time trying to constrain components with extremely low amplitudes.
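
          One possible way to pick out such cases from per-PSF timings is sketched below. Everything here is hypothetical: the function name, the dictionary layout (PSF index mapped to time), the thresholds, and the sample numbers are illustrative only and do not come from the attached logs.

```python
def find_suspect_fits(simple_times, complex_times, fast_limit, slow_factor):
    """Return (psf_id, simple_time, complex_time) for PSFs where the simple
    model was fast but the more complex model was much slower.

    fast_limit and slow_factor are arbitrary cut-offs chosen by the caller.
    """
    suspects = []
    for psf_id, t_simple in simple_times.items():
        t_complex = complex_times.get(psf_id)
        if t_complex is None:
            continue  # no timing for this PSF with the complex model
        if t_simple <= fast_limit and t_complex >= slow_factor * t_simple:
            suspects.append((psf_id, t_simple, t_complex))
    # Slowest complex fits first, since those are the most interesting to inspect.
    return sorted(suspects, key=lambda s: s[2], reverse=True)


# Made-up timings keyed by PSF index.
double_gaussian = {0: 0.005, 1: 0.005, 2: 2.0}
full = {0: 0.11, 1: 15.5, 2: 34.2}

suspects = find_suspect_fits(double_gaussian, full, fast_limit=0.02, slow_factor=100)
for psf_id, t_simple, t_complex in suspects:
    print(f"psf {psf_id}: simple {t_simple:.4f}, complex {t_complex:.4f}")
```

          PSF 2 is not flagged here even though its Full fit is the slowest, because its simple fit was already slow; the selection is meant to isolate cases where only the added complexity is responsible.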

          If you can package up any of the extremely slow fits as unit tests on ShapeletPsfApprox, I'll take a look at them. I can't promise it will be soon, as December will be very busy, but this will be a fairly high priority for me, since getting the LSST versions of ShapeletPsfApprox and CModel working on real data is going to be important for the HSC merge that everyone else at Princeton is working on right now.

          pgee Perry Gee added a comment - - edited

          These results differ in a few cases from the ones I sent you earlier. I reran several of the tests to be sure that they were correct.

          I have also done the iteration counts you asked for in several cases, though I did not try to include them here.

          The DM-4368.odt file contains the latest results.

          jbosch Jim Bosch added a comment -

          Review complete. I have nothing new to add, but I'll paste some of my statements from our previously off-line conversation here for posterity:

          It's interesting that the largest PSF images caused that many more failures (suggesting that it takes more iterations to fit more pixels on average), but looking at the other sizes I don't see a clear trend in terms of PSF image size vs. failure rate.

          I'm actually encouraged to see that the CModel speed with Full is only ~8x slower than with SingleGaussian; I expected it to be much worse than that.

          Mostly, it's clear that I need to do some work on speeding up ShapeletPsfApprox to make it usable in practice, even for simple models. I'm pretty confident I can do that if I can just get some time to work on it. (I'm much less optimistic about being able to speed up CModel, which is why I've been paying more attention to how it scales with the PSF approximation complexity).

          pgee Perry Gee added a comment -

          I've added a couple of Full and DoubleShapelet model logs which have printouts of time and number of iterations from the size of the history recorder catalog. All but 2 are outer iterations.

          pgee Perry Gee added a comment -

          Verbose logs, including number of iterations

          pgee Perry Gee added a comment -

          Results attached to this ticket.


            People

            Assignee:
            pgee Perry Gee
            Reporter:
            pgee Perry Gee
            Reviewers:
            Jim Bosch
            Watchers:
            Jim Bosch, John Swinbank, Perry Gee