Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-3490

Quick-and-dirty n-way spatial matching

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw
    • Labels:
      None
    • Story Points:
      2
    • Sprint:
      Science Pipelines DM-S15-6, Science Pipelines DM-W16-1
    • Team:
      Data Release Production

      Description

      This issue will add limited N-way spatial matching of multiple catalogs with identical schemas, sufficient for measuring FY15 KPMs. It will be a simple wrapper on our existing 2-way matching code in afw, and will not be intended for long term use (as it won't be an efficient algorithm or an ideal interrface).

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment - - edited

            Copying the review here:

            Lauren MacArthur, on the MultiMatch constructor:

            Should it be noted/documented somewhere that the default radius is 0.5 arcsec if None is provided?

            Is 0.5 arcsec a bit generous for matching on astrometrically calibrated (although roughly) positions? Should this be set based on the estimated astrometric scatter? This may be moot as I think your rejection criteria account for any mismatches leaking in due to a large match radius, but restricting the match radius can shrink the initial number of "goodMatches", e.g.
            ========================================================================
            radius = 0.5
            Number of goodMatches = 89155
            Number of safeMatches = 3468
            RMS = 0.019516 magnitudes, with 0.49% (143 of 29362) measurements clipped as outliers.
            ========================================================================
            radius = 0.21 # rougly 3*astrom scatter
            Number of goodMatches = 64889
            Number of safeMatches = 3518
            RMS = 0.019513 magnitudes, with 0.49% (145 of 29811) measurements clipped as outliers.
            ========================================================================

            Do you need to add radius = radius*lsst.afw.geom.arcseconds in the case where the call provides a float/int?

            Define/describe idField and RecordClass in Arguments list. Indicate expected units for radius.

            Is there a reason you didn't use the @param[in] format here?

            Jim Bosch:

            I'll document the default radius and I think I'll make it throw if you pass in a float, not an angle, just to avoid the case where someone passes in a float they think will be interpreted in some other unit.

            Will add the other missing arguments to the docs too.

            In pure Python code, I don't think we're required to use Doxygen markup, and Doxygen will handle it fine without it as long as we don't start the string with a !. I'm also aware that we'll be switching away from Doxygen to Sphinx in the near future (with yet another markup language), so this is probably actually a bit more future proof than using Doxygen markup right now.

            Lauren MacArthur:

            Copy that. But just to be clear, I was really wondering why you didn't use the @param[in] format here, but did use it for many functions below (e.g. build, aggregate, apply).

            Jim Bosch:

            Oh, I guess I didn't realize I was being inconsistent within a particular file. I don't really have a reason; it may have been code that began life as part of an IPython notebook vs. code that began life elsewhere.

            If I'd seen this a moment earlier, I think I'd have fixed it because I do think consistency within a file is moderately important. But given that I've just merged it and it'll all be reformatted into restructed text soon enough, I'm just going to leave it.

            Lauren MacArthur:

            Would it be better to use, e.g., dataType instead of type here (since type is a python internal)?

            Jim Bosch:

            Will do.

            Lauren MacArthur, on the class docs for GroupView:

            Funny sentence construction here: "that provides an convenient operations on the"

            Jim Bosch

            Will remove the "an".

            Show
            jbosch Jim Bosch added a comment - - edited Copying the review here: Lauren MacArthur , on the MultiMatch constructor: Should it be noted/documented somewhere that the default radius is 0.5 arcsec if None is provided? Is 0.5 arcsec a bit generous for matching on astrometrically calibrated (although roughly) positions? Should this be set based on the estimated astrometric scatter? This may be moot as I think your rejection criteria account for any mismatches leaking in due to a large match radius, but restricting the match radius can shrink the initial number of "goodMatches", e.g. ======================================================================== radius = 0.5 Number of goodMatches = 89155 Number of safeMatches = 3468 RMS = 0.019516 magnitudes, with 0.49% (143 of 29362) measurements clipped as outliers. ======================================================================== radius = 0.21 # rougly 3*astrom scatter Number of goodMatches = 64889 Number of safeMatches = 3518 RMS = 0.019513 magnitudes, with 0.49% (145 of 29811) measurements clipped as outliers. ======================================================================== Do you need to add radius = radius*lsst.afw.geom.arcseconds in the case where the call provides a float/int? Define/describe idField and RecordClass in Arguments list. Indicate expected units for radius. Is there a reason you didn't use the @param[in] format here? Jim Bosch : I'll document the default radius and I think I'll make it throw if you pass in a float, not an angle, just to avoid the case where someone passes in a float they think will be interpreted in some other unit. Will add the other missing arguments to the docs too. In pure Python code, I don't think we're required to use Doxygen markup, and Doxygen will handle it fine without it as long as we don't start the string with a !. I'm also aware that we'll be switching away from Doxygen to Sphinx in the near future (with yet another markup language), so this is probably actually a bit more future proof than using Doxygen markup right now. Lauren MacArthur : Copy that. But just to be clear, I was really wondering why you didn't use the @param[in] format here, but did use it for many functions below (e.g. build, aggregate, apply). Jim Bosch : Oh, I guess I didn't realize I was being inconsistent within a particular file. I don't really have a reason; it may have been code that began life as part of an IPython notebook vs. code that began life elsewhere. If I'd seen this a moment earlier, I think I'd have fixed it because I do think consistency within a file is moderately important. But given that I've just merged it and it'll all be reformatted into restructed text soon enough, I'm just going to leave it. Lauren MacArthur : Would it be better to use, e.g., dataType instead of type here (since type is a python internal)? Jim Bosch : Will do. Lauren MacArthur , on the class docs for GroupView : Funny sentence construction here: "that provides an convenient operations on the" Jim Bosch Will remove the "an".
            Hide
            jbosch Jim Bosch added a comment -

            Rebased and merged to master.

            Show
            jbosch Jim Bosch added a comment - Rebased and merged to master.
            Hide
            lauren Lauren MacArthur added a comment -

            Jim Bosch Did you also see the comments for this commit:
            https://github.com/lsst/afw/commit/f9bfe4543b8c75c342cca0ece11c8803a6dc3a81

            Not a huge deal, but I do believe there is a typo in your notebook in box [13] (which likely filtered into Vishal Kasliwal [X]'s script).

            This is the comment:

            In [13] I think you meant get(modelMagKey) in:
            modelMag = numpy.mean(cat.get(psfMagKey))

            I ran this with a full focal plane run of the same visits. With the above fix the sigma went down slightly from 0.223 to 0.195. The difference in the number of safe matches was:
            Old Number of safeMatches = 4024 # (before modelMagKey fix in notebook)
            Number of safeMatches = 3468

            Redundant call in [15]

            Truncate sigma printing in [17]:
            ax2.plot(gaussian, y, 'r-', label="Gaussian with sigma={0:0.4f}".format(sigma))

            Maybe indicate the total number of measurements (before and after clipping) somewhere?

            Show
            lauren Lauren MacArthur added a comment - Jim Bosch Did you also see the comments for this commit: https://github.com/lsst/afw/commit/f9bfe4543b8c75c342cca0ece11c8803a6dc3a81 Not a huge deal, but I do believe there is a typo in your notebook in box [13] (which likely filtered into Vishal Kasliwal [X] 's script). This is the comment: In [13] I think you meant get(modelMagKey) in: modelMag = numpy.mean(cat.get(psfMagKey)) I ran this with a full focal plane run of the same visits. With the above fix the sigma went down slightly from 0.223 to 0.195. The difference in the number of safe matches was: Old Number of safeMatches = 4024 # (before modelMagKey fix in notebook) Number of safeMatches = 3468 Redundant call in [15] Truncate sigma printing in [17]: ax2.plot(gaussian, y, 'r-', label="Gaussian with sigma={0:0.4f}".format(sigma)) Maybe indicate the total number of measurements (before and after clipping) somewhere?
            Hide
            vpk24 Vishal Kasliwal [X] (Inactive) added a comment -

            I saw that typo too and changed it in my version of Jim's script.

            Show
            vpk24 Vishal Kasliwal [X] (Inactive) added a comment - I saw that typo too and changed it in my version of Jim's script.
            Hide
            jbosch Jim Bosch added a comment -

            Lauren MacArthur thanks for the ping - I did not see that. I'll fix that on master on another issue.

            Show
            jbosch Jim Bosch added a comment - Lauren MacArthur thanks for the ping - I did not see that. I'll fix that on master on another issue.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Lauren MacArthur
              Watchers:
              Jim Bosch, Lauren MacArthur, Vishal Kasliwal [X] (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.