DM-11559 (Data Management)

Transform HSC Reprocessing Analysis into Test Report form

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Per request from project management, make a Test Report from the HSC full reprocessing analysis.

        Attachments

        1. DMTR-31_pr1.pdf
          1.05 MB
        2. DMTR-31_pr2.pdf
          1.05 MB
        3. DMTR-31_v1.1.pdf
          1.05 MB

            Activity

            womullan Wil O'Mullane added a comment -

            That is a lot of points to get something into DocuShare. If that is what this is, it's a bit light on description for 4 points.

            You make this sound like I am asking you to do something extraordinary (you are part of DM, right? So I assume you mean DMPM). Documentation is part of all of our jobs. Let me reiterate: Confluence and Google do not count as documentation. See LDM-294 section 3.4.3.

            gpdf Gregory Dubois-Felsmann added a comment -

            Apparently there are a number of ways to configure Confluence's Word and PDF exporters to use custom templates (both with the built-in exporters and, more flexibly and with API support, with third-party plugins). Perhaps we should look into this? Confluence is actually pretty useful for assembling a document and allowing people to discover it and comment on it as it develops, but terrible for authoritative documentation. If we could make export to a real document reasonably painless, it would let us have it both ways.

            tjenness Tim Jenness added a comment -

            For now I think the quickest approach is to export the .doc file from Confluence, download the Document-11920 template, and paste the content from the exported file into the Word LSST template.

            plutchak Joel Plutchak (Inactive) added a comment -

            I think you are reading more into this than is there. I was approached by my staff asking what this is about. They had not found anything in LSST documentation about test reports, and asked me. A quick search revealed little to me. When I asked how long it would take them to learn about the (new?) requirements, format, and process, as well as converting a somewhat informal report into a more formal document, they gave an estimate of two solid days. Hence, 4 Story Points.

            My question about who is responsible for documentation production comes from my experience in the workforce. Almost every company and organization, large or small, I have worked for had dedicated documentation staff. I'm still moderately new to this project, and thought this might be something handled in a consistent, centralized manner.

            tjenness Tim Jenness added a comment -

            Thanks. We haven't quite formalized the definition of how to lay out test reports, as they are a relatively new phenomenon in DM. For now we are trying to collect all the informal test reports into a single document series, DMTR, making it easy for reviewers to find the content and for us internally to be able to track progress. The recent HSC reprocessing is a very important demonstration of DM progress that we want to make available to the Joint Status Review in September. The document pack for that is due on August 21st.

            You can see the existing Test Reports in Collection-5639 on DocuShare. The content layout is very inconsistent but they all consistently use the LSST document template.

            For that reason, I think that doing the Word export from Confluence and throwing it in the template is a good initial attempt at this.

            In the future, we would expect test reports such as this to be written natively as DMTRs. One way to approach that might be to use the scheme outlined by Gregory Dubois-Felsmann above. We eventually hope that many test reports can be generated automatically but not all will be able to be done that way.

            hchiang2 Hsin-Fang Chiang added a comment -

            I've installed lsst-texmf and seem to have something working. I will work on porting the contents from Confluence and will then likely need to reorganize and formalize the report.

            tjenness Tim Jenness added a comment -

            It's fine if it's a Word document (see instructions above). If you want to tackle this as a DMTR LaTeX document then please feel free to contact me any time about it. Also, pandoc is very helpful here because you can export the .doc from Confluence, save it as .docx, and then use pandoc to convert that to LaTeX. I can create a DMTR-31 git repo if you need one.
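
The .docx-to-LaTeX step mentioned above could be scripted roughly as follows. This is only a sketch, not the procedure actually used for DMTR-31: the function names and any file names passed in are hypothetical. It relies on pandoc's standard `--from`, `--to`, and `--output` options. Without `--standalone`, pandoc emits a LaTeX fragment rather than a complete document, which is convenient when the body is to be pasted into an existing template.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, tex_path):
    """Return the pandoc argument list for a .docx -> LaTeX conversion.

    No --standalone flag is passed, so pandoc writes a LaTeX fragment
    (body only) that can be included in a separate document template.
    """
    return [
        "pandoc",
        str(docx_path),
        "--from", "docx",
        "--to", "latex",
        "--output", str(tex_path),
    ]


def convert_docx_to_latex(docx_path, tex_path):
    """Run pandoc on docx_path and return the path of the LaTeX output.

    Raises RuntimeError if pandoc is not on PATH, and
    subprocess.CalledProcessError if the conversion itself fails.
    """
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(docx_path, tex_path), check=True)
    return Path(tex_path)
```

Calling `convert_docx_to_latex("export.docx", "body.tex")` (placeholder names) would then leave a fragment in `body.tex`. The converted text will usually still need manual cleanup of tables and figures.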

            hchiang2 Hsin-Fang Chiang added a comment -

            John Swinbank, Lauren MacArthur, Greg Daues: I'm attaching the first version of the DMTR here. I've copied over the Confluence page with a little bit of restructuring and some extra info added. I didn't add the DRP team to the author list yet; John Swinbank, should I add you and your team?

            I'm closing my first PR for now but expect to edit more later. Just wanted to get one version out first.

            tjenness Tim Jenness added a comment -

            This looks great. Thanks. My opinion is that the people listed as authors should be people who contributed to the report rather than people who worked on the processing. People who contributed to the processing could be in the acknowledgments.

            swinbank John Swinbank added a comment -

            Thanks for doing this.

            I tend to agree with Tim re the author list, but don't have particularly strong feelings about it.

            Two things that jumped out at me when flicking through (I have not read the document carefully):

            • The comment in §1 that "in S17B, more tracts than listed were processed" — why not list everything?
            • It would be helpful to be specific about the resolution of the "reproducible failures" in §4.3. You mention a couple of JIRA tickets here, but it's not clear whether we should be worrying about the other failures.

            Although there's little fundamentally new in this report relative to the Confluence page, flicking through it made me realise that we should discuss whether and how information about low-level processing should be fed back to Pipelines and inform our future development. We saw recently that Fabio Hernandez was able to provide immediately actionable feedback based on his experiences of processing HSC data; it seems like we're really close to being able to do the same here, but haven't quite joined the dots yet.

            (This last point is obviously not a blocker on this document being regarded as complete, just a suggestion for how we could refine our workflow for the future.)

            hchiang2 Hsin-Fang Chiang added a comment -

            > The comment in §1 that "in S17B, more tracts than listed were processed" — why not list everything?

            Processing of extra edge tracts was attempted, but not all of it was successful. Some have sparse coverage, so the failures may not be surprising. I'd prefer to leave them out of this report. Halfway through the run I realized, and confirmed with Jim, that those were not needed, and since then I did not take notes on the unintended tracts as carefully as on the intended tracts.

            More explanations were in a later section but sections were not linked. In the new version I've added some clarifications.

            > It would be helpful to be specific about the resolution of the "reproducible failures" in §4.3. You mention a couple of JIRA tickets here, but it's not clear whether we should be worrying about the other failures

            Two tickets were cited there: one has been fixed (DM-10574) but not the other (DM-10755). It would be really nice if DM-10755 could be fixed before the next large run.

            I'm not sure filing tickets about the CCDs that fail processCcd. If I understand correctly, those failures are known and are being improved somewhat by the new astrometry matcher work. For the RC dataset, the data IDs of those processCcd failures are informally tracked in the biweekly reprocessing.

            hchiang2 Hsin-Fang Chiang added a comment -

            About providing actionable feedback, I think I've been doing that throughout the reprocessing campaign. Many tickets have been filed and included in Section 6. If you include test runs too there would be even more relevant tickets.

            Among those tickets, it would be really great if DM-11171 and DM-10624 can be fixed before the next large run. DM-11171 may need quite some design thoughts; DM-10624 is much simpler. My understanding is that SuperTask won't tackle DM-11171 and it is not directly blocked by the SuperTask progress.

            About stack troubles in the low-level processing details, almost all of them are related to either Butler or CmdLineTask. With the ongoing Butler and SuperTask WGs, I'm not sure whether filing tickets would be helpful.

            hchiang2 Hsin-Fang Chiang added a comment -

            Many thanks to John Swinbank and Greg Daues for your comments! I've made some corrections and revisions in response.

            Although I still see quite a lot of room for improvement, I'm not sure I could have a much better version by tomorrow, Aug 21 (the original request).

            Tim Jenness, is the current version okay to be uploaded to DocuShare?

            All edits so far have been merged to GitHub master branch.

            swinbank John Swinbank added a comment -

            Thanks, Hsin-Fang Chiang!

            Two follow-up comments, which do not directly affect this report:

            > I'm not sure filing tickets about the CCDs that fail processCcd...

            I think that any failure which cannot be otherwise explained should be ticketed. If you know it will be fixed by (e.g.) a particular update to the matcher, then it doesn't need to be ticketed directly, but it should be added to the ticket capturing the matcher work and, when the matcher changes are merged, we need to confirm that they really do fix the problem.

            > About providing actionable feedback, I think I've been doing that throughout the reprocessing campaign. Many tickets have been filed and included in Section 6.

            > Among those tickets, it would be really great if DM-11171 and DM-10624 can be fixed before the next large run...

            It certainly wasn't my intention to imply that no tickets had resulted from this work; sorry if I gave that impression. I do think we should find a way to feed your results more directly into the development process, though. For example, I wasn't aware that you regarded the above tickets as particularly important. Making sure concerns like this are taken into account is what the T/CAM coordination meetings are for: perhaps you could encourage Joel Plutchak to represent your priorities to the rest of DM there?

            tjenness Tim Jenness added a comment -

            Thanks, Hsin-Fang Chiang. It looks great and I've uploaded it to DocuShare and tagged the repo. You can still make changes if you want, but we now have a preferred version on DocuShare, so that's great. When you do another reprocessing it will be a different test report with a new number.

            hchiang2 Hsin-Fang Chiang added a comment -

            I uploaded a slightly revised version to DocuShare and tagged the repo. The same file is attached here: DMTR-31_v1.1.pdf.

            hchiang2 Hsin-Fang Chiang added a comment -

            About the ProcessCcd failures, the 4 most common failures are: "Unable to match sources", "PSF star selector found [123] candidates", "No sources remaining in match list after magnitude limit cuts", and "No objects passed our cuts for consideration as psf stars". These errors are seen consistently in the biweekly reprocessing. I had assumed they were all related to the matcher (from DM-11090 and other informal conversations), but I don't know if matchPessimisticB is expected to solve all failures (DM-10399?). I'll continue to take notes in the biweekly reprocessing; for example, the week 32 statistics are here.
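
A tally of these four failure messages could be kept with a small script along these lines. This is a minimal sketch under stated assumptions, not the tooling used in the biweekly reprocessing: the category labels and function names are hypothetical, the log lines are assumed to contain the messages verbatim, and `[123]` is read as a regular-expression character class (1, 2, or 3 candidates).

```python
import re
from collections import Counter

# Hypothetical labels mapped to the four messages quoted above.
# "[123]" is interpreted as a regex character class (an assumption).
FAILURE_PATTERNS = {
    "match_failed": re.compile(r"Unable to match sources"),
    "few_psf_candidates": re.compile(r"PSF star selector found [123] candidates"),
    "empty_match_list": re.compile(
        r"No sources remaining in match list after magnitude limit cuts"
    ),
    "no_psf_stars": re.compile(
        r"No objects passed our cuts for consideration as psf stars"
    ),
}


def classify(line):
    """Return the label of the first pattern found in line, or None."""
    for label, pattern in FAILURE_PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def tally_failures(lines):
    """Count how many lines fall into each known failure category."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Lines that match none of the four patterns are ignored, so any failure outside these categories would need its own pattern before it shows up in the counts.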

            Looking to the future, I agree that we should think about how to better feed results back into development, not only from me but probably also from other users. So far the process is mostly Slack-based. I wonder if this is beyond the T/CAM coordination meetings, but I'm happy to follow any advice. Among all pending issues, the top priority to me personally is Butler/SuperTask/SupervisoryFramework, and I know Joel Plutchak has brought that up at the T/CAM coordination meetings multiple times. I'll start sending Joel Plutchak some "secondary" issues as well.

            Closing this ticket.


              People

              • Assignee: hchiang2 Hsin-Fang Chiang
              • Reporter: plutchak Joel Plutchak (Inactive)
              • Reviewers: Tim Jenness
              • Watchers: Gregory Dubois-Felsmann, Hsin-Fang Chiang, Joel Plutchak (Inactive), John Swinbank, Tim Jenness, Wil O'Mullane
