Data Management / DM-12853

Review design questions in ap_verify

    Details

    • Story Points:
      4
    • Epic Link:
    • Sprint:
      AP S18-2, AP S18-3
    • Team:
      Alert Production

      Description

      Work on ap_verify documentation for DM-11592 has exposed a number of difficult-to-explain behaviors in ap_verify. Those with an obvious solution have been ticketed separately, but several are UI/roadmap decisions that need to be agreed on before the "correct" behavior can be documented.

      This ticket is to set aside time for discussing the following questions, and updating the documentation accordingly (TODOs tagged with this issue ID):

      • How do we foresee installing datasets? Will they be automatically provided as part of lsst_distrib or another metapackage (ap_verify_hits2015 is owned by the lsst GitHub group, but as far as I know it's not distributed by any of the Stack installers)? If users will be installing them manually, how will they be versioned? [To be determined later; no updates to documentation for now]
      • What is the rationale for giving datasets a command-line name distinct from their repository or EUPS name (see the "HiTS" : "hits_data" example in DM-11118)? Now that this feature is implemented and documented, it seems to only add user friction (e.g., it is difficult to work out where to download a dataset from its command-line name). [Document that it's a placeholder for a future versioning system]
      • Following DM-11118, we added a --rerun parameter that superficially resembles the --rerun parameter for CmdLineTask, but places results in the dataset directory. Its behavior is so unlike repository chaining that it is likely to confuse rather than help veteran Stack users. It also requires users to know where a dataset is installed (something the rest of the design tries to hide), and it potentially makes changes to otherwise read-only datasets. I would like to revisit the question of whether this argument is desirable and, if so, what its behavior should be. [To be modified as suggested by KSK; deferred to DM-13492 due to implementation constraints.]
      • Why does ap_verify take a parameter called --dataIdString rather than --dataId or, as command-line tasks do it, --id? [Change parameter to --id]
      • ap_verify currently returns 0 if the pipeline ran to completion, and an interpreter-dependent value otherwise. Should we impose more specific guarantees? (Making ap_verify count failed dataIds the way command-line tasks do may have implications for its error-handling policy.) [To be determined later; no updates to documentation for now]
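
      To make the questions about dataset names, --id, and exit status concrete, here is a minimal Python sketch. It is not ap_verify's actual code: the names DATASET_ALIASES, build_parser, parse_data_id, and run are hypothetical, the "key=value" data ID format is an assumption, and returning the number of failed data IDs is only one possible policy, modeled loosely on how command-line tasks count failures.

```python
import argparse

# Hypothetical alias table, modeled on the "HiTS" : "hits_data" example.
# A future versioning system could map each name to (package, version).
DATASET_ALIASES = {"HiTS": "hits_data"}


def build_parser():
    """Build a parser with a dataset name and a repeatable --id argument."""
    parser = argparse.ArgumentParser(prog="ap_verify_sketch")
    parser.add_argument("--dataset", required=True,
                        choices=sorted(DATASET_ALIASES),
                        help="command-line name of the dataset")
    parser.add_argument("--id", dest="data_ids", action="append", default=[],
                        help="data ID such as 'visit=1 ccd=2'; may be repeated")
    return parser


def parse_data_id(text):
    """Convert 'key=value key=value' into a dict of strings (assumed format)."""
    return dict(pair.split("=", 1) for pair in text.split())


def run(argv, process):
    """Call process(package, data_id) per data ID; return the failure count."""
    args = build_parser().parse_args(argv)
    package = DATASET_ALIASES[args.dataset]
    failures = 0
    for text in args.data_ids:
        try:
            process(package, parse_data_id(text))
        except Exception:
            # Count the failure and continue with the remaining data IDs.
            failures += 1
    return failures
```

      A caller could pass the result of run(sys.argv[1:], some_pipeline) to sys.exit, so that a status of 0 means every data ID succeeded. Whether ap_verify should adopt such a guarantee is exactly the open question in the last bullet.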

            Activity

            krughoff Simon Krughoff added a comment -

            I don't have a strong opinion on the aliases that currently exist.  It seems like they provide functionality similar to being able to say "give me the `current` HiTS dataset".  Unfortunately, I don't have a solution, but I think it's functionality we need.  Does that clear things up?

            krzys Krzysztof Findeisen added a comment -

            When you say the aliases provide a similar functionality, are you saying that the names on the right hand sides of the mappings need to resolve to not just a particular dataset package, but a specific version of one?

            krughoff Simon Krughoff added a comment -

            I think we would benefit from a solution that can refer to both a repository and a version of that repository.  I don't know if it is absolutely necessary.  We have been getting by with just repository names so far.  I really am open to suggestions.

            krzys Krzysztof Findeisen added a comment -

            Hi John Swinbank, since this is a policy/design ticket rather than an implementation ticket, would you be willing to review the documentation changes?

            swinbank John Swinbank added a comment -

            Thanks. These changes look fine to me; I left one minor comment on your PR.


              People

              • Assignee:
                Krzysztof Findeisen
              • Reporter:
                Krzysztof Findeisen
              • Reviewers:
                John Swinbank
              • Watchers:
                Eric Bellm, John Swinbank, Krzysztof Findeisen, Simon Krughoff