Data Management: DM-16646

Cross-link Pipelines tutorial to Task documentation


    Details

    • Story Points:
      1
    • Team:
      Alert Production

      Description

      It's unfortunately hard to drill down from the Pipelines tutorial documentation to lower-level material. For example, https://pipelines.lsst.io/v/d_2018_11_30/getting-started/data-setup.html and https://pipelines.lsst.io/v/d_2018_11_30/getting-started/processccd.html are where command-line tasks are first introduced. It would be useful to add a signpost on those pages to https://pipelines.lsst.io/v/d_2018_11_30/modules/lsst.pipe.base/index.html#using-command-line-tasks, which gives a fuller picture of how command-line tasks work.


          Activity

          John Swinbank added a comment:

          Jonathan Sick — I'm happy to whack some links in here myself, but before doing so I wondered if you have any words of advice. Is adding a simple “for more information, see...” parenthetical good style, or would you suggest some other approach?

          Jonathan Sick added a comment:

          Great idea! It seems that the first place tasks / command-line tasks are used is in "Ingesting raw data into the Butler repository."
          In this case, it might be worth making a big deal of tasks/command-line tasks as part of this step.

          The section title could become "Ingesting raw data into the Butler repository with a command-line task" — this would elevate the "task" keyword as folks are scanning the page.

          Then how about starting the section with a brief discussion of tasks and their central importance with a link to the docs. So the section https://pipelines.lsst.io/v/d_2018_11_30/getting-started/data-setup.html#ingesting-raw-data-into-the-butler-repository would read like this:

          ####

          The LSST Science Pipelines organizes data processing functionality into <it>tasks</it>. Tasks are highly configurable and therefore reusable with data from different cameras. Tasks are Python objects, so you can use them from your own scripts. Many tasks are also available as <it>command-line tasks</it> that you can run from the shell without having to write your own scripts. <link>Learn more about using command-line tasks</link>.

          The next step in this tutorial is to populate the Butler repository you created in the previous step with data from <link>ci_hsc</link>. Run the <link>ingestImages.py</link> command-line task to do that:

          ingestImages.py DATA $CI_HSC_DIR/raw/*.fits --mode=link

          For most command-line tasks, the first argument is the Butler repository (the DATA directory). You can learn more about the command-line interface from <link https://pipelines.lsst.io/v/daily/modules/lsst.pipe.base/index.html#using-command-line-tasks>Using command-line tasks</link> documentation, or by running a task with the -h flag:

          ingestImages.py -h

          ####

          What do you think?

          John Swinbank added a comment:

          That looks great!

          My only concern is about the order in which information appears in the tutorial. As written, it starts from absolute scratch, in which case it's true that the first time people will meet a task is when they ingest some data. In practice, however, I hope this will become an increasingly rare scenario — most new users will be logging in to the Science Platform and accessing pre-ingested data. That means they can (and I think we should encourage them to!) skip over the arcana around ingestion... and that means they'll miss the helpful discussion of tasks.

          At some point, we might consider restructuring the tutorial to better take account of the Platform, but that's more effort than I (at least) can bite off right now. In the meantime, I wonder if we could incorporate your text into the description of processCcd.py (which almost everybody will want to run), either instead of or in addition to the ingestion section. What do you think?


            People

            Assignee:
            Unassigned
            Reporter:
            John Swinbank
            Watchers:
            John Swinbank, Jonathan Sick
            Votes:
            0
