Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-867

Modularizing source-filtering (and other operations) out of TransformDiaSourceCatalogTask

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Adopted
    • Resolution: Unresolved
    • Component/s: DM
    • Labels:
      None

      Description

      While working with TransformDiaSourceCatalogTask I noticed other undocumented operations being performed along with the "Transform" itself – e.g. filtering out sky_sources.

      This not only results in ambiguity and an unrepresentative name for the task, but also makes debugging difficult.

      I propose to either a) identify the undocumented operations and bring them out as a new task, or b) document them and change the name of the task to something that covers the performed operations properly.

      (This is my first RFC here, so please let me know if something is missing )

        Attachments

          Issue Links

            Activity

            Hide
            ebellm Eric Bellm added a comment -

            Correct: we don't want sky sources in the APDB nor in association.

            Show
            ebellm Eric Bellm added a comment - Correct: we don't want sky sources in the APDB nor in association.
            Hide
            sullivan Ian Sullivan added a comment -

            Apologies for coming to this RFC very late. I agree that it is conceptually cleaner to separate operations such as filtering from the Task that transforms the diaSource catalog to a pandas dataframe, but I am concerned about adding any extra overhead to the AP pipeline. Sky sources are an essential internal diagnostic tool, but they do need to be filtered before we write to the APDB. I can see several possible resolutions here:

            1. Leave everything as it is, treating sky sources as a special case.
            2. If we believe that we will want to introduce any other filtering of the diaSource table, we could remove the sky source filtering from the current Task and create a new FilterDiaSourceCatalogTask for more generic filtering of the table.
            3. We could remove the sky source filtering, and make downstream Tasks responsible for stripping sky sources out of the table.
            4. We could write sky sources to their own source catalog, rather than including them with the diaSources. This would require re-writing a fair amount of code, and would need a new RFC.

            I think option 3 above has the same flaw as the current code, since it forces downstream code to perform multiple different actions.

            Option 2 is the cleanest conceptually, but it does have drawbacks. If sky sources are the only objects that are being filtered, this would require creating a new data type for the filtered table, or more likely a temporary data type for the unfiltered table. We should be able to pass the unfiltered table in memory, but that is yet to be implemented. I think we would only want to do this if we wanted to support additional filtering before writing to the APDB.

            Let's discuss these options at our next AP pipelines meeting

            Show
            sullivan Ian Sullivan added a comment - Apologies for coming to this RFC very late. I agree that it is conceptually cleaner to separate operations such as filtering from the Task that transforms the diaSource catalog to a pandas dataframe, but I am concerned about adding any extra overhead to the AP pipeline. Sky sources are an essential internal diagnostic tool, but they do need to be filtered before we write to the APDB. I can see several possible resolutions here: Leave everything as it is, treating sky sources as a special case. If we believe that we will want to introduce any other filtering of the diaSource table, we could remove the sky source filtering from the current Task and create a new FilterDiaSourceCatalogTask for more generic filtering of the table. We could remove the sky source filtering, and make downstream Tasks responsible for stripping sky sources out of the table. We could write sky sources to their own source catalog, rather than including them with the diaSources. This would require re-writing a fair amount of code, and would need a new RFC. I think option 3 above has the same flaw as the current code, since it forces downstream code to perform multiple different actions. Option 2 is the cleanest conceptually, but it does have drawbacks. If sky sources are the only objects that are being filtered, this would require creating a new data type for the filtered table, or more likely a temporary data type for the unfiltered table. We should be able to pass the unfiltered table in memory, but that is yet to be implemented. I think we would only want to do this if we wanted to support additional filtering before writing to the APDB. Let's discuss these options at our next AP pipelines meeting
            Hide
            krzys Krzysztof Findeisen added a comment -

            The AP pipelines meeting decided to rename the task rather than rewrite it; can this RFC be adopted or withdrawn?

            Show
            krzys Krzysztof Findeisen added a comment - The AP pipelines meeting decided to rename the task rather than rewrite it; can this RFC be adopted or withdrawn?
            Hide
            nima Nima Sedaghat added a comment -

            Thanks for the reminder. As far as I remember, we agreed about renaming it to something like "StandardizeDiaSourceCatalogTask", as well as correcting the "definition" of the task at https://pipelines.lsst.io/py-api/lsst.ap.association.TransformDiaSourceCatalogTask.html (correcting = listing the operations not covered by "calibrates and renames columns").

            If there are no objections/comments, I'll go ahead and adopt it on Friday Nov. 18th.

            Show
            nima Nima Sedaghat added a comment - Thanks for the reminder. As far as I remember, we agreed about renaming it to something like "StandardizeDiaSourceCatalogTask", as well as correcting the "definition" of the task at https://pipelines.lsst.io/py-api/lsst.ap.association.TransformDiaSourceCatalogTask.html (correcting = listing the operations not covered by "calibrates and renames columns"). If there are no objections/comments, I'll go ahead and adopt it on Friday Nov. 18th.
            Hide
            nima Nima Sedaghat added a comment -
            • TransformDiaSourceCatalogTask needs to be renamed to StandardizeDiaSourceCatalogTask.
            • Operations such as removal of sky_sources need to be clearly listed in the description of the task.
            Show
            nima Nima Sedaghat added a comment - TransformDiaSourceCatalogTask needs to be renamed to StandardizeDiaSourceCatalogTask. Operations such as removal of sky_sources need to be clearly listed in the description of the task.

              People

              Assignee:
              nima Nima Sedaghat
              Reporter:
              nima Nima Sedaghat
              Watchers:
              Eric Bellm, Ian Sullivan, John Parejko, Kenneth Herner, Krzysztof Findeisen, Meredith Rawls, Nima Sedaghat, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

                Dates

                Created:
                Updated:
                Planned End:

                  Jenkins

                  No builds found.