Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-32046

Remove HTMIndexDiaPosition plugin from AP pipeline

    XMLWordPrintable

    Details

    • Story Points:
      1
    • Epic Link:
    • Sprint:
      DB_F21_06
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      DM-31458 made HTM indexing of DIAObjects an internal detail of the implementation. As a result HTMIndexDiaPosition plugin that is used by AP pipeline is no longer useful. The plugin should be removed and all dependencies updated.

      Also https://github.com/lsst/ap_association/blob/865756303c1e3ca1f97144b9c821eba3ed5cb342/data/DiaSource.yaml#L25 can be dropped.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment -

            Yusra AlSayyad, I should be able to work on backporting tomorrow and hopefully get it in before Thursday deadline. I'm trying to understand how pixelId is going to affect things in pipelines, especially in the future. Depending on how DataFrame (possibly empty) is constructed it may or may not have that pixelId column. If presence of pixelId is critical for some things then we need to make sure that it is always there, though my preferred approach would be to make it totally optional (pixelId is an implementation detail for SQL database, it will not exist in Cassandra, for example). There was an additional development on the ticket that I merged today (DM-32131) which again potentially changes pixelId behavior, so it's better to have some idea how it affects things going forward.

            Show
            salnikov Andy Salnikov added a comment - Yusra AlSayyad , I should be able to work on backporting tomorrow and hopefully get it in before Thursday deadline. I'm trying to understand how pixelId is going to affect things in pipelines, especially in the future. Depending on how DataFrame (possibly empty) is constructed it may or may not have that pixelId column. If presence of pixelId is critical for some things then we need to make sure that it is always there, though my preferred approach would be to make it totally optional (pixelId is an implementation detail for SQL database, it will not exist in Cassandra, for example). There was an additional development on the ticket that I merged today ( DM-32131 ) which again potentially changes pixelId behavior, so it's better to have some idea how it affects things going forward.
            Hide
            yusra Yusra AlSayyad added a comment -

            Ah, OK. Thanks for the info. I had assumed that this ticket designated a moment in time that the APDB would be handling HtmId from now on and that pipelines wouldn't have to produce it in the diaSource parquet output.

            For DP0.2, we don't actually NEED to remove pixelId from the diaSource table. I can make the functor behave nicely on DM-32238 even if we don't use it on master but do on v23.

            I think you've talked me out of backporting this...

            Show
            yusra Yusra AlSayyad added a comment - Ah, OK. Thanks for the info. I had assumed that this ticket designated a moment in time that the APDB would be handling HtmId from now on and that pipelines wouldn't have to produce it in the diaSource parquet output. For DP0.2, we don't actually NEED to remove pixelId from the diaSource table. I can make the functor behave nicely on DM-32238 even if we don't use it on master but do on v23. I think you've talked me out of backporting this...
            Hide
            salnikov Andy Salnikov added a comment -

            Well, indeed starting with this ticket, or actually earlier DM-31458, pixelId is handled completely by APDB, meaning that:

            • Pipelines don't need to calculate pixelId, it is now calculated by Apdb based on ra/dec of DiaObject. If pixelId is provided by pipelines when storing data in Apdb it will be ignored. This ticket just removes pixelId calculation from pipelines.
            • When reading data back from Apdb the returned DatraFrame can still contain pixelId, even for empty DataFrames. Previously, I think, it was possible that pipelines built empty dataframe without pixelId column in some cases. Keeping pixelId in returned DataFrame is an optimization, removing it will cost some extra CPU cycles (pandas is not super-optimal, this is why I want to avoid potentially expensive column removal)

            I'm not sure how this all is related to parquet output, that part is something I need to learn.

            Backporting this to v23 should be easy, let me know how do you want to proceed.

            Show
            salnikov Andy Salnikov added a comment - Well, indeed starting with this ticket, or actually earlier DM-31458 , pixelId is handled completely by APDB, meaning that: Pipelines don't need to calculate pixelId, it is now calculated by Apdb based on ra/dec of DiaObject. If pixelId is provided by pipelines when storing data in Apdb it will be ignored. This ticket just removes pixelId calculation from pipelines. When reading data back from Apdb the returned DatraFrame can still contain pixelId, even for empty DataFrames. Previously, I think, it was possible that pipelines built empty dataframe without pixelId column in some cases. Keeping pixelId in returned DataFrame is an optimization, removing it will cost some extra CPU cycles (pandas is not super-optimal, this is why I want to avoid potentially expensive column removal) I'm not sure how this all is related to parquet output, that part is something I need to learn. Backporting this to v23 should be easy, let me know how do you want to proceed.
            Hide
            yusra Yusra AlSayyad added a comment -

            Because it'll cause a diaSource schema change and this ticket is now unnecessary to prevent drpAssociation failures, let's skip the backporting it.

            ...unless you think v23 users NEED it for some other reason...I am sure we don't need it for DP0.2.

            Show
            yusra Yusra AlSayyad added a comment - Because it'll cause a diaSource schema change and this ticket is now unnecessary to prevent drpAssociation failures, let's skip the backporting it. ...unless you think v23 users NEED it for some other reason...I am sure we don't need it for DP0.2.
            Hide
            salnikov Andy Salnikov added a comment -

            I don't think it is needed, the patch is optional, AP and APDB should work fine with or without it, except for the bugs that you discovered. to skip backporting.

            Show
            salnikov Andy Salnikov added a comment - I don't think it is needed, the patch is optional, AP and APDB should work fine with or without it, except for the bugs that you discovered. to skip backporting.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Chris Morrison
              Watchers:
              Andy Salnikov, Chris Morrison [X] (Inactive), Yusra AlSayyad
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.