Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-1720

Make secondary index for director table only

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:

      Description

      Following discussion on qserv-l, we only need to generate "secondary" index for director table, no other table is supposed to have it. Need to modify data loader to recognize which table is director table and generate index only for that table.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment -

            Serge Monkewitz, I'm trying to avoid adding many more configuration parameters to the config files but there are some parameters that we need to load to CSS and which need to come from config files. In the first version of the loader I added (copied from our existing *.param files) few options, all at top level:

            • dirTable – name of the director table
            • dirColName – column name containing ID of object in director table
            • dirTable1,2; dirColName1,2 – same things for match tables
            • storageClass

            The dirColName I think is the same as the part.id parameter that is used by duplicator (will it be ever used by partitioner?) so we are duplicating the same info in one file. I was thinking about merging this into single set of options, e.g.:

            • part.director – replaces dirTable
            • part.id – instead of dirColName
            • part.director1,2; part.id1,2 – for match tables

            Does it make sense or do you think we should not mess with part.* option and create separate "section" for data loader and CSS parameters?

            The actual question that I wanted to ask is how to identify director table from configuration information. If part.director is defined then I could just compare that with the current table name but I'm not sure that it makes sense to specify part.director for director table itself. Another option is to see if part.pos is specified and part.id is not (or if that part.id == id) which would mean that this is a partitioned table but there is no director for it which implies that it is itself a director. I'm not sure I'm thinking clearly now, what do you think?

            Show
            salnikov Andy Salnikov added a comment - Serge Monkewitz , I'm trying to avoid adding many more configuration parameters to the config files but there are some parameters that we need to load to CSS and which need to come from config files. In the first version of the loader I added (copied from our existing *.param files) few options, all at top level: dirTable – name of the director table dirColName – column name containing ID of object in director table dirTable1,2; dirColName1,2 – same things for match tables storageClass The dirColName I think is the same as the part.id parameter that is used by duplicator (will it be ever used by partitioner?) so we are duplicating the same info in one file. I was thinking about merging this into single set of options, e.g.: part.director – replaces dirTable part.id – instead of dirColName part.director1,2; part.id1,2 – for match tables Does it make sense or do you think we should not mess with part.* option and create separate "section" for data loader and CSS parameters? The actual question that I wanted to ask is how to identify director table from configuration information. If part.director is defined then I could just compare that with the current table name but I'm not sure that it makes sense to specify part.director for director table itself. Another option is to see if part.pos is specified and part.id is not (or if that part.id == id) which would mean that this is a partitioned table but there is no director for it which implies that it is itself a director. I'm not sure I'm thinking clearly now, what do you think?
            Hide
            salnikov Andy Salnikov added a comment -

            We discussed this at the meeting, the conclusion is that relying on existing pos.* parameters is probably not the right thing to do. We should keep dirTable and dirColName as we did before. It may mean that we duplicate same info in few places (e.g. part.id and dirColName have to be the same) but it will buy us freedom in changing parameter names independently.

            Show
            salnikov Andy Salnikov added a comment - We discussed this at the meeting, the conclusion is that relying on existing pos.* parameters is probably not the right thing to do. We should keep dirTable and dirColName as we did before. It may mean that we duplicate same info in few places (e.g. part.id and dirColName have to be the same) but it will buy us freedom in changing parameter names independently.
            Hide
            salnikov Andy Salnikov added a comment -

            Fabrice, I made few simple changes both to data loader script (in qserv) and integration test (in qserv_testdata). Data loader will now make secondary index only for director table, this simplifies logic in integration script a little bit. Also data loader know how to handle non-partitioned tables so there is no need to pass special options for them.

            Could you please review the branch u/salnikov/DM-1720 in both repositories?

            Show
            salnikov Andy Salnikov added a comment - Fabrice, I made few simple changes both to data loader script (in qserv) and integration test (in qserv_testdata). Data loader will now make secondary index only for director table, this simplifies logic in integration script a little bit. Also data loader know how to handle non-partitioned tables so there is no need to pass special options for them. Could you please review the branch u/salnikov/ DM-1720 in both repositories?
            Hide
            jammes Fabrice Jammes added a comment -

            Thanks Andy Salnikov,

            I've got a few minor comments:

            • In qserv_testdata, this function is now unused:

              python/lsst/qserv/tests/dataconfig.py:    def partitionedTables(self):

              Maybe it could be removed in this ticket? And also related parameter could be removed from description.yaml files?

            • qserv_testdata/python/lsst/qserv/tests/qservDbLoader.py, l.83: shouldn't we use dirTable parameter to manage emptyChunks.txt creation? This would allow to remove director parameter from yaml. Is there a ticket for this?
            • In

              datasets/case01/data/common.cfg
              datasets/case01/data/Source.cfg

              • We certainly will have to handle RefSrcMatch use case, but it'' be done in a future ticket?
              • It seems that dirColName parameter is set if an only if a table is "child", maybe we should add it as a comment in cfg files?
            Show
            jammes Fabrice Jammes added a comment - Thanks Andy Salnikov , I've got a few minor comments: In qserv_testdata, this function is now unused: python/lsst/qserv/tests/dataconfig.py: def partitionedTables(self): Maybe it could be removed in this ticket? And also related parameter could be removed from description.yaml files? qserv_testdata/python/lsst/qserv/tests/qservDbLoader.py, l.83: shouldn't we use dirTable parameter to manage emptyChunks.txt creation? This would allow to remove director parameter from yaml. Is there a ticket for this? In datasets/case01/data/common.cfg datasets/case01/data/Source.cfg We certainly will have to handle RefSrcMatch use case, but it'' be done in a future ticket? It seems that dirColName parameter is set if an only if a table is "child", maybe we should add it as a comment in cfg files? And a minor comment here: https://github.com/LSST/qserv/pull/24/files#diff-d0640f4ae7bda4d42728ed8ede4839eeR79
            Hide
            salnikov Andy Salnikov added a comment -

            Fabrice,

            • I have removed partitionedTables() method and corresponding parameter from YAML files
            • We should indeed get rid of few other parameters in YAML files and use parameters from cfg files to obtain same info (director table name, views, partitioned views), this should go into separate ticket, probably at the same time we can merge common.cfg and description.yaml
            • there is already support for RefMatch tables, but it's not well tested
            • dirColName can be specified for director table as well, but it's optional (defaults to ID column name)
            • I made changes to partConfig.py file to avoid defaults for director table name

            I squashed the changes on the same branch and force-pushed them to repo, you could check it again if you prefer.

            Show
            salnikov Andy Salnikov added a comment - Fabrice, I have removed partitionedTables() method and corresponding parameter from YAML files We should indeed get rid of few other parameters in YAML files and use parameters from cfg files to obtain same info (director table name, views, partitioned views), this should go into separate ticket, probably at the same time we can merge common.cfg and description.yaml there is already support for RefMatch tables, but it's not well tested dirColName can be specified for director table as well, but it's optional (defaults to ID column name) I made changes to partConfig.py file to avoid defaults for director table name I squashed the changes on the same branch and force-pushed them to repo, you could check it again if you prefer.
            Hide
            jammes Fabrice Jammes added a comment -

            Ok Andy Salnikov, thanks.

            It's ok for me.

            Show
            jammes Fabrice Jammes added a comment - Ok Andy Salnikov , thanks. It's ok for me.
            Hide
            salnikov Andy Salnikov added a comment -

            Merged and pushed both repos.

            Show
            salnikov Andy Salnikov added a comment - Merged and pushed both repos.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Fabrice Jammes
              Watchers:
              Andy Salnikov, Fabrice Jammes, Serge Monkewitz
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.