Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-998

Adapt integration test to multi-node setup v1

    XMLWordPrintable

    Details

      Description

      Following DM-595 we can start qserv in multi-node configuration. Next step is to be able to run integration tests in that setup. This needs a bit of understanding how to distribute chunks between all workers in a cluster and how to load data in remote mysql server.

        Attachments

          Issue Links

            Activity

            Hide
            jbecla Jacek Becla added a comment -

            Based on discussion at hangout Nov 05, use existing single-node partitioner, then scp csv files produced by partitioner, and ssh <load> on each node.

            Show
            jbecla Jacek Becla added a comment - Based on discussion at hangout Nov 05, use existing single-node partitioner, then scp csv files produced by partitioner, and ssh <load> on each node.
            Hide
            vaikunth Vaikunth Thukral added a comment -

            For the first part of adapting the integration tests to a multi-node setup, all the requisite individual parts have to be exercised. I am documenting all the steps that were needed for a successful multi-node setup here. I used lsst-dbdev1 as the Czar, and lsst-dbdev3 as the Worker:

            1) First step is to create and configure Czar (dbdev1) and Worker (dbdev3) setups correctly. This was done through documentation in DM-595 and automated by Fabrice Jammes for IN2P3 in DM-213 . The script/instructions were adapted to setup for the dbdev machines.

            2) After czar and worker are setup, the worker node must be registered in CSS. This is done on the czar as follows:

            qserv-admin.py
            >> create node worker-dbdev3 host=lsst-dbdev3.ncsa.illinois.edu runDir=/usr/local/home/vaikunth/qserv-run mysqlConn=13306;

            Any further nodes can be added here as well. Note that mysqlConn can be given a value like lo:13306 for an SSH connection (requiring your password), but without the leading lo: it connects directly. Related - Currently deleting or updating a node is not possible (DM-2411)

            3) The next test was to load data on the workers from the czar using qserv-data-loader.py. In addition to the data loader, the integration tests need access to the worker databases for inserting, selecting etc. for various purposes. This is not an issue on a mono-node setup but required a change in permissions to be able to use the same functionality on the worker. Through various stages the final set of grants on the worker that are needed are (subject to change):

            +----------------------------------------------------------------------------------------+
            | Grants for qsmaster@lsst-dbdev%.ncsa.illinois.edu                                      |
            +----------------------------------------------------------------------------------------+
            | GRANT USAGE ON *.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'                       |
            | GRANT SELECT ON `mysql`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'                |
            | GRANT ALL PRIVILEGES ON `qservScratch`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' |
            | GRANT ALL PRIVILEGES ON `Subchunks\_%`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' |
            | GRANT ALL PRIVILEGES ON `qservTest%`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'   |
            | GRANT ALL PRIVILEGES ON `qservw_worker`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'|
            | GRANT ALL PRIVILEGES ON `q\_%`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'         |
            +----------------------------------------------------------------------------------------+

            Parts of the integration test use the root account for mysql which is not ideal. This should be changed to user qsmaster for all purposes eventually (DM-2412), since granting all privileges to root on workers is not feasbile. In addition, we should think about the final set of grants that should be given on a worker and include those in the worker configuration procedure instead of manually doing it on each of them.

            4) Data can be loaded to any registered nodes in CSS via the qserv-data-loader.py command by appending the worker node list:

             qserv-data-loader.py --worker worker-dbdev3 --worker worker-dbdev4 <other parameters>

            This was sucessfully tested to work on the command line but there was an error while trying to do it through the integration tests. This is fixed by Andy Salnikov in DM-2417 and data now loads correctly on the worker.

            Show
            vaikunth Vaikunth Thukral added a comment - For the first part of adapting the integration tests to a multi-node setup, all the requisite individual parts have to be exercised. I am documenting all the steps that were needed for a successful multi-node setup here. I used lsst-dbdev1 as the Czar, and lsst-dbdev3 as the Worker: 1) First step is to create and configure Czar (dbdev1) and Worker (dbdev3) setups correctly. This was done through documentation in DM-595 and automated by Fabrice Jammes for IN2P3 in DM-213 . The script/instructions were adapted to setup for the dbdev machines. 2) After czar and worker are setup, the worker node must be registered in CSS. This is done on the czar as follows: qserv-admin.py >> create node worker-dbdev3 host=lsst-dbdev3.ncsa.illinois.edu runDir=/usr/local/home/vaikunth/qserv-run mysqlConn=13306; Any further nodes can be added here as well. Note that mysqlConn can be given a value like lo:13306 for an SSH connection (requiring your password), but without the leading lo: it connects directly. Related - Currently deleting or updating a node is not possible ( DM-2411 ) 3) The next test was to load data on the workers from the czar using qserv-data-loader.py. In addition to the data loader, the integration tests need access to the worker databases for inserting, selecting etc. for various purposes. This is not an issue on a mono-node setup but required a change in permissions to be able to use the same functionality on the worker. Through various stages the final set of grants on the worker that are needed are (subject to change): +----------------------------------------------------------------------------------------+ | Grants for qsmaster@lsst-dbdev%.ncsa.illinois.edu | +----------------------------------------------------------------------------------------+ | GRANT USAGE ON *.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT SELECT ON `mysql`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT ALL PRIVILEGES ON `qservScratch`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT ALL PRIVILEGES ON `Subchunks\_%`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT ALL PRIVILEGES ON `qservTest%`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT ALL PRIVILEGES ON `qservw_worker`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'| | GRANT ALL PRIVILEGES ON `q\_%`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | +----------------------------------------------------------------------------------------+ Parts of the integration test use the root account for mysql which is not ideal. This should be changed to user qsmaster for all purposes eventually ( DM-2412 ), since granting all privileges to root on workers is not feasbile. In addition, we should think about the final set of grants that should be given on a worker and include those in the worker configuration procedure instead of manually doing it on each of them. 4) Data can be loaded to any registered nodes in CSS via the qserv-data-loader.py command by appending the worker node list: qserv-data-loader.py --worker worker-dbdev3 --worker worker-dbdev4 <other parameters> This was sucessfully tested to work on the command line but there was an error while trying to do it through the integration tests. This is fixed by Andy Salnikov in DM-2417 and data now loads correctly on the worker.
            Hide
            vaikunth Vaikunth Thukral added a comment - - edited

            Continued from above ...

            5) Current integration tests rely on local sql-Interface to execute query commands, which does not work when connecting to worker mysql. The tests need to be adapted to use the helper QservAdmin and WorkerAdmin classes which manage connections to the workers:

            qAdmin = qservAdmin.QservAdmin('localhost:12181')
            wAdmin = workerAdmin.WorkerAdmin('worker-dbdev3',self.qAdmin)
            db = wAdmin.mysqlConn(user='qsmaster')

            and sql commands are executed now as:

            db.execCommandN(sql-command)

            Note that a new wAdmin object must be made for each worker and while I am able to test it manually right now, should be automated in the future. Also note that the user must be specified as qsmaster for mysqlConn.

            6) At this stage, the integration tests are able to load data successfully into the workers and are able to move towards query testing. Query testing fails right now and is being looked at in DM-2451, due to errors from interface changes between cmsd and xrootd.

            7) Continuing integration test adaptation to multi-node setup in DM-2175.

            Show
            vaikunth Vaikunth Thukral added a comment - - edited Continued from above ... 5) Current integration tests rely on local sql-Interface to execute query commands, which does not work when connecting to worker mysql. The tests need to be adapted to use the helper QservAdmin and WorkerAdmin classes which manage connections to the workers: qAdmin = qservAdmin.QservAdmin('localhost:12181') wAdmin = workerAdmin.WorkerAdmin('worker-dbdev3',self.qAdmin) db = wAdmin.mysqlConn(user='qsmaster') and sql commands are executed now as: db.execCommandN(sql-command) Note that a new wAdmin object must be made for each worker and while I am able to test it manually right now, should be automated in the future. Also note that the user must be specified as qsmaster for mysqlConn . 6) At this stage, the integration tests are able to load data successfully into the workers and are able to move towards query testing. Query testing fails right now and is being looked at in DM-2451 , due to errors from interface changes between cmsd and xrootd. 7) Continuing integration test adaptation to multi-node setup in DM-2175 .
            Hide
            jbecla Jacek Becla added a comment -

            It looks good, great work! I suspect DM-2175 will be the ultimate test of your recipe. At first look, I don't see anything wrong with it. I'll close it.

            Show
            jbecla Jacek Becla added a comment - It looks good, great work! I suspect DM-2175 will be the ultimate test of your recipe. At first look, I don't see anything wrong with it. I'll close it.
            Hide
            salnikov Andy Salnikov added a comment -

            Vaikunth, one minor comment about grants - when we did that together I did not have time to think, so we just copied whatever grants were there from 'qsmaster'@'localhost'. In reality most of those grants should not be needed, I think this is the set which should be enough for data loader:

            +----------------------------------------------------------------------------------------+
            | Grants for qsmaster@lsst-dbdev%.ncsa.illinois.edu                                      |
            +----------------------------------------------------------------------------------------+
            | GRANT USAGE ON *.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'                       |
            | GRANT SELECT ON `mysql`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'                |
            | GRANT ALL PRIVILEGES ON `qservw_worker`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'|
            +----------------------------------------------------------------------------------------+

            In addition grants are needed for every database in the integration tests.

            Show
            salnikov Andy Salnikov added a comment - Vaikunth, one minor comment about grants - when we did that together I did not have time to think, so we just copied whatever grants were there from 'qsmaster'@'localhost'. In reality most of those grants should not be needed, I think this is the set which should be enough for data loader: +----------------------------------------------------------------------------------------+ | Grants for qsmaster@lsst-dbdev%.ncsa.illinois.edu | +----------------------------------------------------------------------------------------+ | GRANT USAGE ON *.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT SELECT ON `mysql`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu' | | GRANT ALL PRIVILEGES ON `qservw_worker`.* TO 'qsmaster'@'lsst-dbdev%.ncsa.illinois.edu'| +----------------------------------------------------------------------------------------+ In addition grants are needed for every database in the integration tests.
            Hide
            vaikunth Vaikunth Thukral added a comment -

            Thanks Andy Salnikov, I know we went overkill on the grants to get things to start working, but once I have the other pieces working, I think it would be easy to incrementally test every grant that would be needed so we can compile the final list needed on a worker. I will use the ones you've mentioned as a starting point and can test it easily on a new worker (I plan on redoing all the above steps as a check).

            Moving further discussion to DM-2175.

            Show
            vaikunth Vaikunth Thukral added a comment - Thanks Andy Salnikov , I know we went overkill on the grants to get things to start working, but once I have the other pieces working, I think it would be easy to incrementally test every grant that would be needed so we can compile the final list needed on a worker. I will use the ones you've mentioned as a starting point and can test it easily on a new worker (I plan on redoing all the above steps as a check). Moving further discussion to DM-2175 .

              People

              Assignee:
              vaikunth Vaikunth Thukral
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Jacek Becla
              Watchers:
              Andy Salnikov, Jacek Becla, Vaikunth Thukral
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  CI Builds

                  No builds found.