  1. Data Management
  2. DM-595

Setup multi-node Qserv


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      We are currently focusing on single-node Qserv. It would be useful to try setting up a multi-node Qserv (say, 4 workers and a czar on lsst-dbdev*) and to improve the installation scripts to simplify the process.

        Attachments

          Issue Links

            Activity

            salnikov Andy Salnikov added a comment -

            Regarding qserv-configure.py's use of qserv.conf, here is what I learned:

            • qserv-configure.py has a hidden -C option which can be used to specify an alternative location for qserv.conf
            • the default location for qserv.conf is not the install directory but the run directory; the script first copies all files there and then uses them for configuration

            Creating the run directory can potentially be done in two steps (instead of one step with the -a option):

            % qserv-configure.py -r /path/to/run/dir -p
            # edit /path/to/run/dir/qserv.conf and/or templates if needed
            % qserv-configure.py -r /path/to/run/dir
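            The manual edit between the two passes can also be scripted. A minimal sketch, assuming qserv.conf uses ini-style "key = value" lines (the actual file layout may differ); it is demonstrated on a throwaway copy rather than a real run directory, and patch_qserv_conf is a hypothetical helper, not part of Qserv:

```shell
# Sketch: patch node_type/master in qserv.conf without an interactive editor.
# Assumes ini-style "key = value" lines; the real file layout may differ.
patch_qserv_conf() {
    # $1 = path to qserv.conf, $2 = node type, $3 = master host name
    sed -i -e "s/^node_type *=.*/node_type = $2/" \
           -e "s/^master *=.*/master = $3/" "$1"
}

# Demonstrate on a throwaway copy rather than a real run directory.
conf=$(mktemp)
printf 'node_type = mono\nmaster = localhost\n' > "$conf"
patch_qserv_conf "$conf" worker lsst-dbdev1
cat "$conf"
```

            The same two sed expressions could be dropped between the two qserv-configure.py invocations above instead of editing the file by hand.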

            salnikov Andy Salnikov added a comment - - edited

            Here is a somewhat easier and more unified way to do multi-node setup. This will run xrootd on the master (plus cmsd on all machines) even in the single-worker case, but that is OK: a single-node worker is not the most interesting case, so we can afford the tiny overhead of the xrootd cluster manager for the sake of uniformity.

            1. Build/install qserv as one normally would.
            2. Distribute the software to each machine in the cluster (it is assumed to be installed in the same location on every host).
              • there is a simple helper script which runs rsync (admin/bin/dreplicate.sh)
              • dreplicate.sh /path/to/stack host1 host2 host3
            3. go to master host:
              • set up qserv, make a fresh run directory, edit the config file, fill the run directory:

                % setup qserv $VERSION
                % qserv-configure.py -r /path/to/run/dir -p
                % vi /path/to/run/dir/qserv.conf 
                   -- change "node_type" to "master" and "master" to the master host name
                % qserv-configure.py -r /path/to/run/dir
                % /path/to/run/dir/bin/qserv-start.sh

            4. go to worker host(s):
              • set up qserv, make a fresh run directory, edit the config file, fill the run directory:

                % setup qserv $VERSION
                % qserv-configure.py -r /path/to/run/dir -p
                % vi /path/to/run/dir/qserv.conf 
                   -- change "node_type" to "worker" and "master" to the master host name
                % qserv-configure.py -r /path/to/run/dir
                % /path/to/run/dir/bin/qserv-start.sh

            There are small complications with this: the cmsd pid file name is different for the manager cmsd instance, which confuses our init.d scripts; that will need to be fixed. The same applies to the xrootd/cmsd log file name (it does not prevent them from running, but when a daemon fails to start, the init.d script prints an "incorrect" log file name).
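            The per-worker steps above could be driven from one host over ssh. A sketch only: the host names, run directory, and $VERSION are placeholders, and the config edit is done with sed under the assumption that qserv.conf has "key = value" lines. With DRYRUN=1 (the default here) it only prints the remote commands instead of running them:

```shell
# Sketch: repeat the per-worker setup steps over a list of hosts via ssh.
# Host names and paths are placeholders; set DRYRUN=0 to actually run.
MASTER=lsst-dbdev1                               # assumed master host name
WORKERS="lsst-dbdev2 lsst-dbdev3 lsst-dbdev4 lsst-dbdev5"
RUN_DIR=/path/to/run/dir
DRYRUN=${DRYRUN:-1}

done_hosts=""
for host in $WORKERS; do
    # \$VERSION is left unexpanded so the remote shell resolves it.
    cmd="setup qserv \$VERSION && \
qserv-configure.py -r $RUN_DIR -p && \
sed -i -e 's/^node_type.*/node_type = worker/' \
    -e 's/^master.*/master = $MASTER/' $RUN_DIR/qserv.conf && \
qserv-configure.py -r $RUN_DIR && \
$RUN_DIR/bin/qserv-start.sh"
    if [ "$DRYRUN" = 1 ]; then
        echo "ssh $host: $cmd"
    else
        ssh "$host" "$cmd"
    fi
    done_hosts="$done_hosts $host"
done
```

            A real deployment script would also want per-host error handling, but the loop shows the shape of the automation.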

            salnikov Andy Salnikov added a comment -

            I think I'd better limit this ticket to the configuration of qserv on multiple hosts and leave integration tests with this setup to a new ticket. I have pushed the changes to my private branch; please review when you have a minute. It should be possible to install/configure it by following the instructions in the previous comment. Everything should now start in the multi-node setup. In the standard "mono" setup things should work as before; all integration tests succeed for me.

            The number of changes is not very large, and many of them are small fixes not actually related to this ticket; they address issues discovered by pylint. Some of those issues are real problems; we should run pylint more extensively on all of our Python code.

            jbecla Jacek Becla added a comment -

            Andy, it is going in a nice direction.

            I agree dealing with the integration tests should be separate. Can you open a ticket for that?

            I think administrators would not like us if we made them edit a file on 300 worker machines... I am alluding to this step:

            qserv-configure.py -r /path/to/run/dir -p
            # edit /path/to/run/dir/qserv.conf
            qserv-configure.py -r /path/to/run/dir

            Can we do something like:

            qserv-configure.py -r /path/to/run/dir --type worker|master --masterhost = theHostName
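            Until such options exist, a thin wrapper could emulate that interface on top of the current two-pass configure. A hedged sketch: configure_node is a hypothetical helper (not a real Qserv script), the sed-based edit assumes "key = value" lines in qserv.conf, and with DRYRUN=1 (the default here) it only prints the commands it would run:

```shell
# Hypothetical wrapper emulating a --type/--masterhost style interface
# on top of the current two-pass qserv-configure.py.
run() { if [ "${DRYRUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

configure_node() {
    # $1 = run dir, $2 = node type (master|worker), $3 = master host name
    run qserv-configure.py -r "$1" -p
    run sed -i -e "s/^node_type.*/node_type = $2/" \
               -e "s/^master.*/master = $3/" "$1/qserv.conf"
    run qserv-configure.py -r "$1"
    run "$1/bin/qserv-start.sh"
}

# Dry-run: print the plan for one worker instead of executing it.
plan=$(configure_node /path/to/run/dir worker lsst-dbdev1)
echo "$plan"
```

            Proper --type/--masterhost options inside qserv-configure.py itself would of course be cleaner than patching the file after the fact.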

            Since you mentioned lsst-dev01... let's fix that, it shows up in 3 places:

            • ./core/modules/util/xrootd.cc
            • ./core/modules/czar/lsst/qserv/czar/config.py
            • ./admin/templates/server/etc/local.qserv.cnf

            And finally, I made some cosmetic tweaks; see u/jbecla/DM-595 (I didn't test them, though). Feel free to take them or just ignore them; it is all minor.

            Thanks!

            salnikov Andy Salnikov added a comment -

            Hi Jacek,

            I made a new ticket DM-998 for integration tests in multi-node setup.

            We should indeed improve the configuration; I agree that editing the config file is not going to scale. Once we start thinking about how to do a full installation, it may become clearer what is really needed there.

            I also replaced lsst-dev01 with localhost (I'm not sure if that stuff is used in our setup or maybe it needs some cleanup) and merged your changes.

            Cheers,
            Andy


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              fritzm Fritz Mueller
              Reviewers:
              Jacek Becla
              Watchers:
              Andy Salnikov, Fabrice Jammes, Jacek Becla, Robyn Allsman [X] (Inactive)
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved: