# Setup multi-node Qserv


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 6
• Sprint: DB_S14_07
• Team: Data Access and Database

#### Description

We are currently focusing on single-node Qserv. It would be nice to try setting up multi-node Qserv (say, 4 workers and a czar on lsst-dbdev*) and to improve the installation scripts to simplify the process.

#### Activity

Andy Salnikov added a comment -

Regarding how qserv-configure.py uses qserv.conf, here is what I learned:

• qserv-configure.py has a hidden -C option which can be used to specify an alternative location for qserv.conf
• the default location for qserv.conf is not the install directory but the run directory: the script first copies all files there and then uses them for configuration.

Creating the run directory can also be done in two steps (instead of one step with the -a option):

    % qserv-configure.py -r /path/to/run/dir -p
    # edit /path/to/run/dir/qserv.conf and/or templates if needed
    % qserv-configure.py -r /path/to/run/dir
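The two-step flow and the config-file lookup described above can be modelled roughly as follows. This is a hypothetical sketch, not the code of qserv-configure.py: `prepare_run_dir`, `config_path` and the flat template layout are assumptions, chosen only to show that configuration reads the qserv.conf copied into the run directory unless an alternative path (the hidden -C option) is given.

```python
import shutil
from pathlib import Path
from typing import Optional


def prepare_run_dir(template_dir: Path, run_dir: Path) -> Path:
    """Step 1 (like `-p`): copy qserv.conf and templates into the run directory.

    Hypothetical helper for illustration; the real qserv-configure.py may
    copy a different set of files.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    for src in template_dir.iterdir():
        dst = run_dir / src.name
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
        else:
            shutil.copy2(src, dst)
    # After this step the copy in the run directory is the one to edit.
    return run_dir / "qserv.conf"


def config_path(run_dir: Path, alt_config: Optional[Path] = None) -> Path:
    """Step 2: pick the file that configuration is read from.

    An explicit alternative location (the hidden -C option) wins;
    otherwise the qserv.conf inside the run directory is used, not the
    one in the install tree.
    """
    return alt_config if alt_config is not None else run_dir / "qserv.conf"
```

In this model, the gap between `prepare_run_dir` and the configuration step is where hand edits to the copied qserv.conf or templates would go.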

Andy Salnikov added a comment - - edited

A somewhat easier and unified way to do a multi-node setup. This runs xrootd on the master (plus cmsd on all machines) even with a single worker. That is OK: the single-worker case is not the most interesting one, so we can afford the tiny overhead of the xrootd cluster manager for the sake of uniformity.
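As a rough illustration of this uniform layout, the snippet below emits the role directives a node might get. `all.role` and `all.manager` are standard xrootd/cmsd configuration directives, but the function, the host name `master-host` and the port 1213 are placeholders assumed for the example; this is not the configuration template that Qserv actually ships.

```python
from typing import List


def xrootd_role_lines(node_type: str, master_host: str, manager_port: int) -> List[str]:
    """Return illustrative xrootd/cmsd role directives for one node.

    The master always hosts the cluster manager, even when there is only
    one worker, so every cluster uses the same layout.
    """
    if node_type == "master":
        role = "all.role manager"
    elif node_type == "worker":
        role = "all.role server"
    else:
        # The single-node "mono" type is deliberately left out of this sketch.
        raise ValueError(f"unknown node_type: {node_type!r}")
    # Master and workers all point at the same manager on the master host.
    return [role, f"all.manager {master_host}:{manager_port}"]


for node in ("master", "worker"):
    print(node, xrootd_role_lines(node, "master-host", 1213))
```

Because the master always gets the manager role, a one-worker cluster is configured exactly like a larger one.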

1. Build/install qserv as one normally would.
2. Distribute the software to each machine in the cluster (for now, assume it is installed in the same location on every machine):
• there is a simple helper script which runs rsync (admin/bin/dreplicate.sh)
• usage: dreplicate.sh /path/to/stack host1 host2 host3
3. Go to the master host:
• set up qserv, make a fresh run directory, edit the config file, fill the run directory, and start the services:

    % setup qserv $VERSION
    % qserv-configure.py -r /path/to/run/dir -p
    % vi /path/to/run/dir/qserv.conf
    # change "node_type" to "master", and "master" to the master host name
    % qserv-configure.py -r /path/to/run/dir
    % /path/to/run/dir/bin/qserv-start.sh

4. Go to the worker host(s):
• set up qserv, make a fresh run directory, edit the config file, fill the run directory, and start the services:

    % setup qserv $VERSION
    % qserv-configure.py -r /path/to/run/dir -p
    % vi /path/to/run/dir/qserv.conf
    # change "node_type" to "worker", and "master" to the master host name
    % qserv-configure.py -r /path/to/run/dir
    % /path/to/run/dir/bin/qserv-start.sh
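The `vi` step is the only interactive part of this procedure. One way to script it is sketched below, assuming qserv.conf is an INI-style file in which `node_type` and `master` appear as `key = value` lines (the `[DEFAULT]` section used in testing is also an assumption). `set_conf_values` is a hypothetical helper, not part of Qserv.

```python
import re
from pathlib import Path
from typing import Dict

# Matches "key = value" at the start of a line; comments and [sections] do not match.
KEY_LINE = re.compile(r"^([ \t]*)([A-Za-z_][\w.]*)([ \t]*=[ \t]*)")


def set_conf_values(conf_file: Path, values: Dict[str, str]) -> None:
    """Set `key = value` entries in an INI-style file without an editor.

    Only the first occurrence of each key is rewritten (anything after
    the old value on that line is dropped); comments, section headers
    and other lines are kept verbatim. Raises KeyError, before writing
    anything, if a requested key is not present.
    """
    lines = conf_file.read_text().splitlines(keepends=True)
    remaining = dict(values)
    for i, line in enumerate(lines):
        match = KEY_LINE.match(line)
        if match and match.group(2) in remaining:
            indent, key, sep = match.groups()
            lines[i] = f"{indent}{key}{sep}{remaining.pop(key)}\n"
    if remaining:
        raise KeyError(f"keys not found in {conf_file}: {sorted(remaining)}")
    conf_file.write_text("".join(lines))
```

It would be called between the two qserv-configure.py runs, e.g. `set_conf_values(Path("/path/to/run/dir/qserv.conf"), {"node_type": "worker", "master": "master-host"})`, where `master-host` stands for the real master host name. Editing line by line, rather than round-tripping through `configparser`, keeps the comments in the file intact.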

There are small complications with this: the cmsd pid file name is different for the manager cmsd instance, which confuses our init.d scripts; we will need to fix that. The same applies to the xrootd/cmsd log file names (this does not prevent the daemons from running, but when one fails to start, the init.d script prints an "incorrect" log file name).
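One possible workaround for the pid-file mismatch is to look the file up by pattern instead of by one fixed name. This is only a sketch: the names `cmsd.pid` and `cmsd-manager.pid` used in testing are hypothetical, since the real names depend on how the xrootd/cmsd instances are configured, and the same lookup could equally be written in the init.d scripts themselves.

```python
import glob
import os
from typing import Optional


def find_pid_file(pid_dir: str, daemon: str) -> Optional[str]:
    """Locate the pid file of `daemon` without hard-coding its exact name.

    Any `<daemon>*.pid` file matches, so both a plain and a
    manager-specific name are found. Returns None if nothing matches;
    refuses to guess if several files match.
    """
    pattern = os.path.join(pid_dir, f"{daemon}*.pid")
    candidates = sorted(glob.glob(pattern))
    if len(candidates) > 1:
        raise RuntimeError(f"ambiguous pid files for {daemon}: {candidates}")
    return candidates[0] if candidates else None


def read_pid(pid_file: str) -> int:
    """Read the process id stored as the first token of a pid file."""
    with open(pid_file) as fh:
        return int(fh.read().split()[0])
```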

Andy Salnikov added a comment -

I think I had better limit this ticket to configuring qserv on multiple hosts and leave integration tests with this setup to a new ticket. I have pushed the changes to my private branch; please review when you have a minute. It should be possible to install and configure it following the instructions in the previous comment. Everything should now start in the multi-node setup. In the standard "mono" setup things should work as before; all integration tests succeed for me.

The number of changes is not very large, and many of them are small fixes not directly related to this ticket: they fix issues discovered by pylint. Some of those issues are real problems, so we should run pylint more extensively on all Python code.

Jacek Becla added a comment -

Andy, it is going in a nice direction.

I agree dealing with the integration tests should be separate. Can you open a ticket for that?

I think administrators would not like us if we make them edit a file on 300 worker machines... I am alluding to this step:

    qserv-configure.py -r /path/to/run/dir -p
    # edit /path/to/run/dir/qserv.conf
    qserv-configure.py -r /path/to/run/dir

Can we do something like:

    qserv-configure.py -r /path/to/run/dir --type worker|master --masterhost=theHostName
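The proposed interface could look roughly like this with argparse. `--type` and `--masterhost` are the options suggested here, not existing qserv-configure.py flags, and only `-r` comes from the current script; everything else is an assumption for illustration. The value is passed as `--masterhost=theHostName` (or `--masterhost theHostName`): with spaces around `=`, the shell would pass `=` as a separate argument.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the *proposed* non-interactive options.

    `--type` and `--masterhost` are hypothetical: they illustrate the
    suggestion in this comment and are not existing flags.
    """
    parser = argparse.ArgumentParser(prog="qserv-configure.py")
    parser.add_argument("-r", dest="run_dir", required=True,
                        help="run directory to create and fill")
    parser.add_argument("--type", dest="node_type", required=True,
                        choices=["master", "worker"],
                        help="role of this node")
    parser.add_argument("--masterhost", required=True,
                        help="host name of the master node")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.run_dir, args.node_type, args.masterhost)
```

With options like these, the same single command could be run unchanged on every worker (for example from an ssh loop over host names), so nobody has to edit qserv.conf by hand on hundreds of machines.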

Since you mentioned lsst-dev01... let's fix that, it shows up in 3 places:

• ./core/modules/util/xrootd.cc
• ./core/modules/czar/lsst/qserv/czar/config.py
• ./admin/templates/server/etc/local.qserv.cnf

And finally, I made some cosmetic tweaks, see u/jbecla/DM-595 (I didn't test them, though). Feel free to take them or just ignore them, it is all minor.

Thanks!

Andy Salnikov added a comment -

Hi Jacek,

I made a new ticket DM-998 for integration tests in multi-node setup.

We should indeed improve the configuration; I agree that editing the config file is not going to do it. Once we start thinking about how to do a full installation, it may become clearer what is really needed there.

I also replaced lsst-dev01 with localhost (I'm not sure whether that code is used in our setup or whether it needs some cleanup) and merged your changes.

Cheers,
Andy


#### People

• Assignee: Andy Salnikov
• Reporter: Fritz Mueller
• Reviewers: Jacek Becla
• Watchers: Andy Salnikov, Fabrice Jammes, Jacek Becla, Robyn Allsman (Inactive)