# Setup multi-node Qserv


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
6
• Sprint:
DB_S14_07
• Team:
Data Access and Database

#### Description

We are currently focusing on single-node Qserv. It'd be nice to try setting up multi-node Qserv (say 4 workers and a czar on lsst-dbdev*), and improve installation scripts to simplify the process.

#### Activity

Robyn Allsman [X] (Inactive) added a comment -

To: salnikov@slac.stanford.edu
Subject: [JIRA] (DM-595) Setting up multi-node Qserv
Sent: Tue, 29 Apr 2014 19:19:37 -0700

did not reach the following recipient(s):

salnikov@slac.stanford.edu on Tue, 29 Apr 2014 19:20:03 -0700
The e-mail address could not be found. Perhaps the recipient moved
to a different e-mail organization, or there was a mistake in the
address. Check the address and try again.

<mail.lsst.org #5.1.8 smtp;553 5.1.8 <jira-dm@lsst.org>... Domain of
sender address jira-dm@lsst.org does not exist>
Andy Salnikov added a comment -

This will be easier to do based on the work that Fabrice did in DM-622. For development/testing I'll try to extend the qserv-configure.py script with relevant options. With minimal changes we can probably support a single-worker setup running on several hosts:

1. czar + mysql-proxy (there is no real reason to run these two on separate hosts)
2. xrootd + worker
3. mysql server
(later on I'll extend this with more than one worker)

For now there is no mechanism to start processes remotely; it has to be done manually on each host.

Andy Salnikov added a comment - edited

How do we configure our different servers (mostly stuff related to network connections)? For each option, the name of the parameter from the qserv.conf file which determines its value is listed after the description.

#### mysql-proxy

config file: $RUN_DIR/etc/my-proxy.cnf

interesting options:

• mysql-proxy.proxy-address - address on which proxy listens (:4040), mysql_proxy.port
• mysql-proxy.proxy-backend-addresses - backend mysql server address (127.0.0.1:13306), host name is hardcoded, mysqld.port
• envvar QSERV_RPC_PORT - gives port number for qserv czar, qserv.rpc_port
• czar host name is hard-coded in lua script as 127.0.0.1

#### Czar

config file: $RUN_DIR/etc/local.qserv.cnf
defaults: core/modules/czar/lsst/qserv/czar/config.py

interesting options:

• frontend.port - port which czar is listening to (default: 7080), qserv.rpc_port
• frontend.xrootd - host and port of the xrootd server that czar will be talking to (default: lsst-dev01:1094) (default is not usable, should we use 127.0.0.1:1094?), qserv.master, xrootd.xrootd_port
• css.connection - CSS connection string (default: 127.0.0.1:12181), host name is hardcoded in template, zookeeper.port
• resultdb.unix_socket - socket for mysql server for result database (must refer to the same instance as mysql-proxy.proxy-backend-addresses), qserv.run_base_dir/...
• resultdb.host, resultdb.port - same server for result database but only using TCP connection, not defined

#### zookeeper

config file: $RUN_DIR/etc/zoo.cfg

interesting options:

• clientPort - port to listen to (12181), zookeeper.port
• dataDir - directory for snapshot, qserv.run_base_dir/...

#### xrootd

config file: $RUN_DIR/etc/lsp.cf

interesting options:

• all.manager - host and port of manager instance, qserv.master, xrootd.cmsd_manager_port

#### mysql

config file: $RUN_DIR/etc/my.cnf

interesting options:

• mysqld.datadir, mysqld.data_dir
• mysqld.socket, qserv.run_base_dir/...
• mysqld.port, mysqld.port
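For reference, a hypothetical qserv.conf fragment touching the parameters listed above. Section and option names follow the `section.option` references in this comment; the values are illustrative (taken from the defaults mentioned above where available; cmsd_manager_port is an assumption):

```ini
[qserv]
master = master-host.example.org
rpc_port = 7080
run_base_dir = /qserv-run/mydir

[mysql_proxy]
port = 4040

[mysqld]
port = 13306

[zookeeper]
port = 12181

[xrootd]
xrootd_port = 1094
# illustrative value only; not taken from this ticket
cmsd_manager_port = 2131
```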
Andy Salnikov added a comment -

#### Mono option

As far as I can tell "mono" option (node_type=mono) right now only affects xrootd configuration. When node_type is set to mono then init.d/xrootd does not start cmsd and cmsd options are commented out in lsp.cf. If node_type is set to anything else then cmsd is started and xrootd on qserv "master" node is configured as a manager.
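As a sketch only (hypothetical helper; service names taken from the init.d scripts mentioned in this thread), the behavior described above could be summarized as:

```python
def services_for(node_type):
    """Hypothetical summary of which daemons run per node type.

    Mirrors the description above: "mono" runs everything on one host
    and skips cmsd; other node types start cmsd as well, with the
    master's xrootd acting as cluster manager.
    """
    common = ["mysqld", "xrootd"]
    if node_type == "mono":
        return common + ["zookeeper", "mysql-proxy", "qserv-czar"]
    if node_type == "master":
        return common + ["cmsd", "zookeeper", "mysql-proxy", "qserv-czar"]
    if node_type == "worker":
        return common + ["cmsd"]
    raise ValueError("unknown node_type: %r" % node_type)
```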

#### Possible configurations to test

1. multi-node-mono - this is almost like our regular mono but different services are running on different hosts. Can be used as a first step to check that configuration works OK for multiple nodes and also to test how to distribute stuff across many nodes. Makes sense to run worker stuff on one node (xrootd + worker mysqld) and everything else on master node.
2. multi-node - running with multiple workers on different nodes (potentially workers can share the node). Need to run xrootd in cluster configuration.
3. May also try to distribute master stuff across multiple nodes, e.g. run zookeeper on a dedicated machine.

#### Code distribution

If we do not have a shared filesystem (and this is true for the lsst-dbdev* machines) then we need to distribute code and possibly some data between all machines. Setting up something automated could be a significant project. For now the simplest thing we could do is probably a manual distribution based on rsync/ssh. It's easy to write a simple script which replicates a specified directory on any number of machines (or borrow someone else's script). This should work OK for software (stack) that we build for ourselves.
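A minimal sketch of such an rsync-based replication helper (hypothetical, not the actual dreplicate.sh; assumes passwordless ssh to each host):

```python
import subprocess

def replicate(directory, hosts, dry_run=True):
    """Replicate one directory to the same path on each host via rsync/ssh.

    Hypothetical sketch; with dry_run=True it only returns the commands
    that would be run, without executing anything.
    """
    directory = directory.rstrip("/")
    cmds = [
        ["rsync", "-az", "--delete", "-e", "ssh",
         directory + "/", "%s:%s/" % (host, directory)]
        for host in hosts
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.check_call(cmd)
    return cmds
```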

Each machine that runs a part of the cluster also needs a run directory. This should probably go to a local filesystem even if a shared filesystem is available, as it will contain data and performance/latency is critical. We do not need to replicate run directories, but there are two options for setting up run directories on each machine:

1. set it up on one machine and then replicate once on all other hosts
2. set it up individually on each host

For my tests I think the latter option may be easier, but for a large-scale installation replication may work better (especially if the stack also needs to be replicated).

Andy Salnikov added a comment -

#### multi-node-mono setup

Very simple setup:

• xrootd and worker mysqld running on one worker machine
• mysql-proxy, result mysqld, zookeeper, and czar running on another machine
• there is only one xrootd, no need to run cmsd

czar is actually configured to connect to xrootd on the master host; there are two options to deal with that:

1. start xrootd/cmsd on master machine
2. change configuration to support non-master host for xrootd

I have added a new option xrootd_host to qserv.conf, and modified admin/python/lsst/qserv/admin/configure.py to use that option or fall back to the qserv.master option as before.
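The fallback logic amounts to something like this (sketch assuming a flat dict of options; the real configure.py code will look different):

```python
def effective_xrootd_host(options):
    # Prefer the new xrootd_host option; fall back to qserv.master
    # for backward compatibility with existing configs.
    return options.get("xrootd_host") or options["master"]
```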

• qserv.master option in qserv.conf was set to lsst-dbdev2.ncsa.illinois.edu and xrootd_host set to lsst-dbdev4.ncsa.illinois.edu.
• Built and installed qserv in the stack with these modifications; the small script admin/bin/dreplicate.sh is used to replicate the whole stack to the other machines.
• Ran qserv-configure.py on both lsst-dbdev2 and lsst-dbdev4 (a small fix was needed in qserv-configure.py, it fails if ~/.lsst/qserv.conf does not exist) to install the run directory in ~/qserv-run/u.salnikov.DM-595 (this is a local directory on each host).
• Started mysql and xrootd servers on lsst-dbdev4:

    [salnikov@lsst-dbdev4 ~]$ ~/qserv-run/u.salnikov.DM-595/etc/init.d/mysqld start
    Starting MySQL.                                    [ OK ]
    [salnikov@lsst-dbdev4 ~]$ ~/qserv-run/u.salnikov.DM-595/etc/init.d/xrootd start
    Starting xrootd.                                   [ OK ]

• started everything else on lsst-dbdev2:

    $ for svc in mysqld zookeeper mysql-proxy qserv-czar; do ~/qserv-run/u.salnikov.DM-595/etc/init.d/$svc start; done
    Starting MySQL.                                    [ OK ]
    Starting ZooKeeper daemon (zookeeper):
    JMX enabled by default
    Using config: /usr/local/home/salnikov/qserv-run/u.salnikov.DM-595/etc/zoo.cfg
    Starting zookeeper ... STARTED
    Starting mysql-proxy.                              [ OK ]
    Starting qserv-czar                                [ OK ]

Running our usual integration test on this setup is going to be interesting: the loader needs to be able to load data into the worker mysql, so it likely needs some new logic.

Some issues that are obvious even without running tests:

• qserv-configure.py uses qserv.conf from the install area (stack). If we need to modify the config file (I think it is expected that we provide different configs for master and workers) then it cannot be done in the install area, which is read-only by convention. We need to be able to copy the config file and use the copy instead.
• qserv-start.py starts all possible services; this needs to depend on the node type.
• qserv.conf mentions only three node types - mono, master, or worker. Maybe we need more flexibility; I could imagine that we may want to run zookeeper (or any other CSS service) on a separate machine.
Andy Salnikov added a comment -

Regarding qserv-configure.py use of qserv.conf here is what I learned:

• qserv-configure.py has a hidden -C option which can be used to specify alternative location for qserv.conf
• the default location for qserv.conf is not the install directory but the run directory; the script first copies all files there and then uses them for configuration.

Potentially, creating the run directory can be done in two steps (instead of one step with the -a option):

    % qserv-configure.py -r /path/to/run/dir -p
    # edit /path/to/run/dir/qserv.conf and/or templates if needed
    % qserv-configure.py -r /path/to/run/dir

Andy Salnikov added a comment - edited

A somewhat easier and more unified way to do a multi-node setup. This will run xrootd on the master (plus cmsd on all machines) even in the case of a single worker, but this is OK; the single-node worker is not the most interesting case, so we can afford a tiny overhead for the xrootd cluster manager for the sake of uniformity.

1. Build/install qserv as one normally would
2. distribute the software to each machine in the cluster (assumed to be installed in the same location for now)
• there is simple helper script which runs rsync (admin/bin/dreplicate.sh)
• dreplicate.sh /path/to/stack host1 host2 host3
3. go to master host:
• setup qserv, make fresh run directory, edit config file, fill run directory:

    % setup qserv $VERSION
    % qserv-configure.py -r /path/to/run/dir -p
    % vi /path/to/run/dir/qserv.conf
      -- change "node_type" to "master", and "master" to master host name
    % qserv-configure.py -r /path/to/run/dir
    % /path/to/run/dir/bin/qserv-start.sh

4. go to worker host(s):
• setup qserv, make fresh run directory, edit config file, fill run directory:

    % setup qserv $VERSION
    % qserv-configure.py -r /path/to/run/dir -p
    % vi /path/to/run/dir/qserv.conf
      -- change "node_type" to "worker", and "master" to master host name
    % qserv-configure.py -r /path/to/run/dir
    % /path/to/run/dir/bin/qserv-start.sh

There are small complications with that: the cmsd pid file name is different for the manager cmsd instance, which confuses our init.d scripts; will need to fix that. The same applies to the xrootd/cmsd log file name (it does not prevent them from running; it is just that when one fails to start, the init.d script prints an "incorrect" log file name).
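The pid-file confusion can be illustrated with a sketch (hypothetical paths and naming scheme, not the actual xrootd convention):

```python
def pid_file(run_dir, daemon, instance=None):
    """Sketch of instance-qualified pid file naming.

    If the manager cmsd writes e.g. cmsd-manager.pid while the init.d
    script looks for cmsd.pid, status/stop checks break. The paths and
    the naming scheme here are assumptions for illustration only.
    """
    name = daemon if instance is None else "%s-%s" % (daemon, instance)
    return "%s/var/run/%s.pid" % (run_dir, name)
```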

Andy Salnikov added a comment -

I think I'd better limit this ticket to configuration of qserv on multiple hosts and leave integration tests with this setup to a new ticket. I have pushed changes to my private branch, please review when you have a minute. It should be possible to install/configure it following the instructions in the previous comment. Everything should start now in the multi-node setup. In the standard "mono" setup things should work as before; all integration tests succeed for me.

The number of changes is not very large, and many of them are small fixes not actually related to this ticket; they are fixes for issues discovered by pylint. Some of those issues are real problems; we should run pylint more extensively on all Python code.

Jacek Becla added a comment -

Andy, it is going in a nice direction.

I agree dealing with the integration tests should be separate. Can you open a ticket for that?

I think administrators would not like us if we make them edit a file on 300 worker machines... I am alluding to this step:

    qserv-configure.py -r /path/to/run/dir -p
    # edit /path/to/run/dir/qserv.conf
    qserv-configure.py -r /path/to/run/dir

Can we do something like:

    qserv-configure.py -r /path/to/run/dir --type worker|master --masterhost=theHostName
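Such flags could be wired in along these lines (a sketch using argparse; the option names --type/--masterhost come from the suggestion above, everything else is assumed):

```python
import argparse

def parse_args(argv):
    # Sketch only; the real qserv-configure.py option parsing differs.
    p = argparse.ArgumentParser(prog="qserv-configure.py")
    p.add_argument("-r", "--run-dir", required=True,
                   help="path to the run directory")
    p.add_argument("--type", choices=["mono", "master", "worker"],
                   default="mono", help="node type to configure")
    p.add_argument("--masterhost", default="127.0.0.1",
                   help="host name of the master node")
    return p.parse_args(argv)
```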

Since you mentioned lsst-dev01... let's fix that, it shows up in 3 places:

• ./core/modules/util/xrootd.cc
• ./core/modules/czar/lsst/qserv/czar/config.py
• ./admin/templates/server/etc/local.qserv.cnf

And finally, I made some cosmetic tweaks, see u/jbecla/DM-595 (I didn't test them though); feel free to take them or just ignore them, it is all minor.

Thanks!

Andy Salnikov added a comment -

Hi Jacek,

I made a new ticket DM-998 for integration tests in multi-node setup.

We should indeed improve configuration; I agree that editing the config file is not going to do it. Once we start thinking about how to do a full installation, it may become clearer what is really needed on that part.

I also replaced lsst-dev01 with localhost (I'm not sure if that stuff is used in our setup or maybe it needs some cleanup) and merged your changes.

Cheers,
Andy


#### People

Assignee:
Andy Salnikov
Reporter:
Fritz Mueller
Reviewers:
Jacek Becla
Watchers:
Andy Salnikov, Fabrice Jammes, Jacek Becla, Robyn Allsman [X] (Inactive)