  Data Management / DM-8150

Allow temporary directory configuration in the container-based Qserv deployments

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:

      Description

      The problem

      The data loader script qserv-data-loader.py fails to load large tables into Qserv. This was observed when loading the 14 GB table 'Science_Ccd_Exposure_Metadata' into 'qserv-master01' from 'qserv-db01' with the following command:

      % qserv-data-loader.py \
      --verbose --verbose --verbose --verbose-all \
      --host=lsst-qserv-master01 --port=5012 \
      --secret=/home/gapon/production/stripe82/wmgr.secret \
      --no-css --worker=lsst-qserv-master01 \
      --config=/datasets/gapon/production/stripe82_catalog_load/production_load/common-non-part.cfg \
      sdss_stripe82_00 Science_Ccd_Exposure_Metadata \
      /datasets/gapon/catalogs/gapon_SDRP_Stripe82/Science_Ccd_Exposure_Metadata.sql \
      /datasets/gapon/catalogs/gapon_SDRP_Stripe82/Science_Ccd_Exposure_Metadata.tsv \
      >& logs/Science_Ccd_Exposure_Metadata.log
      

      The log file of the wmgr service used by the loader contains the following relevant message:

      % sudo tail -200 /qserv/log/qserv-wmgr.log
      ...
      2016-10-30 14:28:19,831 [PID:442] [ERROR] (log_exception() at app.py:1423) wmgr: Exception on /dbs/sdss_stripe82_00/tables/Science_Ccd_Exposure_Metadata/data [POST]
      Traceback (most recent call last):
        File "/qserv/stack/Linux64/flask/0.10.1.lsst2+1/lib/python/Flask-0.10.1-py2.7.egg/flask/app.py", line 1817, in wsgi_app
          response = self.full_dispatch_request()
      ...
        File "/qserv/stack/Linux64/flask/0.10.1.lsst2+1/lib/python/Werkzeug-0.11.10-py2.7.egg/werkzeug/formparser.py", line 521, in <genexpr>
          form = (p[1] for p in formstream if p[0] == 'form')
        File "/qserv/stack/Linux64/flask/0.10.1.lsst2+1/lib/python/Werkzeug-0.11.10-py2.7.egg/werkzeug/formparser.py", line 497, in parse_parts
          _write(ell)
      IOError: [Errno 28] No space left on device
      2016-10-30 14:28:20,833 [PID:442] [INFO] (_log() at _internal.py:87) werkzeug: 141.142.181.132 - - [30/Oct/2016 14:28:20] "POST /dbs/sdss_stripe82_00/tables/Science_Ccd_Exposure_Metadata/data HTTP/1.1" 500 -
      

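      The traceback shows that the request fails while the uploaded data is being written to disk. The current implementation of the wmgr service creates its temporary files in the following folder, which resides on the container's own file system: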

       /qserv/run/tmp
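      The failing frame in the traceback is Werkzeug's multipart form parser writing the incoming data out (`_write(ell)`), which suggests the POST body is spooled to a temporary file before wmgr loads it. The Python sketch that follows imitates that spooling pattern to show why a 14 GB upload needs at least 14 GB free in the temporary directory. It is not wmgr or Werkzeug code: `open_upload_stream` and the `IN_MEMORY_LIMIT` threshold are hypothetical.

```python
import io
import tempfile

# Hypothetical threshold for this sketch; not taken from wmgr or Werkzeug.
IN_MEMORY_LIMIT = 512 * 1024  # bytes


def open_upload_stream(content_length, tmp_dir=None):
    """Return a writable binary stream for an incoming request body.

    Small bodies are kept in memory. Larger ones go to an anonymous temporary
    file in `tmp_dir`, or in tempfile.gettempdir() (which honors the TMPDIR
    environment variable) when `tmp_dir` is None, so that directory must have
    at least `content_length` bytes free.
    """
    if content_length > IN_MEMORY_LIMIT:
        return tempfile.TemporaryFile("w+b", dir=tmp_dir)
    return io.BytesIO()
```

      With `open_upload_stream(14 * 1024**3, "/qserv/run/tmp")`, writing the full table into the returned file fails with errno 28 (ENOSPC) as soon as the file system holding /qserv/run/tmp fills up, which matches the error in the log.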
      

      Proposed solutions

      Extend the container deployment configuration to allow mapping /qserv/run/tmp inside the container to a larger file system outside it. Options to consider:

      • /qserv/tmp
        • this folder can be created on a large file system that is available in the present Qserv deployments at IN2P3 and NCSA PDAC
      • /tmp
        • this folder may not work in all deployments because of its relatively small size (which depends on the host configuration)
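      Choosing between the two options comes down to whether a candidate folder has more free space than the largest table to be loaded. A minimal Python sketch, assuming Python 3.3+ for `shutil.disk_usage`; the `pick_tmp_dir` name and the 10% `headroom` safety margin are hypothetical:

```python
import shutil


def pick_tmp_dir(candidates, needed_bytes, headroom=0.1):
    """Return the first candidate directory with room for `needed_bytes`.

    `headroom` keeps an extra fraction of `needed_bytes` free as an assumed
    safety margin. Directories that do not exist are skipped. Returns None
    when no candidate is large enough.
    """
    required = int(needed_bytes * (1 + headroom))
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free
        except FileNotFoundError:
            continue
        if free >= required:
            return path
    return None
```

      For the 14 GB table, `pick_tmp_dir(["/qserv/tmp", "/tmp"], 14 * 1024**3)` prefers /qserv/tmp, falls back to /tmp only if that has enough room, and returns None if neither does.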


          Activity

          Igor Gaponenko (gapon) added a comment (edited):

          Testing

          Tested the new implementation in two ways using the NCSA PDAC cluster.

          /tmp

          The first step was to set the new configuration option in the file env.sh to point to the host's temporary folder:

          # Temporary directory location on docker host, optional
          HOST_TMP_DIR=/tmp
          
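          A sketch of how an optional HOST_TMP_DIR could be turned into `docker run` volume options, written in Python for illustration only. The actual change lives in the Qserv container start scripts, which are not shown in this ticket; the function name is hypothetical, and the data and log paths are copied from the `docker inspect` output in this comment rather than from the real scripts.

```python
import os


def qserv_volume_args(env=None):
    """Build `docker run` volume options for a Qserv container.

    HOST_TMP_DIR is optional: when it is unset or blank, no mount is added and
    /qserv/run/tmp stays on the container's own file system.
    """
    env = os.environ if env is None else env
    # Host -> container pairs as reported by `docker inspect qserv`.
    mounts = [
        ("/qserv/data", "/qserv/data"),
        ("/qserv/log", "/qserv/run/var/log"),
    ]
    host_tmp = env.get("HOST_TMP_DIR", "").strip()
    if host_tmp:
        mounts.append((host_tmp, "/qserv/run/tmp"))
    args = []
    for source, destination in mounts:
        args += ["-v", f"{source}:{destination}"]
    return args
```

          With HOST_TMP_DIR=/tmp the result ends with `-v /tmp:/qserv/run/tmp`, which could be spliced into a command such as `["docker", "run", "-d", "--name", "qserv", *qserv_volume_args(), image]`, where `image` is a placeholder. Keeping the variable optional leaves existing deployments unchanged.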

          Then I restarted the containers and inspected their status with:

          docker inspect qserv
          ..
                  "Mounts": [
                      {
                          "Source": "/qserv/data",
                          "Destination": "/qserv/data",
                          "Mode": "",
                          "RW": true,
                          "Propagation": "rprivate"
                      },
                      {
                          "Source": "/qserv/log",
                          "Destination": "/qserv/run/var/log",
                          "Mode": "",
                          "RW": true,
                          "Propagation": "rprivate"
                      },
                      {
                          "Source": "/tmp",
                          "Destination": "/qserv/run/tmp",
                          "Mode": "",
                          "RW": true,
                          "Propagation": "rprivate"
                      }
          
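          Instead of reading the full `docker inspect` output by eye, the mounts alone can be printed as JSON with `docker inspect --format '{{json .Mounts}}' qserv` and checked programmatically. A small Python sketch; the `tmp_mount_source` helper is hypothetical:

```python
import json


def tmp_mount_source(mounts_json, destination="/qserv/run/tmp"):
    """Return the host directory bound to `destination`, or None.

    `mounts_json` is the text printed by
    `docker inspect --format '{{json .Mounts}}' qserv`: a JSON list of mount
    objects with "Source", "Destination" and "RW" keys. Read-only mounts are
    ignored because the service must be able to write temporary files.
    """
    for mount in json.loads(mounts_json):
        if mount.get("Destination") == destination and mount.get("RW"):
            return mount.get("Source")
    return None
```

          For the output above the helper returns "/tmp"; for a container started without HOST_TMP_DIR it returns None.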

          /qserv/tmp

          The first step was to set the new configuration option in the file env.sh to point to a dedicated temporary folder in the Qserv deployment area:

          # Temporary directory location on docker host, optional
          HOST_TMP_DIR=/qserv/tmp
          

          The folder was created, owned by the qserv user, on the master node and on all 30 worker nodes of the installation:

          % sudo ls -al /qserv/
          total 4
          drwxrwx---   5 qserv qserv   37 Nov 12 14:59 .
          dr-xr-xr-x. 20 root  root  4096 Oct 28 09:04 ..
          drwxr-xr-x   4 qserv qserv   30 Oct 25 16:11 data
          drwxr-xr-x   3 qserv qserv  154 Nov  9 17:53 log
          drwxr-xr-x   4 qserv qserv   35 Nov 12 15:01 tmp
          
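          Because the folder must exist on the master and on every worker before the containers are restarted, a per-host check helps. A Python sketch, assuming it is run on each host (for example over ssh) and that the service inside the container writes as the `qserv` user, as the listing suggests; `host_tmp_dir_problems` is a hypothetical name:

```python
import os
import pwd
import stat


def host_tmp_dir_problems(path, owner="qserv"):
    """Return a list of problems with a host temporary folder (empty if OK)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")
    try:
        owner_uid = pwd.getpwnam(owner).pw_uid
    except KeyError:
        problems.append(f"user {owner!r} does not exist on this host")
    else:
        if st.st_uid != owner_uid:
            problems.append(f"{path} is not owned by {owner}")
    # The listing shows drwxr-xr-x, so only the owner needs write access.
    if not st.st_mode & stat.S_IWUSR:
        problems.append(f"{path} is not writable by its owner")
    return problems
```

          On a correctly prepared node, `host_tmp_dir_problems("/qserv/tmp")` returns an empty list.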

          Then I restarted the containers and inspected their status with:

          docker inspect qserv
          ..
                  "Mounts": [
                      {
                          "Source": "/qserv/data",
                          "Destination": "/qserv/data",
                          "Mode": "",
                          "RW": true,
                          "Propagation": "rprivate"
                      },
                      {
                          "Source": "/qserv/log",
                          "Destination": "/qserv/run/var/log",
                          "Mode": "",
                          "RW": true,
                          "Propagation": "rprivate"
                      },
                      {
                          "Source": "/qserv/tmp",
                          "Destination": "/qserv/run/tmp",
                          "Mode": "",
                          "RW": true,
                          "Propagation": "rprivate"
                      }
          

          A final check was to list and write to the temporary folder from inside the running container, and to confirm that the new file appears in /qserv/tmp on the host:

          % docker exec -it qserv ls -al /qserv/run/tmp
           
          drwxr-xr-x 4 qserv qserv 101 Nov 12 21:01 configure
          drwx------ 6 qserv qserv  53 Nov 12 21:01 worker
           
          % docker exec -it qserv touch /qserv/run/tmp/a.txt
          % docker exec -it qserv ls -al /qserv/run/tmp
           
          -rw-r--r-- 1 qserv qserv   0 Nov 12 21:20 a.txt
          drwxr-xr-x 4 qserv qserv 101 Nov 12 21:01 configure
          drwx------ 6 qserv qserv  53 Nov 12 21:01 worker
           
          % sudo ls -al  /qserv/tmp/
          .
          -rw-r--r-- 1 qserv qserv   0 Nov 12 15:20 a.txt
          drwxr-xr-x 4 qserv qserv 101 Nov 12 15:01 configure
          drwx------ 6 qserv qserv  53 Nov 12 15:01 worker
           
          % docker exec -it qserv rm /qserv/run/tmp/a.txt
          
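          The manual round trip above (create a file inside the container, look for it on the host, remove it) can be scripted. A Python sketch under these assumptions: the container is named `qserv` as in the commands above, `docker exec` is run without `-it` since no terminal is needed, and the `mount_round_trip` and `docker_exec` names are hypothetical. The `run` parameter allows the docker call to be replaced, for example in tests.

```python
import os
import subprocess
import uuid


def docker_exec(container, *cmd):
    """Run `cmd` inside `container` with `docker exec`; raise on failure."""
    subprocess.run(["docker", "exec", container, *cmd], check=True)


def mount_round_trip(host_dir, container_dir="/qserv/run/tmp",
                     container="qserv", run=docker_exec):
    """Return True if a file created in `container_dir` appears in `host_dir`."""
    # A unique name avoids clashing with real temporary files.
    name = f"mount-check-{uuid.uuid4().hex}.txt"
    run(container, "touch", f"{container_dir}/{name}")
    try:
        return os.path.exists(os.path.join(host_dir, name))
    finally:
        # Clean up inside the container, as the manual check did with `rm`.
        run(container, "rm", "-f", f"{container_dir}/{name}")
```

          On the PDAC hosts this would be called as `mount_round_trip("/qserv/tmp")`.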

          Igor Gaponenko (gapon) added a comment:

          Hi Fabrice,

          Could you please review my modifications? Note that there is no need to build a new container: the changes are simple and only affect the way the Qserv containers are started.

          Thanks,
          Igor

          Fabrice Jammes (jammes) added a comment:

          Good job, which keeps the tmp directory management flexible. Thanks

          Igor Gaponenko (gapon) added a comment:

          Thank you for the review!

          Igor

          Igor Gaponenko (gapon) added a comment:

          Changes have been merged with the master branch.


            People

            • Assignee: Igor Gaponenko (gapon)
            • Reporter: Fabrice Jammes (jammes)
            • Reviewers: Fabrice Jammes
            • Watchers: Fabrice Jammes, Fritz Mueller, Igor Gaponenko, Jason Alt (Inactive)
            • Votes: 0
