Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-627

Switch to using new partitioner, loader

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      Integrated tests procedure has to rely on new loader

        Attachments

          Issue Links

            Activity

            Hide
            jammes Fabrice Jammes added a comment -
            • dataloader option to keep unzipped data on disk
            • dataloader create emptyChunks dir if not exists
            Show
            jammes Fabrice Jammes added a comment - dataloader option to keep unzipped data on disk dataloader create emptyChunks dir if not exists
            Hide
            jammes Fabrice Jammes added a comment - - edited

            Hi Andy Salnikov,

            Happy new year.

            FYI, i've refactored most of the integration test code so that is gets more simple.

            Code is both in qserv and qserv_testdata repositories.

            I've used this good practice to configure python logging:
            http://victorlin.me/posts/2012/08/26/good-logging-practice-in-python
            It's very powerful because all log levels for each modules can be easily configured via a single yaml configuration file. It would allow to remove all the code of --verbose-all and -vvv options in the loader, and also all logging configuration code from other Qserv python tools, which would lead to simpler code and add some standardization in our python logging management. For now I've let both techniques live aside.

            Show
            jammes Fabrice Jammes added a comment - - edited Hi Andy Salnikov , Happy new year. FYI, i've refactored most of the integration test code so that is gets more simple. Code is both in qserv and qserv_testdata repositories. I've used this good practice to configure python logging: http://victorlin.me/posts/2012/08/26/good-logging-practice-in-python It's very powerful because all log levels for each modules can be easily configured via a single yaml configuration file. It would allow to remove all the code of --verbose-all and -vvv options in the loader, and also all logging configuration code from other Qserv python tools, which would lead to simpler code and add some standardization in our python logging management. For now I've let both techniques live aside.
            Hide
            salnikov Andy Salnikov added a comment - - edited

            Fabrice,

            comments on qserv commits are all in github, here are comments for qserv_testdata, comments are for both tickets on the branch (DM-1658 and this):

            bin/qserv-test-head.sh:

            • hard-coded location for QSERV_RUN_DIR, there should be a command-line option to change current default value
            • line 79:

              if [ -r "${EUPS_DIR}" -a -x "${EUPS_DIR}" ]; then
                  . "${EUPS_DIR}/bin/setups.sh"

              why not

              if [ -f "${EUPS_DIR}/bin/setups.sh" ]; then
                  . "${EUPS_DIR}/bin/setups.sh"

            • line 87: again, I'd replace it with

              if [ ! -f "${QSERV_DIR}/SConstruct" ]; then

            • line 98: killall mysqld mysql-proxy xrootd java python ...
              • this is very unfriendly thing to do Imagine I run some Java or Python application which is not qserv. Please fix this to kill only the things that need to be killed (this should be true for all services)
            • line 119: proper cleanup should be done even if qserv-test-integration.py fails, with set -e the script will stop immediately on any error. If you do cleanup then you probably do not need killall
            • line 122: avoid making those symlinks in current directory, if you have not removed this yet
            • if you use printf instead of echo do not forget to add \n at the end (and echo is OK in most cases, printf is needed only when you want format numbers or insert special characters)
            • line 126: /n should be \n, but that line should not be there

            ... To be continued ...

            Show
            salnikov Andy Salnikov added a comment - - edited Fabrice, comments on qserv commits are all in github, here are comments for qserv_testdata, comments are for both tickets on the branch ( DM-1658 and this): bin/qserv-test-head.sh: hard-coded location for QSERV_RUN_DIR, there should be a command-line option to change current default value line 79: if [ -r "${EUPS_DIR}" -a -x "${EUPS_DIR}" ]; then . "${EUPS_DIR}/bin/setups.sh" why not if [ -f "${EUPS_DIR}/bin/setups.sh" ]; then . "${EUPS_DIR}/bin/setups.sh" line 87: again, I'd replace it with if [ ! -f "${QSERV_DIR}/SConstruct" ]; then line 98: killall mysqld mysql-proxy xrootd java python ... this is very unfriendly thing to do Imagine I run some Java or Python application which is not qserv. Please fix this to kill only the things that need to be killed (this should be true for all services) line 119: proper cleanup should be done even if qserv-test-integration.py fails, with set -e the script will stop immediately on any error. If you do cleanup then you probably do not need killall line 122: avoid making those symlinks in current directory, if you have not removed this yet if you use printf instead of echo do not forget to add \n at the end (and echo is OK in most cases, printf is needed only when you want format numbers or insert special characters) line 126: /n should be \n , but that line should not be there ... To be continued ...
            Hide
            salnikov Andy Salnikov added a comment -

            Continuing review for qserv_testdata:

            • our coding guidelines say that python module names should be camelCase (mysqlLoader.py instead of mysqlloader.py), probably not worth fixing it now, but something to think about for the future

            bin/qserv-check-integration.py:

            • lines 1-2: shebang line repeated twice
            • line 33: tarfile import is unused
            • line 36: common import is unused
            • line 80: __name__ is "__main__" when you run it as a script, I'm not sure it's the best option for logger name, I'd probably just use root logger
            • line 103: return_code is always 0 here, should it be non-zero for failed tests?
            • some variable names have underscores, I believe DM coding guidelines are to use camelCase for regular names

            bin/qserv-check-integration.py:

            • unused imports logging, os, commons

            bin/qserv-testunit.py

            • lsst.qserv.tests.testqservdataloader does not exist?
            • some comments/docstring would be handy, plus standard copyright header

            tests/benchmark.py:

            • copyright string needs an update
            • line 60: regexp string needs r prefix or doubling of all backslashes
            • line 124: format string has 2 formatting codes but there are 4 arguments to format()
            • in general logging functions (especially debug) should use lazy interpolation (debug("message %s", message) instead of debug("message {0}".format(message)) or debug("message %s" % message)) to save some CPU cycles

            tests/mysqlloader.py:

            • line 66: commons.run_command() does not return anything, out will be None

            tests/qservloader.py:

            • unused imports logging, const

            tests/testqservloader.py:

            • line 69: QservLoader constructor only takes 4 arguments
            • line 76, 80: QservLoader has no connectAndInitDatabase() or alterTable() methods

            description.yaml (for all 3 cases):

            • these should be merged with common.cfg once we have everything in yaml
            • some information in description.yaml is redundant, the same info exists (or should be added) in Table.cfg files, we should have it only in one place, not two. This applies to tables.partitioned-tables, tables.views, tables.directors (for that we need DM-1720 done). I hope we can deduce table loading order from other information as well.
            • we should document all configuration parameters used by partitioner and data loader(s), or maybe even better, have some form of schema for that so we can discover errors in config files (right now there is no checks that parameter names are sane)
            Show
            salnikov Andy Salnikov added a comment - Continuing review for qserv_testdata: our coding guidelines say that python module names should be camelCase (mysqlLoader.py instead of mysqlloader.py), probably not worth fixing it now, but something to think about for the future bin/qserv-check-integration.py: lines 1-2: shebang line repeated twice line 33: tarfile import is unused line 36: common import is unused line 80: __name__ is "__main__" when you run it as a script, I'm not sure it's the best option for logger name, I'd probably just use root logger line 103: return_code is always 0 here, should it be non-zero for failed tests? some variable names have underscores, I believe DM coding guidelines are to use camelCase for regular names bin/qserv-check-integration.py: unused imports logging, os, commons bin/qserv-testunit.py lsst.qserv.tests.testqservdataloader does not exist? some comments/docstring would be handy, plus standard copyright header tests/benchmark.py: copyright string needs an update line 60: regexp string needs r prefix or doubling of all backslashes line 124: format string has 2 formatting codes but there are 4 arguments to format() in general logging functions (especially debug) should use lazy interpolation ( debug("message %s", message) instead of debug("message {0}".format(message)) or debug("message %s" % message) ) to save some CPU cycles tests/mysqlloader.py: line 66: commons.run_command() does not return anything, out will be None tests/qservloader.py: unused imports logging, const tests/testqservloader.py: line 69: QservLoader constructor only takes 4 arguments line 76, 80: QservLoader has no connectAndInitDatabase() or alterTable() methods description.yaml (for all 3 cases): these should be merged with common.cfg once we have everything in yaml some information in description.yaml is redundant, the same info exists (or should be added) in Table.cfg files, we should have it only in one place, not two. This applies to tables.partitioned-tables , tables.views , tables.directors (for that we need DM-1720 done). I hope we can deduce table loading order from other information as well. we should document all configuration parameters used by partitioner and data loader(s), or maybe even better, have some form of schema for that so we can discover errors in config files (right now there is no checks that parameter names are sane)
            Hide
            jammes Fabrice Jammes added a comment -

            Hi Andy Salnikov

            I've taken in account all your comments, but them related to yaml, and pushed them to u/fjammes/DM-627.
            Jacek Becla want to talk tomorrow about yaml use in Qserv. For now do you agree if I move logging configuration to basic .ini file and let description.yaml as it is in order to close this ticket?

            Thanks,

            Fabrice

            Show
            jammes Fabrice Jammes added a comment - Hi Andy Salnikov I've taken in account all your comments, but them related to yaml, and pushed them to u/fjammes/ DM-627 . Jacek Becla want to talk tomorrow about yaml use in Qserv. For now do you agree if I move logging configuration to basic .ini file and let description.yaml as it is in order to close this ticket? Thanks, Fabrice
            Hide
            salnikov Andy Salnikov added a comment -

            Hi Fabrice,

            if you can avoid importing yaml in qserv for now then it should be OK to merge. For qserv_testdata it's OK to keep yaml config files for now (I presume that people who need to run integration tests all have anaconda in stack) but check other issues in my comments above.

            Andy

            Show
            salnikov Andy Salnikov added a comment - Hi Fabrice, if you can avoid importing yaml in qserv for now then it should be OK to merge. For qserv_testdata it's OK to keep yaml config files for now (I presume that people who need to run integration tests all have anaconda in stack) but check other issues in my comments above. Andy
            Hide
            jammes Fabrice Jammes added a comment -

            Hi Andy Salnikov,

            I've checked all the issues, it required some significant work but I think it is worth it. Thanks for helping me to improve the code quality.
            I've done what was decided with yaml.
            Please let me know if we can close this ticket.

            FYI

            qserv-test-head.sh -R ~/qserv-run/test

            pass successfully.

            Cheers,

            Fabrice

            Show
            jammes Fabrice Jammes added a comment - Hi Andy Salnikov , I've checked all the issues, it required some significant work but I think it is worth it. Thanks for helping me to improve the code quality. I've done what was decided with yaml. Please let me know if we can close this ticket. FYI qserv- test - head .sh -R ~ /qserv-run/test pass successfully. Cheers, Fabrice
            Hide
            salnikov Andy Salnikov added a comment -

            OK to merge.

            Show
            salnikov Andy Salnikov added a comment - OK to merge.

              People

              • Assignee:
                jammes Fabrice Jammes
                Reporter:
                fritzm Fritz Mueller
                Reviewers:
                Andy Salnikov
                Watchers:
                Andy Salnikov, Fabrice Jammes, Jacek Becla
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel