  1. Data Management
  2. DM-13303

Dynamic reloading of available resources in Qserv workers

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      Improve the worker plugin with a mechanism for dynamically reloading the collection of available chunks. This will require adding a handler for requests to worker-specific resources. The replication system will use the mechanism to update worker services when the disposition of chunks available to them changes.

            Activity

            gapon Igor Gaponenko added a comment - - edited

            John Gates

            Summary of modifications

            Most modifications are in module wpublish

            Moved two (earlier developed) classes which implement worker management commands from module wcontrol to wpublish

            Changes in the worker database schema

            These were needed because the worker management code now has to modify certain tables in the database:

            • user qsmaster is now allowed to modify metadata tables in database qservw_worker
            • a schema migration script (SQL) was added for the schema change

            The relevant files are:

            admin/templates/configuration/tmp/configure/sql/qserv-worker.sql
            core/modules/wdb/schema/migrate-1-to-2.sql
            

            Fixes in the wmgr service

            The service no longer uses the MySQL root account. It now uses the existing account qsmaster, whose extended privileges allow it to modify metadata tables in database qservw_worker. The following files were changed:

            core/modules/wmgr/python/dbMgr.py
            core/modules/wmgr/python/xrdMgr.py
            

            Extended/changed protobuf definitions

            These were needed to support new worker management commands. Minor refactoring of some messages.

            New worker management operations and classes

            • a command to rebuild the chunk list table qservw_worker.Chunks from scratch, using the actual collection of partitioned tables found in the databases listed in table qservw_worker.Dbs
            • a command to add a group of chunks (a chunk with the same number across collocated databases)
            • a command to remove a group of chunks (a chunk with the same number across collocated databases)
            • a command to return a list of all chunks known to a worker, along with a counter indicating how many active XRootD/SSI requests exist for each chunk
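
            The rebuild operation in the list above can be pictured as a scan over table names. The following is only a minimal sketch, written in Python for brevity (the worker code itself is C++). It assumes that a partitioned table is stored as "<Table>_<chunk>"; that naming convention, the function name chunks_from_tables and the shape of its input are assumptions made for illustration and are not taken from this ticket.

```python
import re
from collections import defaultdict

# Assumed naming convention: a partitioned table is stored as "<Table>_<chunk>".
_CHUNK_TABLE = re.compile(r"^(?P<table>\w+?)_(?P<chunk>\d+)$")


def chunks_from_tables(tables_by_db):
    """Map each database name to the sorted chunk numbers found in its table names.

    `tables_by_db` maps a database name to the table names found in it.
    Databases without any chunk tables are left out of the result.
    """
    found = defaultdict(set)
    for db, tables in tables_by_db.items():
        for name in tables:
            match = _CHUNK_TABLE.match(name)
            if match:
                found[db].add(int(match.group("chunk")))
    return {db: sorted(chunks) for db, chunks in found.items()}
```

            Under these assumptions the result would replace the previous contents of the chunk list table in one step, which is what "from scratch" suggests.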

            For each of those operations there are two classes:

            • <Operation>QservRequest: representing the operation on the client-side (the Replication system, etc.)
            • <Operation>WorkerCommand: implementing the operation within Qserv worker

            New command line tools

            The tools (written in C++) are meant to interact with worker services:

            • qserv-worker-notify: for invoking all worker management operations
            • qserv-worker-perf: for performance studies of the management protocol

            The tools required changes to file wpublish/SConscript. Two utility classes were also added:

            • util/CmdParser: a simple (and convenient to use) command line parser for C++ applications
            • util/BlockPost: a thread-safe class for introducing arbitrary (within a configurable interval of milliseconds) delays in C++ applications
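
            The idea behind util/BlockPost can be sketched as follows. This is a Python illustration under stated assumptions, not the C++ class from this change: the constructor arguments, the method name wait and the choice of a uniform random delay are all hypothetical.

```python
import random
import threading
import time


class BlockPost:
    """Block the caller for a random delay drawn from [min_ms, max_ms]."""

    def __init__(self, min_ms, max_ms):
        if not 0 <= min_ms <= max_ms:
            raise ValueError("expected 0 <= min_ms <= max_ms")
        self._min_ms = min_ms
        self._max_ms = max_ms
        self._rng = random.Random()
        self._lock = threading.Lock()  # guards the shared random generator

    def wait(self):
        """Sleep for a random interval and return its length in milliseconds."""
        with self._lock:
            delay_ms = self._rng.randint(self._min_ms, self._max_ms)
        # Sleep outside the lock so that concurrent callers do not serialize.
        time.sleep(delay_ms / 1000.0)
        return delay_ms
```

            A performance tool could call wait() between requests to spread them over time.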

            Extended interface and implementation of class ChunkInventory

            The modifications were introduced in order to implement the worker management operations described above. In particular, methods add and remove were added for adding and removing chunks. These operations also add or remove the corresponding entries in metadata table Chunks (mentioned earlier). Two trivial exception classes were also added for reporting failures of database operations.
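
            A minimal sketch of how an inventory could keep its in-memory state and the metadata table in step is shown below. It is written in Python with sqlite3 standing in for MySQL. The two exception names, the column names db and chunk, and the helper has are assumptions made for illustration only.

```python
import sqlite3


class InvalidParamError(Exception):
    """Raised when a caller passes a bad database name or chunk number."""


class QueryError(Exception):
    """Raised when a statement against the metadata table fails."""


class ChunkInventory:
    """In-memory chunk map that mirrors every change into table Chunks."""

    def __init__(self, conn):
        self._conn = conn
        self._chunks = {}  # database name -> set of chunk numbers

    def add(self, db, chunk):
        self._update("INSERT INTO Chunks (db, chunk) VALUES (?, ?)", db, chunk)
        self._chunks.setdefault(db, set()).add(chunk)

    def remove(self, db, chunk):
        self._update("DELETE FROM Chunks WHERE db = ? AND chunk = ?", db, chunk)
        self._chunks.get(db, set()).discard(chunk)

    def has(self, db, chunk):
        return chunk in self._chunks.get(db, set())

    def _update(self, sql, db, chunk):
        if not db or chunk < 0:
            raise InvalidParamError(f"bad chunk reference: {db!r}/{chunk}")
        try:
            with self._conn:  # commits on success, rolls back on error
                self._conn.execute(sql, (db, chunk))
        except sqlite3.Error as exc:
            raise QueryError(str(exc)) from exc
```

            The table is updated before the in-memory map, so a failed statement leaves the map unchanged.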

            Trivial extensions in class global/ResourceUnit

            The changes were needed to build resource strings

            New class wpublish/ResourceMonitor

            This class (effectively a singleton) is managed by class SsiRequest. Its main purpose is to track which chunk resources are in use at any given moment, and how many requests are using each of them. This information is passed back to the Replication system in the request that returns the list of known chunks, allowing the Replication system to make informed decisions on the "garbage collection" of removed chunks. The monitor is also used in the implementation of the worker operation which removes chunks from the chunk inventory, to prevent (if needed) the removal of chunks which are in use.
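
            The counting and the removal guard described above can be sketched as follows. This is an illustrative Python sketch, not the C++ class: the method names, the force flag and the helper remove_chunk are hypothetical, and resources are represented as plain strings.

```python
import threading
from collections import Counter


class ResourceMonitor:
    """Count in-flight requests per resource name (thread-safe)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_use = Counter()

    def increment(self, resource):
        """Record that one more request is using `resource`."""
        with self._lock:
            self._in_use[resource] += 1

    def decrement(self, resource):
        """Record that a request using `resource` has finished."""
        with self._lock:
            if self._in_use[resource] <= 1:
                del self._in_use[resource]  # no error if the key is absent
            else:
                self._in_use[resource] -= 1

    def count(self, resource):
        """Return the number of requests currently using `resource`."""
        with self._lock:
            return self._in_use[resource]


def remove_chunk(monitor, chunks, resource, force=False):
    """Drop `resource` from the set `chunks` unless it is in use.

    Returns True if the removal went ahead. With force=True the
    resource is removed even when requests are still using it.
    """
    if monitor.count(resource) > 0 and not force:
        return False
    chunks.discard(resource)
    return True
```

            In this sketch the usage check and the removal are two separate steps. A real implementation would need to make them atomic, so that a request cannot start between the check and the removal.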

            Change in the XRootD configuration template

            The following file had to be changed to allow manipulations of the XRootD resource cache when adding/removing chunks to workers:

            admin/templates/configuration/etc/lsp.cf
            

            Changes in xrdsvc/SsiProvider

            Overrode two virtual methods of the base class XrdSsiProvider. The XRootD Cluster Management system invokes them to notify XRootD's CMSD service of changes in the resource cache. The same changes are also expected to be made in the copy of the ChunkInventory object owned by CMSD.

            Changes in xrdsvc/SsiRequest

            Refactored this class by moving the instantiation of the worker management objects into a dedicated method.

            jgates John Gates added a comment -

            Looks good, aside from deviations from the LSST style guide. I do care about the spaces between function names and the opening parenthesis, as they will affect grep "funcName(".


              People

              • Assignee:
                gapon Igor Gaponenko
                Reporter:
                gapon Igor Gaponenko
                Reviewers:
                John Gates
                Watchers:
                Fritz Mueller, Igor Gaponenko, John Gates