DM-10424 (Data Management)

Replication Framework (integration with Qserv)

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      10
    • Sprint:
      DB_S18_02, DB_S18_03, DB_S18_04, DB_S18_05, DB_F18_06
    • Team:
      Data Access and Database

      Description

      This task has two goals:

      1. Establish a mechanism to notify Qserv when replication operations complete and new replicas have been created. This also includes a possible negotiation process for cases where replicas need to be removed from a cluster.
      2. Serve as an anchor for the first code review of the project.

       

      Here is a link to my presentation on the design and the architecture of the Replication system:

      http://www.slac.stanford.edu/~gapon/TALKS/2018_Apr_LSST_GroupMeeting/QservReplicationSystem.pdf

       

      Suggested areas for code review (assuming Qserv module replica or its subfolders sql/ and tools/):

      1. Nate Pease Messaging network:
        • Messenger, MessengerConnection, WorkerServer, WorkerServerConnection
        • tools/qserv-replica-messenger-test
      2. Andy Salnikov Configuration service:
        • Configuration, ConfigurationMySQL, ConfigurationFile
        • sql/replication.sql, sql/replication_config*.sql
      3. Andy Salnikov Database and persistent state support:
        • DatabaseMySQL, DatabaseServices, DatabaseServicesMySQL
        • sql/replication.sql
        • tools/qserv-replica-mysql-test - a simple test for the database API
        • tools/qserv-replica-test-sql - experimentation (and a demo) with C++11 variadic templates for query construction
      4. Nate Pease Replication framework:
        • controller-side: Controller, Request, RequestMessenger (intermediate base class for specific requests based on the Messenger), FindRequest, FindAllRequest, ReplicationRequest, DeleteRequest, StatusRequest (a family of classes), StopRequest (a family of classes), ServiceManagementRequest (a family of classes), Job, JobController
        • worker-side: WorkerProcessor, WorkerRequest, WorkerFindRequest, WorkerFindAllRequest, WorkerReplicationRequest, WorkerDeleteRequest
        • shared: ServiceProvider, Configuration, DatabaseServices, QservMgtServices, ChunkLocker
        • tools/qserv-replica-worker - the worker-side server with a built-in file service
        • tools/qserv-replica-controller-cmd - tool for invoking individual requests
        • tools/qserv-replica-controller-admin - tool for inspecting/managing worker services
        • tools/qserv-replica-embedded-test - all-in-one solution encapsulating both Controller and (multiple) worker-side services
      5. Fritz Mueller High-level algorithms:
        • Job - base class for a family of requests
        • JobController - job controller
        • specific jobs: FindJob, FindAllJob, ReplicateJob, PurgeJob, RebalanceJob, CreateReplicaJob, DeleteReplicaJob, MoveReplicaJob, FixUpJob, VerifyJob
        • tools/qserv-replica-master - a fixed-logic replication master executing a sequence of jobs in an infinite loop
        • tools/qserv-replica-jobctrl-test - a simple test of the JobController
        • tools/qserv-replica-job-* - a collection of single-job controllers
      6. John Gates File transfer services, operations with files:
        • FileServer, FileServerConnection, FileClient, FileUtils
        • tools/qserv-replica-calc-cs - a test for the iterative computation of check-sums (see FileUtils)
        • tools/qserv-replica-file-server - stand alone file service
        • tools/qserv-replica-file-read - an example application using the file service to read a remote file
      7. John Gates Integration with Qserv (over XRootD/SSI)
        • QservMgtServices - provides services for Jobs
        • QservMgtRequest - base class for a family of Controller requests
        • specific requests: AddReplicaQservMgtRequest, RemoveReplicaQservMgtRequest, GetReplicasQservMgtRequest, SetReplicasQservMgtRequest
        • relevant Job classes: QservGetReplicasJob
        • tools/qserv-replica-worker-notify - tool for sending commands to Qserv workers
        • tools/qserv-replica-job-sync - single-job controller application for bringing Qserv's collection of chunks in sync with the actual disposition of data
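
For reviewers unfamiliar with the technique mentioned under tools/qserv-replica-test-sql, here is a minimal sketch of how C++11 variadic templates can be used to assemble an SQL statement from a typed list of values. It is an illustration only and not the code of the replica module: the function names sqlValue, appendValues and sqlInsert, as well as the table name used in the usage note, are hypothetical. The quoting is deliberately simplistic (it only doubles single quotes); real code would presumably rely on the escaping facilities of the MySQL client library.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// NOTE: all names below are hypothetical and chosen for illustration only.

// Render a string as an SQL literal, doubling embedded single quotes.
inline std::string sqlValue(std::string const& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "''";
        else out += c;
    }
    out += "'";
    return out;
}

// C-style strings are treated the same way as std::string.
inline std::string sqlValue(char const* s) { return sqlValue(std::string(s)); }

// Arithmetic values are emitted unquoted. The enable_if guard keeps this
// overload out of the way of the string overloads above.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, std::string>::type
sqlValue(T v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Recursion terminator: no values left to append.
inline void appendValues(std::string&) {}

// Peel one value off the parameter pack, append it, recurse on the rest.
template <typename T, typename... Rest>
void appendValues(std::string& out, T const& first, Rest const&... rest) {
    out += sqlValue(first);
    if (sizeof...(rest) > 0) out += ", ";
    appendValues(out, rest...);
}

// Build an INSERT statement from a table name and any number of values.
template <typename... Values>
std::string sqlInsert(std::string const& table, Values const&... values) {
    std::string query = "INSERT INTO `" + table + "` VALUES (";
    appendValues(query, values...);
    query += ")";
    return query;
}
```

With these helpers, sqlInsert("replica", 123, "worker-A", 4.5) produces the string INSERT INTO `replica` VALUES (123, 'worker-A', 4.5). The appeal of the approach is that each value is formatted according to its static type, so a caller cannot forget to quote a string.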

            Activity

            gapon Igor Gaponenko added a comment (edited)

            Dear all

            I've created a pull request for the Replication system ticket, and I'm asking for your help in reviewing my code. I've split the code into a number of functionally related groups, which you'll find in the Description section of the Jira ticket. I've also attached your names to the groups. The assignment was partly random and partly intentional, based on my understanding of your experience with Qserv.

            A few more notes. First, I have many test applications, but zero formal unit tests. It's not that I'm against test-driven software development; it's mostly that this (distributed) software is too complex to be tested with simple unit tests. Still, I can see a real need to add some in areas where it makes sense. Hence, I would really appreciate your feedback on this subject and your suggestions on what I should improve here (which unit tests would make sense to add). I'll be happy to implement them.

            The second thing to mention is that Travis CI for Qserv integration and multi-node is presently broken due to changes I made in order to build a Docker container that runs on a different set of ports at IN2P3 (upper half of the Qserv cluster). This had to be done to avoid interference with the main instance of Qserv, which runs simultaneously on the same set of nodes. I'm planning to discard the port changes after the code review, before merging the code into the master branch.

            And finally, I would like to remind you where to find my presentation on the Replication System which I gave at the group meeting a few weeks ago:

            http://www.slac.stanford.edu/~gapon/TALKS/2018_Apr_LSST_GroupMeeting/QservReplicationSystem.pdf

            And PLEASE, do not base your review on GitHub's diffs! If possible, review every single line of the files.

             

            npease Nate Pease added a comment -

            I'm done looking at Messaging network and Replication framework. I'll look at High-level algorithms later today or tomorrow.

            gapon Igor Gaponenko added a comment -

            Nate Pease thanks a lot! I really appreciate your help with moving this project on!

            npease Nate Pease added a comment -

            I'm done with everything under High-level algorithms except tools/qserv-replica-job-* (a collection of single-job controllers); I'll do that tomorrow afternoon.

            npease Nate Pease added a comment -

            And I'm done with qserv-replica-job* now, which AFAIK is everything. Let me know if there's anything else you'd like me to look at; otherwise I'm done. I guess I'll change the status to "reviewed" and you can put it back if needed.

            npease Nate Pease added a comment -

            or, "review complete", as it were.

            gapon Igor Gaponenko added a comment -

            Nate Pease thank you very much for such a great job! It has helped me a lot (same applies to other reviewers of the ticket)! I still have a few more improvements to be made to the code before attempting rebase and merge. I may ask you (or someone on the list) to have a look at those (yet to come) mods.

            gapon Igor Gaponenko added a comment -

            Closing the ticket. However, I messed up the order of operations when closing the pull request: the request was closed before I pushed the properly merged master to GitHub.


              People

              • Assignee:
                gapon Igor Gaponenko
                Reporter:
                fritzm Fritz Mueller
                Reviewers:
                Andy Salnikov, Fritz Mueller, John Gates, Nate Pease
                Watchers:
                Andy Salnikov, Fritz Mueller, Igor Gaponenko, John Gates, Nate Pease