Fix Version/s: None
Sprint:DB_S18_05, DB_F18_06, DB_F18_07, DB_F18_08, DB_F18_09, DB_F18_10
Team:Data Access and Database
Major milestones of this effort:
- Upgrade the Qserv installation in PDAC to the latest version of the Docker containers based on MariaDB 10.2.14. The new containers are also required for the extended management protocol that Qserv workers and the Replication system's Controllers use to cooperate when replica disposition changes.
- Docker-based install and preliminary tests of the Replication system's tools on the cluster. At this stage a proper configuration of the Replication system will be devised and tested.
- Scalability tests of the main replication operations, including operations that persist their state (Replication jobs and requests). Improve the implementation of the system as needed.
- Implement the Health Monitoring Algorithm for workers of both kinds (Qserv and the Replication system). Integrate this algorithm into the fixed-logic Controller. The initial version of the algorithm will send probe requests to both kinds of workers and measure their responses (which must arrive within a reasonable period of time). This may also require adjusting the Replication system's Messaging Network so that it respects the priorities of these requests. The current implementation has a plain queue. The new one should get a priority queue.
- Finalize and test the fixed-logic replication Controller
- When confident in the functionality, performance and robustness of the Replication system, integrate the system with the existing Kubernetes infrastructure
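The move from a plain queue to a priority queue mentioned in the Health Monitoring milestone could look roughly like the sketch below. This is an illustration only, not the Replication system's actual code: the names Request, RequestQueue, priority and seq are hypothetical, and the sketch assumes that a larger priority value means "serve earlier" and that requests of equal priority should keep their arrival order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request record (not taken from the Replication system).
struct Request {
    std::string id;
    int priority;       // assumption: larger value = served earlier
    std::uint64_t seq;  // arrival order, assigned by the queue
};

// Priority queue that stays FIFO among requests of equal priority.
class RequestQueue {
public:
    void push(std::string id, int priority) {
        _queue.push(Request{std::move(id), priority, _nextSeq++});
    }

    bool empty() const { return _queue.empty(); }
    std::size_t size() const { return _queue.size(); }

    // Remove and return the request that should be sent next.
    Request pop() {
        assert(!_queue.empty());
        Request r = _queue.top();
        _queue.pop();
        return r;
    }

private:
    // Returns true when 'a' must be served after 'b'.
    struct ServedLater {
        bool operator()(Request const& a, Request const& b) const {
            if (a.priority != b.priority) return a.priority < b.priority;
            return a.seq > b.seq;  // earlier arrival wins on ties
        }
    };

    std::priority_queue<Request, std::vector<Request>, ServedLater> _queue;
    std::uint64_t _nextSeq = 0;
};
```

With such a queue, a health probe pushed with a higher priority than pending replication requests would be popped first, while probes among themselves would still go out in the order they were queued. The sequence counter is there because std::priority_queue by itself gives no ordering guarantee for equal elements.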
Hi Igor Gaponenko, I'm not a C++ expert but thanks for adding me to this review, where I learned a lot about this replication system. I've added a few minor comments and removed myself from the reviewers list. We should think about how to integrate these new services inside Kubernetes in the long term... Your health probe algorithms might also be triggered inside Kubernetes probes...
I think I have done my part reviewing the ticket (or at least I have to stop somewhere to return to my work); there are plenty of comments on the PR.
I really do not know what to say. I trust the replication part should be OK as it was tested at PDAC. What worries me is the database access code. As it happens we now have two separate APIs for the mysql interface in qserv and they are both very far from perfect. I do not think that we can afford to support and improve both, so we should instead have one reasonable interface with a good quality implementation. I really wish that Igor had worked on improving our existing sql+mysql implementation rather than rolling out completely new code. Anyway, I think that for the long term we should converge on a single implementation, either one of our own or maybe some other third-party package (and for completeness I should also mention the option I like best - rewriting the replicator in Python).
Removing myself from reviewers.
Andy Salnikov, thank you very much for putting so much effort into reviewing the ticket! Most of your comments and suggestions on my code (as always) make a lot of sense to me.
I would also like to address your last comment on the database APIs. Even though I would agree (in general) with your argument that we may need a better API, I should also mention the following:
- The choice of the database technology for supporting the persistent state of the Replication system is not as critical as in the case of Qserv. The Replication system's state can be well contained within virtually any relational (and not only relational) database. I'm pretty sure multiple strains of MySQL will be around for many years ahead. Moreover, the Docker technology makes this long-term investment (into MySQL) even more reasonable.
- I don't see much benefit in building a database abstraction layer on top of MySQL within the Replication system. That's why I've put my efforts into building a MySQL-specific OO layer on top of the low-level C library (for MySQL) based primarily on my needs and requirements. Also note that my implementation is quite limited, and it's not aiming at being a general-purpose solution to all problems.
- Meanwhile, the requirements for the database abstraction layer in Qserv are quite different. Having a technology-neutral API could be one of those. Besides, this API should also allow efficient implementations for reading large amounts of data from a database.
- I also believe that building a high-quality general-purpose database API (including a technology-neutral abstraction layer and an efficient underlying implementation) is a very big job. It may require substantial effort to design and implement it. Based on my informal cost-benefit analysis, I don't think it's worth it. Or we may simply not be able to afford this development.
- And finally, we've already discussed the possibility of decoupling the Replication system from the Qserv code base and placing it into a separate Git package. This will make code sharing even less feasible from a practical standpoint (actually, as of today, there is very little overlap between Qserv and the Replication system).
Between Andy Salnikov and me, all the files have been looked at and comments have been left where appropriate. We will let Igor Gaponenko make changes and submit as he sees fit.
I've looked at core/modules/replica/AddReplicaQservMgtRequest.cc through core/modules/replica/Controller.h, and I'm calling it quits on this for the day.