The dax_apdb branch that I used for Cassandra development diverged significantly from the master branch. We need to start merging that branch to provide Cassandra support and hopefully allow AP pipeline to use actual Cassandra service (when it goes live at SLAC). These are very big changes and it will not be possible to make it backward compatible so we need to update client side as well. The changes include:
- how do we configure each implementation, obviously Cassandra configuration is going to be different from sqlalchemy
- because of how Cassandra partitioning works we need to change API for storing/retrieving all data, but first we need to figure out what is the best API for clients
This ticket is going to cover all aspects, I'll start filling details of these changes in comments so that we can have a productive discussion. Please add anyone who you think should be involved to watchers.
- is blocked by
RFC-798 Add cassandra-driver to conda rubin-env
Krzysztof Findeisen, would you have time to look at the changes? Sorry for large PR, but I hope it simplifies few things on client side and will make it easier for AP pipeline to handle Cassandra transition.
There are four packages to look at (JIRA ignores ap_pipe for some reason):
- dax_apdb: https://github.com/lsst/dax_apdb/pull/21
- ap_association: https://github.com/lsst/ap_association/pull/130
- ap_pipe: https://github.com/lsst/ap_pipe/pull/91
- verify: https://github.com/lsst/verify/pull/89
No need to look at l1dbproto, that one is for my self-review.
Jenkins is happy, and I ran ap_verify with ap_verify_ci_cosmos_pdr2 dataset, that also worked OK.
Any feedback on new API is welcome, if you have any suggestions I'd be happy to hear that too.
Sorry for the delay. A few of the dax_apdb changes make me a little nervous, but otherwise looks good.
I have one question about what this means for ap_verify. ap_verify creates and configures an SQLite database on the fly, setting config parameters that in the new system are specific to SqlApdb. Should it also force a redirect to SQlApdb, or have a check that the pipeline fed to ap_verify uses a subclass of SQlApdb?
To answer the question about `ap_verify` - at this point I don't think it matters, there is no alternative implementation yet so there is no chance it will break. Once we have Cassandra merged we can start looking at how the whole thing needs to be updated to support two implementations. I imagine there is more than one place that will need an update and will likely need a dedicated ticket just for that.
Jenkins is happy with all my recent updates and ap_verify runs OK too. I have merged all packages, hope nothing is broken. Thanks for review and all suggestions, and sorry for such ginormous PR, next ones should be more reasonable size.
And just a heads up for possible future discussion about APDB interface - the countUnassociatedObjects() method may be super inefficient in Casssandra. I'll think about it later when I start working on that implementation, but be prepared for that discussion