We have pg_sphere installed on the usdf-butler-dev instance (version 1.2, which I understand is the latest one from the GitHub repo). I ran some tests with the obscore table and pg_sphere using the migration scripts from the DM-36766 branch; here is a summary of the tests.
DM-36766 code was developed and tested on a copy of the OGA repository loaded into my personal server. That copy has very little data, approx. 20k records for raw images. These tests were performed on a copy of the main repo (main_20210215 schema), which has a lot more data, so one of the goals was to check the performance of the whole process.
The process of adding obscore table to the registry includes two steps:
- Adding the obscore and obscore-config managers to the registry configuration (the butler_attributes table). This obviously needs an obscore configuration; I used something based on the configuration used for generating the obscore table for QServ. The migration script only creates an empty obscore table and does not populate it with any data.
- Adding records to the obscore table from existing registry data, which is done by a new command in the dax_butler package: butler obscore-update-table. It scans all dataset refs based on the obscore configuration, converts them to obscore records, and adds them to the table. This is a slow process, especially for the raw dataset type, because we also need to find a corresponding region from matching visit records.
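The two steps above look roughly like this on the command line. This is only a sketch: the repo URI and migration revision are placeholders, and the exact `butler migrate` invocation may differ depending on the migration tooling version.

```shell
# Step 1 (sketch): apply the DM-36766 migration that adds the obscore
# and obscore-config managers to butler_attributes and creates the
# (empty) obscore table. REPO and REVISION are placeholders.
butler migrate upgrade REPO REVISION

# Step 2 (sketch): populate the obscore table from existing registry
# data; this is the slow part, dominated by the raw dataset type.
butler obscore-update-table REPO
```

Step 2 can run concurrently with regular registry activity, which is why no downtime is needed for it.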
The first step worked perfectly. One minor issue is that DM-36766 defines a migration script only for the OGA namespace (I plan to use that for a demonstration and add a migration script for the main namespace later), so I had to cheat in my config, but this does not really matter for this test.
The second step needed one minor patch to DM-36766 to handle empty timespans correctly (already added to the DM-36766 branch). With that fix I managed to run the whole thing; it took about 3 hours 40 minutes to complete. That feels like a long time, but OTOH we do not need downtime for this and it can run concurrently with regular activity, so this should not be an issue.
The two longest contributions to that come from:
- raw dataset type from LSSTCam/raw/all collection - approx. 6100 seconds and 6,188,213 records
- raw dataset type from HSC/raw/all collection - approx. 5300 seconds and 2,935,078 records
All other collections and dataset types take much less time compared to these two. I believe the major cost for the raw dataset type comes from finding matching regions in visit records. There may be ways to improve that, but I do not want to spend too much time on this right now (this step should only be done once, so performance is not critical). What is not quite clear to me is why the time does not scale linearly with the number of records for these two collections; I'll try to investigate.
I ran a bunch of spatial queries on the obscore table to verify that pg_sphere works OK (it does not crash the server) and that spatial indexing works as expected. The results were reasonable: no crashes were observed, queries (those that can be indexed) ran fast, and query analysis shows that the index is indeed used for those queries.
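For reference, the indexed queries were of this general shape. The host, schema, and column names below are assumptions, as is the choice of a cone search around an arbitrary point; the key point is the pg_sphere `&&` (overlap) operator, which a GiST index on the region column can serve.

```shell
# Hypothetical psql session; host, database, schema, and column names
# are placeholders, not the actual deployment values.
psql -h usdf-butler-dev -d butler <<'SQL'
-- cone search: obscore regions overlapping a 1-degree circle;
-- EXPLAIN should show an index scan on the s_region GiST index
EXPLAIN ANALYZE
SELECT obs_id, s_ra, s_dec
  FROM main_20210215.obscore
 WHERE s_region && scircle '<(150d, 2.2d), 1d>';
SQL
```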
I think I can declare success with this test, and we can ask Yee Li to deploy pg_sphere on the production instance when there is an opportunity (I guess this will need downtime).
We need to involve the SQuaRE team in this, since they deploy exactly this form of Postgres for the TAP service, and we should agree on which variant of pg_sphere we are using.
IDF is problematic given that CloudSQL does not support pg_sphere. I didn't think we had a requirement to use ObsCore on the IDF.