The statistics have been generated and tested against an Object-Source JOIN, which finished in ~130 minutes. Previously, JOIN queries would time out at the 8-hour hard limit in the qserv scheduler.
The generation itself was sped up by splitting the ANALYZE TABLE step across 4 threads running in parallel, for a total run time of ~1 day at IN2P3 rather than the expected 4-5 days.
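As a rough illustration of the parallel ANALYZE TABLE step, here is a minimal Python sketch. It is not the script that was actually run at IN2P3. The function names, the example table names, and the `run_statement` callable (which stands in for whatever executes one SQL statement on a worker's database) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_statement(table):
    # Build the SQL for one table; backticks guard against unusual names.
    return f"ANALYZE TABLE `{table}`"


def analyze_in_parallel(tables, run_statement, workers=4):
    """Issue ANALYZE TABLE for every table using a pool of worker threads.

    run_statement is a caller-supplied (hypothetical) callable that executes
    one SQL string, for example over its own database connection, and
    returns whatever the caller wants to record for that table.
    Returns a list of (table, result) pairs in the input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though calls overlap in time.
        results = list(
            pool.map(lambda t: run_statement(analyze_statement(t)), tables)
        )
    return list(zip(tables, results))
```

With 4 workers, up to four statements are in flight at once, so the wall-clock time is bounded by the slowest quarter of the work rather than the sum of all tables. Each thread should use its own connection, since database connections are generally not safe to share between threads.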
We also investigated ways to expedite and plan this process for future KPM and DRs. We found that local mysql instances depend only on the mysql.column_stats table, which is populated by ANALYZE TABLE. It is not yet clear which columns in particular help the optimizer decide on the correct JOIN order, but a short meta-analysis of these column values across all chunks on a worker gave some insight into the spread of the min/max/avg values of the columns. The hope is that some regularity can be found in how the distribution of these values scales to future, larger datasets, which would avoid the need to reanalyze tables for each new DR. For now, this work is tentatively planned for a future cycle, as it would need dedicated resources and time.
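The kind of per-column spread summary described above could be sketched as follows. This is an illustration only, not the analysis that was performed. It assumes the statistics rows have already been fetched into dictionaries with the keys `column_name`, `min_value`, and `max_value` (one row per chunk table and column), and that the values are numeric. The function name and the summary fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_spread(rows):
    """Summarize how per-chunk min/max statistics vary for each column.

    rows: iterable of dicts with keys 'column_name', 'min_value' and
    'max_value' (an assumed layout), one row per chunk table and column.
    """
    grouped = defaultdict(lambda: {"min": [], "max": []})
    for row in rows:
        col = grouped[row["column_name"]]
        col["min"].append(float(row["min_value"]))
        col["max"].append(float(row["max_value"]))

    summary = {}
    for name, vals in grouped.items():
        summary[name] = {
            "chunks": len(vals["min"]),
            "lowest_min": min(vals["min"]),
            "highest_max": max(vals["max"]),
            "mean_min": mean(vals["min"]),
            "mean_max": mean(vals["max"]),
            # Population standard deviation of the per-chunk maxima.
            "stdev_max": pstdev(vals["max"]),
        }
    return summary
```

Comparing such summaries between datasets of different sizes is one way to look for a scaling pattern in the distributions. Whether any such pattern exists is still an open question.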