I have reviewed the ticket (its scope, the proposed solution and the documentation), and it all looks right to me. The only minor improvement I would recommend for future practical use of the proposed recipe relates to step #2 ("...Generate list of all chunks that need to have statistics generated with appropriate options..."). I think querying INFORMATION_SCHEMA would be a better way to compile the list of eligible tables. The advantage of this approach is that it does not require direct access to the underlying file system. Here is a sketch of a query against INFORMATION_SCHEMA that would produce a similar result (a set of ANALYZE TABLE ... statements):
SELECT CONCAT("ANALYZE TABLE ",TABLE_SCHEMA,".",TABLE_NAME)
FROM information_schema.tables
WHERE TABLE_SCHEMA="LSST" AND TABLE_NAME LIKE "Source\_%";

+------------------------------------------------------+
| CONCAT("ANALYZE TABLE ",TABLE_SCHEMA,".",TABLE_NAME) |
+------------------------------------------------------+
| ANALYZE TABLE LSST.Source_1                          |
| ANALYZE TABLE LSST.Source_2                          |
+------------------------------------------------------+
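
If the generated statements are meant to be fed straight back into the client, a slightly hardened variant of the query above may be useful. This is only a sketch, assuming a MySQL or MariaDB server, where the statement is spelled ANALYZE TABLE. The `stmt` alias, the backtick quoting, the trailing semicolon, the `TABLE_TYPE` filter and the `ORDER BY` are my additions and not part of the ticket's recipe; the `LSST` schema and `Source_` prefix are taken from the example above.

```sql
-- Sketch only: same idea as the query above, with a few assumed refinements.
-- Single-quoted literals keep working if the ANSI_QUOTES SQL mode is enabled.
SELECT CONCAT('ANALYZE TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, '`;') AS stmt
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'LSST'
  AND TABLE_TYPE = 'BASE TABLE'      -- skip views, which cannot be analyzed
  AND TABLE_NAME LIKE 'Source\_%'    -- \_ matches a literal underscore
ORDER BY TABLE_NAME;
```

With the stock mysql command-line client, the -N (--skip-column-names) and -B (--batch) options should strip the header and the table borders, so the output can be saved as a script and replayed.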
I am planning to write this up in Confluence.