# Documentation for generating statistics for L2/catalog data

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
2
• Sprint:
DB_S17_5, DB_S17_6
• Team:
Data Access and Database

#### Description

Statistics were generated at IN2P3 for correcting JOIN queries. The general process needs to be documented so we can optimize it in the future (see DM-9757 for reference).

#### Activity

Hide
Vaikunth Thukral added a comment -

Planning on writing this up in confluence.

Show
Vaikunth Thukral added a comment - Planning on writing this up in confluence.
Hide
Vaikunth Thukral added a comment -
Show
Vaikunth Thukral added a comment - The document to review is in confluence: https://confluence.lsstcorp.org/pages/viewpage.action?pageId=58950786
Hide
Igor Gaponenko added a comment -

I have reviewed the ticket (its scope, a proposed solution and the documentation), and this all looks right to me. The only minor improvement which I would recommend for further practical uses of the proposed recipe is related to step #2 ("...Generate list of all chunks that need to have statistics generated with appropriate options..."). I think using the INFORMATION_SCHEMA would be a better option for compiling a list of eligible tables. An advantage of this approach is that it won't require a direct access to the underlying file system. Here is an idea of a query against the INFORMATION_SCHEMA which would produce a similar result (set of the ANALYZE TABLE ... statements).:

 SELECT CONCAT("ANALYSE TABLE ",TABLE_SCHEMA,".",TABLE_NAME)  FROM information_schema.tables  WHERE TABLE_SCHEMA="LSST" AND TABLE_NAME LIKE "Source\_%"; +------------------------------------------------------+ | CONCAT("ANALYSE TABLE ",TABLE_SCHEMA,".",TABLE_NAME) | +------------------------------------------------------+ | ANALYSE TABLE LSST.Source_1 | | ANALYSE TABLE LSST.Source_2 | +------------------------------------------------------+ 

Show
Igor Gaponenko added a comment - I have reviewed the ticket (its scope, a proposed solution and the documentation), and this all looks right to me. The only minor improvement which I would recommend for further practical uses of the proposed recipe is related to step #2 ("...Generate list of all chunks that need to have statistics generated with appropriate options..."). I think using the INFORMATION_SCHEMA would be a better option for compiling a list of eligible tables. An advantage of this approach is that it won't require a direct access to the underlying file system. Here is an idea of a query against the INFORMATION_SCHEMA which would produce a similar result (set of the ANALYZE TABLE ... statements).: SELECT CONCAT( "ANALYSE TABLE " ,TABLE_SCHEMA, "." ,TABLE_NAME) FROM information_schema.tables WHERE TABLE_SCHEMA= "LSST" AND TABLE_NAME LIKE "Source\_%" ; + ------------------------------------------------------+ | CONCAT( "ANALYSE TABLE " ,TABLE_SCHEMA, "." ,TABLE_NAME) | + ------------------------------------------------------+ | ANALYSE TABLE LSST.Source_1 | | ANALYSE TABLE LSST.Source_2 | + ------------------------------------------------------+
Hide
Vaikunth Thukral added a comment -

Thanks for the review Igor Gaponenko, I've added your suggestion to the documentation and moved the ticket to Done.

Show
Vaikunth Thukral added a comment - Thanks for the review Igor Gaponenko , I've added your suggestion to the documentation and moved the ticket to Done.

#### People

Assignee:
Vaikunth Thukral
Reporter:
Vaikunth Thukral
Reviewers:
Igor Gaponenko
Watchers:
Igor Gaponenko, Vaikunth Thukral