Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-8237

Clarify policy on the use of the qserv_ vs. scisql_ functions for spherical geometry queries

    XMLWordPrintable

    Details

    • Type: Story
    • Status: In Progress
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
    • Story Points:
      0.25
    • Sprint:
      DB_F20_09, DB_S21_12, DB_F21_06, DB_S22_12, DB_F22_6, DB_S23_6
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Please clarify whether the operational DAX+Qserv system should be queried using the qserv_* family of functions (UDFs) or the corresponding scisql_* functions.

      Example: qserv_areaspec_circle versus scisql_s2PtInCircle.

      From documentation I can see, the scisql_* functions may be a (substantial) superset of the qserv_* functions.

      I understand that the answer to this question may be different for the 2016-17 PDAC versus the planned final system.

        Attachments

          Issue Links

            Activity

            Hide
            ktl Kian-Tat Lim added a comment -

            I think the answer is that Qserv today will limit queries to relevant chunks if and only if qserv_ functions are used. (Those qserv_ functions get translated into scisql_ functions on the workers.) No scisql_ functions in the query appear to be recognized for chunk restriction; they are only ever executed on the workers. (There is a special case for scisql_angSep which is used to check for overlap region usage but still not for chunk restriction.)

            Show
            ktl Kian-Tat Lim added a comment - I think the answer is that Qserv today will limit queries to relevant chunks if and only if qserv_ functions are used. (Those qserv_ functions get translated into scisql_ functions on the workers.) No scisql_ functions in the query appear to be recognized for chunk restriction; they are only ever executed on the workers. (There is a special case for scisql_angSep which is used to check for overlap region usage but still not for chunk restriction.)
            Show
            ktl Kian-Tat Lim added a comment - This matches with the documentation in https://github.com/lsst/qserv/blob/master/UserManual.md#spatial-constraints-should-be-expressed-through-our-qserv_areaspec_-functions .
            Hide
            ktl Kian-Tat Lim added a comment -

            Fritz Mueller please outsource to someone to confirm my code reading and decide if any clarifications to the documentation are in order.

            Show
            ktl Kian-Tat Lim added a comment - Fritz Mueller please outsource to someone to confirm my code reading and decide if any clarifications to the documentation are in order.
            Hide
            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            Does this mean that SUIT should limit the calls to DAX service using only the qserv_ functions? Kian-Tat Lim and Fritz Mueller

            Show
            xiuqin Xiuqin Wu [X] (Inactive) added a comment - Does this mean that SUIT should limit the calls to DAX service using only the qserv_ functions? Kian-Tat Lim and Fritz Mueller
            Hide
            fritzm Fritz Mueller added a comment -

            If you have an area-restricted query, and you can use qserv_ functions to express your constraint, use those for best performance.

            Show
            fritzm Fritz Mueller added a comment - If you have an area-restricted query, and you can use qserv_ functions to express your constraint, use those for best performance.
            Hide
            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            "No scisql_ functions in the query appear to be recognized for chunk restriction; they are only ever executed on the workers"

            For SUIT request through DAX, what does this mean, not executed or slow execution?

            Show
            xiuqin Xiuqin Wu [X] (Inactive) added a comment - "No scisql_ functions in the query appear to be recognized for chunk restriction; they are only ever executed on the workers" For SUIT request through DAX, what does this mean, not executed or slow execution?
            Hide
            ktl Kian-Tat Lim added a comment -

            Xiuqin Wu [X]: qserv_ functions enable Qserv to execute the query on only the chunks that might produce results. Without a qserv_ function (or an objectId condition in the WHERE clause), Qserv has to execute the query on every chunk, which will usually take longer.

            So, as Fritz said, if you are using a scisql_ function in the WHERE clause that has a qserv_ equivalent, you should use the qserv_ version for best performance. It will be executed only on the relevant chunks, and it will automatically be converted into the scisql_ function for execution on the workers.

            If you are using a scisql_ function that does not have a qserv_ equivalent, you should consider adding a qserv_ restriction function in the WHERE clause. Whether you do so or not, the scisql_ function will still be executed on the workers. But if you don't add a qserv_ function, the query will execute on every chunk and so may be slower.

            Show
            ktl Kian-Tat Lim added a comment - Xiuqin Wu [X] : qserv_ functions enable Qserv to execute the query on only the chunks that might produce results. Without a qserv_ function (or an objectId condition in the WHERE clause), Qserv has to execute the query on every chunk, which will usually take longer. So, as Fritz said, if you are using a scisql_ function in the WHERE clause that has a qserv_ equivalent, you should use the qserv_ version for best performance. It will be executed only on the relevant chunks, and it will automatically be converted into the scisql_ function for execution on the workers. If you are using a scisql_ function that does not have a qserv_ equivalent, you should consider adding a qserv_ restriction function in the WHERE clause. Whether you do so or not, the scisql_ function will still be executed on the workers. But if you don't add a qserv_ function, the query will execute on every chunk and so may be slower.
            Hide
            gpdf Gregory Dubois-Felsmann added a comment - - edited
            Show
            gpdf Gregory Dubois-Felsmann added a comment - - edited Xiuqin Wu [X] : We need to edit the [sample queries] https://confluence.lsstcorp.org/display/DM/PDAC+sample+queries+and+test+cases accordingly.
            Hide
            gpdf Gregory Dubois-Felsmann added a comment -

            This ticket is shown as "in review". Where will the user instructions / policy be documented persistently?

            Presumably (Brian Van Klaveren?) this is an interim solution and eventually this will be done via optimal back-end interpretation of equivalent ADQL constructs?

            Show
            gpdf Gregory Dubois-Felsmann added a comment - This ticket is shown as "in review". Where will the user instructions / policy be documented persistently? Presumably ( Brian Van Klaveren ?) this is an interim solution and eventually this will be done via optimal back-end interpretation of equivalent ADQL constructs?
            Hide
            bvan Brian Van Klaveren added a comment -

            I do think this would be an interim solution. Long term we should support ADQL, though I there may be some similar restrictions there as well.

            I would note that the qserv_areaspec_**(...) are semantically equivalent to scisql_s2PtIn**(psRa, psDecl, ...), where psRa and psDecl are always from the director table (Object) but with the restriction that you may only have one such qserv_areaspec_ directive per query.

            And that's probably a good way of thinking about it - the qserv_areaspec_ functions are really directives telling qserv the geometries to restrict the queries to, based on the master partitioning scheme.

            Show
            bvan Brian Van Klaveren added a comment - I do think this would be an interim solution. Long term we should support ADQL, though I there may be some similar restrictions there as well. I would note that the qserv_areaspec_**(...) are semantically equivalent to scisql_s2PtIn**(psRa, psDecl, ...) , where psRa and psDecl are always from the director table (Object) but with the restriction that you may only have one such qserv_areaspec_ directive per query. And that's probably a good way of thinking about it - the qserv_areaspec_ functions are really directives telling qserv the geometries to restrict the queries to, based on the master partitioning scheme.
            Hide
            fritzm Fritz Mueller added a comment -

            +1 to Brian's comment above. We still need to decide where to document before closing the ticket?

            Show
            fritzm Fritz Mueller added a comment - +1 to Brian's comment above. We still need to decide where to document before closing the ticket?
            Hide
            tjenness Tim Jenness added a comment -

            Has this ticket become irrelevant now that we are using ADQL?

            Show
            tjenness Tim Jenness added a comment - Has this ticket become irrelevant now that we are using ADQL?
            Hide
            fritzm Fritz Mueller added a comment -

            I believe so (yes, now irrelevant).

            TAP notwithstanding, Qserv itself also now automatically infers the most common area restrictions directly from `scisql_s2PtIn...`.

            Gregory Dubois-Felsmann: please set status to "Invalid" if you feel the issue is satisfactorily resolved, or clarify required further action here if not?

            Kian-Tat Lim: Was the status update to "In Review" on 09/Nov/16 intentional?  I do not see any linked ticket.  Is/was there material requiring review?

            Show
            fritzm Fritz Mueller added a comment - I believe so (yes, now irrelevant). TAP notwithstanding, Qserv itself also now automatically infers the most common area restrictions directly from `scisql_s2PtIn...`. Gregory Dubois-Felsmann : please set status to "Invalid" if you feel the issue is satisfactorily resolved, or clarify required further action here if not? Kian-Tat Lim : Was the status update to "In Review" on 09/Nov/16 intentional?  I do not see any linked ticket.  Is/was there material requiring review?
            Hide
            ktl Kian-Tat Lim added a comment -

            As I said at the time, the status was updated to "In Review" because I was asserting that existing documentation already clarified this policy and wanted a check on that.

            Show
            ktl Kian-Tat Lim added a comment - As I said at the time, the status was updated to "In Review" because I was asserting that existing documentation already clarified this policy and wanted a check on that.
            Hide
            fritzm Fritz Mueller added a comment -

            Thanks, Kian-Tat Lim (sorry I missed your original comment above the fold).  I'll attach this to our current F20 doc epic, and close it when we've updated UserManual.md wrt. auto-inferred spatial constraints.

            Show
            fritzm Fritz Mueller added a comment - Thanks, Kian-Tat Lim (sorry I missed your original comment above the fold).  I'll attach this to our current F20 doc epic, and close it when we've updated UserManual.md wrt. auto-inferred spatial constraints.
            Hide
            tjenness Tim Jenness added a comment -

            It seems like this ticket is still relevant in that we are allowing scisql calls directly into the ADQL.

            We need to make a decision about this, maybe on a new ticket, since we have to be intentional about extensions to ADQL given:

            • We are promising to support those extensions for a long time since people want their queries to work.
            • We need to minimize these extensions (do not use scisql for sky area when ADQL works fine)
            • We need to whitelist specific extensions.
            • We should consider renaming them so they are not locking us into scisql if at some point we change from mariadb.
            Show
            tjenness Tim Jenness added a comment - It seems like this ticket is still relevant in that we are allowing scisql calls directly into the ADQL. We need to make a decision about this, maybe on a new ticket, since we have to be intentional about extensions to ADQL given: We are promising to support those extensions for a long time since people want their queries to work. We need to minimize these extensions (do not use scisql for sky area when ADQL works fine) We need to whitelist specific extensions. We should consider renaming them so they are not locking us into scisql if at some point we change from mariadb.

              People

              Assignee:
              fritzm Fritz Mueller
              Reporter:
              gpdf Gregory Dubois-Felsmann
              Reviewers:
              Fritz Mueller
              Watchers:
              Brian Van Klaveren, Christine Banek, Fritz Mueller, Gregory Dubois-Felsmann, Kian-Tat Lim, Serge Monkewitz, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:

                  Jenkins

                  No builds found.