Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-530

Table column names in new parser

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
    • Story Points:
      5
    • Sprint:
      DB_W16_12
    • Team:
      Data Access and Database

      Description

      Running tests (qserv-testdata.sh) on pre-loaded data I have observed that many test fail for the only reason that the column names in the dumped query results are different between mysql and qserv. Here is an example of query reqult returned from mysql:

      mysql> SELECT sce.filterName, sce.field, sce.camcol, sce.run FROM Science_Ccd_Exposure AS sce WHERE sce.filterName = 'g' AND sce.field = 670 AND sce.camcol = 2 AND sce.run = 7202;
      +------------+-------+--------+------+
      | filterName | field | camcol | run  |
      +------------+-------+--------+------+
      | g          |   670 |      2 | 7202 |
      +------------+-------+--------+------+

      and this is the same query processed by qserv:

      mysql> SELECT sce.filterName, sce.field, sce.camcol, sce.run FROM Science_Ccd_Exposure AS sce WHERE sce.filterName = 'g' AND sce.field = 670 AND sce.camcol = 2 AND sce.run = 7202;
      +----------+----------+----------+----------+
      | QS1_PASS | QS2_PASS | QS3_PASS | QS4_PASS |
      +----------+----------+----------+----------+
      | g        |      670 |        2 |     7202 |
      +----------+----------+----------+----------+

      We discussed this already with Daniel yesterday and at qserv meeting today, here I just want to collect what we know so far so that we can return to this again later.

      As Daniel explained to me this is the result of the new parser assigning aliases to the columns which do not define aliases for themselves. This helps with tracking query proceeding through the processing pipeline. Daniel's observation is that different database engines may assign different names to result columns (or some may not even assign any names), there is no standard in that respect so there is no point in trying to follow what one particular implementation does. Additionally there are issues with conflicting column names and names which are complex expressions.

      Difference in column names breaks our tests which dump complete results including table header. The tests could be fixed easily, we could just ignore table headers when dumping the data. More interesting issue is that there may be use cases for better compatibility between mysql and qserv including result column naming. In particular standard Python mysql interface allows one to use column names to retrieve values from queiry result. If qserv assigns arbitrary aliases to the columns it may confuse this kind of clients.

      This issue depends very much on what kind of API qserv is going to provide to clients. If mysql (wire-level) protocol is going to be the main API (which would allow all kinds of mysql clients to talk to qserv directly) then we should probably think more about compatibility with mysql. OTOH if we decide to provide our own API then this may not be an issue at all (but we still need to fix current test setup which is based on mysql).

      We probably should discuss API question at our dev meeting.

        Attachments

          Issue Links

            Activity

            Hide
            ktl Kian-Tat Lim added a comment -

            I think that SQL mandates that every column or alias (with different values) in the SELECT clause be textually distinct. I don't see any reason not to use the text that the user supplied in the column headers. I agree that we should strive to be MySQL-compatible.

            Show
            ktl Kian-Tat Lim added a comment - I think that SQL mandates that every column or alias (with different values) in the SELECT clause be textually distinct. I don't see any reason not to use the text that the user supplied in the column headers. I agree that we should strive to be MySQL-compatible.
            Hide
            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            I agree with K-T. There is no benefit not using the text that the user supplied in the column headers.

            Show
            xiuqin Xiuqin Wu [X] (Inactive) added a comment - I agree with K-T. There is no benefit not using the text that the user supplied in the column headers.
            Hide
            npease Nate Pease [X] (Inactive) added a comment -

            this is implemented: column names are now returned as they were named in the sql statement, according to mysql rules (e.g. `table.column` is truncated to `column`)

            Show
            npease Nate Pease [X] (Inactive) added a comment - this is implemented: column names are now returned as they were named in the sql statement, according to mysql rules (e.g. `table.column` is truncated to `column`)

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              salnikov Andy Salnikov
              Watchers:
              Andy Salnikov, Jacek Becla, Kian-Tat Lim, Nate Pease [X] (Inactive), Xiuqin Wu [X] (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.