Data Management / DM-1615

Design and implement CSS structure for distributed Qserv setup

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      To manage distributed databases and tables, we need information in CSS about all workers and tables. This information will be created by the data loader and updated by the replicator, neither of which exists yet. For this issue we need to provide a Python API that can fill in the same information in CSS, so that we can build and test the other pieces needed for this epic.

            Activity

            ktl Kian-Tat Lim added a comment -

            Looks pretty good. Some minor points:

            • If the node exists in the CSS but has no data, should nothing be returned, or should an exception be raised?
            • Should the exception message mention the key (which is not actually correct if the data is in JSON format) or just the node name?
            • I take it the existence check on creation is just for safety and that colliding creates are also caught in the create method. Otherwise, there could be a race condition.
            • In the Trac document (which is migrating to Confluence?), in the possible future where the node name is just the host name plus port number, are the host name and port number values even necessary? What if they disagreed?
            • There's no documentation of the .json option in the Trac page.
            • It should be noted that .json overrides a plain node name. Should the case of both keys existing be caught?
            • The error message "Node does not exists" should be "Node does not exist".
            salnikov Andy Salnikov added a comment -

            Thanks, K-T. Some responses:

            If the node exists in the CSS but has no data, should nothing be returned, or should an exception be raised?

            It will return an empty dict for now. I'm not sure there is any use case for that; if anyone wants to make empty nodes, then we should handle it "normally".

            Should the exception message mention the key (which is not actually correct if the data is in JSON format) or just the node name?

            It's complicated; I'll need to check the logic again for when the .json key should be checked versus the non-JSON key. I think the message should mention the actual key because of this ambiguity, or it can mention both the node name and the key.
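The two answers above (an empty dict for a node without data, and an error message that names both the node and the key) can be illustrated with a minimal sketch. It is not the Qserv CSS API: the `/NODES/<name>` key layout, the `get_node_params` function, and the `NodeDoesNotExist` exception are hypothetical names chosen for illustration, and a plain Python dict stands in for the key-value store.

```python
class NodeDoesNotExist(LookupError):
    """Raised when a node has no entry in the store (hypothetical name)."""

    def __init__(self, node_name, key):
        # Mention both the node name and the actual key that was looked up.
        super().__init__(f"Node does not exist: node={node_name!r}, key={key!r}")
        self.node_name = node_name
        self.key = key


def get_node_params(store, node_name):
    """Return the parameters stored under a node as a dict.

    `store` is a plain dict standing in for the CSS key-value store;
    the "/NODES/<name>" layout is an assumption made for this sketch.
    """
    node_key = f"/NODES/{node_name}"
    if node_key not in store:
        raise NodeDoesNotExist(node_name, node_key)
    prefix = node_key + "/"
    # A node that exists but has no child keys yields an empty dict.
    return {k[len(prefix):]: v for k, v in store.items() if k.startswith(prefix)}


# Example data: worker-1 has parameters, worker-2 exists but is empty.
store = {
    "/NODES/worker-1": "",
    "/NODES/worker-1/host": "worker1.example.org",
    "/NODES/worker-1/port": "5012",
    "/NODES/worker-2": "",
}
```

With this data, looking up `worker-2` gives an empty dict, while looking up a name that was never created raises the exception.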

            I take it the existence check on creation is just for safety and that colliding creates are also caught in the create method. Otherwise, there could be a race condition.

            Mostly yes, but I don't think races are possible here. Populating CSS with nodes is, in my view, not an activity that can be done concurrently by multiple clients.
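The concern about check-then-create can be sketched as follows. This is an illustration only, assuming an in-memory store whose `create` call is atomic; `InMemoryKvStore`, `KeyExistsError`, and `add_node` are hypothetical names and do not come from the Qserv code.

```python
import threading


class KeyExistsError(Exception):
    """Raised when a key is created twice (hypothetical name)."""


class InMemoryKvStore:
    """Dict-backed stand-in for a key-value store with an atomic create."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def exists(self, key):
        with self._lock:
            return key in self._data

    def create(self, key, value=""):
        # The test and the insert happen under one lock, so two colliding
        # creates cannot both succeed.
        with self._lock:
            if key in self._data:
                raise KeyExistsError(key)
            self._data[key] = value


def add_node(store, node_name):
    key = f"/NODES/{node_name}"
    # Early check: only there to fail fast with a clear error.
    if store.exists(key):
        raise KeyExistsError(f"Node already exists: {node_name}")
    # Another client could create the key between the check and this call;
    # create() still rejects the collision, so the check is not load-bearing.
    store.create(key)
```

The design point is that correctness rests on `create` rejecting duplicates, so the earlier `exists` check stays a convenience even if several clients ever populate the store at once.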

            In the Trac document (which is migrating to Confluence?), in the possible future where the node name is just the host name plus port number, are the host name and port number values even necessary? What if they disagreed?

            No comment on Confluence. I think we should keep the node name as just a unique ID and not assign any meaning to it; that should give us extra flexibility in case we decide to do something differently.

            There's no documentation of the .json option in the Trac page.

            JSON is an implementation detail for now, not exposed anywhere yet (but we should document it). I believe Daniel unpacks JSON when he clones CSS data into memory for qserv use.

            It should be noted that .json overrides a plain node name. Should the case of both keys existing be caught?

            There are use cases when both keys are there; in ZK at least there is a distinction between a key's value and its children. In the case of packed JSON data, only the children are stored as JSON, and the key value is still stored as a regular key.
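A minimal sketch of the packing described above, under stated assumptions: a plain dict stands in for the key-value store, the children of `<key>` are packed into a single `<key>.json` entry, and the packed entry takes precedence when it is present. The `read_children` helper, the key names, and the `"ACTIVE"` value are placeholders for illustration, not the actual CSS layout.

```python
import json


def read_children(store, key):
    """Return the children of `key` as a dict, packed or unpacked."""
    packed_key = key + ".json"
    if packed_key in store:
        # Packed form: all children live in one JSON document.
        return json.loads(store[packed_key])
    # Unpacked form: each child is its own key under "<key>/".
    prefix = key + "/"
    return {k[len(prefix):]: v for k, v in store.items() if k.startswith(prefix)}


# Both keys exist in the packed case: the plain key keeps its own value,
# and only the children are stored as JSON.
packed_store = {
    "/NODES/worker-1": "ACTIVE",  # placeholder key value
    "/NODES/worker-1.json": json.dumps(
        {"host": "worker1.example.org", "port": "5012"}
    ),
}

unpacked_store = {
    "/NODES/worker-1": "ACTIVE",
    "/NODES/worker-1/host": "worker1.example.org",
    "/NODES/worker-1/port": "5012",
}
```

Both layouts yield the same children from `read_children`, and in both the value of the plain key is read directly from `/NODES/worker-1`.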

            The error message "Node does not exists" should be "Node does not exist".

            OK.

            ktl Kian-Tat Lim added a comment -

            It would be nice if nodes could self-register, so there could definitely be multiple simultaneous clients, but in that case there shouldn't be any collisions, so the current code is OK.

            JSON is not exposed in the API, but it is exposed in the available keys (which is what the documentation purports to list), so it seems like a bit more than an implementation detail.

            salnikov Andy Salnikov added a comment -

            Fixed some checks and improved messages, merged and pushed.

            Regarding JSON - I agree that it needs better documentation, I'll try to work on it soon.

            Regarding self-registering, we need to talk about it; for me, CSS is the primary source of info describing the intent. This means that workers will be started based on the info from CSS, not vice versa. But we could reverse that, of course; we just need to agree and design things accordingly.

            salnikov Andy Salnikov added a comment -

            There's no documentation of the .json option in the Trac page.

            I have added a section describing key packing: https://dev.lsstcorp.org/trac/wiki/db/Qserv/CSS#Noteonkeysandpacking


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Kian-Tat Lim
              Watchers:
              Andy Salnikov, Jacek Becla, Kian-Tat Lim
