  1. Data Management
  2. DM-1900

Worker management service - design

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      We need to replace direct worker-mysql communication and other administrative channels with a dedicated service that controls all worker communication: a lightweight service running alongside the other worker servers, probably HTTP-based. Data loading and service start/stop should be handled by this service.

            Activity

            salnikov Andy Salnikov added a comment -

            I can move this to a Trac page if you say so during review

            ktl Kian-Tat Lim added a comment -
            • I think either a Confluence page or an in-repo RST/MD file would be better to store design docs.
            • Security for Qserv may become an issue. Users should not be able to observe each other's queries, table creations, or even database creations. Unless we insist Qserv be deployed on a restricted network, this may mean encrypting all Qserv traffic, including query results and administrative messages.
            • The thing I dislike about ssh is its heavyweight connections. "Permanent" connections have fault-tolerance problems; lots of dynamic ssh connections are very expensive. Running HTTP over ssh does not seem ideal. I tend to dislike Unix sockets as well. HTTP/TCP for now and HTTPS later?
            • The informational side of things (GET requests) is fine. I'm still leery of having centralized "command and control" for making changes (PUT/POST requests), however. I'd prefer it if the system could be more self-organizing. But maybe that's too complex and difficult at this point.
            • You could use PUT /dbs/<db> etc. instead of POST /dbs etc. since the client knows the resulting URL already.
            • Is PUT /dbs/<db>/tables/<table>/data expected to take all the data for the entire table, or is this incremental (in which case it probably shouldn't be PUT)?
            • PUT /services/<service> should be setting a state, not performing an action, and it should be idempotent. Perhaps PUT /services/<service>/running instead?
            • Logging should be handled separately. I'd like to avoid race conditions as something else is rotating and collecting logs.
            salnikov Andy Salnikov added a comment -

            Thanks K-T, a few answers:

            Confluence page or an in-repo RST/MD file would be better to store design docs.

            I'll transfer this to Confluence (or Trac); I'm not sure that general design docs should live in the repo.

            I tend to dislike Unix sockets as well.

            Why?

            HTTP/TCP for now and HTTPS later?

            OK. Still, I'd like to keep the ssh option. There may be other issues beyond our control (firewalls) that can prevent direct TCP access.

            You could use PUT /dbs/<db> etc. instead of POST /dbs etc. since the client knows the resulting URL already.

            I thought about that and did not like it. I think that usually when you create a database you want it to be empty (no tables). PUT has to be idempotent, which probably maps better to "create if the database does not exist, drop all tables if it exists". I think I prefer an explicit DELETE to drop a database with its tables, and POST, which means "create if it does not exist, fail if it exists" and is not idempotent.
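
A minimal sketch of the POST/DELETE semantics described in this answer, using an in-memory dictionary in place of MySQL. The class name, method names, and the status codes (201, 409, 404, 200) are assumptions made for illustration; the ticket does not specify them.

```python
class DbRegistry:
    """In-memory stand-in for the databases on one worker (hypothetical)."""

    def __init__(self):
        self._dbs = {}  # database name -> set of table names

    def post_db(self, name):
        """POST /dbs: create an empty database, fail if it already exists."""
        if name in self._dbs:
            return 409, "database %r already exists" % name
        self._dbs[name] = set()
        return 201, "/dbs/%s" % name

    def delete_db(self, name):
        """DELETE /dbs/<db>: drop the database together with its tables."""
        if name not in self._dbs:
            return 404, "database %r does not exist" % name
        del self._dbs[name]
        return 200, "dropped"


reg = DbRegistry()
print(reg.post_db("test"))    # first call creates the database
print(reg.post_db("test"))    # repeating it fails, so POST is not idempotent
print(reg.delete_db("test"))  # explicit drop, tables included
```

With this split a client never wipes tables by accident: getting an empty database again requires an explicit DELETE followed by a new POST.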

            Is PUT /dbs/<db>/tables/<table>/data expected to take all the data for the entire table, or is this incremental

            It should be incremental. I assumed that existing data would be overwritten if you load identical data again. But one can certainly argue that attempting to overwrite existing data should be an error. Should we support both (PUT for overwrite and POST for "append-only")? I'm also OK with just disabling overwrite and switching to POST.
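
The two loading modes under discussion could behave roughly as follows. This is only a sketch: it assumes rows are identified by a key, which the ticket does not state, and the class name, method names, and status codes are hypothetical.

```python
class TableData:
    """In-memory stand-in for the data of one table (hypothetical)."""

    def __init__(self):
        self._rows = {}  # row key -> row contents

    def post_rows(self, rows):
        """POST .../data: append-only, reject the batch if any key exists."""
        duplicates = sorted(k for k in rows if k in self._rows)
        if duplicates:
            return 409, duplicates
        self._rows.update(rows)
        return 201, []

    def put_rows(self, rows):
        """PUT .../data: overwrite, so repeating the same load is harmless."""
        self._rows.update(rows)
        return 200, []

    def rows(self):
        """Return a copy of the stored rows."""
        return dict(self._rows)


table = TableData()
print(table.post_rows({1: "a", 2: "b"}))  # initial load
print(table.post_rows({2: "b"}))          # same row again is an error
print(table.put_rows({2: "b"}))           # overwrite mode accepts it
```

Disabling overwrite and keeping only POST amounts to dropping `put_rows` from this sketch.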

            PUT /services/<service> should be setting a state, not performing an action, and it should be idempotent. Perhaps PUT /services/<service>/running instead?

            Well, the same state can be reached in different ways. Suppose I want the running state but the service is already running: should it just say "OK" or restart the service? Sometimes I want an explicit restart as opposed to just making sure that the service is running. For a restart I could probably do it in two steps, going to stopped and then to running again, but this may be logically different from a single restart action. I do not know a good RESTful way to solve this problem; we could probably introduce a restarted state, but this looks very ugly.
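
To make the distinction concrete, here is a small sketch of an idempotent "set state" operation next to an explicit restart. The class, the `starts` counter, and the service name are hypothetical, and how a restart would be exposed over HTTP is deliberately left open, since the discussion does not settle it.

```python
class Service:
    """Stand-in for one managed worker service (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.starts = 0  # how many times the process was actually started

    def put_running(self, want_running):
        """Set the desired state; a no-op if the service is already there."""
        if want_running and not self.running:
            self.starts += 1
        self.running = want_running
        return self.running

    def restart(self):
        """Explicit action: always stop and start, even if already running."""
        self.put_running(False)
        return self.put_running(True)


svc = Service("mysqld")  # service name chosen only for illustration
svc.put_running(True)
svc.put_running(True)    # idempotent: the process is not started again
print(svc.starts)
svc.restart()            # forces a second start
print(svc.starts)
```

Here `restart` is built from the two state changes, which is the two-step approach mentioned above; a dedicated restart action might differ, for example by being atomic from the client's point of view.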

            Logging should be handled separately.

            OK, I won't mention it again.

            salnikov Andy Salnikov added a comment - - edited

            Copied things to Trac with some minor modifications to take K-T's comments into account: https://dev.lsstcorp.org/trac/wiki/db/Qserv/WMGRDesign
            Further improvements should be done there; we can still discuss things on this ticket.

            ktl Kian-Tat Lim added a comment -

            Andy's responses look reasonable. No further comments at this time.


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Kian-Tat Lim
              Watchers:
              Andy Salnikov, Kian-Tat Lim
