  Data Management / DM-5514

Validate shared scan implementation on IN2P3 cluster


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels: None


        Activity

        fritzm Fritz Mueller created issue -
        fritzm Fritz Mueller made changes -
        Epic Link: DM-4809 [ 22176 ]
        fritzm Fritz Mueller made changes -
        Sprint: DB_X16_03 [ 204 ] → DB_S16_04 [ 200 ]
        fritzm Fritz Mueller made changes -
        Rank: Ranked lower
        jgates John Gates made changes -
        Status: To Do [ 10001 ] → In Progress [ 3 ]
        jgates John Gates added a comment -

        SELECT o.deepSourceId, s.objectId, s.id, o.ra, o.decl FROM Object o, Source s WHERE o.deepSourceId=s.objectId AND s.flux_sinc BETWEEN 0.3 AND 0.304;

        Single solo query: 6 hours 50 min 58.31 sec

        Five copies started at nearly the same time:
        5 hours 58 min 16.76 sec
        6 hours 15 min 11.38 sec
        6 hours 15 min 10.33 sec
        6 hours 15 min 9.35 sec
        6 hours 15 min 8.27 sec

        SELECT count FROM Object WHERE u_apFluxSigma between 0 and 1.8e-30;
        solo: 1 min 53.65 sec
        Five started within a couple of seconds of each other:
        2 min 7.58 sec
        2 min 13.16 sec
        2 min 13.81 sec
        2 min 12.32 sec
        2 min 17.73 sec

        jgates John Gates added a comment -

        Using the noop disk scheduler instead of deadline, with the scan scheduler limited to working on 3 chunks at a time.

        SELECT o.deepSourceId, s.objectId, s.id, o.ra, o.decl FROM Object o, Source s WHERE o.deepSourceId=s.objectId AND s.flux_sinc BETWEEN 0.3 AND 0.304;

        solo - 5 hours 40 min 54.10 sec

        Five copies started at nearly the same time.
        6 hours 26 min 51.08 sec
        6 hours 26 min 52.17 sec
        6 hours 26 min 53.31 sec
        6 hours 26 min 54.44 sec
        6 hours 21 min 13.36 sec
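
        For reference, a minimal sketch of what capping the scan scheduler at a fixed number of in-flight chunks (3 in the test above) could look like. The class and member names here are hypothetical illustrations, not the actual Qserv scheduler code:

        #include <condition_variable>
        #include <mutex>

        // Illustrative cap on how many chunks a scan scheduler works on at once.
        class ChunkLimiter {
        public:
            explicit ChunkLimiter(int maxChunks) : _maxChunks(maxChunks) {}

            // Block until a chunk slot is free, then claim it.
            void acquire() {
                std::unique_lock<std::mutex> lock(_mtx);
                _cv.wait(lock, [this] { return _inFlight < _maxChunks; });
                ++_inFlight;
            }

            // Release the slot once the chunk's tasks are finished.
            void release() {
                std::lock_guard<std::mutex> lock(_mtx);
                --_inFlight;
                _cv.notify_one();
            }

        private:
            int const _maxChunks;
            int _inFlight = 0;
            std::mutex _mtx;
            std::condition_variable _cv;
        };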

        jgates John Gates added a comment -

        The above tests had issues that prevented memman from working properly.

        jgates John Gates added a comment -

        The mmap call is failing due to system configuration settings being incorrect on the cluster. I made a workaround in the code that marks the memory for a table as reserved if the mmap call fails and the total locked and reserved memory is less than bytesMax.

        SELECT o.deepSourceId, s.objectId, s.id, o.ra, o.decl FROM Object o, Source s WHERE o.deepSourceId=s.objectId AND s.flux_sinc BETWEEN 0.3 AND 0.304;
        With bytesMax = 6 gigabytes, the runtime was 8 hours 0.18 sec.
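
        A rough sketch of the fallback logic described above, using paraphrased names (MemManStats, lockTableFile) rather than the actual memman interfaces:

        #include <cstdint>
        #include <sys/mman.h>

        enum class TableState { LOCKED, RESERVED, REJECTED };

        struct MemManStats {
            uint64_t bytesMax;          // configured memory budget (bytesMax above)
            uint64_t bytesLocked = 0;   // bytes successfully mapped and locked
            uint64_t bytesReserved = 0; // bytes only accounted for, not locked
        };

        // Try to map and lock a table file; if that fails, count the table as
        // "reserved" as long as locked + reserved stays under bytesMax.
        TableState lockTableFile(MemManStats& stats, int fd, uint64_t fileBytes) {
            void* p = mmap(nullptr, fileBytes, PROT_READ, MAP_SHARED, fd, 0);
            if (p != MAP_FAILED && mlock(p, fileBytes) == 0) {
                stats.bytesLocked += fileBytes;
                return TableState::LOCKED;
            }
            if (p != MAP_FAILED) {
                munmap(p, fileBytes);   // mapped but could not lock; undo the mapping
            }
            // mmap or mlock failed (e.g. cluster limits misconfigured):
            // fall back to bookkeeping only, provided the budget is not exceeded.
            if (stats.bytesLocked + stats.bytesReserved + fileBytes <= stats.bytesMax) {
                stats.bytesReserved += fileBytes;
                return TableState::RESERVED;
            }
            return TableState::REJECTED;
        }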

        jgates John Gates added a comment -

        After testing and doing some cleanup, some things work and other things not so much. A few bugs needed to be fixed.

        The basics of scan scheduling appear to be sound. Several identical queries running at the same time take only marginally longer than a single query. However, the scheduler can impose a significant delay before new queries start, and fixing that requires a significant change to the scheduler. The cluster at IN2P3 had configuration issues that kept mmap from working. Even if those configuration issues had been solved, there is insufficient memory (or our chunks are too large) to read in an adequate number of large-table chunks to test memman efficiency. memman does appear to help: limiting the number of chunks open at one time allowed the query to finish in about the same time as having several more chunks open at once, but with much lower system load. Once mmap can be used, it would be good to compare several duplicate queries running with mmap and without it.

        Worker scheduling to do:

        • Switch the scheduler from using a pair of heaps to using a list of buckets so that queries can start sooner (a rough sketch follows this list).
        • Make a better fix for memman to work on systems with low memory, or where configuring the system for mmap is difficult.
        • Add a low-priority "everything" queue for slow queries and queries that don't really fit an existing scheduler.
        • Add code to detect slow queries and move them to the everything queue.
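
        A rough sketch of the bucket-list idea from the first item above, using illustrative names rather than the actual Qserv scheduler classes: tasks are grouped into per-chunk buckets kept in scan order, so a newly arrived query can drop its tasks into buckets the scan has not yet reached and start on the current pass instead of waiting behind a heap ordering.

        #include <list>
        #include <memory>
        #include <utility>
        #include <vector>

        struct Task {
            int chunkId;
            int queryId;
        };

        struct ChunkBucket {
            int chunkId;
            std::vector<std::shared_ptr<Task>> tasks;
        };

        class BucketScanScheduler {
        public:
            // Queue a task, creating or reusing the bucket for its chunk.
            void queueTask(std::shared_ptr<Task> const& task) {
                for (auto it = _buckets.begin(); it != _buckets.end(); ++it) {
                    if (it->chunkId == task->chunkId) {
                        it->tasks.push_back(task);   // join an existing bucket
                        return;
                    }
                    if (it->chunkId > task->chunkId) {
                        _buckets.insert(it, ChunkBucket{task->chunkId, {task}});
                        return;
                    }
                }
                _buckets.push_back(ChunkBucket{task->chunkId, {task}});
            }

            // Hand back every task for the next chunk in scan order.
            std::vector<std::shared_ptr<Task>> nextChunkTasks() {
                if (_buckets.empty()) return {};
                auto tasks = std::move(_buckets.front().tasks);
                _buckets.pop_front();
                return tasks;
            }

        private:
            std::list<ChunkBucket> _buckets;   // kept ordered by chunkId
        };
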
        jgates John Gates made changes -
        Reviewers: Andy Salnikov [ salnikov ]
        Status: In Progress [ 3 ] → In Review [ 10004 ]
        salnikov Andy Salnikov added a comment -

        John, this looks OK; a few minor comments on the PR. I can't say I understand this stuff, but it looks like the other Andy is willing to cover the logic part.

        salnikov Andy Salnikov made changes -
        Status: In Review [ 10004 ] → Reviewed [ 10101 ]
        jgates John Gates made changes -
        Resolution: Done [ 10000 ]
        Status: Reviewed [ 10101 ] → Done [ 10002 ]
        jgates John Gates added a comment -

        For queries joining Object and Source, reserving 10 gigabytes of RAM for memman resulted in the fastest time for a single query (5 hours 37 min), only about 3 minutes faster than reserving 12 gigs, and 10 minutes faster than reserving 16 gigs. The workers only have 16 gigabytes of RAM, so 12 and 16 gigs would likely cause issues if mmap were being used. With less than 10 gigs reserved, only a couple of chunks could fit in memory at a time and performance was much slower. It appears that the system works better when the workers have lots of memory and the chunks are small enough that several can fit in memory at the same time.


          People

          Assignee:
          jgates John Gates
          Reporter:
          fritzm Fritz Mueller
          Reviewers:
          Andy Salnikov
          Watchers:
          Andy Salnikov, Fritz Mueller, John Gates
          Votes:
          0
