Data Management / DM-28154

Test APDB Cassandra prototype with 6 server nodes


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      8
    • Sprint:
      DB_S21_12
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      With a baseline established by DM-28136 it's time to move on and see how things scale with an extended cluster size. For this round of tests I want to use 6 nodes on the server side, identical to the node type already defined. I don't want a very large data set yet; with the existing SSD configuration we should be OK to run ~120k visits.

        Attachments

        1. apdb-gcp2-latency-diasource.png
          232 kB
          Andy Salnikov
        2. apdb-gcp2-nb-cpu-time-select-fit.png
          83 kB
          Andy Salnikov
        3. apdb-gcp2-nb-time-select-fit.png
          87 kB
          Andy Salnikov
        4. apdb-gcp2-nb-time-store-fit.png
          62 kB
          Andy Salnikov
        5. apdb-gcp3-latency-diaobjectlast.png
          113 kB
          Andy Salnikov
        6. apdb-gcp3-nb-cpu-time-select-fit.png
          75 kB
          Andy Salnikov
        7. apdb-gcp3-nb-time-select-fit.png
          90 kB
          Andy Salnikov
        8. apdb-gcp3-nb-time-store-fit.png
          69 kB
          Andy Salnikov
        9. apdb-gcp3-system-loadaverage.png
          136 kB
          Andy Salnikov

          Issue Links

            Activity

            Andy Salnikov added a comment -

            OK, here it goes:

            $ nodetool status
            Datacenter: datacenter1
            =======================
            Status=Up/Down
            |/ State=Normal/Leaving/Joining/Moving
            --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
            UN  10.128.0.116  553.12 KiB  256     ?                 9dea29d2-81e1-47f7-b090-790273382b5f  rack1
            UN  10.128.0.114  437.55 KiB  256     ?                 dc3c9e53-b336-4cde-9a68-0c1ecc92aa18  rack1
            UN  10.128.0.115  80.01 MiB   256     ?                 36ddf99b-742a-48d4-a1b7-e82c8140d189  rack1
            UN  10.128.0.110  304.19 KiB  256     ?                 ba56d50c-eb46-407b-9a9f-45452a11550b  rack1
            UN  10.128.0.34   382.9 KiB   256     ?                 5b298a67-c59b-4b3c-8e9a-00263e14d0a6  rack1
            UN  10.128.0.30   232.81 KiB  256     ?                 e0be23ed-2516-45c5-88c5-bbdf9ffe9481  rack1
            

            Started a short run with 1k visits; no problems so far after 50 visits.

            Andy Salnikov added a comment -

            Starting the post-mortem for this test. It was a partial success: there were hardware problems with one server machine (a bad disk) which caused some issues. Still, some of the results are usable and some conclusions can be drawn even so.

            I planned to run this test with 100k visits, but at some point the client log started showing timeouts and exceptions and things started to run very slowly, so I had to stop the test to try to understand what was going on. Apparently there was a problem with one of the four SSDs on the apdb-server-6 machine, which caused all sorts of problems for Cassandra.

            The first SSD errors appeared in kern.log at 2020-12-25 00:15:57Z:

            Dec 25 00:15:57 apdb-server-6 kernel: [172184.641277] blk_update_request: critical medium error, dev nvme0n1, sector 400547320 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
            Dec 25 00:15:57 apdb-server-6 kernel: [172184.702759] blk_update_request: critical medium error, dev nvme0n1, sector 400592392 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
            Dec 25 00:16:18 apdb-server-6 kernel: [172206.246250] blk_update_request: critical medium error, dev nvme0n1, sector 400575952 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
            Dec 25 00:16:19 apdb-server-6 kernel: [172206.287305] blk_update_request: critical medium error, dev nvme0n1, sector 400571912 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
            

            I think the first errors appeared in the client log at 2020-12-25 00:16:31Z, during visit 75923:

            2020-12-25 00:16:31,995 [ERROR] ap_proto: Exception in visit processing: errors={'10.128.0.30:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.128.0.30:9042
            Traceback (most recent call last):
              File "/net/10.65.206.122/apdb_vol1/apdb/apdb-gcloud/l1dbproto/python/lsst/l1dbproto/ap_proto.py", line 429, in run_mpi_tile
                self.visit(db, visit_id, dt, region, sources, indices, tile)
              File "/net/10.65.206.122/apdb_vol1/apdb/apdb-gcloud/l1dbproto/python/lsst/l1dbproto/ap_proto.py", line 521, in visit
                db.storeDiaObjects(objects, dt, pos_func.pos_func_diaobj)
              File "/net/10.65.206.122/apdb_vol1/apdb/apdb-gcloud/dax_apdb/python/lsst/dax/apdb/apdbCassandra.py", line 610, in storeDiaObjects
                self._storeObjectsAfw(objs, "DiaObjectLast", dt, pos_func, extra_columns=extra_columns)
              File "/net/10.65.206.122/apdb_vol1/apdb/apdb-gcloud/dax_apdb/python/lsst/dax/apdb/apdbCassandra.py", line 854, in _storeObjectsAfw
                self._session.execute(queries)
              File "cassandra/cluster.py", line 2618, in cassandra.cluster.Session.execute
              File "cassandra/cluster.py", line 4877, in cassandra.cluster.ResponseFuture.result
            cassandra.OperationTimedOut: errors={'10.128.0.30:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.128.0.30:9042
            
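            The timeout above is the driver's client-side request timeout (by default around 10 seconds via Session.default_timeout in the DataStax python driver). A minimal sketch of raising it per call, assuming an existing session and a statement similar to what _storeObjectsAfw builds (the statement itself is hypothetical here):

            from cassandra import OperationTimedOut

            try:
                # raise the per-request client timeout for this call only
                session.execute(statement, timeout=120.0)
            except OperationTimedOut:
                # a bad disk on one replica can stall QUORUM writes; a longer
                # timeout only masks the problem, so log and re-raise
                raise

            Of course a longer timeout does not fix the underlying disk problem; it only changes where the failure shows up.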

            Cluster state after I stopped the test:

            # nodetool status
            Datacenter: datacenter1
            =======================
            Status=Up/Down
            |/ State=Normal/Leaving/Joining/Moving
            --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
            UN  10.128.0.116  784.44 GiB  256     ?                 9dea29d2-81e1-47f7-b090-790273382b5f  rack1
            UN  10.128.0.114  907.19 GiB  256     ?                 dc3c9e53-b336-4cde-9a68-0c1ecc92aa18  rack1
            UN  10.128.0.115  914.94 GiB  256     ?                 36ddf99b-742a-48d4-a1b7-e82c8140d189  rack1
            UN  10.128.0.110  870.05 GiB  256     ?                 ba56d50c-eb46-407b-9a9f-45452a11550b  rack1
            UN  10.128.0.34   817.03 GiB  256     ?                 5b298a67-c59b-4b3c-8e9a-00263e14d0a6  rack1
            UN  10.128.0.30   894.46 GiB  256     ?                 e0be23ed-2516-45c5-88c5-bbdf9ffe9481  rack1
            

            I think the first 75k visits should be OK, and I will draw some results from that sample.

            Andy Salnikov added a comment -

            Summary of the results for this round of tests.

            Setup:

            • 6 server nodes, 32 cores, 64GB RAM each
            • 4x375GiB local SSD storage on each node
            • replication factor 3, QUORUM consistency for reads and writes (see the sketch after this list)
            • 6 client machines, 32 cores each
            • 75k visits generated, storage errors after that
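
            As a side note on the replication/consistency settings above, they correspond to something like the following; the keyspace name "apdb" is an assumption here, and the actual schema is created by dax_apdb:

            from cassandra.cluster import Cluster
            from cassandra.query import SimpleStatement
            from cassandra import ConsistencyLevel

            cluster = Cluster(["10.128.0.116"])   # any server node as a contact point
            session = cluster.connect()

            # replication factor 3 for a single-datacenter cluster
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS apdb "
                "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
            )

            # QUORUM consistency set per statement, for both reads and writes
            query = SimpleStatement(
                'SELECT * FROM apdb."DiaObjectLast" LIMIT 10',
                consistency_level=ConsistencyLevel.QUORUM,
            )
            rows = session.execute(query)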

            Select time as a function of visit number (attachment apdb-gcp2-nb-time-select-fit.png):

            Store time (attachment apdb-gcp2-nb-time-store-fit.png):

            Comparing read performance with the 3-node cluster; numbers are per-CCD real time per query.

                                                   3 nodes  6 nodes
            obj_select_real at 50k visits, sec     0.52     0.32
            src_select_real at 50k visits, sec     2.8      2.1
            fsrc_select_real at 50k visits, sec    1.4      0.7

            There is some improvement in the numbers, but it is not the expected 2x improvement. I think that is because there is significant overhead on the client side for converting query results into their final format. Here is the CPU time per corresponding query:

                                                  3 nodes  6 nodes
            obj_select_cpu at 50k visits, sec     0.17     0.16
            src_select_cpu at 50k visits, sec     0.94     1.13
            fsrc_select_cpu at 50k visits, sec    0.43     0.49

            A better measure of performance may be the difference between real and CPU times (e.g. for obj_select on 3 nodes: 0.52 − 0.17 = 0.35 sec):

                                              3 nodes  6 nodes
            obj_select at 50k visits, sec     0.35     0.16
            src_select at 50k visits, sec     1.9      0.97
            fsrc_select at 50k visits, sec    0.97     0.21

            Here the improvement is more dramatic, a factor of 2 or better (though the numbers are not very precise, of course).

            I think I need to optimize the client side to make the overhead much smaller. I am currently converting results to afw.Table as I did in the older SQL code, but I think AP really wants to use pandas. I did not try to optimize the afw conversion at all; there may be ways to make it faster, and for pandas I have found one recipe for optimal Cassandra performance: https://groups.google.com/a/lists.datastax.com/g/python-driver-user/c/1v-KHtyA0Zs
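
            The recipe from that thread is roughly the following (the query here is hypothetical, and it relies on the driver's private _current_rows attribute):

            import pandas as pd

            def pandas_factory(colnames, rows):
                # build the DataFrame directly from the raw rows returned by the driver
                return pd.DataFrame(rows, columns=colnames)

            session.row_factory = pandas_factory
            session.default_fetch_size = None   # return everything in a single page

            result = session.execute('SELECT * FROM apdb."DiaSource" LIMIT 1000')
            df = result._current_rows           # already a DataFrame thanks to the factory

            This avoids materializing the result as a list of named tuples before building the DataFrame, which is presumably where a lot of the client CPU time goes.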

            One more interesting observation: looking at JMX metrics from Cassandra I noticed that write latency behaves differently on different nodes, e.g. DiaSource latency (attachment apdb-gcp2-latency-diasource.png):

            On this plot one node (apdb-server-1) has lower latency than all other nodes, and one node (apdb-server-4) has significantly higher latency. This looks odd, as my impression is that all nodes should look more or less similar. Not sure what it means; just a note to possibly investigate later.

            Andy Salnikov added a comment -

            I implemented pandas support in the Cassandra APDB implementation and added a more efficient transform of the query result to pandas. With that I ran the same setup with 100k visits; results will be added below. Here is the cluster status right before I stopped the server nodes:

            andy_salnikov_gmail_com@apdb-server-1:~$ nodetool status
            Datacenter: datacenter1
            =======================
            Status=Up/Down
            |/ State=Normal/Leaving/Joining/Moving
            --  Address       Load         Tokens  Owns (effective)  Host ID                               Rack
            UN  10.128.0.116  1008.36 GiB  256     ?                 222e6f35-e667-4cba-a507-a45c7ed804cc  rack1
            UN  10.128.0.114  1.03 TiB     256     ?                 5d4b4b7b-6d75-4cb6-a419-ec7c4a518b29  rack1
            UN  10.128.0.115  987.94 GiB   256     ?                 861dff58-1ce4-415e-94ac-3cf770180d90  rack1
            UN  10.128.0.110  968.93 GiB   256     ?                 a35b0ec7-9940-4dc3-bced-0a842abb23f8  rack1
            UN  10.128.0.34   947.9 GiB    256     ?                 1dff0ff1-530c-4d3d-aee8-54cfcf9c7c0e  rack1
            UN  10.128.0.30   1017.47 GiB  256     ?                 70e974f9-d135-42af-8f11-85066c705198  rack1
            

            And data disk usage:

            (lsst-scipipe-4f18ecb) [andy_salnikov_gmail_com@apdb-client-1 apdb-gcloud]$ shmux -c "df -h /data/apdb*" -- apdb-server-{1..6}
            apdb-server-1: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-1: /dev/nvme0n1p1  375G  263G  113G  71% /data/apdb1
            apdb-server-1: /dev/nvme0n2p1  375G  264G  112G  71% /data/apdb2
            apdb-server-1: /dev/nvme0n3p1  375G  264G  112G  71% /data/apdb3
            apdb-server-1: /dev/nvme0n4p1  375G  262G  113G  70% /data/apdb4
            apdb-server-5: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-5: /dev/nvme0n1p1  375G  270G  106G  72% /data/apdb1
            apdb-server-5: /dev/nvme0n2p1  375G  268G  108G  72% /data/apdb2
            apdb-server-5: /dev/nvme0n3p1  375G  268G  107G  72% /data/apdb3
            apdb-server-5: /dev/nvme0n4p1  375G  269G  107G  72% /data/apdb4
            apdb-server-6: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-6: /dev/nvme0n1p1  375G  281G   95G  75% /data/apdb1
            apdb-server-6: /dev/nvme0n2p1  375G  273G  103G  73% /data/apdb2
            apdb-server-6: /dev/nvme0n3p1  375G  273G  103G  73% /data/apdb3
            apdb-server-6: /dev/nvme0n4p1  375G  274G  102G  73% /data/apdb4
            apdb-server-2: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-2: /dev/nvme0n1p1  375G  277G   99G  74% /data/apdb1
            apdb-server-2: /dev/nvme0n2p1  375G  274G  102G  73% /data/apdb2
            apdb-server-2: /dev/nvme0n3p1  375G  276G  100G  74% /data/apdb3
            apdb-server-2: /dev/nvme0n4p1  375G  279G   97G  75% /data/apdb4
            apdb-server-4: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-4: /dev/nvme0n1p1  375G  288G   88G  77% /data/apdb1
            apdb-server-4: /dev/nvme0n2p1  375G  286G   89G  77% /data/apdb2
            apdb-server-4: /dev/nvme0n3p1  375G  287G   89G  77% /data/apdb3
            apdb-server-4: /dev/nvme0n4p1  375G  287G   89G  77% /data/apdb4
            apdb-server-3: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-3: /dev/nvme0n1p1  375G  265G  111G  71% /data/apdb1
            apdb-server-3: /dev/nvme0n2p1  375G  258G  118G  69% /data/apdb2
            apdb-server-3: /dev/nvme0n3p1  375G  258G  118G  69% /data/apdb3
            apdb-server-3: /dev/nvme0n4p1  375G  254G  121G  68% /data/apdb4
            

            Andy Salnikov added a comment - edited

            Summary of the results:

            • setup is the same as above
            • switched to pandas, client CPU time looks much better
            • ran 100k visits, no issues with storage on any node

            Select time as a function of visit (attachment apdb-gcp3-nb-time-select-fit.png):

            and the corresponding CPU time on the client side for the selects (attachment apdb-gcp3-nb-cpu-time-select-fit.png):

            Store time (attachment apdb-gcp3-nb-time-store-fit.png) looks slightly worse; it is not clear why yet, but the effect is not too big (it may be due to interaction between reads and writes):

            Some tables:

            Real time for selects:

                                                   3 nodes  6 nodes, afw  6 nodes, pandas
            obj_select_real at 50k visits, sec     0.52     0.32          0.13
            src_select_real at 50k visits, sec     2.8      2.1           1.03
            fsrc_select_real at 50k visits, sec    1.4      0.7           0.51

            CPU time:

                                                  3 nodes  6 nodes, afw  6 nodes, pandas
            obj_select_cpu at 50k visits, sec     0.17     0.16          0.014
            src_select_cpu at 50k visits, sec     0.94     1.13          0.217
            fsrc_select_cpu at 50k visits, sec    0.43     0.49          0.105

            Difference between real and CPU times:

                                              3 nodes  6 nodes, afw  6 nodes, pandas
            obj_select at 50k visits, sec     0.35     0.16          0.12
            src_select at 50k visits, sec     1.9      0.97          1.01
            fsrc_select at 50k visits, sec    0.97     0.21          0.4

            With that sort of precision I think it's reasonable to say that read performance improves as expected with the cluster size. Time also grows linearly with visit number, as expected. The next step would be to grow the cluster again and see how it scales, and to try a longer run (one year's worth of visits).

            Andy Salnikov added a comment -

            One more observation from JMX monitoring: write latency on one host (apdb-server-2) is again higher than on all other hosts (attachment apdb-gcp3-latency-diaobjectlast.png):

            System load is also higher on the same host (attachment apdb-gcp3-system-loadaverage.png):

            It is not clear what is going on; I suspect that one of the nodes is elected as some sort of coordinator and that it may be related to that. I will try to look at the Cassandra logs to see if there is anything interesting there.
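
            One cheap way to cross-check this per node, a sketch assuming the keyspace is named "apdb" (the table names come from the APDB schema):

            # per-table latency summary on a given server node
            nodetool tablestats apdb.DiaSource

            # read/write latency percentiles for the same table
            nodetool tablehistograms apdb DiaSource

            Running these on each server and comparing the write latency numbers would show whether the JMX picture (one slow node, one fast node) is also visible at the table level.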

            Andy Salnikov added a comment -

            I looked at the logs and additional JMX metrics, but there is no smoking gun. I believe all clients are connected to every server, so there should not be a single coordinating server. I'll try to see what happens in the next round of tests; maybe there is something I can observe on the live system.
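
            For reference, a sketch (not the actual dax_apdb configuration) of how client connections are typically set up with the python driver so that coordinator duty is spread across all nodes via a token-aware round-robin policy:

            from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
            from cassandra.policies import TokenAwarePolicy, RoundRobinPolicy

            profile = ExecutionProfile(
                load_balancing_policy=TokenAwarePolicy(RoundRobinPolicy())
            )
            cluster = Cluster(
                contact_points=["10.128.0.116", "10.128.0.114", "10.128.0.115"],
                execution_profiles={EXEC_PROFILE_DEFAULT: profile},
            )
            session = cluster.connect()

            If something like this is in place, no single node should act as the only coordinator, which matches the expectation above.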

            Closing this ticket for now, moving to the next test.


              People

              Assignee:
              Andy Salnikov
              Reporter:
              Andy Salnikov
              Watchers:
              Andy Salnikov

