Data Management / DM-28820

Test APDB Cassandra prototype with packed records


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      8
    • Sprint:
      DB_S21_12
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      One more test to run with APDB on GCP. It is interesting to see whether we can gain anything by replacing the large number of columns/cells in DiaSource records with a packed representation in a BLOB-like column. One clear benefit is that the server does not need to manage storage for all those columns, which should reduce processing time on the server side. Possible drawbacks are that the client side will need to spend CPU on packing/unpacking the data, and that the record size can grow depending on the packing format. The most flexible format would look like JSON (preferably in binary form, something like BSON), but that adds a lot of overhead because each column name has to be present in every record. A more compact format could be something like protobuf, which maps field names to indices on the client side, but that needs very careful management and evolution of the schema. Cassandra uses compression on the server side, so the JSON-style overhead may not be that large, and this is what I want to check as well.
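
      As a minimal sketch of the packed-record idea (using the cbor2 Python package; the field names and values are illustrative, not the real DiaSource schema):

      import cbor2

      # Columns needed for partitioning/indexing stay as real Cassandra columns.
      indexed = {"diaSourceId": 12345, "apdb_part": 2177, "midPointTai": 59345.2}

      # Everything else is packed into a single BLOB-valued column.
      payload = {"psFlux": 1.5e-29, "psFluxErr": 2.0e-31, "ra": 150.1, "decl": 2.2}
      blob = cbor2.dumps(payload)               # bytes, ready for the BLOB column

      # On read, the client pays CPU to restore the full record.
      record = {**indexed, **cbor2.loads(blob)}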

        Attachments

          Activity

          salnikov Andy Salnikov added a comment - - edited

          This whole run was done with varying options; here is the sequence, for the record:

          • visits up to 42738:
            • using BLOBs packed with CBOR
            • dumb unpacking implementation: every returned record (which is a tuple) has its packed field removed and unpacked, and then all unpacked attributes are appended to the record; all of this is done at the Python level, so it is super-inefficient (see the sketch after this list)
          • visits 42739 to 85525:
            • somewhat optimized unpacking of BLOBs to pandas dataframe
            • in one loop over the records, make a list of all unpacked dicts; make one dataframe from the returned records, ignoring the BLOB data; make a second dataframe from the list of unpacked dicts; concatenate the two dataframes
            • it still seems that CPU spent on conversion dominates the overall reading time
          • visits 85526 to 129252:
            • turned off conversion of sources to pandas (sources are not used in ap_proto, I only read them for timing); the BLOBs are also left packed
            • this is to check what we could expect if we made the pandas conversion super-efficient (or replaced pandas with something else)
          • visits 129253 to 139352:
            • switched to query-per-spatial partition (for temporal partitions I use separate tables in this test)
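
          The two unpacking strategies above, as a rough sketch (assuming each returned row is a tuple whose last element is the CBOR BLOB; names and layout are illustrative):

          import cbor2
          import pandas as pd

          # "Dumb" variant: per-record dict manipulation, entirely at the Python level.
          def unpack_naive(rows, names):
              records = []
              for row in rows:
                  rec = dict(zip(names, row[:-1]))   # fixed (non-BLOB) columns
                  rec.update(cbor2.loads(row[-1]))   # unpack the BLOB into the record
                  records.append(rec)
              return pd.DataFrame(records)

          # Optimized variant: build two dataframes and concatenate them once.
          def unpack_batched(rows, names):
              fixed = pd.DataFrame([row[:-1] for row in rows], columns=names)
              packed = pd.DataFrame([cbor2.loads(row[-1]) for row in rows])
              return pd.concat([fixed, packed], axis=1)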
          salnikov Andy Salnikov added a comment -

          Stopping the test after 139352 visits generated.

          andy_salnikov_gmail_com@apdb-server-1:~$ nodetool status
          Datacenter: datacenter1
          =======================
          Status=Up/Down
          |/ State=Normal/Leaving/Joining/Moving
          --  Address       Load      Tokens  Owns (effective)  Host ID                               Rack
          UN  10.128.0.118  1.35 TiB  256     ?                 52884083-091e-4321-8086-59d98848b6d2  rack1
          UN  10.128.0.91   1.32 TiB  256     ?                 c1cc8f80-d4b5-49ce-b57a-de6e3dea1a38  rack1
          UN  10.128.0.119  1.23 TiB  256     ?                 12469887-ec7a-4681-889b-4b29745f4488  rack1
          UN  10.128.0.103  1.24 TiB  256     ?                 1a49a9d8-003e-4ef4-87db-6bba2e91c214  rack1
          UN  10.128.0.101  1.33 TiB  256     ?                 2afee2f5-a4ab-464d-b874-c9c7f09f7456  rack1
          UN  10.128.0.57   1.29 TiB  256     ?                 ae31e318-27f9-4503-9a98-d0bfb82d8601  rack1
          UN  10.128.0.123  1.31 TiB  256     ?                 a3ada1f5-7cc9-4136-946b-6d23d8c8e087  rack1
          UN  10.128.0.121  1.41 TiB  256     ?                 8bc1da41-c923-467b-80f7-d9295fee14e5  rack1
          UN  10.128.0.122  1.27 TiB  256     ?                 cb36cd32-7df8-4ba8-9a8f-9155d72bc994  rack1
          UN  10.128.0.100  1.26 TiB  256     ?                 179b107f-e813-4e2d-be63-43e7a97e25b4  rack1
          UN  10.128.0.52   1.33 TiB  256     ?                 7d2f5695-b8d0-405c-b695-42b1a6977261  rack1
          UN  10.128.0.37   1.33 TiB  256     ?                 5cceed5b-8482-47d7-ba21-e7a9e8edf21b  rack1
          

          Total 15.67 TiB of data.

          Per node:

          (lsst-scipipe-4f18ecb) [andy_salnikov_gmail_com@apdb-client-1 apdb-gcloud]$ shmux -m -s -c "df -h /data/apdb*" -- apdb-server-{1..12}
           apdb-server-1: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-1: /dev/nvme0n1p1  375G  179G  196G  48% /data/apdb1
           apdb-server-1: /dev/nvme0n2p1  375G  171G  205G  46% /data/apdb2
           apdb-server-1: /dev/nvme0n3p1  375G  170G  205G  46% /data/apdb3
           apdb-server-1: /dev/nvme0n4p1  375G  173G  203G  46% /data/apdb4
           apdb-server-1: /dev/nvme0n5p1  375G  168G  207G  45% /data/apdb5
           apdb-server-1: /dev/nvme0n6p1  375G  170G  205G  46% /data/apdb6
           apdb-server-1: /dev/nvme0n7p1  375G  171G  205G  46% /data/apdb7
           apdb-server-1: /dev/nvme0n8p1  375G  171G  205G  46% /data/apdb8
           apdb-server-7: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-7: /dev/nvme0n1p1  375G  183G  193G  49% /data/apdb1
           apdb-server-7: /dev/nvme0n2p1  375G  173G  203G  46% /data/apdb2
           apdb-server-7: /dev/nvme0n3p1  375G  172G  204G  46% /data/apdb3
           apdb-server-7: /dev/nvme0n4p1  375G  175G  201G  47% /data/apdb4
           apdb-server-7: /dev/nvme0n5p1  375G  171G  205G  46% /data/apdb5
           apdb-server-7: /dev/nvme0n6p1  375G  172G  204G  46% /data/apdb6
           apdb-server-7: /dev/nvme0n7p1  375G  175G  200G  47% /data/apdb7
           apdb-server-7: /dev/nvme0n8p1  375G  175G  201G  47% /data/apdb8
           apdb-server-9: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-9: /dev/nvme0n1p1  375G  164G  212G  44% /data/apdb1
           apdb-server-9: /dev/nvme0n2p1  375G  160G  216G  43% /data/apdb2
           apdb-server-9: /dev/nvme0n3p1  375G  161G  215G  43% /data/apdb3
           apdb-server-9: /dev/nvme0n4p1  375G  159G  217G  43% /data/apdb4
           apdb-server-9: /dev/nvme0n5p1  375G  160G  216G  43% /data/apdb5
           apdb-server-9: /dev/nvme0n6p1  375G  161G  215G  43% /data/apdb6
           apdb-server-9: /dev/nvme0n7p1  375G  161G  215G  43% /data/apdb7
           apdb-server-9: /dev/nvme0n8p1  375G  160G  216G  43% /data/apdb8
          apdb-server-11: Filesystem      Size  Used Avail Use% Mounted on
          apdb-server-11: /dev/nvme0n1p1  375G  167G  209G  45% /data/apdb1
          apdb-server-11: /dev/nvme0n2p1  375G  163G  213G  44% /data/apdb2
          apdb-server-11: /dev/nvme0n3p1  375G  164G  212G  44% /data/apdb3
          apdb-server-11: /dev/nvme0n4p1  375G  161G  215G  43% /data/apdb4
          apdb-server-11: /dev/nvme0n5p1  375G  164G  212G  44% /data/apdb5
          apdb-server-11: /dev/nvme0n6p1  375G  163G  213G  44% /data/apdb6
          apdb-server-11: /dev/nvme0n7p1  375G  164G  212G  44% /data/apdb7
          apdb-server-11: /dev/nvme0n8p1  375G  166G  210G  45% /data/apdb8
           apdb-server-5: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-5: /dev/nvme0n1p1  375G  167G  208G  45% /data/apdb1
           apdb-server-5: /dev/nvme0n2p1  375G  162G  214G  44% /data/apdb2
           apdb-server-5: /dev/nvme0n3p1  375G  161G  215G  43% /data/apdb3
           apdb-server-5: /dev/nvme0n4p1  375G  161G  214G  43% /data/apdb4
           apdb-server-5: /dev/nvme0n5p1  375G  160G  216G  43% /data/apdb5
           apdb-server-5: /dev/nvme0n6p1  375G  160G  216G  43% /data/apdb6
           apdb-server-5: /dev/nvme0n7p1  375G  162G  213G  44% /data/apdb7
           apdb-server-5: /dev/nvme0n8p1  375G  163G  212G  44% /data/apdb8
           apdb-server-3: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-3: /dev/nvme0n1p1  375G  176G  200G  47% /data/apdb1
           apdb-server-3: /dev/nvme0n2p1  375G  166G  210G  45% /data/apdb2
           apdb-server-3: /dev/nvme0n3p1  375G  166G  209G  45% /data/apdb3
           apdb-server-3: /dev/nvme0n4p1  375G  164G  212G  44% /data/apdb4
           apdb-server-3: /dev/nvme0n5p1  375G  167G  209G  45% /data/apdb5
           apdb-server-3: /dev/nvme0n6p1  375G  165G  211G  44% /data/apdb6
           apdb-server-3: /dev/nvme0n7p1  375G  166G  210G  45% /data/apdb7
           apdb-server-3: /dev/nvme0n8p1  375G  161G  215G  43% /data/apdb8
           apdb-server-6: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-6: /dev/nvme0n1p1  375G  175G  200G  47% /data/apdb1
           apdb-server-6: /dev/nvme0n2p1  375G  173G  203G  47% /data/apdb2
           apdb-server-6: /dev/nvme0n3p1  375G  171G  205G  46% /data/apdb3
           apdb-server-6: /dev/nvme0n4p1  375G  172G  204G  46% /data/apdb4
           apdb-server-6: /dev/nvme0n5p1  375G  171G  205G  46% /data/apdb5
           apdb-server-6: /dev/nvme0n6p1  375G  173G  202G  47% /data/apdb6
           apdb-server-6: /dev/nvme0n7p1  375G  172G  204G  46% /data/apdb7
           apdb-server-6: /dev/nvme0n8p1  375G  172G  204G  46% /data/apdb8
           apdb-server-2: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-2: /dev/nvme0n1p1  375G  179G  196G  48% /data/apdb1
           apdb-server-2: /dev/nvme0n2p1  375G  171G  205G  46% /data/apdb2
           apdb-server-2: /dev/nvme0n3p1  375G  171G  205G  46% /data/apdb3
           apdb-server-2: /dev/nvme0n4p1  375G  174G  202G  47% /data/apdb4
           apdb-server-2: /dev/nvme0n5p1  375G  170G  206G  46% /data/apdb5
           apdb-server-2: /dev/nvme0n6p1  375G  172G  204G  46% /data/apdb6
           apdb-server-2: /dev/nvme0n7p1  375G  170G  206G  46% /data/apdb7
           apdb-server-2: /dev/nvme0n8p1  375G  175G  201G  47% /data/apdb8
          apdb-server-10: Filesystem      Size  Used Avail Use% Mounted on
          apdb-server-10: /dev/nvme0n1p1  375G  186G  190G  50% /data/apdb1
          apdb-server-10: /dev/nvme0n2p1  375G  182G  193G  49% /data/apdb2
          apdb-server-10: /dev/nvme0n3p1  375G  185G  191G  50% /data/apdb3
          apdb-server-10: /dev/nvme0n4p1  375G  181G  195G  49% /data/apdb4
          apdb-server-10: /dev/nvme0n5p1  375G  187G  189G  50% /data/apdb5
          apdb-server-10: /dev/nvme0n6p1  375G  183G  193G  49% /data/apdb6
          apdb-server-10: /dev/nvme0n7p1  375G  182G  194G  49% /data/apdb7
          apdb-server-10: /dev/nvme0n8p1  375G  177G  199G  47% /data/apdb8
           apdb-server-4: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-4: /dev/nvme0n1p1  375G  176G  200G  47% /data/apdb1
           apdb-server-4: /dev/nvme0n2p1  375G  170G  206G  46% /data/apdb2
           apdb-server-4: /dev/nvme0n3p1  375G  172G  203G  46% /data/apdb3
           apdb-server-4: /dev/nvme0n4p1  375G  170G  206G  46% /data/apdb4
           apdb-server-4: /dev/nvme0n5p1  375G  171G  204G  46% /data/apdb5
           apdb-server-4: /dev/nvme0n6p1  375G  172G  204G  46% /data/apdb6
           apdb-server-4: /dev/nvme0n7p1  375G  171G  204G  46% /data/apdb7
           apdb-server-4: /dev/nvme0n8p1  375G  166G  209G  45% /data/apdb8
          apdb-server-12: Filesystem      Size  Used Avail Use% Mounted on
          apdb-server-12: /dev/nvme0n1p1  375G  177G  198G  48% /data/apdb1
          apdb-server-12: /dev/nvme0n2p1  375G  170G  206G  46% /data/apdb2
          apdb-server-12: /dev/nvme0n3p1  375G  168G  208G  45% /data/apdb3
          apdb-server-12: /dev/nvme0n4p1  375G  167G  209G  45% /data/apdb4
          apdb-server-12: /dev/nvme0n5p1  375G  169G  207G  46% /data/apdb5
          apdb-server-12: /dev/nvme0n6p1  375G  169G  207G  45% /data/apdb6
          apdb-server-12: /dev/nvme0n7p1  375G  169G  207G  45% /data/apdb7
          apdb-server-12: /dev/nvme0n8p1  375G  171G  205G  46% /data/apdb8
           apdb-server-8: Filesystem      Size  Used Avail Use% Mounted on
           apdb-server-8: /dev/nvme0n1p1  375G  164G  212G  44% /data/apdb1
           apdb-server-8: /dev/nvme0n2p1  375G  159G  217G  43% /data/apdb2
           apdb-server-8: /dev/nvme0n3p1  375G  155G  221G  42% /data/apdb3
           apdb-server-8: /dev/nvme0n4p1  375G  158G  218G  42% /data/apdb4
           apdb-server-8: /dev/nvme0n5p1  375G  158G  218G  43% /data/apdb5
           apdb-server-8: /dev/nvme0n6p1  375G  158G  218G  43% /data/apdb6
           apdb-server-8: /dev/nvme0n7p1  375G  157G  219G  42% /data/apdb7
           apdb-server-8: /dev/nvme0n8p1  375G  161G  215G  43% /data/apdb8
          

          Total 15.79 TiB.

           Pictures will follow; there is some interesting stuff.

          salnikov Andy Salnikov added a comment -

          Digging into the details of this test:

          • A total of 139352 visits were generated, but with different configurations, so this is not a uniform test
          • all non-indexed columns in the tables are packed into a single BLOB column, using CBOR (https://cbor.io/), a compact binary format, for serialization
          • forced sources have a very narrow table, so only a few columns go into the BLOB; the other tables are very wide
          • with CBOR all column names are packed into the same BLOB, which inflates the data size; it is interesting to see how Cassandra compression can help here (see the size comparison sketch after this list)
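
          As a toy illustration of the name overhead (and of what a schema-on-client, protobuf-style format would save), compare self-describing CBOR against positional packing; the field names are made up:

          import cbor2

          values = {"psFlux": 1.5e-29, "psFluxErr": 2.0e-31, "ra": 150.1,
                    "decl": 2.2, "snr": 5.5}

          # Self-describing: every record carries its column names.
          with_names = cbor2.dumps(values)

          # Positional: more compact, but the column order becomes a schema that
          # must be shared and evolved carefully between writer and reader.
          schema = sorted(values)
          positional = cbor2.dumps([values[k] for k in schema])

          print(len(with_names), len(positional))
          # Decoding the positional form needs the same schema on the client:
          decoded = dict(zip(schema, cbor2.loads(positional)))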

          A first look at the data sizes stored in Cassandra shows a big increase in data volume: this test produces about 11.4 TiB per 100k visits compared to a typical 5.9 TiB in previous tests, almost a factor of two. Cassandra shows a compression ratio of around 55% for the two widest tables. I think this dramatic increase in stored data volume probably rules out the option of using a JSON-style serialization format; if we want to pack the data, we will need a known schema on the client side, and management of that sort of schema could be a major issue.

          Here are a few plots that compare numbers from this test to a few previous short tests, in time order from gcp-5 to the current gcp-9; see DM-27785 for a description of the tests (this test is the rightmost series).

          Here is how the data size on disk grows with time, dramatically faster for the current test:

          Compression ratio for all tests for different tables:

          Timing measurements show some interesting results. It appears that on the server side there is an apparent improvement in read/write latency (these two plots are for the DiaSource table), which I believe measures the per-partition processing time on the replica nodes:


          What is also interesting in the latter plot is that the latency in the leftmost series is also very low, and that test (gcp-5) was using 2-month granularity for time partitioning. I think this could indicate that latency also depends on how many concurrent queries are executing.

          Latency for client requests shows some improvement, but not a lot (this, I think, measures timing on the coordinator node, so it includes the whole request-processing chain):

          Timing measured on the client side shows some interesting trends too.

          This is the insert time, which improves significantly (though it was never a bottleneck):

          Select time, on the other hand, does not show much improvement compared to earlier results, but it has an interesting structure that I will explain in the next comment with zoomed-in plots:

          salnikov Andy Salnikov added a comment -

          Here are two interesting plots that reflect the changes applied during the test.

          Client-side select time (real and CPU):


          • The initial section of the plot more or less matches the other tests
          • Around 2021-02-17 20:47 (first dashed vertical line) I switched to the somewhat optimized conversion of unpacked BLOB data to pandas. This improved timing slightly for DiaSources, but less noticeably for DiaForcedSources. Even with this optimization, CPU time is still a large fraction of the real time.
          • To see how much we spend on pandas conversion, I completely disabled that conversion for both source tables around 2021-02-18 18:18 (second dashed line). CPU time dropped to almost zero for both tables, and real time was reduced significantly for DiaSource. What is interesting is that the real time for the DiaForcedSource table is practically unchanged. I think this could be an effect of concurrent queries from multiple clients, which causes non-local effects on server-side performance. One way to test this is to switch the order of retrieval of the sources (right now I fetch DiaSources immediately followed by DiaForcedSources).
          • Around 2021-02-19 18:18 I enabled query-per-partition, still with pandas conversion disabled (third dashed line). This slightly improved DiaSources timing but had very little effect on DiaForcedSources. (A sketch of the query-per-partition pattern follows this list.)
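
          For reference, a sketch of what the query-per-partition pattern looks like with the DataStax Python driver; the keyspace, table and column names, and partition IDs below are placeholders, not the actual ap_proto code:

          from cassandra.cluster import Cluster

          cluster = Cluster(["10.128.0.118"])   # any contact point in the ring
          session = cluster.connect("apdb")     # placeholder keyspace name

          # Prepared statement keyed on the spatial partition column.
          query = session.prepare("SELECT * FROM DiaSource WHERE apdb_part = ?")

          # Instead of one SELECT with IN (...) over all spatial partitions,
          # issue one asynchronous query per partition and gather the results.
          partitions = [1001, 1002, 1003]       # illustrative partition IDs
          futures = [session.execute_async(query, (part,)) for part in partitions]
          rows = [row for future in futures for row in future.result()]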

          I think one conclusion here is that our in-memory conversion of query results is not optimal. I am not sure whether we could optimize the pandas conversion further without going to the C++ level. I think that in general pandas is not the best match for data returned from databases (including SQL), which is usually a stream of tuples/records with an identical layout. Re-arranging that into a bunch of 2-dim arrays will certainly need significant effort. Maybe we should reconsider our use of pandas, at least as the return type of APDB, and keep the format as close to the wire format as we can. Still there
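
          To illustrate the "keep it close to the wire format" idea, a toy comparison of row-oriented versus column-oriented DataFrame construction (data and names are made up):

          import numpy as np
          import pandas as pd

          rows = [(1, 0.5), (2, 0.7), (3, 0.9)]   # stream of identical-layout tuples
          names = ["id", "flux"]

          # Row-oriented: pandas has to transpose the stream of tuples into
          # per-column arrays itself, which is the expensive re-arrangement.
          df_rows = pd.DataFrame(rows, columns=names)

          # Column-oriented: transpose once with zip, then hand pandas
          # ready-made arrays.
          columns = {name: np.asarray(col) for name, col in zip(names, zip(*rows))}
          df_cols = pd.DataFrame(columns)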

          salnikov Andy Salnikov added a comment -

          And for completeness, two standard plots from the notebook, which show select time as a function of visit number (real and CPU time):


          Same features as above, but this plot makes it easier to see that the initial slope is steeper compared to previous tests without BLOBs; even after the first attempt to optimize the BLOB conversion, CPU time is still significantly higher than before, and real time is higher too.

          salnikov Andy Salnikov added a comment -

          Closing this ticket; I want to do one more test with an increased number of detected sources.


            People

            Assignee:
            salnikov Andy Salnikov
            Reporter:
            salnikov Andy Salnikov
            Watchers:
            Andy Salnikov, Colin Slater, Fritz Mueller
            Votes:
            0
            Watchers:
            3

              Dates

              Created:
              Updated:
              Resolved:

                CI Builds

                No builds found.