Data Management / DM-27785

S21 APDB Prototype

    Details

    • Type: Epic
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Epic Name:
      S21 APDB Prototype
    • Story Points:
      38
    • WBS:
      02C.06.01.01
    • Team:
      Data Access and Database
    • Cycle:
      Spring 2021

        Activity

        Andy Salnikov added a comment:

        Summary of all tests (just for my own record of which ticket did what). Each "gcp" test starts from scratch with an empty database.

        gcp-1:

        • 2020-12-18 23:04 to 2020-12-20 18:30 (all times in UTC)
        • 50k visits
        • initial test, same setup as PDAC, 3-node cluster
        • DM-28136

        gcp-2:

        • 2020-12-23 00:50 to 2020-12-25 15:57
        • 6-node cluster
        • still using afw.Table for in-memory data
        • 75k visits of useful data before storage errors began, 89k visits total
        • DM-28154

        gcp-3:

        • 2020-12-30 01:51 to 2021-01-02 04:20
        • 6-node cluster
        • switched to pandas for in-memory representation
        • 100k visits
        • DM-28154

        gcp-4:

        • 2021-01-05 19:23 to 2021-01-19 23:44
        • 12-node cluster, long test (>1 year of simulated visits)
        • 450k visits
        • DM-28172
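To compare runs of different length, the test windows above can be turned into durations and a mean throughput. This is an illustrative sketch only: `parse_window` and `visits_per_hour` are hypothetical helpers, and the rate computed for gcp-4 is a wall-clock average based on the rounded 450k visit count, not a measured per-visit latency.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Timestamp format used in the test summary (all times UTC).
TIME_FORMAT = "%Y-%m-%d %H:%M"


def parse_window(window: str) -> Tuple[datetime, datetime]:
    """Split a "START to END" string from the summary into two datetimes."""
    start_text, end_text = window.split(" to ")
    return (datetime.strptime(start_text.strip(), TIME_FORMAT),
            datetime.strptime(end_text.strip(), TIME_FORMAT))


def visits_per_hour(window: str, n_visits: int) -> float:
    """Mean number of visits processed per wall-clock hour over the window."""
    start, end = parse_window(window)
    return n_visits / ((end - start) / timedelta(hours=1))


if __name__ == "__main__":
    gcp4 = "2021-01-05 19:23 to 2021-01-19 23:44"
    start, end = parse_window(gcp4)
    print(end - start)  # 14 days, 4:21:00
    # 450k is the rounded visit count quoted for gcp-4.
    print(f"{visits_per_hour(gcp4, 450_000):.0f} visits/hour")
```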

        gcp-5:

        • 2021-01-23 04:11 to 2021-01-26 00:42
        • 2-month temporal partitioning
        • 124k visits
        • DM-28467
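gcp-5 split the data into 2-month time partitions. One simple way to picture this is fixed-length partitions indexed by time, where a read touches every partition overlapping the history window it needs. This is a sketch under stated assumptions: times are MJD days, partitions are counted from MJD 0, 60 days stands in for "2 months", and the one-year window is illustrative; none of these values come from the actual test configuration.

```python
import math
from typing import List


def time_partition(mjd: float, partition_days: float) -> int:
    """Index of the fixed-length time partition that contains ``mjd``."""
    return math.floor(mjd / partition_days)


def partitions_for_window(mjd_start: float, mjd_end: float,
                          partition_days: float) -> List[int]:
    """Indices of all partitions overlapping the closed window [mjd_start, mjd_end]."""
    first = time_partition(mjd_start, partition_days)
    last = time_partition(mjd_end, partition_days)
    return list(range(first, last + 1))


if __name__ == "__main__":
    now = 59300.0          # illustrative MJD
    window_days = 365.0    # illustrative one-year history window
    # 60-day partitions are an assumed stand-in for "2 months".
    print(partitions_for_window(now - window_days, now, 60.0))
```

Longer partitions mean fewer partitions per read, at the cost of possibly reading more rows that fall outside the window.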

        gcp-6:

        • 2021-01-27 22:02 to 2021-01-30 02:07
        • native time partitioning
        • a single SELECT (WHERE apdb_part IN (...) AND apdb_time_part IN (...))
        • 100k visits
        • DM-28522
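The gcp-6 access pattern, one SELECT restricting both partitioning columns with IN lists, can be sketched as a small query builder. The table name `DiaSource` and the `?` bind-marker style are assumptions; only the column names `apdb_part` and `apdb_time_part` come from the summary above.

```python
from typing import List, Sequence, Tuple


def single_select(table: str, spatial_parts: Sequence[int],
                  time_parts: Sequence[int]) -> Tuple[str, List[int]]:
    """Build one SELECT that covers every (spatial, time) partition pair."""
    space_marks = ", ".join("?" for _ in spatial_parts)
    time_marks = ", ".join("?" for _ in time_parts)
    query = (f'SELECT * FROM "{table}" '
             f'WHERE apdb_part IN ({space_marks}) '
             f'AND apdb_time_part IN ({time_marks})')
    # Bind values in the same order as the markers appear in the query.
    return query, list(spatial_parts) + list(time_parts)


if __name__ == "__main__":
    query, params = single_select("DiaSource", [101, 102], [670, 671, 672])
    print(query)
    print(params)  # [101, 102, 670, 671, 672]
```

If both columns belong to the partition key (assumed here), this one statement covers len(spatial_parts) × len(time_parts) partitions, and the database has to gather all of them to answer a single request.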

        gcp-7:

        • 2021-02-01 19:49 to 2021-02-04 22:00
        • native time partitioning
        • first 100k visits with one query per time partition (13 queries)
        • remaining 30k visits with one query per spatial partition (~200 queries)
        • 130542 visits
        • DM-28522
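gcp-7 and gcp-8 replaced that single statement with several smaller ones, split either by time partition (13 queries) or by spatial partition (~200 queries). The generators below sketch both splits; they reuse the same assumed `DiaSource` table name and `?` bind markers, and `per_time_partition` / `per_spatial_partition` are hypothetical names.

```python
from typing import Iterator, List, Sequence, Tuple

# A query is a statement string plus its bind values (hypothetical shape).
Query = Tuple[str, List[int]]


def _marks(n: int) -> str:
    """Comma-separated bind markers for an IN list of length n."""
    return ", ".join("?" for _ in range(n))


def per_time_partition(table: str, spatial_parts: Sequence[int],
                       time_parts: Sequence[int]) -> Iterator[Query]:
    """One query per time partition, each listing all spatial partitions."""
    for tp in time_parts:
        query = (f'SELECT * FROM "{table}" '
                 f'WHERE apdb_part IN ({_marks(len(spatial_parts))}) '
                 'AND apdb_time_part = ?')
        yield query, list(spatial_parts) + [tp]


def per_spatial_partition(table: str, spatial_parts: Sequence[int],
                          time_parts: Sequence[int]) -> Iterator[Query]:
    """One query per spatial partition, each listing all time partitions."""
    for sp in spatial_parts:
        query = (f'SELECT * FROM "{table}" '
                 'WHERE apdb_part = ? '
                 f'AND apdb_time_part IN ({_marks(len(time_parts))})')
        yield query, [sp] + list(time_parts)


if __name__ == "__main__":
    spatial = list(range(200))   # ~200 spatial partitions, as in gcp-7
    temporal = list(range(13))   # 13 time partitions, as in gcp-7
    print(len(list(per_time_partition("DiaSource", spatial, temporal))))     # 13
    print(len(list(per_spatial_partition("DiaSource", spatial, temporal))))  # 200
```

Each statement is independent, so a client can issue them concurrently and merge the results; which split performs better depends on the cluster and the data layout.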

        gcp-8:

        • 2021-02-09 00:02 to 2021-02-11 00:24
        • native time partitioning
        • optimized conversion of results to pandas (concatenate results first, then convert)
        • first 79k visits with one query per time partition (13 queries)
        • remaining visits with one query per spatial partition (~200 queries)
        • 91242 visits total
        • DM-28522
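The gcp-8 optimization converts the combined result of all queries once, instead of converting each query result separately and merging. The stdlib sketch below uses a column-oriented dict as a stand-in for a pandas DataFrame; `rows_to_columns` and the two wrapper functions are hypothetical, as are the column names. With pandas itself, the analogous choice is one `pandas.DataFrame.from_records` call on the chained rows versus `pandas.concat` over one small DataFrame per query; the per-DataFrame overhead is presumably what makes the second variant slower when there are ~200 result sets.

```python
from itertools import chain
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple
Columns = Dict[str, list]


def rows_to_columns(rows: Iterable[Row], names: Sequence[str]) -> Columns:
    """Convert row tuples into column lists (stand-in for building a DataFrame)."""
    columns: Columns = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


def convert_each_then_merge(results: Sequence[List[Row]], names: Sequence[str]) -> Columns:
    """Alternative pattern: convert every result set, then merge the pieces."""
    merged: Columns = {name: [] for name in names}
    for result in results:
        part = rows_to_columns(result, names)
        for name in names:
            merged[name].extend(part[name])
    return merged


def concatenate_then_convert(results: Sequence[List[Row]], names: Sequence[str]) -> Columns:
    """gcp-8 pattern: chain all rows from all queries, convert once."""
    return rows_to_columns(chain.from_iterable(results), names)


if __name__ == "__main__":
    names = ("diaSourceId", "psFlux")             # hypothetical columns
    results = [[(1, 0.5), (2, 0.7)], [(3, 0.9)]]  # two query results
    print(concatenate_then_convert(results, names))
```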

        gcp-9:

        • 2021-02-17 03:08 to 2021-02-20 00:17
        • packing fields into BLOBs with CBOR
        • also checked timing without converting sources to pandas
        • 139352 visits
        • DM-28820
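gcp-9 packed fields into BLOBs using CBOR (RFC 8949). The sketch below writes a minimal CBOR encoder for the basic types and uses it to serialize a row's non-key columns into one binary value. It is an illustration only: which columns the prototype kept as regular columns is not stated above, so the column names and the `pack_row` helper are hypothetical, and a real implementation would more likely use an existing library such as `cbor2`.

```python
import struct
from typing import Any, Dict, Tuple


def _head(major: int, value: int) -> bytes:
    """CBOR initial byte for major type ``major`` plus its unsigned argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 0x100:
        return bytes([(major << 5) | 24]) + struct.pack(">B", value)
    if value < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)  # up to 2**64 - 1


def cbor_encode(obj: Any) -> bytes:
    """Encode None, bool, int, float, str, bytes, list/tuple and dict as CBOR."""
    if obj is None:
        return b"\xf6"
    if obj is True:          # bool checks must precede int (bool subclasses int)
        return b"\xf5"
    if obj is False:
        return b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        return b"\xfb" + struct.pack(">d", obj)  # always 64-bit float here
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, (bytes, bytearray)):
        return _head(2, len(obj)) + bytes(obj)
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        return _head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def pack_row(row: Dict[str, Any], key_columns: Tuple[str, ...]) -> Tuple[tuple, bytes]:
    """Split a row into key values and one CBOR BLOB holding all other fields."""
    keys = tuple(row[name] for name in key_columns)
    payload = {name: value for name, value in row.items() if name not in key_columns}
    return keys, cbor_encode(payload)


if __name__ == "__main__":
    # Hypothetical row; the real schema and key columns may differ.
    row = {"apdb_part": 3, "diaSourceId": 7, "psFlux": 1.5, "psFluxErr": 0.25}
    keys, blob = pack_row(row, ("apdb_part", "diaSourceId"))
    print(keys, blob.hex())
```

Storing the payload as one BLOB means the database handles a single cell per row instead of one per field; the cost moves to encoding and decoding on the client side.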

          People

          Assignee:
          Unassigned
          Reporter:
          Fritz Mueller
          Watchers:
          Andy Salnikov, Fritz Mueller
          Votes:
          0
