Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-17654

PPDB Scaling Test in Google Cloud

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: L1 Database
    • Labels:
    • Story Points:
      8
    • Sprint:
      DB_S19_02, DB_S19_03
    • Team:
      Data Access and Database

      Description

      Perform PPDB scaling test in Google Cloud with other database technologies like postgres and using node configuration not available at NCSA.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment - - edited

            Ignoring for now store time and trying to understand what to do with DIASource select time. Like in Oracle it scales linearly and even though it's faster with Postgres than with Oracle we face the same issue here - we cannot scale it to 12 months needed by AP. I think we are at the limit of affordable time budget at 30k visits already (roughly 1 month). So we need to think how to break that linear scaling (and there is no guarantee it will stay linear at higher scale). I could think of couple of options:

            • Keep data on (fast) disk but parallelize access to it. I'm 99% that we are going to need many more servers to scale system that way. Current approach is already running ~200 parallel tiles and I believe our current bottleneck is on server side. We might need ~10 servers and reasonably low overhead for doing things in parallel (that feels like we need a real distributed database system).
            • Keeping more data in memory. If we could pre-load all necessary (for current visit) DIASource records into memory the SELECT would run much faster. This could be either SQL table that is kept in memory and updated for each new visit in advance, or some cache service a la memcached.

            Looking at the above numbers, DIASource needs about 0.5TB for one month of data, then 12 months is about 6TB in total (maybe little less). I do not think we could have a machine with so much RAM, so it probably means that we should forget about keeping it all in memory. Second option is to preload data that is needed for the next night and that means that that we could reduce size by about factor of 3, bringing us to 2TB. This is probably more realistic but still rather large number, possible issues here is how much time is needed to re-populate the cache if something happens during the night. Next option is to preload data as we go during the night, few minutes in advance if we know the (approximate) pointing for the next few visits. This should reduce the size of memory needed for it but it has its own drawbacks - loading will be done in parallel with other activities (writing to DIAObjects and reading/writing other tables) and will compete for resources.

            Show
            salnikov Andy Salnikov added a comment - - edited Ignoring for now store time and trying to understand what to do with DIASource select time. Like in Oracle it scales linearly and even though it's faster with Postgres than with Oracle we face the same issue here - we cannot scale it to 12 months needed by AP. I think we are at the limit of affordable time budget at 30k visits already (roughly 1 month). So we need to think how to break that linear scaling (and there is no guarantee it will stay linear at higher scale). I could think of couple of options: Keep data on (fast) disk but parallelize access to it. I'm 99% that we are going to need many more servers to scale system that way. Current approach is already running ~200 parallel tiles and I believe our current bottleneck is on server side. We might need ~10 servers and reasonably low overhead for doing things in parallel (that feels like we need a real distributed database system). Keeping more data in memory. If we could pre-load all necessary (for current visit) DIASource records into memory the SELECT would run much faster. This could be either SQL table that is kept in memory and updated for each new visit in advance, or some cache service a la memcached. Looking at the above numbers, DIASource needs about 0.5TB for one month of data, then 12 months is about 6TB in total (maybe little less). I do not think we could have a machine with so much RAM, so it probably means that we should forget about keeping it all in memory. Second option is to preload data that is needed for the next night and that means that that we could reduce size by about factor of 3, bringing us to 2TB. This is probably more realistic but still rather large number, possible issues here is how much time is needed to re-populate the cache if something happens during the night. Next option is to preload data as we go during the night, few minutes in advance if we know the (approximate) pointing for the next few visits. This should reduce the size of memory needed for it but it has its own drawbacks - loading will be done in parallel with other activities (writing to DIAObjects and reading/writing other tables) and will compete for resources.
            Hide
            salnikov Andy Salnikov added a comment -

            Independently of DIASource there is one optimization that we could do for DIAObjectLast table - just keep it whole in memory (re-populate it once on database restart). DIAObjectLast is not a big concern now though, it is the fastest of the three (and smallest) so I'm not going to look at it in this ticket anyways.

            Show
            salnikov Andy Salnikov added a comment - Independently of DIASource there is one optimization that we could do for DIAObjectLast table - just keep it whole in memory (re-populate it once on database restart). DIAObjectLast is not a big concern now though, it is the fastest of the three (and smallest) so I'm not going to look at it in this ticket anyways.
            Hide
            salnikov Andy Salnikov added a comment - - edited

            I ran the test for a bit longer extending it 57k visits to see if it timing stays linear with visits. Here is a bunch of plots from this extended test.

            The total per-visit time continues to grow afrer 30k visit but it is not very linear anymore (or it is linear but with different slope):

            Select time still scales linearly, major contribution comes from selecting DIASources:

            but insert time stays more or less constant after 30k visits, largest contribution to insert time comes from DIAObject:

            That time stays about constant after 30k, this can be explained by the number of inserted DIAObject as it should stabilize after 30k as we have a cutoff of 30 days for forced photometry of "noise" objects (at least this is what is implemented in my prototype).

            DIASource select is clearly still a major concern as it will continue to grow until we reach 12 months of history cutoff. Significant insert time is likely a limitation of cloud storage.

            Show
            salnikov Andy Salnikov added a comment - - edited I ran the test for a bit longer extending it 57k visits to see if it timing stays linear with visits. Here is a bunch of plots from this extended test. The total per-visit time continues to grow afrer 30k visit but it is not very linear anymore (or it is linear but with different slope): Select time still scales linearly, major contribution comes from selecting DIASources: but insert time stays more or less constant after 30k visits, largest contribution to insert time comes from DIAObject: That time stays about constant after 30k, this can be explained by the number of inserted DIAObject as it should stabilize after 30k as we have a cutoff of 30 days for forced photometry of "noise" objects (at least this is what is implemented in my prototype). DIASource select is clearly still a major concern as it will continue to grow until we reach 12 months of history cutoff. Significant insert time is likely a limitation of cloud storage.
            Hide
            salnikov Andy Salnikov added a comment -

            Data/table sizes after 57k visits:

                    table_name         | row_estimate |   total    |   index    |   toast    |   table    
            ---------------------------+--------------+------------+------------+------------+------------
             DiaObject                 |  3.82072e+09 | 3003 GB    | 352 GB     | 8192 bytes | 2651 GB
             DiaSource                 |  8.59906e+08 | 847 GB     | 118 GB     |            | 729 GB
             DiaForcedSource           |  3.82071e+09 | 548 GB     | 248 GB     |            | 301 GB
             DiaObjectLast             |  3.14808e+08 | 86 GB      | 29 GB      |            | 58 GB
            ---------------------------+--------------+------------+------------+------------+------------
             Totals                    |              | 4484 GB    | 747 GB     |            | 3739 GB
            

            and individual relations/indices:

                         relation             |  size   
            ----------------------------------+---------
             public.DiaObject                 | 2650 GB
             public.DiaSource                 | 729 GB
             public.DiaForcedSource           | 301 GB
             public.PK_DiaForcedSource        | 146 GB
             public.PK_DiaObject              | 146 GB
             public.IDX_DiaObject_htmId20     | 104 GB
             public.IDX_DiaObject_valStart    | 101 GB
             public.IDX_DiaFSource_ccdVisitId | 101 GB
             public.DiaObjectLast             | 58 GB
             public.PK_DiaSource              | 26 GB
             public.IDX_DiaSource_htmId20     | 23 GB
             public.IDX_DiaSource_diaObjId    | 23 GB
             public.IDX_DiaSource_ccdVisitId  | 23 GB
             public.IDX_DiaSource_ssObjId     | 23 GB
             public.PK_DiaObjectLast          | 15 GB
             public.IDX_DiaObjLast_diaObjId   | 14 GB
             public.PpdbProtoVisits           | 2472 kB
             public.IDX_PpdbProtoVisits_vTime | 1264 kB
             public.PK_PpdbProtoVisits        | 1264 kB
            

            Show
            salnikov Andy Salnikov added a comment - Data/table sizes after 57k visits: table_name | row_estimate | total | index | toast | table ---------------------------+--------------+------------+------------+------------+------------ DiaObject | 3.82072e+09 | 3003 GB | 352 GB | 8192 bytes | 2651 GB DiaSource | 8.59906e+08 | 847 GB | 118 GB | | 729 GB DiaForcedSource | 3.82071e+09 | 548 GB | 248 GB | | 301 GB DiaObjectLast | 3.14808e+08 | 86 GB | 29 GB | | 58 GB ---------------------------+--------------+------------+------------+------------+------------ Totals | | 4484 GB | 747 GB | | 3739 GB and individual relations/indices: relation | size ----------------------------------+--------- public.DiaObject | 2650 GB public.DiaSource | 729 GB public.DiaForcedSource | 301 GB public.PK_DiaForcedSource | 146 GB public.PK_DiaObject | 146 GB public.IDX_DiaObject_htmId20 | 104 GB public.IDX_DiaObject_valStart | 101 GB public.IDX_DiaFSource_ccdVisitId | 101 GB public.DiaObjectLast | 58 GB public.PK_DiaSource | 26 GB public.IDX_DiaSource_htmId20 | 23 GB public.IDX_DiaSource_diaObjId | 23 GB public.IDX_DiaSource_ccdVisitId | 23 GB public.IDX_DiaSource_ssObjId | 23 GB public.PK_DiaObjectLast | 15 GB public.IDX_DiaObjLast_diaObjId | 14 GB public.PpdbProtoVisits | 2472 kB public.IDX_PpdbProtoVisits_vTime | 1264 kB public.PK_PpdbProtoVisits | 1264 kB
            Hide
            salnikov Andy Salnikov added a comment -

            Couple of trivial updates to dax_ppdb and l1dbproto, no review necessary.

            Show
            salnikov Andy Salnikov added a comment - Couple of trivial updates to dax_ppdb and l1dbproto, no review necessary.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              vaikunth Vaikunth Thukral
              Watchers:
              Andy Salnikov, Fritz Mueller, Vaikunth Thukral
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  CI Builds

                  No builds found.