Data Management / DM-6370

Produce performance figures for initial L1 database design


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: L1 Database
    • Labels:

      Description

      Using my AP prototype script, produce some performance numbers and plots for L1 database operations. Do this for the baseline schema as defined in the current baseline (cat package), and also try a somewhat improved MySQL schema (I already know that the baseline indices do not work very well for the Object table).

        Attachments

        1. mysql-baseline-01.png (81 kB)
        2. mysql-baseline-02.png (13 kB)
        3. mysql-baseline-03.png (14 kB)
        4. mysql-pk-htm-01.png (166 kB)
        5. mysql-pk-htm-02.png (16 kB)
        6. mysql-pk-htm-03.png (15 kB)
        7. pg-baseline-01.png (92 kB)
        8. pg-baseline-02.png (15 kB)
        9. pg-baseline-03.png (15 kB)
        10. pg-pk-htm-01.png (107 kB)
        11. pg-pk-htm-02.png (16 kB)
        12. pg-pk-htm-03.png (16 kB)


            Activity

            Andy Salnikov added a comment

            For reference, the baseline schema defines the following indices for the L1 tables:

            DiaObject:
                PRIMARY: (diaObjectId, validityStart)
                IDX_DiaObject_htmId20: htmId20
                IDX_DiaObject_validityStart: validityStart
             
            DiaSource:
                PRIMARY: diaSourceId
                IDX_DiaSource_diaObjectId: diaObjectId
                IDX_DiaSource_htmId20: htmId20
                IDX_DiaSource_filterName: filterName
                IDX_DiaSource_ccdVisitId: ccdVisitId
                IDX_DiaSource_ssObjectId: ssObjectId
             
            DiaForcedSource:
                PRIMARY: (diaObjectId, ccdVisitId)
                IDX_DiaForcedSource_ccdVisitId: ccdVisitId
            

            InnoDB stores records in the order of their primary key, so to get the best performance the primary key needs to be selected carefully. PostgreSQL seems to store records in their arrival order; the primary key is just a regular index.
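            To make the clustering point concrete, here is a minimal DDL sketch (MySQL/InnoDB) of the DiaObject indices listed above; the column list is trimmed and the column types are my assumptions, not the actual cat schema:

                -- Sketch only: types and the omitted columns are assumptions, not the cat schema.
                CREATE TABLE DiaObject (
                    diaObjectId   BIGINT   NOT NULL,
                    validityStart DATETIME NOT NULL,
                    htmId20       BIGINT   NOT NULL,
                    -- ... remaining baseline columns omitted ...
                    PRIMARY KEY (diaObjectId, validityStart),        -- InnoDB clusters rows on this key
                    KEY IDX_DiaObject_htmId20 (htmId20),
                    KEY IDX_DiaObject_validityStart (validityStart)
                ) ENGINE=InnoDB;

            In PostgreSQL the same PRIMARY KEY declaration only creates a unique index; it does not affect the physical row order.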

            Andy Salnikov added a comment (edited)

            A few plots for the baseline schema with the above indices.

            Note: there is still no reading of the Source and ForcedSource tables currently; only the Object table is read, so these numbers are far from realistic. I expect reading Forced/Sources to be slower than reading Objects.

            The data covers visits 0 to ~3800, which corresponds to approximately 5 nights of LSST data collection. There are 58M DiaSources and 75M DiaObjects and DiaForcedSources in the database after that number of visits.

            MySQL: wall-clock (real) time in seconds to read (blue dots) and write (red crosses) data as a function of visit number, starting with an empty database. The gaps are due to lost logs (AFS ate them).

            Here are the same plots in box-and-whiskers style, binned in bins of 100 visits:

            Similar plots for PostgreSQL:


            Writing time can be split into individual table update times: truncating Objects, inserting new Objects, inserting Sources, and inserting ForcedSources (a sketch of this sequence follows the table). Here is the breakdown of average times, in seconds, for visits around 3800:

                                        MySQL   PostgreSQL
            DiaObject select              120           10
            Total Write                    80           90
              DiaObject truncate           12           28
              DiaObject insert             12            8
              DiaSource insert             18           25
              DiaForcedSource insert       38           30
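
            For reference, here is a sketch of what one visit's store step looks like as plain SQL. The actual statements issued by the prototype are not shown in this ticket, so the statement shapes, the :placeholders and the use of validityEnd are assumptions; "truncate" is read here as closing the validity interval of the replaced DiaObject rows, which is consistent with the HOT-updates remark below.

                -- Hypothetical sketch of the per-visit write sequence, not the prototype's actual code.
                -- 1. "truncate": close the validity interval of DiaObject rows being replaced
                UPDATE DiaObject
                   SET validityEnd = :visitTime
                 WHERE diaObjectId IN (:updatedObjectIds)
                   AND validityEnd IS NULL;
                -- 2. insert the new DiaObject versions
                INSERT INTO DiaObject (diaObjectId, validityStart, htmId20 /* , ... */)
                VALUES (:diaObjectId, :visitTime, :htmId20 /* , ... */);
                -- 3. insert the new DiaSources
                INSERT INTO DiaSource (diaSourceId, diaObjectId, ccdVisitId, htmId20 /* , ... */)
                VALUES (:diaSourceId, :diaObjectId, :ccdVisitId, :htmId20 /* , ... */);
                -- 4. insert the new DiaForcedSources
                INSERT INTO DiaForcedSource (diaObjectId, ccdVisitId /* , ... */)
                VALUES (:diaObjectId, :ccdVisitId /* , ... */);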

            A few observations from these numbers:

            • performance is nowhere close to what we need for production
            • watching the system shows >90% I/O busy time, while the servers are not maxing out their CPUs
            • MySQL is slower at selecting DiaObjects; this is probably explained by non-optimal data locality (because of the primary index on the Object table and InnoDB ordering). For Postgres the data locality seems to be better because data is stored in arrival order, and that order preserves locality (see the query sketch after this list)
            • truncate time is worse for Postgres; this could probably be improved if we use HOT updates
            • the numbers will very likely get worse with an increasing number of objects per table
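
            To illustrate the locality point from the list above, the per-visit DiaObject read is essentially an HTM range scan. This is only a sketch of the query shape, not the prototype's literal query; the :placeholders and the validityEnd filter are assumptions:

                -- Hypothetical shape of the DiaObject select for one region of the sky.
                -- With the baseline PK (diaObjectId, validityStart) the rows matching each
                -- htmId20 range are scattered all over the InnoDB clustered index, so every
                -- range touches many pages; clustering on htmId20 (next comment) keeps them
                -- physically close together.
                SELECT d.*
                  FROM DiaObject d
                 WHERE d.validityEnd IS NULL              -- only current object versions
                   AND (d.htmId20 BETWEEN :range1Lo AND :range1Hi
                     OR d.htmId20 BETWEEN :range2Lo AND :range2Hi
                     /* ... one BETWEEN per HTM range covering the region ... */);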
            Andy Salnikov added a comment (edited)

            DiaObject locality

            The InnoDB engine uses the primary key to order rows in a table, and the default baseline PK does not give very good locality of data for DiaObject. I am trying to improve that by using a different PK with the HTM ID as its leading component. In addition to the HTM ID we also need other components to make the PK unique; these are the usual diaObjectId and iovStart (validityStart) columns, the same as in the baseline schema PK.
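
            As a sketch, the PK change amounts to something like the following (MySQL syntax; the validity column is called validityStart in the baseline listing above and iovStart in the prototype):

                -- Sketch of the PK change only; everything else in the schema is unchanged.
                ALTER TABLE DiaObject
                    DROP PRIMARY KEY,
                    ADD PRIMARY KEY (htmId20, diaObjectId, validityStart);

            The same column order is used for the PostgreSQL tables below, although there it only changes the unique index, not the physical row order.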

            Here are the plots for MySQL/InnoDB using this PK; everything else in the schema stays exactly the same:

            MySQL wall-clock time for reading DiaObjects and storing all three tables (i.e. still no reading of sources):


            Same new PK used for PostgreSQL tables:



            And some performance numbers (again in seconds) at the maximum visit number, ~5000 (so they are not directly comparable to the table above):

                                        MySQL   PostgreSQL
            DiaObject select               28           19
            Total Write                    90          150
              DiaObject truncate           25           40
              DiaObject insert             13           10
              DiaSource insert             14           45
              DiaForcedSource insert       37           50

            With this change in PK, MySQL performance for DiaObject reads is much better than with the baseline PK. What is interesting is that MySQL performance is still lower than Postgres performance, and for Postgres I did nothing special about locality control (I believe Postgres stores data in the order that it comes in and stores the PK as a regular index). It may be worth trying a similar ordering for MySQL to see if it helps (a sketch of that idea follows).
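
            One way to try that "Postgres-like" ordering in MySQL would be to cluster on an auto-increment surrogate (i.e. on insertion order) and demote the natural key to a secondary unique index. This is only a sketch of the idea, nothing like it has been tested here, and the surrogate column name is made up:

                -- Hypothetical: emulate heap-like (arrival-order) storage under InnoDB.
                CREATE TABLE DiaObject (
                    diaObjectRow  BIGINT   NOT NULL AUTO_INCREMENT,  -- made-up surrogate column
                    diaObjectId   BIGINT   NOT NULL,
                    validityStart DATETIME NOT NULL,
                    htmId20       BIGINT   NOT NULL,
                    -- ... remaining baseline columns omitted ...
                    PRIMARY KEY (diaObjectRow),                      -- rows clustered in insertion order
                    UNIQUE KEY UQ_DiaObject_id_validity (diaObjectId, validityStart),
                    KEY IDX_DiaObject_htmId20 (htmId20)
                ) ENGINE=InnoDB;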

            Write time is more or less comparable for MySQL and Postgres; there are some variations, probably related to PK ordering in MySQL.

            Andy Salnikov added a comment (edited)

            Brief summary before closing this ticket:

            • The prototype script ran on an initially empty L1 database for a small number of visits (covering just a few nights) and without reading the DiaSource and DiaForcedSource tables.
            • One can probably say at this point that the performance of spinning-disk access is not adequate for a simple database-only access pattern without intermediate caches: even for a small amount of data in the database the total time to read and store the data exceeds the allowed budget, and that time will definitely grow with the amount of data in the database.
            • We should try to run with SSDs to see whether we can achieve reasonable performance with lower latency.
            • In any case the whole data set will likely not fit on SSDs, so we'll need to partition our data based on access patterns.

            I'll do more tests with spinning disks to understand the difference in the numbers and try to figure out how indexing can be improved (but I will need some input from science people about that).

            And there are a couple more things to consider:

            • AP works with a single CCD, so there is an inherent parallelism (a factor of ~200). We need to understand the best approach to L1 database access from those ~200 independent processes: whether each of them selects a small area covering one CCD, or whether it is better to run one big select covering the whole FOV (the two query shapes are sketched after this list).
            • The HTM index is not perfect in terms of data locality; if I remember correctly from a discussion with Serge, Q3C is supposed to be better in that respect, so we should try Q3C in the prototype (I'm not sure if it is already implemented in sphgeom).
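
            The two access patterns from the first bullet, sketched as query shapes (hypothetical :placeholders; neither variant has been benchmarked in this ticket):

                -- (a) ~200 independent per-CCD selects, one per AP process, each with a small
                --     set of HTM ranges covering just its CCD:
                SELECT * FROM DiaObject
                 WHERE htmId20 BETWEEN :ccdRangeLo AND :ccdRangeHi;   -- repeated per range, per CCD
                -- (b) one big select covering the whole FOV, run once per visit, with the result
                --     handed out to the per-CCD processes afterwards:
                SELECT * FROM DiaObject
                 WHERE htmId20 BETWEEN :fovRangeLo AND :fovRangeHi;   -- repeated per range, once per visit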
            Andy Salnikov added a comment (edited)

            Closing this, no review necessary, work to continue in the next sprint.


              People

              Assignee:
              Andy Salnikov
              Reporter:
              Fritz Mueller
              Watchers:
              Andy Salnikov, Fritz Mueller

