Data Management / DM-5509

alert production database next steps (April)

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      Place-holder for additional alert production database work after investigate design task completes. We should split this into smaller stories for a total of 18 points this cycle.

            Activity

            salnikov Andy Salnikov added a comment -

            We had a brief discussion with Jacek regarding the L1 schema and related issues. One thing that came up is the time limit for which we keep information in the L1 database. I think the current assumption is that we keep data for "up to one year"; I am not sure if this statement applies to all tables or only to the largest/longest ones (DiaForcedSource has the largest record count).

            It's not clear to me what mechanism we are going to use to enforce that limit. Are we going to cut history based on a sliding window, or is it going to be reset at the moment of the DRP switch? The DPDD gives a detailed and consistent overview of the DRP switching process for L1, but it does not mention what happens to the data which exceed the one-year limit.
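A minimal sketch of what the sliding-window option would mean for retention, assuming a one-year window keyed on each row's observation time (the function names and row layout here are illustrative, not from any actual schema):

```python
from datetime import datetime, timedelta

def pruning_cutoff(now, window_days=365):
    """Timestamp before which rows would be dropped under a
    sliding-window retention policy."""
    return now - timedelta(days=window_days)

def rows_to_keep(rows, now, window_days=365):
    """Filter (row_id, obs_time) pairs, keeping only rows inside
    the retention window."""
    cutoff = pruning_cutoff(now, window_days)
    return [r for r in rows if r[1] >= cutoff]

now = datetime(2016, 4, 1)
rows = [(1, datetime(2015, 2, 1)),   # older than one year: pruned
        (2, datetime(2015, 6, 1)),   # inside the window: kept
        (3, datetime(2016, 3, 1))]   # recent: kept
print([r[0] for r in rows_to_keep(rows, now)])  # [2, 3]
```

The reset-at-DRP-switch alternative would instead drop everything up to the DRP cutoff time in one step, so the two policies differ mainly in how much history is visible just before a switch.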

            Another issue with DRP switching is that the process of "catching up" from the DRP cutoff time to the present time is extended (it is not clear how long it will take, but it should be non-negligible since DRP itself can take multiple months). During that catch-up time there will be two active L1 instances, one processing against the "current" DRP and one doing catch-up for the new DRP, which means that we need double capacity.

            salnikov Andy Salnikov added a comment -

            Thanks to Jacek, I have discovered Serge's slides from the DB Architecture review in 2013, and these slides have plenty of useful ideas (thanks Serge). I added a link to the slides (which goes to a password-protected page and was not accessible until a few days ago); there is a DocuShare collection for the review, but that collection is missing this presentation for some reason.

            My brief summary of what Serge apparently gave a great deal of thought to:

            • Main queries for DiaObject will be position-based; indexing on HTM ID should make searching fast
            • If we make the DiaObject table ordered by HTM ID it will also give us storage locality to reduce seeks (for InnoDB tables this means that HTM ID must be the leading part of the primary index)
            • The same applies to Dia[Forced]Source; sorting them by the HTM ID of the corresponding DiaObject would help with locality
            • Partitioning should be used to reduce the size of the tables/indices
              • Proposed partitioning scheme: one last-night partition plus a few per-month partitions (for all three major tables)
              • During daytime processing the last-night partition is merged with the current month's partition
              • The last-night partition must be transactional
              • Per-month partitions for sources can be non-transactional since they are never updated; DiaObject must still be transactional (validityEnd updates can go to any partition)

            This partitioning scheme implies that DiaObject (spatial) queries have to be run on all partitions at once, because it is not known in which partition a validityEnd=Infinity interval is going to end up. For sources matching the DiaObject we will search more than one partition, based on how far back in time we want to look for sources (I'm still unclear on whether we need all sources to re-compute a new version of a DiaObject).
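The asymmetry between source and DiaObject queries can be sketched as a partition-selection rule, assuming monthly partitions keyed by "YYYY-MM" strings plus one "last-night" partition (names and layout are illustrative, not from the actual design):

```python
def source_partitions(months, query_months):
    """A Dia[Forced]Source history query only touches the monthly
    partitions inside the requested time range."""
    return [m for m in months if m in query_months]

def diaobject_partitions(months):
    """A spatial DiaObject query must scan every partition (plus
    last-night), because the row with validityEnd=Infinity can
    live in any of them."""
    return months + ["last-night"]

months = ["2016-01", "2016-02", "2016-03"]
print(source_partitions(months, {"2016-02", "2016-03"}))  # ['2016-02', '2016-03']
print(diaobject_partitions(months))  # all three months plus 'last-night'
```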

            salnikov Andy Salnikov added a comment -

            One thing that bothers me, and which I could not find mentioned anywhere, is a possible data dependency between visits (processing in L1). If there is a spatial overlap between two successive visits, then processing for the next visit should wait until processing for the previous visit has stored all data from the overlap region in the database. This does not play nicely with possible optimizations like pre-fetching DiaObjects before visit processing starts. This needs to be clarified with all parties, and if there is indeed an overlap we might need special scheduling for L1DP which takes data dependencies into account (and this may negatively affect latency).
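A toy sketch of what dependency-aware scheduling would have to check: a visit may start only when no in-flight visit overlaps it on the sky. Visits are approximated here as flat-sky circles (ra, dec, radius) in degrees; all numbers are made up for illustration:

```python
def overlaps(v1, v2):
    """True if two visit footprints (flat-sky circles) intersect."""
    (ra1, dec1, r1), (ra2, dec2, r2) = v1, v2
    return (ra1 - ra2) ** 2 + (dec1 - dec2) ** 2 <= (r1 + r2) ** 2

def can_start(visit, in_flight):
    """A visit may start only if it overlaps no in-flight visit."""
    return all(not overlaps(visit, v) for v in in_flight)

in_flight = [(10.0, 0.0, 1.75)]
print(can_start((100.0, 0.0, 1.75), in_flight))  # True: far away on the sky
print(can_start((11.0, 0.5, 1.75), in_flight))   # False: overlapping footprint
```

A real scheduler would also need spherical geometry and a queue for blocked visits, but even this toy version shows why overlap forces serialization and hurts latency.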

            smonkewitz Serge Monkewitz added a comment -

            I was told that there would be some amount of time during which the same FOV would not be revisited, but I don't recall ever getting a hard number. Once upon a time I implemented a C++ AP prototype, and it made a provision for this scenario by assuming that AP pipeline jobs would be run on a consistent machine. It recorded updates both in permanent storage and in a shared memory region, so that the latency of writing updates out to permanent storage was not on the critical path. The problem with this approach is that if you have back-to-back visits to the same part of the sky, and the write to storage for the first one fails, you can end up with inconsistent in-memory and on-disk states, where what's on disk does not reflect exactly what was used for processing. I also tried to make the permanent storage phase fast by partitioning the sky into chunks (much like Qserv) and writing out chunk deltas (to be consolidated as part of daytime processing).

            Anyway, it was fairly complicated, and I'm not sure how useful it is to look at in detail (permanent storage back then was the file system, not a database back-end), but some of the comments in the following file might be worth a skim to get some ideas:
            https://github.com/lsst-dm/legacy-ap/blob/master/include/lsst/ap/Chunk.h

            salnikov Andy Salnikov added a comment -

            Random thoughts on L1/L2 association and L1 reprocessing.

            The L1 database is "based" on some DR (and I'm not sure how it works before DR1) in the sense that DiaObject references Objects from some particular DR (which does not have to be the latest one, due to the catch-up needed after reprocessing). I think it's an obvious assumption that all referenced Objects should belong to the same DR, otherwise it's going to be a big mess. This probably has some implications for L1 reprocessing: if it runs in parallel with regular L2 DRP, then there is a question of how to associate L2 Objects when the DRP has not finished yet. I can think of a couple of options:

            • run L1 reprocessing on a given patch only after L2 reprocessing is complete for that patch and a small buffer zone around it (not sure whether something like this is possible)
            • do not associate L1 DiaObjects with L2 Objects until after the DRP has finished

            I think the latter is probably the more reliable mechanism.

            The data dependency issue (see above) is important for L1 reprocessing as well, maybe even more important than for regular processing. For regular processing there may be a guarantee of a relatively big window between re-visits of the same area, but for re-processing we want to go as fast as possible. Some smart scheduling and/or synchronization will be needed to avoid races in that case.

            There is a very nice graph of the L1 processing timeline in Mario's talk at the 2015 Bootcamp (page 35); here is a screenshot (the PowerPoint file is large and can hang a slow laptop):

            salnikov Andy Salnikov added a comment -

            At today's scipi-wg meeting (https://confluence.lsstcorp.org/display/DM/Science+Pipelines+Definition+Working+Group) there was a long and very interesting discussion about the L1 database and DIA in DRP. Here is my incomplete summary of the database-related points:

            • the current baseline, where the L1 database is replaced every year by the updated/incompatible version from the most recent DR, is seen as too disruptive and complicated
            • there is a proposal (https://confluence.lsstcorp.org/pages/viewpage.action?pageId=45580703) to replace this scheme with a never-replaced "living" L1 database:
              • the L1 database keeps all data since the very beginning; this has implications for space, of course, and needs to be clarified
              • DIAObjects in L1 are created from DIASources from the past 12 months only (sliding window) and 30 days of forced sources (not clear why such asymmetry)
              • DIASource matching only uses DIAObjects that are not older than 12 months (a DIAObject is "retired" if there are no new observations in 12 months)
              • DIAObjects are associated with L2 Objects, and this becomes even more complicated now because L1 will span many data releases (see below)
              • improvements to the L1 database will only happen in the form of better pipeline software, not by updating/replacing existing L1 data; software can be updated frequently
            • in DRP/L2 the current idea (see https://community.lsst.org/t/unifying-diaobject-and-object-in-drp/716) is that there is no separate DIAObject; instead a subset of L2 Objects is created from L2 DIASources (and marked with some flag)
            • since an L1 DIAObject links to L2 Objects, it automatically links to an L2 DIAObject (if there is one; and what if there is more than one L2 DIAObject in close vicinity?)
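The 12-month "retirement" rule above can be sketched as a simple filter over DIAObjects, keeping only those with an observation in the past year as candidates for DIASource association (field names are illustrative, not from the actual schema):

```python
from datetime import datetime, timedelta

def matchable(dia_objects, now, retire_days=365):
    """Only DIAObjects observed within the past 12 months take
    part in DIASource matching; older ones are 'retired'."""
    cutoff = now - timedelta(days=retire_days)
    return [o for o in dia_objects if o["last_obs"] >= cutoff]

objs = [
    {"id": 1, "last_obs": datetime(2014, 1, 1)},  # retired: no recent observations
    {"id": 2, "last_obs": datetime(2016, 3, 1)},  # still active
]
print([o["id"] for o in matchable(objs, datetime(2016, 4, 1))])  # [2]
```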

            L2 linking:

            • the major issue now is that a DIAObject will need to be linked to Objects in more than one DR. The idea is that people will do analysis on the L1 database and will want to use the most recent L2 DR instead of the DR current when the DIAObject was initially created. This likely means that we need to link existing DIAObjects to every new DR as it appears (unless position-based matching is a better tool)
            • I can imagine that when a DIAObject is created we only associate it with the L2 Object from the current DR and not any old DR; later, when a new DR comes out, we add associations to that DR. I'd think that only the most recent DIAObject version needs to be associated with a new DR, but there may be other ideas regarding this

            Kian-Tat Lim, please correct anything that I misunderstood or forgot to include
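The "add links on each new DR" idea above could be sketched like this: when a new DR appears, only the most recent version of each DIAObject gets an association to its matched Object in that DR. Here match_fn stands in for a hypothetical matcher (e.g. position-based); everything in this sketch is illustrative:

```python
def link_to_new_dr(latest_versions, dr_id, match_fn):
    """Build (diaobject_id, dr_id, object_id) association rows for
    one new data release, skipping objects with no match."""
    links = []
    for obj in latest_versions:
        target = match_fn(obj, dr_id)
        if target is not None:
            links.append((obj["id"], dr_id, target))
    return links

# Dummy matcher: pretend every DIAObject matches Object id 1000 + its own id.
match = lambda obj, dr: 1000 + obj["id"]
print(link_to_new_dr([{"id": 1}, {"id": 2}], "DR2", match))
# [(1, 'DR2', 1001), (2, 'DR2', 1002)]
```

Accumulating such rows per DR would let queries pick the association for whichever DR the analysis targets, at the cost of one matching pass per release.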

            salnikov Andy Salnikov added a comment - - edited

            Looking closer at the sizing model for the L1 database. LDM-141 is the main source of the numbers, and the LDM-135 table is based on numbers from that spreadsheet (and we discovered that the LDM-135 table has a typo).

            After talking to K-T and trying to better understand the model used to derive those numbers, I managed to extract some figures from LDM-141 and produce a few plots for the current baseline and also for the newly proposed L1 approach (living database). I re-coded the model in an IPython notebook (Excel is too heavy for me) and could play with it to produce more details. Here are the plots:

            Current baseline

            In this model we "reset" the DiaObject table after each DR to keep only the most recent object version. The DR "release time" is assumed to be several months later than the DR "cutoff time" (also proportional to the data volume). Note that the sizes below reflect only the current L1 database size; they do not include the size of the old archived copies.

            Number of rows per table as a function of time:

            Size of tables as a function of time (row size is taken from LDM-141, which may be underestimated):

            New proposal

            In the new L1 design we do not reset DiaObjects but continue accumulating forever.

            Number of rows per table as a function of time:

            Size of tables as a function of time (row size is taken from LDM-141, which may be underestimated):

            Note also that the sizes do not include any indexing or storage overhead.
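The difference between the two growth models can be restated as a tiny calculation, with placeholder rates (the real per-month rates come from LDM-141; the numbers here are arbitrary illustration only):

```python
def diaobject_rows_baseline(months, distinct_objects, versions_per_month, dr_period=12):
    """Baseline: DiaObject version rows accumulate, then collapse
    to one row per distinct object at each DR switch."""
    since_reset = months % dr_period
    return distinct_objects + since_reset * versions_per_month

def diaobject_rows_living(months, distinct_objects, versions_per_month):
    """'Living' database: versions accumulate forever, never reset."""
    return distinct_objects + months * versions_per_month

print(diaobject_rows_baseline(24, 100, 10))  # 100: table just reset at month 24
print(diaobject_rows_living(24, 100, 10))    # 340: keeps growing without bound
```

Source tables grow the same way under both models; it is the DiaObject version history where the living-database proposal changes the long-term space picture.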

            salnikov Andy Salnikov added a comment -

            Closing, nothing to review yet.


              People

              • Assignee:
                salnikov Andy Salnikov
                Reporter:
                fritzm Fritz Mueller
                Watchers:
                Andy Salnikov, Fritz Mueller, Jacek Becla, Kian-Tat Lim, Serge Monkewitz
