  Data Management / DM-21972

Create simple code to simulate script loading


    Details

    • Story Points:
      2
    • Sprint:
      TSSW Sprint - Oct 28 - Nov 10
    • Team:
      Telescope and Site

      Description

      Create a simple set of code that simulates loading scripts and can be used to quickly try different solutions to speed up script loading.

      This continues the work started in DM-21175 to speed up script loading.
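A minimal sketch of the kind of harness the description asks for, assuming each script can be stood in for by a child Python process that simply sleeps for a fixed time to mimic domain and topic creation. This is not the code in the wait_for_historical_data repository; `launch_one`, `launch_all`, and the sleep-based child are hypothetical.

```python
import asyncio
import sys
import time

# Hypothetical stand-in for a script: a child Python process that sleeps
# for the number of seconds given on its command line.
CHILD_CODE = "import sys, time; time.sleep(float(sys.argv[1]))"


async def launch_one(index: int, load_delay: float) -> tuple[int, float]:
    """Launch one simulated script; return (index, seconds until it exits)."""
    t0 = time.monotonic()
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE, str(load_delay)
    )
    await proc.wait()
    return index, time.monotonic() - t0


async def launch_all(num_scripts: int, load_delay: float) -> list[tuple[int, float]]:
    """Launch all simulated scripts concurrently and time each one."""
    tasks = [launch_one(i + 1, load_delay) for i in range(num_scripts)]
    results = await asyncio.gather(*tasks)
    # Sort by elapsed time, like the timing tables in this ticket.
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    for index, elapsed in asyncio.run(launch_all(num_scripts=9, load_delay=0.5)):
        print(f"Script {index:2d} {elapsed:6.2f}")
```

Swapping the sleep for real DDS setup code is what would make such a harness useful for comparing OpenSplice configurations.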


            Activity

            rowen Russell Owen added a comment -

            Could you please sanity check the code and configuration at https://github.com/r-owen/wait_for_historical_data

            dmills Dave Mills added a comment -

            Tests of easily repeatable configurations established.
            It might be worth trying again with a permanent durability service process in place on the node.

            rowen Russell Owen added a comment -

            Somebody from ADLink had a suggestion that seems to do the trick. Here are the new results. I will attach the configuration file and paste the explanation in another comment. Based on these results I believe I should change the script queue to load only a few scripts (a configurable number). We don't need to load more, and there's no point in having too many processes running. The rest will be queued without being launched.

            [saluser@a3e36235cbda wait_for_historical_data]$ bash run_variants.sh 
            INFO:script_runner.py:OSPL_URI=file:///home/saluser/tsrepos/examples/dds/wait_for_historical_data/ospl_standard.xml
            INFO:script_runner.py:Took 1.13 to make domain 0.01 to make components 0.29 to make topics
            INFO:script_runner.py:Took 0.16 seconds to launch 9 scripts
            source       local    make_domain  make_components  make_topics before_start start        wait_history  read_history
            Script  5     9.69    1.12         0.01             2.35        0.05         0.00         4.53          0.00
            Script  3     9.74    1.14         0.00             2.28        0.06         0.00         4.56          0.00
            Script  4     9.77    1.12         0.00             2.35        0.12         0.00         4.65          0.00
            Script  7     9.84    1.12         0.01             2.18        0.08         0.00         4.86          0.00
            Script  1     9.84    1.12         0.00             2.46        0.04         0.00         4.63          0.00
            Script  2     9.91    1.15         0.00             2.38        0.03         0.00         4.69          0.00
            Script  8     9.91    1.15         0.00             2.30        0.06         0.00         4.75          0.00
            Script  9     9.95    1.13         0.01             2.41        0.02         0.00         4.69          0.00
            Script  6    10.08    1.12         0.00             2.37        0.05         0.00         4.91          0.00
             
            INFO:script_runner.py:OSPL_URI=file:///home/saluser/tsrepos/examples/dds/wait_for_historical_data/ospl_adlink.xml
            INFO:script_runner.py:Took 1.12 to make domain 0.00 to make components 0.23 to make topics
            INFO:script_runner.py:Took 0.08 seconds to launch 9 scripts
            source       local    make_domain  make_components  make_topics before_start start        wait_history  read_history
            Script  4     5.08    1.12         0.00             2.08        0.05         0.00         0.02          0.00
            Script  1     5.29    1.13         0.01             2.20        0.06         0.00         0.06          0.00
            Script  8     5.32    1.23         0.00             2.14        0.06         0.00         0.05          0.01
            Script  9     5.41    1.13         0.01             2.20        0.06         0.00         0.11          0.00
            Script  6     5.62    1.12         0.00             2.12        0.08         0.00         0.53          0.00
            Script  5     5.69    1.21         0.00             2.09        0.04         0.00         0.31          0.00
            Script  2     5.78    1.12         0.01             2.16        0.07         0.00         0.68          0.00
            Script  7     5.82    1.21         0.00             2.03        0.05         0.00         0.52          0.00
            Script  3    12.21    1.18         0.00             2.26        0.04         0.00         6.77          0.00
            [saluser@a3e36235cbda wait_for_historical_data]$ 
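To compare the two runs at a glance, a small script can compute per-column medians from the rows above (values presumably in seconds). The median is used because script 3 in the ospl_adlink.xml run is a clear outlier. The helper names are hypothetical; the row data is copied from the output above.

```python
import statistics

# Column names from the header row of the timing tables.
COLUMNS = ["local", "make_domain", "make_components", "make_topics",
           "before_start", "start", "wait_history", "read_history"]

STANDARD = """
Script 5 9.69 1.12 0.01 2.35 0.05 0.00 4.53 0.00
Script 3 9.74 1.14 0.00 2.28 0.06 0.00 4.56 0.00
Script 4 9.77 1.12 0.00 2.35 0.12 0.00 4.65 0.00
Script 7 9.84 1.12 0.01 2.18 0.08 0.00 4.86 0.00
Script 1 9.84 1.12 0.00 2.46 0.04 0.00 4.63 0.00
Script 2 9.91 1.15 0.00 2.38 0.03 0.00 4.69 0.00
Script 8 9.91 1.15 0.00 2.30 0.06 0.00 4.75 0.00
Script 9 9.95 1.13 0.01 2.41 0.02 0.00 4.69 0.00
Script 6 10.08 1.12 0.00 2.37 0.05 0.00 4.91 0.00
"""

ADLINK = """
Script 4 5.08 1.12 0.00 2.08 0.05 0.00 0.02 0.00
Script 1 5.29 1.13 0.01 2.20 0.06 0.00 0.06 0.00
Script 8 5.32 1.23 0.00 2.14 0.06 0.00 0.05 0.01
Script 9 5.41 1.13 0.01 2.20 0.06 0.00 0.11 0.00
Script 6 5.62 1.12 0.00 2.12 0.08 0.00 0.53 0.00
Script 5 5.69 1.21 0.00 2.09 0.04 0.00 0.31 0.00
Script 2 5.78 1.12 0.01 2.16 0.07 0.00 0.68 0.00
Script 7 5.82 1.21 0.00 2.03 0.05 0.00 0.52 0.00
Script 3 12.21 1.18 0.00 2.26 0.04 0.00 6.77 0.00
"""


def parse_rows(text: str) -> dict[int, dict[str, float]]:
    """Map script index -> {column name: value} for each 'Script N ...' row."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # parts[0] is the literal "Script", parts[1] is the script index.
        rows[int(parts[1])] = dict(zip(COLUMNS, map(float, parts[2:])))
    return rows


def column_median(rows: dict[int, dict[str, float]], column: str) -> float:
    return statistics.median([row[column] for row in rows.values()])


def slowest(rows: dict[int, dict[str, float]]) -> int:
    """Index of the script with the largest overall 'local' time."""
    return max(rows, key=lambda index: rows[index]["local"])


if __name__ == "__main__":
    for name, text in (("standard", STANDARD), ("adlink", ADLINK)):
        rows = parse_rows(text)
        print(f"{name}: median local {column_median(rows, 'local'):.2f}, "
              f"median wait_history {column_median(rows, 'wait_history'):.2f}, "
              f"slowest script {slowest(rows)}")
```

On these rows the median local time drops from 9.84 to 5.62 and the median wait_history from 4.69 to 0.31, while script 3 (12.21) shows the ADLink settings do not remove every slow start.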
            

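The proposal to keep only a configurable number of scripts launched, with the rest queued without a process, could look roughly like the sketch below. `ScriptQueueSketch` and `max_loaded` are hypothetical names, not the actual script queue code, and "launching" is reduced to moving a script ID from one deque to another.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ScriptQueueSketch:
    """Hypothetical queue that keeps at most ``max_loaded`` scripts launched.

    Scripts beyond the limit wait in ``pending`` without a running process.
    """

    max_loaded: int = 2
    loaded: deque = field(default_factory=deque)
    pending: deque = field(default_factory=deque)

    def add(self, script_id: int) -> None:
        """Queue a script; launch it right away only if there is room."""
        self.pending.append(script_id)
        self._fill()

    def run_next(self) -> int:
        """Take the next launched script to run, then launch a pending one.

        Raises IndexError if no script is loaded.
        """
        script_id = self.loaded.popleft()
        self._fill()
        return script_id

    def _fill(self) -> None:
        # "Launch" pending scripts (here: move them) until the limit is reached.
        while self.pending and len(self.loaded) < self.max_loaded:
            self.loaded.append(self.pending.popleft())


if __name__ == "__main__":
    queue = ScriptQueueSketch(max_loaded=2)
    for script_id in range(1, 6):
        queue.add(script_id)
    print(list(queue.loaded), list(queue.pending))  # [1, 2] [3, 4, 5]
```

The point of the design is that the number of concurrently starting DDS participants stays bounded no matter how long the queue gets.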
            rowen Russell Owen added a comment - edited

            Can you try the attached ospl_allchanges.xml?
            With this config I managed to get these numbers:

            INFO:script_runner.py:OSPL_URI=file:///home/thijs/wait_for_historical_data/ospl_allchanges.xml
            INFO:script_runner.py:Took 1.12 to make domain 0.00 to make components 0.12 to make topics
            INFO:script_runner.py:Took 0.08 seconds to launch 9 scripts
            source       local    make_domain  make_components  make_topics before_start start        wait_history  read_history
            Script  9     5.08    1.29         0.00             3.02        0.05         0.00         0.09          0.00
            Script  2     5.08    1.24         0.00             2.75        0.05         0.00         0.60          0.00
            Script  4     5.08    1.33         0.00             3.18        0.03         0.00         0.16          0.00
            Script  5     5.08    1.28         0.00             3.27        0.04         0.00         0.09          0.00
            Script  3     5.10    1.24         0.00             3.11        0.07         0.00         0.47          0.00
            Script  1     5.11    1.25         0.00             3.05        0.04         0.00         0.58          0.00
            Script  6     5.17    1.27         0.00             2.93        0.06         0.00         0.62          0.00
            Script  8     5.29    1.25         0.00             3.14        0.04         0.00         0.55          0.00
            Script  7     5.44    1.31         0.00             3.21        0.05         0.00         0.59          0.00
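With ospl_allchanges.xml, wait_history no longer dominates. A per-phase median over the rows above suggests make_topics is now the largest cost. This run appears to come from a different machine (note the /home/thijs path), so its absolute numbers may not be directly comparable with the earlier runs. The helper names are hypothetical; the data is copied from the output above.

```python
import statistics

COLUMNS = ["local", "make_domain", "make_components", "make_topics",
           "before_start", "start", "wait_history", "read_history"]

ALLCHANGES = """
Script 9 5.08 1.29 0.00 3.02 0.05 0.00 0.09 0.00
Script 2 5.08 1.24 0.00 2.75 0.05 0.00 0.60 0.00
Script 4 5.08 1.33 0.00 3.18 0.03 0.00 0.16 0.00
Script 5 5.08 1.28 0.00 3.27 0.04 0.00 0.09 0.00
Script 3 5.10 1.24 0.00 3.11 0.07 0.00 0.47 0.00
Script 1 5.11 1.25 0.00 3.05 0.04 0.00 0.58 0.00
Script 6 5.17 1.27 0.00 2.93 0.06 0.00 0.62 0.00
Script 8 5.29 1.25 0.00 3.14 0.04 0.00 0.55 0.00
Script 7 5.44 1.31 0.00 3.21 0.05 0.00 0.59 0.00
"""


def parse_rows(text: str) -> dict[int, dict[str, float]]:
    """Map script index -> {column name: value} for each 'Script N ...' row."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split()
        rows[int(parts[1])] = dict(zip(COLUMNS, map(float, parts[2:])))
    return rows


def phase_medians(rows: dict[int, dict[str, float]]) -> dict[str, float]:
    """Median of every phase column, i.e. everything except overall 'local'."""
    return {phase: statistics.median([row[phase] for row in rows.values()])
            for phase in COLUMNS[1:]}


if __name__ == "__main__":
    medians = phase_medians(parse_rows(ALLCHANGES))
    for phase, value in sorted(medians.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{phase:16s} {value:.2f}")
```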
            

            Some background information on the options to tweak durability:

            Performance impact at startup
            The performance at boot time is largely determined by the InitialDiscoveryPeriod, the ExpiryTime, the amount of data that needs to be retrieved, and the speed of the network channel used to retrieve it. When persistence is configured, disk performance may also affect overall startup performance.
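As a back-of-the-envelope reading of the factors above, assuming the phases run one after another and ignoring protocol overhead, startup takes at least the discovery period, plus master selection, plus the time to move the data over the network (and through the disk when persistence is configured). The function name and the example numbers are hypothetical, not OpenSplice defaults.

```python
from typing import Optional


def estimate_startup_seconds(
    initial_discovery_period: float,
    master_selection: float,
    data_bytes: int,
    network_bytes_per_s: float,
    disk_bytes_per_s: Optional[float] = None,
) -> float:
    """Crude lower bound on durability-service startup time, in seconds.

    Assumes the phases are sequential and ignores protocol overhead.
    """
    # 1. Look for fellows for the whole InitialDiscoveryPeriod.
    total = initial_discovery_period
    # 2. Choose a master for the namespace (depends on the algorithm used).
    total += master_selection
    # 3. Retrieve the data over the network channel.
    total += data_bytes / network_bytes_per_s
    # 4. With persistence configured, the data also passes through the disk.
    if disk_bytes_per_s is not None:
        total += data_bytes / disk_bytes_per_s
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 2 s discovery, 1 s selection, 10 MB at 10 MB/s.
    print(estimate_startup_seconds(2.0, 1.0, 10_000_000, 10_000_000.0))  # 4.0
```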

            When a durability service starts, the first thing it does is look for other durability services (called fellows) that are present on the domain. The InitialDiscoveryPeriod specifies the time to look for other fellows.

            When this time has expired, a master durability service is chosen for each namespace. The master durability service gathers all available data for a namespace and distributes it to all the other durability services that also have that namespace. There are basically two algorithms to determine a master:

            The legacy algorithm, which is used when the policy for the namespace has no masterPriority attribute or when that attribute is 255. With this algorithm there is a negotiation phase in which the fellows determine which one of them is selected as master (using a leader election protocol). A candidate master is first proposed, and the master is confirmed only if all durability services agree. If no agreement is reached, another round of proposals is started. In the best case, when agreement is reached in the first round, this algorithm takes Network/ExpiryTime; if more rounds are needed, it takes longer.

            The new master selection algorithm, which is used when the policy that applies to the namespace sets masterPriority to a value other than 255. With this algorithm a master is selected immediately. This works well when the master will not change once it is chosen, but may lead to additional alignments when the master can change (e.g., because a "better" master with a higher priority arrives later and a change of master is needed).
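The choice between the two algorithms, and why a late-arriving higher-priority fellow forces a new master, can be sketched as follows. This is an illustration, not OpenSplice code: it assumes each legacy proposal round costs about one ExpiryTime, that a higher masterPriority means a "better" master, and it breaks ties by name arbitrarily. All function names are hypothetical.

```python
from typing import Dict, Optional

LEGACY_PRIORITY = 255  # absent or 255 -> legacy leader-election algorithm


def uses_legacy_algorithm(master_priority: Optional[int]) -> bool:
    """True if the namespace policy selects the legacy (negotiated) algorithm."""
    return master_priority is None or master_priority == LEGACY_PRIORITY


def min_selection_seconds(master_priority: Optional[int], expiry_time: float,
                          rounds: int = 1) -> float:
    """Rough lower bound on master-selection time.

    Assumes each legacy proposal round costs one ExpiryTime; the
    priority-based algorithm selects a master immediately.
    """
    if uses_legacy_algorithm(master_priority):
        return rounds * expiry_time
    return 0.0


def select_master(priorities: Dict[str, int]) -> str:
    """Pick the fellow with the highest masterPriority (ties broken by name)."""
    return max(priorities, key=lambda name: (priorities[name], name))


if __name__ == "__main__":
    fellows = {"node_a": 1, "node_b": 5}
    print(select_master(fellows))  # node_b
    fellows["node_c"] = 9          # a "better" fellow arrives later
    print(select_master(fellows))  # node_c: master changes, data realigned
```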

            When a master has been selected then the master has to acquire all the available data for

            Best Regards,

            Thijs Sassen

            rowen Russell Owen added a comment -

            On that same ADLink ticket I described our architecture and asked for advice on OpenSplice configuration options. In particular, I asked whether the changes they suggested are appropriate for our production environment, and whether they can recommend similar settings for our long-running CSCs. I'll report back what I hear, and if they suggest changes I'll file a ticket with the recommended changes.


              People

              Assignee:
              rowen Russell Owen
              Reporter:
              rowen Russell Owen
              Reviewers:
              Dave Mills
              Watchers:
              Dave Mills, Russell Owen, Tiago Ribeiro
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved:
