
#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
2
• Sprint:
TSSW Sprint - Oct 28 - Nov 10
• Team:
Telescope and Site

#### Description

Create a simple set of code that simulates loading scripts and can be used to quickly try different solutions to speed up script loading.

This continues the work started in DM-21175 to speed up script loading.
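As a rough sketch of what such a harness could look like (this is not the actual `wait_for_historical_data` code; the function name and phases here are illustrative assumptions), one can launch several trivial child processes and time how long the launch phase takes:

```python
# Minimal sketch of a script-loading timing harness.
# Assumption: each "script" is stood in for by a trivial child
# Python process; the real harness times DDS startup phases too.
import subprocess
import sys
import time

def launch_scripts(n):
    """Launch n trivial child processes and return the launch duration."""
    t0 = time.monotonic()
    procs = [
        subprocess.Popen([sys.executable, "-c", "pass"])
        for _ in range(n)
    ]
    launch_duration = time.monotonic() - t0
    # Wait for the children so the measurement run exits cleanly.
    for p in procs:
        p.wait()
    return launch_duration

if __name__ == "__main__":
    dt = launch_scripts(9)
    print(f"Took {dt:.2f} seconds to launch 9 scripts")
```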


#### Activity

Russell Owen added a comment -

Could you please sanity check the code and configuration at https://github.com/r-owen/wait_for_historical_data

Dave Mills added a comment -

Tests of easily repeatable configurations established.
It might be worth trying again with a permanent durability service process in place on the node.

Russell Owen added a comment -

Somebody from ADLink had a suggestion that seems to do the trick. Here are the new results. I will attach the configuration file and paste the explanation in another comment. Based on these results I believe I should change the script queue to load just a few scripts (a configurable number) at a time. We don't need to do more, and there's no point having too many processes running. The rest will be queued without launching.

```
[saluser@a3e36235cbda wait_for_historical_data]$ bash run_variants.sh
INFO:script_runner.py:OSPL_URI=file:///home/saluser/tsrepos/examples/dds/wait_for_historical_data/ospl_standard.xml
INFO:script_runner.py:Took 1.13 to make domain 0.01 to make components 0.29 to make topics
INFO:script_runner.py:Took 0.16 seconds to launch 9 scripts
source    local  make_domain  make_components  make_topics  before_start  start  wait_history  read_history
Script 5   9.69         1.12             0.01         2.35          0.05   0.00          4.53          0.00
Script 3   9.74         1.14             0.00         2.28          0.06   0.00          4.56          0.00
Script 4   9.77         1.12             0.00         2.35          0.12   0.00          4.65          0.00
Script 7   9.84         1.12             0.01         2.18          0.08   0.00          4.86          0.00
Script 1   9.84         1.12             0.00         2.46          0.04   0.00          4.63          0.00
Script 2   9.91         1.15             0.00         2.38          0.03   0.00          4.69          0.00
Script 8   9.91         1.15             0.00         2.30          0.06   0.00          4.75          0.00
Script 9   9.95         1.13             0.01         2.41          0.02   0.00          4.69          0.00
Script 6  10.08         1.12             0.00         2.37          0.05   0.00          4.91          0.00

INFO:script_runner.py:OSPL_URI=file:///home/saluser/tsrepos/examples/dds/wait_for_historical_data/ospl_adlink.xml
INFO:script_runner.py:Took 1.12 to make domain 0.00 to make components 0.23 to make topics
INFO:script_runner.py:Took 0.08 seconds to launch 9 scripts
source    local  make_domain  make_components  make_topics  before_start  start  wait_history  read_history
Script 4   5.08         1.12             0.00         2.08          0.05   0.00          0.02          0.00
Script 1   5.29         1.13             0.01         2.20          0.06   0.00          0.06          0.00
Script 8   5.32         1.23             0.00         2.14          0.06   0.00          0.05          0.01
Script 9   5.41         1.13             0.01         2.20          0.06   0.00          0.11          0.00
Script 6   5.62         1.12             0.00         2.12          0.08   0.00          0.53          0.00
Script 5   5.69         1.21             0.00         2.09          0.04   0.00          0.31          0.00
Script 2   5.78         1.12             0.01         2.16          0.07   0.00          0.68          0.00
Script 7   5.82         1.21             0.00         2.03          0.05   0.00          0.52          0.00
Script 3  12.21         1.18             0.00         2.26          0.04   0.00          6.77          0.00
[saluser@a3e36235cbda wait_for_historical_data]$
```
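The "launch just a few scripts at a time" idea above can be sketched with an `asyncio.Semaphore`. The class and function names here are hypothetical stand-ins, not the real ts_scriptqueue implementation:

```python
import asyncio

# Sketch: allow at most `max_loading` script loads to run concurrently;
# the rest wait in the queue without launching. BoundedScriptLauncher
# and fake_load are invented names for illustration.
class BoundedScriptLauncher:
    def __init__(self, max_loading=2):
        # Limits how many "loads" may be in flight at once.
        self._sem = asyncio.Semaphore(max_loading)

    async def launch(self, coro_factory):
        # Wait for a free slot, then run the load to completion.
        async with self._sem:
            return await coro_factory()

async def demo():
    launcher = BoundedScriptLauncher(max_loading=2)

    async def fake_load(i):
        await asyncio.sleep(0.01)  # stand-in for process startup
        return i

    # Queue 9 loads; only 2 run at any moment, the rest wait for a slot.
    return await asyncio.gather(
        *(launcher.launch(lambda i=i: fake_load(i)) for i in range(9))
    )

print(asyncio.run(demo()))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8]
```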

Russell Owen added a comment - - edited

Can you try the attached ospl_allchanges.xml?
With this config I managed to get these numbers:

```
INFO:script_runner.py:OSPL_URI=file:///home/thijs/wait_for_historical_data/ospl_allchanges.xml
INFO:script_runner.py:Took 1.12 to make domain 0.00 to make components 0.12 to make topics
INFO:script_runner.py:Took 0.08 seconds to launch 9 scripts
source    local  make_domain  make_components  make_topics  before_start  start  wait_history  read_history
Script 9   5.08         1.29             0.00         3.02          0.05   0.00          0.09          0.00
Script 2   5.08         1.24             0.00         2.75          0.05   0.00          0.60          0.00
Script 4   5.08         1.33             0.00         3.18          0.03   0.00          0.16          0.00
Script 5   5.08         1.28             0.00         3.27          0.04   0.00          0.09          0.00
Script 3   5.10         1.24             0.00         3.11          0.07   0.00          0.47          0.00
Script 1   5.11         1.25             0.00         3.05          0.04   0.00          0.58          0.00
Script 6   5.17         1.27             0.00         2.93          0.06   0.00          0.62          0.00
Script 8   5.29         1.25             0.00         3.14          0.04   0.00          0.55          0.00
Script 7   5.44         1.31             0.00         3.21          0.05   0.00          0.59          0.00
```

Some background information on the options to tweak durability:

Performance impact at startup
The performance at boot time is for a large part determined by the InitialDiscoveryPeriod, the ExpiryTime, the amount of data that needs to be retrieved, and the speed of the network channel that is used to retrieve the data. When persistency is configured as required, the performance of the disks may also determine the overall performance at startup.

When a durability service starts, the first thing it does is look for other durability services (called fellows) that are present on the domain. The InitialDiscoveryPeriod specifies the time to look for other fellows.

When this time has expired, a master durability service will be chosen for each namespace. The master durability service is the durability service that gathers all available data for a namespace and distributes it to all the other durability services that have a namespace for it. Basically, there are two algorithms to determine a master:

The legacy algorithm, which is configured when the policy for the namespace has NO masterPriority attribute or when the attribute is 255. When this algorithm is used there is a negotiation phase between the fellows to determine which one of them is selected as master (using a leader election protocol). In this protocol a proposal is first made for a candidate master, and the master is confirmed only if all durability services agree. If no agreement is reached then another round of proposals is started. This algorithm takes at best the Network/ExpiryTime when agreement is reached the first time, but may take longer depending on the number of rounds needed.

If the new master selection algorithm is used (which is configured when a masterPriority != 255 is used for the policy that applies to the namespace) then a master will be selected immediately. This is good in situations where the master will not change once it is chosen, but may lead to additional alignments in situations where the master can change (e.g., because a "better" master with a higher priority arrives later and a change of master is needed).
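The priority-based selection described above can be illustrated with a toy function (invented names, not the OpenSplice API): priority 255 means "use the legacy election protocol", and any other value lets a master be picked immediately:

```python
# Toy illustration of priority-based master selection.
# Fellow records and field names are invented for illustration.
def select_master(fellows):
    """Return the name of the selected master, or None if every fellow
    uses the legacy election protocol (masterPriority == 255)."""
    eligible = [f for f in fellows if f["masterPriority"] != 255]
    if not eligible:
        return None  # fall back to the legacy multi-round election
    # Highest priority wins, immediately, with no negotiation rounds.
    return max(eligible, key=lambda f: f["masterPriority"])["name"]

fellows = [
    {"name": "nodeA", "masterPriority": 10},
    {"name": "nodeB", "masterPriority": 20},
    {"name": "nodeC", "masterPriority": 255},  # legacy-only fellow
]
print(select_master(fellows))  # prints "nodeB"
```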

When a master has been selected then the master has to acquire all the available data for

Best Regards,

Thijs Sassen

Russell Owen added a comment -

On that same ADLink ticket I described our architecture and asked for advice on OpenSplice configuration options. In particular: are the changes they suggested appropriate for our production environment, and can they recommend similar settings for our long-running CSCs? I'll report back what I hear, and if they suggest changes I'll file a ticket with those recommended changes.


#### People

Assignee:
Russell Owen
Reporter:
Russell Owen
Reviewers:
Dave Mills
Watchers:
Dave Mills, Russell Owen, Tiago Ribeiro