Data Management / DM-12099

{squash,bokeh}.lsst.codes is down

    Details

      Description

      According to Nagios, squash/bokeh have been down for nearly 5 days now:

      (for 4d 23h 9m 46s) (Has been acknowledged)
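
The Nagios duration above works out to just under five days. A minimal Python sketch of that arithmetic follows. It assumes the status string only uses the d/h/m/s units shown here. `parse_nagios_duration` is a hypothetical helper, not part of Nagios.

```python
import re
from datetime import timedelta

# Assumption: durations look like "4d 23h 9m 46s", with any unit optional.
_DURATION_RE = re.compile(r"(?:(\d+)d)?\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?")


def parse_nagios_duration(text: str) -> timedelta:
    """Convert a Nagios-style 'Xd Yh Zm Ws' string into a timedelta."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


outage = parse_nagios_duration("4d 23h 9m 46s")
print(outage)                                    # 4 days, 23:09:46
print(round(outage.total_seconds() / 86400, 2))  # 4.97
```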


          Activity

          jhoblitt Joshua Hoblitt added a comment -

          Note that this is also causing the Jenkins validate_drp job to fail, as it can't push results to squash. E.g. https://ci.lsst.codes/job/sqre/job/validate_drp/1076/

          jhoblitt Joshua Hoblitt added a comment -

          The squash EC2 instance isn't responding to SSH connection attempts. I'm going to reboot it and, if that fails, rebuild the instance.
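
Before rebooting, it can help to tell two failure modes apart:

* the TCP port is closed, or
* the kernel accepts the connection but sshd never answers.

The second case can happen on a host that is swapping heavily. Below is a minimal sketch using only the Python standard library. `ssh_banner` is a hypothetical helper, and the hostname in the usage line is a placeholder, not the real squash instance.

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the server's SSH identification line, or None if none arrives.

    A successful TCP connect alone is not conclusive: the kernel completes the
    handshake for a listening socket even when sshd is too starved to reply.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # create_connection applies the timeout to later recv() calls too.
            data = sock.recv(256)
    except OSError:
        # Refused connections, DNS failures and timeouts all subclass OSError.
        return None
    lines = data.splitlines()
    # Simplification: RFC 4253 lets a server send other lines before the
    # "SSH-" identification string; this sketch only checks the first line.
    if not lines or not lines[0].startswith(b"SSH-"):
        return None
    return lines[0].decode("ascii", errors="replace")


# Hypothetical usage; replace the placeholder with the real instance address.
# print(ssh_banner("squash.example.invalid"))
```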

          jhoblitt Joshua Hoblitt added a comment -

          Stopping and starting the instance seems to have brought it back. From Sept 28th through Sept 29th, the OOM killer was constantly being triggered on a uwsgi process:

          Sep 29 13:53:43 jenkins-squash kernel: uwsgi invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
          Sep 29 13:53:43 jenkins-squash kernel: uwsgi cpuset=/ mems_allowed=0
          Sep 29 13:53:43 jenkins-squash kernel: CPU: 1 PID: 31130 Comm: uwsgi Not tainted 3.10.0-229.20.1.el7.x86_64 #1
          Sep 29 13:53:43 jenkins-squash kernel: Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
          Sep 29 13:53:43 jenkins-squash kernel: ffff8800369ab8e0 00000000a87b6ab5 ffff8800e763ba58 ffffffff816045b6
          Sep 29 13:53:43 jenkins-squash kernel: ffff8800e763bae8 ffffffff815ff57f ffff8800e75cfa60 ffff8800e75cfa78
          Sep 29 13:53:43 jenkins-squash kernel: 05a0351200000202 fbfeeffb00000000 0000000000000006 ffffffff81117b03
          Sep 29 13:53:43 jenkins-squash kernel: Call Trace:
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff816045b6>] dump_stack+0x19/0x1b
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff815ff57f>] dump_header+0x8e/0x214
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff81117b03>] ? proc_do_uts_string+0xf3/0x130
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff8115a44e>] oom_kill_process+0x24e/0x3b0
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff8115ac76>] out_of_memory+0x4b6/0x4f0
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff81160e35>] __alloc_pages_nodemask+0xa95/0xb90
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff811a28aa>] alloc_pages_vma+0x9a/0x140
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff81182f6f>] handle_mm_fault+0x9ef/0xd60
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff8160f866>] __do_page_fault+0x156/0x520
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff81187cd9>] ? vma_merge+0x229/0x330
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff81188e79>] ? do_brk+0x209/0x330
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff8160fc4a>] do_page_fault+0x1a/0x70
          Sep 29 13:53:43 jenkins-squash kernel: [<ffffffff8160be88>] page_fault+0x28/0x30
          Sep 29 13:53:43 jenkins-squash kernel: Mem-Info:
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 DMA per-cpu:
          Sep 29 13:53:43 jenkins-squash kernel: CPU    0: hi:    0, btch:   1 usd:   0
          Sep 29 13:53:43 jenkins-squash kernel: CPU    1: hi:    0, btch:   1 usd:   0
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 DMA32 per-cpu:
          Sep 29 13:53:43 jenkins-squash kernel: CPU    0: hi:  186, btch:  31 usd:   0
          Sep 29 13:53:43 jenkins-squash kernel: CPU    1: hi:  186, btch:  31 usd:   0
          Sep 29 13:53:43 jenkins-squash kernel: active_anon:732973 inactive_anon:149209 isolated_anon:0
           active_file:1 inactive_file:1 isolated_file:0
           unevictable:0 dirty:0 writeback:0 unstable:0
           free:14839 slab_reclaimable:4563 slab_unreclaimable:7677
           mapped:1408 shmem:1449 pagetables:3575 bounce:0
           free_cma:0
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 DMA free:14528kB min:192kB low:240kB high:288kB active_anon:324kB inactive_anon:808kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:84kB shmem:84kB slab_reclaimable:84kB slab_unreclaimable:60kB kernel_stack:32kB pagetables:24kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
          Sep 29 13:53:43 jenkins-squash kernel: lowmem_reserve[]: 0 3586 3586 3586
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 DMA32 free:44828kB min:44860kB low:56072kB high:67288kB active_anon:2931568kB inactive_anon:596028kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3673728kB mlocked:0kB dirty:0kB writeback:0kB mapped:5548kB shmem:5712kB slab_reclaimable:18168kB slab_unreclaimable:30648kB kernel_stack:2640kB pagetables:14276kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:230 all_unreclaimable? yes
          Sep 29 13:53:43 jenkins-squash kernel: lowmem_reserve[]: 0 0 0 0
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 DMA: 10*4kB (UEM) 5*8kB (UE) 5*16kB (UM) 1*32kB (E) 0*64kB 2*128kB (EM) 3*256kB (UEM) 2*512kB (EM) 2*1024kB (UE) 3*2048kB (EMR) 1*4096kB (M) = 14528kB
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 DMA32: 919*4kB (UEM) 146*8kB (UEM) 535*16kB (UE) 348*32kB (UEM) 157*64kB (UEM) 40*128kB (UE) 2*256kB (UM) 1*512kB (M) 4*1024kB (M) 0*2048kB 0*4096kB = 44828kB
          Sep 29 13:53:43 jenkins-squash kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
          Sep 29 13:53:43 jenkins-squash kernel: 19585 total pagecache pages
          Sep 29 13:53:43 jenkins-squash kernel: 18134 pages in swap cache
          Sep 29 13:53:43 jenkins-squash kernel: Swap cache stats: add 19713833, delete 19695699, find 10849744/12120331
          Sep 29 13:53:43 jenkins-squash kernel: Free swap  = 0kB
          Sep 29 13:53:43 jenkins-squash kernel: Total swap = 1048572kB
          Sep 29 13:53:43 jenkins-squash kernel: 982941 pages RAM
          Sep 29 13:53:43 jenkins-squash kernel: 0 pages HighMem/MovableOnly
          Sep 29 13:53:43 jenkins-squash kernel: 60533 pages reserved
          Sep 29 13:53:43 jenkins-squash kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
          Sep 29 13:53:43 jenkins-squash kernel: [  489]     0   489    28783        0      24      356             0 lvmetad
          Sep 29 13:53:43 jenkins-squash kernel: [  586]     0   586    29171       26      26       89         -1000 auditd
          Sep 29 13:53:43 jenkins-squash kernel: [  607]     0   607   108366      213      59      293             0 NetworkManager
          Sep 29 13:53:43 jenkins-squash kernel: [  613]     0   613   109196      228     100      866             0 rsyslogd
          Sep 29 13:53:43 jenkins-squash kernel: [  614]     0   614   137539      106      88     3066             0 tuned
          Sep 29 13:53:43 jenkins-squash kernel: [  620]    81   620    25145      121      18       59          -900 dbus-daemon
          Sep 29 13:53:43 jenkins-squash kernel: [  625]     0   625    27501        1      13       31             0 agetty
          Sep 29 13:53:43 jenkins-squash kernel: [  627]     0   627    27501        1      10       33             0 agetty
          Sep 29 13:53:43 jenkins-squash kernel: [  644]   999   644   128561       85      50     1346             0 polkitd
          Sep 29 13:53:43 jenkins-squash kernel: [ 2079]     0  2079    20627        0      41      229         -1000 sshd
          Sep 29 13:53:43 jenkins-squash kernel: [ 4633]     0  4633     4788       25      13       33             0 irqbalance
          Sep 29 13:53:43 jenkins-squash kernel: [ 4745]    38  4745     7343       38      18      111             0 ntpd
          Sep 29 13:53:43 jenkins-squash kernel: [ 5502]    99  5502    40951       66      31       95             0 gmond
          Sep 29 13:53:43 jenkins-squash kernel: [ 5997]     0  5997    31035       27      18      128             0 crond
          Sep 29 13:53:43 jenkins-squash kernel: [ 6056]     0  6056    15638     1540      35     1598             0 systemd-journal
          Sep 29 13:53:43 jenkins-squash kernel: [ 6057]     0  6057    11219        1      22      381         -1000 systemd-udevd
          Sep 29 13:53:43 jenkins-squash kernel: [ 4432]   994  4432    43366       10      39      929             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [ 4434]   994  4434    16787       22      32      261             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [ 4435]   994  4435    64974       29      80     5424             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [ 1351]     0  1351    24930        0      26      318             0 nginx
          Sep 29 13:53:43 jenkins-squash kernel: [ 1352]   993  1352    25293      217      30      518             0 nginx
          Sep 29 13:53:43 jenkins-squash kernel: [32285]   994 32285   311948    31754     534   181312             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [ 6010]   994  6010   310572   198331     531    13397             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [26272]   997 26272     3918      159      12     1028             0 oauth2_proxy
          Sep 29 13:53:43 jenkins-squash kernel: [26283]   997 26283     4182      920      12      153             0 oauth2_proxy
          Sep 29 13:53:43 jenkins-squash kernel: [27634]     0 27634     6048       44      15       29             0 systemd-logind
          Sep 29 13:53:43 jenkins-squash kernel: [28289]   994 28289   385540   240515     603     6522             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [29201]     0 29201    27626       71      57     3045             0 dhclient
          Sep 29 13:53:43 jenkins-squash kernel: [31130]   994 31130   486734   380300     862      284             0 uwsgi
          Sep 29 13:53:43 jenkins-squash kernel: [31164]   994 31164     1075       24       7        0             0 scl
          Sep 29 13:53:43 jenkins-squash kernel: [31165]   994 31165    28280       47      12        0             0 bash
          Sep 29 13:53:43 jenkins-squash kernel: [31168]   994 31168   105885    11642     130        0             0 bokeh
          Sep 29 13:53:43 jenkins-squash kernel: Out of memory: Kill process 31130 (uwsgi) score 322 or sacrifice child
          Sep 29 13:53:43 jenkins-squash kernel: Killed process 31130 (uwsgi) total-vm:1946936kB, anon-rss:1521164kB, file-rss:36kB
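
The rss, swapents and total_vm columns in the process table are page counts. Assuming 4 KiB pages, the killed worker's 380300 pages come to 1521200 kB. That matches the kill line's anon-rss:1521164kB plus file-rss:36kB.

The sketch below does three things with an excerpt of the table:

* parses the rows,
* sums resident memory per process name,
* picks a victim using a rough approximation of the 3.10-era OOM badness heuristic (resident + page-table + swap pages, ignoring oom_score_adj).

Both the heuristic and the `totalpages` estimate are assumptions on my part. With this log they happen to reproduce the reported score of 322.

```python
import re
from collections import defaultdict

PAGE_KIB = 4  # assumption: 4 KiB pages, as on this x86_64 host

# Excerpt of the OOM report above (summary lines, header and a subset of rows).
SAMPLE = """\
Sep 29 13:53:43 jenkins-squash kernel: Total swap = 1048572kB
Sep 29 13:53:43 jenkins-squash kernel: 982941 pages RAM
Sep 29 13:53:43 jenkins-squash kernel: 60533 pages reserved
Sep 29 13:53:43 jenkins-squash kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 29 13:53:43 jenkins-squash kernel: [ 2079]     0  2079    20627        0      41      229         -1000 sshd
Sep 29 13:53:43 jenkins-squash kernel: [32285]   994 32285   311948    31754     534   181312             0 uwsgi
Sep 29 13:53:43 jenkins-squash kernel: [ 6010]   994  6010   310572   198331     531    13397             0 uwsgi
Sep 29 13:53:43 jenkins-squash kernel: [28289]   994 28289   385540   240515     603     6522             0 uwsgi
Sep 29 13:53:43 jenkins-squash kernel: [31130]   994 31130   486734   380300     862      284             0 uwsgi
Sep 29 13:53:43 jenkins-squash kernel: [31168]   994 31168   105885    11642     130        0             0 bokeh
"""

# One row of the "[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name" table.
ROW_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)


def parse_rows(log):
    """Return one dict per process-table row; memory fields are page counts."""
    rows = []
    for line in log.splitlines():
        m = ROW_RE.search(line)
        if m:
            fields = m.groupdict()
            name = fields.pop("name")
            row = {key: int(value) for key, value in fields.items()}
            row["name"] = name
            rows.append(row)
    return rows


def badness(row):
    # Rough approximation of the 3.10-era heuristic: resident + page-table +
    # swapped-out pages. oom_score_adj and the root discount are ignored.
    return row["rss"] + row["nr_ptes"] + row["swapents"]


def total_pages(log):
    # Assumption: usable RAM ("pages RAM" minus "pages reserved") plus swap.
    ram = int(re.search(r"(\d+) pages RAM", log).group(1))
    reserved = int(re.search(r"(\d+) pages reserved", log).group(1))
    swap_kib = int(re.search(r"Total swap = (\d+)kB", log).group(1))
    return ram - reserved + swap_kib // PAGE_KIB


rows = parse_rows(SAMPLE)
victim = max(rows, key=badness)
score = badness(victim) * 1000 // total_pages(SAMPLE)

rss_kib = defaultdict(int)
for row in rows:
    rss_kib[row["name"]] += row["rss"] * PAGE_KIB

print(victim["pid"], victim["rss"] * PAGE_KIB, score)  # 31130 1521200 322
print(rss_kib["uwsgi"])                                 # 3403600
```

Even counting only the four largest uwsgi rows, the workers hold about 3.2 GiB resident. Usable RAM is 3689632 KiB (the same figure `top` reports as total memory), and the log shows swap already exhausted (Free swap = 0kB).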
          
          

          jhoblitt Joshua Hoblitt added a comment -

          I'm guessing that uwsgi caching loaded Python modules, state, and/or data across multiple workers is what is running the system out of memory. In top, I do see process sizes grow sharply (not reflected in the captures below) and then drop back down. My guess is that this is DB data being retrieved to send to the client:

          Accessing just the dashboard landing page:

          top - 09:57:10 up 12 min,  1 user,  load average: 0.04, 0.03, 0.05
          Tasks:   7 total,   0 running,   7 sleeping,   0 stopped,   0 zombie
          %Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
          KiB Mem :  3689632 total,  1491876 free,  1896956 used,   300800 buff/cache
          KiB Swap:  1048572 total,  1048572 free,        0 used.  1625408 avail Mem 
           
            PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                  
            750 uwsgi     20   0 1261552 874804   8728 S   0.0 23.7   0:27.36 uwsgi                                                                                                    
            747 uwsgi     20   0 1249780 860612   8728 S   0.0 23.3   0:09.24 uwsgi                                                                                                    
            668 uwsgi     20   0  259932  28516   6700 S   0.0  0.8   0:00.62 uwsgi                                                                                                    
            748 uwsgi     20   0  259932  22764    948 S   0.0  0.6   0:00.00 uwsgi                                                                                                    
            749 uwsgi     20   0  259932  22764    948 S   0.0  0.6   0:00.00 uwsgi                                                                                                    
            646 uwsgi     20   0  173464   7856   4120 S   0.0  0.2   0:00.05 uwsgi                                                                                                    
            666 uwsgi     20   0   67148   1672    540 S   0.0  0.0   0:00.03 uwsgi      
          

          After accessing the KPMs:

          top - 09:58:43 up 14 min,  1 user,  load average: 0.09, 0.05, 0.05
          Tasks:   7 total,   0 running,   7 sleeping,   0 stopped,   0 zombie
          %Cpu(s):  8.3 us,  4.8 sy,  0.0 ni, 85.1 id,  0.0 wa,  0.0 hi,  0.0 si,  1.7 st
          KiB Mem :  3689632 total,  1046688 free,  2358620 used,   284324 buff/cache
          KiB Swap:  1048572 total,  1048572 free,        0 used.  1163724 avail Mem 
           
            PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                  
            750 uwsgi     20   0 1261808 874952   8728 S  22.9 23.7   0:38.09 uwsgi                                                                                                    
            747 uwsgi     20   0 1249780 862720   8728 S  10.3 23.4   0:14.34 uwsgi                                                                                                    
            748 uwsgi     20   0  804328 417444   8736 S  13.6 11.3   0:08.65 uwsgi                                                                                                    
            749 uwsgi     20   0  447880  60980   8704 S   0.0  1.7   0:00.83 uwsgi                                                                                                    
            668 uwsgi     20   0  259932  28516   6700 S   0.0  0.8   0:00.63 uwsgi                                                                                                    
            646 uwsgi     20   0  173464   7856   4120 S   0.0  0.2   0:00.06 uwsgi                                                                                                    
            666 uwsgi     20   0   67148   1672    540 S   0.0  0.0   0:00.03 uwsgi       
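
Comparing the two captures, "used" memory grew by 461664 KiB. Nearly all of that is worker 748, and to a lesser extent 749, growing after the KPM page was loaded.

Below is a small sketch that diffs two `top` snapshots. It assumes the default column layout shown above, with RES reported in plain KiB. For large values `top` can switch to suffixed units such as `1.2g`, which this parser does not handle.

RES includes shared pages, so summing it across workers overstates their combined footprint. The per-PID deltas are the more useful signal.

```python
import re

# Excerpts of the two captures above (memory summary plus process rows).
BEFORE = """\
KiB Mem :  3689632 total,  1491876 free,  1896956 used,   300800 buff/cache
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  750 uwsgi     20   0 1261552 874804   8728 S   0.0 23.7   0:27.36 uwsgi
  747 uwsgi     20   0 1249780 860612   8728 S   0.0 23.3   0:09.24 uwsgi
  668 uwsgi     20   0  259932  28516   6700 S   0.0  0.8   0:00.62 uwsgi
  748 uwsgi     20   0  259932  22764    948 S   0.0  0.6   0:00.00 uwsgi
  749 uwsgi     20   0  259932  22764    948 S   0.0  0.6   0:00.00 uwsgi
  646 uwsgi     20   0  173464   7856   4120 S   0.0  0.2   0:00.05 uwsgi
  666 uwsgi     20   0   67148   1672    540 S   0.0  0.0   0:00.03 uwsgi
"""

AFTER = """\
KiB Mem :  3689632 total,  1046688 free,  2358620 used,   284324 buff/cache
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  750 uwsgi     20   0 1261808 874952   8728 S  22.9 23.7   0:38.09 uwsgi
  747 uwsgi     20   0 1249780 862720   8728 S  10.3 23.4   0:14.34 uwsgi
  748 uwsgi     20   0  804328 417444   8736 S  13.6 11.3   0:08.65 uwsgi
  749 uwsgi     20   0  447880  60980   8704 S   0.0  1.7   0:00.83 uwsgi
  668 uwsgi     20   0  259932  28516   6700 S   0.0  0.8   0:00.63 uwsgi
  646 uwsgi     20   0  173464   7856   4120 S   0.0  0.2   0:00.06 uwsgi
  666 uwsgi     20   0   67148   1672    540 S   0.0  0.0   0:00.03 uwsgi
"""


def parse_top(snapshot):
    """Return (used_kib, {pid: res_kib}) from one `top` snapshot."""
    used = int(re.search(r"KiB Mem\s*:.*?(\d+) used", snapshot).group(1))
    res = {}
    for line in snapshot.splitlines():
        fields = line.split()
        # Process rows have 12 columns and start with a numeric PID.
        if len(fields) == 12 and fields[0].isdigit():
            res[int(fields[0])] = int(fields[5])  # RES column, plain KiB here
    return used, res


used_before, res_before = parse_top(BEFORE)
used_after, res_after = parse_top(AFTER)
delta = {pid: res_after[pid] - res_before.get(pid, 0) for pid in res_after}

print(used_after - used_before)  # 461664
print(delta[748], delta[749])    # 394680 38216
print(sum(delta.values()))       # 435152
```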
          

          jhoblitt Joshua Hoblitt added a comment -

          I've limited the number of uwsgi/squash worker processes to 2, in the hope of leaving enough free memory to buffer the large, unconstrained DB queries I've raised concerns about in the past. The next best mitigation is probably to switch to an instance type with more memory. The KPM page is still slow to load and sometimes has to be reloaded before anything is displayed. I suspect this is related to some sort of user-agent (browser) side websocket timeout, but that is a wild guess.
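
Simple arithmetic supports the choice of 2 workers. If any worker can grow to roughly the 1521200 kB the OOM killer saw, only two such workers fit in this instance's memory.

Below is a minimal sketch with these assumptions:

* `max_workers` is a hypothetical helper.
* The 512 MiB reserve for nginx, bokeh, sshd and the page cache is an assumed figure.
* The observed peak is only a lower bound on what an unconstrained query could allocate.

```python
def max_workers(total_kib: int, reserve_kib: int, peak_worker_kib: int) -> int:
    """Number of workers that fit if every one of them peaks at the same time."""
    if peak_worker_kib <= 0:
        raise ValueError("peak_worker_kib must be positive")
    return max(0, (total_kib - reserve_kib) // peak_worker_kib)


TOTAL_KIB = 3689632        # "KiB Mem ... total" from the top captures
PEAK_WORKER_KIB = 1521200  # RSS of the uwsgi worker the OOM killer chose
RESERVE_KIB = 512 * 1024   # hypothetical headroom for nginx, bokeh, sshd, page cache

print(max_workers(TOTAL_KIB, RESERVE_KIB, PEAK_WORKER_KIB))        # 2
print(max_workers(8 * 1024 * 1024, RESERVE_KIB, PEAK_WORKER_KIB))  # 5 (hypothetical 8 GiB host)
```

Swap is deliberately left out of the budget, since a worker pushed into swap is what made the host unresponsive in the first place.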


            People

            • Assignee:
              jhoblitt Joshua Hoblitt
              Reporter:
              jhoblitt Joshua Hoblitt
              Watchers:
              Angelo Fausti, Frossie Economou, Joshua Hoblitt, Simon Krughoff