Data Management / DM-11947

random internal compiler errors

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Continuous Integration
    • Labels:
      None

      Description

      Random internal compiler errors are being observed when building C++ code. It was initially believed that this only occurred under py2, but it has also been observed with py3.

      :::::  [2017-09-19T10:49:24.541101Z] c++: internal compiler error: Killed (program cc1plus)
      :::::  [2017-09-19T10:49:24.546648Z] Please submit a full bug report,
      :::::  [2017-09-19T10:49:24.546677Z] with preprocessed source if appropriate.
      :::::  [2017-09-19T10:49:24.546690Z] See <http://bugzilla.redhat.com/bugzilla> for instructions.
      :::::  [2017-09-19T10:49:24.551936Z] scons: *** [src/KernelSolution.os] Error 4
      :::::  [2017-09-19T10:49:34.546600Z] scons: building terminated because of errors.
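GCC prints `internal compiler error: Killed (program cc1plus)` when the `cc1plus` child process is terminated by a signal (typically SIGKILL from the kernel OOM killer) rather than crashing on its own, so this kind of ICE points at memory pressure on the build node rather than at a compiler bug. A minimal sketch for spotting that signature in a build log, assuming the log is available as a list of lines; `find_killed_compilers` is a hypothetical helper, not part of the existing CI tooling:

```python
import re

# Matches GCC's report when a compiler subprocess was killed by a signal,
# e.g. "c++: internal compiler error: Killed (program cc1plus)".
ICE_KILLED = re.compile(r"internal compiler error: Killed \(program (\w+)\)")


def find_killed_compilers(log_lines):
    """Return (line_number, program) for every 'Killed' ICE in a build log."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        match = ICE_KILLED.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    log = [
        "[2017-09-19T10:49:24.541101Z] c++: internal compiler error: Killed (program cc1plus)",
        "[2017-09-19T10:49:24.546648Z] Please submit a full bug report,",
        "[2017-09-19T10:49:24.551936Z] scons: *** [src/KernelSolution.os] Error 4",
    ]
    print(find_killed_compilers(log))  # [(1, 'cc1plus')]
```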
      

      The error above was observed on el7-4, whose system journal does contain OOM killer invocations from early this morning. Strangely, the system was reporting a surprisingly low amount of available memory, even after forcing the kernel to drop its caches, while running processes accounted for less than 1 GiB of memory in use.

      $ cat /proc/meminfo 
      MemTotal:       14973596 kB
      MemFree:         7954100 kB
      MemAvailable:    7898996 kB
      Buffers:               0 kB
      ...
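The size of the gap can be estimated directly from `/proc/meminfo`. A rough sketch, assuming the "less than 1 GiB" process figure above is used as an upper bound on process memory (summed process RSS double-counts shared pages, so this is only an estimate); `parse_meminfo` and `unaccounted_kb` are hypothetical helpers:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> value (kB)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip elisions such as "..."
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def unaccounted_kb(meminfo, process_kb):
    """Memory that is neither available nor attributed to processes (kB)."""
    return meminfo["MemTotal"] - meminfo["MemAvailable"] - process_kb


if __name__ == "__main__":
    sample = """MemTotal:       14973596 kB
MemFree:         7954100 kB
MemAvailable:    7898996 kB
Buffers:               0 kB
..."""
    info = parse_meminfo(sample)
    # Processes used less than 1 GiB (1048576 kB), so the result is a lower bound.
    missing = unaccounted_kb(info, 1048576)
    print(f"at least {missing} kB (~{missing / 1048576:.2f} GiB) unaccounted for")
    # at least 6026024 kB (~5.75 GiB) unaccounted for
```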
      

      Rebooting appears to have recovered ~6 GiB of available memory, which suggests a slow kernel memory leak.

      $ cat /proc/meminfo 
      MemTotal:       15234532 kB
      MemFree:        14383516 kB
      MemAvailable:   14511768 kB
      ...
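The recovered amount follows from the two `/proc/meminfo` snapshots. A small sketch; `gained_gib` is a hypothetical helper, and it treats the `kB` unit of `/proc/meminfo` as KiB, which is what the kernel actually reports:

```python
KIB_PER_GIB = 1024 * 1024  # /proc/meminfo "kB" values are KiB


def gained_gib(before_kb, after_kb):
    """Change in a /proc/meminfo field across the reboot, in GiB."""
    return (after_kb - before_kb) / KIB_PER_GIB


if __name__ == "__main__":
    # el7-4 before and after the reboot (values from the snapshots above).
    before = {"MemFree": 7954100, "MemAvailable": 7898996}
    after = {"MemFree": 14383516, "MemAvailable": 14511768}
    for field in before:
        print(f"{field}: +{gained_gib(before[field], after[field]):.2f} GiB")
    # MemFree: +6.13 GiB
    # MemAvailable: +6.31 GiB
```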
      

            Activity

            jhoblitt Joshua Hoblitt added a comment -

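            It appears that all of the el7 nodes that have not been rebooted are reporting less available memory than expected: between 9.3 and 11.1 GiB out of 14.3 GiB on el7-1 through el7-3, versus 13.8 GiB on the freshly rebooted el7-4.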

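            // Based on https://stackoverflow.com/questions/26583979/run-a-remote-command-on-all-jenkins-slaves-via-masterss-script-console
            // Run from the script console on the Jenkins master.
            import hudson.util.RemotingDiagnostics
             
            // Groovy snippets that run `free` and `uptime` on an agent and print their stdout.
            def cmd = 'def proc = "free".execute(); proc.waitFor(); println proc.in.text'
            def cmd2 = 'def proc = "uptime".execute(); proc.waitFor(); println proc.in.text'
             
            for (slave in hudson.model.Hudson.instance.slaves) {
                println slave.name
                def channel = slave.getChannel()
                if (channel == null) {  // agent is offline, nothing to run on
                    println '(offline)'
                    continue
                }
                println RemotingDiagnostics.executeGroovy(cmd, channel)
                println RemotingDiagnostics.executeGroovy(cmd2, channel)
            }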
            

            ...
             
            jenkins-el6-1
                         total       used       free     shared    buffers     cached
            Mem:      15297728   13612324    1685404        140      66580   12745740
            -/+ buffers/cache:     800004   14497724
            Swap:            0          0          0
             
             
             09:00:25 up 454 days, 14:31,  0 users,  load average: 0.00, 0.00, 0.00
             
             
            jenkins-el6-2
                         total       used       free     shared    buffers     cached
            Mem:      15297728   11018440    4279288        140     102660    9976424
            -/+ buffers/cache:     939356   14358372
            Swap:            0          0          0
             
             
             09:00:25 up 474 days, 12:56,  0 users,  load average: 0.00, 0.00, 0.00
             
             
            jenkins-el7-1
                          total        used        free      shared  buff/cache   available
            Mem:       14973596     2830672    11653732      369108      489192    11608060
            Swap:             0           0           0
             
             
             09:00:25 up 182 days, 16:28,  0 users,  load average: 0.00, 0.01, 0.05
             
             
            jenkins-el7-2
                          total        used        free      shared  buff/cache   available
            Mem:       14973596     3359216     1761788      487076     9852592    10833480
            Swap:             0           0           0
             
             
             09:00:25 up 210 days, 22:38,  0 users,  load average: 0.00, 0.01, 0.05
             
             
            jenkins-el7-3
                          total        used        free      shared  buff/cache   available
            Mem:       14973596     4429620     7463980      475580     3079996     9765416
            Swap:             0           0           0
             
             
             09:00:25 up 210 days, 22:28,  0 users,  load average: 0.00, 0.01, 0.05
             
             
            jenkins-el7-4
                          total        used        free      shared  buff/cache   available
            Mem:       15234532      479032    13299028       16628     1456472    14418868
            Swap:             0           0           0
             
             
             09:00:25 up 28 min,  0 users,  load average: 0.00, 0.01, 0.05
             
             
            jenkins-master
                          total        used        free      shared  buff/cache   available
            Mem:        3689628     2355504      156052      187460     1178072      877176
            Swap:             0           0           0
             
             
             09:00:25 up 454 days, 14:59,  0 users,  load average: 0.00, 0.01, 0.05
             
             
            jenkins-snowflake-1
                          total        used        free      shared  buff/cache   available
            Mem:       16005784      465904     1157620       33000    14382260    15206456
            Swap:             0           0           0
             
             
             09:00:25 up 12 days, 22:13,  0 users,  load average: 0.00, 0.01, 0.05
             
            ...
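This comparison can be automated by parsing the `free` output collected above. A sketch under stated assumptions: only the procps-ng (el7) output format with an `available` column is handled, and the 90% threshold for an idle build node is a hypothetical cut-off chosen for illustration, not a measured baseline:

```python
def available_fraction(free_output):
    """Return available/total from procps-ng `free` output (el7 format).

    Assumes a header row with an 'available' column; the older el6 format
    (with a '-/+ buffers/cache' row and no 'available' column) is not handled.
    """
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            row = dict(zip(header, values))
            return row["available"] / row["total"]
    raise ValueError("no 'Mem:' row in free output")


def low_memory_nodes(outputs, threshold=0.90):
    """Names of nodes whose available/total fraction is below `threshold`.

    The 0.90 default is a hypothetical cut-off for an idle build node.
    """
    return [name for name, text in outputs.items()
            if available_fraction(text) < threshold]


# `free` output from the el7 agents, copied from the listing above.
SAMPLES = {
    "jenkins-el7-1": """
              total        used        free      shared  buff/cache   available
Mem:       14973596     2830672    11653732      369108      489192    11608060
Swap:             0           0           0""",
    "jenkins-el7-2": """
              total        used        free      shared  buff/cache   available
Mem:       14973596     3359216     1761788      487076     9852592    10833480
Swap:             0           0           0""",
    "jenkins-el7-3": """
              total        used        free      shared  buff/cache   available
Mem:       14973596     4429620     7463980      475580     3079996     9765416
Swap:             0           0           0""",
    "jenkins-el7-4": """
              total        used        free      shared  buff/cache   available
Mem:       15234532      479032    13299028       16628     1456472    14418868
Swap:             0           0           0""",
}

if __name__ == "__main__":
    print(low_memory_nodes(SAMPLES))
    # ['jenkins-el7-1', 'jenkins-el7-2', 'jenkins-el7-3']
```

With these numbers the sketch flags el7-1, el7-2 and el7-3 (roughly 65% to 78% available) but not the freshly rebooted el7-4 (about 95%).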
            

            jhoblitt Joshua Hoblitt added a comment -

            My hunch is that Docker's frequent mounting and unmounting of loopback filesystems was causing a very slow memory leak in the fairly old kernel version the nodes were running.

            All of the CentOS 7 build nodes have been upgraded to kernel 3.10.0-693.2.2.el7.x86_64 and rebooted.

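            import hudson.util.RemotingDiagnostics
             
            // Commands to run on every agent.
            def cmds = ['free', 'uptime', 'uname -r']
             
            for (slave in hudson.model.Hudson.instance.slaves) {
              println slave.name
              def channel = slave.getChannel()
              if (channel == null) {  // agent is offline, nothing to run on
                println '(offline)'
                continue
              }
              cmds.each { cmd ->
                // Groovy snippet executed on the agent: run `cmd` and print its stdout.
                def run = "def proc = '${cmd}'.execute(); proc.waitFor(); println proc.in.text"
                print RemotingDiagnostics.executeGroovy(run, channel)
              }
            }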
            

            jenkins-el6-1
                         total       used       free     shared    buffers     cached
            Mem:      15297728   13613516    1684212        140      67576   12745764
            -/+ buffers/cache:     800176   14497552
            Swap:            0          0          0
             
             09:32:31 up 454 days, 15:03,  0 users,  load average: 0.03, 0.01, 0.00
             
            2.6.32-504.23.4.el6.x86_64
             
            jenkins-el6-2
                         total       used       free     shared    buffers     cached
            Mem:      15297728   11019372    4278356        140     103624    9976448
            -/+ buffers/cache:     939300   14358428
            Swap:            0          0          0
             
             09:32:31 up 474 days, 13:28,  0 users,  load average: 0.13, 0.10, 0.03
             
            2.6.32-504.23.4.el6.x86_64
             
            jenkins-el7-1
                          total        used        free      shared  buff/cache   available
            Mem:       15234532      429252    14352016       16620      453264    14486824
            Swap:             0           0           0
             
             09:32:31 up 18 min,  0 users,  load average: 0.00, 0.01, 0.05
             
            3.10.0-693.2.2.el7.x86_64
             
            jenkins-el7-2
                          total        used        free      shared  buff/cache   available
            Mem:       15234532      455272    14196504       16636      582756    14446000
            Swap:             0           0           0
             
             09:32:31 up 18 min,  0 users,  load average: 0.00, 0.01, 0.05
             
            3.10.0-693.2.2.el7.x86_64
             
            jenkins-el7-3
                          total        used        free      shared  buff/cache   available
            Mem:       15234532      450288    14196664       16780      587580    14450652
            Swap:             0           0           0
             
             09:32:31 up 11 min,  0 users,  load average: 0.06, 0.09, 0.06
             
            3.10.0-693.2.2.el7.x86_64
             
            jenkins-el7-4
                          total        used        free      shared  buff/cache   available
            Mem:       15234532      420168    14398928       16708      415436    14505052
            Swap:             0           0           0
             
             09:32:31 up 11 min,  0 users,  load average: 0.00, 0.03, 0.02
             
            3.10.0-693.2.2.el7.x86_64
             
            jenkins-master
                          total        used        free      shared  buff/cache   available
            Mem:        3689628     2386080      174364      187460     1129184      848636
            Swap:             0           0           0
             
             09:32:31 up 454 days, 15:31,  0 users,  load average: 0.00, 0.01, 0.05
             
            3.10.0-229.4.2.el7.x86_64
             
            jenkins-snowflake-1
                          total        used        free      shared  buff/cache   available
            Mem:       16266720      403548    15404268       16708      458904    15552224
            Swap:             0           0           0
             
             09:32:31 up 6 min,  0 users,  load average: 0.03, 0.05, 0.04
             
            3.10.0-693.2.2.el7.x86_64
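If the leak grows roughly linearly with uptime, its rate can be estimated by comparing a node's pre-upgrade `used` figure with its value shortly after the reboot. This is a back-of-the-envelope sketch, not a measurement: it assumes linear growth and treats the post-reboot `used` value on the new kernel as the boot-time baseline for the old one; `leak_rate_mib_per_day` is a hypothetical helper:

```python
from datetime import timedelta


def leak_rate_mib_per_day(used_kb_now, used_kb_baseline, uptime):
    """Average growth of `free`'s 'used' column per day of uptime, in MiB.

    Assumes growth was linear since boot (a simplification).
    """
    days = uptime / timedelta(days=1)
    return (used_kb_now - used_kb_baseline) / 1024 / days


if __name__ == "__main__":
    # jenkins-el7-3: 'used' before the upgrade (up 210 days, 22:28) and
    # 11 minutes after rebooting onto 3.10.0-693.2.2.el7.x86_64.
    rate = leak_rate_mib_per_day(
        4429620, 450288, timedelta(days=210, hours=22, minutes=28))
    print(f"~{rate:.1f} MiB/day")  # ~18.4 MiB/day
```

For el7-3 this comes out at roughly 18 MiB per day, the kind of rate that would only become noticeable after months of uptime.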
            

            jhoblitt Joshua Hoblitt added a comment -

            I'm moving this ticket into review status as a "wait and see what happens".

            jhoblitt Joshua Hoblitt added a comment -

            No additional reports of internal compiler errors (ICEs) have been received; assuming this issue is fixed.


              People

              • Assignee:
                jhoblitt Joshua Hoblitt
                Reporter:
                jhoblitt Joshua Hoblitt
                Reviewers:
                Joshua Hoblitt
                Watchers:
                Joshua Hoblitt, Tim Jenness
