  1. Data Management
  2. DM-3102

Resolve segmentation fault in LoggingEvent destructor

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: log, Qserv
    • Labels:
      None

      Description

      There seems to be a race condition in log4cxx::spi::LoggingEvent::~LoggingEvent. I've hit multiple segmentation faults in that function. In every case, another thread was involved in writing. In at least 2 cases, that second thread was in XrdCl::LogOutFile::Write.

            Activity

            fritzm Fritz Mueller added a comment - - edited

            Okay, some progress!

            Updated John's branches with all the latest changes from Andy H., and things seem much better. I have not been able to reproduce the LoggingEvent destructor bug since. However, I have seen the following crash twice in about 20 minutes:

            SEGV in XrdCl::Stream::HandleIncMsgJob::Run

            #0 0x00007f535400eca0 in ?? ()
            #1 0x00007f539b511ffc in XrdCl::Stream::HandleIncMsgJob::Run (this=0x7f5374000f50, arg=0x7f5374000cb0) at /home/fritzm/code/lsst/xrootd/src/./XrdCl/XrdClStream.hh:286
            #2 0x00007f539b56db22 in XrdCl::JobManager::RunJobs (this=0x7f538000e490) at /home/fritzm/code/lsst/xrootd/src/XrdCl/XrdClJobManager.cc:148
            #3 0x00007f539b56d69c in RunRunnerThread (arg=0x7f538000e490) at /home/fritzm/code/lsst/xrootd/src/XrdCl/XrdClJobManager.cc:33
            #4 0x00007f53abe89df3 in start_thread (arg=0x7f5384ff9700) at pthread_create.c:308
            #5 0x00007f53ab4ae1ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

            Andy Hanushevsky, if you think this is unrelated, I'll close this issue and open a new one?

            abh Andy Hanushevsky added a comment -

            Hi Fritz,

            This looks unrelated but let's leave the ticket open for now. I will drop
            by your office tomorrow and we can go through the core file (I assume
            you have one). Is there a log with debug statements that I can look at?

            Andy

            fritzm Fritz Mueller added a comment -

            Hmm, saw it twice in a row, but not since. Don't have a core file yet. Will see if I can get one...

            fritzm Fritz Mueller added a comment -

            Okay: Andy H. has merged xrootd/xrootd br. master -> xrootd/xrootd br. xrdssi, picking up among other things a fix that may be related to the HandleIncMsgJob::Run bug mentioned above.

            I have rebased lsst/xrootd br. master on top of the latest xrootd/xrootd br. xrdssi, pushed, and tagged as xrdssi-1.0.4.

            Tomorrow morning John will put the czar fixes we have been running with up for review. Once those are reviewed and merged, we can all run the same code from lsst/qserv br. master and lsst/xrootd br. master. If we can no longer see this bug (I am hopeful – it has looked quite good on my machine) then we'll claim victory.

            Standing by, awaiting John Gates's reviews/merges...

            fritzm Fritz Mueller added a comment -

            Chased this through to several xrootd fixes – all reviewed and committed to lsst/xrootd br. master now.


              People

              • Assignee:
                jgates John Gates
                Reporter:
                jgates John Gates
                Watchers:
                Andy Hanushevsky, Andy Salnikov, Fritz Mueller, Jacek Becla, John Gates