Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 347183

Summary: Deadlock reading/writing to log
Product: [Eclipse Project] Equinox Reporter: Julien HENRY <julien.henry>
Component: FrameworkAssignee: Thomas Watson <tjwatson>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: P3 CC: aniefer, curtis.windatt.public, dominik.stadler, hshsce, igor, konrad_w, michael.wenig, pwebster, remy.suen, simon_kaegi, tjwatson
Version: 3.7Flags: simon_kaegi: review+
Target Milestone: 3.7.1   
Hardware: PC   
OS: Linux   
Whiteboard:
Attachments:
Description Flags
Eclipse workspace log with stacktrace
none
Thread dump
none
conceptual patch
none
updated patch + test none

Description Julien HENRY CLA 2011-05-25 11:52:57 EDT
Build Identifier: 20110519-1049

I have installed a feature in my Eclipse using a local update site. Then I have removed this update site from the list (but the feature is still installed).

Now I am trying to create my own update site that will include this feature. I have created a new Update Site Project, added the feature, and finally I tried to build it (running "build all"). The process hang during "Publishing metadata". I have to kill Eclipse (no way to cancel the process).
Looking at the logs I can see:

!ENTRY org.eclipse.equinox.p2.transport.ecf 2 0 2011-05-25 17:38:21.924
!MESSAGE Connection to memory:1306337901862/p2.index failed on unknown protocol: memory. Retry attempt 0 started

See attached stack trace.

Reproducible: Always
Comment 1 Julien HENRY CLA 2011-05-25 11:54:08 EDT
Created attachment 196567 [details]
Eclipse workspace log with stacktrace
Comment 2 Curtis Windatt CLA 2011-05-25 12:10:19 EDT
This is likely a p2 issue.  See bug 345497.  Not sure why this would cause Eclipse to hang though.
Comment 3 Andrew Niefer CLA 2011-05-25 15:30:38 EDT
This error shouldn't case eclipse to hang, a stack trace of what the other threads are doing would be useful.

I think you can get this by hittin ctrl-break in the console if you start using eclipsec.exe.  Or there is information here:http://wiki.eclipse.org/How_to_report_a_deadlock
Comment 4 Julien HENRY CLA 2011-05-26 03:25:22 EDT
Just to be perfectly clear: this is the "build all" process that hangs. Eclipse UI still respond to events. But of course it is not usable because every other action will be queued forever waiting for "build all" action to complete.
Comment 5 Julien HENRY CLA 2011-05-26 03:28:20 EDT
Created attachment 196632 [details]
Thread dump
Comment 6 Andrew Niefer CLA 2011-05-26 11:00:42 EDT
Moving to the framework for now.  In the stack trace, Worker-2 is the interesting thread:

at org.eclipse.equinox.log.internal.ExtendedLogReaderServiceFactory.logPrivileged(ExtendedLogReaderServiceFactory.java:165)
we acquire a read lock here at line 152

later we come back in 
at org.eclipse.equinox.log.internal.ExtendedLogReaderServiceFactory.addLogListener(ExtendedLogReaderServiceFactory.java:175)
and attempt to get a write lock.  We end up waiting on ourselves.
Comment 7 Thomas Watson CLA 2011-05-26 11:32:15 EDT
If we didn't block I suspect we would endup in a stack overflow.  It appears someone (org.sonar.ide.eclipse.internal.core.LogListener.log) is using logback to log the warning and someone else (org.eclipse.m2e.logback.appender.EclipseLogAppender) is trying to log the message back to org.eclipse.core.runtime.Platform.getLog which will then cause the message to go back through the same process.
Comment 8 Thomas Watson CLA 2011-05-26 15:08:24 EDT
Ultimately I think this is a bug in the interaction between org.sonar.ide.eclipse.internal.core.LogListener and org.eclipse.m2e.logback.appender.EclipseLogAppender

org.sonar.ide.eclipse.internal.core.LogListener is an ILogListener listening to all IStatus messages being logging them to logback.

org.eclipse.m2e.logback.appender.EclipseLogAppender is then configured with logback so it receives all messages from logback, it then proceeds to try and append the log message back to the Eclipse log.  This would ultimately end up with an endless recursive loop that I suspect would result in a stack overflow if we did not end up at this deadlock bug.

It looks like we should consider detecting this situation and throw some kind runtime exception from the read/write lock instead of blocking forever.
Comment 9 Thomas Watson CLA 2011-05-26 15:12:12 EDT
Created attachment 196698 [details]
conceptual patch

Here is a work in progress.  The patch requires much review and testing before considering.  It is too late to be doing this in 3.7 RC3, anything in this area would have to wait until 3.7.1.

You should probably open a separate bug against the m2e team.  It is not a good idea for these two log redirectors to be pointing to each other through logback.
Comment 10 Igor Fedorenko CLA 2011-05-26 20:11:13 EDT
Just for the record http://dev.eclipse.org/mhonarc/lists/cross-project-issues-dev/msg05659.html
Comment 11 Igor Fedorenko CLA 2011-06-29 05:34:16 EDT
*** Bug 350669 has been marked as a duplicate of this bug. ***
Comment 12 Igor Fedorenko CLA 2011-07-14 08:33:49 EDT
*** Bug 352065 has been marked as a duplicate of this bug. ***
Comment 13 Thomas Watson CLA 2011-07-20 09:42:10 EDT
*** Bug 352455 has been marked as a duplicate of this bug. ***
Comment 14 Thomas Watson CLA 2011-07-20 09:44:48 EDT
Simon, Could I spare some of your time to review this issue and proposed patch?
Comment 15 Simon Kaegi CLA 2011-07-20 10:31:20 EDT
Looks ok. It seems like we should be able to handle the various nesting cases except when we try to upgrade a read lock to a write lock. e.g. write>write nesting and write>read nesting should be ok it's just read>write we need to be careful about.
Comment 16 Thomas Watson CLA 2011-07-20 11:37:13 EDT
(In reply to comment #15)
> Looks ok. It seems like we should be able to handle the various nesting cases
> except when we try to upgrade a read lock to a write lock. e.g. write>write
> nesting and write>read nesting should be ok it's just read>write we need to be
> careful about.

For simplicity I did not want to go there.  The code currently only supports read>read nesting.  Currently the code does not make any call outs while holding the write lock.  So it should not be possible for some external code (e.g. listeners) to be called while holding the write lock which could result in re-entering the logging system and acquiring another read/write lock.  Given the usage of the code I see no need to support write>read nesting or write>write nesting for this internal class.
Comment 17 Thomas Watson CLA 2011-07-21 13:04:24 EDT
Created attachment 200108 [details]
updated patch + test

I found an issue with the original patch while testing.  Here is an updated patch with a testcase.

The original issue was that the writeLock method did not decrement the writersWaiting lock when an exception was thrown.  I moved the decrementing to a finally block.

As suspected, with this fixed the original scenario could result in a stack overflow because of endless recursion.  I have added a check to detect nested read>read locks and put an arbitrary limit of 50 on them.
Comment 20 Thomas Watson CLA 2011-07-21 16:47:46 EDT
(In reply to comment #19)
> http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?h=R3_7_maintenance&id=1c8a3449eb3a02a76b49ef8ebe4b5948d0bab7ad

This was released to 3.7 under this commit.