| Summary: | Deadlock reading/writing to log |
|---|---|
| Product: | [Eclipse Project] Equinox |
| Component: | Framework |
| Version: | 3.7 |
| Status: | RESOLVED FIXED |
| Severity: | critical |
| Priority: | P3 |
| Target Milestone: | 3.7.1 |
| Hardware: | PC |
| OS: | Linux |
| Reporter: | Julien HENRY <julien.henry> |
| Assignee: | Thomas Watson <tjwatson> |
| QA Contact: | |
| CC: | aniefer, curtis.windatt.public, dominik.stadler, hshsce, igor, konrad_w, michael.wenig, pwebster, remy.suen, simon_kaegi, tjwatson |
| Flags: | simon_kaegi: review+ |
| Whiteboard: | |
| Attachments: | Eclipse workspace log with stacktrace (196567), Thread dump (196632), conceptual patch (196698), updated patch + test (200108) |
Description
Julien HENRY
Created attachment 196567 [details]
Eclipse workspace log with stacktrace
This is likely a p2 issue; see bug 345497. Not sure why this would cause Eclipse to hang though.

This error shouldn't cause Eclipse to hang. A stack trace of what the other threads are doing would be useful. I think you can get one by hitting Ctrl-Break in the console if you start Eclipse using eclipsec.exe, or there is information here: http://wiki.eclipse.org/How_to_report_a_deadlock (see the sketch after the thread dump below for a programmatic alternative).

Just to be perfectly clear: it is the "build all" process that hangs. The Eclipse UI still responds to events, but of course it is not usable because every other action will be queued forever, waiting for the "build all" action to complete.

Created attachment 196632 [details]
Thread dump
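For reference, a dump like this can also be produced from inside the JVM with the standard Thread API. The helper below is a generic, hypothetical sketch (it is not part of Equinox or of the patch); it simply prints every live thread's stack:

```java
import java.util.Map;

/** Minimal illustration: print a stack trace for every live thread using the standard JVM API. */
public class ThreadDumper {

	public static void dumpAllThreads() {
		Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
		for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
			Thread t = entry.getKey();
			System.err.println("\"" + t.getName() + "\" state=" + t.getState());
			for (StackTraceElement frame : entry.getValue()) {
				System.err.println("\tat " + frame);
			}
			System.err.println();
		}
	}

	public static void main(String[] args) {
		dumpAllThreads();
	}
}
```

On Linux, sending SIGQUIT (kill -3) to the Java process is the console equivalent of Ctrl-Break on Windows and writes the same kind of dump to standard output.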
Moving to the framework for now.

In the stack trace, Worker-2 is the interesting thread. At org.eclipse.equinox.log.internal.ExtendedLogReaderServiceFactory.logPrivileged(ExtendedLogReaderServiceFactory.java:165) we acquire a read lock (at line 152); later we come back in at org.eclipse.equinox.log.internal.ExtendedLogReaderServiceFactory.addLogListener(ExtendedLogReaderServiceFactory.java:175) and attempt to get a write lock. We end up waiting on ourselves. If we didn't block, I suspect we would end up with a stack overflow. It appears someone (org.sonar.ide.eclipse.internal.core.LogListener.log) is using logback to log the warning, and someone else (org.eclipse.m2e.logback.appender.EclipseLogAppender) is trying to log the message back to org.eclipse.core.runtime.Platform.getLog, which then causes the message to go back through the same process. Ultimately I think this is a bug in the interaction between org.sonar.ide.eclipse.internal.core.LogListener and org.eclipse.m2e.logback.appender.EclipseLogAppender.

org.sonar.ide.eclipse.internal.core.LogListener is an ILogListener that listens to all IStatus messages and logs them to logback. org.eclipse.m2e.logback.appender.EclipseLogAppender is then configured with logback, so it receives all messages from logback and proceeds to try to append each log message back to the Eclipse log. This would ultimately be an endless recursive loop that I suspect would result in a stack overflow if we did not end up at this deadlock bug. It looks like we should consider detecting this situation and throwing some kind of runtime exception from the read/write lock instead of blocking forever. (A simplified sketch of the self-deadlock follows the conceptual-patch comment below.)

Created attachment 196698 [details]
conceptual patch
Here is a work in progress. The patch requires much review and testing before it can be considered. It is too late to be doing this in 3.7 RC3; anything in this area will have to wait until 3.7.1.
You should probably open a separate bug against the m2e team. It is not a good idea for these two log redirectors to be pointing to each other through logback.
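To make the failure mode concrete, here is a deliberately simplified model of the scenario described above. The class and method names (LogDeadlockSketch, SimpleReadWriteLock, Listener) are illustrative assumptions, not the real ExtendedLogReaderServiceFactory code: a thread holds the read lock while calling out to listeners, a misbehaving listener re-enters the logging system and asks for the write lock, and the thread then waits for its own read lock to be released.

```java
/**
 * Simplified model of the hang: a naive, non-reentrant read/write lock plus a
 * listener call-out made while the read lock is held. Not the Equinox code.
 */
public class LogDeadlockSketch {

	/** Naive read/write lock: readers share, a writer waits for all readers to leave. */
	static class SimpleReadWriteLock {
		private int readers;
		private boolean writing;

		synchronized void readLock() throws InterruptedException {
			while (writing)
				wait();
			readers++;
		}

		synchronized void readUnlock() {
			readers--;
			notifyAll();
		}

		synchronized void writeLock() throws InterruptedException {
			// If the calling thread already holds a read lock, readers never
			// drops to zero and this wait() blocks forever: the self-deadlock.
			while (readers > 0 || writing)
				wait();
			writing = true;
		}

		synchronized void writeUnlock() {
			writing = false;
			notifyAll();
		}
	}

	interface Listener {
		void logged(String message);
	}

	private final SimpleReadWriteLock lock = new SimpleReadWriteLock();
	private Listener listener;

	/** Analogous to logPrivileged(): listeners are called while the read lock is held. */
	void log(String message) throws InterruptedException {
		lock.readLock();
		try {
			if (listener != null)
				listener.logged(message); // call-out while holding the read lock
		} finally {
			lock.readUnlock();
		}
	}

	/** Analogous to addLogListener(): needs the write lock to mutate listener state. */
	void addListener(Listener l) throws InterruptedException {
		lock.writeLock(); // never returns if this thread already holds the read lock
		try {
			listener = l;
		} finally {
			lock.writeUnlock();
		}
	}

	public static void main(String[] args) throws InterruptedException {
		LogDeadlockSketch log = new LogDeadlockSketch();
		// A badly behaved listener that re-enters the logging system, much as the
		// sonar/m2e log bridges effectively do via logback.
		log.listener = message -> {
			try {
				log.addListener(m -> { }); // read>write upgrade attempt on the same thread
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
		};
		log.log("hello"); // hangs: this thread waits for its own read lock to be released
	}
}
```

Running this hangs exactly the way Worker-2 does in the attached dump: the write lock waits for the reader count to drop, but the only reader is the thread doing the waiting.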
Just for the record: http://dev.eclipse.org/mhonarc/lists/cross-project-issues-dev/msg05659.html

*** Bug 350669 has been marked as a duplicate of this bug. ***

*** Bug 352065 has been marked as a duplicate of this bug. ***

*** Bug 352455 has been marked as a duplicate of this bug. ***

Simon, could you spare some of your time to review this issue and the proposed patch?

Looks ok. It seems like we should be able to handle the various nesting cases except when we try to upgrade a read lock to a write lock, e.g. write>write nesting and write>read nesting should be ok; it's just read>write we need to be careful about.

(In reply to comment #15)
> Looks ok. It seems like we should be able to handle the various nesting cases
> except when we try to upgrade a read lock to a write lock, e.g. write>write
> nesting and write>read nesting should be ok; it's just read>write we need to be
> careful about.

For simplicity I did not want to go there. The code currently only supports read>read nesting. The code does not make any call-outs while holding the write lock, so it should not be possible for external code (e.g. listeners) to be called while holding the write lock, which could result in re-entering the logging system and acquiring another read/write lock. Given the usage of the code, I see no need to support write>read nesting or write>write nesting for this internal class.

Created attachment 200108 [details]
updated patch + test
I found an issue with the original patch while testing. Here is an updated patch with a test case.
The original issue was that the writeLock method did not decrement the writersWaiting count when an exception was thrown. I moved the decrement into a finally block.
As suspected, with this fixed, the original scenario could result in a stack overflow because of the endless recursion. I have added a check to detect nested read>read locks and put an arbitrary limit of 50 on them.
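As a rough illustration of those two changes, and only an illustration — the class and field names below (GuardedReadWriteLock, writersWaiting, nestedReads, MAX_NESTED_READS) are assumptions modeled on the comments above, not the released Equinox code — the write path decrements its waiting count in a finally block, and the read path caps re-entrant nesting instead of recursing or blocking forever:

```java
/**
 * Conceptual sketch of the fix, assuming a lock that tracks waiting writers
 * and per-thread read nesting. Names and structure are illustrative only.
 */
public class GuardedReadWriteLock {
	private static final int MAX_NESTED_READS = 50; // arbitrary cap from the patch description

	private final ThreadLocal<Integer> nestedReads = ThreadLocal.withInitial(() -> 0);
	private int readers;
	private int writersWaiting;
	private boolean writing;

	public synchronized void readLock() throws InterruptedException {
		int depth = nestedReads.get();
		if (depth >= MAX_NESTED_READS)
			// A listener is recursively re-entering the log: fail fast instead of
			// recursing to a stack overflow or deadlocking behind a waiting writer.
			throw new IllegalStateException("Maximum nested read lock depth exceeded");
		// Permit read>read nesting even when a writer is queued; only brand-new
		// readers (depth == 0) yield to waiting writers.
		while (writing || (writersWaiting > 0 && depth == 0))
			wait();
		readers++;
		nestedReads.set(depth + 1);
	}

	public synchronized void readUnlock() {
		readers--;
		nestedReads.set(nestedReads.get() - 1);
		notifyAll();
	}

	public synchronized void writeLock() throws InterruptedException {
		if (nestedReads.get() > 0)
			// read>write upgrades are the deadlock case; refuse them outright.
			throw new IllegalStateException("Cannot upgrade a read lock to a write lock");
		writersWaiting++;
		try {
			while (readers > 0 || writing)
				wait();
		} finally {
			// The flaw in the original patch: without this finally block, an
			// exception left writersWaiting permanently incremented.
			writersWaiting--;
		}
		writing = true;
	}

	public synchronized void writeUnlock() {
		writing = false;
		notifyAll();
	}
}
```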
Fix released to Juno with: http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=c525aa2a29d0d7db2d8b13d72fdd595977670311

(In reply to comment #19)
> http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?h=R3_7_maintenance&id=1c8a3449eb3a02a76b49ef8ebe4b5948d0bab7ad

This was released to 3.7 under this commit.