Community
Participate
Working Groups
Created attachment 166031 [details] Thread dump of deadlocked eclipse Build ID: I20100422-1310 I was editing a Java file (Project.java from o.e.core.internal.resources). I think that I tried scrolling the editor right when Eclipse deadlocked. Thread dump attached.
Thread dump seems to indicate that Threads main Worker-48 Worker-50 Text Viewer Hover Presenter all try to access the shared AST, and are all locked in its object monitor. At the same time, Worker-49 Worker-48 Worker-41 Worker-40 hangs in SelectionListenerWithASTManager, and a couple of JavaReconciler Threads are also waiting...
To be investigated for M7.
I've also had my Eclipse hang these past couple of days. There's also bug 302581 which I presume is related.
>To be investigated for M7. I take this as a +1 for RC1. I need to look again whether the proposed fix from bug 302581 really was OK but currently I'm on a business trip.
John, can you please review the patch?
Created attachment 167926 [details] Fix
I don't understand why this patch fixes the problem. It looks like the problem is that threads are waiting on reconciliation, but no reconcile is happening. For this to happen it means either there was no notification, or the notification happened before the isReconciling() condition changed. Since the fIsReconciling field is already volatile, adding synchronization around assigning that field shouldn't make any difference. I don't know this code very well, but I noticed there is one code path at the top of the cached() method where there is no notification and it just returns. I can't clearly see how this leads to the deadlock, but the code is complicated enough that maybe it is possible. Ideally if any state changes that affects the outcome of isReconciling() there should be a notification within the same synchronized block.
>I don't understand why this patch fixes the problem. I thought that a possible scenario for the deadlock could be that for some reason the progress monitor gets canceled (fIsReconciling -> true) while getAST(...) is already inside the fWaitLock. The patch should have closed that hole - however, I noticed that this would not fix that scenario because if the thread dies then it won't be able to notify it later. After looking at the code and trying to reproduce the deadlock I found the scenario and the fix for the deadlock. Test Case: 0 start dev/host workspace 1. add the following breakpoints - but disable all at this point ASTProvider, line 318 ASTProvider, line 440 ASTProvider, line 452 ASTProvider, line 599 2. start target workspace that has a Java project and two CUs (C1, C2) 3. disable 'Mark Occurrences' in the target 4. paste this into Package Explorer: import java.util.HashMap;; public class C1 {HashMap m} public class C2 {} 5. switch/activated editor for C1 6. enable bpt 440 7. switch/activated editor for C2 ==> hit bpt 440 8. disable bpt 440 9. switch to C1 10. enable bpt 599 and 452 11. type a space in C1 ==> hit bpt 599 12. select HashMap m; and press Ctrl+C ==> hit bpt 452 13. resume all threads except main and the one that waits in bpt 440 14. enable bpt 318 15. resume the thread that waits in bpt 440 16. resume the main thread 17. resume all other waiting threads ==> deadlock The problem is that we call isReconciling(...) without synchronizing it with fReconcileLock and unfortunately we cannot add this inside the fWaitLock without introducing yet another deadlock. There are two possible fixes: 1) We synchronized the synchronized (fReconcileLock) {...} in aboutToBeReconciled(...) with the fWaitLock. This is risky. 2) We switch the two statements in the synchronized block in aboutToBeReconciled(...). This works because the important condition is is changed first and hence it will not wait inside getAST(...). That the fReconciling flag is set to true afterwards doesn't harm. Since we already have RC1 and introducing another lock is always risky, I will fix this using 2). I went through the fix with Markus and he agrees on that. John, can you agree with that as well?
Created attachment 168146 [details] Fix
+1 for RC1. Reproduced the deadlock with given steps, except for > 3. disable 'Mark Occurrences' in the target where I had to *enable* Mark Occurrences. Verified that the fix solves this issue.
Patch committed to HEAD.
Starting verification...
Verified in I20100516-0800 that the scenario from comment 8 can't happen any more and that the dump from comment 0 likely ran into this problem.