| Summary: | Deadlock on startup after having cancelled a background job | | |
|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Philipe Mulet <philippe_mulet> |
| Component: | Resources | Assignee: | John Arthorne <john.arthorne> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | critical | | |
| Priority: | P1 | CC: | eclipse, john.arthorne |
| Version: | 3.0 | | |
| Target Milestone: | 3.0 M7 | | |
| Hardware: | PC | | |
| OS: | Windows 2000 | | |
| Whiteboard: | | | |
| Attachments: | | | |
Description
Philipe Mulet
Created attachment 7649 [details]
Progress dialog
Created attachment 7650 [details]
Thread dump
Even worse. I killed Eclipse, then moved to SDK 20040129 and started up. A deadlock occurred, and the IDE never came up. Curiously, the same process that caused the hang in the previous session once it was cancelled caused grief during startup, without my having to cancel it.
Created attachment 7652 [details]
Progress dialog during startup with 20040129
Created attachment 7653 [details]
Thread dump with 20040129
This is the same as the deadlock I reported in bug 50684. The UI fixes have not yet been released into a build, but they have just been tagged for the 11:30 am build that is about to start. I'm not 100% sure if it will solve your deadlock, since you say you got it without even touching the blocked progress dialog.

Exactly. It is a timing thing. After two restart attempts, it started OK again, and since then I haven't had the problem.

*** Bug 51532 has been marked as a duplicate of this bug. ***

This bug still exists in I20040210. Here is what happens:

1) Operation A tries to lock a file in the UI thread, but is blocked by a background operation.
2) Op A is in ThreadJob.joinRun, waiting for the implicit job for that rule to start. This method is synchronized on the ThreadJob instance.
3) While waiting, ThreadJob.joinRun checks for cancelation (isCanceled).
4) isCanceled spins the event loop, and Operation B is waiting in the async queue.
5) Op B starts and tries to lock a file.
6) Op B waits in a similar manner on a different ThreadJob instance.
7) Meanwhile, the background job ends, and the ThreadJob for Op A enters the run method. This run method then tries to enter a synchronized block on the instance, whose monitor is owned by the UI thread (see step 2).

-> Deadlock: The ThreadJob for Op A owns the scheduling rule on the file and is waiting for the ThreadJob object monitor. The UI thread holds the ThreadJob object monitor and is waiting for the scheduling rule.

The proposed fix is to NOT own the ThreadJob object monitor while calling isCanceled, since this calls arbitrary third-party code in the progress monitor (standard rule of thumb for deadlock prevention).

I have released a fix, but am having trouble coming up with a repeatable test case since it is so timing dependent.

Marking as fixed as the deadlock problem seems to be solved. There are still some reported cases of multiple blocked jobs dialogs appearing, but this is captured in bug 51996.
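
For readers unfamiliar with the rule of thumb behind the fix, here is a minimal Java sketch of the pattern: the cancellation callback is called outside the synchronized block, and the monitor is taken only to check or wait on the state. This is not the actual ThreadJob code. The class `MonitorFreeCancelCheck`, its `lock` and `running` fields, and the `demo` helper are hypothetical names made up for illustration. Only the method names `joinRun` and `run` are borrowed from the description above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the waiting logic; NOT the real ThreadJob source.
class MonitorFreeCancelCheck {
    private final Object lock = new Object();
    private boolean running = false; // guarded by lock

    // Called by the job thread once the scheduling rule is free.
    void run() {
        synchronized (lock) {
            running = true;
            lock.notifyAll();
        }
    }

    // Returns true once run() has happened, false if the callback reports cancellation.
    boolean joinRun(BooleanSupplier isCanceled) throws InterruptedException {
        while (true) {
            // Arbitrary third-party code: invoked WITHOUT holding the lock.
            if (isCanceled.getAsBoolean()) {
                return false;
            }
            synchronized (lock) {
                if (running) {
                    return true;
                }
                lock.wait(50); // releases the lock while waiting
            }
        }
    }

    // The callback starts a thread that needs the lock and waits for it,
    // loosely mimicking a progress monitor that runs other work.
    static boolean demo() {
        MonitorFreeCancelCheck job = new MonitorFreeCancelCheck();
        BooleanSupplier callback = () -> {
            Thread starter = new Thread(job::run);
            starter.start();
            try {
                starter.join(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return false; // never cancels in this demo
        };
        try {
            return job.joinRun(callback);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("started=" + demo());
    }
}
```

If `joinRun` were instead declared `synchronized` on the same object that `run` locks, the `starter` thread in this sketch would block until the callback gave up waiting. That is the same shape as the cycle described in steps 2 and 7.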