
Bug 50949

Summary: Deadlock on startup after having cancelled a background job
Product: [Eclipse Project] Platform Reporter: Philipe Mulet <philippe_mulet>
Component: Resources Assignee: John Arthorne <john.arthorne>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: P1 CC: eclipse, john.arthorne
Version: 3.0   
Target Milestone: 3.0 M7   
Hardware: PC   
OS: Windows 2000   
Whiteboard:
Attachments:
Description                                   Flags
Progress dialog                               none
Thread dump                                   none
Progress dialog during startup with 20040129  none
Thread dump with 20040129                     none

Description Philipe Mulet CLA 2004-01-30 05:56:55 EST
Build 20040128

Intending to sync my workspace after startup, I got the progress dialog 
complaining that some background job was still running.

I decided to cancel the background job (since I was unable to cancel the CVS
attempt), and got deadlocked.
Comment 1 Philipe Mulet CLA 2004-01-30 05:57:43 EST
Created attachment 7649 [details]
Progress dialog
Comment 2 Philipe Mulet CLA 2004-01-30 05:59:11 EST
Created attachment 7650 [details]
Thread dump
Comment 3 Philipe Mulet CLA 2004-01-30 06:10:59 EST
Even worse. I killed Eclipse, then moved to SDK 20040129.
Started up. Deadlock occurred; the IDE never came up.

Curiously, the same process that caused the hang in the previous session once
cancelled also caused grief during startup, without my having to cancel it.
Comment 4 Philipe Mulet CLA 2004-01-30 06:11:57 EST
Created attachment 7652 [details]
Progress dialog during startup with 20040129
Comment 5 Philipe Mulet CLA 2004-01-30 06:13:27 EST
Created attachment 7653 [details]
Thread dump with 20040129
Comment 6 John Arthorne CLA 2004-01-30 11:36:58 EST
This is the same as the deadlock I reported in bug 50684. The UI fixes have not
yet been released into a build, but they have just been tagged for the 11:30 am build
that is about to start. I'm not 100% sure if it will solve your deadlock, since
you say you got it without even touching the blocked progress dialog.
Comment 7 Philipe Mulet CLA 2004-01-30 11:46:39 EST
Exactly. It is a timing thing. After two restart attempts it started OK
again, and since then I haven't had the problem.
Comment 8 John Arthorne CLA 2004-02-11 14:09:27 EST
*** Bug 51532 has been marked as a duplicate of this bug. ***
Comment 9 John Arthorne CLA 2004-02-11 16:15:07 EST
This bug still exists in I20040210.  Here is what happens:

1) Operation A tries to lock a file in UI thread, but is blocked by background op
2) Op A is in ThreadJob.joinRun, waiting for the implicit job for that rule to
start.  This method is synchronized on the ThreadJob instance
3) While waiting, ThreadJob.joinRun checks for cancelation (isCanceled)
4) isCanceled spins the event loop, and Operation B is waiting in the async queue.
5) Op B starts, tries to lock a file
6) Op B waits in a similar manner on a different ThreadJob instance
7) Meanwhile, the background job ends, and the ThreadJob for Op A enters the run
method. This run method then tries to enter a synchronized block on the
instance, which is owned by the UI thread (see step 2).

-> Deadlock:  The ThreadJob for Op A owns the scheduling rule on the file and is
waiting for the ThreadJob object monitor.  The UI thread holds the ThreadJob
object monitor and is waiting for the scheduling rule.
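
The cycle described above can be checked with a small wait-for model. This is only an illustrative sketch: WaitForGraph and the string labels are hypothetical names, not Eclipse APIs, and the model records nothing beyond who holds what and who waits for what.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical wait-for model; the names are illustrative, not Eclipse APIs. */
class WaitForGraph {
    // resource -> thread that currently holds it
    private final Map<String, String> ownerOf = new HashMap<>();
    // thread -> resource it is blocked on
    private final Map<String, String> waitingFor = new HashMap<>();

    void holds(String thread, String resource) {
        ownerOf.put(resource, thread);
    }

    void waits(String thread, String resource) {
        waitingFor.put(thread, resource);
    }

    /**
     * Follows thread -> wanted resource -> owner of that resource until the
     * chain ends or comes back to a thread that was already visited.
     */
    boolean isDeadlocked(String start) {
        Set<String> seen = new HashSet<>();
        String thread = start;
        while (thread != null && seen.add(thread)) {
            String wanted = waitingFor.get(thread);
            thread = (wanted == null) ? null : ownerOf.get(wanted);
        }
        // A non-null thread here means the chain looped back on itself.
        return thread != null;
    }
}
```

Registering "UI thread holds the ThreadJob monitor and waits for the scheduling rule" together with "ThreadJob worker holds the scheduling rule and waits for the ThreadJob monitor" makes isDeadlocked return true. Dropping the first "holds" entry, which is what the proposed fix amounts to, breaks the cycle.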

The proposed fix is to NOT own the ThreadJob object monitor while calling
isCanceled, since this calls arbitrary third party code in the progress monitor
(standard rule of thumb for deadlock prevention).
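
A minimal sketch of that rule of thumb, assuming a simplified stand-in for the real class: WaitingJob, start and the BooleanSupplier callback are hypothetical, and this is not the actual ThreadJob code or the released patch. The point is only that the cancellation callback runs before the monitor is taken, so whatever it triggers can still enter synchronized methods on the same object.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a job that a caller waits on; not the real ThreadJob. */
class WaitingJob {
    private boolean running; // guarded by the monitor of "this"

    /** Called by the worker once the scheduling rule is free (plays the role of run()). */
    synchronized void start() {
        running = true;
        notifyAll();
    }

    /**
     * Waits until start() has been called or the callback reports cancellation.
     * Returns true if the job started, false if it was canceled or interrupted.
     */
    boolean joinRun(BooleanSupplier isCanceled) {
        while (true) {
            // Foreign code runs here with NO monitor held, so it may spin an
            // event loop or cause start() to run on another thread.
            if (isCanceled.getAsBoolean()) {
                return false;
            }
            synchronized (this) {
                if (running) {
                    return true;
                }
                try {
                    wait(50); // releases the monitor while sleeping
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
    }
}
```

With this shape, a callback that itself causes start() to run on another thread and waits for it does not hang, because the waiting thread holds no monitor at that point. Had joinRun been declared synchronized, the same callback would block the other thread at start().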

I have released a fix, but am having trouble coming up with a repeatable test
case since it is so timing dependent.
Comment 10 John Arthorne CLA 2004-02-19 10:24:41 EST
Marking as fixed as the deadlock problem seems to be solved.  There are still
some reported cases of multiple blocked jobs dialogs appearing, but this is
captured in bug 51996.