Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 296056

Summary: Deadlock between syncExec and IJobManager.beginRule
Product: [Eclipse Project] Platform Reporter: John Arthorne <john.arthorne>
Component: RuntimeAssignee: John Arthorne <john.arthorne>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: P3 CC: daniel_megert, hsoliwal, min123
Version: 3.6   
Target Milestone: 3.6 M4   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on: 283449    
Bug Blocks:    
Attachments:
Description Flags
Test case
none
Potential fix v01
none
v2 none

Description John Arthorne CLA 2009-11-24 16:02:10 EST
The following case now produces a deadlock (since N20091120-2000):

1) The UI is waiting to begin a scheduling rule
2) The thread that owns the scheduling rule attempts to perform a syncExec.
Comment 1 Min Idzelis CLA 2009-11-24 16:48:35 EST
Is the UI thread using a null progress monitor in beginRule?
Comment 2 John Arthorne CLA 2009-11-24 17:16:28 EST
(In reply to comment #1)
> Is the UI thread using a null progress monitor in beginRule?

I'm not sure. I'm trying to create a simple test case. Currently the only way to reproduce is to run the CVS tests which are very difficult to setup and very slow to run (about 15mins to produce the hang). The problem is that LockManager.aboutToWait needs to be called from within the wait loop but currently it is only called once at the beginning of ThreadJob.joinRun.
Comment 3 Min Idzelis CLA 2009-11-24 17:26:31 EST
You're right that LockManager.aboutToWait needs to be invoked within the loop. I'll see if I can come up with a way to fix.
Comment 4 Min Idzelis CLA 2009-11-24 17:42:14 EST
(In reply to comment #3)
> You're right that LockManager.aboutToWait needs to be invoked within the loop.
> I'll see if I can come up with a way to fix.

I think what is happening is that that progress monitor passed into beginRule was *not* null, which means that the EventLoop was *not automatically* run. If the EventLoop was being run automatically (i.e. null monitor), the isCanceled would have ran async jobs. 

The UISynchronizer's syncExec performs an asyncExec that runs the "pending work" of lock listener. (Regardless, it gives the locklistener the sync exec as pending work). 

If the asyncExec doesn't run, the job manager (pre 11/20) gave a "2nd change" to the lock listener to run it's pending work via it's aboutToWait method. So, I'm assuming that the progress monitor was not null.
Comment 5 John Arthorne CLA 2009-11-24 18:19:14 EST
Created attachment 153023 [details]
Test case

Test case to reproduce. You only get deadlock if the UI has a non-null rule. Otherwise the event loop monitor runs and the asyncExec is executed.
Comment 6 John Arthorne CLA 2009-11-24 18:25:29 EST
Created attachment 153024 [details]
Potential fix v01
Comment 7 Min Idzelis CLA 2009-11-24 21:15:01 EST
(In reply to comment #6)
> Created an attachment (id=153024) [details]
> Potential fix v01

Small tweak. Both this new return point (plus a previous point that threw OperationCanceledException) really need to shutdown() the WaitForRunThread before returning. 

It's not harmful that we forgot to do this, since the background thread should eventually terminate, but it would be odd/deceptive/unnecessary for this thread to be running after joinRun returns. 

The reason we put it in an if-clause is that if the shutdown was not successful, it means that it was already finished. That means we favor the job result over cancelation (or any return protected by shutdown() for that matter) in the case of a tie. 

Also removed some an old/stale comment. 

The code in joinRun and waitForRun is very similar - yet irritatingly different. I tried to consolidate the code into a single method, but wasn't able to come up with a good solution.
Comment 8 Min Idzelis CLA 2009-11-24 21:15:31 EST
Created attachment 153026 [details]
v2
Comment 9 Dani Megert CLA 2009-11-25 05:33:42 EST
Guess: this might also be the cause for failing JDT Text tests in I20091124-0800. Note that they only failed on Windows XP Sun 1.6.0.
Comment 10 John Arthorne CLA 2009-11-25 09:55:15 EST
(In reply to comment #7)
> The reason we put it in an if-clause is that if the shutdown was not
> successful, it means that it was already finished. That means we favor the job
> result over cancelation (or any return protected by shutdown() for that matter)
> in the case of a tie. 

I don't see why we would favour the job result over cancelation. If the operation has been canceled there isn't really any reason to acquire the rule.
Comment 11 Min Idzelis CLA 2009-11-25 10:28:22 EST
(In reply to comment #10)
> 
> I don't see why we would favour the job result over cancelation. If the
> operation has been canceled there isn't really any reason to acquire the rule.

This only applies in the case of the tie. For example, if cancelation occurred simultaneously with the WaitForRunThread finishing, it will return the result of the WaitForRunThread instead of throwing OperationCanceledOperation. 

The following code will throw exception even if the thread had completed

if (canceled) {
shutdown();
throw OperationCanceledException();
}

The following code will only throw an exception if the thread was still running. If it had completed, it will return the result of the thread. 

if (canceled) 
  if (shutdown())
    throw OperationCanceledException()

So shutdown() returns true if it was running and did not complete, false if it was already done.
Comment 12 John Arthorne CLA 2009-11-25 15:38:32 EST
I have fixed this deadlock in HEAD by reverting the yield implementation that caused it. I have released my regression test to the org.eclipse.ui.tests.concurrency test suite so that we can catch this problem if it happens again.