| Summary: | Deadlock between syncExec and IJobManager.beginRule | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | John Arthorne <john.arthorne> | ||||||||
| Component: | Runtime | Assignee: | John Arthorne <john.arthorne> | ||||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||||
| Severity: | critical | ||||||||||
| Priority: | P3 | CC: | daniel_megert, hsoliwal, min123 | ||||||||
| Version: | 3.6 | ||||||||||
| Target Milestone: | 3.6 M4 | ||||||||||
| Hardware: | PC | ||||||||||
| OS: | Windows XP | ||||||||||
| Whiteboard: | |||||||||||
| Bug Depends on: | 283449 | ||||||||||
| Bug Blocks: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
John Arthorne
Is the UI thread using a null progress monitor in beginRule? (In reply to comment #1) > Is the UI thread using a null progress monitor in beginRule? I'm not sure. I'm trying to create a simple test case. Currently the only way to reproduce is to run the CVS tests which are very difficult to setup and very slow to run (about 15mins to produce the hang). The problem is that LockManager.aboutToWait needs to be called from within the wait loop but currently it is only called once at the beginning of ThreadJob.joinRun. You're right that LockManager.aboutToWait needs to be invoked within the loop. I'll see if I can come up with a way to fix. (In reply to comment #3) > You're right that LockManager.aboutToWait needs to be invoked within the loop. > I'll see if I can come up with a way to fix. I think what is happening is that that progress monitor passed into beginRule was *not* null, which means that the EventLoop was *not automatically* run. If the EventLoop was being run automatically (i.e. null monitor), the isCanceled would have ran async jobs. The UISynchronizer's syncExec performs an asyncExec that runs the "pending work" of lock listener. (Regardless, it gives the locklistener the sync exec as pending work). If the asyncExec doesn't run, the job manager (pre 11/20) gave a "2nd change" to the lock listener to run it's pending work via it's aboutToWait method. So, I'm assuming that the progress monitor was not null. Created attachment 153023 [details]
Test case
Test case to reproduce. You only get deadlock if the UI has a non-null rule. Otherwise the event loop monitor runs and the asyncExec is executed.
Created attachment 153024 [details]
Potential fix v01
(In reply to comment #6) > Created an attachment (id=153024) [details] > Potential fix v01 Small tweak. Both this new return point (plus a previous point that threw OperationCanceledException) really need to shutdown() the WaitForRunThread before returning. It's not harmful that we forgot to do this, since the background thread should eventually terminate, but it would be odd/deceptive/unnecessary for this thread to be running after joinRun returns. The reason we put it in an if-clause is that if the shutdown was not successful, it means that it was already finished. That means we favor the job result over cancelation (or any return protected by shutdown() for that matter) in the case of a tie. Also removed some an old/stale comment. The code in joinRun and waitForRun is very similar - yet irritatingly different. I tried to consolidate the code into a single method, but wasn't able to come up with a good solution. Created attachment 153026 [details]
v2
Guess: this might also be the cause for failing JDT Text tests in I20091124-0800. Note that they only failed on Windows XP Sun 1.6.0. (In reply to comment #7) > The reason we put it in an if-clause is that if the shutdown was not > successful, it means that it was already finished. That means we favor the job > result over cancelation (or any return protected by shutdown() for that matter) > in the case of a tie. I don't see why we would favour the job result over cancelation. If the operation has been canceled there isn't really any reason to acquire the rule. (In reply to comment #10) > > I don't see why we would favour the job result over cancelation. If the > operation has been canceled there isn't really any reason to acquire the rule. This only applies in the case of the tie. For example, if cancelation occurred simultaneously with the WaitForRunThread finishing, it will return the result of the WaitForRunThread instead of throwing OperationCanceledOperation. The following code will throw exception even if the thread had completed if (canceled) { shutdown(); throw OperationCanceledException(); } The following code will only throw an exception if the thread was still running. If it had completed, it will return the result of the thread. if (canceled) if (shutdown()) throw OperationCanceledException() So shutdown() returns true if it was running and did not complete, false if it was already done. I have fixed this deadlock in HEAD by reverting the yield implementation that caused it. I have released my regression test to the org.eclipse.ui.tests.concurrency test suite so that we can catch this problem if it happens again. |