Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 248717 - Possible deadlocks during job manager is supended
Summary: Possible deadlocks during job manager is supended
Status: VERIFIED FIXED
Alias: None
Product: EMF Services
Classification: Modeling
Component: Transaction (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P3 critical
Target Milestone: ---   Edit
Assignee: Christian Damus CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-09-26 09:21 EDT by Bernd Vogt CLA
Modified: 2017-02-24 15:10 EST (History)
1 user (show)

See Also:


Attachments
JUnit test case that demonstrates the deadlock (3.43 KB, text/plain)
2008-09-26 09:26 EDT, Bernd Vogt CLA
give.a.damus: iplog+
Details
Integration of the JUnit test into the suite (5.08 KB, patch)
2008-10-03 20:07 EDT, Christian Damus CLA
ahunter.eclipse: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Bernd Vogt CLA 2008-09-26 09:21:43 EDT
I was faced with a deadlock during the startup process of the eclipse workbench.  The debuger has shown that a transaction blocks the main thread, waiting until another transaction in a several thread has finished. But after the other transaction has done the main thread won't wake up again. It stucks in the method Lock#uiSafeAcquire(true) more precisely in Lock#uninterruptibleWait(sync).

I figured out that the Lock#uninterruptibleWait(sync) method causes the current thread to wait until the previously scheduled AcquireJob notifies the sync object. But the job won't come up... The origin of the problem can be located a few stack frames earlier in the main thread. There the IDEWorkbenchAdvisor#preStartup() method was called. In this method the job manager was caused to supend executing jobs (see IJobManager#suspend())... The job manager should be resumed in IDEWorkbenchAdvisor#postStartup() but we will never reach this method call beacause the main thread does actually wait for the execution of the AcquireJob.
Comment 1 Bernd Vogt CLA 2008-09-26 09:26:58 EDT
Created attachment 113587 [details]
JUnit test case that demonstrates the deadlock
Comment 2 Christian Damus CLA 2008-09-26 09:29:12 EDT
Splendid!  Thanks for the test case.
Comment 3 Christian Damus CLA 2008-10-03 20:07:21 EDT
Created attachment 114244 [details]
Integration of the JUnit test into the suite

Attached a patch that integrates the reporter's JUnit test into the transction tests suite.  Some additional changes are made, including:
  - using notifyAll()
  - a finally block to ensure resumption of the JobManager for other tests
  - timeout adjustments

This test case passes in my environment (i.e., there is no deadlock):
  - Eclipse 3.5 M2
  - EMF 2.5 M2
  - EMF Transaction 1.3 M2

I don't see that I am doing anything substantively different than the original test case.  Bernd, can you see something wrong?  What have I missed?  Can you reproduce the problem with this test case?
Comment 4 Bernd Vogt CLA 2008-10-06 03:11:30 EDT
Sorry, my fault. In the version i attached the essential line is excluded via comment.. Pleas remove the comment from the last assertion:

// Thread-1 should have finished now
// assertFalse(t1.isAlive());

After suspending the VM with the debugger on this line you can see that 'Thread-1" still is alive and stucks in the method 'Lock.uninterruptibleWait(Object o)'.

Environment :
  - Eclipse 3.2.2
  - EMF 2.2.2
  - EMF Transaction 1.0.2
Comment 5 Christian Damus CLA 2008-10-06 09:13:40 EDT
(In reply to comment #4)
> Sorry, my fault. In the version i attached the essential line is excluded via
> comment.. Pleas remove the comment from the last assertion:
> 
> // Thread-1 should have finished now
> // assertFalse(t1.isAlive());

Ah, I should have seen that.  Will try again.

> After suspending the VM with the debugger on this line you can see that
> 'Thread-1" still is alive and stucks in the method
> 'Lock.uninterruptibleWait(Object o)'.
> 
> Environment :
>   - Eclipse 3.2.2
>   - EMF 2.2.2
>   - EMF Transaction 1.0.2

Oh, my.  That's pretty old stuff.  I will be able to fix this in the 1.3 (Eclipse 3.5) release and, possibly, 1.2.3 (Eclipse 3.4.2).  Do you anticipate being able to update?
Comment 6 Bernd Vogt CLA 2008-10-06 12:58:43 EDT
(In reply to comment #5)
> Oh, my.  That's pretty old stuff.  I will be able to fix this in the 1.3
> (Eclipse 3.5) release and, possibly, 1.2.3 (Eclipse 3.4.2).  Do you anticipate
> being able to update?

Unfortunately, we won't be able to update. One of our requirements (and on of our biggest challenges) is to stay compatible with all eclipse versions down to eclipse 3.2...

But we are aware of the version problem. Maybe a patch can help?! But even so... who knows which version will be used by our customers... I fear that we have to live with the risk that some evil guy can suspend the job manager...

By the way, for our concrete case we have found a workaround to prevent the deadlock during the workbench startup.
Comment 7 Christian Damus CLA 2008-10-08 20:56:23 EDT
Committed a fix to HEAD (the 1.3 branch).

Unfortunately, I cannot commit the fix to the 1.2.3 release (1.2 maintenance branch).  The solution requires querying the IJobManager for whether it is suspended, and that API that was only introduced the Eclipse 3.4 release.  As this is the first time that the transaction plug-in uses an Eclipse run-time API more recent than the 3.2 release, I had to update the lower bound of the transaction plug-in's dependency on org.eclipse.core.runtime from 3.2 to 3.4.  This constitutes an API change in the transaction plug-in,  because the it re-exports the org.eclipse.core.runtime bundle.  API changes are not permitted on the maintenance branch except under extreme circumstances.  Imagine the client that wants to continue using Eclipse 3.3 (which they can, because Transaction only requires EMF 2.3), finding that it cannot update to Transaction 1.2.3 because it would then require Eclipse 3.4.

Since a work-around seems to be available for the workbench start-up scenario, which I would hope is the only legitimate usage of this IJobManager API, I think an exception is not justified.
Comment 8 Bernd Vogt CLA 2008-10-09 04:19:04 EDT
Oh dear, I assumed sth. like this. It's a pity that you was forced to update the lower bound of your transaction plug-in's dependency from 3.2 to 3.4 because of a "isSuspended()" method...

I agree with you that it's not justified to permit this changes to your maintenance releases. I think we can live with our workaround and the (hopefully minimal) risk that somebody suspends the job manager.

I have still one question. As I reviewed your fix I came across the following lines:

   // If the Job Manager is suspended, then under normal
   // Eclipse circumstances, this is not the UI thread, anyway
   acquire(exclusive);

Your comment says, that it will be usually not the UI thread.. but in our usecase it was exactly the UI thread that gets stuck in the uiSafeAcquire method. Our deadlock occured because the pre and post startup methods of the WorkbenchAdvisor (which calls IJobManager#suspend() and IJobManager#resume()) will be invoked in the UI thread like our sticky transaction. The job manager never gets resumed because the sticky transaction was invoked between the pre and post startup calls.

What's the danger on calling acquire(exclusive) in the UI thread? Potentially interrupted exceptions?
Comment 9 Christian Damus CLA 2008-10-09 09:33:58 EDT
(In reply to comment #8)
> Oh dear, I assumed sth. like this. It's a pity that you was forced to update
> the lower bound of your transaction plug-in's dependency from 3.2 to 3.4
> because of a "isSuspended()" method...

Yes, I have tried to be diligent in following the intent of Eclipse's plug-in versioning guidelines.  Like anything else, I suppose they are both a blessing and a curse.  If I had followed the strategy of certain other projects, which always increase their dependency lower bounds in locked step, then I would have been able to make the same change in 1.2.3.

> I agree with you that it's not justified to permit this changes to your
> maintenance releases. I think we can live with our workaround and the
> (hopefully minimal) risk that somebody suspends the job manager.

Yes, in general, I would think that suspending the Job Manager is a symptom of a programming error.


> I have still one question. As I reviewed your fix I came across the following
> lines:
> 
>    // If the Job Manager is suspended, then under normal
>    // Eclipse circumstances, this is not the UI thread, anyway
>    acquire(exclusive);
> 
> Your comment says, that it will be usually not the UI thread.. but in our
> usecase it was exactly the UI thread that gets stuck in the uiSafeAcquire
> method. Our deadlock occured because the pre and post startup methods of the
> WorkbenchAdvisor (which calls IJobManager#suspend() and IJobManager#resume())
> will be invoked in the UI thread like our sticky transaction. The job manager
> never gets resumed because the sticky transaction was invoked between the pre
> and post startup calls.

Hmmm ... perhaps that comment is erroneous, then.  The probable reason for this suspension API's existence would seem to be UI code.


> What's the danger on calling acquire(exclusive) in the UI thread? Potentially
> interrupted exceptions?

Yes.  One of the curiosities in the Eclipse UI framework is that the event-queue processing code frequently interrupts the UI thread.  The last time I investigated it, it seemed rather arbitrary.  As it happens, in cases like this, it may actually end up being a good thing.  :-)
Comment 10 Christian Damus CLA 2008-11-03 11:40:01 EST
Fix available in HEAD: 1.3.0.I200811021821.
Comment 11 Christian Damus CLA 2008-11-24 13:16:25 EST
Restore original target after milestones were deranged.