Community
Participate
Working Groups
I was faced with a deadlock during the startup process of the eclipse workbench. The debuger has shown that a transaction blocks the main thread, waiting until another transaction in a several thread has finished. But after the other transaction has done the main thread won't wake up again. It stucks in the method Lock#uiSafeAcquire(true) more precisely in Lock#uninterruptibleWait(sync). I figured out that the Lock#uninterruptibleWait(sync) method causes the current thread to wait until the previously scheduled AcquireJob notifies the sync object. But the job won't come up... The origin of the problem can be located a few stack frames earlier in the main thread. There the IDEWorkbenchAdvisor#preStartup() method was called. In this method the job manager was caused to supend executing jobs (see IJobManager#suspend())... The job manager should be resumed in IDEWorkbenchAdvisor#postStartup() but we will never reach this method call beacause the main thread does actually wait for the execution of the AcquireJob.
Created attachment 113587 [details] JUnit test case that demonstrates the deadlock
Splendid! Thanks for the test case.
Created attachment 114244 [details] Integration of the JUnit test into the suite Attached a patch that integrates the reporter's JUnit test into the transction tests suite. Some additional changes are made, including: - using notifyAll() - a finally block to ensure resumption of the JobManager for other tests - timeout adjustments This test case passes in my environment (i.e., there is no deadlock): - Eclipse 3.5 M2 - EMF 2.5 M2 - EMF Transaction 1.3 M2 I don't see that I am doing anything substantively different than the original test case. Bernd, can you see something wrong? What have I missed? Can you reproduce the problem with this test case?
Sorry, my fault. In the version i attached the essential line is excluded via comment.. Pleas remove the comment from the last assertion: // Thread-1 should have finished now // assertFalse(t1.isAlive()); After suspending the VM with the debugger on this line you can see that 'Thread-1" still is alive and stucks in the method 'Lock.uninterruptibleWait(Object o)'. Environment : - Eclipse 3.2.2 - EMF 2.2.2 - EMF Transaction 1.0.2
(In reply to comment #4) > Sorry, my fault. In the version i attached the essential line is excluded via > comment.. Pleas remove the comment from the last assertion: > > // Thread-1 should have finished now > // assertFalse(t1.isAlive()); Ah, I should have seen that. Will try again. > After suspending the VM with the debugger on this line you can see that > 'Thread-1" still is alive and stucks in the method > 'Lock.uninterruptibleWait(Object o)'. > > Environment : > - Eclipse 3.2.2 > - EMF 2.2.2 > - EMF Transaction 1.0.2 Oh, my. That's pretty old stuff. I will be able to fix this in the 1.3 (Eclipse 3.5) release and, possibly, 1.2.3 (Eclipse 3.4.2). Do you anticipate being able to update?
(In reply to comment #5) > Oh, my. That's pretty old stuff. I will be able to fix this in the 1.3 > (Eclipse 3.5) release and, possibly, 1.2.3 (Eclipse 3.4.2). Do you anticipate > being able to update? Unfortunately, we won't be able to update. One of our requirements (and on of our biggest challenges) is to stay compatible with all eclipse versions down to eclipse 3.2... But we are aware of the version problem. Maybe a patch can help?! But even so... who knows which version will be used by our customers... I fear that we have to live with the risk that some evil guy can suspend the job manager... By the way, for our concrete case we have found a workaround to prevent the deadlock during the workbench startup.
Committed a fix to HEAD (the 1.3 branch). Unfortunately, I cannot commit the fix to the 1.2.3 release (1.2 maintenance branch). The solution requires querying the IJobManager for whether it is suspended, and that API that was only introduced the Eclipse 3.4 release. As this is the first time that the transaction plug-in uses an Eclipse run-time API more recent than the 3.2 release, I had to update the lower bound of the transaction plug-in's dependency on org.eclipse.core.runtime from 3.2 to 3.4. This constitutes an API change in the transaction plug-in, because the it re-exports the org.eclipse.core.runtime bundle. API changes are not permitted on the maintenance branch except under extreme circumstances. Imagine the client that wants to continue using Eclipse 3.3 (which they can, because Transaction only requires EMF 2.3), finding that it cannot update to Transaction 1.2.3 because it would then require Eclipse 3.4. Since a work-around seems to be available for the workbench start-up scenario, which I would hope is the only legitimate usage of this IJobManager API, I think an exception is not justified.
Oh dear, I assumed sth. like this. It's a pity that you was forced to update the lower bound of your transaction plug-in's dependency from 3.2 to 3.4 because of a "isSuspended()" method... I agree with you that it's not justified to permit this changes to your maintenance releases. I think we can live with our workaround and the (hopefully minimal) risk that somebody suspends the job manager. I have still one question. As I reviewed your fix I came across the following lines: // If the Job Manager is suspended, then under normal // Eclipse circumstances, this is not the UI thread, anyway acquire(exclusive); Your comment says, that it will be usually not the UI thread.. but in our usecase it was exactly the UI thread that gets stuck in the uiSafeAcquire method. Our deadlock occured because the pre and post startup methods of the WorkbenchAdvisor (which calls IJobManager#suspend() and IJobManager#resume()) will be invoked in the UI thread like our sticky transaction. The job manager never gets resumed because the sticky transaction was invoked between the pre and post startup calls. What's the danger on calling acquire(exclusive) in the UI thread? Potentially interrupted exceptions?
(In reply to comment #8) > Oh dear, I assumed sth. like this. It's a pity that you was forced to update > the lower bound of your transaction plug-in's dependency from 3.2 to 3.4 > because of a "isSuspended()" method... Yes, I have tried to be diligent in following the intent of Eclipse's plug-in versioning guidelines. Like anything else, I suppose they are both a blessing and a curse. If I had followed the strategy of certain other projects, which always increase their dependency lower bounds in locked step, then I would have been able to make the same change in 1.2.3. > I agree with you that it's not justified to permit this changes to your > maintenance releases. I think we can live with our workaround and the > (hopefully minimal) risk that somebody suspends the job manager. Yes, in general, I would think that suspending the Job Manager is a symptom of a programming error. > I have still one question. As I reviewed your fix I came across the following > lines: > > // If the Job Manager is suspended, then under normal > // Eclipse circumstances, this is not the UI thread, anyway > acquire(exclusive); > > Your comment says, that it will be usually not the UI thread.. but in our > usecase it was exactly the UI thread that gets stuck in the uiSafeAcquire > method. Our deadlock occured because the pre and post startup methods of the > WorkbenchAdvisor (which calls IJobManager#suspend() and IJobManager#resume()) > will be invoked in the UI thread like our sticky transaction. The job manager > never gets resumed because the sticky transaction was invoked between the pre > and post startup calls. Hmmm ... perhaps that comment is erroneous, then. The probable reason for this suspension API's existence would seem to be UI code. > What's the danger on calling acquire(exclusive) in the UI thread? Potentially > interrupted exceptions? Yes. One of the curiosities in the Eclipse UI framework is that the event-queue processing code frequently interrupts the UI thread. The last time I investigated it, it seemed rather arbitrary. As it happens, in cases like this, it may actually end up being a good thing. :-)
Fix available in HEAD: 1.3.0.I200811021821.
Restore original target after milestones were deranged.