Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 105491

Summary: deadlock: a simple scenario
Product: [Eclipse Project] Platform Reporter: Yasser Lulu <ylulu>
Component: RuntimeAssignee: John Arthorne <john.arthorne>
Status: RESOLVED INVALID QA Contact:
Severity: normal    
Priority: P3 CC: dirk_baeumer, dmisic, markus.kell.r, Tod_Creasey, zina
Version: 3.1   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 108162    
Attachments:
Description Flags
JUnit test to reproduce the problem none

Description Yasser Lulu CLA 2005-07-28 13:33:41 EDT
The following simplified code deadlocks, one can argue that it is expected and 
should be handled by the code author, however, when these snippets of code are 
triggered in response to Eclipse events by the various plugins that know 
nothing about each other, this issue becomes difficult -if not impossible- to 
resolve especially when there is no definite rules of enagagement specified by 
Eclipse platform nor there is a standardized arbitration when it comes to 
acquiring Eclipse global locks (e.g., workspace, ui):


//Add the following Action to a context-menu

new org.eclipse.jface.action.Action() {
  public void run() {
    try {
	ResourcesPlugin.getWorkspace().run(new IWorkspaceRunnable() {
	  public void run(IProgressMonitor monitor)
	  throws CoreException {
	  try {
		//Launch a progress dialog with fork set to true 
		new ProgressMonitorDialog(getShell()).run(true,
			true, new IRunnableWithProgress() {
			  public void run(IProgressMonitor monitor)
			  throws InvocationTargetException,
			  InterruptedException {
			    try {
				//Simply run a NOOP workspace runnable
				
      				//Deadlock: the UI thread already
				//owns the workspace lock and it is 
				//blocked on the progress monitor to 
				//return, which in turn is witing for
				//its runnable executing in the forked
				//thread to retrun, which in trun is
				//waiting to acquire the workspace lock
				//held by the UI thread.
				ResourcesPlugin.getWorkspace().run(
				  new IWorkspaceRunnable() {
					public void run(
						IProgressMonitor monitor)
						throws CoreException {
				  		//do nothing
				  	}
				  }, null);
				} catch (CoreException ex) {
					ex.printStackTrace();
				}
			  }
			});
		} catch (Exception ex) {
			ex.printStackTrace();
		}
	   }
	}, null);
    } catch (Exception ex) {
	ex.printStackTrace();
    }
  }
};
Comment 1 John Arthorne CLA 2005-07-28 17:53:14 EDT
This deadlocks in 3.1?  I thought we handled this case...

I just looked at NestedSyncExecDeadlockTest in our test suites and we handle
some subtle variants of this.  I'll investigate...
Comment 2 Yasser Lulu CLA 2005-07-28 18:07:23 EDT
This is the details of the Eclipse I'm using:
 Eclipse SDK
 Version: 3.1.0
 Build id: I20050617-1618

The JVM I'm using is: 
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)

Comment 3 John Arthorne CLA 2005-08-04 11:52:30 EDT
From my first glance at this bug, I was confusing it with the case of:

1) non-UI thread gets workspace lock
2) non-UI thread does a syncExec
3) workspace lock is acquired in the sync exec.

This is the case that we handle currently.  In your situation, it is a more
general case of control being transferred to another thread.  We do have a
mechanism to make this work, and support for it is built into ModalContext. If
the operation passed to the ModalContext implements the interface
org.eclipse.jface.operation.IThreadListener, it will be notified when control is
about to pass to another thread.  This is an opportunity to transfer the
scheduling rule owned to the modal context thread.  See bug 80648.

Does this suite your requirements?  There is "standardized arbitration" between
core locks (ILock, ISchedulingRule), and the UI thread (sync exec essentially 
locks the UI thread). However, in this case the other lock is a plain Java
thread being created and the caller blocks on its completion (Thread.join()). 
There really isn't any way for the platform to automatically arbitrate lock
contention in these situations.
Comment 4 John Arthorne CLA 2005-08-15 16:10:45 EDT
The actual deadlock that was occurring in the user scenario was more complex:

1) Refactoring starts in the UI thread.  Workspace lock is taken, and a modal
context thread is forked (call this modal context MC1).
2) Refactoring transfers the workspace lock to MC1 using IJobManager.transferRule
3) Refactoring operation runs in MC1, causing various side-effects.  One
side-effect is another plugin schedules an asyncExec.
4) MC1 passes ownership of workspace lock back to UI thread, and exits
5) After passing control back to the UI thread, but before MC1 dies, the
asyncExec is run.
6) The asyncExec forks another model context (MC2), and blocks the UI thread in
another event loop.
7) MC2 tries to acquire the workspace lock and deadlocks because the workspace
lock is currently held by the UI Thread.

There are two problematic pieces of code that lead to the deadlock. The first is
the refactoring infrastructure, which runs the event loop at a time when it owns
the workspace lock.  This means that any asyncExec running at that moment that
tries to get the same lock will deadlock.  The second piece of code is the
asyncExec itself, which is indirectly acquiring the workspace lock from a
callback: a context where it cannot know what locks are currently owned by the
UI thread.  Both of these code patterns should be avoided if at all possible.
Comment 5 Dirk Baeumer CLA 2005-08-16 03:44:04 EDT
John, the problem you descibed regarding the refactoring infrastructure isn't
caused by refactoring. It is the general way how ModalContext and progress
reporting works. The event loop is run by the modal context infrastructure to
ensure that clients can press the cancel button. Allmost all UI actions follow
this pattern and therefore cause the same situation. Following your suggestions
no runnable executed via IRunnableContext.run (for example via the
ProgressMonitorDialog) could ever lock the workspace since the runnable context
will spin the event loop resulting in any (a)sync runnable to be executed.

However I agree that the behaviour is problematic in regards to deadlocks. We
are fighting the problem of the "free event" loop and the possibility that any
async or sync exec can run during a modal context thread since version 1.0 ;-)
We discussed several solutions to this in the past. Here is what I can remember
from the discussions:

- have an extra event queue in core. Most of the time clients execute an async 
  exec because they want to run after the currently executed workspace runnable.
  The only way how to achieve this currently is to use async exec.

- like executing a workspace runnble executed a runnable via the Display should
  take a scheduling rule allowing the infrastucture the detect these case. This
  would mean that all async exec which for example aquire the workspace lock 
  would wait until the modal context thread is finish.

- disallow locking the workspace or any parts of it inside a display runnable.
Comment 6 John Arthorne CLA 2005-08-16 18:40:39 EDT
Created attachment 26174 [details]
JUnit test to reproduce the problem

I have created this JUnit test that illustrates the problem being discussed. 
The test case in the original bug report was a bit too simple.	In that case it
is easier to argue that the deadlock is expected.  

In this JUnit test, you would not expect a deadlock from looking at the code. 
The second workspace operation occurs inside an asyncExec, so you would expect
the first operation to complete before the second one starts.  However, since
the first ModalContext spins the event loop (and it fact forces the event queue
to be flushed before returning), this test deadlocks every time.

I don't have an answer yet, but it feels like we should be able to handle this
particular case.  I think the real root of the deadlock is the nested
ModalContext threads (when this test is deadlocked, you see two modal context
event loops in the UI thread's stack trace).  If ModalContext could avoid the
nesting from happening it would break the deadlock.
Comment 7 Yasser Lulu CLA 2005-08-17 11:11:03 EDT
I think threr are three somewhat relevant issues comes to mind here, first is 
related to spinning the event-loop is sometimes is needed for dispatching 
system events (e.g., mouse-clicks, screen-paint) which means that executing 
asyncExec Runnables is not the intent of the caller, yet, the Display object 
only offers readAndDispatch that will run the asyncs, which as we can see can 
lead to some "unexpected" beahviour on the part of the calling client. Now, 
executing asyncRunnables is a legitimate behaviour and should stay the default 
for the message pump in Eclipse, besides some clients do need this behaviour 
when they explicitly call readAndDispatch, however some clients don’t, so, may 
be Eclipse should offer a different flavour for event flushing, one that 
dispatches system events only -no async runnables- this will give clients the 
option to pick and choose which flavour to call depending on their intent.

The second issue is that Eclipse does not specify a clear order-of-acquisition 
of its globally available locks, which is the only sure way to avoid deadlocks 
of any kind, currently, some clients acquire the UI -virtual- lock and then 
proceed to acquire Workspace lock, while others do exactly the opposite: 
acquire the Workspace lock then the UI lock, all of this works fine if these 
two clients are not executing in the same time, but once they do -as is the 
case of these two scenarios -the simplistic one or the more involved one- 
deadlock becomes inevitable. I suggest that some order of lock acquisition is 
needed here, meaning: you can't acquire lock B and then try to acquire lock A, 
you only can acquire A and then B, or acquire B alone without going back in the 
other direction and acquire A. Giving clients a clear-cut direction is 
important in deciding who is responsible for what whether it is Eclipse vs. 
clients or a client plugin vs. another client code.

Third, it seems that Eclipse introduced the IThreadListener interface to handle 
lock ownership transfer, however, the problem I find is that this technique 
is "partially" supported, meaning, it is not supported in every thread 
switching context offered by Eclipse, for example, it is supported by the 
ModalContext object before it forks a thread whereas it is not supported by the 
Display object when it runs syncExecs although a thread switching is taking 
place here -between the user-thread that called syncExec and the UI thread. If 
lock transfer is supported in all places, then clients can change their code to 
take advantage of that and avoid both deadlock scenarios we have here.
Comment 8 John Arthorne CLA 2005-08-17 12:23:30 EDT
Re comment #5:

> Almost all UI actions follow
> this pattern and therefore cause the same situation. Following your suggestions
> no runnable executed via IRunnableContext.run (for example via the
> ProgressMonitorDialog) could ever lock the workspace since the runnable context
> will spin the event loop resulting in any (a)sync runnable to be executed.

This isn't quite right.  In the "normal" case, the UI thread forks a
ModalContext thread and only the ModalContext thread acquires the workspace
lock.  If the UI thread runs an asyncExec while it is waiting, and that async
acquires the workspace lock, it will not cause a deadlock.  The modal context
operation will eventually complete and release the lock, and then the async will
complete.  The difference in the refactoring case is that when the modal context
operation completes, it does not release the lock but instead transfers the lock
back to the UI thread.  Any other background thread waiting for this lock (such
as another modal context), will not get a chance to run.  I know this is all
happen by design, but this coding pattern used by refactoring will cause
deadlocks in cases where "normal" modal context operations do not.
Comment 9 Dirk Baeumer CLA 2005-08-17 13:08:27 EDT
>> I know this is all happen by design, but this coding pattern used by 
>> refactoring will cause deadlocks in cases where "normal" modal context 
>> operations do not.

Agree that this can cause a deadlock in situations where the UI thread is
blocked by a (a)sync runnable waiting for a lock different than the workspace
lock. In all other cases there is no deadlock since when refactoring transfers
the rule back to the UI thread all they UI thread is doing is releasing the lock
right away. 

I opend PR 107237 to investigate whether refactoring could release the lock in
the modal context thread instead of transferring it back to the UI thread.
Comment 10 Dirk Baeumer CLA 2005-08-17 13:19:18 EDT
Some comments to comment #7.

First issue: that's exactly what we did in Visual Age Micro Edition. We filtered
all non system event and executed them after the modal context thread has
finished. The downside of this is that UI updating code thats reacts on model
deltas in a non UI thread has to post the updating code into the UI thread.
Since these are no system event they get filtered which leads to the fact that
the UI updates after the modal context thread has finished which sometimes looks
strange.

Third issue: you can transfer scheduling rules when using Display.syncExec. You
might want to have a look at UIPerformChangeOperation#executeChange
Comment 11 Yasser Lulu CLA 2005-08-17 13:40:23 EDT
(In reply to comment #10)
> Third issue: you can transfer scheduling rules when using Display.syncExec. 
You
> might want to have a look at UIPerformChangeOperation#executeChange

The problem with the code in the UIPerformChangeOperation is that the lock have 
transferred before the runnable runs, meaning, while the calling thread is 
still blocked on the Disaplay.syncExec to run, it doesn't own the lock anymore 
which means someone else that is currently running in the UI thread will have a 
chance to acquire the lock that just got transferred to the UI and this could 
lead to all kinds of issues. So I think the IThreadListener is the way to go to 
transfer locks -just when running the runnable and not before- and then 
transfer it back to tha caller.

Comment 12 Dirk Baeumer CLA 2005-08-29 11:09:26 EDT
Another scenario (not so uncommon) that cause deadlocks (from bug 108162):

- aquire a lock in the UI thread
- execute an operation (e.g. MoveFilesAndFoldersOperation) which spawns a modal
  context thread which aquires a lock as well
Comment 13 Tod Creasey CLA 2005-10-11 13:30:56 EDT
*** Bug 108162 has been marked as a duplicate of this bug. ***
Comment 14 Dusko CLA 2005-10-11 13:38:34 EDT
Since the bug #108162 has been marked as duplicate of this bug (and I am not 
entirely convinced that it is) I will re-post my question here:

Is there any news on this? I have not seen any Bugzilla activity on this bug 
(nor the 105491) in a while and was wondering what the plans are since it is 
really important for us.
Comment 15 John Arthorne CLA 2005-10-11 13:53:57 EDT
The best fix I can think of is to cause exceptions to be thrown when these
deadlock prone patterns are used.  This would ensure applications in the field
don't deadlock, and force client code away from dangerous things like waiting on
a thread while you own a scheduling rule.  For example, we could throw an
exception when:

 1) A rule is transferred to a thread that is waiting for an unrelated rule.
 2) A WorkspaceModifyOperation is forked from the UI while the UI owns a
scheduling rule.

Comment 16 Dusko CLA 2005-10-11 14:52:38 EDT
OK, the last comment I can agree with - to a certain degree. It is also the 
reason why I said that I am not entirely convinced that 108162 is a duplicate 
of this.

John already reopened the bug 108162 and I think it was the right thing to do.

As for this bug (105491), the exception would be improvement. At the least the 
application would not be dead and the user will not have to restart. I think 
it is a valid approach and should be done.

On the other hand, I do believe that there should be (much) more effort on the 
Eclipse side to enable existing APIs to work together and not get into the 
deadlock situations. By that I do not mean the ModalContext code / the thread 
mgmt code but rather its clients. We adopted Eclipse 3.1 and we are seeing a 
lot of deadlocks. Some of them were clearly caused by some practices in our 
code and we are trying to prune those. But there is also a number of them 
arising from using Eclipse APIs that do not indicate in any way what may or 
may not be used with them and still cause deadlock as in case of 108162. We 
can see that the RefactoringWizardDialog2 has an option to fork or not (on 
ModalContext) but that class is internal. The public class 
RefactoringWizardOpenOperation does not expose this flag but that is what is 
available to client code. So, either the public classes have to do more to 
give proper controls to their clients or they have to handle those themselves. 
The current situation is that lots of public APIs obtain those locks, don't 
expose or document any of that and do not handle the problems.
Comment 17 Dirk Baeumer CLA 2005-10-12 03:50:38 EDT
Dusko,

I don't understand your comments about the RefactoringWizardDialog2 and
RefactoringWizardOpenOperation. None of these two classes provide API to control
the fork mode. Are you referrring to the run method in RefactoringWizardDialog2,
which has an fork argument ? This method is an implementation of
IRunnableContext and necessary to use the RefactoringWizardDialog2 to execute
runnable with progress (basically to show progress inside of the dialog). This
implementation is basically the same as for the WizardDialog.

However, I agree that the RefactoringWizardOpenOperation should document that it
takes the workspace lock when executing a refactoring and that this lock is held
in the right threads when the change are created and executed for both
participants and processors. I added such a comment to the class.
Comment 18 Dusko CLA 2005-10-12 09:37:51 EDT
It is just an example. All I am saying is that RefactoringWizardDialog2 is an 
internal class being used by the framework and it has an option to fork the 
thread (in ModalContext). Yes, I know it is dictated by the interface but that 
is beyond the point - the client code never executes that directly and it is 
the framework that decides if the thread is going to be forked or not. Now, 
value for that parameter is important - it could either prevent or cause 
deadlock but the client code cannot control it. My point is that since the 
client cannot control how the ModalContext is used (actually, it does not even 
know that ModalContext is going to be used) the burden cannot be on the client 
code to deal with the deadlock.

I can see only 3 appropriate approaches:
1) The API that does not expose any controls on what and how is going to be 
locked and does not document it should deal with deadlock on its own
2) The API could give more control to the client code
3) The API could document what kind of locks it is going to obtain and/or how 
it deals with transfers

My point is that it cannot be left to the client code to hunt down hidden 
dependencies.
Comment 19 Dirk Baeumer CLA 2005-10-12 11:26:29 EDT
I agree that we either have to give more control (where appropriate) or document
this better.

The refactoring framework forks the execution into the modal context thread to
support cancelation of the refactoring execution. I am really reluctent to give
clients control over this since not forking will raise a new set of problems.
And more important: even if the refactoring would be executed with fork=false,
executing the MoveFilesAndFoldersOperation in a participant would result in the
same deadlock.

I updated the Javadoc to explain how refactoring deals with locks and threads.
Comment 20 Zina Mostafia CLA 2008-08-04 11:35:23 EDT
Any update on this bug ? It has not been targetted.
If this is not to be resolved, we need a good documentation on how clients should use this to avoid the deadlock.
Comment 21 John Arthorne CLA 2008-08-13 15:24:47 EDT
I don't see any more action to take on this bug.  The actions taken were:

 - Some specific cases that will certainly cause deadlock were changed to throw exceptions instead so clients can better detect and resolve these situations during development (tracked by bug 108162)

 - Dirk updated the refactoring javadoc to describe its use of locks.