
Bug 323788

Summary: Deadlock on Display.syncExec()
Product: [Modeling] EMF
Reporter: Martin Fluegge <martin.fluegge>
Component: cdo.core
Assignee: Eike Stepper <stepper>
Status: CLOSED FIXED
QA Contact: Eike Stepper <stepper>
Severity: normal
Priority: P3
CC: give.a.damus, vincent.hemery
Version: 4.5
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Bug Depends on: 340709
Bug Blocks:

Description Martin Fluegge CLA 2010-08-27 04:07:15 EDT
During the execution of the edge deletion tests (see MultipleResourcesDeletionTest), the started runtime instance randomly freezes.
Using print-out debugging, I saw that in most cases the freeze happens when I retrieve the connections from the SWTBotGef editor (which involves a syncExec). But it also happens when the whole test is torn down (tearDown()).

Unfortunately, the test case passes in most cases when running in debug mode.

But yesterday I managed to catch the frozen state and check the stack trace.

At this time the following components were involved:

-	SWTBot (running some syncExec)
-	TransactionalEditingDomain.runExclusive
-	DawnGMFTransactionListener (triggered by a repository change)
-	The Legacy Wrapper (also triggered by a repository change)

But I could not really see which object could cause a potential deadlock. 

Luckily, I never saw this deadlock when working "manually" with two runtime instances. It also never happened in any other Dawn test, neither in node deletion nor in edge modification tests. So it may only occur in combination with SWTBot.
Comment 1 Vincent HEMERY CLA 2012-05-16 11:47:30 EDT
I have also encountered freezes while developing a component on top of CDO.

While trying to debug the freeze, I took a closer look at the following method, in which invalidation often triggers a graphical refresh (and hence a sync with the UI Display):
org.eclipse.emf.internal.cdo.view.CDOViewImpl.doInvalidate(CDOBranch, long, List<CDORevisionKey>, List<CDOIDAndVersion>, Map<CDOID, InternalCDORevision>)

This method is synchronized. Yet, I think this is not enough.
While debugging it, I had this method called on both a transaction and a view, so the "detachedObjects" variable can get filled for both before the conflicts and notifications are handled.
Instead of a synchronized method (which synchronizes on the instance itself), you should probably use "synchronized(CDOViewImpl.class) {" to avoid this.
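To illustrate the difference in monitor scope that this suggestion relies on, here is a hypothetical Java sketch; `View`, `Probe` and `peakOverlap` are illustrative stand-ins, not CDOViewImpl code. A `synchronized` instance method only excludes callers on the same object, so a transaction and a view can run their invalidation at the same time, whereas `synchronized (View.class)` serializes them across all instances. A class-wide lock is coarser, though: it prevents this overlap but does not by itself remove a lock/syncExec cycle with the UI thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for two CDO views (e.g. a transaction and a read-only view). */
public class MonitorScopeDemo {

    static class View {
        /** Like a synchronized method: the monitor is this instance only. */
        synchronized void invalidatePerInstance(Probe p) throws InterruptedException {
            p.enter();
        }

        /** The suggested alternative: one monitor shared by every View instance. */
        void invalidatePerClass(Probe p) throws InterruptedException {
            synchronized (View.class) {
                p.enter();
            }
        }
    }

    /** Records how many threads are inside the critical section at the same time. */
    static class Probe {
        final AtomicInteger inside = new AtomicInteger();
        final AtomicInteger maxInside = new AtomicInteger();
        final CountDownLatch bothArrived = new CountDownLatch(2);

        void enter() throws InterruptedException {
            maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
            bothArrived.countDown();
            // Wait briefly for the other thread; it can only get in if the monitors differ.
            bothArrived.await(1, TimeUnit.SECONDS);
            inside.decrementAndGet();
        }
    }

    /** Invalidates two different views concurrently and returns the peak overlap. */
    static int peakOverlap(boolean classLock) throws InterruptedException {
        Probe probe = new Probe();
        View transaction = new View();
        View view = new View();
        Thread t1 = new Thread(() -> invalidate(transaction, probe, classLock));
        Thread t2 = new Thread(() -> invalidate(view, probe, classLock));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return probe.maxInside.get();
    }

    private static void invalidate(View v, Probe p, boolean classLock) {
        try {
            if (classLock) {
                v.invalidatePerClass(p);
            } else {
                v.invalidatePerInstance(p);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("per-instance lock, peak overlap: " + peakOverlap(false));
        System.out.println("per-class lock, peak overlap: " + peakOverlap(true));
    }
}
```

With per-instance locks both threads should be inside at once (peak overlap 2); with the class lock the second thread can only enter after the first leaves (peak overlap 1).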

The fact that I had breakpoints must have helped provoke these freezes, and SWTBot probably does too.
Comment 2 Eike Stepper CLA 2012-06-05 07:30:52 EDT
Moving all open bug reports to 4.1 because the release is very near and it's highly unlikely that there will be spare time to address 4.0 problems.

Please make sure that your patches can be applied against the master branch and that your problem is not already fixed there!!!
Comment 3 Eike Stepper CLA 2012-08-14 22:56:39 EDT
Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.
Comment 4 Christian Damus CLA 2012-09-08 14:00:32 EDT
I can easily reproduce this in the Juno Modeling Package with Dawn installed.  I suggest that it has at least Major severity, if not Critical, because this will make it very hard to integrate Dawn into CDO-based applications.  Basically, in a Dawn context, one must use Dawn's API (on the UI thread?) to commit transactions.  Committing using the CDO API causes deadlock.

Steps to reproduce 100% of the time:

0. Start a CDO Server instance on an H2 database.
1. Launch your CDO/Dawn Juno workbench.
2. Create a new Dawn Acore diagram.  Add a class to the diagram and name it "Foo."
3. ** Don't save the editor **.
4. Switch to the CDO Sessions view and find the dirty transaction that the
   Dawn Acore editor is using.  Context-click it and select "Commit".
5. Observe the deadlock.  On Mac OS X, it manifests as a SPOD.

The deadlock is between:

 * a runnable on the UI thread waiting for a CDOTransaction monitor (trying to
   get the eResource() of an EObject, which the CDOStoreImpl does under
   synchronization on its view)
 * CDO's commit job, waiting for the UI thread's runnable lock.  This job is
   notifying listeners that commit completed.  The CDOResourceImpl changes its
   isModified state to false (saved), which notifies the
   DawnTransactionChangeRecorder.  This, in turn, tells GMF's DiagramEventBroker
   that the diagram has been saved, and the broker tries to update the UI in a
   synchronous runnable.
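The lock cycle described above can be modelled outside Eclipse. The following is a minimal, hypothetical Java sketch: a plain `Object` stands in for the CDOTransactionImpl monitor and a single-thread `ExecutorService` for the UI thread, so `submit(...).get(...)` plays the role of `Display.syncExec()`. None of these names are CDO, GMF or SWT API. The first method holds the lock while waiting for the UI (the situation in the stack traces below); the second releases the lock before requesting the UI update, which is one common way to break such a cycle, not necessarily the fix that was later committed to CDO. A timeout is used instead of blocking forever so the demo terminates; `Display.syncExec()` itself takes no timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncExecDeadlockDemo {

    /** Worker holds the view lock while waiting for a synchronous UI update. */
    static boolean notifyWhileHoldingLock(long timeoutMs)
            throws InterruptedException, ExecutionException {
        Object viewLock = new Object();                           // stand-in for the view monitor
        ExecutorService ui = Executors.newSingleThreadExecutor(); // stand-in for the UI thread
        try {
            CountDownLatch uiStarted = new CountDownLatch(1);
            synchronized (viewLock) {                             // commit runs under the view lock
                // UI thread: a queued runnable that reads model state (cf. eResource()).
                ui.submit(() -> {
                    uiStarted.countDown();
                    synchronized (viewLock) {
                        // would read the object's resource or container here
                    }
                });
                uiStarted.await();
                // Post-commit listener asks for a synchronous UI update (cf. syncExec()).
                Future<?> update = ui.submit(() -> { /* refresh the diagram */ });
                return awaitUpdate(update, timeoutMs);            // still holding viewLock
            }
        } finally {
            ui.shutdown(); // queued tasks still run once viewLock is released
        }
    }

    /** Worker only records that a refresh is needed and requests it after unlocking. */
    static boolean notifyAfterReleasingLock(long timeoutMs)
            throws InterruptedException, ExecutionException {
        Object viewLock = new Object();
        ExecutorService ui = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch uiStarted = new CountDownLatch(1);
            synchronized (viewLock) {
                ui.submit(() -> {
                    uiStarted.countDown();
                    synchronized (viewLock) {
                        // would read the object's resource or container here
                    }
                });
                uiStarted.await();
                // commit work happens here; no UI call is made under the lock
            }
            // Lock released: the UI runnable can finish, then the refresh runs.
            Future<?> update = ui.submit(() -> { /* refresh the diagram */ });
            return awaitUpdate(update, timeoutMs);
        } finally {
            ui.shutdown();
        }
    }

    /** Returns true if the UI update completed, false if it timed out (lock cycle). */
    private static boolean awaitUpdate(Future<?> update, long timeoutMs)
            throws InterruptedException, ExecutionException {
        try {
            update.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("update under lock completed: " + notifyWhileHoldingLock(500));
        System.out.println("update after unlock completed: " + notifyAfterReleasingLock(5000));
    }
}
```

The first call cannot succeed: the UI thread cannot reach the update until it gets the lock, and the worker does not release the lock until the update has run. Only the timeout breaks the cycle, after which both queued tasks drain. The second call should complete normally.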

I copy the relevant stack traces, below.  Note that on Mac OS X, the UI thread is not the main thread, but Thread-1.  The main thread is native code.

Daemon Thread [Thread-1] (Suspended)	
	owns: RunnableLock  (id=73)	
	waiting for: CDOTransactionImpl  (id=72)	
	CDOStoreImpl.getResource(InternalEObject) line: 157	
	CDOResourceImpl(CDOObjectImpl).eDirectResource() line: 481	
	CDOResourceImpl.eDirectResource() line: 205	
	CDOResourceImpl(BasicEObjectImpl).eInternalResource() line: 925	
	CDOResourceImpl(CDOObjectImpl).eInternalResource() line: 492	
	CDOResourceImpl(BasicEObjectImpl).eResource() line: 920	
	AcorePropertySection(AbstractModelerPropertySection).isNotifierDeleted(Notification) line: 506	
	AdvancedPropertySection$1.run() line: 215	
	RunnableLock.run() line: 35	
	UISynchronizer(Synchronizer).runAsyncMessages(boolean) line: 135	
	Display.runAsyncMessages(boolean) line: 3944	
	Display.readAndDispatch() line: 3621	
	...

Thread [Worker-1] (Suspended)	
	owns: CDOTransactionImpl  (id=72)	
	waiting for: Semaphore  (id=81)	
	Object.wait(long) line: not available [native method]	
	Semaphore.acquire(long) line: 43	
	UISynchronizer.syncExec(Runnable) line: 168	
	Display.syncExec(Runnable) line: 4605	
	DiagramEventBrokerThreadSafe.resourceSetChanged(ResourceSetChangeEvent) line: 63	
	DiagramEditingDomainFactory$DiagramEditingDomain$2.run() line: 167	
	DawnDiagramEditingDomainFactory$DawnDiagramEditingDomain(TransactionalEditingDomainImpl).runExclusive(Runnable) line: 328	
	DawnDiagramEditingDomainFactory$DawnDiagramEditingDomain(DiagramEditingDomainFactory$DiagramEditingDomain).broadcastUnbatched(Notification) line: 163	
	DawnTransactionChangeRecorder(TransactionChangeRecorder).appendNotification(Notification) line: 319	
	DawnTransactionChangeRecorder(TransactionChangeRecorder).processResourceNotification(Notification) line: 272	
	DawnTransactionChangeRecorder(TransactionChangeRecorder).notifyChanged(Notification) line: 238	
	DawnTransactionChangeRecorder.notifyChanged(Notification) line: 44	
	CDOResourceImpl(BasicNotifierImpl).eNotify(Notification) line: 374	
	CDOResourceImpl.setModified(boolean) line: 396	
	CDOModificationTrackingAdapter$1.committedTransaction(CDOTransaction, CDOCommitContext) line: 45	
	CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOSessionProtocol$CommitTransactionResult) line: 2893	
	CDOSingleTransactionStrategyImpl.commit(InternalCDOTransaction, IProgressMonitor) line: 74	
	CDOTransactionImpl.commit(IProgressMonitor) line: 1144	
	CDOTransactionImpl.commit() line: 1164	
	CommitTransactionAction.doRun(IProgressMonitor) line: 38	
	LongRunningAction$1.run(IProgressMonitor) line: 185	
	Worker.run() line: 54
Comment 5 Christian Damus CLA 2013-01-09 11:26:55 EST
With Kepler post-M4 CDO I'm still getting this deadlock. Today, it's a different call stack on the UI thread trying to synchronize on the CDOView (below).

The DiagramEventBroker::resourceSetChanged(...) method is processing a notification from a CDOResource (modified state changed) in a block of code guarded by a "notifier instanceof EObject" check. I'm not sure that we wouldn't still deadlock on some other object if the event broker explicitly checked for "!(notifier instanceof Resource)".

I'm going to compare this against what happens when committing the transaction from the Dawn UI, because I don't know why that doesn't deadlock in the same way (it seems like an inevitable deadlock between the worker thread committing the transaction and the UI thread, which doesn't know that it has to synchronize on the CDOView to access basic EMF properties like eContainer).

-------- 8< --------

Daemon Thread [Thread-1] (Suspended)	
	owns: RunnableLock  (id=75)	
	waiting for: CDOTransactionImpl  (id=74)	
	CDOStoreImpl.getContainer(InternalEObject) line: 123	
	CDOResourceImpl(CDOObjectImpl).eInternalContainer() line: 619	
	CDOResourceImpl(BasicEObjectImpl).eContainer() line: 765	
	DiagramEventBrokerThreadSafe(DiagramEventBroker).getInterestedNotificationListeners(Notification, DiagramEventBroker$NotifierToKeyToListenersSetMap) line: 753	
	DiagramEventBrokerThreadSafe(DiagramEventBroker).fireNotification(Notification) line: 497	
	DiagramEventBrokerThreadSafe(DiagramEventBroker).resourceSetChanged(ResourceSetChangeEvent) line: 399	
	DiagramEventBrokerThreadSafe.internal_resourceSetChanged(ResourceSetChangeEvent) line: 84	
	DiagramEventBrokerThreadSafe.access$0(DiagramEventBrokerThreadSafe, ResourceSetChangeEvent) line: 83	
	DiagramEventBrokerThreadSafe$1.run() line: 65	
	...
Comment 6 Christian Damus CLA 2013-01-09 12:19:32 EST
I just realized I've hijacked this bug with a different problem.  I'll raise a new one.
Comment 7 Eike Stepper CLA 2013-06-29 12:17:53 EDT
We'll try to address open problems in 4.3 (master) first and then port fixes back to 4.2.
Comment 8 Eike Stepper CLA 2015-07-14 02:19:31 EDT
Moving all open bugzillas to 4.5.
Comment 9 Eike Stepper CLA 2016-01-11 13:39:50 EST
Christian, do you know how to create an Acore diagram with Mars or Neon?
Comment 10 Christian Damus CLA 2016-01-12 09:03:02 EST
(In reply to Eike Stepper from comment #9)
> Christian, do you know how to create an Acore diagram with Mars or Neon?

Do you not have the New Acore Diagram option in the New Wizard?  Perhaps you don't have the *.acore* bundles in your PDE Target or workspace?
Comment 11 Eike Stepper CLA 2016-01-16 02:45:21 EST
I'll generalize this bugzilla to "Deadlock on Display.syncExec()". A general solution will follow ASAP...
Comment 12 Eike Stepper CLA 2016-07-31 01:02:16 EDT
Moving all unaddressed bugzillas to 4.6.
Comment 13 Eike Stepper CLA 2016-10-06 00:10:18 EDT
The changes were committed to master in January 2016:
5c0eface8b30653f24895d753ed1dee469f17d2b
be45bd8af72711993c5c1777bf755a9506c5a90c
Comment 14 Eike Stepper CLA 2020-12-11 10:32:11 EST
Closing.
Comment 15 Eike Stepper CLA 2020-12-11 10:32:49 EST
Closing.