Bug 323238 - [Workbench] Deadlock caused by WorkbenchPlugin
Summary: [Workbench] Deadlock caused by WorkbenchPlugin
Status: CLOSED DUPLICATE of bug 344727
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI
Version: 3.7
Hardware: PC Windows XP
Importance: P3 major
Target Milestone: ---
Assignee: Platform UI Triaged CLA
QA Contact: Oleg Besedin CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-20 08:00 EDT by Jens Borrmann CLA
Modified: 2013-11-10 22:32 EST
CC List: 6 users

See Also:


Attachments
Stacktrace of the hanging threads (16.65 KB, text/plain)
2010-08-20 08:01 EDT, Jens Borrmann CLA
test code to reproduce this issue. (13.57 KB, application/octet-stream)
2011-09-26 07:41 EDT, Meng Xin Zhu CLA

Description Jens Borrmann CLA 2010-08-20 08:00:14 EDT
Build Identifier: 1.3.0.20100615-1704

I am currently facing deadlocks when starting my RAP-based application. Stack traces are attached. The deadlock occurs when the following two sequences of events overlap (with some bad luck in timing).
The (*n*) labels refer to markers I added to the stack traces.

Thread_1 (SpringOsgiExtenderThread-2 in the attached stack trace):
(*1*) For the first time a class from rap.ui.workbench must be loaded. During the starting process the classloader of rap.ui.workbench is locked (ClassLoader.loadClassInternal(String) is a synchronized method).
(*2*) This triggers the lazy start mechanism for rap.ui.workbench. The first statement in WorkbenchPlugin.start(BundleContext) registers a BundleListener.
(*3*) After that, additional code is executed, which leads to loading classes from another bundle (jface),
(*4*) which in turn triggers lazy start for this bundle.
(*5*) After jface has started, the corresponding BundleEvent is fired. The BundleListener that has been registered is called, which leads to
(*6*) WorkbenchPlugin.bundleChanged(BundleEvent) being called. The lock for WorkbenchPlugin.startingBundles cannot be acquired because it is already held by Thread_2.

Thread_2 (SpringOsgiExtenderThread-9 in the attached stack trace):
(*A*) After the BundleListener has been registered by Thread_1, but before the starting of jface has ended, another thread finishes starting a bundle.
(*B*) This fires a BundleEvent, which results in
(*C*) WorkbenchPlugin.bundleChanged(BundleEvent) being executed. The lock for WorkbenchPlugin.startingBundles is successfully acquired. The switch statement in the synchronized block triggers the loading of another class (BundleEvent). Of course, the classloader of rap.ui.workbench, which has been locked by Thread_1, must be used in this situation.

Result: Deadlock.

Suggested solution: Change the method bundleChanged(BundleEvent) as follows:
  private void bundleChanged(final BundleEvent event) {
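    // a bundle in the STARTING state generates 2 events, LAZY_ACTIVATION
    // when it enters STARTING and STARTING when it exits STARTING :-)
    // Read the event data before entering the critical section, so that any
    // class loading this triggers happens without holding the lock.
    int type = event.getType();
    Bundle bundle = event.getBundle();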
    synchronized (this.startingBundles) {
      switch (type) {
        case BundleEvent.STARTING:
          this.startingBundles.add(bundle);
          break;
        case BundleEvent.STARTED:
        case BundleEvent.STOPPED:
          this.startingBundles.remove(bundle);
          break;
        default:
          break;
      }
    }
  }

By reordering the statements, the class loading no longer takes place inside the critical section but before it. If the class loading blocks, the blocked thread does not own the startingBundles lock.
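
The lock ordering described above can be modelled in plain Java. The following is only a rough sketch, not the WorkbenchPlugin code: the class LockOrderSketch, its methods, and the two ReentrantLock objects are hypothetical stand-ins for the rap.ui.workbench classloader monitor and the startingBundles monitor, and timed tryLock calls replace the untimed monitors so that the program terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the two threads; not taken from the Eclipse/RAP sources.
public class LockOrderSketch {
    private static final long TIMEOUT_MS = 500;

    // Returns true if both threads finish their work, false if at least one
    // of them timed out waiting for a lock held by the other.
    static boolean bothThreadsFinish(boolean loadInsideCriticalSection)
            throws InterruptedException {
        ReentrantLock classLoaderLock = new ReentrantLock();     // stand-in for the classloader monitor
        ReentrantLock startingBundlesLock = new ReentrantLock(); // stand-in for startingBundles
        CountDownLatch loaderHeld = new CountDownLatch(1);
        CountDownLatch listenerEntered = new CountDownLatch(1);
        boolean[] finished = new boolean[2];

        // Thread_1: holds the classloader lock, then needs startingBundles.
        Thread thread1 = new Thread(() -> {
            classLoaderLock.lock();
            try {
                loaderHeld.countDown();
                listenerEntered.await();
                if (startingBundlesLock.tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
                    startingBundlesLock.unlock();
                    finished[0] = true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                classLoaderLock.unlock();
            }
        });

        // Thread_2: runs the listener, either loading inside or before the critical section.
        Thread thread2 = new Thread(() -> {
            try {
                loaderHeld.await();
                if (loadInsideCriticalSection) {
                    startingBundlesLock.lock();
                    try {
                        listenerEntered.countDown();
                        finished[1] = loadClass(classLoaderLock);
                    } finally {
                        startingBundlesLock.unlock();
                    }
                } else {
                    listenerEntered.countDown();
                    if (loadClass(classLoaderLock)) { // blocks without owning startingBundles
                        startingBundlesLock.lock();
                        startingBundlesLock.unlock();
                        finished[1] = true;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        thread1.start();
        thread2.start();
        thread1.join();
        thread2.join();
        return finished[0] && finished[1];
    }

    // Models a class load: briefly acquires the classloader lock, or gives up after the timeout.
    private static boolean loadClass(ReentrantLock classLoaderLock) throws InterruptedException {
        if (!classLoaderLock.tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
            return false;
        }
        classLoaderLock.unlock();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("load inside critical section: " + bothThreadsFinish(true));
        System.out.println("load before critical section: " + bothThreadsFinish(false));
    }
}
```

With the load inside the critical section, each thread waits for the lock the other one holds, so at least one of them times out; with untimed monitors this wait would never end. With the load moved in front of the critical section, the second thread waits for the classloader without owning startingBundles, so the first thread can finish and release the classloader.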

Reproducible: Sometimes
Comment 1 Jens Borrmann CLA 2010-08-20 08:01:44 EDT
Created attachment 177084
Stacktrace of the hanging threads
Comment 2 Jens Borrmann CLA 2010-08-20 08:03:57 EDT
I forgot to mention that after applying the fix I suggested, the deadlock has not occurred again. Before the fix, about one in ten restarts led to a deadlock. Of course, this does not prove that my solution is correct, but it shows that at least it does not make the situation worse.
Comment 3 Jens Borrmann CLA 2010-09-06 05:20:36 EDT
Is there a chance that a fix for this bug will be included in the 1.3.1 version?
Comment 4 Ivan Furnadjiev CLA 2010-10-07 05:02:10 EDT
Hi Jens, sorry for the late response. It's too late for 1.3.1, but we will look at your proposal as soon as possible.
Comment 5 Ralf Sternberg CLA 2011-02-26 11:06:45 EST
This bug is not directly related to RAP, the code in question is a verbatim copy of WorkbenchPlugin#bundleChanged(). It seems that the problem described here does apply to the original workbench bundle as well. Thus moving to Platform UI.
Comment 6 Jens Borrmann CLA 2011-02-28 08:27:17 EST
Do I get you right that the process will be as follows?
1) There will be a fix for Platform UI. 
2) Afterwards that fix will be integrated into the RAP code base.
Comment 7 Paul Webster CLA 2011-02-28 08:28:41 EST
(In reply to comment #6)
> Do I get you right that the process will be as follows?

We would need to understand the why.  It doesn't appear to happen in the Eclipse SDK.

PW
Comment 8 Jens Borrmann CLA 2011-02-28 08:42:02 EST
I did not try to verify that the same vulnerability also exists in the Platform UI as described by Ralf Sternberg. Therefore my speculations may only apply to the RAP case where I observed the deadlocks.

In my opinion a key to our greater vulnerability is that we use Spring Dynamic Modules (Spring OSGi). Therefore the chance for many bundles starting in parallel (and not just being started sequentially by the Startlevel Event Dispatcher) is dramatically increased.
Comment 9 Paul Webster CLA 2011-02-28 08:50:17 EST
(In reply to comment #8)
> I did not try to verify that the same vulnerability also exists in the Platform
> UI as described by Ralf Sternberg.

We'll still investigate and see if there's a safe fix to protect against rapid bundle changes during startup.

PW
Comment 10 Jens Borrmann CLA 2011-02-28 08:57:02 EST
Hi Paul,

thanks for looking at this nasty thing.

My suggested fix in the original bug description just reorders the resource (=classloader) acquisition, not your protection algorithm.
Comment 11 Meng Xin Zhu CLA 2011-09-26 07:41:05 EDT
Created attachment 204007
test code to reproduce this issue.

The following steps reproduce the deadlock easily:

1. Set a breakpoint on line 1239 of WorkbenchPlugin.java,
the line WorkbenchPlugin.this.bundleChanged(event);
2. Debug the run configuration named 'deadlock-workbenchplugin'.
3. The VM will be suspended on both the main thread and thread-1.
4. Step in thread-1 first.
5. Continue the main thread.

You should then see the deadlock, caused by contention for the DefaultClassloader object.
Comment 12 Meng Xin Zhu CLA 2011-09-26 07:42:12 EDT
+1 for the proposed solution.
Comment 13 Jens Borrmann CLA 2011-09-30 01:35:04 EDT
I came across Bug 344030, which seems to be at least quite similar to what I have described here. Maybe it is an exact duplicate. Can somebody from the Platform team please verify whether this is really the case, so that both bugs can be handled together?

By the way: We have used the patch that I proposed last year both in our tests and in production and never faced any problems in this area so far.
Comment 14 Jens Borrmann CLA 2011-09-30 01:43:59 EDT
Found another bug that seems to be describing the same issue. So closing this one as a duplicate.

The fix that is used for Bug 344727 is identical to the one I proposed a year ago. IMHO deadlock bug reports should be taken more seriously, at least when information about the conditions in which a deadlock occurs is available. In this case, that would have saved many people from running into the same issue and having to track down its causes and a solution.

*** This bug has been marked as a duplicate of bug 344727 ***