Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 342211

Summary: BridgeServlet webcontainer thread deadlock when a background framework restart in progress
Product: [Eclipse Project] Equinox Reporter: Clayton <smarterclayton>
Component: Server-SideAssignee: equinox.server-side-inbox <equinox.server-side-inbox>
Status: CLOSED WONTFIX QA Contact:
Severity: normal    
Priority: P3 CC: raymond.auge, simon_kaegi
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description Flags
Java core for the hung web container thread, generated by WebSphere none

Description Clayton CLA 2011-04-07 15:06:10 EDT
Build Identifier: org.eclipse.equinox.servletbridge_1.1.0.v20090520-1800

We recently encountered a deadlock in BridgeServlet, where it appears that a background framework restart was in progress (which has a lock on FrameworkLauncher) and BridgeServlet.service attempts to get the framework classloader (which requires a lock on FrameworkLauncher).  The restart thread is waiting in unregisterServletDelegate for all of the accessors of BridgeServlet to finish, and the web container thread ends up waiting forever for the lock on FrameworkServlet.

Java dump (attached) included these two relevant snippets:

Relevant snippets:

3XMTHREADINFO      "Framework Event Dispatcher" TID:0x19865800, j9thread_t:0x1D165E04, state:CW, prio=5
3XMTHREADINFO1            (native thread ID:0x1E1C, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE          at java/lang/Object.wait(Native Method)
4XESTACKTRACE          at java/lang/Object.wait(Object.java:167)
4XESTACKTRACE          at org/eclipse/equinox/servletbridge/BridgeServlet.unregisterServletDelegate(BridgeServlet.java:257)
4XESTACKTRACE          at org/eclipse/equinox/http/servletbridge/internal/Activator.stop(Activator.java:29)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleContextImpl$2.run(BundleContextImpl.java:843)
4XESTACKTRACE          at java/security/AccessController.doPrivileged(AccessController.java:251(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleContextImpl.stop(BundleContextImpl.java:836)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleHost.stopWorker(BundleHost.java:474)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/AbstractBundle.suspend(AbstractBundle.java:546)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/Framework.suspendBundle(Framework.java:1098)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/StartLevelManager.decFWSL(StartLevelManager.java:593)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/StartLevelManager.doSetStartLevel(StartLevelManager.java:261)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/StartLevelManager.shutdown(StartLevelManager.java:216)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/InternalSystemBundle.suspend(InternalSystemBundle.java:266)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/Framework.shutdown(Framework.java:685)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/Framework.close(Framework.java:583)
4XESTACKTRACE          at org/eclipse/core/runtime/adaptor/EclipseStarter.shutdown(EclipseStarter.java:409)
4XESTACKTRACE          at sun/reflect/NativeMethodAccessorImpl.invoke0(Native Method)
4XESTACKTRACE          at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45(Compiled Code))
4XESTACKTRACE          at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37(Compiled Code))
4XESTACKTRACE          at java/lang/reflect/Method.invoke(Method.java:599(Compiled Code))
4XESTACKTRACE          at org/eclipse/equinox/servletbridge/FrameworkLauncher.stop(FrameworkLauncher.java:522)
4XESTACKTRACE          at org/eclipse/equinox/servletbridge/FrameworkLauncher$2.run(FrameworkLauncher.java:362)
4XESTACKTRACE          at org/eclipse/core/runtime/adaptor/EclipseStarter$1.bundleChanged(EclipseStarter.java:1504)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleContextImpl.dispatchEvent(BundleContextImpl.java:919)
4XESTACKTRACE          at org/eclipse/osgi/framework/eventmgr/EventManager.dispatchEvent(EventManager.java:227)
4XESTACKTRACE          at org/eclipse/osgi/framework/eventmgr/EventManager$EventThread.run(EventManager.java:337)

3XMTHREADINFO      "WebContainer : 0" TID:0x1ACCCE00, j9thread_t:0x1D6CA658, state:B, prio=5
3XMTHREADINFO1            (native thread ID:0x1A80, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE          at org/eclipse/equinox/servletbridge/BridgeServlet.service(BridgeServlet.java:119)
4XESTACKTRACE          at javax/servlet/http/HttpServlet.service(HttpServlet.java:831)
4XESTACKTRACE          at com/ibm/ws/webcontainer/servlet/ServletWrapper.service(ServletWrapper.java:1657)

The apparent flow is:

webcontainer:0 enters BridgeServlet.service and calls acquireDelegateReference(), which increments the reference count of servlets accessing the framework. 
webcontainer:0 loses context
FrameworkEventDispatcher is invoked on detection of a change to a bundle (not sure why, need to dig more)
FrameworkEventDispatcher calls FrameworkLauncher.stop, which is synchronized (takes lock on FrameworkLauncher)
FrameworkEventDispatcher then drills down deep into the OSGi shutdown sequence - near the end it goes to unregister the servlet delegate on the BridgeServlet class.  It enters unregisterServletDelegate (which takes a lock on BridgeServlet) and then takes a lock on instance, and then because webcontainer:0 is still accessing BridgeServlet.service (hasn't called releaseDelegateReference), it waits for a notification on the instance lock.
webcontainer:0 now wakes up and calls FrameworkLauncher.getFrameworkClassLoader() which is synchronized, so it needs a lock on FrameworkLauncher.  FrameworkEventDispatcher still owns this lock, so it waits.  Deadlock.

Proposed solution is to attempt to get FrameworkLauncher.getFrameworkClassLoader() first, then to acquireDelegateReference().  If that method returns null, it means the framework is shut down and so we return immediately.  If the method returns a value, it means the framework is still running AND no one else can terminate until we complete because we incremented the reference count (the root of the problem).  Appears to fix the problem at hand, however I don't know if there are other issues that could occur due to contention or memory boundaries.  If this fix is insufficient, it would help to know where we fail so we could patch it locally.

Current BridgeServlet:
112 	ClassLoader original = Thread.currentThread().getContextClassLoader();
113 	HttpServlet servletReference = acquireDelegateReference();
114 	if (servletReference == null) {
115 	resp.sendError(HttpServletResponse.SC_NOT_FOUND, "BridgeServlet: " + req.getRequestURI()); //$NON-NLS-1$
116 	return;
117 	}
118 	try {
119 	Thread.currentThread().setContextClassLoader(framework.getFrameworkContextClassLoader());
120 	servletReference.service(req, resp);
121 	} finally {
122 	releaseDelegateReference();
123 	Thread.currentThread().setContextClassLoader(original);
124 	} 

Proposed BridgeServlet:
112 	ClassLoader original = Thread.currentThread().getContextClassLoader();
113 	ClassLoader frameworkClassLoader = framework.getFrameworkContextClassLoader();
114 	HttpServlet servletReference = acquireDelegateReference();
115 	if (servletReference == null) {
116 	resp.sendError(HttpServletResponse.SC_NOT_FOUND, "BridgeServlet: " + req.getRequestURI()); //$NON-NLS-1$
117 	return;
118 	}
119 	try {
120 	Thread.currentThread().setContextClassLoader(frameworkClassLoader);
121 	servletReference.service(req, resp);
122 	} finally {
123 	releaseDelegateReference();
124 	Thread.currentThread().setContextClassLoader(original);
125 	} 

Reproducible: Sometimes

Steps to Reproduce:
In an OSGi servletbridged web app, invoke FrameworkLauncher stop() in a separate thread after a webcontainer thread has acquired the delegate reference but before the webcontainer thread accesses the framework class loader.

Actually triggering the framework stop or restart is more complex - in the stack trace you can see the bundleChanged event is being fired by the framework, but I'm not sure what leads to that.  That triggers the forced restart of the framework from FrameworkLauncher, which calls stop() and thus sets this chain in motion.  It should be sufficient to recreate to insert a fairly long sleep between acquireDelegateReference() and the getFrameworkClassLoader() call, and invoke stop() via the bridge servlet's mechanism.

It's a very small window, but apparently we can reproduce this fairly often in our environment (automated deployment, starts web application, immediately begins running tests and changes filesystem contents).
Comment 1 Clayton CLA 2011-04-07 15:09:37 EDT
Created attachment 192771 [details]
Java core for the hung web container thread, generated by WebSphere
Comment 2 Simon Kaegi CLA 2011-04-07 15:37:43 EDT
Thanks for the analysis, excellent stack trace and context. We've seen this problem reported once before about a year ago but were unable to do very much with it because of the difficulty in reproducing the problem. This is nasty stuff and your description really helps -- you are a star.
Comment 3 Raymond Auge CLA 2018-01-22 22:49:05 EST
Closing due to inactivity.