Bug 320760 - Eclipse Help System WAR has a deadlock at startup
Summary: Eclipse Help System WAR has a deadlock at startup
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework
Version: 3.4.2
Hardware: Other other
Importance: P3 normal
Target Milestone: 3.6.1
Assignee: Thomas Watson CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 318277
Blocks:
Reported: 2010-07-23 14:19 EDT by Thomas Watson CLA
Modified: 2010-07-23 14:29 EDT

Description Thomas Watson CLA 2010-07-23 14:19:38 EDT
+++ This bug was initially created as a clone of Bug #318277 +++

Build Identifier: 3.4

Our customers encountered a deadlock with our IEHS 3.4.2 WAR build, which is based on Eclipse Help System (EHS) 3.4.2. Our development team then extracted an EHS WAR package based on Eclipse 3.4.2, but customers still hit the deadlock with that EHS WAR. The following information comes from the customers:
======================================
The problem can be recreated by simply starting and stopping the server. I do not have an exact failure rate, but it seems to happen once every 30-50 server starts.
I am running on an 8-way AIX machine and have increased the server startup threads from 3 to 10, which seems to increase the rate of failure. I have also seen the problem on a Solaris machine (running 3 startup threads).
The entire server process is deadlocked by this failure.
======================================

The customers have done some investigation. They believe that this is a problem with Equinox and that a fix already exists in the code, but that it must be enabled through a configuration setting in the help system. The following information was provided by them:
======================================
Here is my core dump on what I think is happening with the server deadlock during server start.

I am running on 8-processor AIX and Sun machines. Both sets of hardware have the problem. I have not run on Windows.
The basic scenario is very simple: the server is simply started. In some instances we see a deadlock in the server runtime:

1LKDEADLOCK    Deadlock detected !!!
NULL           ---------------------
NULL
2LKDEADLOCKTHR  Thread "server.startup : 7" (0x35148600)
3LKDEADLOCKWTR    is waiting for:
4LKDEADLOCKMON      sys_mon_t:0x3A241490 infl_mon_t: 0x3A2414B0:
4LKDEADLOCKOBJ      org/eclipse/osgi/framework/internal/protocol/StreamHandlerFactory@0xB0101770/0xB010177C:
3LKDEADLOCKOWN    which is owned by:
2LKDEADLOCKTHR  Thread "server.startup : 5" (0x37F4E100)
3LKDEADLOCKWTR    which is waiting for:
4LKDEADLOCKMON      sys_mon_t:0x3A241438 infl_mon_t: 0x3A241458:
4LKDEADLOCKOBJ      org/eclipse/equinox/servletbridge/FrameworkLauncher$ChildFirstURLClassLoader@0xB877F4D8/0xB877F4E4:
3LKDEADLOCKOWN    which is owned by:
2LKDEADLOCKTHR  Thread "server.startup : 7" (0x35148600)
NULL
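
The cycle in the dump above is a classic lock-ordering inversion: each thread owns one monitor and waits for the other. A minimal sketch of that pattern follows. It does not use any Equinox or servletbridge code; the two plain objects are hypothetical stand-ins for the StreamHandlerFactory and ChildFirstURLClassLoader monitors, and the thread names are illustrative. It relies only on the standard java.lang.management API to confirm the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class LockInversionDemo {

    // Builds a daemon thread that takes `first`, waits until both threads
    // hold their first monitor, and then tries to take `second`.
    static Thread holder(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Not reached: the other thread owns `second` and never releases it.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though the threads stay deadlocked
        return t;
    }

    static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    // Returns true once the JVM reports both threads as part of a monitor deadlock.
    static boolean reproduce() {
        Object handlerFactory = new Object(); // hypothetical stand-in for StreamHandlerFactory
        Object classLoader = new Object();    // hypothetical stand-in for ChildFirstURLClassLoader
        CountDownLatch latch = new CountDownLatch(2);
        Thread t5 = holder("startup-5", handlerFactory, classLoader, latch);
        Thread t7 = holder("startup-7", classLoader, handlerFactory, latch);
        t5.start();
        t7.start();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null && contains(ids, t5.getId()) && contains(ids, t7.getId())) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

The latch only makes the inversion deterministic for the demonstration. In the reported failure the interleaving depends on timing, which is consistent with the problem showing up more often with more startup threads.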


The complex scenario is:
1) The WAS server runtime starts the launcher process for the server JVM
2) The OSGi environment is initialized by the server runtime
	- as part of the initialization, the singleton ChildFirstURLClassLoader is created
3) A number of WAS components are started (not too interesting)
4) The WAS Application manager is invoked as part of the server start process to start the applications installed on the server
	- The BusinessSpacerHelp.war application is started
		- contained within the WAR is another OSGi/Equinox environment
		- the OSGi environment in the WAR is started
			- A new singleton ChildFirstURLClassLoader is created
5) The deadlock occurs if another application is using the ChildFirstURLClassLoader
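
To make the role of the class loader monitor in the steps above more concrete, here is a minimal sketch of a child-first URL class loader. It is not the servletbridge FrameworkLauncher$ChildFirstURLClassLoader source; the class name ChildFirstSketch and the helper loadViaSketch are hypothetical. The assumption it illustrates is that class loading is serialized on the loader instance, so a thread that is loading a class owns the loader's monitor while it runs whatever code the load triggers.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ChildFirstSketch extends URLClassLoader {

    ChildFirstSketch(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    // Child-first delegation: look in this loader's own URLs before asking
    // the parent. The method is synchronized, so the calling thread owns the
    // loader's monitor for the whole lookup.
    @Override
    protected synchronized Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                c = findClass(name);
            } catch (ClassNotFoundException e) {
                // Not found locally: fall back to the standard parent-first lookup.
                c = super.loadClass(name, false);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }

    // Hypothetical helper: loads a class through a sketch loader with no URLs,
    // so every lookup ends up delegating to the parent.
    static Class<?> loadViaSketch(String name) {
        ChildFirstSketch loader =
            new ChildFirstSketch(new URL[0], ChildFirstSketch.class.getClassLoader());
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadViaSketch("java.lang.String"));
    }
}
```

If a second thread holds another lock (for example a URL stream handler factory) and then needs a class from this loader, while the first thread holds the loader and needs that factory, the two threads block each other.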


The problem is similar to a problem reported to WAS and fixed in PK81985.

However, even when the system value was set in the custom properties for the JVM, the deadlock still occurred.
NOTE: The WAS solution consisted of two changes: 1) a fix to Equinox, 2) a change to the launcher code.
(I am guessing, but I think the Equinox fix was to check whether a class had already been instantiated.)

I believe the reason I continue to see the problem after running with the WAS setup is one of the following.

THIS IS WHERE I NEED SOME HELP:

1) The Equinox version running in the bspace help WAR does not include the fix associated with PK81985 (I do not have the associated bug report for the Equinox fix).
2) The environment settings that are part of the WAS server start environment are not being propagated to the bspace WAR environment.
	- This is where I am a little fuzzy about what needs to be done, but I suspect that there is an .ini file somewhere that could be updated to allow the correct values to be set.
	   I have looked at the WAS code, and I think that setting these values when launching the OSGi environment in the WAR would work around the problem:

	  osgi.parentClassloader=fwk
	  osgi.frameworkParentClassloader=app
======================================
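
As a rough illustration of the workaround the customer proposes, the sketch below parses the two settings from properties-style text, the way a launcher might read them from an .ini file before starting the embedded framework. The property names follow the Equinox runtime options (osgi.parentClassloader and osgi.frameworkParentClassloader). The class name ParentClassloaderSettings is hypothetical, and the report does not say which file the WAR actually reads, so no file path is assumed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ParentClassloaderSettings {

    // The two values suggested in the report, in properties-file syntax.
    static final String SETTINGS =
        "osgi.parentClassloader=fwk\n"
        + "osgi.frameworkParentClassloader=app\n";

    // Parses properties-style text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(SETTINGS);
        props.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Whether these values reach the framework inside the WAR depends on how the embedded launcher collects its configuration, which is exactly the open question in item 2 of the report.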

The logs can be found here : http://rchgsa.ibm.com/~malin/public/javacore.logs.zip

Reproducible: Sometimes
Comment 1 Thomas Watson CLA 2010-07-23 14:20:07 EDT
Should fix this for 3.6.1.
Comment 2 Thomas Watson CLA 2010-07-23 14:29:44 EDT
Fixed for 3.6.1