Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 330716

Summary: Thread time-out value (5 seconds) should be configurable
Product: [Eclipse Project] Equinox Reporter: mstancamp
Component: FrameworkAssignee: Thomas Watson <tjwatson>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: carsten.pfeiffer, Ed.Merks, jamesblackburn+eclipse, john.arthorne, pascal, pwebster, stepper, tjwatson
Version: 3.6   
Target Milestone: 3.7 M5   
Hardware: All   
OS: All   
Whiteboard:
Bug Depends on:    
Bug Blocks: 337226    
Attachments:
Description Flags
patch none

Description mstancamp CLA 2010-11-19 16:48:52 EST
Build Identifier: Eclipse 3.4.2

Our product contains a large number of bundles (in the hundreds). Periodically
timeout messages appear during the start of the product. The following is an
example:

2010/02/26 19:37:48.890 WARNING While loading class
"com.mypackage.Activator", thread "Thread[MyMergeThread:com.otherpackage.kernel:START,5,main]" timed
out waiting (5000ms) for thread "Thread[Thread-1,5,main]" to finish starting
bundle "update@../../myproduct/eclipse/plugins/com.mypackage.console.jsf_1.2.0/
[318]". To avoid deadlock, thread
"Thread[MyMergeThread:com.otherpackage.kernel:START,5,main]" is
proceeding but "com.mypackage.Activator" may not
be fully initialized. ::class.method=unknown
::thread=MyMergeThread:com.otherpackage.kernel:START
::loggername=org.eclipse.osgi
   org.osgi.framework.BundleException: State change in progress for bundle
"update@../../myproduct/eclipse/plugins/com.mypackage.console.jsf_1.2.0/"
by thread "Thread-1".
                 at
org.eclipse.osgi.framework.internal.core.AbstractBundle.beginStateChange(AbstractBundle.java:1144)
                 at
org.eclipse.osgi.framework.internal.core.AbstractBundle.start(AbstractBundle.java:263)
                 at
org.eclipse.osgi.framework.util.SecureAction.start(SecureAction.java:400)
                 at
org.eclipse.core.runtime.internal.adaptor.EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:111)
                 at
org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClass(ClasspathManager.java:427)
                 at
org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.findLocalClass(DefaultClassLoader.java:193)
                 at
org.eclipse.osgi.framework.internal.core.BundleLoader.findLocalClass(BundleLoader.java:370)
                 at
org.eclipse.osgi.framework.internal.core.BundleLoader.findClassInternal(BundleLoader.java:446)
                 at
org.eclipse.osgi.framework.internal.core.BundleLoader.findClass(BundleLoader.java:399)
                 at
org.eclipse.osgi.framework.internal.core.BundleLoader.findClass(BundleLoader.java:387)
                 at
org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:87)
                 at java.lang.ClassLoader.loadClass(ClassLoader.java:609)


We understand and don't disagree with the reasons for issuing this sort of
message. The issue is that the timeout value (5000ms) is hard-coded and is not
configurable. In many cases if the timeout value was longer the processing
would complete successfully. Allowing the timeout value to be configurable
would also help us determine when actual deadlock conditions occur.


Reproducible: Sometimes

Steps to Reproduce:
Not easily reproducible. See details above.
Comment 1 Thomas Watson CLA 2010-11-19 17:44:03 EST
It would be good to understand why you are hitting this and make sure you are not simply getting into a deadlock situation.  In most cases this error is observed when the activator thread and another thread get into a deadly embrace over the lock to the class loader.  If that is true then increasing this timeout value could simply result in a longer delay before we break out of the deadlock.
Comment 2 James Blackburn CLA 2010-11-19 17:59:06 EST
I agree.
I've never hit the 5s timeout without there being a real deadlock here. If your plugin can take more than 5s to startup you could be very well be block the UI (or any other caller) for that time, in real terms that's a long pause!
Comment 3 mstancamp CLA 2010-12-03 15:36:06 EST
As indicated in the note, our environment includes hundreds of plug-ins, which provide eclipse extensions, OSGi services, etc. Because of this environment the start-up of one bundle can cause multiple other bundles to be started. Also our environment is a headless environment, there are no Eclipse UI bundles involved. 
This hard-coded 5 second time-out is just an arbitrary value, and may not fit all environments. We are not asking that the time-out process be changed, just that the value is configurable. In an enterprise environment this would allow applications to modify the value to their needs, and actually help determine if real deadlocks exist. Please reconsider providing this enhancement.

Thanks.
Comment 4 Thomas Watson CLA 2010-12-03 15:43:52 EST
(In reply to comment #3)
> Please reconsider providing this enhancement.
> 
> Thanks.

My concern is that 99% of the time you are seeing real deadlock.  But I have no strong objection to making this configurable.  It is an arbitrary value as you observe.  But I would only be for a global setting, not a per bundle setting.  Is that what you are suggesting?
Comment 5 mstancamp CLA 2010-12-07 16:35:59 EST
A global setting is what is being requested. It would apply to all bundles.
Comment 6 Thomas Watson CLA 2011-01-04 11:30:45 EST
Created attachment 186022 [details]
patch

This patch adds a new configuration setting equinox.statechange.timeout which can be set in the config.ini or as a system prop to a timeout value in millis.  The default value is "5000".
Comment 7 Thomas Watson CLA 2011-01-04 11:32:15 EST
Patch released to head.
Comment 8 Eike Stepper CLA 2015-10-10 01:53:21 EDT
All my IDEs recently started to fail during *first* startup of a day with the same error message in a log file in the configuration directories.

All these IDEs use the same shared bundle pool and I have the feeling that it's related to the size of that bundle pool. There seems to be an Equinox startup thread that checks whether/how the bundles of an installation resolve. When I looked at this problem earlier (it used to happen infrequently only for my debug workbenches) I had the feeling that this early checking is done for *all* bundles in the bundle pool and not just for the bundles in the profile/installation, so that "unrelated" growth of the pool has a bad impact (at least quadratic but maybe even cubic, IIRC) on the startup time.

What's new to me now is the fact that even my host IDEs have this problem and it seems to be more severe because they really don't come up at all (the debug workbenches did come up in the end, with some error traces).

In both cases a new start attempt always succeeds. IIRC from my former research about this problem there's some timestamp checking involved so that this very expensive bundle pool scanning is done only once per day or so.

Sorry that this is all so vague, but the problem is very hard to debug for me because it happens at most once per day and IDE. Maybe the following system property (which I use for a while to prevent this problem in my debug launches) helps to ring an expert bell:

  -Dosgi.checkConfiguration=false

I know that this bugzilla has been closed long ago. I'd be happy to submit a new one if you want. If my vague feeling is right that this problem is related to a shared bundle pool a solution might become more important now that Oomph's Eclipse Installer creates installations with a shared bundle pool by default.