
Bug 359535

Summary: Deadlock on startup when using Spring OSGi
Product: [Eclipse Project] Equinox Reporter: Jens Borrmann <jens.borrmann>
Component: Framework Assignee: Thomas Watson <tjwatson>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: glyn.normington, reto.urfer, steffen.pingel, tjwatson
Version: unspecified   
Target Milestone: Juno M3   
Hardware: PC   
OS: Windows 7   
Whiteboard:
Bug Depends on:    
Bug Blocks: 361533    
Attachments:
Annotated stacktraces of the two blocking threads (flags: none)
Stack Trace with dealock under Java7 (flags: none)
Stacktrace of the deadlock using a backport of the patch to Indigo. (flags: none)

Description Jens Borrmann CLA 2011-09-30 05:02:11 EDT
Build Identifier: 3.6.0.v20010613

I have come across a deadlock during the startup phase of our application that is caused by an interaction between Spring OSGi and Equinox. As far as I can tell from browsing the Gemini Blueprint SVN repository, the same situation also applies to Gemini Blueprint.

I will attach an annotated stacktrace of the two blocking threads. The numbers and letters in the following description refer to my annotations in the stacktrace.

During startup of the system the Spring OSGi extender tries to extend two bundles at the same time:

Thread [SpringOsgiExtenderThread-10]
1) starts refreshing the ApplicationContext for business bundle A.
2) This causes the loading of a bean which represents an OSGi service reference. Therefore an OsgiServiceProxyFactoryBean tries to create a proxy object for this service.
3) As a consequence of this a ServiceDynamicInterceptor is created which triggers loading of class org.springframework.osgi.service.importer.support.internal.aop.SwappingServiceReferenceProxy.
4) While loading this class a lock for the ClassLoader of spring-osgi-core is acquired.
5) During the class loading cascade another class from the same bundle is loaded (org.springframework.osgi.service.importer.support.internal.aop.BaseServiceReferenceProxy), which triggers
6) a reentry into the lock for the classloader, which does not do any harm.
7) When the class is found all ClassLoadingStatsHooks are called. One of them is the WeavingHookConfigurator. This causes
8) Access to the ServiceRegistry, which tries to
9) Lookup service registrations. The access to the service registry is synchronized. So SpringOsgiExtenderThread-10 blocks waiting for the release of the lock by the other thread.


SpringOsgiExtenderThread-20
a) tries to complete refreshing the application context for business bundle B. 
b) In the process of doing this a bean is found that represents a service registration <osgi:service .....> 
c) Since all preconditions are satisfied the service registration can take place. For this purpose a lock on the ServiceRegistry is acquired.
d) When initializing the ServiceProperties there is an iteration over all keys in the Dictionary of properties used when registering the service. Spring OSGi and Gemini Blueprint do not use a JDK Dictionary class but a subclass, org.springframework.osgi.util.internal.MapBasedDictionary. This class has been loaded before.
e) But when calling the keys() method the inner class IteratorBasedEnumeration is used for the first time. This triggers class loading for this inner class. Class loading would require the lock on the class loader of spring-osgi-core, which is held by the other thread.
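
The two sequences above amount to a lock-order inversion: one thread takes the class loader lock and then wants the ServiceRegistry lock, the other takes them in the opposite order. A minimal sketch of that pattern follows. The class and field names are hypothetical stand-ins, not Equinox or Spring OSGi code, and tryLock with a timeout is used only so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins for the two monitors in the stacktrace; not Equinox code.
class LockOrderInversionDemo {
    final ReentrantLock classLoaderLock = new ReentrantLock();     // spring-osgi-core class loader
    final ReentrantLock serviceRegistryLock = new ReentrantLock(); // ServiceRegistry

    // True when the corresponding thread timed out waiting for its second lock.
    volatile boolean thread10Blocked;
    volatile boolean thread20Blocked;

    void run() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothAttempted = new CountDownLatch(2);

        // Steps 1-9: class loader lock first, then the registry lock.
        Thread t10 = new Thread(() -> {
            thread10Blocked = !lockInOrder(classLoaderLock, serviceRegistryLock,
                    bothHoldFirst, bothAttempted);
        }, "SpringOsgiExtenderThread-10");
        // Steps a-e: registry lock first, then the class loader lock.
        Thread t20 = new Thread(() -> {
            thread20Blocked = !lockInOrder(serviceRegistryLock, classLoaderLock,
                    bothHoldFirst, bothAttempted);
        }, "SpringOsgiExtenderThread-20");

        t10.start();
        t20.start();
        try {
            t10.join();
            t20.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Takes 'first', waits until the peer holds its own first lock, then tries 'second'.
    static boolean lockInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothAttempted) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            // Keep 'first' until the peer has also finished its attempt.
            bothAttempted.countDown();
            bothAttempted.await();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }
}
```

Calling new LockOrderInversionDemo().run() leaves both flags set, because each thread still holds the lock the other one is waiting for. With plain synchronized blocks, as in the real stacktrace, neither thread would ever time out.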


We did not see a really easy solution to this problem so far. One idea could be to reorder the sequence of steps in ServiceRegistrationImpl.register(...). Maybe it is possible to call createProperties() without holding the lock on the ServiceRegistry. But we are not aware of all side effects this might have.
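
To illustrate the reordering idea, here is a rough sketch in which the caller's Dictionary is copied before the registry monitor is taken, so that any class loading triggered by keys() happens without the registry lock. SketchRegistry and all of its members are hypothetical and heavily simplified. This is not the Equinox ServiceRegistry or ServiceRegistrationImpl API, and it says nothing about the side effects mentioned above.

```java
import java.util.Dictionary;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified registry; not the Equinox implementation.
class SketchRegistry {
    private final Map<String, Map<String, Object>> registrations = new HashMap<>();

    // Records whether the registry monitor was held while the caller's Dictionary was read.
    boolean lockHeldWhileCopying;

    void register(String serviceName, Dictionary<String, ?> properties) {
        // Step 1: copy outside the lock. properties.keys() runs caller code here
        // and may trigger class loading in the caller's bundle.
        Map<String, Object> snapshot = copy(properties);

        // Step 2: only the registry update itself runs under the registry lock.
        synchronized (this) {
            registrations.put(serviceName, snapshot);
        }
    }

    private Map<String, Object> copy(Dictionary<String, ?> properties) {
        lockHeldWhileCopying = Thread.holdsLock(this);
        Map<String, Object> result = new HashMap<>();
        for (Enumeration<String> keys = properties.keys(); keys.hasMoreElements();) {
            String key = keys.nextElement();
            result.put(key, properties.get(key));
        }
        return result;
    }

    synchronized Map<String, Object> propertiesOf(String serviceName) {
        return registrations.get(serviceName);
    }
}
```

The point of the design is only that no foreign code runs while the registry monitor is held. Whether the real register(...) can be split this way is exactly the open question.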

Reproducible: Sometimes
Comment 1 Jens Borrmann CLA 2011-09-30 05:03:15 EDT
Created attachment 204352 [details]
Annotated stacktraces of the two blocking threads
Comment 2 Jens Borrmann CLA 2011-09-30 05:04:33 EDT
Sorry - I mistyped the build ID.

The correct value is: 3.7.0.v20110613
Comment 3 Thomas Watson CLA 2011-09-30 09:44:02 EDT
It would be interesting to know if this deadlock can be observed when running on Java 7.  With Java 7 Equinox no longer locks on the class loader object.  Instead it uses more fine grained locking based on the class loader and class name being defined.  Java 7 is required for this so we can take advantage of the parallel class loader support that got added in Java 7.
Comment 4 Glyn Normington CLA 2011-09-30 09:46:51 EDT
Prior to Java 7, this may help:

http://underlap.blogspot.com/2006/11/experimental-fix-for-sunbug-4670071.html
Comment 5 Thomas Watson CLA 2011-09-30 09:51:45 EDT
(In reply to comment #4)
> Prior to Java 7, this may help:
> 
> http://underlap.blogspot.com/2006/11/experimental-fix-for-sunbug-4670071.html

Thanks, Glyn. Note that in order to really take advantage of this on pre-Java 7 VMs, you will need to configure Equinox to not do any locking on the OSGi class loader object.  This can be done with the following configuration property:

osgi.classloader.lock=classname
Comment 6 Jens Borrmann CLA 2011-09-30 09:52:28 EDT
Unfortunately, using Java 7 is not an option in our case. Our last customers are
changing to Java 7 right now...

From the stacktrace, it looks as if lock-free class loading would resolve
the issue completely.
Comment 7 Thomas Watson CLA 2011-10-17 10:14:52 EDT
*** Bug 361129 has been marked as a duplicate of this bug. ***
Comment 8 Thomas Watson CLA 2011-10-17 10:15:47 EDT
Going to have to look at finding the weaving hook registrations outside of the class loader lock.
Comment 9 Reto Urfer CLA 2011-10-17 11:07:30 EDT
We have already set the options -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass.
They don't help.

The deadlock also occurs with Java 7. I have just tried this.
Comment 10 Reto Urfer CLA 2011-10-17 11:19:54 EDT
I don't think it is a Java problem. Both locks are held by OSGi classes:

Thread A:
- org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.register(Dictionary<String, ?>) holds the lock on the ServiceRegistry
...
- org.eclipse.osgi.internal.loader.BundleLoader.lock(Object) waits for the lock of the DefaultClassLoader


Thread B:
- org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClass_LockClassLoader(String, ClassLoadingStatsHook[]) holds the lock on the DefaultClassLoader
...
- org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.lookupServiceRegistrations(String, Filter) waits for the lock of the ServiceRegistry
Comment 11 Thomas Watson CLA 2011-10-17 11:56:35 EDT
(In reply to comment #10)
> I don't think it i a Java problem. The locks are both hold by OSGi classes:

Could you provide a dump with the deadlock on Java 7?

> 
> Thread A:
> -
> org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.register(Dictionary<String,
> ?>) holds the lock on the ServiceRegistry
> ...
> - org.eclipse.osgi.internal.loader.BundleLoader.lock(Object) waits for the lock
> of the DefaultClassLoader
> 

The fact that you are entering the code BundleLoader.lock(Object) suggests you are running with the option osgi.classloader.singleThreadLoads=true.

I need to rip this support out of the framework (see bug 212262 comment 23).


> 
> Thread B:
> -
> org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClass_LockClassLoader(String,
> ClassLoadingStatsHook[]) holds the lock on the DefaultClassLoader
> ...
> -
> org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.lookupServiceRegistrations(String,
> Filter) waits for the lock of the ServiceRegistry

On Java 7 I expect this to go down a different path that does not lock the class loader: ClasspathManager.findLocalClass_LockClassName.

Can you try to reproduce this on Java 7 without setting osgi.classloader.singleThreadLoads=true?  If you can reproduce it, please attach a thread dump with the hang.  Thanks.
Comment 12 Reto Urfer CLA 2011-10-18 02:58:56 EDT
Am I right that I no longer need the options -XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass with Java 7?

And what about the option osgi.classloader.singleThreadLoads=true in Eclipse 3.7.1? Is it still needed?
Comment 13 Reto Urfer CLA 2011-10-18 03:44:16 EDT
Created attachment 205390 [details]
Stack Trace with dealock under Java7
Comment 14 Reto Urfer CLA 2011-10-18 03:45:47 EDT
I tried once more with Java 7 and removed the options -XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass.

With osgi.classloader.singleThreadLoads=true the deadlock still occurs.

With osgi.classloader.singleThreadLoads=false I could not reproduce the deadlock anymore.
Comment 15 Thomas Watson CLA 2011-10-18 08:59:37 EDT
(In reply to comment #12)
> Am I right that I don't use the properties -XX:+UnlockDiagnosticVMOptions
> -XX:+UnsyncloadClass anymore with Java7?

Correct, these options are not needed.  Java 7 added a new static method that allows a class loader to be registered as a "parallel" class loader.  When on Java 7 Equinox registers its class loaders as "parallel" class loaders.  (http://download.oracle.com/javase/7/docs/api/java/lang/ClassLoader.html#registerAsParallelCapable())

This eliminates the need to specify the above -XX options.  These hidden options essentially made every class loader a "parallel" class loader.
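
As a small illustration of the Java 7 mechanism, the sketch below compares a loader that registers itself as parallel capable with one that does not. The two loader classes and the lockFor helper are hypothetical and exist only for the demo; registerAsParallelCapable() and getClassLoadingLock(String) are the java.lang.ClassLoader methods. This is not how the Equinox class loaders are written.

```java
// Hypothetical demo loaders; only the java.lang.ClassLoader calls are real JDK API (Java 7+).
class ParallelLoader extends ClassLoader {
    static {
        // Must be invoked from the loader class's own static initializer.
        registerAsParallelCapable();
    }

    ParallelLoader(ClassLoader parent) {
        super(parent);
    }

    // Exposes the protected lock lookup for demonstration purposes.
    Object lockFor(String className) {
        return getClassLoadingLock(className);
    }
}

class CoarseLoader extends ClassLoader {
    CoarseLoader(ClassLoader parent) {
        super(parent);
    }

    // Without registration the loader object itself is the class loading lock.
    Object lockFor(String className) {
        return getClassLoadingLock(className);
    }
}
```

For the registered loader, lockFor returns a distinct lock object per class name, so loading two different classes no longer contends on the loader itself. For the unregistered loader, lockFor returns the loader, which is the coarse lock seen in the Java 6 stacktraces.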

> 
> And whats about the option osgi.classloader.singleThreadLoads=true in Eclipse
> 3.7.1? Is it still needed?

That option is dangerous and should never be used on modern VMs (J2SE 5 or greater) since these VMs take a native fine-grained lock on the class name.  Such a fine-grained lock will lead to out-of-order locking with the single-thread lock.  I need to remove this option from the framework.
Comment 16 Thomas Watson CLA 2011-10-18 09:05:17 EDT
(In reply to comment #13)
> Created attachment 205390 [details]
> Stack Trace with dealock under Java7

This is still using the osgi.classloader.singleThreadLoads=true option which can lead to deadlock.

(In reply to comment #14)
> I tried once more with Java7, I removed properties
> -XX:+UnlockDiagnosticVMOptions
> -XX:+UnsyncloadClass
> 
> With osgi.classloader.singleThreadLoads=true the deadlock still occurs
> 
> With osgi.classloader.singleThreadLoads=false i could not reproduce the
> deadlock anymore

Thanks for your perseverance and patience!  This is the expected behavior and it is good to hear Java 7 can help with these kinds of deadlocks.  We still need to investigate how to avoid locking the registry while holding the class loader coarse grain lock on Java 6.
Comment 17 Thomas Watson CLA 2011-10-19 15:56:25 EDT
I will put a fix for this in M3.
Comment 18 Thomas Watson CLA 2011-10-19 16:05:48 EDT
I have released a fix in commit:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=306f7f89aff350fd83b816f2a81d6a911c06733f

The basic idea behind the fix is to avoid holding any locks while calling out to the class loader hooks.  This forced us into calling ClassLoader.findLoadedClass two times before defining a class:
 1) Before locating the class byte[] and calling weaving hooks
 2) Just before defining the class

This will ensure that the framework is not holding any class loader locks, either fine grained on Java 7, or coarse grain on Java 6 or older while calling out to the class loader hooks.
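
The shape of this pattern can be sketched with a simplified model. In the model a map lookup stands in for ClassLoader.findLoadedClass, a map insert stands in for defineClass, and UnaryOperator instances stand in for the class loader hooks. HookSafeDefiner and all of its members are hypothetical names; this is not the code from the commit.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Simplified model of "check, call hooks without a lock, check again, define".
// Hypothetical names throughout; not the Equinox ClasspathManager code.
class HookSafeDefiner {
    private final ConcurrentHashMap<String, Object> nameLocks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, byte[]> defined = new ConcurrentHashMap<>();
    private final List<UnaryOperator<byte[]>> hooks;

    // Set to true if a hook ever ran while the per-name lock was held.
    volatile boolean lockHeldDuringHook;

    HookSafeDefiner(List<UnaryOperator<byte[]>> hooks) {
        this.hooks = hooks;
    }

    byte[] findOrDefine(String name, byte[] rawBytes) {
        // Check 1: before locating bytes and calling hooks, no lock held.
        byte[] existing = defined.get(name);
        if (existing != null) {
            return existing;
        }

        Object lock = nameLocks.computeIfAbsent(name, key -> new Object());

        // Hooks run without any lock, so they may safely take other locks.
        byte[] woven = rawBytes;
        for (UnaryOperator<byte[]> hook : hooks) {
            if (Thread.holdsLock(lock)) {
                lockHeldDuringHook = true;
            }
            woven = hook.apply(woven);
        }

        synchronized (lock) {
            // Check 2: just before defining, another thread may have won the race.
            existing = defined.get(name);
            if (existing != null) {
                return existing;
            }
            defined.put(name, woven);
            return woven;
        }
    }
}
```

The cost of this design is that two racing threads may both run the hooks for the same name and one result is discarded. The second check is what keeps the definition itself unique.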

There is still potential for deadlock on pre-Java7 VMs because there are many cases where the native VM locks the class loader in native code before calling into the OSGi class loaders.  I would like others to test this out to see if it actually helps on pre-Java 7 VMs.
Comment 19 Jens Borrmann CLA 2011-10-20 01:11:52 EDT
Thanks for looking at this issue. Is there a chance for the fix being backported to Indigo?
Comment 20 Thomas Watson CLA 2011-10-20 08:29:10 EDT
(In reply to comment #19)
> Thanks for looking at this issue. Is there a chance for the fix being
> backported to Indigo?

This fix is a bit risky, I want it to get some extensive testing in Juno before backporting to Indigo (3.7.2 Equinox) release.  Juno M3 is next week.  Could you either test on that or try one of the latest nightly builds to confirm it helps fix your deadlock when using Java 6?
Comment 21 Thomas Watson CLA 2011-10-20 08:30:12 EDT
I forgot to mention that I opened bug 361533 to consider backporting to 3.7.2.
Comment 22 Reto Urfer CLA 2011-10-24 03:30:58 EDT
I tried with build I20111021 of Equinox 3.8. I could start our application several times with Java 6 and the VM options -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass but without setting osgi.classloader.singleThreadLoads=true.

If I also set osgi.classloader.singleThreadLoads=true, the application now blocks with Java 6.
Comment 23 Thomas Watson CLA 2011-10-24 10:06:13 EDT
(In reply to comment #22)
> I tried with the build I20111021 from Equinox 3.8. I could start our
> application several times with Java6 and the vm options
> -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass but without setting
> osgi.classloader.singleThreadLoads=true.

Did you happen to test without using the -XX options?

> 
> If i also set osgi.classloader.singleThreadLoads=true then the application
> blocks now with Java6

I opened bug 361806 to remove this option.
Comment 24 Reto Urfer CLA 2011-10-25 02:47:27 EDT
I have now tried with Java 6 and without the VM options -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass and could still start the application several times.

I don't remember why we used these flags, but we had different problems with deadlocks before and could only solve them using these flags together with osgi.classloader.singleThreadLoads=true.
Comment 25 Jens Borrmann CLA 2011-11-14 01:08:54 EST
There may still be a problem in this area. We were not able to migrate to Juno, so we backported the fix in ClasspathManager.java to Indigo ourselves.

After some days the deadlock reappeared on my colleague's machine. I'll attach a stacktrace and leave it to your decision whether the bug needs to be reopened.
Comment 26 Jens Borrmann CLA 2011-11-14 01:10:39 EST
Created attachment 206913 [details]
Stacktrace of the deadlock using a backport of the patch to Indigo.
Comment 27 Thomas Watson CLA 2011-11-17 10:45:27 EST
(In reply to comment #26)
> Created attachment 206913 [details]
> Stacktrace of the deadlock using a backport of the patch to Indigo.

On Java 6 or earlier the Sun VM locks the class loader natively.  You would have to try the VM options "-XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass" to avoid this lock.