Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 316063

Summary: EcorePlugin.getWorkspaceRoot() called before workspace is initialized
Product: [Modeling] EMF Reporter: Bryan Hunt <bhunt>
Component: CoreAssignee: Ed Merks <Ed.Merks>
Status: RESOLVED WONTFIX QA Contact:
Severity: critical    
Priority: P3 CC: smccants, tjwatson
Version: 2.5.0   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:

Description Bryan Hunt CLA 2010-06-07 18:28:35 EDT
We have run into a case where one thread causes the ecore bundle to be activated which initializes the workspaceRoot inside EcorePlugin while another thread causes EcorePlugin.getWorkspaceRoot() to be called before the bundle and workspaceRoot are initialized.  The other thread gets null for the workspace root which bubbles up as an IOException.

I believe this can be simply fixed by wrapping a synchronized() around workspaceRoot.
Comment 1 Bryan Hunt CLA 2010-06-07 19:02:40 EDT
After some additional investigation, it is not 100% clear that the problem description is the root cause.  We will continue to investigate tomorrow and update the bug when we know more.  There's definitely a problem here, we just want to be completely sure of the root cause.
Comment 2 Ed Merks CLA 2010-06-07 19:45:04 EDT
Given that "start" does the initialization, I don't see now synchronized will help. Also, given that the EcorePlugin.Implementation is the first class loaded when any class within the ecore bundle needs to be loaded, it's hard to imagine how it's possible for clients to be using it fully before start has been called.  Doesn't OSGi ensure the bundle properly initialized before clients can start using it?
Comment 3 Stephen McCants CLA 2010-06-08 12:47:31 EDT
Bryan Hunt and I made some changes to EcorePlugin to try and unwind/prevent this problem.  Here are the changed functions:

  public static final ReentrantLock lock = new ReentrantLock();

  public static IWorkspaceRoot getWorkspaceRoot()
  {
  	lock.lock();
  	try
  	{
  		System.out.println("Getting workspace root which is "+workspaceRoot+" while plugin is "+INSTANCE.hashCode()+".");
      return workspaceRoot;	
		}
  	finally
  	{
  		lock.unlock();
  	}
  }

Also changed EcorePlugin.Implementation.start() at the top of the start function:

 public void start(BundleContext context) throws Exception
    {
      lock.lock();
      try
      {
      	System.out.println("ECore bundle is starting.");
      	super.start(context);
      	
      	if (IS_RESOURCES_BUNDLE_AVAILABLE && System.getProperty("org.eclipse.emf.ecore.plugin.EcorePlugin.doNotLoadResourcesPlugin") == null)
      	{
      		System.out.println("Initializing workspace root. Plugin = "+INSTANCE.hashCode());
      		workspaceRoot = ResourcesPlugin.getWorkspace().getRoot();
      		System.out.println("Workspace root has been set to "+workspaceRoot+".");
      	}
      }  	
      finally
      {
    	lock.unlock();
      }

In theory, this lock should prevent any threads from grabbing the workspaceRoot before it has been set.  In practice, here is the stdout from one of our runs:

ECore bundle is starting.
Initializing workspace root. Plugin = 1187464903
Getting workspace root which is null while plugin is 1187464903.
Getting workspace root which is null while plugin is 1187464903.
Workspace root has been set to R/.

As you can see, someone is getting the workspace root before it is set despite the lock around the setting and acquisition of the workspace root.  This is only possible if the setter and the acquirer are the same thread.  It turns out that that is the case.

Below is the stack trace that shows both initializing the workspace root and getting the workspace root in the same thread.  Before reading the stack trace (which is very long), let me explain what I think is happening:

1) Declaritive Services (DS) is starting a service that Bryan and I wrote named "ObjectRelationalMappingServiceMetaTypeProvider", which accesses EMF causing the class loader to load and start ECorePlugin$Implementation.
ObjectRelationalMappingServiceMetaTypeProvider.<init>() line: 18

2) ECorePlugin$Implementation.start() begins to initialize the workspace root.
EcorePlugin$Implementation.start(BundleContext) line: 527	

3) This causes the class loader to load and start the ResourcesPlugin.
ResourcesPlugin.start(BundleContext) line: 379	

4) The ResourcesPlugin.start(..) function makes a call to workspace.open().  So far so good.  The Workspace.startup(..) function is called and it creates and starts a CharSetManager.

5) The CharSetManager creates a CharsetDeltaJob.  This causes the class loader to load and start the org.eclipse.core.internal.content.Activator()
Activator.start(BundleContext) line: 49	

6) The Activator registers a service called IContentTypeManager.
Activator.start(BundleContext) line: 49	

7) This causes a ServiceEvent to be dispatched announcing the new service.
ServiceRegistry.publishServiceEventPrivileged(ServiceEvent) line: 756	

8) The SCRManager gets the event and calls Resolver.getEligible(..)
SCRManager.serviceChanged(ServiceEvent) line: 226	

9) The Resolver decides to build all the resolved components.  In our case there is one resolved component, which is our DatabaseService (com.ibm.hdwb.persistence.common.jdbc.database).  This causes the DatabaseService to be activated.  This is were we think things start to go awry.
Resolver.getEligible(ServiceEvent) line: 292	

10) The DatabaseService.activate(..) function makes a call to EMF's ResourceSet.getResource() to read in a configuration file.
DatabaseService.activate(ComponentContext) line: 67	

11) EMF's ResourceSetImpl.getResource(...) ends up calling EcorePlugin.getWorkspaceRoot(), but the workspace is not yet initialized, so the workspace root is still null.  
EcorePlugin.getWorkspaceRoot() line: 637 [local variables unavailable]	

12) This null value for the workspace root is cached by the ExtensibleURIConverterImpl, meaning that no EMF classes in the future that use that URI Converter can successfully access the workspace.
ExtensibleURIConverterImpl.<clinit>() line: 334 [local variables unavailable]	

We've discussed this some with Tom Watson and at his request are adding him to the cc list.

Our initial statement that this is a multi-threading issue was incorrect.  Synchronizing around the workspace root does not fix the problem.  We currently do not have a work around for this problem and are broken.

Here is the full stack trace that we just described:

Daemon Thread [Component Resolve Thread (Bundle 251)] (Suspended (breakpoint at line 637 in EcorePlugin))	
	EcorePlugin.getWorkspaceRoot() line: 637 [local variables unavailable]	
	ExtensibleURIConverterImpl.<clinit>() line: 334	
	J9VMInternals.initializeImpl(Class) line: not available [native method]	
	J9VMInternals.initialize(Class) line: 200	
	ResourceSetImpl.getURIConverter() line: 490	
	ResourceSetImpl.getResource(URI, boolean) line: 360	
	DatabaseConnectionManager.<init>(URI) line: 29	
	DatabaseService.activate(ComponentContext) line: 67	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 48	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 37	
	Method.invoke(Object, Object...) line: 600	
	ServiceComponent.activate(Object, ComponentContext) line: 206	
	ServiceComponentProp.activate(Bundle, ComponentInstanceImpl) line: 140	
	ServiceComponentProp.build(Bundle, Object, boolean) line: 330	
	InstanceProcess.buildComponent(Bundle, ServiceComponentProp, Object, boolean) line: 560	
	InstanceProcess.buildComponents(Vector, boolean) line: 182	
	Resolver.getEligible(ServiceEvent) line: 292	
	SCRManager.serviceChanged(ServiceEvent) line: 226	
	FilteredServiceListener.serviceChanged(ServiceEvent) line: 104	
	BundleContextImpl.dispatchEvent(Object, Object, int, Object) line: 933	
	EventManager.dispatchEvent(Set, EventDispatcher, int, Object) line: 227	
	ListenerQueue.dispatchEventSynchronous(int, Object) line: 149	
	ServiceRegistry.publishServiceEventPrivileged(ServiceEvent) line: 756	
	ServiceRegistry.publishServiceEvent(ServiceEvent) line: 711	
	ServiceRegistrationImpl.register(Dictionary) line: 129	
	ServiceRegistry.registerService(BundleContextImpl, String[], Object, Dictionary) line: 206	
	BundleContextImpl.registerService(String[], Object, Dictionary) line: 507	
	BundleContextImpl.registerService(String, Object, Dictionary) line: 525	
	Activator.start(BundleContext) line: 49	
	BundleContextImpl$1.run() line: 783	
	AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251	
	BundleContextImpl.startActivator(BundleActivator) line: 774	
	BundleContextImpl.start() line: 755	
	BundleHost.startWorker(int) line: 352	
	BundleHost(AbstractBundle).start(int) line: 280	
	SecureAction.start(Bundle, int) line: 408	
	EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111	
	ClasspathManager.findLocalClass(String) line: 449	
	DefaultClassLoader.findLocalClass(String) line: 211	
	BundleLoader.findLocalClass(String) line: 381	
	SingleSourcePackage.loadClass(String) line: 33	
	BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454	
	BundleLoader.findClass(String, boolean) line: 410	
	BundleLoader.findClass(String) line: 398	
	DefaultClassLoader.loadClass(String, boolean) line: 105	
	DefaultClassLoader(ClassLoader).loadClass(String) line: 618	
	ClassLoader.defineClassImpl(String, byte[], int, int, Object) line: not available [native method]	
	DefaultClassLoader(ClassLoader).defineClass(String, byte[], int, int, ProtectionDomain) line: 274	
	DefaultClassLoader.defineClass(String, byte[], ClasspathEntry, BundleEntry) line: 183	
	ClasspathManager.defineClass(String, byte[], ClasspathEntry, BundleEntry, ClassLoadingStatsHook[]) line: 576	
	ClasspathManager.findClassImpl(String, ClasspathEntry, ClassLoadingStatsHook[]) line: 546	
	ClasspathManager.findLocalClassImpl(String, ClassLoadingStatsHook[]) line: 477	
	ClasspathManager.findLocalClass_LockClassLoader(String, ClassLoadingStatsHook[]) line: 465	
	ClasspathManager.findLocalClass(String) line: 445	
	DefaultClassLoader.findLocalClass(String) line: 211	
	BundleLoader.findLocalClass(String) line: 381	
	BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 457	
	BundleLoader.findClass(String, boolean) line: 410	
	BundleLoader.findClass(String) line: 398	
	DefaultClassLoader.loadClass(String, boolean) line: 105	
	DefaultClassLoader(ClassLoader).loadClass(String) line: 618	
	CharsetManager.startup(IProgressMonitor) line: 301	
	Workspace.startup(IProgressMonitor) line: 1965	
	Workspace.open(IProgressMonitor) line: 1716	
	ResourcesPlugin.start(BundleContext) line: 379	
	BundleContextImpl$1.run() line: 783	
	AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251	
	BundleContextImpl.startActivator(BundleActivator) line: 774	
	BundleContextImpl.start() line: 755	
	BundleHost.startWorker(int) line: 352	
	BundleHost(AbstractBundle).start(int) line: 280	
	SecureAction.start(Bundle, int) line: 408	
	EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111	
	ClasspathManager.findLocalClass(String) line: 449	
	DefaultClassLoader.findLocalClass(String) line: 211	
	BundleLoader.findLocalClass(String) line: 381	
	SingleSourcePackage.loadClass(String) line: 33	
	BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454	
	BundleLoader.findClass(String, boolean) line: 410	
	BundleLoader.findClass(String) line: 398	
	DefaultClassLoader.loadClass(String, boolean) line: 105	
	DefaultClassLoader(ClassLoader).loadClass(String) line: 618	
	EcorePlugin$Implementation.start(BundleContext) line: 527	
	BundleContextImpl$1.run() line: 783	
	AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251	
	BundleContextImpl.startActivator(BundleActivator) line: 774	
	BundleContextImpl.start() line: 755	
	BundleHost.startWorker(int) line: 352	
	BundleHost(AbstractBundle).start(int) line: 280	
	SecureAction.start(Bundle, int) line: 408	
	EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111	
	ClasspathManager.findLocalClass(String) line: 449	
	DefaultClassLoader.findLocalClass(String) line: 211	
	BundleLoader.findLocalClass(String) line: 381	
	SingleSourcePackage.loadClass(String) line: 33	
	BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454	
	BundleLoader.findClass(String, boolean) line: 410	
	BundleLoader.findClass(String) line: 398	
	DefaultClassLoader.loadClass(String, boolean) line: 105	
	DefaultClassLoader(ClassLoader).loadClass(String) line: 618	
	ObjectRelationalMappingServiceMetaTypeProvider.<init>() line: 18	
	J9VMInternals.newInstanceImpl(Class) line: not available [native method]	
	Class<T>.newInstance() line: 1325	
	ServiceComponent.createInstance() line: 454	
	ServiceComponentProp.createInstance() line: 255	
	ServiceComponentProp.build(Bundle, Object, boolean) line: 316	
	InstanceProcess.buildComponent(Bundle, ServiceComponentProp, Object, boolean) line: 560	
	InstanceProcess.buildComponents(Vector, boolean) line: 182	
	Resolver.buildNewlySatisfied(boolean) line: 393	
	Resolver.enableComponents(Vector) line: 176	
	SCRManager.startedBundle(Bundle) line: 634	
	SCRManager.bundleChanged(BundleEvent) line: 239	
	BundleContextImpl.dispatchEvent(Object, Object, int, Object) line: 919	
	EventManager.dispatchEvent(Set, EventDispatcher, int, Object) line: 227	
	ListenerQueue.dispatchEventSynchronous(int, Object) line: 149	
	Framework.publishBundleEventPrivileged(BundleEvent) line: 1350	
	Framework.publishBundleEvent(int, Bundle) line: 1301	
	BundleHost.startWorker(int) line: 362	
	BundleHost(AbstractBundle).start(int) line: 280	
	SecureAction.start(Bundle, int) line: 408	
	EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111	
	ClasspathManager.findLocalClass(String) line: 449	
	DefaultClassLoader.findLocalClass(String) line: 211	
	BundleLoader.findLocalClass(String) line: 381	
	SingleSourcePackage.loadClass(String) line: 33	
	BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454	
	BundleLoader.findClass(String, boolean) line: 410	
	BundleLoader.findClass(String) line: 398	
	DefaultClassLoader.loadClass(String, boolean) line: 105	
	DefaultClassLoader(ClassLoader).loadClass(String) line: 618	
	Activator.start(BundleContext) line: 39	
	BundleContextImpl$1.run() line: 783	
	AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251	
	BundleContextImpl.startActivator(BundleActivator) line: 774	
	BundleContextImpl.start() line: 755	
	BundleHost.startWorker(int) line: 352	
	BundleHost(AbstractBundle).start(int) line: 280	
	SecureAction.start(Bundle, int) line: 408	
	EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111	
	ClasspathManager.findLocalClass(String) line: 449	
	DefaultClassLoader.findLocalClass(String) line: 211	
	BundleLoader.findLocalClass(String) line: 381	
	BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 457	
	BundleLoader.findClass(String, boolean) line: 410	
	BundleLoader.findClass(String) line: 398	
	DefaultClassLoader.loadClass(String, boolean) line: 105	
	DefaultClassLoader(ClassLoader).loadClass(String) line: 618	
	BundleLoader.loadClass(String) line: 326	
	BundleHost.loadClass(String, boolean) line: 231	
	BundleHost(AbstractBundle).loadClass(String) line: 1193	
	ServiceComponent.createInstance() line: 454	
	ServiceComponentProp.createInstance() line: 255	
	ServiceComponentProp.build(Bundle, Object, boolean) line: 316	
	InstanceProcess.buildComponent(Bundle, ServiceComponentProp, Object, boolean) line: 560	
	InstanceProcess.buildComponents(Vector, boolean) line: 182	
	Resolver.buildNewlySatisfied(boolean) line: 393	
	Resolver.enableComponents(Vector) line: 176	
	SCRManager.performWork(int, Object) line: 791	
	SCRManager$QueuedJob.dispatch() line: 758	
	WorkThread.run() line: 90	
	Executor.run() line: 70
Comment 4 Thomas Watson CLA 2010-06-08 13:46:11 EDT
One possible work around is to have an immediate component that depends on nothing but causes the activation of the ecore bundle and then enables your other components only after you ensure the ecore bundle is started.
Comment 5 Bryan Hunt CLA 2010-06-08 14:43:50 EDT
I implemented Tom's suggested workaround and it seems to be working.  Thanks Tom.
Comment 6 Ed Merks CLA 2010-06-11 19:04:11 EDT
Sounds like the workaround is a good general solution, but not one the framework itself can provide.
Comment 7 Stephen McCants CLA 2010-06-11 19:21:58 EDT
Ed, is it okay for the Framework to start unrelated services in the middle of starting a service?  All on the same thread.  That seems like it is asking for trouble.  Big, nasty, hard to debug trouble, much like we are seeing.

While the work around may work, I think there is a real problem with the framework haphazardly starting things and that shouldn't be swept under the rug.
Comment 8 Ed Merks CLA 2010-06-11 19:46:55 EDT
Given there's no test case that I can use to see a problem it's pretty hard to comment and impossible to verify that any change I might make is actually an improvement.  

EMF doesn't use declarative services, so I'm not sure what you're alluding to when referring to the framework starting an unrelated service in the middle of starting a service. Given we've never had a problem with this before, I'm not sure what particular aspect is haphazard.  Do you have a suggestion?  Would moving the initialization logic to getWorkspaceRoot help? As I said, I'm not in a position where I can verify that changes are actually fixes.
Comment 9 Stephen McCants CLA 2010-06-11 20:09:16 EDT
Ed, I suspect this is not actually an EMF problem, although I could be mistaken.

My suggestion is to not start additional services while starting a service, unless required.  The start of the DatabaseService is not related to it being required or used by ObjectRelationalMappingServiceMetaTypeProvider service.

The idea of moving the initialization of the Workspace Root has some merit, but I don't think it addresses the larger issue.  There are likely other shared resources that could exhibit similar problems.

This problem is very difficult to reproduce as it depends on start orders that are more or less random.  We've seen it in the past and seen it go away.  This time it, for some reason it hung around, but only with one particular launcher on one particular computer.  However, just because it is difficult to reproduce with doesn't mean an attempt to fix it should be made, especially after Bryan and I were able to debug it to a point of reasonable understanding.