Community
Participate
Working Groups
We have run into a case where one thread causes the ecore bundle to be activated which initializes the workspaceRoot inside EcorePlugin while another thread causes EcorePlugin.getWorkspaceRoot() to be called before the bundle and workspaceRoot are initialized. The other thread gets null for the workspace root which bubbles up as an IOException. I believe this can be simply fixed by wrapping a synchronized() around workspaceRoot.
After some additional investigation, it is not 100% clear that the problem description is the root cause. We will continue to investigate tomorrow and update the bug when we know more. There's definitely a problem here, we just want to be completely sure of the root cause.
Given that "start" does the initialization, I don't see now synchronized will help. Also, given that the EcorePlugin.Implementation is the first class loaded when any class within the ecore bundle needs to be loaded, it's hard to imagine how it's possible for clients to be using it fully before start has been called. Doesn't OSGi ensure the bundle properly initialized before clients can start using it?
Bryan Hunt and I made some changes to EcorePlugin to try and unwind/prevent this problem. Here are the changed functions: public static final ReentrantLock lock = new ReentrantLock(); public static IWorkspaceRoot getWorkspaceRoot() { lock.lock(); try { System.out.println("Getting workspace root which is "+workspaceRoot+" while plugin is "+INSTANCE.hashCode()+"."); return workspaceRoot; } finally { lock.unlock(); } } Also changed EcorePlugin.Implementation.start() at the top of the start function: public void start(BundleContext context) throws Exception { lock.lock(); try { System.out.println("ECore bundle is starting."); super.start(context); if (IS_RESOURCES_BUNDLE_AVAILABLE && System.getProperty("org.eclipse.emf.ecore.plugin.EcorePlugin.doNotLoadResourcesPlugin") == null) { System.out.println("Initializing workspace root. Plugin = "+INSTANCE.hashCode()); workspaceRoot = ResourcesPlugin.getWorkspace().getRoot(); System.out.println("Workspace root has been set to "+workspaceRoot+"."); } } finally { lock.unlock(); } In theory, this lock should prevent any threads from grabbing the workspaceRoot before it has been set. In practice, here is the stdout from one of our runs: ECore bundle is starting. Initializing workspace root. Plugin = 1187464903 Getting workspace root which is null while plugin is 1187464903. Getting workspace root which is null while plugin is 1187464903. Workspace root has been set to R/. As you can see, someone is getting the workspace root before it is set despite the lock around the setting and acquisition of the workspace root. This is only possible if the setter and the acquirer are the same thread. It turns out that that is the case. Below is the stack trace that shows both initializing the workspace root and getting the workspace root in the same thread. Before reading the stack trace (which is very long), let me explain what I think is happening: 1) Declaritive Services (DS) is starting a service that Bryan and I wrote named "ObjectRelationalMappingServiceMetaTypeProvider", which accesses EMF causing the class loader to load and start ECorePlugin$Implementation. ObjectRelationalMappingServiceMetaTypeProvider.<init>() line: 18 2) ECorePlugin$Implementation.start() begins to initialize the workspace root. EcorePlugin$Implementation.start(BundleContext) line: 527 3) This causes the class loader to load and start the ResourcesPlugin. ResourcesPlugin.start(BundleContext) line: 379 4) The ResourcesPlugin.start(..) function makes a call to workspace.open(). So far so good. The Workspace.startup(..) function is called and it creates and starts a CharSetManager. 5) The CharSetManager creates a CharsetDeltaJob. This causes the class loader to load and start the org.eclipse.core.internal.content.Activator() Activator.start(BundleContext) line: 49 6) The Activator registers a service called IContentTypeManager. Activator.start(BundleContext) line: 49 7) This causes a ServiceEvent to be dispatched announcing the new service. ServiceRegistry.publishServiceEventPrivileged(ServiceEvent) line: 756 8) The SCRManager gets the event and calls Resolver.getEligible(..) SCRManager.serviceChanged(ServiceEvent) line: 226 9) The Resolver decides to build all the resolved components. In our case there is one resolved component, which is our DatabaseService (com.ibm.hdwb.persistence.common.jdbc.database). This causes the DatabaseService to be activated. This is were we think things start to go awry. Resolver.getEligible(ServiceEvent) line: 292 10) The DatabaseService.activate(..) function makes a call to EMF's ResourceSet.getResource() to read in a configuration file. DatabaseService.activate(ComponentContext) line: 67 11) EMF's ResourceSetImpl.getResource(...) ends up calling EcorePlugin.getWorkspaceRoot(), but the workspace is not yet initialized, so the workspace root is still null. EcorePlugin.getWorkspaceRoot() line: 637 [local variables unavailable] 12) This null value for the workspace root is cached by the ExtensibleURIConverterImpl, meaning that no EMF classes in the future that use that URI Converter can successfully access the workspace. ExtensibleURIConverterImpl.<clinit>() line: 334 [local variables unavailable] We've discussed this some with Tom Watson and at his request are adding him to the cc list. Our initial statement that this is a multi-threading issue was incorrect. Synchronizing around the workspace root does not fix the problem. We currently do not have a work around for this problem and are broken. Here is the full stack trace that we just described: Daemon Thread [Component Resolve Thread (Bundle 251)] (Suspended (breakpoint at line 637 in EcorePlugin)) EcorePlugin.getWorkspaceRoot() line: 637 [local variables unavailable] ExtensibleURIConverterImpl.<clinit>() line: 334 J9VMInternals.initializeImpl(Class) line: not available [native method] J9VMInternals.initialize(Class) line: 200 ResourceSetImpl.getURIConverter() line: 490 ResourceSetImpl.getResource(URI, boolean) line: 360 DatabaseConnectionManager.<init>(URI) line: 29 DatabaseService.activate(ComponentContext) line: 67 NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method] NativeMethodAccessorImpl.invoke(Object, Object[]) line: 48 DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 37 Method.invoke(Object, Object...) line: 600 ServiceComponent.activate(Object, ComponentContext) line: 206 ServiceComponentProp.activate(Bundle, ComponentInstanceImpl) line: 140 ServiceComponentProp.build(Bundle, Object, boolean) line: 330 InstanceProcess.buildComponent(Bundle, ServiceComponentProp, Object, boolean) line: 560 InstanceProcess.buildComponents(Vector, boolean) line: 182 Resolver.getEligible(ServiceEvent) line: 292 SCRManager.serviceChanged(ServiceEvent) line: 226 FilteredServiceListener.serviceChanged(ServiceEvent) line: 104 BundleContextImpl.dispatchEvent(Object, Object, int, Object) line: 933 EventManager.dispatchEvent(Set, EventDispatcher, int, Object) line: 227 ListenerQueue.dispatchEventSynchronous(int, Object) line: 149 ServiceRegistry.publishServiceEventPrivileged(ServiceEvent) line: 756 ServiceRegistry.publishServiceEvent(ServiceEvent) line: 711 ServiceRegistrationImpl.register(Dictionary) line: 129 ServiceRegistry.registerService(BundleContextImpl, String[], Object, Dictionary) line: 206 BundleContextImpl.registerService(String[], Object, Dictionary) line: 507 BundleContextImpl.registerService(String, Object, Dictionary) line: 525 Activator.start(BundleContext) line: 49 BundleContextImpl$1.run() line: 783 AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251 BundleContextImpl.startActivator(BundleActivator) line: 774 BundleContextImpl.start() line: 755 BundleHost.startWorker(int) line: 352 BundleHost(AbstractBundle).start(int) line: 280 SecureAction.start(Bundle, int) line: 408 EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111 ClasspathManager.findLocalClass(String) line: 449 DefaultClassLoader.findLocalClass(String) line: 211 BundleLoader.findLocalClass(String) line: 381 SingleSourcePackage.loadClass(String) line: 33 BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454 BundleLoader.findClass(String, boolean) line: 410 BundleLoader.findClass(String) line: 398 DefaultClassLoader.loadClass(String, boolean) line: 105 DefaultClassLoader(ClassLoader).loadClass(String) line: 618 ClassLoader.defineClassImpl(String, byte[], int, int, Object) line: not available [native method] DefaultClassLoader(ClassLoader).defineClass(String, byte[], int, int, ProtectionDomain) line: 274 DefaultClassLoader.defineClass(String, byte[], ClasspathEntry, BundleEntry) line: 183 ClasspathManager.defineClass(String, byte[], ClasspathEntry, BundleEntry, ClassLoadingStatsHook[]) line: 576 ClasspathManager.findClassImpl(String, ClasspathEntry, ClassLoadingStatsHook[]) line: 546 ClasspathManager.findLocalClassImpl(String, ClassLoadingStatsHook[]) line: 477 ClasspathManager.findLocalClass_LockClassLoader(String, ClassLoadingStatsHook[]) line: 465 ClasspathManager.findLocalClass(String) line: 445 DefaultClassLoader.findLocalClass(String) line: 211 BundleLoader.findLocalClass(String) line: 381 BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 457 BundleLoader.findClass(String, boolean) line: 410 BundleLoader.findClass(String) line: 398 DefaultClassLoader.loadClass(String, boolean) line: 105 DefaultClassLoader(ClassLoader).loadClass(String) line: 618 CharsetManager.startup(IProgressMonitor) line: 301 Workspace.startup(IProgressMonitor) line: 1965 Workspace.open(IProgressMonitor) line: 1716 ResourcesPlugin.start(BundleContext) line: 379 BundleContextImpl$1.run() line: 783 AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251 BundleContextImpl.startActivator(BundleActivator) line: 774 BundleContextImpl.start() line: 755 BundleHost.startWorker(int) line: 352 BundleHost(AbstractBundle).start(int) line: 280 SecureAction.start(Bundle, int) line: 408 EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111 ClasspathManager.findLocalClass(String) line: 449 DefaultClassLoader.findLocalClass(String) line: 211 BundleLoader.findLocalClass(String) line: 381 SingleSourcePackage.loadClass(String) line: 33 BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454 BundleLoader.findClass(String, boolean) line: 410 BundleLoader.findClass(String) line: 398 DefaultClassLoader.loadClass(String, boolean) line: 105 DefaultClassLoader(ClassLoader).loadClass(String) line: 618 EcorePlugin$Implementation.start(BundleContext) line: 527 BundleContextImpl$1.run() line: 783 AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251 BundleContextImpl.startActivator(BundleActivator) line: 774 BundleContextImpl.start() line: 755 BundleHost.startWorker(int) line: 352 BundleHost(AbstractBundle).start(int) line: 280 SecureAction.start(Bundle, int) line: 408 EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111 ClasspathManager.findLocalClass(String) line: 449 DefaultClassLoader.findLocalClass(String) line: 211 BundleLoader.findLocalClass(String) line: 381 SingleSourcePackage.loadClass(String) line: 33 BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454 BundleLoader.findClass(String, boolean) line: 410 BundleLoader.findClass(String) line: 398 DefaultClassLoader.loadClass(String, boolean) line: 105 DefaultClassLoader(ClassLoader).loadClass(String) line: 618 ObjectRelationalMappingServiceMetaTypeProvider.<init>() line: 18 J9VMInternals.newInstanceImpl(Class) line: not available [native method] Class<T>.newInstance() line: 1325 ServiceComponent.createInstance() line: 454 ServiceComponentProp.createInstance() line: 255 ServiceComponentProp.build(Bundle, Object, boolean) line: 316 InstanceProcess.buildComponent(Bundle, ServiceComponentProp, Object, boolean) line: 560 InstanceProcess.buildComponents(Vector, boolean) line: 182 Resolver.buildNewlySatisfied(boolean) line: 393 Resolver.enableComponents(Vector) line: 176 SCRManager.startedBundle(Bundle) line: 634 SCRManager.bundleChanged(BundleEvent) line: 239 BundleContextImpl.dispatchEvent(Object, Object, int, Object) line: 919 EventManager.dispatchEvent(Set, EventDispatcher, int, Object) line: 227 ListenerQueue.dispatchEventSynchronous(int, Object) line: 149 Framework.publishBundleEventPrivileged(BundleEvent) line: 1350 Framework.publishBundleEvent(int, Bundle) line: 1301 BundleHost.startWorker(int) line: 362 BundleHost(AbstractBundle).start(int) line: 280 SecureAction.start(Bundle, int) line: 408 EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111 ClasspathManager.findLocalClass(String) line: 449 DefaultClassLoader.findLocalClass(String) line: 211 BundleLoader.findLocalClass(String) line: 381 SingleSourcePackage.loadClass(String) line: 33 BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 454 BundleLoader.findClass(String, boolean) line: 410 BundleLoader.findClass(String) line: 398 DefaultClassLoader.loadClass(String, boolean) line: 105 DefaultClassLoader(ClassLoader).loadClass(String) line: 618 Activator.start(BundleContext) line: 39 BundleContextImpl$1.run() line: 783 AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: 251 BundleContextImpl.startActivator(BundleActivator) line: 774 BundleContextImpl.start() line: 755 BundleHost.startWorker(int) line: 352 BundleHost(AbstractBundle).start(int) line: 280 SecureAction.start(Bundle, int) line: 408 EclipseLazyStarter.postFindLocalClass(String, Class, ClasspathManager) line: 111 ClasspathManager.findLocalClass(String) line: 449 DefaultClassLoader.findLocalClass(String) line: 211 BundleLoader.findLocalClass(String) line: 381 BundleLoader.findClassInternal(String, boolean, ClassLoader) line: 457 BundleLoader.findClass(String, boolean) line: 410 BundleLoader.findClass(String) line: 398 DefaultClassLoader.loadClass(String, boolean) line: 105 DefaultClassLoader(ClassLoader).loadClass(String) line: 618 BundleLoader.loadClass(String) line: 326 BundleHost.loadClass(String, boolean) line: 231 BundleHost(AbstractBundle).loadClass(String) line: 1193 ServiceComponent.createInstance() line: 454 ServiceComponentProp.createInstance() line: 255 ServiceComponentProp.build(Bundle, Object, boolean) line: 316 InstanceProcess.buildComponent(Bundle, ServiceComponentProp, Object, boolean) line: 560 InstanceProcess.buildComponents(Vector, boolean) line: 182 Resolver.buildNewlySatisfied(boolean) line: 393 Resolver.enableComponents(Vector) line: 176 SCRManager.performWork(int, Object) line: 791 SCRManager$QueuedJob.dispatch() line: 758 WorkThread.run() line: 90 Executor.run() line: 70
One possible work around is to have an immediate component that depends on nothing but causes the activation of the ecore bundle and then enables your other components only after you ensure the ecore bundle is started.
I implemented Tom's suggested workaround and it seems to be working. Thanks Tom.
Sounds like the workaround is a good general solution, but not one the framework itself can provide.
Ed, is it okay for the Framework to start unrelated services in the middle of starting a service? All on the same thread. That seems like it is asking for trouble. Big, nasty, hard to debug trouble, much like we are seeing. While the work around may work, I think there is a real problem with the framework haphazardly starting things and that shouldn't be swept under the rug.
Given there's no test case that I can use to see a problem it's pretty hard to comment and impossible to verify that any change I might make is actually an improvement. EMF doesn't use declarative services, so I'm not sure what you're alluding to when referring to the framework starting an unrelated service in the middle of starting a service. Given we've never had a problem with this before, I'm not sure what particular aspect is haphazard. Do you have a suggestion? Would moving the initialization logic to getWorkspaceRoot help? As I said, I'm not in a position where I can verify that changes are actually fixes.
Ed, I suspect this is not actually an EMF problem, although I could be mistaken. My suggestion is to not start additional services while starting a service, unless required. The start of the DatabaseService is not related to it being required or used by ObjectRelationalMappingServiceMetaTypeProvider service. The idea of moving the initialization of the Workspace Root has some merit, but I don't think it addresses the larger issue. There are likely other shared resources that could exhibit similar problems. This problem is very difficult to reproduce as it depends on start orders that are more or less random. We've seen it in the past and seen it go away. This time it, for some reason it hung around, but only with one particular launcher on one particular computer. However, just because it is difficult to reproduce with doesn't mean an attempt to fix it should be made, especially after Bryan and I were able to debug it to a point of reasonable understanding.