Community
Participate
Working Groups
TL;DR: Display.setSynchronizer invokes runAsyncMessages, and this can cause the Eclipse workbench to crash on startup. This call was first introduced in bug 76182 in order to prevent the runnables in the old synchronizer from being discarded. Unfortunately, running the event loop at this time is very likely to cause the Eclipse workbench to crash on startup if the queue is non-empty (details below). The whole point of a synchronizer is to let the UI control which runnables get executed and when. However, running them all when a synchronizer is attached means they're being executed at the very moment they're most likely to cause a problem. There are two alternatives to pumping the event loop here: 1. setSynchronizer can remove the events from the old synchronizer and insert them at the start of the new synchronizer's queue. 2. The caller can invoke getSynchronizer() immediately before calling setSynchronizer and remember the old synchronizer. The caller can then invoke runAsyncMessages whenever it's prepared to deal with them (presumably right before it runs the event loop). Both approaches would give the caller control over when the event loop is executed and prevent the crash, and both would prevent any events from being lost. Details of the crash: 1. A plugin starts up before the workbench and registers an asyncExec that calls methods on the workbench. The following line of code, run in an early asyncExec, is sufficient to cause the crash: PlatformUI.getWorkbench().getAdapter(ICommandService.class) 2. The workbench starts up. During initialization, it is in an invalid state and it registers a UISynchronizer to control what runnables get executed. Unfortunately, the call to setSynchronizer forces the runnable to execute immediately and the call to getAdapter throws an exception. 3. Since the exception is thrown during plugin activation, it prevents the workbench from starting and the UI doesn't open at all. This normally exhibits itself as a race condition. If the call to asyncExec happens before the call to setSynchronizer, the workbench crashes. If it happens after, it starts up fine. You can force it to happen every time by creating such a plugin and putting a breakpoint just before the call to setSynchronizer in Workbench.java.
New Gerrit change created: https://git.eclipse.org/r/57890
Ping?
Raising importance to "critical" since this makes it impossible for affected users to start Eclipse.
Thanks for the gerrit patch Stefan! I know the bug really manifests as a race condition but I'm trying to understand if there are any real world crash scenarios that you have encountered with the SDK itself and how the problem affects UI exactly. Please also upload any test cases (snippet/plugin) that could be used to validate the patch.
> I'm trying to understand if there are any real world crash scenarios > that you have encountered with the SDK itself bug 479475 has some dupes attached that demonstrate a real-world crash that occurred due to an asyncExec running too early. Also: I discovered this while investigating a series of crashes reported in our internal bug tracker so can confirm that it was definitely occurring "in the wild". In every case I'm aware of, there was some non-SDK plugin involved in triggering the early activation which scheduled the asyncExec. I would guess that plugins which mess with the Eclipse resource layer (EFS, resource filters, etc.) would be easiest to reproduce the crash with, since they start up very early... but I don't think Eclipse does much of this in the plugins that ship with the SDK. So these crashes are definitely occurring, but I don't know a way to reproduce it using only the SDK plugins.
> Please also upload any test cases (snippet/plugin) that could be > used to validate the patch. I've updated the patch with a unit test.
In case your interested, here's the stack trace from a crash-to-desktop reported by an end user. I've had to remove some proprietary information from the stack, but have included comments with <snip> describing what I removed in each case. What remains still demonstrates the general form for this sort of crash. !ENTRY org.eclipse.osgi 4 0 2015-10-08 11:44:01.609 !MESSAGE Application error !STACK 1 org.eclipse.swt.SWTException: Failed to execute runnable (java.lang.NoClassDefFoundError: <snip -- some critical class>) at org.eclipse.swt.SWT.error(SWT.java:4491) at org.eclipse.swt.SWT.error(SWT.java:4406) at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:138) at org.eclipse.swt.widgets.Display.setSynchronizer(Display.java:4213) at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2873) at org.eclipse.ui.internal.Workbench.access$8(Workbench.java:2854) at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:651) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:336) at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:598) at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:150) at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:139) at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:134) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:380) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:235) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:669) at org.eclipse.equinox.launcher.Main.basicRun(Main.java:608) at org.eclipse.equinox.launcher.Main.run(Main.java:1515) at org.eclipse.equinox.launcher.Main.main(Main.java:1488) Caused by: java.lang.NoClassDefFoundError: <snip -- some critical class> at <snip -- a runnable's run method> at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35) at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:135) ... 21 more Caused by: java.lang.ClassNotFoundException: An error occurred while automatically activating bundle <snip - some bundle which started early> (575). at org.eclipse.osgi.internal.hooks.EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:116) at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClass(ClasspathManager.java:531) at org.eclipse.osgi.internal.loader.ModuleClassLoader.findLocalClass(ModuleClassLoader.java:324) at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:327) at org.eclipse.osgi.internal.loader.sources.SingleSourcePackage.loadClass(SingleSourcePackage.java:36) at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:398) at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:352) at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:344) at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:160) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 24 more Caused by: org.osgi.framework.BundleException: Exception in <snip>.start() of bundle <snip - a plugin ID>. at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:792) at org.eclipse.osgi.internal.framework.BundleContextImpl.start(BundleContextImpl.java:721) at org.eclipse.osgi.internal.framework.EquinoxBundle.startWorker0(EquinoxBundle.java:941) at org.eclipse.osgi.internal.framework.EquinoxBundle$EquinoxModule.startWorker(EquinoxBundle.java:318) at org.eclipse.osgi.container.Module.doStart(Module.java:571) at org.eclipse.osgi.container.Module.start(Module.java:439) at org.eclipse.osgi.framework.util.SecureAction.start(SecureAction.java:454) at org.eclipse.osgi.internal.hooks.EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:107) ... 33 more Caused by: java.lang.NullPointerException at org.eclipse.ui.internal.services.WorkbenchServiceRegistry.<init>(WorkbenchServiceRegistry.java:67) at org.eclipse.ui.internal.services.WorkbenchServiceRegistry.getRegistry(WorkbenchServiceRegistry.java:61) at org.eclipse.ui.internal.services.ServiceLocator.getService(ServiceLocator.java:179) at org.eclipse.ui.internal.Workbench.getAdapter(Workbench.java:3580) at <snip - four stack frames omitted> at <snip - a bundle activator's start methhod> at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:771) at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:1) at java.security.AccessController.doPrivileged(Native Method) at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:764) ... 40 mor
Arun, I think I've addressed all your questions. Could I trouble you to take another look?
I'm quite keen to get this in for M3, and AFAIK today is the last day for it. Could a kind SWT committer please take another look at it?
(In reply to Stefan Xenos from comment #8) > Arun, I think I've addressed all your questions. Could I trouble you to take > another look? Stefan, I've reviewed the code and it does look good, but the patch needs to be tested thoroughly on all the three different platforms as it could possibly have an impact on the eclipse SDK. Considering this, I'm not in favour of pushing it for M3 in a hurry as we may not have enough time to fix problems IF any regressions are found during the milestone testing! I propose to merge the patch as soon as M3 winds up (i.e by end of this week). Let me know if there are any compelling reasons why you think it couldn't wait until M4...
> Let me know if there are any compelling reasons why you think it couldn't > wait until M4... Your plan sounds fine. Thank you kindly!
Ping
Gerrit change https://git.eclipse.org/r/57890 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=d3e16b485f56a8dad16e2220a3b00d130cb71d2f
You have to specify a comment when changing the Status of a bug from NEW to RESOLVED.
Thanks for the review, Arun.