Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 479435 - Display.setSynchronizer's call to runAsyncMessages can cause the workbench to crash
Summary: Display.setSynchronizer's call to runAsyncMessages can cause the workbench to...
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 4.5   Edit
Hardware: PC All
: P1 critical (vote)
Target Milestone: 4.6 M4   Edit
Assignee: Stefan Xenos CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 479475
  Show dependency tree
 
Reported: 2015-10-09 10:45 EDT by Stefan Xenos CLA
Modified: 2015-11-05 21:38 EST (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Xenos CLA 2015-10-09 10:45:49 EDT
TL;DR: Display.setSynchronizer invokes runAsyncMessages, and this can cause the Eclipse workbench to crash on startup.

This call was first introduced in bug 76182 in order to prevent the runnables in the old synchronizer from being discarded. Unfortunately, running the event loop at this time is very likely to cause the Eclipse workbench to crash on startup if the queue is non-empty (details below).

The whole point of a synchronizer is to let the UI control which runnables get executed and when. However, running them all when a synchronizer is attached means they're being executed at the very moment they're most likely to cause a problem.

There are two alternatives to pumping the event loop here:

1. setSynchronizer can remove the events from the old synchronizer and insert them at the start of the new synchronizer's queue.

2. The caller can invoke getSynchronizer() immediately before calling setSynchronizer and remember the old synchronizer. The caller can then invoke runAsyncMessages whenever it's prepared to deal with them (presumably right before it runs the event loop).

Both approaches would give the caller control over when the event loop is executed and prevent the crash, and both would prevent any events from being lost.

Details of the crash:

1. A plugin starts up before the workbench and registers an asyncExec that calls methods on the workbench. The following line of code, run in an early asyncExec, is sufficient to cause the crash:

PlatformUI.getWorkbench().getAdapter(ICommandService.class)


2. The workbench starts up. During initialization, it is in an invalid state and it registers a UISynchronizer to control what runnables get executed. Unfortunately, the call to setSynchronizer forces the runnable to execute immediately and the call to getAdapter throws an exception.


3. Since the exception is thrown during plugin activation, it prevents the workbench from starting and the UI doesn't open at all.


This normally exhibits itself as a race condition. If the call to asyncExec happens before the call to setSynchronizer, the workbench crashes. If it happens after, it starts up fine. You can force it to happen every time by creating such a plugin and putting a breakpoint just before the call to setSynchronizer in Workbench.java.
Comment 1 Eclipse Genie CLA 2015-10-09 21:41:14 EDT
New Gerrit change created: https://git.eclipse.org/r/57890
Comment 2 Stefan Xenos CLA 2015-10-14 13:05:18 EDT
Ping?
Comment 3 Stefan Xenos CLA 2015-10-15 14:06:50 EDT
Raising importance to "critical" since this makes it impossible for affected users to start Eclipse.
Comment 4 Arun Thondapu CLA 2015-10-20 05:44:17 EDT
Thanks for the gerrit patch Stefan!

I know the bug really manifests as a race condition but I'm trying to understand if there are any real world crash scenarios that you have encountered with the SDK itself and how the problem affects UI exactly. Please also upload any test cases (snippet/plugin) that could be used to validate the patch.
Comment 5 Stefan Xenos CLA 2015-10-20 09:31:48 EDT
> I'm trying to understand if there are any real world crash scenarios 
> that you have encountered with the SDK itself

bug 479475 has some dupes attached that demonstrate a real-world crash that occurred due to an asyncExec running too early.

Also: I discovered this while investigating a series of crashes reported in our internal bug tracker so can confirm that it was definitely occurring "in the wild".

In every case I'm aware of, there was some non-SDK plugin involved in triggering the early activation which scheduled the asyncExec. I would guess that plugins which mess with the Eclipse resource layer (EFS, resource filters, etc.) would be easiest to reproduce the crash with, since they start up very early... but I don't think Eclipse does much of this in the plugins that ship with the SDK.

So these crashes are definitely occurring, but I don't know a way to reproduce it using only the SDK plugins.
Comment 6 Stefan Xenos CLA 2015-10-20 09:45:55 EDT
> Please also upload any test cases (snippet/plugin) that could be 
> used to validate the patch.

I've updated the patch with a unit test.
Comment 7 Stefan Xenos CLA 2015-10-20 09:55:02 EDT
In case your interested, here's the stack trace from a crash-to-desktop reported by an end user.

I've had to remove some proprietary information from the stack, but have included comments with <snip> describing what I removed in each case. What remains still demonstrates the general form for this sort of crash.


!ENTRY org.eclipse.osgi 4 0 2015-10-08 11:44:01.609
!MESSAGE Application error
!STACK 1
org.eclipse.swt.SWTException: Failed to execute runnable (java.lang.NoClassDefFoundError: <snip -- some critical class>)
        at org.eclipse.swt.SWT.error(SWT.java:4491)
        at org.eclipse.swt.SWT.error(SWT.java:4406)
        at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:138)
        at org.eclipse.swt.widgets.Display.setSynchronizer(Display.java:4213)
        at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2873)
        at org.eclipse.ui.internal.Workbench.access$8(Workbench.java:2854)
        at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:651)
        at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:336)
        at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:598)
        at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:150)
        at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:139)
        at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:134)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:380)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:235)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:669)
        at org.eclipse.equinox.launcher.Main.basicRun(Main.java:608)
        at org.eclipse.equinox.launcher.Main.run(Main.java:1515)
        at org.eclipse.equinox.launcher.Main.main(Main.java:1488)
Caused by: java.lang.NoClassDefFoundError: <snip -- some critical class>
        at <snip -- a runnable's run method>
        at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
        at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:135)
        ... 21 more
Caused by: java.lang.ClassNotFoundException: An error occurred while automatically activating bundle <snip - some bundle which started early> (575).
        at org.eclipse.osgi.internal.hooks.EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:116)
        at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClass(ClasspathManager.java:531)
        at org.eclipse.osgi.internal.loader.ModuleClassLoader.findLocalClass(ModuleClassLoader.java:324)
        at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:327)
        at org.eclipse.osgi.internal.loader.sources.SingleSourcePackage.loadClass(SingleSourcePackage.java:36)
        at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:398)
        at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:352)
        at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:344)
        at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:160)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 24 more
Caused by: org.osgi.framework.BundleException: Exception in <snip>.start() of bundle <snip - a plugin ID>.
        at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:792)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.start(BundleContextImpl.java:721)
        at org.eclipse.osgi.internal.framework.EquinoxBundle.startWorker0(EquinoxBundle.java:941)
        at org.eclipse.osgi.internal.framework.EquinoxBundle$EquinoxModule.startWorker(EquinoxBundle.java:318)
        at org.eclipse.osgi.container.Module.doStart(Module.java:571)
        at org.eclipse.osgi.container.Module.start(Module.java:439)
        at org.eclipse.osgi.framework.util.SecureAction.start(SecureAction.java:454)
        at org.eclipse.osgi.internal.hooks.EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:107)
        ... 33 more
Caused by: java.lang.NullPointerException
        at org.eclipse.ui.internal.services.WorkbenchServiceRegistry.<init>(WorkbenchServiceRegistry.java:67)
        at org.eclipse.ui.internal.services.WorkbenchServiceRegistry.getRegistry(WorkbenchServiceRegistry.java:61)
        at org.eclipse.ui.internal.services.ServiceLocator.getService(ServiceLocator.java:179)
        at org.eclipse.ui.internal.Workbench.getAdapter(Workbench.java:3580)
        at <snip - four stack frames omitted>
        at <snip - a bundle activator's start methhod>
        at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:771)
        at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:764)
        ... 40 mor
Comment 8 Stefan Xenos CLA 2015-10-23 09:11:03 EDT
Arun, I think I've addressed all your questions. Could I trouble you to take another look?
Comment 9 Stefan Xenos CLA 2015-10-26 10:21:10 EDT
I'm quite keen to get this in for M3, and AFAIK today is the last day for it. Could a kind SWT committer please take another look at it?
Comment 10 Arun Thondapu CLA 2015-10-26 16:43:46 EDT
(In reply to Stefan Xenos from comment #8)
> Arun, I think I've addressed all your questions. Could I trouble you to take
> another look?

Stefan, I've reviewed the code and it does look good, but the patch needs to be tested thoroughly on all the three different platforms as it could possibly have an impact on the eclipse SDK. Considering this, I'm not in favour of pushing it for M3 in a hurry as we may not have enough time to fix problems IF any regressions are found during the milestone testing!

I propose to merge the patch as soon as M3 winds up (i.e by end of this week). Let me know if there are any compelling reasons why you think it couldn't wait until M4...
Comment 11 Stefan Xenos CLA 2015-10-26 21:35:35 EDT
> Let me know if there are any compelling reasons why you think it couldn't
> wait until M4...

Your plan sounds fine. Thank you kindly!
Comment 12 Stefan Xenos CLA 2015-11-04 16:57:42 EST
Ping
Comment 14 Stefan Xenos CLA 2015-11-05 14:45:39 EST
You have to specify a comment when changing the Status of a bug from NEW to RESOLVED.
Comment 15 Stefan Xenos CLA 2015-11-05 21:38:26 EST
Thanks for the review, Arun.