| Summary: | Slow startup due to thousands of CNFE thrown for Sirius classes | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Modeling] Sirius | Reporter: | Laurent Goubet <laurent.goubet> | ||||
| Component: | Core | Assignee: | Pierre-Charles David <pierre-charles.david> | ||||
| Status: | CLOSED FIXED | QA Contact: | |||||
| Severity: | critical | ||||||
| Priority: | P3 | CC: | cmordant1, julien.dupont, pierre-charles.david | ||||
| Version: | unspecified | Keywords: | triaged | ||||
| Target Milestone: | 4.0.0 | ||||||
| Hardware: | All | ||||||
| OS: | All | ||||||
| See Also: |
https://git.eclipse.org/r/73693 https://git.eclipse.org/r/73798 https://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=9f8ca427badbaf2892a3e0e51a64deb0fe44828b |
||||||
| Whiteboard: | |||||||
| Attachments: |
|
||||||
New Gerrit change created: https://git.eclipse.org/r/73693 Quite ugly, but the associated gerrit shows the kind of change it took to get back to acceptable startup times on my environment. The root cause of this issue is that we're initializing the Messages class _before_ the "plugin" singleton instance of our activators is instantiated. Two things will end up causing that issue : - SiriusPlugin.INTER_REGISTRY : inlined instantation on a static field. It will try and instantiate an InterpreterRegistry, but InterpreterRegistry.ERROR_MSG_IMPOSSIBLE_TO_FIND_AN_INTERPRETER is also a static field with an inlined instantiation that will trigger the messages class initialization. At that point, we are _currently constructing_ the SiriusPlugin instance, so it is still not started, and SiriusPlugin.plugin is still null. - DslCommonPlugin.PROFILER : same issue as above. It is a problem for the plugin not to be instantiated yet because the Messages initialization will go through org.eclipse.emf.common.util.DelegatingResourceLocator.getString(String, boolean) , which makes use of the ResourceLocator (the plugin)... or has to resort to finding and parsing the properties file itself. SiriusPlugin.INTER_REGISTRY is easy to move down to SiriusPlugin.Implementation and make "non static" : it can be instantiated from Implementation.start() and is only used from the non-static method SiriusPlugin.Implementation.getInterpreterRegistry() DslCommonPlugin.PROFILER requires API breakages to be isolated from the plugin construction. After many attempts, I can reproduce part of the problem: indeed, I get ~28000 ClassNotFoundException on startup on Sirius-related classes. What's strange is that on my machine the slowdown this causes is very minor. Probably on the order of 3s, small enough not to notice it among the 14s of total time for Eclipse startup. I'm under Linux, while Laurent is under Windows 10 and has an anti-virus installed (could it cause slowdowns when accessing many JARs?).
To reproduce I put a conditional breakpoing in the last line of org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(String, boolean):
throw new ClassNotFoundException(name + " cannot be found by " + this); //$NON-NLS-1$
This is line 444 in org.eclipse.osgi 3.11.0.v20160504-1419.
With the following breakpoint condition:
System.out.println("CNFE: " + name);
return false;
Starting my runtime (with 995 plugins, including Sirius, EMF Compare and Papyrus), I get:
4761 CNFE: org.eclipse.sirius.viewpoint.plugin_fr_FR
4761 CNFE: org.eclipse.sirius.viewpoint.plugin_fr
4761 CNFE: org.eclipse.sirius.viewpoint.plugin
4761 CNFE: org.eclipse.sirius.common.tools.plugin_fr_FR
4761 CNFE: org.eclipse.sirius.common.tools.plugin_fr
4761 CNFE: org.eclipse.sirius.common.tools.plugin
Which totals 28566 Sirius-related CNFEs.
New Gerrit change created: https://git.eclipse.org/r/73798 (In reply to Eclipse Genie from comment #5) > New Gerrit change created: https://git.eclipse.org/r/73798 I reproduce the message from the commite above for convenience, as I put a detailed analysis in it: Avoid static access to Messages classes (which require the bundle to be activated to initialized properly) from the activator classes themselves. This caused a situation where to initialize some Sirius class (for example one registers on a platform extension point): 1. OSGi instanciates the corresponding Sirius activator class (for example DslCommonPlugin$Implementation). To do this it needs to initialize its static fields/execute its static blocks. 2. If such a static initialization accesses the bundle's Messages class, Messages itself needs to be loaded and initialized first. 3. The (static) initialization of the Messages class ends-up calling EMF's DelegatingResourceLocator.getString(). getString() calls getPrimaryResourceLocator(), which calls getPluginResourceLocator(), which returns the value of the "plugin" static field. It is null at this point because we are still in the process of creating the instance of DslCommonPlugin$Implementation (step 1 above) that will be its value later. 4. Seeing no ResourceLocator, EMF's DelegatingResourceLocator assumes we're in a non-OSGi context and falls back to a costly strategy of invoking Java's ResourceResourceBundle.getBundle(packageName + ".plugin"), which will ask all the accessible ClasLoaders for the named resource. 5. Of course they will all fail, throwing a MissingResourceException. DelegatingResourceLocator tries a last-chance method of looking for a "plugin.properties" file at the base URL of the plug-in. As it happens, this matches the location of our properties, so in the end we get the correct value. Note that here EMF hard-codes the name and path of the properties file. If we had put them somewhere else (and mentioned it in the MANIFEST.MF's Bundle-Localization header), this heuristic would not find them (but OSGi/Equinox would). For now two cases have been identified, as mentioned in http://eclip.se/494611#c3, both treated in this patch: - SiriusPlugin.INTER_REGISTRY: a private static final field, which is not non-static and non-final, initialized only in the activator's start() method. - DslCommonPlugin.PROFILER: creates an instant of TimeProfiler2 which ends-up accessing Messages to get some of its strings. This one is public, so instead of breaking the API this late, we simply inline the string's default values for the few messages concerned. This means they are no longer translatable, but has very little impact in practice for name of columns or error messages related to the Sirius profiler. What's left to do: - Identify other possible cases in other activators, and treat them if possible. - Understand why on most systems the thousands of CNFEs thrown on Sirius startup do not cause a significant slowdown, but on at least one (the reporter's), the slowdown is catatrophic. After reviewing all our activators, most are trivial and safe: - org.eclipse.sirius.common.acceleo.aql.business.AQLSiriusPlugin.Implementation - org.eclipse.sirius.common.acceleo.aql.ide.AqlIdeSupportPlugin.Implementation - org.eclipse.sirius.common.acceleo.interpreter.InterpreterViewPlugin.Implementation - org.eclipse.sirius.common.acceleo.interpreter.InterpreterViewPlugin.Implementation - org.eclipse.sirius.common.acceleo.mtl.ide.AcceleoMTLInterpreterIDEPlugin.Implementation - org.eclipse.sirius.common.ocl.DslOclPlugin.Implementation - org.eclipse.sirius.common.ui.ext.Activator - org.eclipse.sirius.common.xtext.internal.XtextIntegrationPlugin.Implementation - org.eclipse.sirius.diagram.DiagramPlugin.Implementation - org.eclipse.sirius.diagram.layoutdata.LayoutDataPlugin.Implementation - org.eclipse.sirius.diagram.sequence.SequenceDiagramPlugin.Implementation - org.eclipse.sirius.diagram.sequence.description.provider.SequenceEditPlugin.Implementation - org.eclipse.sirius.diagram.sequence.ui.SequenceDiagramUIPlugin.Implementation - org.eclipse.sirius.diagram.ui.ext.Activator - org.eclipse.sirius.ecore.extender.business.internal.ExtenderPlugin.Implementation - org.eclipse.sirius.editor.tree.Activator - org.eclipse.sirius.eef.adapters.EEFAdapterPlugin.Implementation - org.eclipse.sirius.ext.base.BasePlugin.Implementation - org.eclipse.sirius.ext.draw2d.Draw2DExtPlugin.Implementation - org.eclipse.sirius.ext.e3.ui.Eclipse3UIExtPlugin.Implementation - org.eclipse.sirius.ext.emf.EMFExtPlugin.Implementation - org.eclipse.sirius.synchronizer.internal.SynchronizerPlugin.Implementation - org.eclipse.sirius.table.tools.internal.TablePlugin.Implementation - org.eclipse.sirius.table.ui.ext.TableUIExtPlugin.Implementation - org.eclipse.sirius.tree.tools.internal.TreePlugin.Implementation - org.eclipse.sirius.tree.ui.ext.TreeUIExtPlugin.Implementation - org.eclipse.sirius.tree.ui.provider.TreeUIPlugin.Implementation - org.eclipse.sirius.ui.ext.SiriusUIExtPlugin.Implementation These contain some real code, but seem safe too: - org.eclipse.sirius.common.ui.SiriusTransPlugin.Implementation - org.eclipse.sirius.editor.editorPlugin.SiriusEditorPlugin.Implementation - org.eclipse.sirius.table.metamodel.table.provider.TableUIPlugin.Implementation - org.eclipse.sirius.viewpoint.provider.SiriusEditPlugin.Implementation This one needs more review to be completely sure: - org.eclipse.sirius.diagram.ui.provider.DiagramUIPlugin.Implementation For this one I have a small patch (moving a field initialization inside the constructor, after the "plugin" field has been set): - org.eclipse.sirius.common.acceleo.mtl.AcceleoMTLInterpreterPlugin.Implementation Which leaves the two already identified as dangerous, but for which we have an initial patch: - org.eclipse.sirius.common.tools.DslCommonPlugin.Implementation - org.eclipse.sirius.viewpoint.SiriusPlugin.Implementation Steps to reproduce: 1. Put a breakpoint in org.eclipse.emf.common.util.DelegatingResourceLocator.doGetString(String, boolean), on the line which does: InputStream inputStream = new URL(getBaseURL().toString() + "plugin.properties").openStream(); In EMF 2.12 (Neon version), it is line 425. It corresponds to the fallback case when the ResourceBundle.getBundle() above failed with a MissingResourceException (typically after trying -- and failing -- to load the resource with all the classloaders available. 2. Launch a runtime in debug, and perform some actions which require loading Sirius (New Modeling Project, open a session/diagram, etc.). If the breakpoint is hit at any point (with Sirius classes in the context), the bug is still present. Gerrit change https://git.eclipse.org/r/73798 was merged to [master]. Commit: http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=9f8ca427badbaf2892a3e0e51a64deb0fe44828b Fixed by 9f8ca427badbaf2892a3e0e51a64deb0fe44828b. Available in Sirius 4.0.0. FYI, I found that thrown CNFE are thrown and slow down Property view openiong when you generate multiple ecores in a same plugin: the reason is that some of the XXXEditPlugin aren't considered as started (as they're not referenced as activators in the Manifest) so they've to be restarted and reinstantiated each time the framework asks for an attribute. In this case, the Programmer should implement a common Activator and reference it in the constructor of every Inherited/reusing EditPlugin class constructor |
Created attachment 262018 [details] Yourkit 2013 trace The startup time for Sirius can be extremely long in some environments, and it freezes the user's eclipse for the duration. In an Eclipse Luna with 883 plugins, starting up Sirius ("doing something that will make use of a class from Sirius" which in my case is launching a comparison on a plain ecore file, but with code that tests against ViewpointPackage.eINSTANCE) takes 30 seconds. Attached is the yourkit 2013 snapshot of the freeze duration. What's interesting in there is that 28000 ClassNotFoundExceptions have been thrown during that time frame. The problem seems to stem from Sirius' internationalization, as the main culprit for all these exceptions is EMFPlugin.doGetString(...) that's called to initialize the ResourceBundles. Modifying the code of that method to first try and initialize the bundles from "plugin.properties" instead of <activator path>/.plugin seems to solve the issue as the action that was taking 82 seconds and froze my eclipse no longer takes any noticeable time after the change. Given the yourkit trace, it would seem like the freezing time will increase with the number of bundles available in the platform, since it seems like a "<bundle class loader>.loadClass(...)" that cannot work will be done for every single bundle every time a message class has to be initialized.