Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 494611 - Slow startup due to thousands of CNFE thrown for Sirius classes
Summary: Slow startup due to thousands of CNFE thrown for Sirius classes
Status: CLOSED FIXED
Alias: None
Product: Sirius
Classification: Modeling
Component: Core (show other bugs)
Version: unspecified   Edit
Hardware: All All
: P3 critical (vote)
Target Milestone: 4.0.0   Edit
Assignee: Pierre-Charles David CLA
QA Contact:
URL:
Whiteboard:
Keywords: triaged
Depends on:
Blocks:
 
Reported: 2016-05-26 04:27 EDT by Laurent Goubet CLA
Modified: 2017-06-05 04:51 EDT (History)
3 users (show)

See Also:


Attachments
Yourkit 2013 trace (9.44 MB, application/octet-stream)
2016-05-26 04:27 EDT, Laurent Goubet CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Laurent Goubet CLA 2016-05-26 04:27:54 EDT
Created attachment 262018 [details]
Yourkit 2013 trace

The startup time for Sirius can be extremely long in some environments, and it freezes the user's eclipse for the duration.

In an Eclipse Luna with 883 plugins, starting up Sirius ("doing something that will make use of a class from Sirius" which in my case is launching a comparison on a plain ecore file, but with code that tests against ViewpointPackage.eINSTANCE) takes 30 seconds.

Attached is the yourkit 2013 snapshot of the freeze duration. What's interesting in there is that 28000 ClassNotFoundExceptions have been thrown during that time frame.

The problem seems to stem from Sirius' internationalization, as the main culprit for all these exceptions is EMFPlugin.doGetString(...) that's called to initialize the ResourceBundles. Modifying the code of that method to first try and initialize the bundles from "plugin.properties" instead of <activator path>/.plugin seems to solve the issue as the action that was taking 82 seconds and froze my eclipse no longer takes any noticeable time after the change.

Given the yourkit trace, it would seem like the freezing time will increase with the number of bundles available in the platform, since it seems like a "<bundle class loader>.loadClass(...)" that cannot work will be done for every single bundle every time a message class has to be initialized.
Comment 1 Eclipse Genie CLA 2016-05-26 04:32:56 EDT
New Gerrit change created: https://git.eclipse.org/r/73693
Comment 2 Laurent Goubet CLA 2016-05-26 04:33:37 EDT
Quite ugly, but the associated gerrit shows the kind of change it took to get back to acceptable startup times on my environment.
Comment 3 Laurent Goubet CLA 2016-05-26 08:05:56 EDT
The root cause of this issue is that we're initializing the Messages class _before_ the "plugin" singleton instance of our activators is instantiated.

Two things will end up causing that issue :

- SiriusPlugin.INTER_REGISTRY : inlined instantation on a static field. It will try and instantiate an InterpreterRegistry, but InterpreterRegistry.ERROR_MSG_IMPOSSIBLE_TO_FIND_AN_INTERPRETER is also a static field with an inlined instantiation that will trigger the messages class initialization. At that point, we are _currently constructing_ the SiriusPlugin instance, so it is still not started, and SiriusPlugin.plugin is still null.
- DslCommonPlugin.PROFILER : same issue as above.

It is a problem for the plugin not to be instantiated yet because the Messages initialization will go through org.eclipse.emf.common.util.DelegatingResourceLocator.getString(String, boolean) , which makes use of the ResourceLocator (the plugin)... or has to resort to finding and parsing the properties file itself.

SiriusPlugin.INTER_REGISTRY is easy to move down to SiriusPlugin.Implementation and make "non static" : it can be instantiated from Implementation.start() and is only used from the non-static method SiriusPlugin.Implementation.getInterpreterRegistry()

DslCommonPlugin.PROFILER requires API breakages to be isolated from the plugin construction.
Comment 4 Pierre-Charles David CLA 2016-05-27 05:26:42 EDT
After many attempts, I can reproduce part of the problem: indeed, I get ~28000 ClassNotFoundException on startup on Sirius-related classes. What's strange is that on my machine the slowdown this causes is very minor. Probably on the order of 3s, small enough not to notice it among the 14s of total time for Eclipse startup. I'm under Linux, while Laurent is under Windows 10 and has an anti-virus installed (could it cause slowdowns when accessing many JARs?).

To reproduce I put a conditional breakpoing in the last line of org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(String, boolean):

  throw new ClassNotFoundException(name + " cannot be found by " + this); //$NON-NLS-1$

This is line 444 in org.eclipse.osgi 3.11.0.v20160504-1419.

With the following breakpoint condition:

  System.out.println("CNFE: " + name);
  return false;

Starting my runtime (with 995 plugins, including Sirius, EMF Compare and Papyrus), I get:

   4761 CNFE: org.eclipse.sirius.viewpoint.plugin_fr_FR
   4761 CNFE: org.eclipse.sirius.viewpoint.plugin_fr
   4761 CNFE: org.eclipse.sirius.viewpoint.plugin
   4761 CNFE: org.eclipse.sirius.common.tools.plugin_fr_FR
   4761 CNFE: org.eclipse.sirius.common.tools.plugin_fr
   4761 CNFE: org.eclipse.sirius.common.tools.plugin

Which totals 28566 Sirius-related CNFEs.
Comment 5 Eclipse Genie CLA 2016-05-27 07:20:53 EDT
New Gerrit change created: https://git.eclipse.org/r/73798
Comment 6 Pierre-Charles David CLA 2016-05-27 07:44:41 EDT
(In reply to Eclipse Genie from comment #5)
> New Gerrit change created: https://git.eclipse.org/r/73798

I reproduce the message from the commite above for convenience, as I put a detailed analysis in it:

Avoid static access to Messages classes (which require the bundle to be
activated to initialized properly) from the activator classes
themselves.

This caused a situation where to initialize some Sirius class (for
example one registers on a platform extension point):

1. OSGi instanciates the corresponding Sirius activator class (for
   example DslCommonPlugin$Implementation). To do this it needs to
   initialize its static fields/execute its static blocks.

2. If such a static initialization accesses the bundle's Messages class,
   Messages itself needs to be loaded and initialized first.

3. The (static) initialization of the Messages class ends-up calling
   EMF's DelegatingResourceLocator.getString(). getString() calls
   getPrimaryResourceLocator(), which calls getPluginResourceLocator(),
   which returns the value of the "plugin" static field. It is null at
   this point because we are still in the process of creating the
   instance of DslCommonPlugin$Implementation (step 1 above) that will
   be its value later.

4. Seeing no ResourceLocator, EMF's DelegatingResourceLocator assumes
   we're in a non-OSGi context and falls back to a costly strategy of
   invoking Java's ResourceResourceBundle.getBundle(packageName +
   ".plugin"), which will ask all the accessible ClasLoaders for the
   named resource.

5. Of course they will all fail, throwing a MissingResourceException.
   DelegatingResourceLocator tries a last-chance method of looking for a
   "plugin.properties" file at the base URL of the plug-in. As it
   happens, this matches the location of our properties, so in the end
   we get the correct value. Note that here EMF hard-codes the name and
   path of the properties file. If we had put them somewhere else (and
   mentioned it in the MANIFEST.MF's Bundle-Localization header), this
   heuristic would not find them (but OSGi/Equinox would).

For now two cases have been identified, as mentioned in
http://eclip.se/494611#c3, both treated in this patch:

- SiriusPlugin.INTER_REGISTRY: a private static final field, which is
  not non-static and non-final, initialized only in the activator's
  start() method.
- DslCommonPlugin.PROFILER: creates an instant of TimeProfiler2 which
  ends-up accessing Messages to get some of its strings. This one is
  public, so instead of breaking the API this late, we simply inline the
  string's default values for the few messages concerned. This means
  they are no longer translatable, but has very little impact in
  practice for name of columns or error messages related to the Sirius
  profiler.

What's left to do:
- Identify other possible cases in other activators, and treat them if
  possible.
- Understand why on most systems the thousands of CNFEs thrown on Sirius
  startup do not cause a significant slowdown, but on at least one (the
  reporter's), the slowdown is catatrophic.
Comment 7 Pierre-Charles David CLA 2016-05-27 09:19:28 EDT
After reviewing all our activators, most are trivial and safe:
- org.eclipse.sirius.common.acceleo.aql.business.AQLSiriusPlugin.Implementation
- org.eclipse.sirius.common.acceleo.aql.ide.AqlIdeSupportPlugin.Implementation
- org.eclipse.sirius.common.acceleo.interpreter.InterpreterViewPlugin.Implementation
- org.eclipse.sirius.common.acceleo.interpreter.InterpreterViewPlugin.Implementation
- org.eclipse.sirius.common.acceleo.mtl.ide.AcceleoMTLInterpreterIDEPlugin.Implementation
- org.eclipse.sirius.common.ocl.DslOclPlugin.Implementation
- org.eclipse.sirius.common.ui.ext.Activator
- org.eclipse.sirius.common.xtext.internal.XtextIntegrationPlugin.Implementation
- org.eclipse.sirius.diagram.DiagramPlugin.Implementation
- org.eclipse.sirius.diagram.layoutdata.LayoutDataPlugin.Implementation
- org.eclipse.sirius.diagram.sequence.SequenceDiagramPlugin.Implementation
- org.eclipse.sirius.diagram.sequence.description.provider.SequenceEditPlugin.Implementation
- org.eclipse.sirius.diagram.sequence.ui.SequenceDiagramUIPlugin.Implementation
- org.eclipse.sirius.diagram.ui.ext.Activator
- org.eclipse.sirius.ecore.extender.business.internal.ExtenderPlugin.Implementation
- org.eclipse.sirius.editor.tree.Activator
- org.eclipse.sirius.eef.adapters.EEFAdapterPlugin.Implementation
- org.eclipse.sirius.ext.base.BasePlugin.Implementation
- org.eclipse.sirius.ext.draw2d.Draw2DExtPlugin.Implementation
- org.eclipse.sirius.ext.e3.ui.Eclipse3UIExtPlugin.Implementation
- org.eclipse.sirius.ext.emf.EMFExtPlugin.Implementation
- org.eclipse.sirius.synchronizer.internal.SynchronizerPlugin.Implementation
- org.eclipse.sirius.table.tools.internal.TablePlugin.Implementation
- org.eclipse.sirius.table.ui.ext.TableUIExtPlugin.Implementation
- org.eclipse.sirius.tree.tools.internal.TreePlugin.Implementation
- org.eclipse.sirius.tree.ui.ext.TreeUIExtPlugin.Implementation
- org.eclipse.sirius.tree.ui.provider.TreeUIPlugin.Implementation
- org.eclipse.sirius.ui.ext.SiriusUIExtPlugin.Implementation

These contain some real code, but seem safe too:
- org.eclipse.sirius.common.ui.SiriusTransPlugin.Implementation
- org.eclipse.sirius.editor.editorPlugin.SiriusEditorPlugin.Implementation
- org.eclipse.sirius.table.metamodel.table.provider.TableUIPlugin.Implementation
- org.eclipse.sirius.viewpoint.provider.SiriusEditPlugin.Implementation

This one needs more review to be completely sure:
- org.eclipse.sirius.diagram.ui.provider.DiagramUIPlugin.Implementation

For this one I have a small patch (moving a field initialization inside the constructor, after the "plugin" field has been set):
- org.eclipse.sirius.common.acceleo.mtl.AcceleoMTLInterpreterPlugin.Implementation

Which leaves the two already identified as dangerous, but for which we have an initial patch:
- org.eclipse.sirius.common.tools.DslCommonPlugin.Implementation
- org.eclipse.sirius.viewpoint.SiriusPlugin.Implementation
Comment 8 Pierre-Charles David CLA 2016-05-27 10:25:46 EDT
Steps to reproduce:
1. Put a breakpoint in org.eclipse.emf.common.util.DelegatingResourceLocator.doGetString(String, boolean), on the line which does:

InputStream inputStream =  new URL(getBaseURL().toString() + "plugin.properties").openStream();

In EMF 2.12 (Neon version), it is line 425. It corresponds to the fallback case when the ResourceBundle.getBundle() above failed with a MissingResourceException (typically after trying -- and failing -- to load the resource with all the classloaders available.

2. Launch a runtime in debug, and perform some actions which require loading Sirius (New Modeling Project, open a session/diagram, etc.).

If the breakpoint is hit at any point (with Sirius classes in the context), the bug is still present.
Comment 10 Pierre-Charles David CLA 2016-05-30 04:03:10 EDT
Fixed by 9f8ca427badbaf2892a3e0e51a64deb0fe44828b.
Comment 11 Pierre-Charles David CLA 2016-06-24 07:59:09 EDT
Available in Sirius 4.0.0.
Comment 12 Charlie Mordant CLA 2017-06-05 04:51:33 EDT
FYI, I found that thrown CNFE are thrown and slow down Property view openiong when you generate multiple ecores in a same plugin: the reason is that some of the XXXEditPlugin aren't considered as started (as they're not referenced as activators in the Manifest) so they've to be restarted and reinstantiated each time the framework asks for an attribute.

In this case, the Programmer should implement a common Activator and reference it in the constructor of every Inherited/reusing EditPlugin class constructor