Bug 332549 - Slow adding of big file to project
Summary: Slow adding of big file to project
Status: CLOSED FIXED
Alias: None
Product: TMF
Classification: Modeling
Component: Xtext
Version: 2.0.0
Hardware: PC Linux
Importance: P3 normal
Target Milestone: M7
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-14 12:12 EST by Mark Christiaens CLA
Modified: 2017-09-19 17:27 EDT (History)
2 users

See Also:
sven.efftinge: indigo+


Attachments

Description Mark Christiaens CLA 2010-12-14 12:12:54 EST
Build Identifier: I20101028-1441

I'm working with today's nightly build.  I'm adding a >2 MiB VHDL file to an existing project by dragging and dropping it onto the project and asking Eclipse to link it in.  A lot of time seems to be spent in unexpected places.  The profiler reports about 52 seconds spent opening the file.  Among other things, I see that the PolymorphicDispatcher is throwing a lot of NoSuchMethodExceptions.  The top of the Java stack looks like this:

Name	Time (ms)
java.lang.Throwable.fillInStackTrace()	13738
java.lang.Throwable.<init>()	
java.lang.Exception.<init>()	
java.lang.NoSuchMethodException.<init>()	
org.eclipse.xtext.util.PolymorphicDispatcher$NoSuchMethodException.<init>(Predicate, Object[])	
org.eclipse.xtext.util.PolymorphicDispatcher.handleNoSuchMethod(Object[])	
org.eclipse.xtext.util.PolymorphicDispatcher.invoke(Object[])	
org.eclipse.xtext.scoping.impl.AbstractDeclarativeScopeProvider.polymorphicFindScopeForReferenceName(EObject, EReference)	8333
org.eclipse.xtext.scoping.impl.AbstractDeclarativeScopeProvider.getScope(EObject, EReference)	
com.sigasi.xtext.scoping.VHDLLangScopeProvider.getScope(EObject, EReference)	
...

Is this to be expected?  I would have assumed that the AbstractDeclarativeScopeProvider only invokes member functions it has found to exist via reflective inspection?

Another thing that's worrisome is that a lot of time (20s) is spent in sun.misc.Unsafe.compareAndSwapInt.  It seems that org.eclipse.xtext.ui.containers.AbstractAllContainersState.getContainedURIs(String), org.eclipse.xtext.util.SimpleCache.get(Object), org.eclipse.xtext.util.OnChangeEvictingCache$CacheAdapter.set(Object, Object) ... are locking excessively.  It sounds like the locking strategy needs to be made more coarse-grained or more decoupled.

Reproducible: Always
Comment 1 Sebastian Zarnekow CLA 2010-12-14 17:40:01 EST
The implementation of the simple cache should be replaced by a MapMaker.weakKeys.computingMap with a reasonable concurrency level.
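MapMaker.weakKeys().makeComputingMap comes from Google Guava; as a stdlib-only sketch of the same idea (a concurrent map that computes missing values on demand, with per-bin locking instead of one cache-wide lock), ConcurrentHashMap.computeIfAbsent can stand in. The class and method names below are illustrative, not Xtext's actual SimpleCache API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative stand-in for a SimpleCache replacement: each key's value is
// computed at most once, and concurrent readers of different keys do not
// contend on a single lock.
class ComputingCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> computer;

    ComputingCache(Function<K, V> computer) {
        this.computer = computer;
    }

    V get(K key) {
        // computeIfAbsent locks only the bin holding this key, unlike a
        // synchronized get-or-compute over the whole cache.
        return map.computeIfAbsent(key, computer);
    }
}
```

(Guava's MapMaker additionally offers weak keys and an explicit concurrencyLevel, which this stdlib sketch does not reproduce.)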

The NoSuchMethodException is internal to the PolymorphicDispatcher as it is used by the ScopeProvider. However, we should either avoid populating the stack trace or add an existence check prior to invoking, rather than invoking a method that does not exist just to "learn" that it does not exist...
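The cost under discussion comes from Throwable.fillInStackTrace, which runs in every exception constructor, so a dispatcher that probes for methods by catching NoSuchMethodException pays that cost on every miss. A hedged sketch of the "check before invoking" alternative (names are hypothetical, not the actual PolymorphicDispatcher code):

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Sketch: scan declared methods to test existence, instead of letting
// Class.getDeclaredMethod throw a stack-filled NoSuchMethodException
// on every probe for a method that is not there.
class DispatchProbe {
    static Method find(Class<?> type, String name, Class<?>... params) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name)
                    && Arrays.equals(m.getParameterTypes(), params)) {
                return m;
            }
        }
        return null; // a miss costs a scan, not an exception construction
    }
}
```

Another common fix for the stack-trace cost is an internal exception subclass that overrides fillInStackTrace() to return this, skipping the expensive stack walk entirely.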

Btw: can you say something about memory footprint and actual parsing time for the large library file (compared to Xtext 1.0)?
Comment 2 Mark Christiaens CLA 2010-12-15 03:45:07 EST
(In reply to comment #1)
> The implementation of the simple cache should be replaced by a
> MapMaker.weakKeys.computingMap with a reasonable concurrency level.
> 
> The NoSuchMethodException is internal to the PolymorphicDispatch as it is used
> by the ScopeProvider. However, we should avoid population of the stacktrace or
> add a check prior to invoking a method that does not exist just to "learn" that
> it does not exist...
> 
> Btw: can you say something about memory footprint and actual parsing time for
> the large library file (compared to Xtext 1.0)?

The use of a computingMap sounds like an interesting try.

Indeed, I also think it's very costly to "learn" through NoSuchMethodExceptions.  Was this approach also present in the 1.0.1 version?

I'm not yet ready to say much about the memory footprint.  I notice that I said above that these profiling results are from "opening" the file but actually they are from simply _adding_ the file to the project and not yet opening it.  It seems to take longer than I remember from the 1.0.1 implementation.  That's why I think something has changed and may be doing the wrong thing.  

The real problem I'm working up to is that I can't actually open the file.  Xtext goes into a very long loop for a couple of minutes (at which point I gave up) without being responsive anymore.  (And it's not caused by too little memory.)  I decided to first investigate simply adding the file and then work my way up to opening it.  Opening the file on my Xtext 1.0.1 implementation took 12 seconds.  Once I've got some useful profiling results about opening the file, I'll create another bug report.  When I can open the file, I can meaningfully compare memory consumption.
Comment 3 Sebastian Zarnekow CLA 2010-12-15 03:53:41 EST
The builder loads the file when it is added to the project. The latest nightly has a problem with imported namespaces when it tries to resolve cross-references (which the builder will try to do). This may cause the performance problem.
You could try to add the file to a project without the Xtext nature and open it afterwards.

The polymorphic dispatcher and its usage in the declarative scoping was the same in Xtext 1.0.1.
Comment 4 Mark Christiaens CLA 2010-12-16 12:06:21 EST
I've been digging deeper into why my big VHDL-file can't be opened (I'm talking now about really opening the file, not just adding it to the project).  I think I'm starting to see what might be the cause.  This profile dump gives a hint:

Name	Time (ms)	Invocation Count
...

org.eclipse.jface.text.TextViewer.fireInputDocumentChanged(IDocument, IDocument)	74384	1
org.eclipse.jface.text.presentation.PresentationReconciler$InternalListener.inputDocumentChanged(IDocument, IDocument)	74384	1
org.eclipse.jface.text.presentation.PresentationReconciler.access$3(PresentationReconciler, IRegion, IDocument)	74384	1
org.eclipse.jface.text.presentation.PresentationReconciler.processDamage(IRegion, IDocument)	74384	1
org.eclipse.jface.text.presentation.PresentationReconciler.createPresentation(IRegion, IDocument)	74384	1
org.eclipse.jface.text.rules.DefaultDamagerRepairer.createPresentation(TextPresentation, ITypedRegion)	281553	290
org.eclipse.xtext.ui.editor.syntaxcoloring.TokenScanner.nextToken()	282248	47515
com.google.common.collect.AbstractIterator.hasNext()	282220	47515
com.google.common.collect.AbstractIterator.tryToComputeNext()	282220	47515
com.google.common.collect.Iterators$7.computeNext()	282219	47515
com.google.common.collect.AbstractIterator.hasNext()	144583	158054075
com.google.common.collect.AbstractIterator.tryToComputeNext()	111566	158054075
org.eclipse.xtext.ui.editor.model.DocumentTokenSource$IRegionIterable$1.computeNext()	99832	158054075
org.eclipse.xtext.ui.editor.model.DocumentTokenSource$IRegionIterable$1.computeNext()	90674	158054075
java.util.Collections$UnmodifiableCollection$1.next()	25800	158053786
java.util.Collections$UnmodifiableCollection$1.hasNext()	12220	158054075
org.eclipse.xtext.ui.editor.model.DocumentTokenSource$TokenAdapter.<init>(DocumentTokenSource$TokenInfo, int)	9211	158053785
com.google.common.base.Preconditions.checkState(boolean)	8531	158054075
org.eclipse.xtext.ui.editor.model.Regions$1.apply(Object)	89160	158053785
com.google.common.collect.AbstractIterator.next()	42005	158053785

This is a part of the execution profile of the main thread.  The thing to notice is that it's iterating like crazy over the tokens of the file.  I think I understand why.
- The DefaultDamagerRepairer is presented with ITypedRegions that are small chunks of the 2 MB VHDL file.  I looked at the first ones and the regions' sizes are on the order of hundreds of bytes to a kilobyte.
- In response, it creates a TokenScanner.
- That TokenScanner is built on a list of _all_ tokens of the entire file, with a filter on top.  See TokenScanner:
	public void setRange(IDocument document, final int offset, final int length) {
		currentToken=null;
		iterator = Iterables.filter(getTokens(document),Regions.overlaps(offset, length)).iterator();
	}
- When the first token is requested from the TokenScanner, it will iterate through the underlying token list up to the first token in the region (based on the filter). 

So that means that nearly all of the underlying tokens are fetched for each region (the closer the region is to the end of the file, the more tokens must be fetched).  I suspect that the number of regions in a file is proportional to the file's size.  Therefore, this approach to damage repairing is O(n^2).  Very painful for such a big file.
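The quadratic behavior described above can be sketched in isolation: filtering the full token list per region visits O(n) tokens for each region, while jumping directly to the first overlapping token visits only what the region covers. The fixed-width token model below is a simplified stand-in, not the actual DocumentTokenSource types:

```java
// Simplified model: nTokens fixed-width tokens, scanned once per region.
class TokenScanDemo {
    // Linear filter (as in TokenScanner.setRange): counts tokens visited
    // from the start of the list until the region ending at offset+length
    // is passed.
    static long visitedLinear(int nTokens, int width, int offset, int length) {
        long visited = 0;
        for (int t = 0; t < nTokens; t++) {
            visited++;
            if (t * width >= offset + length) break; // past the region
        }
        return visited;
    }

    // Indexed variant: compute the first and last overlapping token directly
    // (O(1) here; binary search for variable-width tokens).
    static long visitedIndexed(int nTokens, int width, int offset, int length) {
        int first = offset / width;
        int last = Math.min(nTokens - 1, (offset + length - 1) / width);
        return last - first + 1;
    }

    // Total tokens visited when the whole file is repaired region by region.
    static long totalLinear(int nTokens, int width, int regionLen) {
        long total = 0;
        for (int off = 0; off < nTokens * width; off += regionLen) {
            total += visitedLinear(nTokens, width, off, regionLen);
        }
        return total;
    }
}
```

With 1,000 fixed-width tokens split into 100 regions, the linear variant visits roughly 50,000 tokens in total while the indexed variant visits only the 10 tokens each region covers — the same blow-up that the ~158 million computeNext() calls in the profile reflect.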
Comment 5 Sebastian Zarnekow CLA 2011-01-30 09:16:59 EST
Postponed
Comment 6 Sebastian Zarnekow CLA 2011-04-18 08:47:13 EDT
I took a look at the performance of parsing, linking, and opening in the editor with a simplified version of the domain model example (without Xbase).

Parsing and resolving links of a 16MB file took about 35 seconds where parsing took 7 secs and linking 28 secs. 

Other test results (total | parse | link, in seconds):

16 MB  - 35  | 7   | 28
5.5 MB - 13  | 2.5 | 10.5
2.2 MB - 5.5 | 1   | 4.5

All these numbers look reasonable to me.

It took around 4 secs to open the 2MB file in an editor. Editing at the end of the file was a little sluggish, but I don't expect files with 30,000 lines to be edited manually very often.
Comment 7 Sebastian Zarnekow CLA 2011-04-18 09:34:28 EDT
Pushed to master.
Comment 8 Karsten Thoms CLA 2017-09-19 17:16:21 EDT
Closing all bugs that were set to RESOLVED before Neon.0
Comment 9 Karsten Thoms CLA 2017-09-19 17:27:47 EDT
Closing all bugs that were set to RESOLVED before Neon.0