Bug 460393

Summary: Infinite loop in ResolverImpl.resolve
Product: [Eclipse Project] Equinox
Component: Framework
Reporter: Marc-André Laperle <malaperle>
Assignee: Thomas Watson <tjwatson>
Status: RESOLVED FIXED
QA Contact:
Severity: major
Priority: P3
CC: cvgaviao, eclipse.sprigogin, matthieu.helleboid, slewis, tjwatson
Version: 4.5.0 Mars
Target Milestone: Mars M7
Hardware: PC
OS: Linux
Whiteboard:
Attachments: Test target file (flags: none)

Description Marc-André Laperle CLA 2015-02-20 01:27:44 EST
Created attachment 250954
Test target file

1. Open the attached test.target file in the Target Editor and set it as the target platform
2. Run as an Eclipse Application (with all plugins enabled)
3. The runtime Eclipse doesn't start and enters an infinite loop

This did not happen in 4.5M4 (201412102000) but happens in the closest later build available (4.5-I20150129-1830). If I remove the Jetty feature from the target, it works (as it does if I just deselect the old Jetty plugins in the launch configuration). It seems that Jetty causes an error during resolution and the loop doesn't handle that gracefully. I also found that reverting the recent commits for bug 457118 makes it work again. Thankfully, I can't seem to reproduce the issue with 4.4.2RC4. Here is what the stack looks like:

ResolverImpl.resolve(ResolveContext) line: 266	
ModuleResolver$ResolveProcess.resolveRevisions(List<Resource>, boolean, ResolveLogger, Map<Resource,List<Wire>>) line: 984	
ModuleResolver$ResolveProcess.resolveRevisionsInBatch(Collection<ModuleRevision>, boolean, ResolveLogger, Map<Resource,List<Wire>>) line: 964	
ModuleResolver$ResolveProcess.resolve() line: 892	
ModuleResolver.resolveDelta(Collection<ModuleRevision>, boolean, Collection<ModuleRevision>, Map<ModuleRevision,ModuleWiring>, ModuleDatabase) line: 126	
ModuleContainer.resolveAndApply(Collection<Module>, boolean, boolean) line: 479	
ModuleContainer.resolve(Collection<Module>, boolean, boolean) line: 437	
ModuleContainer.refresh(Collection<Module>) line: 955	
ModuleContainer$ContainerWiring.dispatchEvent(ContainerWiring, FrameworkListener[], int, Collection<Module>) line: 1336	
ModuleContainer$ContainerWiring.dispatchEvent(Object, Object, int, Object) line: 1	
EventManager.dispatchEvent(Set<Entry<K,V>>, EventDispatcher<K,V,E>, int, E) line: 230	
EventManager$EventThread<K,V,E>.run() line: 340
Comment 1 Matthieu Helleboid CLA 2015-02-20 07:49:04 EST
There were two main modifications in bug #457118:

1- 
http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=e7db81bab4bce237fcafd3d624e56d183bbd6dae 
backported for Luna SR2 in bug #457718: http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=e7db81bab4bce237fcafd3d624e56d183bbd6dae

2-
http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=16bb483bd75d665c3ff6a554419ddf8ad93bab57

So if you cannot reproduce this bug on Luna 4.4.2RC4, maybe only the second commit is causing this. Can you test that?
Comment 2 Thomas Watson CLA 2015-02-20 08:44:45 EST
Thanks for testing on both 4.4.2 and 4.5 (Mars).  I will investigate whether the later commit is the cause, but it could be a combination of the two.
Comment 3 Thomas Watson CLA 2015-02-20 09:58:29 EST
Setting configuration option equinox.resolver.revision.batch.size=1 gets around the issue.
Comment 4 Thomas Watson CLA 2015-02-20 10:00:35 EST
(In reply to Thomas Watson from comment #3)
> Setting configuration option equinox.resolver.revision.batch.size=1 gets
> around the issue.

I forgot to mention that this does seem to indicate that batching the bundles to resolve triggers the issue.  However, I don't think the batching code itself is at fault; it looks more like an issue in the Felix resolver.  I need to track that down.
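
As an illustration of what batching means here, the following is a minimal sketch and not the Equinox implementation. Only the property name equinox.resolver.revision.batch.size comes from this report; the class BatchSketch, its methods, and the fallback value are hypothetical. It reads a batch size from a configuration map and splits a list of bundles into consecutive batches, so a size of 1 means every bundle is handed to the resolver on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchSketch {
    // Property name taken from this report; everything else in this class is hypothetical.
    public static final String BATCH_SIZE_KEY = "equinox.resolver.revision.batch.size";

    // Parse the configured batch size, falling back when the value is missing or invalid.
    public static int batchSize(Map<String, String> config, int fallback) {
        String raw = config.get(BATCH_SIZE_KEY);
        if (raw == null) {
            return fallback;
        }
        try {
            int parsed = Integer.parseInt(raw.trim());
            return parsed > 0 ? parsed : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Split the items into consecutive batches of at most `size` elements.
    public static <T> List<List<T>> batches(List<T> items, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("batch size must be at least 1");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> bundles = List.of("a", "b", "c", "d", "e");
        // Batch size 1: prints [[a], [b], [c], [d], [e]]
        System.out.println(batches(bundles, batchSize(Map.of(BATCH_SIZE_KEY, "1"), 3)));
        // No setting, hypothetical fallback of 3: prints [[a, b, c], [d, e]]
        System.out.println(batches(bundles, batchSize(Map.of(), 3)));
    }
}
```

How the real framework reads the option and forms its batches is not shown in this report; the sketch only makes the effect of the setting concrete.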
Comment 5 Thomas Watson CLA 2015-02-20 10:03:53 EST
Sorry for the blast of comments.  I notice your target includes both servlet 3.0 and 3.1, but you are using the Jetty 8 repo, which I think only supports servlet 3.0.  We should not get an infinite loop as a result, but something seems off about this target.
Comment 6 Marc-André Laperle CLA 2015-02-20 10:09:28 EST
(In reply to Thomas Watson from comment #5)
> Sorry for the blast of comments.  I notice your target includes both servlet
> 3.0 and 3.1.  But you are using jetty 8 repo which only supports servlet 3.0
> I think.  We should not have an infinite loop as a result, but something seems
> off about this target.

Yes, the target is not good and we fixed it in our master branch (CDT). But it used to work OK in previous versions and I'm concerned this could happen in other scenarios.
Comment 7 Thomas Watson CLA 2015-02-20 11:46:22 EST
At this point I may have to default the batch size to 1 until we can get a handle on the cause.  It appears the multiple versions of the javax.servlet package are causing an explosion in the permutations to try when the batch size gets too large.

equinox.resolver.revision.batch.size=38 seems to be the magic number in this case that causes the 'infinite loop'.  I'm not really sure it is infinite at this point, or whether the number of possible solutions it is trying is just so large that it appears infinite.  There could be a bug in the Felix resolver that causes it to try the 'same' solution over and over; in that case this would be an infinite loop.
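
To give a feel for why a large search space can look like an infinite loop, here is a rough worked example under a deliberately simplified model. It assumes every requirement in a batch can be wired independently to one of two candidate providers (for instance two javax.servlet versions). That is an assumption made for illustration only; the real resolver prunes and backtracks, and the class PermutationGrowth is hypothetical.

```java
import java.math.BigInteger;

public class PermutationGrowth {
    // Upper bound on candidate combinations when each of `requirements`
    // independent requirements can be wired to any of `candidates` providers.
    public static BigInteger combinations(int candidates, int requirements) {
        return BigInteger.valueOf(candidates).pow(requirements);
    }

    public static void main(String[] args) {
        // 38 is reused from the batch size mentioned above purely as a sample value.
        for (int n : new int[] {1, 10, 20, 38}) {
            System.out.println(n + " requirements x 2 candidates -> " + combinations(2, n));
        }
        // The last line reports 274877906944 combinations for n = 38.
    }
}
```

Under this model each extra requirement doubles the worst case, which is consistent with a resolve that finishes quickly at a small batch size and appears to hang at a larger one.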
Comment 8 Thomas Watson CLA 2015-02-26 10:49:02 EST
For now I changed the default batch size to 1:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=025ca5529ced196ed37091198976820e63180e1d
Comment 9 Thomas Watson CLA 2015-04-07 16:31:30 EDT
There have been some major changes to the Felix resolver to fix some performance issues (https://issues.apache.org/jira/browse/FELIX-4656).

These are pretty involved changes that I think are risky to make in M7.  Initial testing shows that the changes bring the resolve time for this problematic scenario down to 200 seconds.  Before, it would never complete and would end in an out-of-memory error.

At this point I think we need to keep the batch setting at 1 for Mars.  I opened bug 464084 to track updating the resolver implementation for Neon.

Marking this as fixed since setting the default batch size to 1 has worked around the issue for now.
Comment 10 Thomas Watson CLA 2015-04-17 17:05:02 EDT
FYI, more discussions in Felix with respect to the resolver optimizations.  I spent more time reviewing the changes and they are significant, but they make a pretty important optimization that prevents the same solution set from being processed over and over.

After more testing and more review I decided to include the changes in M7 with bug 464084.
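
The idea of not processing the same solution set over and over can be sketched as follows. This is not the Felix resolver code; the class SeenPermutations and the string-list representation of a candidate assignment are hypothetical. The sketch keeps a set of assignments already handled and skips any repeat before doing the expensive work.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeenPermutations {
    // Takes candidate assignments from a work queue and skips any assignment
    // that was already handled. Returns how many were actually processed.
    public static int processUnique(List<List<String>> candidates) {
        Set<List<String>> seen = new HashSet<>();
        Deque<List<String>> work = new ArrayDeque<>(candidates);
        int processed = 0;
        while (!work.isEmpty()) {
            List<String> next = work.poll();
            if (!seen.add(next)) {
                continue; // same assignment as before, do not check it again
            }
            processed++; // stand-in for the expensive consistency check
        }
        return processed;
    }

    public static void main(String[] args) {
        List<List<String>> candidates = List.of(
                List.of("servlet-3.0", "jetty-8"),
                List.of("servlet-3.1", "jetty-8"),
                List.of("servlet-3.0", "jetty-8")); // duplicate of the first entry
        System.out.println(processUnique(candidates)); // prints 2
    }
}
```

The set lookup relies on List value equality, so two assignments with the same entries in the same order count as one.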