Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 367569

Summary: ResourceSetImpl#getResource optimization
Product: [Modeling] EMF Reporter: Vladimir Piskarev <vpiskarov>
Component: CoreAssignee: Ed Merks <Ed.Merks>
Status: CLOSED FIXED QA Contact:
Severity: enhancement    
Priority: P3 CC: bhunt, knut.wannheden, saulius.tvarijonas, stepper
Version: 2.8.0   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Bug Depends on:    
Bug Blocks: 372527    
Attachments:
Description Flags
Proposed patch
none
Revised patch
none
Synthetic test case
none
Patches to address the issue. none

Description Vladimir Piskarev CLA 2011-12-27 03:06:18 EST
Build Identifier: 2.8.0

Please see the proposed patch which has allowed to reduce batch-loading time of 5000 resources from 3 minutes to about 10 seconds in practical circumstances. The basic idea is to use normalized resource URIs as keys in URIResourceMap. This allows to avoid a full scan on each 'new' resource loading and provides about-linear scalability.

Reproducible: Always
Comment 1 Vladimir Piskarev CLA 2011-12-27 03:07:57 EST
Created attachment 208811 [details]
Proposed patch
Comment 2 Ed Merks CLA 2011-12-28 03:03:36 EST
This changes the behavior quite a bit.  Normalization has to be done before the map lookup, so resolving a huge number of proxies in a smaller resource set will tend to get more expensive, rather than less.

Could you provide a test case that demonstrates your performance numbers?  That way I can explore alternatives.  For example, we might hook into inverseAdd/inverseRemove to compute the map more eagerly. One problem with that approach though is that a resource's URI can be changed after the resource is added; sometimes that happens because clients create resources directly and add them manually.  Perhaps we could use an adapter on the resource to manage that problem as well, or hook directly into the resource implementation's setURI method...
Comment 3 Vladimir Piskarev CLA 2011-12-28 08:17:40 EST
Created attachment 208822 [details]
Revised patch

Please see the revised patch. I agree that the previous version was not very accurate - fixing one hot path, it could introduce another. What about storing normalized URI together with the given URI in the URIResourceMap? Since the two will be equal in many cases, it does not seem to introduce a huge overhead.

Unfortunately I can't share the code which was profiled since it's proprietary (and impractically large setup anyway), so I could try to make a synthetic test case, if it's still needed.
Comment 4 Ed Merks CLA 2011-12-28 09:33:44 EST
Yes, storing both in the map was another thing I was considering.  One could also eagerly populate the map during the loop, but then you'd be doing it each time, which seems pretty bad.  Eager population upon add would be better.

Even a synthetic test case would at least mirrors the kind of improvement you'd expect to see and by which to compare the different solutions would be helpful...
Comment 5 Vladimir Piskarev CLA 2011-12-28 11:06:09 EST
Sure, I'll try to come up with a synthetic test and share the result (perhaps, tomorrow...)
Comment 6 Vladimir Piskarev CLA 2011-12-29 07:04:23 EST
Created attachment 208842 [details]
Synthetic test case

Please see the test case attached.

On my system, it takes 11219 ms to load 10000 resources into a resource set with the Revised Patch applied vs. 100989 ms to load those same resources into an unpatched resource set. For 20000 resources, it's 21501 ms and 357483 ms accordingly.

Of course, it's a synthetic test. In real world use, the difference may be even more dramatic. For example, XtextResourceSet has a bit slower URIConverter (which additionally understands 'classpath' scheme). The numbers given in the bug description were taken from an initiative that had allowed to reduce (clean-) build time from 36 minutes to about 4 minutes for a project containing 5000 Xtext-based 'entities'.
Comment 7 Ed Merks CLA 2012-04-26 05:18:11 EDT
Created attachment 214587 [details]
Patches to address the issue.

This design is intended to provide a flexible way to optimize how resources are located by ResourceSet.getResource.

A new abstract base class, ResourceLocator is introduced in ResourceSetImpl as well as a derived implementation MappedResourceLocator.  When a resource locator is present, the resource set will delegate all operations the implementation of it.  The mapped resource locator maintains a map from URI to normalized URI and maintains a map from normalized URI to the resources with that URI.  The latter map is maintained by listening to changes in the resource set's list of resources and to resource URI changes, so it always remains up to date.  If the URI converter or the URI converter's map changes, that is detected and then maps are recomputed.  This approach should generally reduce resource look-up to be constant time rather than linear with the size of the resource set.
Comment 8 Ed Merks CLA 2012-04-26 05:18:53 EDT
These changes are committed to git for 2.8.
Comment 9 Bryan Hunt CLA 2012-04-26 09:14:53 EDT
So this is basically my MongoResourceSetImpl integrated into ResourceSetImpl?  That would be awesome :)
Comment 10 Ed Merks CLA 2012-05-14 10:07:16 EDT
The changes are available in the 2.8M7 build.