Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 286022 - Large number of Version objects retained in memory
Summary: Large number of Version objects retained in memory
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework (show other bugs)
Version: 3.4.2   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.6 M2   Edit
Assignee: Thomas Watson CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-07 14:32 EDT by Brian Lillie CLA
Modified: 2009-08-13 18:04 EDT (History)
3 users (show)

See Also:
john.arthorne: review+


Attachments
Proposed patch (3.88 KB, patch)
2009-08-07 14:34 EDT, Brian Lillie CLA
tjwatson: iplog+
Details | Diff
Fix + test (12.71 KB, patch)
2009-08-10 17:18 EDT, Thomas Watson CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Lillie CLA 2009-08-07 14:32:08 EDT
Build ID: 3.4

Steps To Reproduce:
The target is a product built on top of Eclipse.  There are a sizable number of bundles, and also a number of import/export packages with versions.

During performance test, following a set of activities that occur on the platform, a java dump is initiated and analyzed with eclipse memory analyzer.

The analysis shows that there are a significant number of Version objects that have been created, but only a small fraction of which have unique values.

Depending upon the collection of plugins being used and the point at which the dump is initiated, this might be 406 unique out of 7742, 165 out of 2874, 195 out of 3398 -- all of which come out to about 5-6% unique.  For those particular sizes, memory consumption appears to be reduced by about 80K or more, depending upon the number of objects created.
Comment 1 Brian Lillie CLA 2009-08-07 14:34:42 EDT
Created attachment 143796 [details]
Proposed patch

Patch adds a cache to StateImpl, so that it is accessible via both AdaptorUtil and StateReader.  The cache is a mechanism to 'intern' the Version objects, so that we could retain unique instances of Versions according to the values.
Comment 2 Thomas Watson CLA 2009-08-07 14:53:33 EDT
Thanks Brian,

I wonder if it is worth while to create a generic intern mechanism.  I also wonder if we should still look into indexing the duplicate Version objects in the persistent resolver cache.  This would avoid the need to create all of these Version objects just to throw them away when we intern them.
Comment 3 John Arthorne CLA 2009-08-10 09:39:00 EDT
We ended up creating a factory method in the p2 version class to allow for this kind of optimization on a wider scale. We found a large number of the duplicates were "0.0.0" versions for things like package imports, which were easily optimized by checking for that special case.
Comment 4 Thomas Watson CLA 2009-08-10 10:32:24 EDT
(In reply to comment #3)
> We ended up creating a factory method in the p2 version class to allow for this
> kind of optimization on a wider scale. We found a large number of the
> duplicates were "0.0.0" versions for things like package imports, which were
> easily optimized by checking for that special case.
> 

In the resolver cache we store empty versions as null and when we read them back in we set them all to Version.emptyVersion.  This allows us to reduce the number of instances of 0.0.0 Version objects when launched from a cached state.  But for Version ranges we seem to duplicate the max version when using a range that is is >= X (i.e. Import-Package: foo; version=1.0

In this case we have a version range of [1.0, MAX] and we duplicate MAX all over the place.  Also, in this particular case Brian has a product that really does have a LOT of duplicate versions that are not 0.0.0.
Comment 5 Thomas Watson CLA 2009-08-10 17:18:24 EDT
Created attachment 143970 [details]
Fix + test

I abstracted out the notion of ObjectPool into a separate class.  I also use the same pool for both String and Version objects for the resolver and bundledata caches.

The testcase is only testing that the ObjectPool really does GC the objects if they do not have a strong reference to them.  The testcase passes consistently on my machine/VM but I could see how it may depend on the GC implementation.  The testcase assumes all weak referenced objects WILL be GC'ed on the first call to System.gc().  Not sure that is a valid assumption.

There is a concern about the impact to startup performance from a cached state.  We will have to monitor the startup performance tests to see how this change effects startup performance.

I also added some debug trace options to ObjectPool so we can monitor how big it gets and how many duplicates are found etc.
Comment 6 Thomas Watson CLA 2009-08-10 17:19:24 EDT
Comment on attachment 143796 [details]
Proposed patch

Mark for iplog because I based my patch off of Brian's contribution.  Thanks Brian.
Comment 7 Thomas Watson CLA 2009-08-10 18:07:00 EDT
John, I would like a review for 3.6.  Things to consider:

- I wonder if we should consider having a different Map for each type of object that is "interned" instead of using one Map.

- Are there other cases in Eclipse where such an ObjectPool would be useful?  Should we consider some service or static API to do pooling?

- Note that we do not use the StringPool approach here because the data structures we are loading in this case are not fully loaded into memory at one time.  Sections of the data structure can be loaded and flushed from memory so we do not have a full view of the data structure to go over and pool the objects effectively.

- We use a WeakHashMap because we want to prevent pinning the objects from GC when a section of the data structure is flushed from memory.

If you want to see the pool grow and shrink then set the following configuration option:

osgi.lazyStateUnloadingTime=1000

And set the following trace option:

org.eclipse.osgi/debug/objectPool/adds=true

If you switch to a new perspective then you will see that as new bundles are loaded then new objects are added to the pool.  Some objects should have been flushed from memory (because you set osgi.lazyStateUnloadingTime to be 1 second).  This should result in a smaller ObjectPool map size when new objects are interned.
Comment 8 John Arthorne CLA 2009-08-13 16:52:45 EDT
I reviewed the patch and verified that it works as expected. My only general problem with this approach is the overhead for each cached object: A hash map entry object, and weak references for both key and value. This means a performance degradation for cases where the object had no duplicates (or perhaps even for a small number of duplicates). However, since this is already being done for strings this isn't a new problem. And, since Version objects are larger, it doesn't take much duplication before we see a net benefit. When I ran tests with this on my install it certainly found a large number of duplicates.

Since Strings interned via String.intern() are now garbage collected on all JVM's version 1.5 or greater, we should consider switching to String.intern() for strings and only keep the object pool for non-string objects (and Strings on older VM's). That would free up a lot of the current overhead of weak references, weak map entries, etc.
Comment 9 Thomas Watson CLA 2009-08-13 18:04:55 EDT
Patch release (with updated copyright dates).  I also added more gc() runFinalization() calls to the ObjectPool unit test to try and ensure the testcase consistently GC's the weak references.