Build ID: 3.4

Steps To Reproduce:
The target is a product built on top of Eclipse. There is a sizable number of bundles, as well as a number of imported/exported packages with versions. During a performance test, following a set of activities on the platform, a Java heap dump is taken and analyzed with the Eclipse Memory Analyzer. The analysis shows that a significant number of Version objects have been created, but only a small fraction of them have unique values. Depending upon the collection of plugins being used and the point at which the dump is taken, this might be 406 unique out of 7742, 165 out of 2874, or 195 out of 3398 -- all of which come out to about 5-6% unique. For those particular sizes, memory consumption appears reducible by about 80K or more, depending upon the number of objects created.
Created attachment 143796 [details]
Proposed patch

The patch adds a cache to StateImpl, so that it is accessible via both AdaptorUtil and StateReader. The cache is a mechanism to 'intern' the Version objects, so that we retain a single instance of Version per distinct value.
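The interning idea behind the patch can be sketched as follows. This is a minimal, hypothetical illustration (the class and field names are assumptions, and a stripped-down stand-in replaces org.osgi.framework.Version), not the actual patch code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal stand-in for org.osgi.framework.Version (hypothetical fields).
final class Version {
    final int major, minor, micro;

    Version(int major, int minor, int micro) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Version))
            return false;
        Version v = (Version) o;
        return major == v.major && minor == v.minor && micro == v.micro;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor, micro);
    }
}

// Sketch of the intern cache: equal versions map to one shared instance.
final class VersionCache {
    private final Map<Version, Version> cache = new HashMap<>();

    Version intern(Version v) {
        // putIfAbsent returns the previously cached instance, if any;
        // otherwise v itself becomes the canonical instance.
        Version canonical = cache.putIfAbsent(v, v);
        return canonical == null ? v : canonical;
    }
}
```

With this shape, parsing the same version string twice yields one shared object, which is where the reported 80K+ savings would come from.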
Thanks Brian, I wonder if it is worth while to create a generic intern mechanism. I also wonder if we should still look into indexing the duplicate Version objects in the persistent resolver cache. This would avoid the need to create all of these Version objects just to throw them away when we intern them.
We ended up creating a factory method in the p2 version class to allow for this kind of optimization on a wider scale. We found a large number of the duplicates were "0.0.0" versions for things like package imports, which were easily optimized by checking for that special case.
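The special-case check described above can be illustrated roughly like this. A plain String stands in for the p2 version class, and the factory name is an assumption, not the actual p2 API:

```java
// Hypothetical sketch of a factory method that special-cases "0.0.0":
// the common empty version is returned as a shared singleton instead of
// allocating a new object for every parse.
final class VersionFactory {
    static final String EMPTY = "0.0.0"; // shared canonical empty version

    static String create(String versionString) {
        if (versionString == null || versionString.isEmpty()
                || EMPTY.equals(versionString)) {
            return EMPTY; // no new allocation for the common case
        }
        return new String(versionString); // distinct instance otherwise
    }
}
```

Because package imports so often carry no version, this one check removes a large share of the duplicates without any pooling machinery.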
(In reply to comment #3)
> We ended up creating a factory method in the p2 version class to allow for
> this kind of optimization on a wider scale. We found a large number of the
> duplicates were "0.0.0" versions for things like package imports, which were
> easily optimized by checking for that special case.

In the resolver cache we store empty versions as null, and when we read them back in we set them all to Version.emptyVersion. This allows us to reduce the number of instances of 0.0.0 Version objects when launched from a cached state. But for version ranges we seem to duplicate the max version when using a range that is >= X (i.e. Import-Package: foo; version=1.0). In this case we have a version range of [1.0, MAX] and we duplicate MAX all over the place. Also, in this particular case Brian has a product that really does have a LOT of duplicate versions that are not 0.0.0.
Created attachment 143970 [details]
Fix + test

I abstracted out the notion of ObjectPool into a separate class. I also use the same pool for both String and Version objects for the resolver and bundledata caches.

The testcase only tests that the ObjectPool really does GC the objects if they do not have a strong reference to them. The testcase passes consistently on my machine/VM, but I could see how it may depend on the GC implementation: it assumes all weakly referenced objects WILL be GC'ed on the first call to System.gc(). Not sure that is a valid assumption.

There is a concern about the impact on startup performance from a cached state. We will have to monitor the startup performance tests to see how this change affects startup performance. I also added some debug trace options to ObjectPool so we can monitor how big it gets, how many duplicates are found, etc.
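The non-pinning pool described here can be sketched with standard weak references. This is an assumed shape (the real ObjectPool lives in org.eclipse.osgi and may differ), shown to make the key/value weakness concrete:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch (hypothetical names) of an intern pool that does not pin its
// entries: keys are held weakly by the WeakHashMap, and values are wrapped
// in WeakReference, so entries can be GC'ed once no strong references remain.
final class ObjectPool {
    private final Map<Object, WeakReference<Object>> pool = new WeakHashMap<>();

    @SuppressWarnings("unchecked")
    synchronized <T> T intern(T obj) {
        WeakReference<Object> ref = pool.get(obj);
        if (ref != null) {
            Object pooled = ref.get();
            if (pooled != null)
                return (T) pooled; // reuse the canonical instance
        }
        // First occurrence (or entry already cleared): obj becomes canonical.
        pool.put(obj, new WeakReference<Object>(obj));
        return obj;
    }

    synchronized int size() {
        return pool.size();
    }
}
```

Note that a value must be wrapped in a WeakReference here; storing the object itself as a strong value would keep its own key alive and defeat the WeakHashMap.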
Comment on attachment 143796 [details]
Proposed patch

Marking for iplog because I based my patch off of Brian's contribution. Thanks Brian.
John, I would like a review for 3.6. Things to consider:

- I wonder if we should consider having a different Map for each type of object that is "interned" instead of using one Map.
- Are there other cases in Eclipse where such an ObjectPool would be useful? Should we consider some service or static API to do pooling?
- Note that we do not use the StringPool approach here because the data structures we are loading in this case are not fully loaded into memory at one time. Sections of the data structure can be loaded and flushed from memory, so we do not have a full view of the data structure to go over and pool the objects effectively.
- We use a WeakHashMap because we want to prevent pinning the objects from GC when a section of the data structure is flushed from memory.

If you want to see the pool grow and shrink, set the following configuration option:

osgi.lazyStateUnloadingTime=1000

And set the following trace option:

org.eclipse.osgi/debug/objectPool/adds=true

If you switch to a new perspective, you will see that as new bundles are loaded, new objects are added to the pool. Some objects should have been flushed from memory (because you set osgi.lazyStateUnloadingTime to 1 second). This should result in a smaller ObjectPool map size when new objects are interned.
I reviewed the patch and verified that it works as expected.

My only general problem with this approach is the per-object overhead of the cache: a hash map entry object, plus weak references for both key and value. This means a performance degradation for cases where the object had no duplicates (or perhaps even a small number of duplicates). However, since this is already being done for strings, this isn't a new problem. And since Version objects are larger, it doesn't take much duplication before we see a net benefit. When I ran tests with this on my install it certainly found a large number of duplicates.

Since strings interned via String.intern() are now garbage collected on all JVMs version 1.5 or greater, we should consider switching to String.intern() for strings and only keep the object pool for non-string objects (and strings on older VMs). That would free up a lot of the current overhead of weak references, weak map entries, etc.
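The alternative suggested above relies on the JVM's own canonicalization. A quick illustration of the behavior (a self-contained demo, not code from the patch):

```java
// String.intern() returns one canonical instance per string value, managed
// by the JVM itself -- no application-side map, entry objects, or weak
// references are needed, and interned strings are collectable on modern JVMs.
public class InternDemo {
    public static void main(String[] args) {
        // new String(...) forces two distinct heap objects with equal values.
        String a = new String("1.0.0").intern();
        String b = new String("1.0.0").intern();
        System.out.println(a == b); // prints "true": same canonical instance
    }
}
```

This is why the comment proposes keeping the ObjectPool only for non-string objects: strings get the same deduplication for free.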
New patch released (with updated copyright dates). I also added more gc()/runFinalization() calls to the ObjectPool unit test to try to ensure the testcase consistently GC's the weak references.