Community
Participate
Working Groups
The ResourceDescriptions of the BuilderState contains of many URIs. The Segments of a URI are cached by EMF, but this is not sufficient for the EMFBasedPersister and the BuilderState. 1. The Fragment is determined by split, which leads to a view to the substring, but keeps the original full Char-Array in memory => Each URI load from file is in memory 2. The files are stored in directories, but the uri is not build as hierarchical but with a complete String. => At least for each model file the hole path is in memory.
Created attachment 208325 [details] Patch for optimizing the ResourceDescriptions in the ClusteringBuilderState The URI inside an editor are build by the cached base-URI of emf and an appended new fragment. The biggest memory consumption from the ResourceDescriptions is avoided by splitting the segments and fragment and reuse them for other URIs.
One additional mark: The 'Open model element' dialog opens with this changes and 45.000 elements after seconds. Before it shows searching for minutes and was stalling at 30.000 Elements.
I've also encountered this issue before. We also found the first problem you identified to be the worse of them. The second problem we didn't bother doing anything about, since there are in the end not that many resources. But I suppose even that can make a small difference. Can you post some numbers on the memory savings achieved with these optimizations? Also the effect it has on the load time of the builder state would be interesting to know. IMHO it would make more sense to call the URIMemoryOptimizer directly from BuilderStateFactoryImpl where the URI objects are created (when loading the builder state). Likewise I would rather let BuilderStateUtil#create() apply this optimization than doing it in ClusteringBuilderState. If we need / want Guice injection, this might be a good time to make the methods of BuilderState non static and have it injected in ClusteringBuilderState. We would then not need the ResourceDescriptionMemoryOptimizer class. Any other thoughts on this? Also note that there is no point in caching the URI's scheme as that ends up being interned (String#intern()) in URI.
> Can you post some numbers on the memory savings achieved with these > optimizations? Because the memory was the main problem, I've measured this by retained Size of ResourceDescriptions. Before optimization we needed 226MB for the one project, afterwards there are only 72MB remaining. For this test I only used the optimized persister because the Builder can't be changed so easily. For the editor we use the UriConverter from the resourceSet with another optimizer (not sure if it could be singleton without side effects yet), which is accurate enough. > Also the effect it has on the load time of the builder state > would be interesting to know. I've not measured yet. The sice of the saved file is not affected by this change. > IMHO it would make more sense to call the URIMemoryOptimizer directly from > BuilderStateFactoryImpl where the URI objects are created (when loading the > builder state). Likewise I would rather let BuilderStateUtil#create() apply > this optimization than doing it in ClusteringBuilderState. If we need / want > Guice injection, this might be a good time to make the methods of BuilderState > non static and have it injected in ClusteringBuilderState. We would then not > need the ResourceDescriptionMemoryOptimizer class. Any other thoughts on this? Exactly. I do not know why the BuilderStateUtil is static. If ResourceDescriptionMemoryOptimizer is inside the AbstractBuilderState it can't be reused and should apply at the load method were the class of the ResourceDescription could be a custom one. > Also note that there is no point in caching the URI's scheme as that ends up > being interned (String#intern()) in URI. Thanks. It's still important for the other values to avoid large nested char-Arrays.
Created attachment 208339 [details] Addition patch for IBuilderStateUtil
Created attachment 208345 [details] All in one patch file for injected IBuilderStateUtil
Time profiling for a smaller model shows, that URI.createHierarchicalUri is the most expensive way to create an URI because it validates all parts, except the fragment, which is expensive. In profiler the time grows from 46s to 74s. If I take the new URI, trimFragment and append the (cached) optimized fragment I need 10% more memory (44MB against 40MB) but can load it in 51 seconds. I think this should be the way here. I'll update the patch later.
Also note that it's only URI#createURI(String) which uses the caching inside URI. So I don't think it's worth optimizing the segments etc. Just the fragment. That's what we did in our product. I suppose we could check if this is an enhancement that could go into EMF. I.e. in URI#createURI() call new String() for the URI base and the fragment.
Created attachment 208756 [details] Patch for optimizing only the fragment of uris I tried also caching the baseUri based on segmented uri and use them to append different fragments. This way the validation is done only if necessary. This caching could also be part of EMF, but it caches on a different level. I'm not sure if URI should reuse segments or replace fragments with own short Strings, because it would increase memory in many other circumstances. The profiling result showed in my example, that caching and reusing the segments leads to 500kb of 90MB ResourceDescriptions is saved for Strings, but WeakHashMap and WeakReferences took more than 600kb additionally. The loading time for saved ResourceDescriptions increased from 6.7s to 6.9s approx. With this results I prefere a short optimizer which optimizes only the fragments. The optimizer is injected and could be replaced in customer projects.
I think this enhancement may no longer be necessary when using EMF 2.8 in the runtime (see bug 367911). It actually has a big advantage even: It doesn't attempt to reduce the memory footprint after the fact (i.e. after loading the builder state from disk): The fragments will already be stored as separate new String objects to begin with.
I think we can mark it here as wont fix. Everyone can take a look at my workaround, but for xtext the changes in emf should be sufficient.