Build Identifier: I20101028-1441

I've opened a moderately sized file (114561 B) and, using my profiler, I see that there are 8 copies of this file in memory. In the attachment (DuplicateChars.html) you can see the paths from the GC roots to the character arrays. I know that Strings are often backed by the same array, but I suspect that is not the case here, since the profiler does seem to pick up on shared arrays (see the references from the "divmod" and "numeric_std" Strings).

I'm testing with files up to 2 MB. That would result in 4 MB to store one copy (chars are 2 B each). That times 8 is 32 MB. Quite a lot.

In the same vein, I see 16 large ConcurrentHashMaps (CHMHashEntry.html) that are identical according to the profiler. I don't yet understand what they are for, but they look suspicious to me. You may want to take a look at those.

Reproducible: Always
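For reference, here is a minimal, illustrative Java sketch of the arithmetic above. The file size, the 2 B per char, and the count of 8 copies are taken from this report; the class name is made up.

// Back-of-the-envelope estimate of the heap cost described above, assuming
// the file content ends up as Java chars (2 bytes each) and the profiler's
// count of 8 independent copies is accurate.
public class DuplicateContentEstimate {
    public static void main(String[] args) {
        long fileSizeBytes = 2L * 1024 * 1024;       // ~2 MB source file
        int bytesPerChar = 2;                        // a Java char is UTF-16
        int copies = 8;                              // copies seen in the profiler

        long perCopy = fileSizeBytes * bytesPerChar; // ~4 MB per in-memory copy
        long total = perCopy * copies;               // ~32 MB in total

        System.out.printf("per copy: %d MB, total: %d MB%n",
                perCopy / (1024 * 1024), total / (1024 * 1024));
    }
}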
Created attachment 185545 [details]
Dump of profiling results

Note that I removed part of the string representing the file content; otherwise this HTML file would have been over a MB.
Created attachment 185546 [details] Dump of profiling results
I am seeing the same thing with XText 1.0.2
There is not much that can be done about this one with reasonable effort. A freshly opened editor causes the complete string to be held 3 times in memory: the dirty state manager refers to the content of the resource, there is the resource's parse result, and there is the last document event that the text viewer refers to. A second editor with another resource that has a cross-link to the first one will cause the first one's content to be copied one more time. The document itself will also hold references to a number of substrings for each line of the input.

I could implement something (modifying existing APIs) that would save exactly one of the copies of the entire input. If the fact that the string is stored 4 times is not a show stopper on your side, I'm inclined to postpone this ticket. Please let me know if that is not an option for your use case.
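To illustrate what "held 3 times" means here, a minimal sketch follows. The holder classes are hypothetical stand-ins, not the actual Xtext or JFace types; the point is only the difference between each holder keeping its own char data versus sharing one reference.

// Hypothetical stand-ins for the three holders mentioned above; these are
// not real Xtext/JFace classes, just an illustration of the effect.
class DirtyStateHolder  { String content; }  // dirty state manager's view of the resource
class ParseResultHolder { String content; }  // the resource's parse result
class LastDocumentEvent { String content; }  // last document event kept by the text viewer

public class ThreeCopiesSketch {
    public static void main(String[] args) {
        String fileContent = readResource();  // the editor input

        DirtyStateHolder dirty = new DirtyStateHolder();
        ParseResultHolder parse = new ParseResultHolder();
        LastDocumentEvent event = new LastDocumentEvent();

        // If each holder ends up with an independently built String (e.g. the
        // content is re-read or re-assembled per holder), the char data exists
        // once per holder. Sharing one reference would store it only once.
        dirty.content = new String(fileContent.toCharArray()); // independent char[]
        parse.content = new String(fileContent.toCharArray()); // another independent char[]
        event.content = fileContent;                           // shared, no extra copy

        System.out.println(dirty.content == parse.content);    // false: distinct objects
        System.out.println(event.content == fileContent);      // true: same object
    }

    static String readResource() {
        // stand-in for reading the whole resource content
        return "library ieee; use ieee.numeric_std.all;";
    }
}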
I think that the duplicate files are not yet a show stopper. Removing 1 of X duplicates is probably not very useful anyway, so I'm fine with postponing.