Community
Participate
Working Groups
When creating an URI including a fragment (e.g. "foo:/bar#baz") using only URI#createURI() the resulting URI's 'fragment' field will be a String which still contains the complete URI ("foo:/bar#baz" in the example) in its 'value' field. That is due to the implementation of String#substring(). In applications dealing with many URI instances of which many have a common base this is can be a substantial memory overhead. (In an Xtext based application I've been working on we are dealing with around 5 million URI instances (with fragments) which have an URI base with an average length of 50 characters. There are only around 25 thousand unique URI bases. This means there would be around 5*10^6*50*2 bytes (so almost 500 MB) of "wasted" memory in the fragment Strings if they were all loaded into memory using URI#createURI() only.) I would therefore like to propose the following change to URI: public static URI createURIWithCache(String uri) { int i = uri.indexOf(FRAGMENT_SEPARATOR); String base = i == -1 ? uri : uri.substring(0, i); String fragment = i == -1 ? null : uri.substring(i + 1); URI result = uriCache.get(base); if (result == null) { result = parseIntoURI(base); uriCache.put(base, result); } if (fragment != null) { // WAS: result = result.appendFragment(fragment); result = result.appendFragment(new String(fragment)); } return result; } The proposed change is based on the assumption that the URI instances created using URI#createURI() will typically share a common base with at least one other URI created URI#createURI(). If not (i.e. all created URIs with a fragment have a unique URI base), then the proposed change will actually lead to a slight memory overhead, as there will be two char[] instances for every URI instead of just one. But I presume the URI#uriCache is based on the same assumption. As a workaround an application can do this on its own. I.e. instead of simply calling URI#createURI() for a String with a fragment, it could first split the String into base and fragment and then call URI#createURI() for the base and then URI#appendFragment() for the fragment (which would be wrapped in a new String).
What about this variation of your idea: public static URI createURIWithCache(String uri) { int i = uri.indexOf(FRAGMENT_SEPARATOR); String base = i == -1 ? uri : uri.substring(0, i); String fragment = i == -1 ? null : uri.substring(i + 1); URI result = uriCache.get(base); if (result == null) { result = parseIntoURI(base); uriCache.put(base, result); if (fragment != null) { result = result.appendFragment(fragment); } } else if (fragment != null) { result = result.appendFragment(new String(fragment)); } return result; } I.e., if there isn't a cached result, we already end up with lots of substrings of the parsed original string, so we might as well use a substring for the fragment; otherwise, we do as you suggested. This would avoid the change increasing the footprint in cases where there are few cache hits. What do you think?
That's a clever improvement! So if a given URI base is only encountered once by URI#createURI() (when called with or without fragment), all created String objects (one for the schema, one for every path segment, one for the fragment, etc.) will reference the same char[] instance in their 'value' field. So in the end there should only be the additional slight overhead of calling the String(String) constructor. But IMHO that is worth it given the gains in memory usage. Furthermore not calling String#intern() in URI(...) (see bug 358175) would make a bigger difference, if performance is a concern.
Would you still like to see the intern() thing gone? We never really measured the performance for this and intended it only to help with footprint. It's not clear it ever saves much. Most often it's called where other substrings keep the character array in memory and clients who call methods that supply a scheme likely pass in a constant that's already interned...
Sorry about the confusion. I only wanted to point out that I don't think the additional proposed String(String) constructor call in URI#createURIWithCache(String) should be a noticeable performance hit. If you are concerned about this, then I think there would be more to be gained by removing the call to String#intern() in cases where it isn't required (e.g. URI#appendFragment(String)). Some measurements I made indicated that URI#appendFragment(String) could then be sped up by about 50%. But I think you are right about String#intern() not really saving any memory. I would certainly not miss it if you were to remove it :-)
FWIW, the thing that bothered me about the String#intern() call was that when CPU profiling an application with the "sampling" mode in YourKit, the method would in our application often show up as one of the most expensive operations. The reason being that it is called so many times (from the URI class) and that it's running time is lower than YourKit's sampling period (which obviously will lead to inaccurate reporting). A solution is to use YouKit's more time consuming "tracing" profiling mode. But this of course isn't an EMF issue in any way.
The enhancement is committed to git for 2.8.
Changing resolution...
The changes are available in the M6 build.
I'm making significant new changes via https://bugs.eclipse.org/bugs/show_bug.cgi?id=403093.