Community
Participate
Working Groups
Created attachment 107759 [details] Fixes to the ArtifactRepositoryManager and MirrorRequest Problem: In certain usage scenarios, it is possible for each MirrorRequest to receive it's own in-memory instance of the bundle pool, causing both a massive growth in memory (if the existing pool is large) as well as causing subsequent provisioning to fail with messages such as artifact not found, which can ripple up as a failure to prepare a partial IU. The underlying issue had to do with how slashes were being processed in the ArtifactRepositoryManager. In addition, while working through this code, there were places in the manager where the synchronization looked suspect in protecting to ensure that only a single instance of a given repo can exist for a certain URL. Details: The basic flow goes something like this (to orchestrate the failure): 1. The bundle pool repo is added (not loaded) without a trailing slash: C:\MyCache 2. The bundle pool repo is added (not loaded) -with- a trailing slash: C:\MyCache\ 3. The manager will only ever return the first one in hash order when iterating over the repos in the manager. Let's say that the manager iterates and finds C:\MyCache first. 4. While building up the MirrorRequest, we associate C:\MyCache\ repository with the request in the collect phase 5. The code triggers a load of the repo. The manager finds the RepositoryInfo object associated with C:\MyCache since the URLUtil.equals() matches even if there is a trailing slash. 6. The manager loads the repo, and calls add with the repo object 7. The repo add looks up the RepositoryInfo based on -key- (not by iterating) which finds the correct RepositoryInfo associated to C:\MyCache\ and sets the repository = reference to the loaded repo. 8. Repeat for all the artifacts being downloaded and for each one, the C:\MyCache RepositoryInfo has no value in repository field and thus triggers another load. 9. Upon completion of the collect action, all of the MirrorRequests are successful, but only one of the added ones shows up in the bundle pool and ultimately in the artifacts.xml. 10. The prepare partial IU phase then breaks since one of the expected pieces of software is not actually found in the list of cached artifacts Solution: I'm attaching a patch that solves this by fixing the repo to ignore trailing slashes in URLs for all repos as well as by switching to instead of using the iterator form for finding the repo, simply use the look up by key for the repo. The combination of these fixes ensures that two clients attempting to get the bundle repo receive the same value even if the trailing slashes are present and that the loads and adds go to the same place. Also, in the patch there are fixes for synchronization since upon inspection, there were areas in the manager that would allow two threads to create / load the same repo under certain cases. Note that there may be related issues over on the metadata repo manager but I have not looked yet. In the MirrorRequest, there was some cleanup code to remove a descriptor from the repo upon particular failures, but when we were diagnosing this issue, we found that the "protective" cleanup call was not done for errors if the original attempt was for a canonical descriptor (or vice versa -- a bit of a blur now after extensive digging in the code). Also, note that this failure (of the mirror requests succeeding but not really working) caused a slew of other odd behavior around provisioning of the software after the collect phase was completed. When we did some diagnostics by adding a cache-dump phase before and after collect, we could indeed validate that the bundle pool only had a single value added to it even if the collect phase supposedly downloaded 500 pieces of software.
Tentatively marking for 3.4.1
Created attachment 109927 [details] Minor cleanup on previous patch - removed references to genuitec logging code - fixed NON-NLS warnings, formatted.
If I recall, the purpose of the approach of using URLUtil#sameURL was to avoid duplicate entries for the same physical repository. For example c:\MYCACHE and c:\mycache are the same physical repository on Windows but their location keys would be different. Having said that, from your investigation there were obviously leaks in this code that allowed multiple entries to be created for the same repository. The addRepository(URL) method does a contains() check that also checks for the with/without trailing slash variants, but the addRepository(IArtifactRepository) method does not. My best guess is that this is how we ended up with the duplication. Although there is possibly a more localized fix to address the problem, I'm inclined to use this patch and handle the case variant problem in a similar way (convert to canonical form in the getKey(..) method). Some more things to do: - apply same fix in MetadataRepositoryManager - Regression tests. Leaving this for today because the p2 repository is about to be moved to its new location.
Created attachment 109930 [details] Latest patch Removed code in contains(..) that did a lookup with alternate key with/without trailing slash. This is no longer needed if we convert the location key to canonical (non-slash) form.
Created attachment 110014 [details] Test case reproducing the problem
I have released the patch and tests to HEAD. I will backport to 3.4.x next week assuming all goes well in 3.5 stream.
Released to 3.4.1.
*** Bug 196870 has been marked as a duplicate of this bug. ***