Community
Participate
Working Groups
If the MD5 hash is bad, then we get into an Infinite loop. This is because the repo doesn't get marked as 'bad' so we just keep retrying.
I spent some time on this, and I'm not sure there is an easy fix here. We use a mirror list (and mirror selector). Each mirror has a 'failure' count, which goes up and down depending on how well the mirror responds to certain requests. For example, if you download a file with a decent connection speed the mirror failure count goes to zero. If something fails (MD5 hash for example) the failure count goes up. However, as long as there exists at least 1 mirror whose failure count != 2, we will keep retrying. In the case of the MD5 problem, we will try the mirror, download the artifact (set failure count to 0), check the MD5 (which fails, set the failure count to 1), and repeat. While we can modify this algorithm a bit, it turns out there is a lot of things that contribute to the failure count, and in a system with lots of mirrors, it's hard to know for sure if we will ever terminate. For Eclipse 3.6.2, I propose we put a hard limit on the number of times we will try to mirror a particluar artifact. For example, if we have tried 50 consecutive times without success (to download a single artifact) then just give up. This will at least ensure we don't end up in an infinite loop. For 3.7 maybe we can look at a better mirrors selection / failure count solution.
Created attachment 180456 [details] Patch v1 This puts a limit of 50 on the number of times a particular mirror request will be tried.
John, I wanted to bring this bug to your attention. Pascal mentioned that you're probably the best person to architect a solution. If you want to chat about the problem, ping me on IRC.
A perhaps bold opinion, but why don't we remove the whole mirroring concept from p2 altogether? There are plenty of solutions out there that deals with this in much better ways. As a side note, I'm not even able to find documentation on how to set up a server farm that supports p2's mirroring scheme. Is anyone using it besides Eclipse.org?
(In reply to comment #4) > A perhaps bold opinion, but why don't we remove the whole mirroring concept > from p2 altogether? There are plenty of solutions out there that deals with > this in much better ways. While this might be a possibility, this is not a 3.6.2 solution. We need both a 3.6.2 and 3.7 solution here.
(In reply to comment #4) > A perhaps bold opinion, but why don't we remove the whole mirroring concept > from p2 altogether? There are plenty of solutions out there that deals with > this in much better ways. Suggestions? > As a side note, I'm not even able to find documentation on how to set up a > server farm that supports p2's mirroring scheme. Is anyone using it besides > Eclipse.org? As far as I recall the mirroring scheme in p2 is there to support the Eclipse.org setup not the other way around. They supply a xml doc that lists the mirrors thought to contain interesting stuff and we have to cycle through them trying to find what we need. Ian's approach seems like a reasonable way to address the immediate problem though personally I'd go for a smaller number of tries.
(In reply to comment #6) > (In reply to comment #4) > > A perhaps bold opinion, but why don't we remove the whole mirroring concept > > from p2 altogether? There are plenty of solutions out there that deals with > > this in much better ways. > > Suggestions? > Use "Warp speed from the amazon cloud" ;-) > > As a side note, I'm not even able to find documentation on how to set up a > > server farm that supports p2's mirroring scheme. Is anyone using it besides > > Eclipse.org? > > As far as I recall the mirroring scheme in p2 is there to support the > Eclipse.org setup not the other way around. Yes. But since p2 is a generic provisioning system, I would assume a built in mirror mechanism to be generic as well. I would also assume documentation on par with other API's. At present, the mirror system is tied to Eclipse.org and useless to other provisioning organizations unless they first engage in advanced reverse engineering.
(In reply to comment #7) > Yes. But since p2 is a generic provisioning system, I would assume a built in > mirror mechanism to be generic as well. I would also assume documentation on > par with other API's. At present, the mirror system is tied to Eclipse.org and > useless to other provisioning organizations unless they first engage in > advanced reverse engineering. In case this is just an indirect way of requesting improvement to the doc, I have entered bug 330247 to improve the documentation. There is nothing in the p2 mirroring mechanism that ties it specifically to eclipse.org.
(In reply to comment #6) > Ian's approach seems like a reasonable way to address the immediate problem > though personally I'd go for a smaller number of tries. Sure, 50 is likely a little high. But I wouldn't make it too small (say 25-30). Remember, this check is really just a fail safe. 99.9% of the time, this limit is never reached.
The problem is real, the patch almost complete, if not complete. We should strive to release that and not let the patch rot. Assigning to Ian to drive to completion.
Ian, this seems similar to bug 340165. We are successfully downloading the artifact but then one of the processing steps is failing so we continue to re-download, etc. Thoughts?
*** Bug 340165 has been marked as a duplicate of this bug. ***
Ian, is there a related issue here in that we are setting the status code to RETRY=true? I think this is roughly the order in which we do things: - download artifact successfully - set retry=true - run md5, unpack, jar verifier, etc - return error (note retry is still true) - code says: do { ... } while (code == RETRY);
(In reply to comment #13) > Ian, is there a related issue here in that we are setting the status code to > RETRY=true? > > I think this is roughly the order in which we do things: > - download artifact successfully > - set retry=true > - run md5, unpack, jar verifier, etc > - return error (note retry is still true) > - code says: > do { > ... > } while (code == RETRY); Yes, that's roughly the steps. IIRC, if you have one 'good' mirror left, then we don't set retry to false. Now, you might think that the mirror we just tried (and failed to validate the MD5 hash from) would be 'bad', but here's the catch, each mirror gets a second chance. So this 'almost bad' mirror is tested again and artifact is fetched and p2 marks the mirror 'good' again. Of course, the Md5 fails, so the mirror is 'almost bad' and the loop continue (retry the artifact, everyone is happy, fail the md5, wash rinse repeat). Let's at least get the retry limit solution in for 3.7
I've set the max retry count to 10 and committed the fix to head. FWIW, simply by reading the comments you can see the problem. When we fail to download an artifact from a mirror we write: don't punish the mirror too much just because we failed to download one artifact. Higher up in the stack we write: keep trying to download the artifact as long as we have one good mirror left. This is not great long term solution, but at least we will stop trying after 10 attempts. The reason this problem is hard is because there is no exposed API for mirror selection. Mirror selection is done by the repository itself, however, we while downloading the artifacts we make some assumptions about how those repositories will behave (wrt to mirror selection). To fix this properly, we will likely need to make mirror selection part of the artifact repository API. To make this even more confusing, we have mirror requests (mirror as a verb) and mirrors (mirror as a noun). ;-)
I've opened bug 341405 to discuss enhancing the mirror concept to support 'bad artifacts in good mirrors'.