Bug 329923

Summary: p2 meta-data files are being served from mirrors
Product: Community
Reporter: David Williams <david_williams>
Component: Cross-Project
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: am2605, contact, irbull, mario.pierro, mober.at+eclipse, pwebster, raulfortes, scott, stephan.herrmann
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Windows Server 2008
Whiteboard:
Attachments:
  simple bash script to check content.jar files for Indigo M3. (Flags: none)

Description David Williams CLA 2010-11-10 11:42:28 EST
And, I'm not sure this is such a good idea. Maybe it's OK, but I thought it deserved some attention and discussion.

What I mean is that a request for 

http://download.eclipse.org/webtools/downloads/drops/R3.3.0/S-3.3.0M3-20101104191817/repository/content.jar

will automatically be redirected to 

http://ftp.osuosl.org/pub/eclipse/webtools/downloads/drops/R3.3.0/S-3.3.0M3-20101104191817/repository/content.jar

(from my location in North Carolina ... may depend on location?) 

This is questionable for a few reasons. Each can maybe be "overcome", but I suspect all at least deserve discussion, if not a reconfiguration of the mirroring-redirect rules.

1. I first became aware of this because someone on the mailing list complained that their update/install didn't work because they were getting a 404 (not found) from the mirror. By the time I tried, the mirror did have the file and was working again ... but ... it seems to me that by definition mirrors are not always reliable, and these meta-data files are right where reliability is required.

2. p2 works by getting the relatively small meta-data files, and then using the mirrorsURL in the artifacts.jar/xml file to get the heavy stuff from mirrors ... and it knows to retry the next mirror if one fails ... but, if a meta-data file fails, then the whole thing fails.

3. We use relative URLs in some of our metadata files ... not sure that works well with mirrors ... not sure they have to duplicate whole, exact file structure? 

4. The meta-data files are not signed, so there is some small chance of malicious tampering there.

So, if it is required to serve these files from elsewhere, I guess we could work on overcoming the above issues ... but, if there's no hard reason to redirect these requests, then the server config could be changed. I might be missing some, but the p2 meta-data files have the following names. Note, I write jar/xml meaning the file extension could be either jar or xml.

content.jar/xml
artifacts.jar/xml
compositeContent.jar/xml
compositeArtifacts.jar/xml

Can these 8 files be excluded from being redirected?
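
The eight names above can be matched on the last path segment of a request URL. The following is a minimal sketch in Python of such a check; `is_p2_metadata` and `P2_METADATA_NAMES` are hypothetical names used for illustration only, and nothing here reflects how the Eclipse web servers are actually configured.

```python
from posixpath import basename
from urllib.parse import urlsplit

# The eight p2 metadata file names listed in this report: four base names,
# each of which may carry a .jar or .xml extension.
P2_METADATA_NAMES = frozenset(
    f"{base}.{ext}"
    for base in ("content", "artifacts", "compositeContent", "compositeArtifacts")
    for ext in ("jar", "xml")
)


def is_p2_metadata(url: str) -> bool:
    """Return True if the URL's last path segment is one of the eight names."""
    return basename(urlsplit(url).path) in P2_METADATA_NAMES
```

A redirect rule built on a check like this would leave ordinary artifact requests (plugins, features) eligible for mirroring and keep only the metadata requests on the origin server.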
Comment 1 Stephan Herrmann CLA 2010-11-10 17:14:13 EST
Even with the current scheme I've experienced some painfully long
times of just fetching the metadata. This is time the user actually
waits, eager to select the software to be installed (as opposed to the
actual download when everybody will be at the coffee machine anyway).

So if turning off redirection for metadata possibly creates a new 
bottleneck this could hurt.

Are you saying retry-next-mirror is not working for metadata?
Is that the point?
Comment 2 David Williams CLA 2010-11-10 21:28:09 EST
> 
> Are you saying retry-next-mirror is not working for metadata?
> Is that the point?

No, the point is the settings on the Eclipse Foundation's webserver(s). 
p2 never gets a chance to retry-next-mirror; it doesn't get that far. To be clear, I mean it doesn't get that far when it gets a 404 error while trying to get the initial content.jar.

p2 has never used the "mirrorsURL" list for this type of metadata ... only uses it once it has figured out what it wants, and goes to get the artifact(s). 
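
The asymmetry described here can be illustrated with a small Python sketch. This is not p2's code; `fetch_metadata`, `fetch_artifact`, and the injected `fetch` callable are hypothetical names, and the only behaviour modelled is what this comment states: one attempt for metadata, retry-next-mirror for artifacts.

```python
def fetch_metadata(repo_url, fetch):
    """Single attempt: whatever answers this one request is final."""
    return fetch(repo_url + "/content.jar")


def fetch_artifact(path, mirrors, fetch):
    """Try each mirror in turn and return the first successful download."""
    last_error = None
    for mirror in mirrors:
        try:
            return fetch(mirror + "/" + path)
        except OSError as err:
            last_error = err  # remember the failure and move on to the next mirror
    raise OSError(f"all {len(mirrors)} mirrors failed for {path}") from last_error
```

With a server-side redirect in front of `fetch_metadata`, a single bad mirror is enough to make the whole repository look unavailable, while the same bad mirror in the `mirrors` list only costs one extra attempt.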

When p2 decides it needs a whole bunch of content.jar/xml files, I'm not sure of the effect on performance. I'd think most of the time it'd be "pass or fail", not slower. But it could be failing on some content.jar requests and still finding what it needs elsewhere?

I forgot to mention, another reason I opened this bug is that when using the b3 aggregator, many repositories show up as "no valid repository" at the provided URL ... yet on the build machine, the b3 aggregator works fine. This could be a bug in the aggregator, or something, but it also made me think it is failing to retrieve anything from the auto-mirrored site, while on build.eclipse.org it can get to download.eclipse.org just fine.

I guess the best test would be for someone to stay up late :) and write a script to try and wget many of the content.jar's provided for indigo or helios, and see how many of them fail. 

But, I don't think going to Eclipse.org to get these relatively small content.jar files would be that much slower than getting them from mirrors, since presumably they are relatively small ... but, guess that's something else that could be investigated explicitly (or, evaluated by someone who actually knows about this stuff, more than I do. :)
Comment 3 David Williams CLA 2010-11-10 23:43:02 EST
Well ... I wrote a simple script to directly "get" many of the content.jar files for Indigo M3 contributions, about 50 of them. And apparently some really are invalid ... some projects "contribute" more than one URL for some reason, so those failures are not a matter of "bad mirrors".

Here's some "stats" on the results. All 50 jars totaled approx. 5 Megs (what's that ... 100K apiece, on average?). When I ran the script on build.eclipse.org, none went to a mirror. When I ran it here in North Carolina, most were retrieved from download.eclipse.org (as expected) but 6 were automatically redirected to ftp.osuosl.org. None resulted in 404 errors.

I'll attach the simple script. Note that many URLs that would require a compositeContent.jar were simply omitted from the list, to save myself a little editing or programming. More complicated (and complete) tests could be made ... but I think I'd want to do it in Java instead of bash :) If others want to run it occasionally from other locations, it'd be interesting to hear the results (whether any 404s occurred and/or different mirrors were used).
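
For readers without access to the attachment, the kind of check described here can be sketched in Python using only the standard library. This is not the attached bash script; `classify` and `check` are hypothetical helpers, and connection failures (timeouts, DNS errors) are deliberately left uncaught to keep the sketch short.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def classify(requested_url, final_url, status):
    """Summarise one fetch: which host answered and whether it was redirected."""
    requested_host = urlsplit(requested_url).hostname
    final_host = urlsplit(final_url).hostname
    return {
        "url": requested_url,
        "served_by": final_host,
        "redirected": final_host != requested_host,
        "ok": status == 200,
        "status": status,
    }


def check(url, timeout=30):
    """Fetch url (urlopen follows redirects) and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(url, resp.url, resp.status)
    except urllib.error.HTTPError as err:
        # err.url is the URL that finally answered with the error status,
        # for example the mirror that returned 404 or 403.
        return classify(url, err.url, err.code)
```

Calling `check` on each content.jar URL and printing the entries where `redirected` is true or `ok` is false would give the same kind of report as the results posted in the following comments.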
Comment 4 David Williams CLA 2010-11-10 23:44:46 EST
Assigning to webmasters to reduce spamming so many on auto cc list with each message ... but doesn't mean they are solely responsible for "fixing" if they really want these requests to go to mirrors.
Comment 5 David Williams CLA 2010-11-10 23:46:19 EST
Created attachment 182870 [details]
simple bash script to check content.jar files for Indigo M3.
Comment 6 Stephan Herrmann CLA 2010-11-11 06:52:00 EST
(In reply to comment #3)
> I'll attach the script
> ... if others wanted to run it occasionally, from other locations? it'd be
> interesting to hear the results (if any 404's occurred and/or different mirrors
> used.

Here's my mileage from Berlin:

All found at download.eclipse.org except for the following which were 
fetched from http://ftp.osuosl.org :

http://download.eclipse.org/birt/update-site/4.0-interim/content.jar
http://download.eclipse.org/eclipse/updates/3.7milestones/S-3.7M3-201010281441/content.jar
http://download.eclipse.org/eclipse/updates/3.7milestones/S-3.7M3-201010281441/content.jar
http://download.eclipse.org/eclipse/updates/3.7milestones/compositeContent.jar
http://download.eclipse.org/tools/gef/updates/releases/content.jar
http://download.eclipse.org/webtools/downloads/drops/R3.3.0/S-3.3.0M3-20101104191817/repository/content.jar

(duplicates are duplicates in the script :)

Total time: real    2m47.507s
with a high for emf: 558K in 18s.

I ran it twice and results were actually the same.
Comment 7 Martin Oberhuber CLA 2010-11-11 08:43:20 EST
(In reply to comment #6)
From Salzburg, Austria it's very similar:

 - 2:12 minutes total download time
 - High for EMF Releases (571K in 12 seconds)
        and EMF Milestones (501K in 17 seconds)
 - Same 6 files redirected to osuosl.org as Stephan found

I agree with Stephan that the end user experience of working with the "Install new software" dialog is still not breathtaking in terms of performance. Especially when a couple of composites are enabled, such as when starting with the Eclipse 4.1 M3 SDK from http://download.eclipse.org/e4/sdk/ .
Comment 8 Denis Roy CLA 2010-11-11 09:20:26 EST
> And, I'm not sure this is such a good idea.

It's all just a seasonal thing -- September and October are our busiest months of the year.  We're seeing a return to normalcy, so these redirects will be turned off soon.

I figured it would be better to redirect high-traffic files to stable mirrors with gobs of bandwidth, rather than having our users wait 2 minutes to fetch meta-data files from d.e.o at 27K/sec...
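
The timings reported in comments 6 and 7 can be turned into rough rates for comparison with the 27K/sec figure. A small Python sketch follows; the numbers are the ones quoted in those comments, "K" is assumed to mean kilobytes, and `rate_k_per_sec` is a name made up for this example.

```python
# (label, size in K, seconds) as reported in comments 6 and 7.
samples = [
    ("EMF, Berlin", 558, 18),
    ("EMF Releases, Salzburg", 571, 12),
    ("EMF Milestones, Salzburg", 501, 17),
]


def rate_k_per_sec(size_k, seconds):
    """Average transfer rate for one download."""
    return size_k / seconds


for label, size_k, seconds in samples:
    print(f"{label}: {rate_k_per_sec(size_k, seconds):.1f} K/sec")
```

The three samples come out at about 31.0, 47.6, and 29.5 K/sec, the same order of magnitude as the 27K/sec mentioned above.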
Comment 9 David Williams CLA 2010-11-11 10:47:52 EST
(In reply to comment #8)
> > And, I'm not sure this is such a good idea.
> 
> It's all just a seasonal thing -- September and October are our busiest months
> of the year.  We're seeing a return to normalcy, so these redirects will be
> turned off soon.
> 
> I figured it would be better to redirect high-traffic files to stable mirrors
> with gobs of bandwidth, rather than having our users wait 2 minutes to fetch
> meta-data files from d.e.o at 27K/sec...

Ok, well now that we know it's intentional, and desirable, I guess the next step is to assess if anything in p2 (or our process) needs to change. Maybe not, I'm just asking.  Let's assume for now the 404 error is rare and doesn't need any sort of fix or fallback behavior. That leaves

1) should these meta-data type files be signed? 

2) does p2 handle the relative URLs correctly when compositeContent.jar is fetched from a mirror? 

The first is a general security type question to everyone ... the second is the more important question, and I hope p2 team knows off the top of their head, so I'll "assign" bug to Pascal to help get his attention.
Comment 10 David Williams CLA 2010-11-11 21:39:18 EST
Well, maybe 404 is not so rare after all ... here's a comment from platform newsgroup. This is the third person having issues ... the "workaround" he refers to is going to a non-eclipse site. 


= = = = = = = 

Thanks for the workaround Shane.

I have been having the exact same issue since about Friday.  Here's the output of wget from my command line.

C:\Users\andrew>wget http://download.eclipse.org/releases/helios/compositeContent.jar
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files\gnuwin32/etc/wgetrc
--2010-11-10 10:25:54--  http://download.eclipse.org/releases/helios/compositeContent.jar
Resolving download.eclipse.org... 206.191.52.47
Connecting to download.eclipse.org|206.191.52.47|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.gtlib.gatech.edu/pub/eclipse/releases/helios/compositeContent.jar [following]
--2010-11-10 10:25:55--  http://www.gtlib.gatech.edu/pub/eclipse/releases/helios/compositeContent.jar
Resolving www.gtlib.gatech.edu... 128.61.111.10, 128.61.111.11, 128.61.111.9
Connecting to www.gtlib.gatech.edu|128.61.111.10|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2010-11-10 10:25:56 ERROR 403: Forbidden.


I've literally wasted 3 days on this! 
= = = = =
Comment 11 David Williams CLA 2010-11-12 13:52:24 EST
And, now I am not getting a 404, but several hours after making a change to /releases/indigo/compositeContent.jar I am still getting the "old" version from www.gtlib.gatech.edu

Not good. 

$ wget http://download.eclipse.org/releases/indigo/compositeContent.jar
--2010-11-12 13:43:07--  http://download.eclipse.org/releases/indigo/compositeContent.jar
Resolving download.eclipse.org... 206.191.52.47
Connecting to download.eclipse.org|206.191.52.47|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.gtlib.gatech.edu/pub/eclipse/releases/indigo/compositeContent.jar [following]
--2010-11-12 13:43:07--  http://www.gtlib.gatech.edu/pub/eclipse/releases/indigo/compositeContent.jar
Resolving www.gtlib.gatech.edu... 128.61.111.11, 128.61.111.9, 128.61.111.10, ...
Connecting to www.gtlib.gatech.edu|128.61.111.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 413 [application/x-java-archive]
Saving to: `compositeContent.jar.2'
Comment 12 Denis Roy CLA 2010-11-12 14:39:43 EST
I've removed the redirects.  Ah well, I guess a crawling download is better than a broken one.
Comment 13 David Williams CLA 2010-11-12 14:53:21 EST
(In reply to comment #12)
> I've removed the redirects.  Ah well, I guess a crawling download is better
> than a broken one.

Thank you. Especially for these 8 specific files ... people depend on them for downstream builds, usually immediately after the URL is "made available" and it would be very hard (if not impossible) for them to know they are getting "old" stuff.
Comment 14 Raul Fortes CLA 2010-11-17 14:27:31 EST
I have a problem with a repository:

"Unable to read repository at http://download.eclipse.org/eclipse/updates/3.6.
http://download.eclipse.org/eclipse/updates/3.6 is not a valid repository location."

I'm using 3.6.1 for Linux 64-bit.

Any ideas?

[]'s
Raul
Comment 15 Mario Pierro CLA 2012-01-18 09:13:35 EST
Are the meta-data files still being served from mirrors?

Some of our users were unable to install the plugins because of a missing EMF dependency.

Running wget to fetch compositeContent.jar results in a redirect to the gatech.edu mirror, which seems to be down.

>wget http://download.eclipse.org/releases/helios/compositeContent.jar
--15:05:51--  http://download.eclipse.org/releases/helios/compositeContent.jar
           => `compositeContent.jar'
Resolving download.eclipse.org... 206.191.52.47
Connecting to download.eclipse.org|206.191.52.47|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.gtlib.gatech.edu/pub/eclipse/releases/helios/compositeContent.jar [following]
--15:05:51--  http://www.gtlib.gatech.edu/pub/eclipse/releases/helios/compositeContent.jar
           => `compositeContent.jar'
Resolving www.gtlib.gatech.edu... 128.61.111.9, 128.61.111.10, 128.61.111.11
Connecting to www.gtlib.gatech.edu|128.61.111.9|:80... failed: Connection timed out.
Connecting to www.gtlib.gatech.edu|128.61.111.10|:80... failed: Connection timed out.
Connecting to www.gtlib.gatech.edu|128.61.111.11|:80... failed: Connection timed out.

Any ideas?
Comment 16 Mario Pierro CLA 2012-01-18 09:55:44 EST
I was able to fix this by:

* Disabling the existing http://download.eclipse.org/releases/helios update site (or http://download.eclipse.org/releases/indigo for Eclipse 3.7)

* Adding one of the mirror sites directly, e.g. http://ftp-stud.fht-esslingen.de/pub/Mirrors/eclipse/releases/helios/ (or http://ftp-stud.fht-esslingen.de/pub/Mirrors/eclipse/releases/indigo/ for Eclipse 3.7)

If the download.eclipse.org update site is not disabled, the install will proceed but it will be too slow to be usable. I suppose this is because the broken mirror is still being accessed first, switching to the new mirror after it has timed out.

So, basically it seems that the redirection system in eclipse.org uses mirrors which are not working, and nothing can be done automatically from the client side to prevent this...
Comment 17 David Williams CLA 2012-01-18 13:53:23 EST
According to cross-project posting, current issue was fixed by "I've redirected these to OSU OSL". But ... point of this bug was that metadata itself (those 8 files) should not ever come from mirrors. See comment 12. 

I guess it works most of the time ... but ... that's not really the way it was designed to work and will sometimes fail outright or (maybe worse) appear to work but be out of date. 

FYI, besides the 8 files originally mentioned, p2.index should not come from mirrors either (based on new things I've learned since this was opened).

I understand the reasoning behind redirecting to a mirror, but ... there are risks involved when doing so ... for these 9 file names.
Comment 18 David Williams CLA 2012-02-22 22:32:42 EST
This is about to bite us again. 

While not "publicly released" yet, nor yet tied in to Indigo SR2 -- until Friday at 9 AM ... I wanted to do an early test of platform's "access" for p2.

Hence (knowing the secret location :) I pointed p2 to 

http://download.eclipse.org/eclipse/updates/3.7/R-3.7.2-201202080800/

and received "no repo found". Having learned from experience, I tried 

http://download.eclipse.org/eclipse/updates/3.7/R-3.7.2-201202080800/content.jar 
from a browser and wget and could see this request was being incorrectly redirected to a mirror that did not have that file:

$ wget http://download.eclipse.org/eclipse/updates/3.7/R-3.7.2-201202080800/content.jar
--2012-02-22 22:13:28--  http://download.eclipse.org/eclipse/updates/3.7/R-3.7.2-201202080800/content.jar
Resolving download.eclipse.org... 206.191.52.47
Connecting to download.eclipse.org|206.191.52.47|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://ftp.osuosl.org/pub/eclipse/eclipse/updates/3.7/R-3.7.2-201202080800/content.jar [following]
--2012-02-22 22:13:28--  http://ftp.osuosl.org/pub/eclipse/eclipse/updates/3.7/R-3.7.2-201202080800/content.jar
Resolving ftp.osuosl.org... 64.50.233.100, 64.50.236.52
Connecting to ftp.osuosl.org|64.50.233.100|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-02-22 22:13:28 ERROR 404: Not Found.

Particularly disappointing, since my mini test script shows there are 8 mirrors containing the artifacts

number of http mirrors:    8  for /eclipse/updates/3.7/R-3.7.2-201202080800/

but no way to get to them without the content.jar. 

If this kind of thing happens Friday, users either won't get updates at all or, perhaps, will get "inaccurate" or "partial" updates.
Comment 19 David Williams CLA 2012-02-22 22:33:38 EST
Not sure why this was assigned to Pascal ... seems a "webmasters" problem to solve.
Comment 20 Mario Pierro CLA 2012-02-28 08:29:11 EST
This issue is happening again now for users of our plugins, as an EMF dependency needs to be downloaded from download.eclipse.org when they are installed.

As mentioned in my previous comment, a workaround is to add one of the mirror sites to the list of available update sites - but it requires all download.eclipse.org update sites to be disabled - leaving users with unstable settings once the installation has finished.
Comment 21 David Williams CLA 2013-08-19 15:45:45 EDT
Haven't seen any issues for a while, so will re-close as fixed. 

Be sure to say if others see issues ... or if I am misunderstanding.