Bug 84103

Summary: Many duplicate HTTP GETs during update operations
Product: [Eclipse Project] Platform
Reporter: Gordon Hirsch <gordon.hirsch>
Component: Update (deprecated - use Eclipse>Equinox>p2)
Assignee: Platform-Update-Inbox <platform-update-inbox>
Status: RESOLVED FIXED
QA Contact:
Severity: major
Priority: P2
CC: francois, Joel.Kamentz, pombredanne
Version: 3.1.1
Target Milestone: 3.2 RC7
Hardware: PC
OS: Windows XP
Whiteboard:
Bug Depends on: 144876
Bug Blocks:
Attachments:
Description                 Flags
Proposed Patch for 84103    none
modified patch              none

Description Gordon Hirsch CLA 2005-01-31 18:47:52 EST
While experimenting with my own update site, I noticed that there were many
duplicated HTTP GETs reported in my web server's access log. For example, here's
a small section of my Tomcat log:

10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site HTTP/1.1" 302 -
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site/ HTTP/1.1" 200 2497
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site HTTP/1.1" 302 -
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site/ HTTP/1.1" 200 2497
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site/site.xml HTTP/1.1" 200 986
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site/site.xml HTTP/1.1" 200 986
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site HTTP/1.1" 302 -
10.25.11.12 - - [31/Jan/2005:16:12:21 -0500] "GET /update-site/ HTTP/1.1" 200 2497
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site HTTP/1.1" 302 -
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/ HTTP/1.1" 200 2497
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/features/com.sas.test.simplemenufeature_1.0.0.jar HTTP/1.1" 200 463
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/features/com.sas.test.simplemenufeature_1.0.0.jar HTTP/1.1" 200 463
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/features/com.sas.test.simplemenufeature_1.0.1.jar HTTP/1.1" 200 463
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/features/com.sas.test.simplemenufeature_1.0.1.jar HTTP/1.1" 200 463
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/features/com.sas.test.simplemenufeature_1.0.2.jar HTTP/1.1" 200 461
127.0.0.1 - - [31/Jan/2005:16:13:24 -0500] "GET /update-site/features/com.sas.test.simplemenufeature_1.0.2.jar HTTP/1.1" 200 461

The duplication seems to be due to the way the
org.eclipse.update.internal.core.HttpResponse class handles connections.
The getInputStream() method correctly checks for a pre-existing InputStream
(in), but it does not check for a pre-existing URLConnection (connection). Much
of the update manager code that calls getInputStream() calls getStatusCode()
(or maybe getStatusMessage()) first. The first URLConnection is established
there, but getInputStream() ignores it and opens a second one.
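
A minimal Java sketch of the reuse described above, under stated assumptions: ReusingResponse, CountingHandler, and ReuseDemo are hypothetical names, and this is not the actual org.eclipse.update.internal.core.HttpResponse source or the attached patch. The status check and the stream read share one lazily opened URLConnection. A counting URLStreamHandler stands in for the server so the example runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

/** Hypothetical stand-in for the response class; not the Eclipse source. */
class ReusingResponse {
    private final URL url;
    private URLConnection connection; // opened at most once, then shared
    private InputStream in;

    ReusingResponse(URL url) {
        this.url = url;
    }

    private URLConnection connection() throws IOException {
        if (connection == null) {
            connection = url.openConnection();
        }
        return connection;
    }

    int getStatusCode() throws IOException {
        return ((HttpURLConnection) connection()).getResponseCode();
    }

    InputStream getInputStream() throws IOException {
        if (in == null) {
            in = connection().getInputStream(); // same connection as the status check
        }
        return in;
    }
}

/** Test double (assumption): counts opened connections without touching the network. */
class CountingHandler extends URLStreamHandler {
    int opens;

    @Override
    protected URLConnection openConnection(URL u) {
        opens++;
        return new HttpURLConnection(u) {
            @Override public void connect() {}
            @Override public void disconnect() {}
            @Override public boolean usingProxy() { return false; }
            @Override public int getResponseCode() { return HTTP_OK; }
            @Override public InputStream getInputStream() {
                return new ByteArrayInputStream(new byte[] {1, 2, 3});
            }
        };
    }
}

class ReuseDemo {
    /** Calls getStatusCode() and then getInputStream(); returns the number of connections opened. */
    static int countOpens() {
        try {
            CountingHandler handler = new CountingHandler();
            // This URL constructor is deprecated in newer JDKs but still available.
            URL url = new URL(null, "http://example.invalid/update-site/site.xml", handler);
            ReusingResponse response = new ReusingResponse(url);
            response.getStatusCode();
            response.getInputStream().close();
            return handler.opens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connections opened: " + countOpens());
    }
}
```

With the connection cached, the status check and the response data come from the same request, which addresses both the extra load and the correctness concern.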

Obviously, the extra GETs are significant in a production environment because
they will noticeably increase load on the server. But there is also a problem
with correctness. Status code checking is done with the first connection, but
the response data comes from the second connection. The second connection might
have failed for some reason, and that failure won't be caught. 

I saw this with 3.1M4, but the code seems the same in the latest integration build.
Comment 1 Dorian Birsan CLA 2005-02-01 14:27:21 EST
Yes, there has been some discussion on a similar issue in another bug.
I think this should be looked at.
Comment 2 Gordon Hirsch CLA 2005-02-01 16:04:42 EST
Created attachment 17612 [details]
Proposed Patch for 84103

Uses existing connection to avoid duplicated GETs
Comment 3 Gordon Hirsch CLA 2005-02-01 16:10:46 EST
Was 63932 the similar bug you were referring to? It would be great if some of
the redundancies described there could be reduced. I'm not sure I described this
problem well above, but it is a lower-level problem that causes almost every GET
to be issued twice, consecutively. I've attached a very simple patch that seems
to fix the problem. I think it is worth looking at independently of the more
complicated issues. 
Comment 4 Dorian Birsan CLA 2005-02-01 16:21:57 EST
Created attachment 17613 [details]
modified patch

Thanks Gordon. I have modified the patch to set the connection to null after a
disconnect, and also do it for OtherResponse type.
Comment 5 Dorian Birsan CLA 2005-02-01 16:33:02 EST
No, not that bug (unless there is a typo in the bug number you entered :-)
There are a couple of bugs open regarding networking problems, and one of
them in particular talks about closing connections when opening fails (but
this can't be done, at least using the Java APIs). I'll dig up the bug later...
Comment 6 Dorian Birsan CLA 2005-02-01 16:41:54 EST
fixed as per my modified patch.
Thanks Gordon.
Comment 7 Gordon Hirsch CLA 2005-02-01 16:46:34 EST
OK, the problem with closing connections sounds like
https://bugs.eclipse.org/bugs/show_bug.cgi?id=81967.

Thanks for the quick resolution. 
Comment 8 Joel Kamentz CLA 2005-12-13 18:39:53 EST
I am having a problem with repeated GETs for a jar which has already been downloaded.  I am re-opening this bug, which seems the best match, but it is potentially also related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=81967.

To some extent, there are two problems.  The overriding problem is that the update manager downloads a plugin and then hits the web server _again_ just to check the last modified timestamp.  Using the If-Modified-Since HTTP header and generally remembering the timestamp the first time around is the way to go.
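
A rough Java sketch of the conditional-request idea, assuming the timestamp from the first download has been kept. ConditionalCheck, FakeServerHandler, and ConditionalCheckDemo are hypothetical names and not update manager code; the fake handler only imitates a server that answers 304 when the resource is unchanged, so the example runs without network access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class ConditionalCheck {
    /** Returns true when the server reports a change after knownTimestamp (epoch millis). */
    static boolean changedSince(URL url, long knownTimestamp) throws IOException {
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setIfModifiedSince(knownTimestamp); // sent as the If-Modified-Since request header
        return c.getResponseCode() != HttpURLConnection.HTTP_NOT_MODIFIED; // 304 means unchanged
    }
}

/** Test double (assumption): answers 304 or 200 the way a server would, without a network. */
class FakeServerHandler extends URLStreamHandler {
    final long serverLastModified;

    FakeServerHandler(long serverLastModified) {
        this.serverLastModified = serverLastModified;
    }

    @Override
    protected URLConnection openConnection(URL u) {
        return new HttpURLConnection(u) {
            @Override public void connect() {}
            @Override public void disconnect() {}
            @Override public boolean usingProxy() { return false; }
            @Override public int getResponseCode() {
                return getIfModifiedSince() >= serverLastModified ? HTTP_NOT_MODIFIED : HTTP_OK;
            }
        };
    }
}

class ConditionalCheckDemo {
    static boolean changed(long serverLastModified, long knownTimestamp) {
        try {
            // This URL constructor is deprecated in newer JDKs but still available.
            URL url = new URL(null, "http://example.invalid/plugins/example_1.0.0.jar",
                    new FakeServerHandler(serverLastModified));
            return ConditionalCheck.changedSince(url, knownTimestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unchanged jar reported as changed: " + changed(1000L, 1000L));
        System.out.println("newer jar reported as changed: " + changed(2000L, 1000L));
    }
}
```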

A sub-problem is that the timestamp request uses GET (rather than HEAD) and doesn't read the full response or otherwise close the connection.  The _http_ response isn't using the fact that the connection is an _http_ connection!  As for the solution, the best fix would be to get rid of the redundant request.  However, in general, HttpURLConnection.disconnect() is probably too heavy-handed, as it may prematurely disconnect an otherwise reusable connection to the server.  Using the HEAD method for mere timestamp checks would be better.

This may sound somewhat trivial, but it really bogs down an update site we have running on Apache.  The Eclipse update manager fetches a bunch of plugins to temp files.  Then, as part of jar validation, it hits the server again for each jar file using the GET method.  However, it doesn't actually read the file contents which are sent back.  Because HTTP 1.1 connection re-use is involved, this causes everything (the update process within Eclipse) to hang until the server times out on that request.  Once that happens, the server responds to the next request, tries to send the file contents which the Eclipse Update Manager isn't paying attention to, and it all happens again.

The update site in question has around 175 plugins totalling over 115 MB.  On our LAN, the update manager performs the actual download in maybe 2 minutes.  Then, because of this problem, verifying the downloaded jars takes HOURS!

This is with Eclipse 3.1.1.  I've tried it with Java 1.5.0_06 and Java 1.4.2_10.

Besides setting the request method to HEAD, another way to avoid this particular problem is to call getInputStream on the connection and then immediately call close() on the stream.  But, using HEAD would be better, and getting rid of the extra network traffic better still.
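
A small Java sketch of the HEAD-based timestamp check, offered as an illustration only. HeadTimestamp, RecordingHandler, and HeadDemo are hypothetical names and not update manager code; the recording handler is a test double that serves a fixed Last-Modified header so the example runs without network access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class HeadTimestamp {
    /** Requests headers only and returns Last-Modified in epoch millis (0 if absent). */
    static long lastModified(URL url) throws IOException {
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setRequestMethod("HEAD"); // a HEAD response carries no body to read or discard
        return c.getLastModified();
    }
}

/** Test double (assumption): records the request method and serves a fixed header. */
class RecordingHandler extends URLStreamHandler {
    String lastMethod;

    @Override
    protected URLConnection openConnection(URL u) {
        return new HttpURLConnection(u) {
            @Override public void connect() {}
            @Override public void disconnect() {}
            @Override public boolean usingProxy() { return false; }
            @Override public String getHeaderField(String name) {
                lastMethod = getRequestMethod(); // what a real connection would have sent
                return "Last-Modified".equalsIgnoreCase(name)
                        ? "Thu, 01 Jan 2004 00:00:00 GMT" : null;
            }
        };
    }
}

class HeadDemo {
    /** Returns the request method used, a space, and the timestamp that was read. */
    static String run() {
        try {
            RecordingHandler handler = new RecordingHandler();
            // This URL constructor is deprecated in newer JDKs but still available.
            URL url = new URL(null, "http://example.invalid/plugins/example_1.0.0.jar", handler);
            long timestamp = HeadTimestamp.lastModified(url);
            return handler.lastMethod + " " + timestamp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because no response body is sent for HEAD, there is nothing left unread on a kept-alive connection, which is the stall described above.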

[I apparently cannot re-open this bug (or change it to Eclipse 3.1.1), as I'm not the original submitter.  Am I supposed to create a dup bug?  I'll try asking Gordon to re-open this.]
Comment 9 Gordon Hirsch CLA 2005-12-14 11:18:25 EST
Re-opening per Joel's request. 
Comment 10 Branko Tripkovic CLA 2006-06-22 16:55:05 EDT
fixed as part of the fix for bug 144876