
Bug 79212

Summary: Connection is broken during an update
Product: [Eclipse Project] Platform
Reporter: Roman Smirak <smirakr>
Component: Update (deprecated - use Eclipse>Equinox>p2)
Assignee: Platform-Update-Inbox <platform-update-inbox>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: normal
Priority: P3
CC: champion, david_williams, Michal.Tkacz, rk, stmoebius, ws
Version: 3.0
Target Milestone: 3.2 RC7
Hardware: PC
OS: Windows XP
Whiteboard:
Bug Depends on: 144876
Bug Blocks:
Attachments:
Description | Flags
netstat log during an update | none
an error dialog I can see when an update fails | none
netstat since KeepAlive set to Off | none
netstat log about unsuccessful test on linux | none

Description Roman Smirak CLA 2004-11-22 12:31:29 EST
A connection error occurs during an update even though the network and the
update site are fine; moreover, my workstation is disconnected from the
intranet for about 1 minute (i.e. you cannot ping a neighboring machine from
the station). The local network itself works fine (i.e. other stations do not
have any connection problems).

Unfortunately, we do not have an exact case to reproduce the problem or a
description of the problem configuration: the problem has been observed on
different PCs and networks (at our customers' sites).
 
Mode:
- Local network update site:
  1/ Apache httpd (2.0.49-4) + Apache Tomcat (5.0.28), OS: Linux, Windows 2000;
     Apache httpd serves the upgrade downloads
  2/ (another customer => another network & server node) Apache Tomcat
     (4.1.27), OS: Windows 2000
 
Platform:
- Eclipse 3.0
- Java 1.4
- MS Windows 2000 as well as XP
- PC, 10/100MBit Ethernet
Comment 1 Roman Smirak CLA 2004-11-23 04:04:06 EST
Another issue: we have switched to Eclipse 3.0.1 and made some changes in our
code. Now the update spends about 1 minute verifying a random plugin, say
plugin-X. Meanwhile ping returns times below 1 ms with no interruptions, and
processor and network utilization both stay below 1%, on the client as well as
on the server. The download of the next plugin, plugin-Y, then fails with
java.net.SocketTimeoutException: Read timed out. A test with a local update
site (directly from the file system) using the same build of the application
works well, with no pause during the plugin-X verification.
 
Web server:
A single Apache (2.0.52) on the local network; no proxy is used.

Note that the update site contains 138 plugins.
Comment 2 Dorian Birsan CLA 2004-11-23 08:20:03 EST
Roman, what do you mean by "verification"? Is it the string that you see in
the progress monitor ("Verifying blah blah...")?

Because there are many connections being created (one for each plugin),
perhaps there is something in the network layer that can't handle them. What
JRE are you using? (You only mention 1.4, not the exact version and vendor.)

Have you tried a different HTTP server on Windows? How about Apache/Tomcat on
an XP server?
Comment 3 Roman Smirak CLA 2004-11-23 09:59:38 EST
Dorian,

About verification: yes, you are right, it means the string shown in the
progress bar.

About the JVM: 1/ Sun JVM build 1.4.2_05-b04, 2/ (another test) Sun JVM build
1.5.0-b64

About the HTTP server: I've tested issue no. 2 (pause during
verification/installation => SocketTimeoutException) with Apache on Linux, and
it worked!? Updating from Apache on Windows 2000 still doesn't work and still
produces issue no. 2. I cannot report any progress on the first issue, because
it does not occur regularly.

I could try another HTTP server on Windows, but I can't see any reason why a
proven solution like Apache shouldn't work.
Comment 4 Roman Smirak CLA 2004-11-23 13:13:32 EST
Result of the test with Tomcat on Windows: successful. However, I can find
exceptions in catalina.log:

StandardWrapperValve[default]: Servlet.service() for servlet default threw exception
java.net.SocketException: Software caused connection abort: socket write error
 at java.net.SocketOutputStream.socketWrite0(Native Method)
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
 at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
 at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:668)
 at org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:160)
 at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:523)
 at org.apache.coyote.Response.doWrite(Response.java:524)
 at org.apache.coyote.tomcat4.OutputBuffer.realWriteBytes(OutputBuffer.java:384)
 at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:439)
 at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:359)
 at org.apache.coyote.tomcat4.OutputBuffer.writeBytes(OutputBuffer.java:411)
 at org.apache.coyote.tomcat4.OutputBuffer.write(OutputBuffer.java:398)
 at org.apache.coyote.tomcat4.CoyoteOutputStream.write(CoyoteOutputStream.java:110)
 at org.apache.catalina.servlets.DefaultServlet.copyRange(DefaultServlet.java:1996)
 at org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1745)
 at org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:1073)
 at org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:506)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
 at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:98)
 at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:176)
 at java.security.AccessController.doPrivileged(Native Method)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:172)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.valves.CertificatesValve.invoke(CertificatesValve.java:246)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2416)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
 at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
 at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
 at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
 at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:223)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:601)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:392)
 at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:565)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:619)
 at java.lang.Thread.run(Thread.java:534)
Comment 5 Dorian Birsan CLA 2004-11-23 13:24:07 EST
Thanks.
The Apache server works best on Linux (even though it is multiplatform, the
Windows version is probably not as good as IIS).
The error from Tomcat may have been a transient one that the server recovered
from. You may want to post a question or check the archives of Apache's
Jakarta project to get more info on Tomcat errors.

There may be a hidden bug in update, or in the JRE itself (some network
streams not properly closed, etc.), but it is hard to pinpoint what's
happening. I tried updating the entire Eclipse from my local Apache HTTP
server and had no problems. At some point, I was also copying another very
large file into the same DocumentRoot folder of the Apache HTTP server, in
which case the whole local network download became very slow. This is not
related to what you do, but just FYI.
Comment 6 Roman Smirak CLA 2004-11-23 13:35:20 EST
OK, thanks. I have written a summary of the problem below. If you have any
idea what I should do next, or can answer my questions about it, that would be
very helpful.

Summary:
Issue no. 1:
- description: a problem during the update leads to complete disconnection of
  the client (when can the system do that??)
- occurrence: unknown (different networks, different clients)

Issue no. 2:
- description: a problem during installation of some plugin; the download of
  the next plugin then fails with "Read timed out".
- occurrence:
  - it doesn't work with a client on Windows XP (2 stations tested)
  - it doesn't work with platform SDK 3.0 or 3.0.1
  - neither the automatic update nor a manual update (feature by feature) works
  - a local update site (file system) works without any problem
  - it doesn't work with Apache on Windows; it does work with Apache on Linux
    and with Tomcat on Windows (but with "Software caused connection abort:
    socket write error"; Q: when does this kind of error occur?)
  - ping between client and server works 100% of the time (without any break)
  - tested on JVM 1.4 as well as 1.5
  - tested on the local network
  - processor utilization: less than 1%, network utilization: less than 1%
Comment 7 Dorian Birsan CLA 2004-11-24 11:17:42 EST
On issue 2: if feasible, use the os and server (linux/apache) that works, or 
causes the fewest problems

Issue 1: it isn't clear whether the problem still occurs when using 
linux/apache
Comment 8 Roman Smirak CLA 2004-11-24 13:15:54 EST
Dorian,

Unfortunately, issue no. 1 occurred in all tested cases, i.e. with Apache/Linux
as well as Tomcat/Windows. Client in all cases: Windows XP.

News about issue no. 1: it seems there is a problem with too many connections.
After removing one of the handles pointing to \Device\Tcp, ping and other
connections to the local network started to work again.

I'm not familiar with Windows sockets, so I can't give you more detail about
that.

It seems to me there is a problem with the concept of one download connection
per plugin (does it mean 138 plugins => 138 connections?). My question: is
there any way to disable that (=> only one connection for all)? If not, is
there any chance to change it simply in the source code?
Comment 9 Dorian Birsan CLA 2004-11-24 13:29:17 EST
All these connections should be happening in a serial fashion. I wonder if
anything is left open, so that the network layer does not release those
connections. I will investigate this.
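The serial, release-everything pattern described in comment 9 can be sketched with plain java.net.URLConnection. This is a minimal sketch, not the Update Manager's actual download code: the class name SerialFetcher and both timeout values are hypothetical. Each URL is opened only after the previous stream has been read to the end and closed, so at most one connection is in use at a time. For HTTP, a fully read and closed response also lets the JDK put the socket back into its keep-alive cache, instead of leaving an open socket behind that can end up in CLOSE_WAIT once the server closes its side. The sketch uses Java 7+ syntax, so it would not compile on the 1.4 JVMs mentioned in this report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: downloads URLs strictly one after another. */
class SerialFetcher {
    // Assumed values for illustration; not taken from the Update Manager.
    private static final int CONNECT_TIMEOUT_MS = 10_000;
    private static final int READ_TIMEOUT_MS = 30_000;

    /** Returns the number of bytes read from each URL, in input order. */
    static List<Long> fetchAll(List<URL> urls) throws IOException {
        List<Long> sizes = new ArrayList<>();
        for (URL url : urls) {
            URLConnection conn = url.openConnection();
            conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // give up if the connection is not established in time
            conn.setReadTimeout(READ_TIMEOUT_MS);       // a stalled read throws SocketTimeoutException
            // try-with-resources closes the stream even when reading fails,
            // so nothing is left open before the next URL is requested
            try (InputStream in = conn.getInputStream()) {
                sizes.add(drain(in));
            }
        }
        return sizes;
    }

    /** Reads the stream to its end so an HTTP connection can be reused or released. */
    private static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }
}
```

The read timeout is what turns a stalled read into java.net.SocketTimeoutException: Read timed out, the exception reported in comment 1, rather than an indefinite hang.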
Comment 10 Roman Smirak CLA 2004-11-25 05:21:48 EST
Created attachment 16124
netstat log during an update

Here you can see the output of the netstat tool, which I ran at different
stages of an update.
Comment 11 Roman Smirak CLA 2004-11-25 05:24:18 EST
Created attachment 16125
an error dialog I can see when an update fails
Comment 12 Dorian Birsan CLA 2004-11-25 08:54:19 EST
Thanks for the extra info, that should help.
You may also experiment with the HTTP server; servers usually have settings for
the number of connections allowed.
Some download code in update runs in threads, but only to make cancellation
possible, in which case the connection dies by itself. I may need to revisit
that code to see if it can cause your problems.
I would also recommend trying various JDKs, to see if things improve.
Comment 13 Roman Smirak CLA 2004-11-25 09:16:22 EST
I did; I played with parameters such as the number of connections, with no
effect. Then I changed KeepAlive from On to Off, and the result was: 1/ the
connections that were earlier in state ESTABLISHED were now in state CLOSE_WAIT
(which represents waiting for a connection termination request from the local
user), but they stayed there until the application was closed; 2/ the error
occurred later than with the previous setup.

I'm also worried about connections staying in state FIN_WAIT_2 for a long time,
which you can see in the previous log as well.

So it seems to me there is a really hard-core problem at the low network level
with connection finalization.

I'm going to send another log.
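The diagnosis above comes down to counting sockets per TCP state (ESTABLISHED, CLOSE_WAIT, FIN_WAIT_2) at different stages of the update. A small sketch of that bookkeeping in Java; NetstatTally is a hypothetical helper, not part of Eclipse, and it assumes the Windows `netstat -an` column layout (protocol, local address, foreign address, state). Passing a foreign-port suffix such as ":80" restricts the count to connections to the update server.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical diagnostic helper: tallies TCP socket states in saved netstat output. */
class NetstatTally {
    /**
     * Counts TCP sockets per state. Assumes Windows "netstat -an" rows of the form
     * "TCP  local-address  foreign-address  STATE". If foreignPortSuffix is non-null
     * (e.g. ":80"), only sockets whose foreign address ends with it are counted.
     */
    static Map<String, Integer> countStates(String netstatOutput, String foreignPortSuffix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // skip headers, blank lines and UDP rows (which have no state column)
            if (cols.length != 4 || !cols[0].equals("TCP")) {
                continue;
            }
            if (foreignPortSuffix != null && !cols[2].endsWith(foreignPortSuffix)) {
                continue;
            }
            counts.merge(cols[3], 1, Integer::sum);
        }
        return counts;
    }
}
```

Comparing snapshots taken before, during and after an update would show whether CLOSE_WAIT entries keep accumulating, which would point at a client that never closes its side of the connections.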
Comment 14 Roman Smirak CLA 2004-11-25 09:18:31 EST
Created attachment 16130
netstat since KeepAlive set to Off
Comment 15 Dorian Birsan CLA 2004-11-25 09:28:20 EST
I assume you're running with the latest fix packs, right?
(or at least as far as networking fixes go).
Comment 16 Roman Smirak CLA 2004-11-25 09:30:26 EST
Do you mean Windows service packs, JVM fixes, or something else?
Comment 17 Dorian Birsan CLA 2004-11-25 09:35:27 EST
sorry, meant OS fixes
Comment 18 Roman Smirak CLA 2004-11-25 09:39:15 EST
Yes, I do have the latest pack; my current OS: Microsoft Windows XP - Professional,
Version 2002, Service Pack 2.
Comment 19 Dorian Birsan CLA 2004-11-25 09:46:55 EST
Given all the discussion around Service Pack 2 for XP, I would have blamed
that, but you see the problem on Win2k too, so there must be something else.
I wouldn't rule out bugs in the update code, but so far this is the first time
we've heard of this problem.
Anyway, I will investigate more, but there are some competing issues that I 
also need to look at, so don't expect a very quick resolution.
thanks!
Comment 20 Roman Smirak CLA 2004-11-25 11:39:13 EST
Tested on Linux + Sun JRE build 1.4.2_04-b05:
1/ Client: Eclipse 3.0, Linux 2.4.20-8 (Red Hat Linux 3.2.2-5), Server: Windows
XP SP1, Apache 2.0.52 => same error (see attached netstat log)
2/ Client: Eclipse 3.0, Linux 2.4.20-8 (Red Hat Linux 3.2.2-5), Server: Linux
2.4.20-8 (Red Hat Linux 3.2.2-5), Apache HTTPD 2.0.49-4 => SUCCESSFUL
Comment 21 Roman Smirak CLA 2004-11-25 11:40:23 EST
Created attachment 16133
netstat log about unsuccessful test on linux
Comment 22 Dorian Birsan CLA 2004-11-25 11:55:29 EST
Without dragging this out for too long: so far it looks like the HTTP server
does not work well on Windows, but the update client needs a bit more
investigation to check that it closes its connections.

Did some searches and found a few interesting docs:
http://www.apache.org/dist/httpd/binaries/win32/README.html#xpbug
http://www.auburn.edu/docs/apache/misc/fin_wait_2.html
Comment 23 Stefan Moebius CLA 2005-07-07 09:17:25 EDT
I'm also seeing this when trying to install official eclipse.org plugins (like
EMF). I'm running Eclipse 3.1.0 on WinXP/SP2. I tried different mirrors
(SunSITE, eclipse.org, 100MB mirror) to no avail. The update process hangs at
some random point into verifying the downloads.

Is there anything I can do to help track this down?
Comment 24 Michael Reitz CLA 2005-07-10 08:42:57 EDT
Hi

I noticed the same bug after installing Eclipse 3.1. Eclipse 3.0 went fine, but
now with Eclipse 3.1 when I want to install new features through the update
manager, my complete network connection hangs after a certain time. That means
that my pc can't connect to anything in the web or the local lan.

The difference from the original reporter's problem is that it happens when I'm
updating official features like EMF from the official update site. A workaround
is to update only one feature at a time, but this works only for small features,
not for EMF. When the network connection is broken, I have to kill Eclipse, and
then the connection returns.

Facts:
Eclipse 3.1
Java(TM) 2 Platform Standard Edition 5.0 Update 2
WinXP with SP2

Yours sincerely
Michael Reitz
Comment 25 Eric Anderson CLA 2005-11-19 14:38:13 EST
I can also confirm this bug on Windows XP SP2 when searching for updates for all
features. The network stops working about one minute after starting the update
process. The connection returns when Eclipse exits. Both local area
connections (to shares on local Windows XP and Linux computers) and the Internet
are lost during this period.

Using Eclipse 3.1.1, Windows XP Service Pack 2, JDK 1.5.0_05. Ethernet 
controller is a Gigabit Marvell Yukon 88E8001/8003/8010 with latest manufacturer 
drivers. Same behavior occurs with Gigabit nVidia nForce NIC. 

I recently bought a new computer; I never had this problem on my old machine. I
have attached my computer's full specs in case it helps.

I think, as mentioned before, this is a problem with too many concurrent open
connections. Windows XP SP2 restricts the number of TCP connections per second
to 10. If I recall correctly, I had installed an unofficial patch on my old
computer that restored the SP1 number of connections per second. I believe this
was before I started using Eclipse (3.1 betas) on that machine.

System
======
Windows XP Professional Service Pack 2 (build 2600)

2.20 gigahertz AMD Athlon64 X2 4400+
128 kilobyte primary memory cache
1024 kilobyte secondary memory cache

ASUSTeK Computer INC. A8N-SLI DELUXE 1.XX
Bus Clock: 200 megahertz
BIOS: Phoenix Technologies, LTD ASUS A8N-SLI DELUXE ACPI BIOS Revision 1013 07/26/2005

2048 Megabytes PC3200 in Dual Channel

250.05 gigabyte Seagate ST3250823AS HDD

_NEC DVD_RW ND-3540A [CD-ROM drive]
LITE-ON DVD SOHD-16P9S [CD-ROM drive]
3.5" format removable media [Floppy drive]

NVIDIA GeForce 6600 GT [Display adapter]
DELL M770 [Monitor] (14.9"vis, s/n 1780RH3SF024, January 2000)

Marvell Yukon 88E8001/8003/8010 PCI Gigabit Ethernet Controller
IP Address: 192.168.1.108
Gateway: 192.168.1.1
Dhcp Server: 192.168.1.1

NVIDIA nForce Networking Controller (not used)

Virus Protection
================
avast! antivirus 4.6.731 [VPS 0546-4] Version 4.6.731
    Realtime File Scanning On
Comment 26 KP CLA 2006-04-14 08:13:58 EDT
I have exactly the same problem

Eclipse Version: 3.1.2
Build id: M20060118-1600

on Windows XP

On Linux, however, it worked. The connection is lost at a predictable point (during verification of some RAR for some updates, and during download for others). It does the same thing on different servers at the same point.
Comment 27 Branko Tripkovic CLA 2006-06-22 16:06:22 EDT
please try this with 3.2 rc7 or later, we released some major improvements in RC7 that probably fixed your problem. 
Comment 28 Eric Anderson CLA 2006-06-22 18:28:12 EDT
(In reply to comment #27)
> please try this with 3.2 rc7 or later, we released some major improvements in
> RC7 that probably fixed your problem. 
> 

Yes, I no longer encounter this problem. Thanks for fixing it!
Comment 29 Branko Tripkovic CLA 2006-06-23 01:26:18 EDT

*** This bug has been marked as a duplicate of 144876 ***
Comment 30 Wolfgang Schell CLA 2006-10-12 08:11:34 EDT
I can also confirm this bug on Windows XP SP2 when performing auto updates using Eclipse SDK 3.2. The network stops working for at least one minute, sometimes longer.

Interestingly, SSH sessions seem to survive as long as there is no actual traffic going on (dormant shell session).

After reading comment #25 (https://bugs.eclipse.org/bugs/show_bug.cgi?id=79212#c25) in this bug report, I tried to diagnose something along these lines.

netstat showed that Eclipse had 25 connections open to our web proxy, with the 25th remaining in SYN_SENT state.

After disabling "Deterministic Network Enhancer" and "QoS Packet Scheduler" in the network connection's properties dialog, Eclipse was able to open at least 50 connections (probably as many as there were feature/plugin jars mentioned in the site.xml for timestamp checks or something) and everything worked like a charm. 

I can't tell which of the two services/protocols/drivers blocks the connection attempts, as I didn't have time to rule one of them out, but at least it works now. I had already thought I had a hardware failure, phew...
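Comments 25 and 30 both point at a limit somewhere in the Windows network stack on how many connections can be opened at once (comment 25 blames the XP SP2 connection limit, comment 30 the "Deterministic Network Enhancer" or "QoS Packet Scheduler"). A client can stay clear of that kind of limit by bounding how many downloads are in flight. The sketch below does that with java.util.concurrent.Semaphore; BoundedDownloader, maxOpen and threads are hypothetical names, and this is not a description of the fix that shipped in 3.2 RC7 (tracked as bug 144876).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Hypothetical runner: never more than maxOpen tasks (connections) in flight. */
class BoundedDownloader {

    static <T> List<T> runAll(List<Callable<T>> tasks, int maxOpen, int threads)
            throws Exception {
        Semaphore permits = new Semaphore(maxOpen);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(pool.submit(() -> {
                    permits.acquire();          // wait while maxOpen tasks are active
                    try {
                        return task.call();     // open, read and close one connection here
                    } finally {
                        permits.release();      // always hand the slot back, even on failure
                    }
                }));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get());           // keeps input order; rethrows a task's failure
            }
            return results;
        } finally {
            pool.shutdown();                    // let the worker threads exit
        }
    }
}
```

A fixed pool of maxOpen threads would give the same cap on its own; the explicit semaphore keeps the connection limit independent of the pool size, for example when the same threads also do non-network work such as verifying downloaded JARs.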