| Summary: | Connection is broken during an update | | |
|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Roman Smirak <smirakr> |
| Component: | Update (deprecated - use Eclipse>Equinox>p2) | Assignee: | Platform-Update-Inbox <platform-update-inbox> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | champion, david_williams, Michal.Tkacz, rk, stmoebius, ws |
| Version: | 3.0 | | |
| Target Milestone: | 3.2 RC7 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Bug Depends on: | 144876 | | |
| Bug Blocks: | | | |
| Attachments: | | | |

Description
Roman Smirak
Another issue: we have switched to Eclipse 3.0.1 and made some changes in our code. Now the update spends about 1 minute verifying a random plugin, let's say plugin-X. During that pause:
- ping returns times of less than 1 ms, with no interruption
- processor utilization is less than 1%
- network utilization is less than 1%, on the client as well as on the server side

The download of the next plugin, plugin-Y, then fails with: java.net.SocketTimeoutException: Read timed out.

Results from a test with a local update site (directly from the filesystem) using the same build of the application: it works well, with no pause during the plugin-X verification.

Web server: a single Apache (2.0.52) on the local network; no proxy is used. Note that the update site contains 138 plugins.

Roman, what do you mean by "verification"? Is it the string that you see in the progress monitor ("Verifying blah blah...")?
Because there are many connections being created (one for each plugin), perhaps there is something in the network layer that can't handle it. What JRE are you using? (You only mention 1.4, but not the exact version and provider.)
Have you tried a different HTTP server for Windows? How about Apache/Tomcat on an XP server?

Dorian,

About verification: yes, you are right, it means the string shown in the progress bar.

About the JVM:
1/ Sun JVM build 1.4.2_05-b04
2/ (other test) Sun JVM build 1.5.0-b64

About the HTTP server: I've tested issue no. 2 (pause during verification/installation => SocketTimeoutException) with Apache on Linux, successfully!? Note that updating using Apache on Win2000 still doesn't work and still produces issue no. 2. Note that I cannot report any progress on the first issue, because it does not occur regularly. I should try another HTTP server for Windows, but I can't see any reason why a proven solution like Apache doesn't work.

Result of the test with Tomcat on Windows: successful. But I can find exceptions in catalina.log:

StandardWrapperValve[default]: Servlet.service() for servlet default threw exception
java.net.SocketException: Software caused connection abort: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:668)
    at org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:160)
    at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:523)
    at org.apache.coyote.Response.doWrite(Response.java:524)
    at org.apache.coyote.tomcat4.OutputBuffer.realWriteBytes(OutputBuffer.java:384)
    at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:439)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:359)
    at org.apache.coyote.tomcat4.OutputBuffer.writeBytes(OutputBuffer.java:411)
    at org.apache.coyote.tomcat4.OutputBuffer.write(OutputBuffer.java:398)
    at org.apache.coyote.tomcat4.CoyoteOutputStream.write(CoyoteOutputStream.java:110)
    at org.apache.catalina.servlets.DefaultServlet.copyRange(DefaultServlet.java:1996)
    at org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1745)
    at org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:1073)
    at org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:506)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
    at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:98)
    at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:176)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:172)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.valves.CertificatesValve.invoke(CertificatesValve.java:246)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2416)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:223)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:601)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:392)
    at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:565)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:619)
    at java.lang.Thread.run(Thread.java:534)

Thanks. The Apache server works best on Linux (even though it is multiplatform, the Windows version is probably not as good as IIS). The error from Tomcat may be something that happened, but the server recovered. You may want to post a question or check the archives at Apache's Jakarta project to get more info on Tomcat errors. There may be a hidden bug in update, or in the JRE itself (some network streams not properly closed, etc.), but it is hard to pinpoint what's happening. I tried updating the entire Eclipse from my local Apache HTTP server and I had no problems.
At some point, I was also copying another very large file into the same DocumentRoot folder for the Apache HTTP server, in which case the whole local network download became very slow. This is not related to what you do, but just FYI.

OK, thanks. I have written a summary of the problem below. If you have any idea what I should do next, or if you can answer my questions about it, that would be very helpful.

Summary:

Issue no. 1:
- description: problem during update leading to complete disconnection of a client (when can the system do that??)
- occurrence: unknown (different networks, different clients)

Issue no. 2:
- description: problem during installation of some plugin; download of the next plugin fails with Read timed out.
- occurrence:
  - it doesn't work with a client on Windows XP (2 stations tested)
  - it doesn't work on platform SDK 3.0 as well as 3.0.1
  - automatic update as well as update made by hand (feature-by-feature) doesn't work
  - local update site (file system) does work without any problem
  - it doesn't work with Apache on Windows; it does work with Apache on Linux and Tomcat on Windows (but: Software caused connection abort: socket write error; Q: when does such an error occur?)
  - ping between client and server works 100% of the time (without any break)
  - tested on JVM 1.4 as well as 1.5
  - tested on local network
  - processor utilization: less than 1%, network utilization: less than 1%

On issue 2: if feasible, use the OS and server (Linux/Apache) that works, or causes the fewest problems.
Issue 1: it isn't clear whether the problem still occurs when using Linux/Apache.

Dorian, issue no. 1 occurred in all tested cases, unfortunately, i.e. Apache/Linux as well as Tomcat/Windows. Client for all cases: Windows XP.

News about issue no. 1: it seems there is a problem with too many connections. After removing one of the handles pointing to \Device\Tcp, ping and other connections to the local network started to work.
I'm not familiar with Windows sockets, so I can't give you detailed information about that. It seems to me there is a problem with the concept: a download connection per plugin (does it mean 138 plugins => 138 connections?). My question: is there any possibility to disable that (=> only one connection for all)? If not, is there any chance to change it simply in the source code?

All these connections should be happening in a serial fashion. I wonder if anything is left open, so that the network layer does not release those connections. I will investigate this.

Created attachment 16124 [details]
netstat log during an update
Here you can see the output of the netstat tool, which I ran at different stages
of an update
Created attachment 16125 [details]
an error dialog I can see when an update fails
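
For reference, a minimal sketch of what "serial downloads with nothing left open" could look like in Java. This is not the Eclipse update code: the class `SerialDownloader`, its method names and the timeout values are hypothetical, chosen only for illustration. It uses `java.net.HttpURLConnection` from the JDK and needs Java 7 or later (try-with-resources), which is newer than the 1.4/1.5 JVMs tested in this report.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

// Hypothetical illustration only; not taken from org.eclipse.update.
public class SerialDownloader {

    /** Copies the stream to the end and returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    /** Downloads one resource and releases the connection before returning. */
    public static byte[] fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(10_000); // assumed values, in milliseconds
        conn.setReadTimeout(30_000);
        // try-with-resources closes the response stream even if the copy fails
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(in, out);
            return out.toByteArray();
        } finally {
            // Signals that no further requests are expected; this may close
            // the underlying socket instead of keeping it alive for reuse.
            conn.disconnect();
        }
    }

    /** Fetches the resources strictly one after another. */
    public static long fetchAll(List<URL> urls) throws IOException {
        long totalBytes = 0;
        for (URL url : urls) {
            totalBytes += fetch(url).length;
        }
        return totalBytes;
    }
}
```

With this shape, at most one connection per download is in use at any time, and each one is closed from the client side before the next request starts. Whether that alone avoids the CLOSE_WAIT and FIN_WAIT_2 build-up discussed in this report is not something the sketch can show.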
Thanks for the extra info, that should help. You may also play with the HTTP server settings; servers usually have settings for the number of connections allowed. Some download code in update is done via threads, but only to be able to cancel, in which case the connection dies by itself. I may need to revisit that code to see if it can cause your problems. I would also recommend trying various JDKs, to see if things improve.

I did it: I played with params like the number of connections, with no effect. Then I changed the param KeepAlive On -> Off and the result was:
1/ the connections that were earlier in state ESTABLISHED were then in state CLOSE_WAIT (which represents waiting for a connection termination request from the local user), but they stay there until the application closes
2/ the error occurred later than with the previous setup.
I'm worried about connections staying for a long time in state FIN_WAIT_2, which you can see in the previous log as well. So it seems to me there is a really hard-core problem at the low network level with connection finalization. I'm going to send another log.

Created attachment 16130 [details]
netstat since KeepAlive set to Off
I assume you're running with the latest fix packs, right? (Or at least as far as networking fixes go.)

Do you mean Windows service packs or JVM fixes or?

Sorry, I meant OS fixes.

Yes, I do have the latest pack. My current OS: Microsoft Windows XP - Professional, Version 2002, Service Pack 2.

Given all the discussions around Service Pack 2 for XP, I would have blamed that, but you see the problem on Win2k, so there must be something else. I wouldn't rule out bugs in the update code, but so far this is the first time we've heard this problem reported. Anyway, I will investigate more, but there are some competing issues that I also need to look at, so don't expect a very quick resolution.

Thanks!

Tested on Linux + Sun JRE build 1.4.2_04-b05:
1/ Client: Eclipse 3.0, Linux 2.4.20-8 (Red Hat Linux 3.2.2-5); Server: Windows XP SP1, Apache 2.0.52 => same error (see attached netstat log)
2/ Client: Eclipse 3.0, Linux 2.4.20-8 (Red Hat Linux 3.2.2-5); Server: Linux 2.4.20-8 (Red Hat Linux 3.2.2-5), Apache HTTPD 2.0.49-4 => SUCCESSFUL

Created attachment 16133 [details]
netstat log about unsuccessful test on linux
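
To compare netstat logs like the ones attached to this report, it can help to count connections per TCP state (ESTABLISHED, CLOSE_WAIT, FIN_WAIT_2 and so on). Below is a rough sketch, assuming the four-column layout printed by Windows `netstat -an` (Proto, Local Address, Foreign Address, State). The class name `NetstatStateCount` is hypothetical, and the sample lines in `main` are invented for illustration; they are not taken from the attachments. It needs Java 8 or later.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for summarising netstat output; assumes the
// Windows "netstat -an" column layout: Proto, Local, Foreign, State.
public class NetstatStateCount {

    /** Returns the number of TCP connections per state, sorted by state name. */
    public static Map<String, Integer> countStates(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, blank lines and UDP rows (UDP has no state column)
            if (cols.length >= 4 && cols[0].equalsIgnoreCase("TCP")) {
                counts.merge(cols[3], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample data, for illustration only
        String sample = String.join("\n",
            "  Proto  Local Address          Foreign Address        State",
            "  TCP    10.0.0.2:1045          10.0.0.1:80            ESTABLISHED",
            "  TCP    10.0.0.2:1046          10.0.0.1:80            CLOSE_WAIT",
            "  TCP    10.0.0.2:1047          10.0.0.1:80            CLOSE_WAIT",
            "  TCP    10.0.0.2:1048          10.0.0.1:80            FIN_WAIT_2");
        System.out.println(countStates(sample));
    }
}
```

Running this on snapshots taken before, during and after an update would show whether the number of connections in CLOSE_WAIT or FIN_WAIT_2 keeps growing with each downloaded plugin.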
Without dragging this on for too long: so far it looks like the HTTP server does not work well on Windows, but the update client needs a bit more investigation to check that it closes connections. I did some searches and found a few interesting docs:
http://www.apache.org/dist/httpd/binaries/win32/README.html#xpbug
http://www.auburn.edu/docs/apache/misc/fin_wait_2.html

I'm also seeing this when trying to install official eclipse.org plugins (like EMF). I'm running Eclipse 3.1.0 on WinXP/SP2. I tried different mirrors (SunSITE, eclipse.org, 100MB mirror) to no avail. The update process hangs at some random point while verifying the downloads. Is there anything I can do to help track this down?

Hi,

I noticed the same bug after installing Eclipse 3.1. Eclipse 3.0 went fine, but now with Eclipse 3.1, when I want to install new features through the update manager, my complete network connection hangs after a certain time. That means that my PC can't connect to anything on the web or the local LAN. The difference from the bug reporter's problem is that it happens when I'm updating official features like EMF from the official update site. A workaround is to update only one feature at a time, but this works only for small features, not for EMF. If the network connection is broken, I have to kill Eclipse and the connection returns.

Facts:
- Eclipse 3.1
- Java(TM) 2 Platform Standard Edition 5.0 Update 2
- WinXP with SP2

Yours sincerely,
Michael Reitz

I can also confirm this bug on Windows XP SP2 when searching for updates for all
features. The network stops working about one minute after starting the update
process. A connection returns when Eclipse is exited. Both local area
connections (to shares on local Windows XP and Linux computers) and the Internet
are lost during this period.
Using Eclipse 3.1.1, Windows XP Service Pack 2, JDK 1.5.0_05. Ethernet
controller is a Gigabit Marvell Yukon 88E8001/8003/8010 with latest manufacturer
drivers. Same behavior occurs with Gigabit nVidia nForce NIC.
I recently bought my current new computer. Before I had the new machine, I never
had this problem. I have attached my computer's full specs in case it helps.
I think, as mentioned before, this is a problem with too many concurrent open
connections. Windows XP SP2 restricts the number of TCP connections per second
to 10. If I recall correctly, I had installed an unofficial patch on my old
computer that restored the SP1 number of connections per second. I believe this
was before I started using Eclipse (3.1 betas) on that machine.
System
======
Windows XP Professional Service Pack 2 (build 2600)
2.20 gigahertz AMD Athlon64 X2 4400+
128 kilobyte primary memory cache
1024 kilobyte secondary memory cache
ASUSTeK Computer INC. A8N-SLI DELUXE 1.XX
Bus Clock: 200 megahertz
BIOS: Phoenix Technologies, LTD ASUS A8N-SLI DELUXE ACPI BIOS Revision 1013 07/26/2005
2048 Megabytes PC3200 in Dual Channel
250.05 gigabyte Seagate ST3250823AS HDD
_NEC DVD_RW ND-3540A [CD-ROM drive]
LITE-ON DVD SOHD-16P9S [CD-ROM drive]
3.5" format removeable media [Floppy drive]
NVIDIA GeForce 6600 GT [Display adapter]
DELL M770 [Monitor] (14.9"vis, s/n 1780RH3SF024, January 2000)
Marvell Yukon 88E8001/8003/8010 PCI Gigabit Ethernet Controller
IP Address: 192.168.1.108
Gateway: 192.168.1.1
Dhcp Server: 192.168.1.1
NVIDIA nForce Networking Controller (not used)
Virus Protection
================
avast! antivirus 4.6.731 [VPS 0546-4] Version 4.6.731
Realtime File Scanning On

I have exactly the same problem with Eclipse Version: 3.1.2, Build id: M20060118-1600, on Windows XP. On Linux it worked, however. The connection is lost at some predictable point (during verification of some RAR for some updates, and during download for another). It does the same thing on different servers at the same moment.

Please try this with 3.2 RC7 or later; we released some major improvements in RC7 that probably fixed your problem.

(In reply to comment #27)
> please try this with 3.2 rc7 or later, we released some major improvements in
> RC7 that probably fixed your problem.

Yes, I no longer encounter this problem. Thanks for fixing it!

*** This bug has been marked as a duplicate of 144876 ***

I can also confirm this bug on Windows XP SP2 when performing auto updates using Eclipse SDK 3.2. The network stops working for at least one minute, sometimes longer. Interestingly, SSH sessions seem to survive as long as there is no actual traffic going on (dormant shell session).

After reading comment #25 (https://bugs.eclipse.org/bugs/show_bug.cgi?id=79212#c25) in this bug report, I tried to diagnose something along these lines. netstat showed that Eclipse had 25 connections open to our web proxy, with the 25th remaining in SYNC state. After disabling "Deterministic Network Enhancer" and "QoS Packet Scheduler" in the network connection's properties dialog, Eclipse was able to open at least 50 connections (probably as many as there were feature/plugin jars mentioned in the site.xml, for timestamp checks or something) and everything worked like a charm.

I can't tell which of the two services/protocols/drivers blocks connection attempts, as I didn't have the time to rule out one of them, but at least it works now. I already thought I had a hardware failure, phew....