| Summary: | Unable to cancel frozen connection | | |
|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Christophe Elek <celek> |
| Component: | Update (deprecated - use Eclipse>Equinox>p2) | Assignee: | Platform-Update-Inbox <platform-update-inbox> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | major | | |
| Priority: | P3 | CC: | A.Kuckartz, antonio.petrelli, axb, btripkov, caniszczyk, daniel, Dave_Thomson, eagtstools, erich_gamma, fraenkel, gaetan, jacob, jacob, karasiuk, Kevin_McGuire, klicnik, manahan, nadment, nikolaymetchev, stephen.francisco, ursreupke, ysaillet, zina |
| Version: | 2.0 | Keywords: | helpwanted |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows 2000 | | |
| Whiteboard: | | | |
| Attachments: |
|
||||||
|
Description
Christophe Elek
*** Bug 18813 has been marked as a duplicate of this bug. ***

F3 candidate

*** Bug 21790 has been marked as a duplicate of this bug. ***

*** Bug 22310 has been marked as a duplicate of this bug. ***

We will likely not be able to fix this problem for the 2.0.1 release. Java lacks the support we need for fine-grained control over attempts to open a network connection to a server at an arbitrary URL. We cannot set a timeout for the method to return, and even if we attempt to connect in another thread, we are not sure we can safely kill the connection by killing the thread (it is unclear whether OS resources are cleanly recycled or whether we would leak resources that way).

*** Bug 23106 has been marked as a duplicate of this bug. ***

*** Bug 23128 has been marked as a duplicate of this bug. ***

*** Bug 22143 has been marked as a duplicate of this bug. ***

My 2p on this, granted timeouts etc.: it would be possible to dispose of the UI component immediately and let the connection time out naturally and be cleaned up (by executing it on another thread). That is, not terminating the thread but allowing it to time out naturally would ensure OS resources aren't gobbled up indefinitely, and the user would see immediate feedback through the UI being disposed. Maybe this is a leetle too hacky :)

That is one possibility. My concern is leaving unclosed connections (maybe 1 out of 100), but still... Also, looking at the JDK 1.3 doc, I realize there is no clean way of killing a spawned thread... I believe 1.4 has a way, but not for URL streams. So we may have to hack anyway: start a thread and use a deprecated call if the timer expires... Unless I'm missing something... anyone?

For 1.3, you don't have any options for a connect that takes forever. There is no good way to kill a thread, regardless of deprecated methods. They are all broken. In my case, the reads were blocking, which can be controlled via setSoTimeout.

Hm, isn't that only for sockets?
Excuse my ignorance, but I thought in 1.3 you can set the socket timeout, not the URL connection timeout (even though they may run on sockets, right? ;-) What we open is URL.openConnection()...

Would it be simple to attempt to open a socket (where a timeout can be defined) and only attempt to use the URL if the socket connection succeeded? This need only be attempted for http connections (not file:///). If the socket fails to connect, then we report that the server was not available and avoid the hang.

Found the easiest/only solution. You can affect the connect timeout and read timeout of an HttpURLConnection via the following System properties:
sun.net.client.defaultConnectTimeout
sun.net.client.defaultReadTimeout
The defaults for both are -1.

Yep, good catch, but ;-)
1) Isn't that only in 1.4?
2) sun.* classes, never very good ;-)
3) Will it work if Eclipse runs on J9 or QNX?
Nevertheless, I agree this is the 'perfect' solution in a 'perfect' world.
http://java.sun.com/j2se/1.4/docs/guide/net/properties.html

*** Bug 20099 has been marked as a duplicate of this bug. ***

*** Bug 18505 has been marked as a duplicate of this bug. ***

Action Taken: We should target this one for M5. It doesn't have a workaround.
Action Plan: Investigate HttpURLConnection.disconnect()

I am lowering the priority to P2. Even though there is no workaround, we cannot claim that we can fix it (we are limited by the underlying network support in java.net). P1 would mean that we will not ship without this, which is a bit too strong for this defect.

*** Bug 30993 has been marked as a duplicate of this bug. ***

*** Bug 32140 has been marked as a duplicate of this bug.
***

This is how the 2.1 implementation is going to look:

1) We will definitely set the Sun properties sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout to something moderate (say 60 seconds).
2) We will find out if equivalent properties are available for other implementations (J9, IBM VM) and set those as well.
3) We will not try to address all connections in 2.1 (too complex), but most of the problems happen at the front end, i.e. when trying to connect to a site (wrong URL, too slow, network problems, no proxy etc.). We will handle update site connections using a connection manager class.

When an InputStream is needed from the HttpURLConnection, we will spawn another thread to call 'getInputStream()'. In the main thread (again, not the GUI thread but the main worker thread for the connection), we will call 'join' on the connection thread with some small interval (say 200 ms). When the connection thread is done or 'join' times out, we will check whether the user pressed the 'Cancel' button. We will let the blocked connection thread die the natural 'timeout' death while the UI stays fully responsive.

In the unlikely event that the connection never times out, we have a limit on the number of connection threads we can spawn (10). Again, the UI will be fully responsive, and the worst that can happen is having to restart Eclipse in an orderly fashion without losing our work. Restarting would force these threads to close.

This change will affect attempts to expand the site bookmark in the UI and to search for new updates. In both cases, 'Cancel' will cause the UI to return immediately and be operational.

The change will not affect problems with the network that happen in the middle of a read. We read bytes in buffer-size blocks, and if the network is slow, we will eventually fill the buffer and react to the 'Cancel' button between two reads. If the 'read' method blocks, the only solution is for the timeout to throw an IOException. You need to be on JDK 1.4 for this.
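
The plan above (default timeouts, a helper thread for the blocking call, and a short 'join' interval between Cancel checks) can be sketched roughly as follows. This is a minimal illustration, not the Update Manager's actual code: the class name `CancelableOpen`, its method names, and the `AtomicBoolean` cancel flag are all hypothetical, and it uses modern Java (lambdas, `java.util.concurrent`) for brevity rather than the JDK 1.3/1.4 APIs available at the time. The limit of 10 connection threads is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper for illustration; not the Update Manager's real class.
final class CancelableOpen {

    // Step 1: JVM-wide defaults read by Sun's HTTP handler (1.4 and later).
    static void setDefaultTimeouts(int millis) {
        System.setProperty("sun.net.client.defaultConnectTimeout", String.valueOf(millis));
        System.setProperty("sun.net.client.defaultReadTimeout", String.valueOf(millis));
    }

    // Step 3: run the blocking call on a helper thread and poll for Cancel.
    // Returns null on cancel; the helper thread is abandoned, not killed.
    // pollMillis must be > 0, because join(0) would wait forever.
    static InputStream open(Callable<InputStream> blockingOpen,
                            AtomicBoolean canceled, long pollMillis)
            throws IOException, InterruptedException {
        InputStream[] result = new InputStream[1];
        Exception[] failure = new Exception[1];
        Thread worker = new Thread(() -> {
            try {
                result[0] = blockingOpen.call();
            } catch (Exception e) {
                failure[0] = e;
            }
        }, "connection-worker");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();

        do {
            worker.join(pollMillis); // returns when the worker ends or the interval elapses
            if (canceled.get()) {
                return null; // a stream that arrives later is not closed in this sketch
            }
        } while (worker.isAlive());

        if (failure[0] instanceof IOException) {
            throw (IOException) failure[0];
        }
        if (failure[0] != null) {
            throw new IOException(failure[0].toString());
        }
        return result[0];
    }

    // Convenience wrapper for the update-site case, polling every 200 ms.
    static InputStream open(URL url, AtomicBoolean canceled)
            throws IOException, InterruptedException {
        return open(() -> url.openConnection().getInputStream(), canceled, 200);
    }
}
```

The UI's Cancel handler would set the flag, and the worker thread for the connection would see it within one polling interval. A production version would also need to close any stream that the abandoned thread eventually obtains.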
I am resolving this defect as 'Later', to be reopened in 2.2 when we rework the whole network connection layer in Update.

Is later now? :) I'm reopening because I just saw a "hang" on a socket read (I'll attach a dump) which not only was not cancelable, but froze up the whole UI/Display (it didn't repaint). This was starting off with the RC1a platform and trying the Callisto RC1a site. I'm not sure exactly why it would hang, but if it helps, I did try both the "plain" Callisto site.xml and the download mirrors script from a browser, and both responded right away. I was on Windows XP SP2.

Created attachment 39898: thread dump during hang of update manager

FYI, as can be seen from the thread dump, I was using a Sun 1.5 VM. Part of the reason I re-opened this bug, instead of opening a new one, was so the whole 1.4 history could be recalled. One improvement I'd suggest: make ConnectionThreadManager detect whether it is running on a 1.5 VM, in which case it could use the setConnectTimeout API on URLConnection. A second, more minor, improvement to ConnectionThreadManager: those 1.4 Sun properties are set in the constructor and never reset to what they were, if anything. A slightly better pattern would be to remember their current values, if any, and restore them when done.

Branko ... If I'd noticed the long (old) CC list on this one I would have just opened a new one :) ... but I'm adding you ... I wanted to be sure the right people "saw" it ... and I'm not sure who that might be ... except perhaps you?

Can we look at this issue again somewhat? I hit this with J9 and other lovely JVMs. I think the approach David suggests may be reasonable. However, the argument can be made that people are responsible for setting the proper system properties for timeouts for each JVM.

We had a problem with JDK 1.3 in the past, but with 1.4 the connection timeout was set much shorter. I don't see how JDK 1.5 would help us other than to make this timeout shorter still. In addition, I don't have the cycles for playing with the method and testing it out. Patches welcome.

There seem to be a number of problems with Cancel not working. When I try to cancel a workspace rebuild, it just ignores me and keeps on building.

*** Bug 147803 has been marked as a duplicate of this bug. ***

*** Bug 165311 has been marked as a duplicate of this bug. ***

*** Bug 198282 has been marked as a duplicate of this bug. ***

*** Bug 166810 has been marked as a duplicate of this bug. ***

I solved the problem, in a dirty way, by using Simple DNS Plus under Windows XP.
The local DNS server returns "refused" when an external address is queried, while queries for internal addresses (local servers, proxy) are forwarded to the "normal" DNS servers. Anyway, this is a terrible hack; the real solution is *not* doing DNS queries when behind a proxy!

Considering the severity of this bug, please provide a target milestone for when this defect will be fixed.

This bug is against the legacy Update Manager, which was replaced in Eclipse 3.4 with a new provisioning system from Equinox. I suggest trying out recent builds, particularly 3.5 M3 or greater, where this situation should be better. We now use the Apache HTTP client, which deals with frozen connections much better than the java.net HTTP client. If you are still seeing problems in 3.5 M3 or later, please open a new bug against Equinox p2. |
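
For reference, the two ConnectionThreadManager improvements suggested earlier in this thread (per-connection timeouts on a 1.5 VM, and restoring the Sun properties after use) might look roughly like the sketch below. `URLConnection.setConnectTimeout`, `URLConnection.setReadTimeout`, and `System.clearProperty` are standard APIs from Java 5 onward; the class `TimeoutSupport` and its method names are hypothetical and are not taken from the Update Manager source.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper for illustration; not part of the Update Manager.
final class TimeoutSupport {

    // Java 5+: timeouts apply to this connection only, not JVM-wide.
    // openConnection() does not contact the server yet, so the timeouts
    // are in place before any network I/O happens.
    static URLConnection openWithTimeouts(URL url, int connectMillis, int readMillis)
            throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(connectMillis);
        connection.setReadTimeout(readMillis);
        return connection;
    }

    // Sets a system property and returns its previous value (null if unset),
    // so the caller can put it back later.
    static String setRemembering(String key, String value) {
        return System.setProperty(key, value);
    }

    // Restores a property to the value returned by setRemembering.
    static void restore(String key, String previous) {
        if (previous == null) {
            System.clearProperty(key);
        } else {
            System.setProperty(key, previous);
        }
    }
}
```

A caller would hold on to the value returned by `setRemembering` and pass it to `restore` in a `finally` block, so that the JVM-wide defaults are changed only for the duration of the update operation.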