Build Identifier:

A websocket connection is not closed after maxIdleTime following a client disconnect. After a few minutes the socket buffer fills up and sendMessage blocks, hanging the server for all websocket clients.

Reproducible: Always

Steps to Reproduce:
1. Connect a lot of clients to the server (e.g. 10,000).
2. Connect and disconnect quickly, e.g. by clicking refresh/forward/back in the browser (reproducible within 30 seconds of clicking for me).
3. A websocket connection is not closed after maxIdleTime. After about 5 to 10 minutes the server hangs.
Created attachment 178735 [details] Stack trace of the exception thrown when the server continues after it hung for 10 minutes
This happens in both Jetty 7.1.4.v20100610 and the latest 7.2.0-SNAPSHOT.
I've fixed the handling of calling ondisconnect, at least for idle timeouts. Checked in at r2273. I still need to add some unit tests to check for direct closes and sending after close.
r2274 added tests for close and sending after ondisconnect. All looks good now.
I just tried it with trunk r2274, but I can still reproduce the problem. It now gives a socket timeout instead of a broken pipe exception, though.
Created attachment 178822 [details] java.net.SocketException: Connection timed out
Tom, does the server still hang? Are you getting ondisconnect calls? Can you perhaps attach a test harness that demonstrates this?
Yes, the server still hangs, and ondisconnect isn't called for the connection that later causes the problem. I did some further testing and I can only reproduce the problem with an LVS load balancer in between (not distributing to multiple servers). Without it, connections are closed instantly, probably because the browser sends a proper disconnect. With the load balancer in between, the connections from the browser are not closed instantly, but only after the websocket timeout. However, sometimes the connection doesn't time out, which causes the problems. I have no experience with test harnesses, sorry.
This seems to be related to the DR (direct return) routing of the LVS load balancer. Since traffic from the server goes directly to the client and not through the load balancer, the load balancer doesn't see close commands/ACKs. I'm not sure whether the end result is a problem with the timeouts in the load balancer (which is not strictly TCP compliant anymore) or with Jetty not handling this unusual situation correctly. I don't know whether this issue can also occur through normal packet loss or network problems (if so, it would be a lot less likely).
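For anyone hitting the same situation, here is a minimal sketch of an application-level idle check that does not rely on the peer's FIN/RST ever arriving. It is not Jetty's implementation or API: the class IdleReaper, its methods, and the string connection ids are all hypothetical. It assumes that only data received from a client counts as activity (a write into a dead connection can still succeed until the socket buffer is full), and that clients send something periodically, such as a heartbeat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of Jetty): tracks last inbound activity per connection. */
final class IdleReaper {
    private final long maxIdleMillis;
    private final Map<String, Long> lastActivity = new ConcurrentHashMap<>();

    IdleReaper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Record that data was received from the given connection at nowMillis. */
    void touch(String connectionId, long nowMillis) {
        lastActivity.put(connectionId, nowMillis);
    }

    /** Forget a connection that was closed normally. */
    void remove(String connectionId) {
        lastActivity.remove(connectionId);
    }

    /**
     * Remove and return the ids of connections that have been idle for longer
     * than maxIdleMillis. The caller is expected to close them and stop
     * sending to them.
     */
    List<String> reap(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastActivity.entrySet()) {
            boolean idleTooLong = nowMillis - e.getValue() > maxIdleMillis;
            // remove(key, value) only succeeds if no touch() raced with us
            if (idleTooLong && lastActivity.remove(e.getKey(), e.getValue())) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

In use, touch would be called from the message handler with System.currentTimeMillis(), and reap from a periodic task that closes each returned connection. The clock is passed in as a parameter only so the logic can be exercised without waiting in real time.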