Jetty has a Timeout class that, at the time it was written, was based on the following assumptions:

+ Almost all timeouts are cancelled before they expire, so cancellation must be cheap.
+ Most timeouts have the same value (e.g. maxIdleTime), so new timeouts can be added to the end of the list.
+ Synchronization is cheap, and there are not that many multi-core machines.
+ No concurrent collections exist.
+ Object creation and garbage collection are expensive.
+ Accuracy is not all that important, as timeouts are just for idle connections.
+ Idle timeouts are scheduled when a request starts and cancelled when the response completes.
+ Expired timeouts call Jetty code that is known to be short.

So the timeout mechanism links the timeout tasks themselves into linked queues without any object creation, and there is a timeout queue for each select set. Every request grabs the lock once to add itself to the timeout queue, and grabs it again to remove itself from the queue and reschedule. This worked really well when it was done, but unfortunately things are changing and the assumptions need to be updated:

+ Almost all IDLE timeouts are cancelled before they expire, but Continuation timeouts are often left to expire as part of a long-polling cycle.
+ Most IDLE timeouts have the same value, but continuation timeouts come from the application and may vary. Websocket timeouts may also come from the application and may vary. There is some demand for each connection to have its own discovered IDLE timeout value.
+ There are lots of multi-core machines, and central synchronization can cause serialization/contention issues.
+ Good concurrent collections exist.
+ Object creation is really cheap, and some forms of garbage collection (young generation) are almost free.
+ Accuracy can be important for continuation timeouts.
+ Idle timeouts are scheduled whenever a request or response makes progress, i.e. on each chunk.
+ Expired IDLE timeouts call Jetty code that is known to be short (except SSL), but other timeouts may call application code.

To deal with these changing assumptions, Jetty-7 already changed to have different timeout queues for IDLE and other timeouts, so the different types of timeout can be handled differently. Non-idle timeouts are dispatched, so application code can take its time. This is working well enough, but in anticipation of more >8-core monsters, I'm wondering if there is more that we can do.

My first idea is that IDLE timeouts don't even need to be organized into a queue. Connections can keep their last activity time in a volatile long, and a sweeper thread can look over all connections every few seconds, checking for expired connections. I have a patch that does exactly this: every second or two (needs to be configurable), the select set manager copies all known endpoints into an array and dispatches a thread to iterate over the array and expire any idle connections in there.

So if we have a server with 20,000 connections and 5,000 requests per second, this means that instead of the timer queue being locked 15,000/s, we instead set volatile longs 15,000/s. This should be a good win on contention. But the sweeper thread needs an array of 20,000 connections created (or 2 10k arrays, or 4 5k arrays, depending on the number of selectors) and populated every second. The array is needed because the select set's key set is not thread safe. So I have to evaluate whether it is more efficient to keep a concurrent set of all endpoints, which can be iterated over by another thread, or whether this 20k copy is harmless enough.

Another good feature of this approach is that each connection can have its own idle timeout, so that things like websocket can vary the timeout without needing their own timeout mechanism.
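For reference, the original allocation-free design could be sketched roughly like this. This is a minimal illustration of the idea, not Jetty's actual Timeout class: the tasks are their own list nodes (intrusive links), so scheduling and cancelling allocate nothing, and because all timeouts share one duration, new entries always append at the tail in O(1).

```java
// Minimal sketch (not Jetty's actual Timeout class) of an intrusive
// timeout queue: tasks ARE the list nodes, so schedule/cancel allocate
// nothing, and same-valued timeouts always append at the tail.
class TimeoutQueue {
    static class Task {
        long expiry;          // absolute expiry time in ms
        Task prev, next;      // intrusive links: the task is its own node
        void expired() {}     // override with the (short) expiry action
    }

    private final Task head = new Task(); // sentinel; head.next is earliest
    private final long duration;          // shared timeout duration (ms)

    TimeoutQueue(long duration) {
        this.duration = duration;
        head.prev = head.next = head;
    }

    // All timeouts share one duration, so new entries go at the tail:
    // the list stays sorted by expiry without any searching.
    synchronized void schedule(Task task) {
        task.expiry = System.currentTimeMillis() + duration;
        task.prev = head.prev;
        task.next = head;
        head.prev.next = task;
        head.prev = task;
    }

    // Cancellation is just unlinking: cheap, no allocation.
    synchronized void cancel(Task task) {
        if (task.next != null) {
            task.prev.next = task.next;
            task.next.prev = task.prev;
            task.next = task.prev = null;
        }
    }

    // Called periodically; expired tasks are at the front of the list.
    // The callback runs outside the lock, but is assumed short-running.
    void tick(long now) {
        while (true) {
            Task task;
            synchronized (this) {
                task = head.next;
                if (task == head || task.expiry > now)
                    return;
                cancel(task); // reentrant: already holding the lock
            }
            task.expired();
        }
    }
}
```

The cost model matches the old assumptions: one lock acquisition to schedule, one to cancel, zero garbage, but every operation on every connection contends on the same monitor.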
Created attachment 167695 [details]
patch for idle timeout

Adds a sweeper for idle timeouts and an idle timeout per connection. Not yet added the concurrent set of all endpoints to avoid the copy. Also need to make the sweep time configurable.
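A minimal sketch of what the sweeper idea could look like (the names IdleSweeper/EndPoint are illustrative, not the actual patch API; this version uses the concurrent endpoint set discussed above rather than the per-sweep array copy):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the sweeper approach: each endpoint records its
// last activity in a volatile long, so the I/O hot path never takes a
// shared lock; a periodic sweeper scans all endpoints for expired ones.
class IdleSweeper {
    static class EndPoint {
        volatile long lastActivity = System.currentTimeMillis();
        volatile long maxIdleTime;     // per-connection idle timeout (ms)
        volatile boolean closed;

        EndPoint(long maxIdleTime) { this.maxIdleTime = maxIdleTime; }

        // Called on every read/write chunk: just a volatile store,
        // instead of two lock acquisitions on a shared timeout queue.
        void notifyActivity() { lastActivity = System.currentTimeMillis(); }

        void onIdleExpired() { closed = true; }
    }

    // A concurrent set avoids copying all endpoints into an array per
    // sweep; its weakly consistent iterator is safe to scan while the
    // selector threads add and remove endpoints.
    private final Set<EndPoint> endPoints = ConcurrentHashMap.newKeySet();

    void register(EndPoint ep)   { endPoints.add(ep); }
    void unregister(EndPoint ep) { endPoints.remove(ep); }

    // Run every second or two from a scheduler thread.
    void sweep() {
        long now = System.currentTimeMillis();
        for (EndPoint ep : endPoints) {
            if (now - ep.lastActivity > ep.maxIdleTime) {
                unregister(ep);
                ep.onIdleExpired();
            }
        }
    }
}
```

Because the timeout value lives on the endpoint, each connection can carry its own idle timeout, which is what lets websocket (or a discovered per-connection value) vary the timeout without a separate timeout mechanism. Accuracy is bounded by the sweep period, which is fine for idle timeouts.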
Created attachment 167822 [details] patch for idle timeout
see also http://bugs.cometd.org/browse/COMETD-113
patch committed r1764.
ready for 7.1.4
Resolved -> Closed