| Summary: | ArrayIndexOutOfBoundsException from SelectorManager | | |
|---|---|---|---|
| Product: | [RT] Jetty | Reporter: | Darin Wright <darin.eclipse> |
| Component: | server | Assignee: | Greg Wilkins <gregw> |
| Status: | CLOSED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | janb, jesse.mcconnell, jetty-inbox, tbecker |
| Version: | 7.5.3 | | |
| Target Milestone: | 7.5.x | | |
| Hardware: | Other | | |
| OS: | Linux | | |
| Whiteboard: | | | |
Description
Darin Wright
---

**Thomas Becker:**

Hi Darin, since 7.4.2 we've applied a lot of fixes to the NIO code. Could you please retest with the latest Jetty release and reopen the bug if the problem still exists?

Cheers,
Thomas

---

**Darin Wright:**

Ok, I've upgraded to 7.5.3.v20111011. I'll let you know how it goes.

---

**Darin Wright:**

Ok, so using 7.5.3.v20111011 the problem still occurs:

```
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.eclipse.jetty.io.nio.SelectorManager.register(SelectorManager.java:157)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:101)
    at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:833)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
```

Note that the problem happens after the server has been running under a heavy production load for 5 to 7 days. I'm not sure how to reproduce the problem in a test/staging environment. Still, any hints/suggestions would be welcome :-)

---

**Comment 5:**

Darin, I am just looking through this now and am at a bit of a loss as to what is going on. How heavy of a load is heavy for you for 5-7 days?

---

**Comment 6:**

Which select channel connector is this, btw? Is it the SelectChannelConnector or the SslSelectChannelConnector?

---

**Darin Wright:**

(In reply to comment #5)
> How heavy of a load is heavy for you for 5-7 days?

Well... the server accepts about 2.5 billion requests per week. The load ranges from 150,000 to 400,000 requests per minute. The requests are quite simple - image GET requests that return NO_CONTENT (204). We're using the GET request to report timing metrics via a payload of request parameters.

(In reply to comment #6)
> Which select channel connector is this, btw? Is it the SelectChannelConnector
> or the SslSelectChannelConnector?
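A note on the stack trace above: `ArrayIndexOutOfBoundsException: -1` is exactly what a signed `int` round-robin counter produces once it wraps past `Integer.MAX_VALUE`, because Java's `%` operator returns a negative remainder for a negative dividend. A minimal standalone sketch (the names here are illustrative, not Jetty's actual code):

```java
public class NegativeModuloDemo {
    // Hypothetical round-robin selector pick, in the style that breaks
    // once the connection counter overflows (illustrative, not Jetty's code).
    static int brokenIndex(int counter, int nSelectors) {
        return counter % nSelectors;            // negative when counter < 0
    }

    // One overflow-safe variant: Math.floorMod always yields a value in [0, n).
    static int safeIndex(int counter, int nSelectors) {
        return Math.floorMod(counter, nSelectors);
    }

    public static void main(String[] args) {
        int counter = Integer.MAX_VALUE;
        counter++;                              // wraps to Integer.MIN_VALUE
        counter += Integer.MAX_VALUE;           // some increments later: -1

        System.out.println(brokenIndex(counter, 2));  // prints -1
        System.out.println(safeIndex(counter, 2));    // prints 1
    }
}
```

Using that `-1` as an array index is what throws. `Math.floorMod` is Java 8+; on the Java 5/6 JVMs Jetty 7 targeted, masking the sign bit, e.g. `(counter & Integer.MAX_VALUE) % n`, gives the same non-negative index. The actual committed fix may differ.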
I create the server like this (so it's the SelectChannelConnector, I assume):

```java
final SelectChannelConnector connector = new SelectChannelConnector();
connector.setPort(beaconPort);
connector.setAcceptors(2);
connector.setAcceptQueueSize(512);
connector.setMaxIdleTime(30000);
connector.setLowResourcesConnections(5000);
connector.setLowResourcesMaxIdleTime(5000);

final Server server = new Server();
server.setConnectors(new Connector[]{connector});

ExecutorThreadPool pool = new ExecutorThreadPool(100, 500, 1, TimeUnit.MINUTES);
server.setThreadPool(pool);

ServletContextHandler handler = new ServletContextHandler(ServletContextHandler.SESSIONS);
handler.setContextPath("/");
server.setHandler(handler);
```

However, the server does accept HTTP and HTTPS connections - should I be creating the server differently?

---

**Greg Wilkins:**

Darin,

I think I see the problem. You may be blowing the max Integer value with the count of the number of connections you've had. Do you have a low ratio of requests to connections? I.e., are most of your connections 1 request only?

regards

---

**Greg Wilkins:**

I can confirm that we have a problem with the select channel connector when the total number of connections exceeds Integer.MAX_VALUE. I've committed the fix to the master branch and will deploy a snapshot build shortly.

---

**Greg Wilkins:**

I have pushed a fixed snapshot build to

https://oss.sonatype.org/content/groups/jetty-with-staging/org/eclipse/jetty/jetty-distribution/7.5.5-SNAPSHOT/

Your load estimate of 2.5B requests per week and the 2,147,483,647 value of max int suggest that you should see this problem after 5-7 days, which you do.

I can't believe we've not seen this before! I guess most of our high volume sites are using persistent connections, so they delay this. But many of them have been running for months!

But you must be sustaining more than 3,500 connections per second on average, which is certainly up there, at least for connection rates.

---

**Darin Wright:**

(In reply to comment #8)
> I think I see the problem.
> You may be blowing the max Integer value with the count of the number of
> connections you've had. Do you have a low ratio of requests to connections?
> I.e., are most of your connections 1 request only?

Yes - most connections are 1 request only.

---

**Darin Wright:**

(In reply to comment #10)
> I have pushed a fixed snapshot build to
> https://oss.sonatype.org/content/groups/jetty-with-staging/org/eclipse/jetty/jetty-distribution/7.5.5-SNAPSHOT/

Thanks, Greg. Nice find. Is this snapshot build something I should/can use in production?

---

**Jesse McConnell:**

Darin,

We just released 7.5.4 earlier this week, so there is little in the way of major changes on master right now - some tweaking to debug logging and the like. So if you want to give it a whirl in production I don't see why it would be any worse than a release build, especially since you know what you're doing; if it gets a bit of testing before going out I don't see any red flags.

https://oss.sonatype.org/content/groups/jetty/org/eclipse/jetty/jetty-distribution/7.5.5-SNAPSHOT/

cheers

---

**Darin Wright:**

Thanks, Jesse. I've deployed this Jetty build to our staging environment. If all goes well, I'll deploy to production in a few days. Will let you know how it goes (after we serve up another few billion requests).

---

**Jesse McConnell:**

Darin,

Got an update on this at all?

---

**Darin Wright:**

Hi Jesse,

The build is in production now. It's only been running for 2 days so far - so too soon to tell.
However, one thing I did notice is that when the server starts up, I get some exceptions like this:

```
java.lang.NullPointerException
    at org.eclipse.jetty.io.nio.SelectorManager.register(SelectorManager.java:162)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:101)
    at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:833)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
```

However, they don't seem to have any lasting ill effect. After things start up, the exceptions stop and it works OK.

---

**Jesse McConnell:**

Darin,

Jan actually fixed that bug a day or two ago, so it shouldn't be present in a more recent build. No need to update for this though; that was an intermittent startup issue and was benign, I believe. Let us know how this goes though!

jesse

---

**Jan Bartel:**

Darin,

A few more days have passed now, so how is the production server going? If you report no problem then I will close the issue.

thanks
Jan

---

**Darin Wright:**

The beacon has been up for 112 hours now (we had to restart last Thursday for a configuration change). However, if we make it through the next 2 days, I think we're good.

Thanks,
Darin

---

**Darin Wright:**

The server's been up for over 157 hours now - a little shy of one week. However, it's been under heavy load, so I think we can call this one fixed. Thanks for your help!

Darin

---

Great news Darin!
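A closing sanity check on the arithmetic in the thread: at Darin's reported ~2.5 billion one-request connections per week, a signed 32-bit connection counter wraps in about six days, squarely inside the observed 5-7 day window. (The traffic figure is Darin's estimate; everything else is plain arithmetic.)

```java
public class CounterWrapEta {
    public static void main(String[] args) {
        double perWeek = 2_500_000_000.0;              // ~2.5B connections/week
        double perSecond = perWeek / (7 * 24 * 3600);  // ~4,134 connections/s
        double daysToWrap = Integer.MAX_VALUE / (perSecond * 86400);
        System.out.printf("%.0f conn/s; int counter wraps after %.1f days%n",
                          perSecond, daysToWrap);      // ~4134 conn/s, 6.0 days
    }
}
```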