Bug 360546 - ArrayIndexOutOfBoundsException from SelectorManager
Status: CLOSED FIXED
Alias: None
Product: Jetty
Classification: RT
Component: server
Version: 7.5.3
Hardware: Other Linux
Importance: P3 normal
Target Milestone: 7.5.x
Assignee: Greg Wilkins CLA
Reported: 2011-10-11 10:34 EDT by Darin Wright CLA
Modified: 2011-11-10 08:54 EST
Description Darin Wright CLA 2011-10-11 10:34:04 EDT
I started getting these exceptions in my jetty server - eventually had to restart the server to get things working again. Not sure what caused them, but thought I'd file this bug to see if anyone else has any insight, or has seen this before:


java.lang.ArrayIndexOutOfBoundsException: -1
	at org.eclipse.jetty.io.nio.SelectorManager.register(SelectorManager.java:160)
	at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:96)
	at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:830)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

The exceptions started happening and continued to happen every millisecond for several minutes before I restarted to correct the problem.
Comment 1 Thomas Becker CLA 2011-10-12 10:34:50 EDT
Hi Darin,

Since 7.4.2 we've applied a lot of fixes to the NIO code. Could you please retest with the latest Jetty release and reopen the bug if the problem still exists?

Cheers,
Thomas
Comment 2 Darin Wright CLA 2011-10-13 14:26:39 EDT
Ok, I've upgraded to 7.5.3v20111011. I'll let you know how it goes.
Comment 3 Darin Wright CLA 2011-10-25 18:06:17 EDT
Ok, so using 7.5.3v20111011 the problem still occurs:

java.lang.ArrayIndexOutOfBoundsException: -1
	at org.eclipse.jetty.io.nio.SelectorManager.register(SelectorManager.java:157)
	at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:101)
	at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:833)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Comment 4 Darin Wright CLA 2011-10-26 13:35:04 EDT
Note that the problem happens after the server has been running under a heavy production load for 5 to 7 days. I'm not sure how to reproduce the problem in a test/staging environment. Still, any hints/suggestions would be welcome :-)
Comment 5 Jesse McConnell CLA 2011-10-26 13:37:24 EDT
Darin,

I am just looking through this now and am at a bit of a loss as to what is going on.

How heavy of a load is heavy for you for 5-7 days?
Comment 6 Jesse McConnell CLA 2011-10-26 15:26:16 EDT
Which select channel connector is this btw, is it the SelectChannelConnector or the SslSelectChannelConnector?
Comment 7 Darin Wright CLA 2011-10-27 12:12:34 EDT
(In reply to comment #5)
>How heavy of a load is heavy for you for 5-7 days?

Well... the server accepts about 2.5 billion requests per week. The load ranges from 150,000 to 400,000 requests per minute. The requests are quite simple - image GET requests that return NO_CONTENT (204). We're using the GET request to report timing metrics via a payload of request parameters.

(In reply to comment #6)
> Which select channel connector is this btw, is it the SelectChannelConnector or
> the SslSelectChannelConnector?

I create the server like this (so it's the SelectChannelConnector, I assume) :

        // NIO connector: 2 acceptor threads, accept queue (backlog) of 512
        final SelectChannelConnector connector = new SelectChannelConnector();
        connector.setPort(beaconPort);
        connector.setAcceptors(2);
        connector.setAcceptQueueSize(512);
        // Idle timeout is 30 s normally, dropping to 5 s once more than
        // 5000 connections are open (low-resources mode)
        connector.setMaxIdleTime(30000);
        connector.setLowResourcesConnections(5000);
        connector.setLowResourcesMaxIdleTime(5000);

        final Server server = new Server();
        server.setConnectors(new Connector[]{connector});
        // Thread pool: 100 core threads, 500 max, 1 minute keep-alive
        ExecutorThreadPool pool = new ExecutorThreadPool(100, 500, 1, TimeUnit.MINUTES);
        server.setThreadPool(pool);
        ServletContextHandler handler = new ServletContextHandler(ServletContextHandler.SESSIONS);
        handler.setContextPath("/");
        server.setHandler(handler);

However, the server does accept HTTP and HTTPS connections - should I be creating the server differently?
Comment 8 Greg Wilkins CLA 2011-10-27 23:46:57 EDT
Darin,

I think I see the problem.   You may be blowing max Integer value with the count of the number of connections you've had.

Do you have a low ratio of requests to connections? i.e., are most of your connections 1 request only?

regards
Comment 9 Greg Wilkins CLA 2011-10-28 00:09:40 EDT
I can confirm that we have a problem with the select channel connector when the total number of connections exceeds Integer.MAX_VALUE.    

I've committed the fix to master branch and will deploy a snapshot build shortly.
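
For readers following along, here is a minimal sketch of how a connection counter that wraps past Integer.MAX_VALUE can produce the index of -1 seen in the stack traces. It is not the SelectorManager code and not the committed fix; the class and method names are hypothetical, and the two select sets are only an assumption for illustration. Math.floorMod needs Java 8, which postdates this thread, so it is shown as just one possible guard.

```java
// Hypothetical demo: names are illustrative, not taken from Jetty.
final class SelectSetIndexDemo {

    // Picks a slot with the remainder operator. In Java, % keeps the sign
    // of the dividend, so a negative counter can yield a negative index.
    static int naiveIndex(int counter, int sets) {
        return counter % sets;
    }

    // Picks a slot with floorMod, which always returns a value in [0, sets)
    // for a positive number of sets.
    static int safeIndex(int counter, int sets) {
        return Math.floorMod(counter, sets);
    }

    public static void main(String[] args) {
        int sets = 2;                     // assumed number of select sets
        int counter = Integer.MAX_VALUE;  // counter after ~2.1 billion connections

        counter++;                        // wraps to Integer.MIN_VALUE
        counter++;                        // Integer.MIN_VALUE + 1, an odd negative number

        System.out.println(naiveIndex(counter, sets)); // prints -1
        System.out.println(safeIndex(counter, sets));  // prints 1

        // Using the naive index on an array fails just like the report:
        Object[] selectSets = new Object[sets];
        try {
            Object unused = selectSets[naiveIndex(counter, sets)];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With two slots, every odd negative counter value maps to -1, which would be consistent with the exception recurring continuously once the counter has wrapped.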
Comment 10 Greg Wilkins CLA 2011-10-28 01:14:10 EDT
I have pushed a fixed snapshot build to 

https://oss.sonatype.org/content/groups/jetty-with-staging/org/eclipse/jetty/jetty-distribution/7.5.5-SNAPSHOT/

Your load estimate of 2.5B requests per week and the 2,147,483,647 value of max int suggest that you should see this problem after 5-7 days, which you do.

I can't believe we've not seen this before! I guess most of our high volume sites are using persistent connections, so they delay this.  But many of them have been running for months!

But you must be sustaining more than 3,500 connections per second on average, which is certainly up there, at least for connection rates.
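
The arithmetic behind that estimate can be checked with a short sketch. It assumes roughly one connection per request (as the 2.5B figure is being used here) and a signed 32-bit counter; the class and method names are hypothetical.

```java
// Hypothetical helper for checking the overflow estimate; not Jetty code.
final class OverflowEstimate {

    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Days until a signed 32-bit counter overflows at a given weekly connection count.
    static double daysUntilOverflow(double connectionsPerWeek) {
        double connectionsPerDay = connectionsPerWeek / 7.0;
        return Integer.MAX_VALUE / connectionsPerDay;
    }

    // Average connections per second needed to overflow the counter in the given number of days.
    static double ratePerSecondToOverflowIn(int days) {
        return (double) Integer.MAX_VALUE / (days * SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        // 2.5 billion connections per week: overflow after about 6 days.
        System.out.printf("days until overflow: %.2f%n", daysUntilOverflow(2.5e9));
        // Overflowing within 7 days takes about 3,550 connections per second.
        System.out.printf("rate for 7 days: %.0f/s%n", ratePerSecondToOverflowIn(7));
    }
}
```

At 2.5 billion connections per week the counter passes 2,147,483,647 after about six days, inside the reported 5 to 7 day window.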
Comment 11 Darin Wright CLA 2011-10-28 09:06:17 EDT
(In reply to comment #8)
> 
> I think I see the problem.   You may be blowing max Integer value with the
> count of the number of connections you've had.
> 
> Do you have a low ratio of requests to connections? i.e., are most of your
> connections 1 request only?
> 

Yes - most connections are 1 request only.


(In reply to comment #10)
> I have pushed a fixed snapshot build to 
> 
> https://oss.sonatype.org/content/groups/jetty-with-staging/org/eclipse/jetty/jetty-distribution/7.5.5-SNAPSHOT/
> 
> Your load estimate of 2.5B requests per week and the 2,147,483,647 value of max
> int suggest that you should see this problem after 5-7 days, which you do.
> 
> I can't believe we've not seen this before! I guess most of our high volume
> sites are using persistent connections, so they delay this.  But many of them
> have been running for months!
> 
> But you must be sustaining more than 3,500 connections per second on average,
> which is certainly up there, at least for connection rates.

Thanks, Greg. Nice find. Is this snapshot build something I should/can use in production?
Comment 12 Jesse McConnell CLA 2011-10-28 09:12:02 EDT
Darin,

We just released 7.5.4 earlier this week so there is little in the way of major changes on master right now, some tweaking to debug logging and the like.

So if you want to give it a whirl in production, I don't see why it would be any worse than a release build, especially since you know what you're doing. If it handles a bit of testing before going out, I don't see any red flags.

https://oss.sonatype.org/content/groups/jetty/org/eclipse/jetty/jetty-distribution/7.5.5-SNAPSHOT/

cheers
Comment 13 Darin Wright CLA 2011-10-28 15:51:36 EDT
Thanks, Jesse. I've deployed this Jetty build to our staging environment. If all goes well, I'll deploy to production in a few days. Will let you know how it goes (after we serve up another few billion requests).
Comment 14 Jesse McConnell CLA 2011-11-02 15:28:34 EDT
Darin,

Got an update on this at all?
Comment 15 Darin Wright CLA 2011-11-03 12:14:46 EDT
Hi Jesse,

The build is in production now. It's only been running for 2 days so far, so it's too soon to tell. However, one thing I did notice is that when the server starts up, I get some exceptions like this:


java.lang.NullPointerException
	at org.eclipse.jetty.io.nio.SelectorManager.register(SelectorManager.java:162)
	at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:101)
	at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:833)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)


However - they don't seem to have any lasting ill-effect. After things start up, the exceptions stop and it works OK.
Comment 16 Jesse McConnell CLA 2011-11-03 12:23:06 EDT
Darin,

Jan actually fixed that bug a day or two ago so it shouldn't be present on a more recent build.  No need to update for this though, that was an intermittent startup issue and was benign I believe.

let us know how this goes though!

jesse
Comment 17 Jan Bartel CLA 2011-11-07 19:45:13 EST
Darin,

A few more days have passed now, so how is the production server going? If you report no problem then I will close the issue.

thanks
Jan
Comment 18 Darin Wright CLA 2011-11-08 11:17:54 EST
The beacon has been up for 112 hours now (we had to re-start last Thursday for a configuration change). However, if we make it through the next 2 days, I think we're good.

Thanks,

Darin
Comment 19 Darin Wright CLA 2011-11-10 08:39:46 EST
The server's been up for over 157 hours now - a little shy of one week. However, it's been under heavy load, so I think we can call this one fixed.

Thanks for your help!
Darin
Comment 20 Jesse McConnell CLA 2011-11-10 08:54:44 EST
Great news Darin!