Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 368992

Summary: NPE in HttpGenerator.prepareBuffers()
Product: [RT] Jetty Reporter: Missing name <bubbleguuum>
Component: serverAssignee: Greg Wilkins <gregw>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: gregw, jetty-inbox, joakim.erdfelt
Version: unspecified   
Target Milestone: 7.5.x   
Hardware: PC   
OS: Windows 7   
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=367608
Whiteboard:
Bug Depends on:    
Bug Blocks: 384254    
Attachments:
Description Flags
Reproduction
none
possible fix using sync none

Description Missing name CLA 2012-01-18 12:03:48 EST
Build Identifier: 

OS: Windows 7 x64
JVM: 1.6.027 64 bits
Jetty Version: 7.6.0.RC4


Using a SelectChannelConnector + ProxyServlet to serve audio streams, the NPE below happens randomly after a few seconds the servlet is sending the stream to the client. The requests stops unexpectedly shortly after that NPE.
The exact same scenario works perfectly using a SocketConnector.



java.lang.NullPointerException
	at org.eclipse.jetty.http.HttpGenerator.prepareBuffers(HttpGenerator.java:962)
	at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:836)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:81)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:615)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
	at java.lang.Thread.run(Thread.java:662)


Sometimes it is shortly followed by:


[qtp2098880464-46] WARNING  - 01:01:17.268 - SelectChannelEndPoint       : handle failed
java.lang.NullPointerException
	at org.eclipse.jetty.io.ByteArrayBuffer.poke(ByteArrayBuffer.java:301)
	at org.eclipse.jetty.io.AbstractBuffer.put(AbstractBuffer.java:457)
	at org.eclipse.jetty.http.HttpGenerator.prepareBuffers(HttpGenerator.java:960)
	at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:836)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:81)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:615)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)

Reproducible: Always

Steps to Reproduce:
I don't have a simple test case to reproduce it but here is  setup likely do exhibit the problem:

- set Jetty with a SelectChannelConnector
- add a ProxyServlet 
- trigger the servlet to redirect to a long stream, like an audio stream
Comment 1 Greg Wilkins CLA 2012-01-18 20:38:32 EST
first part of the fix has been applied.  It removes the high CPU condition due to the interest ops not being cleared for fully async handlers (eg ProxyServlet), that do not have a read or write blocked thread, but still cannot progress.
Comment 2 Greg Wilkins CLA 2012-01-19 01:25:14 EST
Created attachment 209723 [details]
Reproduction

This is an embedded server that can be used to demonstrate the problem.
It requires a file called stream.mp3 to be in /tmp  (any mp3 will do).

run the application and then fetch the mp3 with a throttled get like:

  wget --limit-rate=100k http://localhost:8080/stream/info

For me I had to adjust the rate to get a frequent error.  100k makes problems occur on almost every download, often in the first few seconds.

If I totally synchronize the HttpGenerator class, then I cannot reproduce, so it is almost certainly a memory barrier issue.  It would be good to come up with a more refined memory barrier approach than total synchronization.
Comment 3 Greg Wilkins CLA 2012-01-19 07:15:39 EST
Created attachment 209739 [details]
possible fix using sync

This patch synchronises HttpGenerator and appears to prevent the NPEs etc.
Comment 4 Greg Wilkins CLA 2012-01-19 20:47:24 EST
I have committed a "fix" for this.

The heart of the problem is that if a non dispatch thread is doing IO operations on the endpoint, it is possible for it to set the !_writable status separately to setting the _writeblocked status.  This can cause a select for writable status, which see's no writeblocked thread and thus dispatches a thread to handle the connection.  This results in 2 threads operating on the same connection and Boooom!

So the "fix" is to avoid the normal write operations setting !_writable unless they also set _writeBlocked and wait on the endp.     This can be achieved by preventing write() from calling the non blocking flush when it's buffer is full.
Instead, we simple commit the request and allow a subsequent call to blockWritable do the flushing.

I call this a "fix" rather than a real fix, because I do not think it has closed all the possible paths that may cause an issue.  However it does appear to have fixed the major path and has hopefully made the problem vanishingly unlikely - at least in my testing it has.

I will push a snapshot now, and this will go into RC5 and is probably sufficient for 7.6.0.   However I think this whole area needs to be reviewed to see if we can come up with a better and more complete solution.
Comment 5 Simone Bordet CLA 2012-01-20 11:26:13 EST
Greg, I added a test case for this bug, in ProxyServletTest.
It passes always for me, but fails on Jesse's box, so perhaps it'll fail on yours too (more powerful than mine).
Can you please review it ?
Comment 6 Joakim Erdfelt CLA 2012-01-20 16:07:12 EST
Bug #369291 IllegalArgumentException on DirectNIOBuffer.poke() 
and 
Bug #369292 NPE in HttpGenerator.addContent() 

were closed as duplicates of this bug.
Comment 7 Joakim Erdfelt CLA 2012-01-20 16:09:01 EST
*** Bug 369291 has been marked as a duplicate of this bug. ***
Comment 8 Greg Wilkins CLA 2012-01-24 02:40:35 EST
I've added another fragile "fix" for this.

In SCEP.flush(Buffer...), if data is not able to be written, the it only sets _writable=false if it is dispatched.  This works by further avoiding the state where a SCEP is marked as writable but not _writeBlocked when a non dispatched thread is doing a write.    It still preserves the ability to set !_writable when a dispatched thread is trying to asynchronously flush the buffer.

Websocket style connections work because they will also do a blockWritable if they are unable to flush their buffers.

However, this is still incredibly fragile and needs further consideration.
Comment 9 Greg Wilkins CLA 2012-01-30 20:59:10 EST
The fragile "fix" appears to be working for now, so  I'm closing this issue.
We will consider refactoring for more robust async writes in a future release.

For now, if any async writes fail, please open a new issue and we will analyse to further improve our understanding of what is going on here.