Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 367608

Summary: HttpGenerator crashes when used with AsyncContext (async servlet), produces corrupted HTTP chunks, timing sensitive race
Product: [RT] Jetty Reporter: Derianto Kusuma <deriantok>
Component: serverAssignee: Greg Wilkins <gregw>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: P3 CC: janb, jetty-inbox, testopenss
Version: 8.0.4   
Target Milestone: 7.5.x   
Hardware: PC   
OS: Linux   
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=368992
Whiteboard:
Attachments:
Description Flags
SimpleAsyncHttpServlet - the simplest scenario that reproes the bug
none
AsyncHttpServlet - an example demonstrating timing-sensitivity of the bug none

Description Derianto Kusuma CLA 2011-12-28 14:07:23 EST
Build Identifier: 8.0.4.v20111024

Hello, I noticed that when using async servlet API (completing the request using AsyncContext on another thread), HttpGenerator may crash with NPE (or even other Exceptions) when writing response chunks, causing corrupted HTTP chunks being sent.

This issue seems to be reproducible with both chunked encoding and non-chunked encoding (Content-Length specified upfront).

After testing some more, this issue seems to be very timing-sensitive.  It is the most reproducible when the moment the servlet thread finishes the servlet doGet / doPost method block, and the moment when another thread is writing to HttpServletResponse in AsyncContext object, are very close in time.  In my observation the probability of this race condition happening is closely tied to how close the two events happen in time.

I attach two files for reproing the problem:

1. SimpleAsyncHttpServlet - the simplest possible scenario that reproduces the problem.  Simply hit the servlet via browser (when reproduced, it will exhibit hanging behavior), or via curl (when reproduced, curl reports a parse error)

2. AsyncHttpServlet - to demonstrate that the issue is time sensitive.  You can use the "wait" and "isChunked" query parameters to fine-tune what the servlet will do.  Since the servlet thread is hard-coded to sleep for 1000 msec, the issues reproduce more often as "wait" param gets nearer to 1000.

Example server stack traces:

Caused by: java.lang.NullPointerException
	at org.eclipse.jetty.io.BufferUtil.putHexInt(BufferUtil.java:132)
	at org.eclipse.jetty.http.HttpGenerator.prepareBuffers(HttpGenerator.java:996)
	at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:826)
	at org.eclipse.jetty.server.HttpConnection.flushResponse(HttpConnection.java:617)
	at org.eclipse.jetty.server.HttpConnection$Output.close(HttpConnection.java:1000)
	at org.codehaus.jackson.impl.Utf8Generator.close(Utf8Generator.java:1123)
	at org.codehaus.jackson.map.ObjectWriter._configAndWriteValue(ObjectWriter.java:468)
	at org.codehaus.jackson.map.ObjectWriter.writeValue(ObjectWriter.java:363)
	at org.dk.rpc.serialization.ExtendedJacksonJsonRpcSerialization.writeResponseObject(ExtendedJacksonJsonRpcSerialization.java:399)
	at org.dk.rpc.servlet.JsonRpcBaseHttpServlet.writeResponse(JsonRpcBaseHttpServlet.java:88)


Thanks!


Reproducible: Sometimes

Steps to Reproduce:
I attach two files for reproing the problem:

1. SimpleAsyncHttpServlet - the simplest possible scenario that reproduces the problem.  Simply hit the servlet via browser (when reproduced, it will exhibit hanging behavior), or via curl (when reproduced, curl reports a parse error)

2. AsyncHttpServlet - to demonstrate that the issue is time sensitive.  You can use the "wait" and "isChunked" query parameters to fine-tune what the servlet will do.  Since the servlet thread is hard-coded to sleep for 1000 msec, the issues reproduce more often as "wait" param gets nearer to 1000.
Comment 1 Derianto Kusuma CLA 2011-12-28 14:08:30 EST
Created attachment 208832 [details]
SimpleAsyncHttpServlet - the simplest scenario that reproes the bug
Comment 2 Derianto Kusuma CLA 2011-12-28 14:09:05 EST
Created attachment 208833 [details]
AsyncHttpServlet - an example demonstrating timing-sensitivity of the bug
Comment 3 Greg Wilkins CLA 2011-12-30 16:18:51 EST
I have tried your simple servlet in 7.6.0.RC2 and it works fine, so you may want to try 8.1.0.RC2 (which is build on 7.6.0.RC2 and has the same reworked IO layer).

However I will leave this bug open as I want to improve the testing we do of the lifecycle:

   dispatch
   {
      suspend
      spawn
      {
         write
         complete
      }
   }

As I do think we have some memory barrier issues we need to address (rather than race condition).

Note that you can avoid this issue by using the pattern:

   dispatch
   {
      suspend
      spawn
      {
         resume
      }
   }
   dispatch
   {
      write
   }

but I do understand there are good reasons for the former pattern, hence we need to be 100% sure it is working.   So we'll increase our testing of it in the 7.6 and 8.1 releases
Comment 4 Derianto Kusuma CLA 2011-12-31 17:34:41 EST
I just tried both Jetty 8.1.0 RC0 and RC2.  They are definitely more stable for this case and the problem only occurs under relatively high concurrency (50-100 concurrent requests).

The double-dispatch pattern you suggested works perfectly though, even under stress.

Thank you Greg for looking into this and happy new year!
Comment 5 Greg Wilkins CLA 2012-01-19 22:42:40 EST
We have found a few more improvements with async writes.  These are in head now and will be in RC5. See 368992
Comment 6 Jan Bartel CLA 2012-04-02 00:54:19 EDT
Derianto,

Can you confirm that the more recent jetty releases (eg 7.6.2) fix the problem? If so, we can close this issue.

thanks
Jan
Comment 7 Derianto Kusuma CLA 2012-04-02 15:01:34 EDT
Hi Jan,

Release 7.6.2 doesn't seem to support Servlet 3.0 API, so I tried release 8.1.2 instead.  The issue is pretty hard to reproduce with 8.1.2, but using my stress suite I can still reproduce 1-2 failed calls with the same symptom for every 10000 calls.  The calls are very brief (around 1 ms each), and I use 20 concurrent HTTP clients to hit the service.

Thanks,
Deri
Comment 8 Greg Wilkins CLA 2012-04-11 22:30:28 EDT
Deri,

Is this with the same tests as attached or with an updated test?
If it is updated, can you attach it please.

thanks
Comment 9 Derianto Kusuma CLA 2012-04-12 23:01:13 EDT
Hi Greg, the old test (using AsyncHttpServlet) can still repro the problem.  I reproed it with Jetty 8.1.2.v20120308, and passing the query string "wait=1000&chunked=true" when hitting the servlet.

e.g. URL: http://localhost:60080/jettytest/jettytest?wait=1000&chunked=true
Comment 10 Greg Wilkins CLA 2012-08-09 20:10:47 EDT
I think we will have this issue fixed in jetty-9, but I'm not sure we will be able to fix jetty-7/8 without harming their performance.    I will comment again here when jetty-9 is in a testable state.
Comment 11 Jesse McConnell CLA 2013-01-30 11:02:13 EST
Jetty 9 is in a testable state so if you can report behavior for your scenario on M5 or the forthcoming RC0 that would be awesome
Comment 12 littlestar littlestar CLA 2013-02-25 01:55:33 EST
I have a similar problem. jetty 9 RC1.
I use jetty to transfer 100G data from client to server.
I had test 8.1.9 with the following patch, works failed again.
401382 Prevent parseAvailable from parsing next chunk when previous has not been consumedjetty
400040 NullPointerException in HttpGenerator.prepareBuffers, Adding safety check on _header use

this issue seems to be very timing-sensitive.
===========================================
13/02/25 13:37:10 WARN http.HttpParser: badMessage: java.io.IOException: bad chunk char: 46 for HttpChannelOverHttp@3ccd76be{r=181,a=ASYNCWAIT}
java.io.IOException: bad chunk char: 46
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1200)
        at org.eclipse.jetty.server.HttpConnection$Input.blockForContent(HttpConnection.java:544)
        at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
Comment 13 Greg Wilkins CLA 2013-03-14 23:47:14 EDT
I believe this issue (at least the last report of it in jetty-9) was to do with how were were recycling buffers after a partial write.  This allowed a buffer to be reused and refilled with other data and then written in a corrupted chunk.

This is fixed in 9.0.0 so I will close.   Please reopen if you have any similar issues in 9.0.x