| Summary: | NPE in HttpGenerator.prepareBuffers() |
|---|---|
| Product: | [RT] Jetty |
| Component: | server |
| Reporter: | Missing name <bubbleguuum> |
| Assignee: | Greg Wilkins <gregw> |
| Status: | RESOLVED FIXED |
| QA Contact: | |
| Severity: | normal |
| Priority: | P3 |
| CC: | gregw, jetty-inbox, joakim.erdfelt |
| Version: | unspecified |
| Target Milestone: | 7.5.x |
| Hardware: | PC |
| OS: | Windows 7 |
| See Also: | https://bugs.eclipse.org/bugs/show_bug.cgi?id=367608 |
| Whiteboard: | |
| Bug Depends on: | |
| Bug Blocks: | 384254 |
| Attachments: | Reproduction (attachment 209723), possible fix using sync (attachment 209739) |
Description
Missing name
The first part of the fix has been applied. It removes the high-CPU condition caused by the interest ops not being cleared for fully asynchronous handlers (e.g. ProxyServlet) that have no read- or write-blocked thread but still cannot make progress.

Created attachment 209723 [details]
Reproduction

This is an embedded server that can be used to demonstrate the problem. It requires a file called stream.mp3 to be in /tmp (any mp3 will do). Run the application and then fetch the mp3 with a throttled GET like:

wget --limit-rate=100k http://localhost:8080/stream/info

For me, I had to adjust the rate to trigger the error frequently. At 100k, problems occur on almost every download, often in the first few seconds.

If I totally synchronize the HttpGenerator class, then I cannot reproduce the problem, so it is almost certainly a memory-barrier issue. It would be good to come up with a more refined memory-barrier approach than total synchronization.

Created attachment 209739 [details]
possible fix using sync
This patch synchronises HttpGenerator and appears to prevent the NPEs and related failures.
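A minimal sketch of what the attached patch does, under the assumption described above: guarding all access to the generator's mutable state with a single monitor, so every thread crossing it gets a full memory barrier. The class and fields below are illustrative stand-ins, not Jetty's actual HttpGenerator code.

```java
// Hypothetical model of "total synchronization" of a generator class.
// All names are illustrative, not Jetty's real fields.
public class SyncGenerator {
    private byte[] buffer;   // stands in for the generator's buffer
    private int state;       // stands in for the generator state machine

    // Every method touching shared state is synchronized on the same
    // monitor, so writes by one thread are visible to the next.
    public synchronized void prepareBuffers() {
        if (buffer == null) {
            buffer = new byte[8192];
        }
        state = 1;
    }

    public synchronized boolean isCommitted() {
        return state >= 1;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncGenerator g = new SyncGenerator();
        Thread t1 = new Thread(g::prepareBuffers);
        Thread t2 = new Thread(g::prepareBuffers);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("committed=" + g.isCommitted());
    }
}
```

The cost is contention on every call, which is why the comment above asks for a more refined memory-barrier approach.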
I have committed a "fix" for this. The heart of the problem is that if a non-dispatched thread is doing IO operations on the endpoint, it can set the !_writable status separately from setting the _writeBlocked status. This can cause a select for writable status, which sees no write-blocked thread and thus dispatches a thread to handle the connection. The result is two threads operating on the same connection, and boom!

So the "fix" is to prevent normal write operations from setting !_writable unless they also set _writeBlocked and wait on the endpoint. This is achieved by preventing write() from calling the non-blocking flush when its buffer is full. Instead, we simply commit the request and allow a subsequent call to blockWritable to do the flushing.

I call this a "fix" rather than a real fix because I do not think it has closed all the possible paths that may cause an issue. However, it does appear to have fixed the major path and has hopefully made the problem vanishingly unlikely; at least in my testing it has. I will push a snapshot now, and this will go into RC5 and is probably sufficient for 7.6.0. However, I think this whole area needs to be reviewed to see if we can come up with a better and more complete solution.

Greg, I added a test case for this bug in ProxyServletTest. It always passes for me, but fails on Jesse's box, so perhaps it will fail on yours too (it is more powerful than mine). Can you please review it?

Bug #369291 IllegalArgumentException on DirectNIOBuffer.poke() and Bug #369292 NPE in HttpGenerator.addContent() were closed as duplicates of this bug.

*** Bug 369291 has been marked as a duplicate of this bug. ***

I've added another fragile "fix" for this. In SCEP.flush(Buffer...), if the data cannot be written, then it only sets _writable=false if the thread is dispatched. This works by further avoiding the state where a SCEP is marked !_writable but not _writeBlocked while a non-dispatched thread is doing a write.
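The invariant behind the fix described above can be modelled with a toy endpoint (this is an illustrative model, not Jetty's SelectChannelEndPoint; the flag names are borrowed from the comment): a writer must never clear _writable without also setting _writeBlocked, otherwise the selector sees "not writable, nobody blocked" and dispatches a second thread onto the same connection.

```java
// Illustrative model of the _writable/_writeBlocked invariant.
public class EndPointModel {
    private boolean writable = true;
    private boolean writeBlocked = false;

    // The fixed write path: rather than a non-blocking flush that could
    // leave writable=false with no blocked thread, blockWritable changes
    // both flags together under the lock.
    public synchronized void blockWritable() {
        writable = false;
        writeBlocked = true;
    }

    public synchronized void onSelectedForWrite() {
        writable = true;
        writeBlocked = false;
    }

    // The selector only dispatches a handler thread when the endpoint is
    // unwritable AND no writer is already blocked on it. With the
    // invariant held, this can never race with a blocked writer.
    public synchronized boolean shouldDispatch() {
        return !writable && !writeBlocked;
    }

    public static void main(String[] args) {
        EndPointModel ep = new EndPointModel();
        ep.blockWritable();
        System.out.println("dispatch=" + ep.shouldDispatch());
        ep.onSelectedForWrite();
        System.out.println("writable=" + ep.shouldDispatch());
    }
}
```

The bug was precisely the state this model forbids: writable cleared, writeBlocked unset, so shouldDispatch() returned true while another thread was still mid-write.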
It still preserves the ability to set !_writable when a dispatched thread is trying to asynchronously flush the buffer. WebSocket-style connections work because they will also do a blockWritable if they are unable to flush their buffers. However, this is still incredibly fragile and needs further consideration.

The fragile "fix" appears to be working for now, so I'm closing this issue. We will consider refactoring for more robust async writes in a future release. For now, if any async writes fail, please open a new issue and we will analyse it to further improve our understanding of what is going on here.
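Assuming the behaviour described in the comments, the dispatched-only check added to SCEP.flush() can be sketched roughly as follows. This is a simplified stand-in, not the real SelectChannelEndPoint: only a dispatched thread may leave the endpoint marked unwritable for a later asynchronous retry, so an undispatched partial write can never create the bad state.

```java
// Hedged sketch of the second "fix": only a dispatched thread records
// writable=false after a partial flush. Names are illustrative.
public class FlushModel {
    private volatile boolean writable = true;
    private volatile boolean dispatched = false;

    public void setDispatched(boolean d) { dispatched = d; }

    // Simulates a flush: returns the bytes "written"; a return smaller
    // than bytesWanted means the socket buffer filled up.
    public int flush(int bytesWanted, int socketSpace) {
        int flushed = Math.min(bytesWanted, socketSpace);
        if (flushed < bytesWanted && dispatched) {
            // A dispatched thread may mark the endpoint unwritable and
            // retry asynchronously; an undispatched thread must instead
            // fall back to blockWritable, keeping the flags consistent.
            writable = false;
        }
        return flushed;
    }

    public boolean isWritable() { return writable; }

    public static void main(String[] args) {
        FlushModel ep = new FlushModel();
        ep.setDispatched(false);
        ep.flush(100, 10);  // undispatched partial write: stays writable
        System.out.println("undispatched writable=" + ep.isWritable());
        ep.setDispatched(true);
        ep.flush(100, 10);  // dispatched partial write: marks unwritable
        System.out.println("dispatched writable=" + ep.isWritable());
    }
}
```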