Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 355948 - Unexpected delay after each HTTP response
Summary: Unexpected delay after each HTTP response
Status: CLOSED INVALID
Alias: None
Product: Jetty
Classification: RT
Component: server (show other bugs)
Version: 7.4.4   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 7.5.x   Edit
Assignee: Michael Gorovoy CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-26 09:17 EDT by Gili Mising name CLA
Modified: 2011-09-07 19:17 EDT (History)
3 users (show)

See Also:


Attachments
Testcase (9.71 KB, application/x-unknown)
2011-08-29 10:05 EDT, Gili Mising name CLA
no flags Details
Client testcase v2 (3.26 KB, application/x-unknown)
2011-08-29 16:49 EDT, Gili Mising name CLA
no flags Details
TCP dump using wireshark (12.61 KB, application/x-zip-compressed)
2011-09-06 11:44 EDT, Gili Mising name CLA
no flags Details
Updated logs include packet data (17.29 KB, application/x-zip-compressed)
2011-09-06 12:31 EDT, Gili Mising name CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gili Mising name CLA 2011-08-26 09:17:54 EDT
Build Identifier: 7.4.4.v20110707

Every time I place an HTTP request (using an application I wrote, not a web browser) I am getting a 500 ms delay between the time the servlet returns the response and the time the client begins receiving any bytes. I spent a lot of time trying to figure out whether the problem was in the client or server. In the end, I ended up using Fidler HTTP proxy between the two and got the following:

ClientConnected: 16:34:40.486
ClientBeginRequest: 16:34:40.486
ClientDoneRequest: 16:34:40.486
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 16:34:40.486
FiddlerBeginRequest: 16:34:40.486
ServerGotRequest: 16:34:40.486
ServerBeginResponse: 16:34:40.986
ServerDoneResponse: 16:34:40.986
ClientBeginResponse: 16:34:40.986
ClientDoneResponse: 16:34:40.986

        Overall Elapsed: 00:00:00.5000500

I then collected client and server logs (at a later run). Here is the server log:

2011-08-25 13:38:28.552,DEBUG,qtp7356647-12 - /devices REQUEST /devices on org.eclipse.jetty.server.nio.SelectChannelConnector$SelectChannelHttpConnection@38d4e2@10.0.1.32:80<->10.0.1.32:53882
2011-08-25 13:38:28.553,DEBUG,qtp7356647-12 - /devices scope null||/devices @ o.e.j.w.WebAppContext{/,file:/C:/Users/g.tzabari/Documents/vitex/database.server/src/main/webapp/},src/main/webapp
2011-08-25 13:38:28.553,DEBUG,qtp7356647-12 - /devices context=||/devices @ o.e.j.w.WebAppContext{/,file:/C:/Users/g.tzabari/Documents/vitex/database.server/src/main/webapp/},src/main/webapp
2011-08-25 13:38:28.553,DEBUG,qtp7356647-12 - /devices sessionManager=org.eclipse.jetty.server.session.HashSessionManager@14fb35d
2011-08-25 13:38:28.553,DEBUG,qtp7356647-12 - /devices session=null
2011-08-25 13:38:28.554,DEBUG,qtp7356647-12 - /devices servlet |/devices|null -> default
2011-08-25 13:38:28.554,DEBUG,qtp7356647-12 - /devices chain=
2011-08-25 13:38:28.554,DEBUG,qtp7356647-12 - /devices call filter Guice Filter
2011-08-25 13:38:28.555,TRACE,com.muxlab.vitex.database.server.ConnectionFilter.doFilter,qtp7356647-12 - /devices start HTTP/1.1 GET http://10.0.1.32/devices
2011-08-25 13:38:28.558,TRACE,com.muxlab.vitex.database.server.ConnectionFilter.doFilter,qtp7356647-12 - /devices end HTTP/1.1 GET http://10.0.1.32/devices
2011-08-25 13:38:28.559,DEBUG,qtp7356647-12 - /devices RESPONSE /devices  307
2011-08-25 13:38:29.072,DEBUG,qtp7356647-18 Selector0 SelectChannelConnector@0.0.0.0:80 STARTED closed org.eclipse.jetty.server.nio.SelectChannelConnector$SelectChannelHttpConnection@38d4e2@10.0.1.32:80<->10.0.1.32:53882
2011-08-25 13:38:29.074,DEBUG,qtp7356647-16 - /devices/ REQUEST /devices/ on org.eclipse.jetty.server.nio.SelectChannelConnector$SelectChannelHttpConnection@6abc19@10.0.1.32:80<->10.0.1.32:53883

Here is the client log:

Thu Aug 25 13:38:28.546 EDT 2011 sun.net.www.protocol.http.HttpURLConnection plainConnect
FINEST: ProxySelector Request for http://10.0.1.32/devices
Thu Aug 25 13:38:28.548 EDT 2011 sun.net.www.protocol.http.HttpURLConnection plainConnect
FINEST: Proxy used: DIRECT
Thu Aug 25 13:38:28 EDT.549 2011 sun.net.www.protocol.http.HttpURLConnection writeRequests
FINE: sun.net.www.MessageHeader@10ef5a55 pairs: {GET /devices HTTP/1.1: null}{Accept: application/vnd.com.muxlab.vitex.devices+json; version=1.0}{User-Agent: Java/1.7.0}{Host: 10.0.1.32}{Connection: keep-alive}
Thu Aug 25 13:38:29 EDT.063 2011 sun.net.www.protocol.http.HttpURLConnection getInputStream
FINE: sun.net.www.MessageHeader@46c7b04 pairs: {null: HTTP/1.1 307 Temporary Redirect}{Location: http://10.0.1.32/devices/}{Content-Length: 0}{Server: Jetty(7.4.4.v20110707)}
Thu Aug 25 13:38:29 EDT.064 2011 sun.net.www.protocol.http.HttpURLConnection followRedirect
FINE: Redirected from http://10.0.1.32/devices to http://10.0.1.32/devices/

Looking at the server log, notice the 500 ms delay at time 13:38:28.559, between "RESPONSE /devices" and "STARTED closed". I believe this is the problem I am seeing. I suspect Jetty is not flushing the response until "STARTED closed".

Reproducible: Always
Comment 1 Greg Wilkins CLA 2011-08-29 00:58:08 EDT
Gili,

There is something strange happening here indeed.

Can you capture the headers of the request and the response and attach them to this issue.

My guess is that the response being sent is either HTTP/1.0 or Connection:close has been specified, but for some reason the connection is not being closed.  Why the selector is closed 500ms is really strange?  Are you stopping the server?

any chance that you can put together a simple example that demonstrates this problem?
Comment 2 Gili Mising name CLA 2011-08-29 10:05:28 EDT
Created attachment 202320 [details]
Testcase
Comment 3 Gili Mising name CLA 2011-08-29 10:08:52 EDT
I'm not stopping the server. If you look at the client logs I originally attached you will notice it sends the following headers:

{GET /devices HTTP/1.1:
null}{Accept: application/vnd.com.muxlab.vitex.devices+json;
version=1.0}{User-Agent: Java/1.7.0}{Host: 10.0.1.32}{Connection: keep-alive}

As far as I can tell, we're talking about HTTP/1.1 without a Connection:close. Take a look at the testcase and let me know what you think.
Comment 4 Michael Gorovoy CLA 2011-08-29 14:06:34 EDT
Gili,

I've just tried to run your test on my Ubuntu 10.04 workstation, and it is working  correctly in this environment.

I captured the TCP conversation using Wireshark and noticed that Jersey sends a 307 Temporary Redirect response. This is caused by a missing forward slash at the end the path value. It should be specified as "devices/" in order to avoid the redirect.

Could you please change the path value try the test in your environment to confirm that you are still getting 500ms delay to ensure that it is not caused by redirect?

Thanks,
Michael
Comment 5 Gili Mising name CLA 2011-08-29 16:48:18 EDT
Michael,

Just to make sure we're comparing apples to apples. Can you please paste the corresponding logback output from your end? We need all the output from "/devices REQUEST /devices" to "STARTED closed".

The redirect is not relevant. You will get a 500ms after each response regardless of what the servlet returns. I will upload an updated client to demonstrate this. Here is the output I get:

16:46:42.830 [main] ERROR com.mycompany.jettyclient.App - start
16:46:44.877 [main] ERROR com.mycompany.jettyclient.App - after request with redirect: Hello World
16:46:45.388 [main] ERROR com.mycompany.jettyclient.App - after direct request: Hello World
16:46:45.899 [main] ERROR com.mycompany.jettyclient.App - after direct request: Hello World

Notice the 2 second delay for request with redirects (probably a result of four separate requests under the hood) and 500ms delay for requests without redirect.
Comment 6 Gili Mising name CLA 2011-08-29 16:49:51 EDT
Created attachment 202361 [details]
Client testcase v2
Comment 7 Michael Gorovoy CLA 2011-08-29 17:54:40 EDT
Here is the output for the v2 of your test client that I got.

17:40:22.873 [main] ERROR com.mycompany.jettyclient.App - start
17:40:23.013 [main] ERROR com.mycompany.jettyclient.App - after request with redirect: Hello World
17:40:23.019 [main] ERROR com.mycompany.jettyclient.App - after direct request: Hello World
17:40:23.022 [main] ERROR com.mycompany.jettyclient.App - after direct request: Hello World

I'm going to fire up my Windows XP virtual machine and try running your test on Windows, in the meanwhile i would greatly appreciate if you could answer the following questions.

* Are you running both test client and test server on the same computer?
* Are you running your test on a VM or physical machine?
* Do you have any kind of firewall and/or anti-virus software (including ones that built into Windows) that are enabled on your computer? 

Thanks,
Michael
Comment 8 Gili Mising name CLA 2011-08-29 20:02:19 EDT
> Are you running both test client and test server on the same computer?

Yes.

> Are you running your test on a VM or physical machine?

Physical machine.

> Do you have any kind of firewall and/or anti-virus software (including ones that built into Windows) that are enabled on your computer? 

The built-in firewall and ESET Antivirus are enabled. I assume you want me to try disabling them? :)
Comment 9 Michael Gorovoy CLA 2011-08-29 21:19:23 EDT
(In reply to comment #8)
> The built-in firewall and ESET Antivirus are enabled. I assume you want me to
> try disabling them? :)

Yes, please! ;)
Comment 10 Gili Mising name CLA 2011-08-30 11:09:49 EDT
Disabling ESET Antivirus and Windows Firewall didn't help. Please note I could not disable Windows Firewall in "Domain Networks" because I don't have the necessary permissions but I don't think it is relevant given the fact that we're operating on localhost.

After further investigation I noticed the problem is specific to Windows 7. I cannot reproduce the problem under Windows XP. I don't have access to Windows Vista so I'm not sure if it exhibits the same problem but I suspect it does (same TCP/IP stack).

If you configure a Windows 7 installation on your end you should be able to reproduce the problem quite easily.
Comment 11 Michael Gorovoy CLA 2011-08-30 11:44:37 EDT
Thanks a lot for narrowing down the environment.

One more question for you. What version of Java are you using, and is the version of Java on your WinXP machine the same as on Win7 machine?

-Michael
Comment 12 Gili Mising name CLA 2011-08-30 12:10:43 EDT
All machines are running Java7 GA.
Comment 13 Gili Mising name CLA 2011-08-30 12:11:31 EDT
It probably doesn't make a difference, but all machines are also 32-bit (hence a 32-bit JRE).
Comment 14 Michael Gorovoy CLA 2011-08-30 12:37:36 EDT
Could you please try running your test with the latest version of Java 6 to see if this is a Java 7 issue?
Comment 15 Gili Mising name CLA 2011-08-30 13:25:19 EDT
My Windows XP configuration initially ran JRE 1.6.0_25. I updated it to JRE 1.7.0 and got the same behavior. It might be both JRE *and* OS specific, but I doubt it.
Comment 16 Michael Gorovoy CLA 2011-08-30 13:30:54 EDT
I would greatly appreciate if you could try Windows 7 with Java 1.6.0_25.

It will take me a some time to configure development environment on Windows 7 on my laptop, because its alter ego is Ubuntu 10.04 that is my primary working config, so I would love to be absolutely sure this is not a Java 7 on Win7 issue.
Comment 17 Gili Mising name CLA 2011-08-30 15:53:30 EDT
The problem occurs on Windows 7 and JDK 1.6.0_27 (I couldn't find update 25 online).
Comment 18 Gili Mising name CLA 2011-09-05 20:06:52 EDT
Michael,

Tell me you have good news :)
Comment 19 Jan Bartel CLA 2011-09-05 20:31:35 EDT
Gili,

Can you capture a tcp trace of an entire client/server interaction and attach it to this bug report please?

thanks
Jan
Comment 20 Jan Bartel CLA 2011-09-05 20:38:59 EDT
Oh, and can you try running the client on a different machine to the server.

Jan
Comment 21 Gili Mising name CLA 2011-09-05 21:08:30 EDT
Jan,

My goal is to help you reproduce this problem on your end so you can do the necessary debugging. To that end, I've provided you with a simple testcase. Please try running it on the correct environment and let me know if you can reproduce the problem. If you cannot, I'll be more than happy to run extra tests for you. I hope you understand I don't want to spin my wheels for nothing.

There is something specific in my work environment that triggers this problem. I cannot reproduce it at home using the attached testcase. First, I plan on isolating the environmental factor. There is no point logging TCP packets until we figure out how to reproduce it consistently. The problem does not seem to be random. It either occurs 100% of the time, or not at all.
Comment 22 Jan Bartel CLA 2011-09-05 21:35:58 EDT
Gili,

Windows systems are hard to come by for us. Most Jetty developers use Ubuntu or Macs. Additionally, as you point out, this is likely specific to a particular environment that only you have access to. We appreciate the test case you provided and it will no doubt be handy, however, in order to make the most efficient use of everyone's time (and remember Jetty is an open-source project) the quickest path to you getting a resolution would be if you could provide us with as much information as possible, and this includes a tcp dump from the system experiencing the errors.


regards
Jan

(In reply to comment #21)
> Jan,
> 
> My goal is to help you reproduce this problem on your end so you can do the
> necessary debugging. To that end, I've provided you with a simple testcase.
> Please try running it on the correct environment and let me know if you can
> reproduce the problem. If you cannot, I'll be more than happy to run extra
> tests for you. I hope you understand I don't want to spin my wheels for
> nothing.
> 
> There is something specific in my work environment that triggers this problem.
> I cannot reproduce it at home using the attached testcase. First, I plan on
> isolating the environmental factor. There is no point logging TCP packets until
> we figure out how to reproduce it consistently. The problem does not seem to be
> random. It either occurs 100% of the time, or not at all.
Comment 23 Gili Mising name CLA 2011-09-05 21:36:58 EDT
How do I produce a TCP dump?
Comment 24 Jan Bartel CLA 2011-09-05 21:51:56 EDT
I believe ye-olde-favourite WireShark works on Windows:
http://www.wireshark.org/

Otherwise there's WinDump: http://www.winpcap.org/windump/

Probably plenty more with googling something like "windows tcp dump" or "windows tcp sniffer".

Jan

(In reply to comment #23)
> How do I produce a TCP dump?
Comment 25 Gili Mising name CLA 2011-09-06 11:44:10 EDT
Created attachment 202817 [details]
TCP dump using wireshark

Seems I will be eating a straw hat tonight.

The attached Wireshark logs seem to indicate that the client is at fault. I ran two tests:

1. I ran the server on a Windows 7 machine (10.0.1.32) for both tests.
2. I ran a client on a Windows XP machine (10.0.1.4) without delays. Files: client-good.txt, server-good.txt
3. I ran the client on a secondary Windows 7 machine (10.0.1.21) with delays. Files: client-bad.txt, server-bad.txt
4. Frame 201 in client-bad.txt reads "[Time delta from previous displayed frame: 0.520433000 seconds]"

I would appreciate your help figuring out where this delay is coming from. I enabled full logging on the client side but it's not clear what is going on between:

Sep 06, 2011 11:41:20.928 sun.net.www.protocol.http.HttpURLConnection writeRequests
FINE: sun.net.www.MessageHeader@9133f65 pairs: {GET /devices HTTP/1.1: null}{Accept: text/plain}{User-Agent: Java/1.7.0}{Host: localhost}{Connection: keep-alive}

and

Sep 06, 2011 11:41:21.436 sun.net.www.protocol.http.HttpURLConnection getInputStream
FINE: sun.net.www.MessageHeader@37086a4 pairs: {null: HTTP/1.1 307 Temporary Redirect}{Location: http://localhost/devices/}{Content-Length: 0}{Server: Jetty(7.4.4.v20110707)}


Any ideas?
Comment 26 Gili Mising name CLA 2011-09-06 11:47:38 EDT
Also, please double check the sharkwire logs (client-bad.txt and server-bad.txt). It's possible I misinterpreted what is going on and this is still a server-side problem.
Comment 27 Gili Mising name CLA 2011-09-06 12:31:34 EDT
Created attachment 202825 [details]
Updated logs include packet data

You can examine HTTP headers in these updated logs.
Comment 28 Gili Mising name CLA 2011-09-06 14:47:41 EDT
According to the Netbeans profiler, the client spends 2100 ms out of 2721 ms in:

java.net.SocketInputStream.read(byte[], int, int, int) invoked by
java.net.SocketInputStream.read(byte[], int, int) invoked by
java.net.BufferedInputStream.fill() invoked by
java.net.BufferedInputStream.read1(byte[], int, int) invoked by
java.net.BufferedInputStream.read(byte[], int, int) invoked by
sun.net.www.http.HttpClient.parseHTTPHeader(MessageHeader, ProgressSource, HttpURLConnection) invoked by
sun.net.www.http.HttpClient.parseHTTP(MessageHeader, ProgressSource, HttpURLConnection) invoked by
sun.net.www.http.HttpURLConnection.getInputStream() invoked by
sun.net.www.http.HttpURLConnection.getResponseCode()

On the one hand, Wireshark seems to indicate the client is wasting 500 ms after receiving a response. On the other hand, the client profiler seems to indicate the client is unable to proceed because it's waiting on bytes. Something doesn't make sense.
Comment 29 Gili Mising name CLA 2011-09-06 16:44:06 EDT
Disabling the "Base Filtering Engine" service and all its dependencies fixes the problem. What doesn't make sense is that when I re-enable all the relevant services the problem doesn't come back. It's not clear which individual service is causing TCP delays, why and how to fix it.
Comment 30 Jan Bartel CLA 2011-09-06 21:45:15 EDT
Gili,

Looking at this post:
http://forums.openvpn.net/topic7965.html

Talks about ESET that was introducing a 500ms delay to filtering http packets ... could this be it?

Jan
Comment 31 Gili Mising name CLA 2011-09-06 23:03:33 EDT
Hi Jan,

That's one of the first things I tried disabling but it didn't seem to have any effect. What's worse, I have ESET Antivirus at home and it doesn't exhibit this problem. I'll try uninstalling it at work tomorrow and seeing if it makes a difference.
Comment 32 Gili Mising name CLA 2011-09-07 10:44:34 EDT
I figured it out.

1. It turns out that disabling ESET AntiVirus using Setup -> "Temporarily disable Antivirus and antispyware protection" doesn't disable all anti-virus components. When I uninstalled the anti-virus the network latency disappeared.

2. This bug seems to be fixed in the latest version of ESET Antivirus. Upgrading from version 4.0.474.0 to 4.2.71.2 fixed it for us.

Thank you everyone for your help. I'm happy we got to the bottom of this :)
Comment 33 Jan Bartel CLA 2011-09-07 19:17:06 EDT
Glad its resolved. Closing this issue.