Bug 291530 - corrupt pages often display
Status: RESOLVED FIXED
Product: Community
Classification: Eclipse Foundation
Component: Website
Version: unspecified
Hardware: PC Windows XP
Importance: P1 major
Target Milestone: ---
Assigned To: phoenix.ui CLA Friend
QA Contact:
Duplicates: 291800
Depends on:
Blocks:
Reported: 2009-10-06 16:00 EDT by David Williams CLA Friend
Modified: 2009-10-20 12:35 EDT (History)

Attachments
example screen capture (113.39 KB, image/png)
2009-10-06 16:00 EDT, David Williams CLA Friend
Screenshot (42.39 KB, image/jpeg)
2009-10-07 09:55 EDT, Denis Roy CLA Friend
CSS Files are being reset as well (645.81 KB, image/bmp)
2009-10-07 14:28 EDT, Nathan Gervais CLA Friend

Description David Williams CLA Friend 2009-10-06 16:00:14 EDT
Created attachment 148936 [details]
example screen capture

Starting yesterday evening (not that I looked before), I've noticed that junky
pages are often displayed at eclipse.org, such as at this (temporary) URL:

http://www.eclipse.org/tools/planning/EclipseSimultaneousRelease.php 

I'll attach a more recent attempt, obtained as I've started to develop some
'helios' pages. Note the attached page says "404, not found" which I suspect is
correct at the time I tried, but by the time you look at this, the file might
be there. This bug report isn't about the 404 ... it is about the broken layout
of all the standard navigation links. It is as if some of the php or css files
are not being retrieved. 

http://www.eclipse.org/helios/planning
Comment 1 Denis Roy CLA Friend 2009-10-07 09:55:05 EDT
Created attachment 148992 [details]
Screenshot

I'm getting this too.  Just browsing www gives me reset connections.  I'm not sure if it's happening on other sites yet.
Comment 2 Denis Roy CLA Friend 2009-10-07 10:01:31 EDT
This is the same 'Connection Reset By peer' that we were having with rsync and CVS.  I wonder if there is a connection limit on the firewall for www.  We always hit these stupid limits.

$ wget --delete-after -S http://www.eclipse.org/
--2009-10-07 10:03:22--  http://www.eclipse.org/
Resolving www.eclipse.org... 206.191.52.46
Connecting to www.eclipse.org|206.191.52.46|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

--2009-10-07 10:03:23--  (try: 2)  http://www.eclipse.org/
Connecting to www.eclipse.org|206.191.52.46|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Wed, 07 Oct 2009 14:03:22 GMT
  Server: Apache
  Expires: Mon, 26 Jul 1997 05:00:00 GMT
  Last-Modified: Wed, 07 Oct 2009 14:03:23 GMT
  cache-Control: no-store, no-cache, must-revalidate
  cache-Control: post-check=0, pre-check=0
  Pragma: no-cache
  X-NodeID: (null)
  Vary: Accept-Encoding
  Connection: close
  Content-Type: text/html
Length: unspecified [text/html]
Saving to: `index.html'

    [ <=>                    ] 20,917      --.-K/s   in 0.06s

2009-10-07 10:03:23 (318 KB/s) - `index.html' saved [20917]
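
Since the wget run above only fails on some attempts, a repeated probe gives a rough failure rate. What follows is a minimal sketch, not a tool anyone on this bug used: `fetch` and `probe` are hypothetical helper names, and the attempt count and timeout are arbitrary assumptions.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch(url, timeout=10):
    """Fetch url once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe(url, attempts=20, fetch_once=fetch):
    """Request url repeatedly and tally outcomes.

    Keys are the status code as a string, "reset" for a connection
    reset by the peer, or "error" for any other network failure.
    """
    results = Counter()
    for _ in range(attempts):
        try:
            results[str(fetch_once(url))] += 1
        except urllib.error.HTTPError as exc:
            # The server answered, just not with a 2xx/3xx status.
            results[str(exc.code)] += 1
        except ConnectionResetError:
            # Reset while reading the response, as in the wget log.
            results["reset"] += 1
        except OSError as exc:
            # urllib wraps connect/send failures in URLError (an OSError).
            reason = getattr(exc, "reason", None)
            key = "reset" if isinstance(reason, ConnectionResetError) else "error"
            results[key] += 1
    return results
```

Calling `probe("http://www.eclipse.org/")` from outside the local network and comparing the "reset" count with the same run from inside would show whether the resets depend on the path through the firewall and load balancer.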
Comment 3 Denis Roy CLA Friend 2009-10-07 12:00:33 EDT
FWIW, I tested each www-vm individually from the local net, and none of them are resetting connections.

From home, however, I was able to catch this happening while tcpdump was running.  Note the R(eset) response from www:

12:00:06.228089 IP 192.168.0.101.33000 > www.eclipse.org.http: S 1306454854:1306454854(0) win 5840 <mss 1460,sackOK,timestamp 12712075 0,nop,wscale 7>
E..<..@.@.b....e..4....PM..F...................                                                                                                       
............                                                                                                                                          
12:00:06.306877 IP www.eclipse.org.http > 192.168.0.101.33000: S 1852828261:1852828261(0) ack 1306454855 win 0 <mss 1380>                             
E..,.d....yl..4....e.P..no.eM..G`....7.....d....^z                                                                                                    
12:00:06.306933 IP 192.168.0.101.33000 > www.eclipse.org.http: . ack 1 win 5840                                                                       
E..(..@.@.c....e..4....PM..Gno.fP.......                                                                                                              
12:00:06.333573 IP www.eclipse.org.http > 192.168.0.101.33000: R 1852828262:1852828262(0) win 0                                                       
E..(\s@.s..a..4....e.P..no.fM..GP..............'|#
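
For longer captures, the reset segments can be picked out automatically. This is a rough sketch that assumes the older tcpdump text format shown above, where the flags appear as a bare token such as `S`, `.` or `R` right after the destination; newer tcpdump versions print `Flags [R]` instead and would need a different pattern. `find_resets` is a hypothetical helper name.

```python
import re

# Matches lines like:
# 12:00:06.333573 IP src.port > dst.port: R 1852828262:1852828262(0) win 0
LINE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>[SFPRWE.]+) "
)


def find_resets(lines):
    """Return (time, src, dst) for every segment whose TCP flags include R."""
    resets = []
    for line in lines:
        match = LINE.match(line)
        # Payload dump lines (the "E..<..@" rows) do not match and are skipped.
        if match and "R" in match.group("flags"):
            resets.append((match.group("time"), match.group("src"), match.group("dst")))
    return resets
```

Run over the capture above, this reports a single reset at 12:00:06.333573, sent from www.eclipse.org to the client right after the handshake completed.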
Comment 4 Denis Roy CLA Friend 2009-10-07 14:10:55 EDT
Hmmm... 

CSS11503# show sticky-stats

Sticky Statistics - SFM Slot 1, Subslot 1:

   Total number of new sticky entries is 2517698
   Total number of sticky table hits is 47382594
   Total number of sticky rejects (no entry) is 0
   Total number of sticky collision is 0

   Total number of available sticky entries is 0  <-- uh oh
   Total number of used sticky entries is 131071
Comment 5 Nathan Gervais CLA Friend 2009-10-07 14:28:40 EDT
Created attachment 149022 [details]
CSS Files are being reset as well

As I'm sure you're aware, it seems any file being served is experiencing these connection resets.  www.eclipse.org is showing random page styling depending on which .css files you do or don't get.  Attached is a screenshot taken when header.css isn't returned.
Comment 6 Denis Roy CLA Friend 2009-10-07 14:34:18 EDT
Yep.
Comment 7 Denis Roy CLA Friend 2009-10-08 10:55:58 EDT
For some reason, today it seems so much better.
Comment 8 Wayne Beaton CLA Friend 2009-10-08 11:17:34 EDT
I had to reload the /resources page four times before it looked right today.
Comment 9 Denis Roy CLA Friend 2009-10-08 13:36:15 EDT
*** Bug 291800 has been marked as a duplicate of this bug. ***
Comment 10 Karl Matthias CLA Friend 2009-10-13 12:23:27 EDT
I'm back from vacation and I'm looking at this.
Comment 11 Karl Matthias CLA Friend 2009-10-13 12:37:11 EDT
(In reply to comment #4)
> Hmmm... 
> 
> CSS11503# show sticky-stats
> 
> Sticky Statistics - SFM Slot 1, Subslot 1:
> 
>    Total number of new sticky entries is 2517698
>    Total number of sticky table hits is 47382594
>    Total number of sticky rejects (no entry) is 0
>    Total number of sticky collision is 0
> 
>    Total number of available sticky entries is 0  <-- uh oh
>    Total number of used sticky entries is 131071

I don't know for sure yet, but I think this is actually ok.  The fact that the total number of sticky rejects is 0 means that it's recycling them as needed.  I'm thinking this is more likely a timeout issue for HTTP connections like we were having from the rsync and cvs connections.  The default timeout for HTTP packets is 8 seconds (down from 16 for most other protocols).  If we're seeing problems there then possibly this is too low, as well.  I didn't change it before because there were no complaints, but I'm going to up it to 32 now and see if that helps.  After this we will have changed all the major protocols, so I wouldn't expect this to continue for other things. --> If <-- this is the problem.
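
The reading above (a full table with zero rejects suggests entries are being recycled) can be checked against saved `show sticky-stats` output. This is only a sketch of that tentative heuristic, not a documented Cisco rule; `parse_sticky_stats` and `table_is_recycling` are hypothetical names, and the parser assumes the "Total number of ... is N" wording shown in comment #4.

```python
import re


def parse_sticky_stats(text):
    """Map each 'Total number of <name> is <n>' line to {name: n}."""
    stats = {}
    for match in re.finditer(r"Total number of (.+?) is (\d+)", text):
        stats[match.group(1)] = int(match.group(2))
    return stats


def table_is_recycling(stats):
    """Heuristic from this comment: no free entries, yet no rejects."""
    return (
        stats["available sticky entries"] == 0
        and stats["sticky rejects (no entry)"] == 0
    )
```

With the numbers from comment #4 the check returns True, which matches the "I think this is actually ok" reading. A non-zero reject count would point back at the sticky table.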
Comment 12 Karl Matthias CLA Friend 2009-10-13 12:47:41 EDT
Ok, changes have been applied to all http and https services (including SVN).  Let's see how this goes.  I don't see any connection limits being hit on the ASA, so I think it's just the CSS this time...
Comment 13 Nathan Gervais CLA Friend 2009-10-13 13:04:27 EDT
(In reply to comment #12)
> Ok, changes have been applied to all http and https services (including SVN). 
> Let's see how this goes.  I don't see any connection limits being hit on the
> ASA, so I think it's just the CSS this time...

To avoid any confusion: CSS here is not Cascading Style Sheets, correct?
Comment 14 Karl Matthias CLA Friend 2009-10-13 15:58:22 EDT
(In reply to comment #13)
> (In reply to comment #12)
> > Ok, changes have been applied to all http and https services (including SVN). 
> > Let's see how this goes.  I don't see any connection limits being hit on the
> > ASA, so I think it's just the CSS this time...
> 
> To avoid any confusion CSS is not Cascading Style Sheets correct?

Right.  Cisco CSS = Content Service Switch.

So Denis was as usual (don't go getting a big head, now! :) ) correct about the connection limits.  There were some messages about MSS exceeded in the logs and they only got worse when I increased the CSS limit.  I've now upped them by 33% and in my own testing it has made a big difference.  Let me know if you still see this happening now.
Comment 15 Karl Matthias CLA Friend 2009-10-13 16:07:00 EDT
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > Ok, changes have been applied to all http and https services (including SVN). 
> > > Let's see how this goes.  I don't see any connection limits being hit on the
> > > ASA, so I think it's just the CSS this time...
> > 
> > To avoid any confusion CSS is not Cascading Style Sheets correct?
> 
> Right.  Cisco CSS = Content Service Switch.
> 
> So Denis was as usual (don't go getting a big head, now! :) ) correct about the
> connection limits.  There were some messages about MSS exceeded in the logs and
> they only got worse when I increased the CSS limit.  I've now upped them by 33%
> and in my own testing it has made a big difference.  Let me know if you still
> see this happening now.

FYI, I'm not sure the MSS messages are related, but they seem to occur in higher numbers when there is a problem.  I don't see the actual connection limit messages in the log at this time.
Comment 16 Denis Roy CLA Friend 2009-10-19 14:34:03 EDT
I have not seen this since... Are we good?
Comment 17 Karl Matthias CLA Friend 2009-10-19 18:18:53 EDT
I think we're good as far as I can tell.
Comment 18 David Williams CLA Friend 2009-10-19 19:41:27 EDT
I've not seen any issues recently, so I'd say we can close this.
Comment 19 David Williams CLA Friend 2009-10-19 19:55:36 EDT
Oh, and, thank you very much. 

Is there any "feedback" to give to Cisco?
Comment 20 David Williams CLA Friend 2009-10-20 01:02:01 EDT
I actually saw it again this evening. It was shortly after I updated a page.

After a few minutes, it displayed correctly and then did not reoccur. 

Just thought I'd document it ... not worth reopening for one case.
Comment 21 Karl Matthias CLA Friend 2009-10-20 12:03:37 EDT
David, was this page on www and did it consist of more than one file?  Thanks
Comment 22 David Williams CLA Friend 2009-10-20 12:17:41 EDT
(In reply to comment #21)
> David, was this page on www and did it consist of more than one file?  Thanks

Yes, I think 
http://www.eclipse.org/webtools/development/index_pmc_call_notes.php 
and 
http://www.eclipse.org/webtools/development/pmc_call_notes/pmcMeeting.php?meetingDate=2009-10-20

I'm about to "touch" the latter URL (to update it with meeting minutes), so I will see if it recurs with a new change. (But it was completely new content when I first saw it.)
Comment 23 David Williams CLA Friend 2009-10-20 12:26:45 EDT
updates just now worked fine.
Comment 24 Karl Matthias CLA Friend 2009-10-20 12:35:10 EDT
Thanks David.  Since this was a new page, and pages are updated one file at a time, it's possible to hit a new page before its secondary files have been written.  Obviously it's a race condition that's hard to avoid with data being synchronized across several servers.  Let's call this good for now, then, unless we start to see this with existing pages again.  Thanks!
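
To illustrate the race described above: if a sync step controls the copy order, sending secondary files before the pages that reference them narrows the window in which a new page is visible without its dependencies. This is a minimal sketch under loud assumptions, not how the eclipse.org synchronization actually works: `sync_order` is a hypothetical helper, and treating every `.php` or `.html` file as an entry-point page is a simplification (included PHP fragments would need separate handling).

```python
from pathlib import PurePosixPath

# Assumption: files with these suffixes are entry-point pages.
PAGE_SUFFIXES = {".php", ".html"}


def sync_order(paths):
    """Return paths with secondary files first and entry-point pages last."""

    def is_page(path):
        return PurePosixPath(path).suffix in PAGE_SUFFIXES

    # sorted() is stable, so the relative order within each group is kept.
    return sorted(paths, key=is_page)
```

Ordering only shrinks the window on a single server. With several servers synchronized independently, a request can still land on one that has not received everything yet.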