Bug 256423

Summary: Apache failing to hup correctly after logrotate
Product: Community Reporter: Denis Roy <denis.roy>
Component: Servers Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED WORKSFORME QA Contact:
Severity: blocker    
Priority: P3 CC: gunnar, karl.matthias, kim.moir, richard.gronback, wgp010
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Whiteboard:

Description Denis Roy CLA 2008-11-25 07:22:00 EST
Leaving httpd processes in zombie (defunct) state.
Comment 1 Denis Roy CLA 2008-11-25 07:22:30 EST
*** Bug 256419 has been marked as a duplicate of this bug. ***
Comment 2 Denis Roy CLA 2008-11-25 07:22:42 EST
*** Bug 256196 has been marked as a duplicate of this bug. ***
Comment 3 Denis Roy CLA 2008-11-25 09:16:45 EST
This appears to be a PHP bug.  By the looks of the bug report, chances of it being fixed are slim to none:

http://bugs.php.net/bug.php?id=44309
Comment 4 Gustavo de Paula CLA 2008-11-25 12:27:31 EST
On the DSDP MTJ SVN repository, the following error is now shown:

Some of selected resources were not committed.
svn: Commit failed (details follow):
svn: Can't open file '/home/data/svn/dsdp/org.eclipse.mtj/db/write-lock': Permission denied
svn: MERGE of '/svnroot/dsdp/org.eclipse.mtj/trunk/releng/org.eclipse.mtj.releng': 409 Conflict (https://dev.eclipse.org)
Comment 5 Karl Matthias CLA 2008-11-25 12:30:08 EST
Denis, Matt and I also found this when you were out of town and three of the nodes crashed at various times.  We also tracked it down to a PHP bug at that time.  Pretty much sucks.  I remember vividly because with backoffs Nagios wouldn't page until about 4:30am my time, just about right for ruining a night of sleep. ;)

I really thought we had discussed it when you got back.  Sorry if we didn't.
Comment 6 Karl Matthias CLA 2008-11-25 12:30:34 EST
Make that 1:30. :)
Comment 7 Denis Roy CLA 2008-11-25 13:31:08 EST
We had discussed this, but the problem magically went away, so I simply didn't believe you  :)  I do now.

Until we can solve this, rasputin will kill -9 the httpd processes and restart them.  It works manually.
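
For illustration, the manual workaround described above (find the stuck httpd processes, kill -9 them, then restart) could be sketched roughly as follows. This is not rasputin's actual code, which is not shown in this bug. The function names `parse_ps` and `force_kill` and the assumed `ps -eo pid,stat,comm` column layout are hypothetical. Note that a zombie process cannot itself be killed, so the sketch reports zombies separately and sends SIGKILL only to the live processes.

```python
import os
import signal


def parse_ps(ps_output, name="httpd"):
    """Split `ps -eo pid,stat,comm` style output into (live_pids, zombie_pids).

    Assumption: the first line is a header and each row is PID, STAT, COMMAND.
    A STAT value starting with "Z" marks a zombie (defunct) process.
    """
    live, zombies = [], []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, stat, comm = fields
        if not comm.startswith(name):
            continue
        if stat.startswith("Z"):
            zombies.append(int(pid))
        else:
            live.append(int(pid))
    return live, zombies


def force_kill(pids):
    """Send SIGKILL (the equivalent of kill -9) to each PID, ignoring ones already gone."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process exited between the ps snapshot and the kill


SAMPLE = """\
  PID STAT COMMAND
 1201 Ss   httpd
 1350 Z    httpd <defunct>
 1351 S    httpd
 2200 S    sshd
"""

live_pids, zombie_pids = parse_ps(SAMPLE)
```

After `force_kill(live_pids)` the service would still need to be started again by whatever init script the host uses; that step is left out here because the bug does not say how rasputin performs it.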
Comment 8 Eclipse Webmaster CLA 2008-11-25 14:49:17 EST
So after the last time this happened I added a 'killproc' option for the apache process to try and force a restart.  However, a little while later I added some extra code to skip specific service checks when those services had reached their 'flap' limit.  The catch is that < is not >, and so once a service had failed once it was skipping new failures and failing to restart.  I've pulled that code and restarted rasputin.  This may also explain the strange 'reappearance' of this issue.

-M.
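
As an illustration of the kind of inverted comparison described in the comment above, here is a minimal sketch. The real rasputin code is not shown in this bug, so the function names, the failure counter, and the choice of `>=` for "reached the limit" are all assumptions.

```python
def should_skip_check(failure_count, flap_limit):
    """Intended behaviour: skip a service check only once the flap limit is reached."""
    return failure_count >= flap_limit


def should_skip_check_inverted(failure_count, flap_limit):
    """Inverted comparison: skips the check while failures are still below the limit."""
    return failure_count < flap_limit


# Hypothetical values: a flap limit of 5 and a service that has failed once.
FLAP_LIMIT = 5
intended = should_skip_check(1, FLAP_LIMIT)
inverted = should_skip_check_inverted(1, FLAP_LIMIT)
```

With the inverted operator, a service that has failed only once is already skipped, so a later failure is never seen and no restart is attempted, which matches the symptom reported here.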

Comment 9 Gustavo de Paula CLA 2008-11-25 15:05:22 EST
I still can't commit on MTJ SVN. Is it still the same issue? The error is in comment 4.

(In reply to comment #8)
> So after the last time this happened I added a 'killproc' option for the apache
> process to try and force a restart.  However a little while later I added some
> extra code to skip specific service checks when those services had reached
> their 'flap' limit.  Catch is < is not > and so once the service had failed
> once it was skipping new failures and failing to restart.  I've pulled that
> code and restarted rasputin.  Which may also explain the strange 'reappearance'
> of this issue.
> 
> -M.
> 

Comment 10 Eclipse Webmaster CLA 2008-11-25 16:16:13 EST
(In reply to comment #9)
> i still can't commit on MTJ SVN. is it still the same issue? the error is on
> comment 4

The issue with SVN is not directly connected with the issue with apache. The SVN issue can be tracked on bug 256436.

-M.
Comment 11 Eclipse Webmaster CLA 2009-03-06 14:55:59 EST
Closing as we haven't seen this issue after the last comment.

-M.
Comment 12 Denis Roy CLA 2009-05-07 16:17:29 EDT
Moving all these to Servers.