
Bug 454272

Summary: build.eclipse.org pegged out at 225.28 131.87 62.47
Product: Community
Reporter: David Williams <david_williams>
Component: Servers
Assignee: Eclipse Webmaster <webmaster>
Status: VERIFIED FIXED
QA Contact:
Severity: blocker
Priority: P3
CC: denis.roy, milesparker
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard:
Bug Depends on:
Bug Blocks: 454311

Description David Williams CLA 2014-12-05 12:15:07 EST
For some reason, the "load" on build.eclipse.org is so high that I cannot even get "top" to come up to tell me what's using all the resources.

Hope it's not one of mine. :) 

And, getting worse as I write this bug:  307.85 190.67 92.18

Marked as blocker, since there are some things I need to do on "build.eclipse.org" and, as it is, I cannot execute anything.
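
As an illustration (not part of the original report), the three numbers quoted above are the 1, 5 and 15 minute load averages. A minimal Python sketch, assuming whitespace-separated values as printed by `uptime` or `/proc/loadavg`, shows how they can be read as a trend. The function names and the CPU count of 16 in the usage note are hypothetical; the thread does not say how many cores the server has.

```python
def parse_load(text):
    """Parse the first three whitespace-separated fields as load averages."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def trend(one, five, fifteen):
    """Classify the load as rising, falling, or mixed across the three windows."""
    if one > five > fifteen:
        return "rising"
    if one < five < fifteen:
        return "falling"
    return "mixed"


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of CPUs (cpu_count is an assumption)."""
    return load / cpu_count
```

For example, `trend(*parse_load("225.28 131.87 62.47"))` classifies the load in the summary as rising, and `load_per_cpu(307.85, 16)` would put the later reading at roughly 19 runnable or blocked tasks per core on a hypothetical 16-core machine.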
Comment 1 Denis Roy CLA 2014-12-05 12:21:27 EST
21307 mparker   20   0 97832  54m  26m D      6  0.2   0:00.39 java
32146 mparker   20   0  158m  55m  26m D      5  0.2   0:00.40 java
31134 mparker   20   0 97976  54m  27m D      4  0.2   0:00.54 java
 9681 mparker   20   0 97976  54m  26m D      3  0.2   0:00.51 java
20981 mparker   20   0 97976  54m  26m D      3  0.2   0:00.66 java
21418 mparker   20   0  164m  61m  27m D      3  0.3   0:00.72 java
31827 mparker   20   0 98176  55m  27m D      3  0.2   0:00.55 java
31336 mparker   20   0  165m  61m  27m D      3  0.3   0:00.71 java
31524 mparker   20   0  101m  60m  27m D      3  0.3   0:00.80 java
20454 mparker   20   0 98176  55m  27m D      2  0.2   0:00.52 java
32147 mparker   20   0  159m  56m  27m D      1  0.2   0:00.54 java
10206 mparker   20   0 98192  55m  27m D      0  0.2   0:00.48 java
31939 mparker   20   0 98192  54m  26m D      0  0.2   0:00.61 java
Comment 2 Denis Roy CLA 2014-12-05 12:53:08 EST
I think it was losing its connection to NFS server 'wilma'.  Fixed now.
Comment 3 David Williams CLA 2014-12-05 12:56:21 EST
Confirming, and glad it could be "fixed" without rebooting! Which was about the only advice I found on "D State" via web searches. :)
Comment 4 Denis Roy CLA 2014-12-05 12:57:22 EST
D state is usually a hard block waiting for a device.
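
As an illustration (not part of the original comment), processes stuck in D state can be tallied from `top`-style lines like those in comment 1. This is a minimal Python sketch that assumes the default `top` column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function name is hypothetical, and the last sample line is invented to show a non-blocked process.

```python
from collections import Counter


def count_blocked(top_lines):
    """Count processes in uninterruptible sleep (state 'D') per user."""
    blocked = Counter()
    for line in top_lines:
        fields = line.split()
        # Skip headers and short lines: expect 12 columns starting with a PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        if fields[7] == "D":  # column 8 is the process state
            blocked[fields[1]] += 1
    return blocked


sample = [
    "21307 mparker   20   0 97832  54m  26m D      6  0.2   0:00.39 java",
    "32146 mparker   20   0  158m  55m  26m D      5  0.2   0:00.40 java",
    # Hypothetical line, not from the report: a process that is not blocked.
    "12345 someuser  20   0 97832  54m  26m S      1  0.2   0:00.10 java",
]
blocked_by_user = count_blocked(sample)
```

A large per-user count of `D` entries, as in comment 1, points at a blocked device or mount rather than at CPU use.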
Comment 5 Denis Roy CLA 2014-12-05 13:00:17 EST
Just for info, we mount our NFS mounts as "hard" meaning the NFS client will simply hang if it cannot reach the server (leading to processes in D state). Alternative is "soft" where the client will return an I/O error and continue (which is no good).
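
As an illustration (not part of the original comment), the hard or soft setting of each NFS mount can be read from the mount options. This is a minimal Python sketch that assumes the Linux `/proc/mounts` layout (device, mount point, filesystem type, comma-separated options); the function name and the sample lines are hypothetical and do not describe the actual Eclipse servers.

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to 'hard' or 'soft' based on its options."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        # NFS retries indefinitely ("hard") unless "soft" is requested.
        modes[fields[1]] = "soft" if "soft" in options else "hard"
    return modes


# Hypothetical sample in /proc/mounts format.
sample_mounts = """\
server.example:/export /mnt/data nfs rw,relatime,vers=3,hard,proto=tcp 0 0
server.example:/scratch /mnt/scratch nfs4 rw,relatime,soft,timeo=600 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""
modes = nfs_mount_modes(sample_mounts)
```

On a live Linux host the same function could be fed the contents of `/proc/mounts`.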
Comment 6 David Williams CLA 2014-12-05 13:51:59 EST
(In reply to Denis Roy from comment #5)
> Just for info, we mount our NFS mounts as "hard" meaning the NFS client will
> simply hang if it cannot reach the server (leading to processes in D state).
> Alternative is "soft" where the client will return an I/O error and continue
> (which is no good).

Well, we had a build fail due to IO Errors, at about 10:37 and "just tried to continue" and from the logs, it appears that the "primary" disks it needs "just disappeared" ... so, hope your comment is not literally true in all cases (because otherwise that would mean we have some other error! :) 

I am not the least bit concerned, and do not need you to investigate, but if you wanted to look at the logs (which just contain the "Java exceptions"), I saved a copy (since I'll soon be starting a new build, which replaces the originals). They are under
/opt/public/eclipse/builds

mb4M.out.temphold.log <== huge, with the errors at the end. 
mb4M.err.temphold.log <== small, with one error listed. 

But, again, I'm fine just "trying again" and am not concerned -- well, unless it happens again during the re-try :)
Comment 7 Miles Parker CLA 2014-12-05 14:11:17 EST
Ok, so I haven't been on ssh to build.eclipse.org for nearly a year now. I have no idea what those jobs are, unless they were manually triggered Hudson jobs, and I haven't triggered any Hudson jobs for at least a month or two now.

Denis, any clue as to what these processes were actually doing?
Comment 8 Miles Parker CLA 2014-12-05 14:15:13 EST
(In reply to Miles Parker from comment #7)
> Ok, so I haven't been on ssh to build.eclipse.org for nearly a year now

(I mean, I just was now, to try to see what the heck is going on. :))
Comment 9 Denis Roy CLA 2014-12-05 14:38:55 EST
I think it's just those one-minute cron jobs that tend to pile up when the server is severely bogged down.  If nothing checks to see if the previous job is running, it keeps spawning them every minute until the server runs out of memory.

I've changed the one-minute job to */20 (every 20 minutes).
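
As an illustration (not part of the original comment, and not what was done on the server), one common way to stop a frequent cron job from piling up is to have it take a non-blocking exclusive lock and exit if a previous run still holds it. This is a minimal Python sketch using `fcntl.flock`, so it is Unix-only; the function names and the lock file name are hypothetical. It assumes the lock file lives on local disk, since a lock file on a hung NFS mount could itself block.

```python
import fcntl
import os
import tempfile


def default_lock_path():
    """Hypothetical lock location on local temporary storage."""
    return os.path.join(tempfile.gettempdir(), "example-cron-job.lock")


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_once(path, job):
    """Run job() only if no other holder has the lock; report whether it ran."""
    lock = try_lock(path)
    if lock is None:
        return False  # previous run still active, so skip this one
    try:
        job()
        return True
    finally:
        lock.close()  # closing the file releases the lock
```

A cron entry would then call something like `run_once(default_lock_path(), do_work)`, where `do_work` stands in for the real job, so a slow or blocked run causes later invocations to exit instead of stacking up.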
Comment 10 Miles Parker CLA 2014-12-05 14:42:15 EST
I don't actually even need those cron jobs anymore. I'll kill them, though I have to be able to get into build.eclipse.org first and that's been a bit tricky.
Comment 11 Denis Roy CLA 2014-12-05 15:29:47 EST
> I don't actually even need those cron jobs anymore.

Oh, you java programmers, always relying on something else to collect the garbage  =)

I've removed both jobs for you.
Comment 12 Miles Parker CLA 2014-12-05 16:05:32 EST
:)