| Summary: | build.eclipse.org pegged out at 225.28 131.87 62.47 | | |
|---|---|---|---|
| Product: | Community | Reporter: | David Williams <david_williams> |
| Component: | Servers | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | VERIFIED FIXED | QA Contact: | |
| Severity: | blocker | | |
| Priority: | P3 | CC: | denis.roy, milesparker |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 454311 | | |
Description
David Williams

    21307 mparker  20  0  97832  54m  26m D   6  0.2   0:00.39 java
    32146 mparker  20  0   158m  55m  26m D   5  0.2   0:00.40 java
    31134 mparker  20  0  97976  54m  27m D   4  0.2   0:00.54 java
     9681 mparker  20  0  97976  54m  26m D   3  0.2   0:00.51 java
    20981 mparker  20  0  97976  54m  26m D   3  0.2   0:00.66 java
    21418 mparker  20  0   164m  61m  27m D   3  0.3   0:00.72 java
    31827 mparker  20  0  98176  55m  27m D   3  0.2   0:00.55 java
    31336 mparker  20  0   165m  61m  27m D   3  0.3   0:00.71 java
    31524 mparker  20  0   101m  60m  27m D   3  0.3   0:00.80 java
    20454 mparker  20  0  98176  55m  27m D   2  0.2   0:00.52 java
    32147 mparker  20  0   159m  56m  27m D   1  0.2   0:00.54 java
    10206 mparker  20  0  98192  55m  27m D   0  0.2   0:00.48 java
    31939 mparker  20  0  98192  54m  26m D   0  0.2   0:00.61 java

I think it was losing its connection to NFS server 'wilma'. Fixed now.

Confirming, and glad it could be "fixed" without rebooting! That was about the only advice I found on "D state" via web searches. :)

D state is usually a hard block waiting for a device.

Just for info, we mount our NFS mounts as "hard", meaning the NFS client will simply hang if it cannot reach the server (leading to processes in D state). The alternative is "soft", where the client will return an I/O error and continue (which is no good).

(In reply to Denis Roy from comment #5)
> Just for info, we mount our NFS mounts as "hard", meaning the NFS client
> will simply hang if it cannot reach the server (leading to processes in D
> state). The alternative is "soft", where the client will return an I/O
> error and continue (which is no good).

Well, we had a build fail due to I/O errors at about 10:37 that "just tried to continue", and from the logs it appears that the "primary" disks it needs "just disappeared" ... so I hope your comment is not literally true in all cases (because otherwise that would mean we have some other error! :)

I am not the least bit concerned and do not need you to investigate, but if you want to look at the logs (which just contain the Java exceptions), I saved a copy, since I'll be starting a new build soon that replaces the originals. They are under /opt/public/eclipse/builds:

mb4M.out.temphold.log  <== huge, with the errors at the end.
mb4M.err.temphold.log  <== small, with one error listed.

But, again, I'm fine just "trying again" and am not concerned -- well, unless it happens again during the retry :)

Ok, so I haven't been on ssh to build.eclipse.org for nearly a year now. I have no idea what those jobs are, unless they were manually triggered Hudson jobs, and I haven't triggered any Hudson jobs for a month or two now. Denis, any clue as to what these processes were actually doing?

(In reply to Miles Parker from comment #7)
> Ok, so I haven't been on ssh to build.eclipse.org for nearly a year now.

(I mean, I just was now, to try to see what the heck is going on. :))

I think it's just those one-minute cron jobs that tend to pile up when the server is severely bogged down. If nothing checks whether the previous job is still running, it keeps spawning them every minute until the server runs out of memory. I've changed the one-minute job to */20 (every 20 minutes).

I don't actually even need those cron jobs anymore. I'll kill them, though I have to be able to get into build.eclipse.org first, and that's been a bit tricky.

> I don't actually even need those cron jobs anymore.

Oh, you java programmers, always relying on something else to collect the garbage =)

I've removed both jobs for you.

:)
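
For anyone hitting the same symptom later: the java processes in the top snippet above are in "D" (uninterruptible sleep) state, which Linux exposes as the state field of /proc/&lt;pid&gt;/stat. Below is a minimal sketch for listing D-state processes; it is not part of the original report and did not run on build.eclipse.org, and it assumes the proc(5) layout where the single-letter state follows the parenthesized command name.

```python
#!/usr/bin/env python3
"""List processes stuck in uninterruptible sleep ("D" state).

Illustrative sketch only: it reads /proc/<pid>/stat, so it is
Linux-specific, and the parsing assumes the proc(5) layout
"pid (comm) state ...".
"""
import os


def d_state_processes():
    """Yield (pid, command) pairs for every process currently in D state."""
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # comm is wrapped in parentheses and may itself contain spaces,
        # so locate the last ')' before reading the single-letter state.
        comm = stat[stat.index("(") + 1 : stat.rindex(")")]
        state = stat[stat.rindex(")") + 1 :].split()[0]
        if state == "D":
            yield int(pid), comm


if __name__ == "__main__":
    for pid, comm in sorted(d_state_processes()):
        print(f"{pid:>7}  {comm}")
```

With "hard" NFS mounts, as Denis describes, processes blocked on an unreachable server sit in exactly this state until the server comes back, so a growing list from a check like this is a reasonable early warning that an NFS mount has gone away.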
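
On the cron pile-up Denis describes (one-minute jobs respawning while earlier runs are still stuck), the usual fix is a guard that checks whether the previous run is still in progress. The sketch below is only illustrative and was not deployed anywhere; the lock path and wrapped command are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Cron wrapper that skips a run if the previous one is still in progress.

Illustrative sketch with made-up values: LOCK_PATH and COMMAND are
placeholders, not the actual job that was running on build.eclipse.org.
"""
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/example-cron-job.lock"  # hypothetical lock file
COMMAND = ["/bin/true"]                   # stand-in for the real job


def main() -> int:
    lock_file = open(LOCK_PATH, "w")
    try:
        # Non-blocking: raises BlockingIOError if another run holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        print("previous run still in progress; skipping", file=sys.stderr)
        return 0
    # The kernel releases the lock when this process goes away, so holding
    # the file open for the duration of the job is enough.
    return subprocess.call(COMMAND)


if __name__ == "__main__":
    sys.exit(main())
```

Wrapping the crontab entry in flock -n from util-linux gives the same behaviour without any code; either way the point is that a wedged run blocks new ones instead of letting them accumulate.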