| Summary: | [build.eclipse.org] intermittent ssh connection failures during jarsigning | ||
|---|---|---|---|
| Product: | Community | Reporter: | Nick Boldt <nboldt> |
| Component: | Servers | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | kim.moir |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Whiteboard: | |||
-timestamp:
[echo] 12:30:10
Is that 12:30:10 PM local time?
(In reply to comment #1)
> -timestamp:
> [echo] 12:30:10
> Is that 12:30:10 PM local time?

Yes, that's 00:30h EST, about an hour before I opened this bug.

That specific instance is the 12:30am crunch. There's super heavy load as the downloads stats tables are rotated. I also get some NFS timeout messages around that time too:

Feb 26 00:29:07 build kernel: nfs: server nfsmaster not responding, timed out

I also see a noticeable dip in the bandwidth for a few minutes while the servers crunch. Your best bet is to catch the error and try again, or avoid doing stuff around 12:30.

> Your best bet is to catch the error and try again, or avoid doing stuff around
> 12:30.
I was thinking my 'check for signed zip' test ought to be smarter anyway, so I'll fix that at my end.
Since this is a scheduled event and not a server snafu, I'll close this as WONTFIX, and avoid the 00:30 crunch in future.
Thanks for the info.
Denis, by dip in bandwidth in comment #3, I assume you mean that there is little available bandwidth at this time for cvs checkouts from eclipse.org. How long does this last after 12:30am? The reason I'm asking is that our builds have been failing with cvs timeouts the last few nights and if this server maintenance is the culprit, I'll run them earlier.

(In reply to comment #5)
> Denis, by dip in bandwidth in comment #3, I assume you mean that there is
> little available bandwidth at this time for cvs checkouts from eclipse.org.

No, I mean the NFS server is so ridiculously busy that servers wait for it, causing a dip in bandwidth as everything slows down waiting for disk time. Midnight to about 12:45am seems to be the absolute busiest time of the day. If you can have your builds completed before that, that would be great.

Thanks, I'll change our build time. You might want to announce this to other committers so that other teams don't run into the same issue.

(In reply to comment #7)
> Thanks, I'll change our build time. You might want to announce this to other
> committers so that other teams don't run into the same issue.

Denis: I can blog this along with the multiple-queue signing (bug 220037) & some other signing tips, once I've tested it out. But I don't want to steal your thunder, if you'd prefer an email broadcast or your own blog. Let me know either way.
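For anyone who wants a build script to enforce the advice above rather than rely on the schedule alone, here is a minimal sketch of a guard for the busy window. The window boundaries (midnight to 00:45) come from the comment above; the function name `in_busy_window` is hypothetical, and the sketch assumes the time passed in is already in the server's local time zone.

```python
from datetime import datetime, time as clock

# Busy window taken from the comment above: midnight to about 12:45am.
# Assumption: the datetime passed in is in the server's local time.
BUSY_START = clock(0, 0)
BUSY_END = clock(0, 45)


def in_busy_window(now):
    """Return True if `now` falls inside the nightly busy window."""
    return BUSY_START <= now.time() < BUSY_END


if __name__ == "__main__":
    if in_busy_window(datetime.now()):
        print("Inside the nightly busy window; consider delaying this step.")
```

A build could call this before starting a signing or checkout step and sleep until the window has passed.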
The process below is to push a file to build.eclipse.org and wait until jarsigner is done and the file reappears; when it does, the ls command returns something other than "No such file or directory" and the build can continue. Unfortunately, sometimes I get these ssh auth failures instead. Any idea why this happens?

--

waitForChangedAttribs:
-timestamp:
[echo] 12:24:00
compareAttribs:
[exec] Result: 2
[echo] original: ${originalAttribs}
[echo] polled: ls: emf-sdo-xsd-Master-runWithSun.zip: No such file or directory
writeDiffResult:

waitForChangedAttribs:
-timestamp:
[echo] 12:26:01
compareAttribs:
[exec] Result: 2
[echo] original: ${originalAttribs}
[echo] polled: ls: emf-sdo-xsd-Master-runWithSun.zip: No such file or directory
writeDiffResult:

waitForChangedAttribs:
-timestamp:
[echo] 12:28:03
compareAttribs:
[exec] Result: 2
[echo] original: ${originalAttribs}
[echo] polled: ls: emf-sdo-xsd-Master-runWithSun.zip: No such file or directory
writeDiffResult:

waitForChangedAttribs:
-timestamp:
[echo] 12:30:10
compareAttribs:
[exec] Result: 255
[echo] original: ${originalAttribs}
[echo] polled: Permission denied, please try again.
[echo] Permission denied, please try again.
[echo] Received disconnect from 206.191.52.57: 2: Too many authentication failures for nickb
writeDiffResult:

waitForChangedAttribs:
[echo] copy zip back to build machine
[exec] Result: 1
[echo] delete temp files on build.eclipse.org
packMasterZip:
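In the log above, the poll that hit the ssh failure (Result: 255) was apparently treated as "file is back", because its output was something other than "No such file or directory", and the copy step then failed. A stricter check could key on the exit status instead of the message text. The following is only a sketch of that idea, not the actual build script: the function name `wait_for_remote_file`, its parameters, and the two-minute default delay (inferred from the timestamps in the log) are assumptions. It relies on ssh exiting with 255 when the connection or authentication itself fails, and otherwise passing through the remote command's exit status.

```python
import subprocess
import time

SSH_FAILURE = 255  # ssh's own exit status for connection/authentication errors


def wait_for_remote_file(host, path, attempts=10, delay=120,
                         run=subprocess.run, sleep=time.sleep):
    """Poll `ls` over ssh until the file exists; return True if it appeared.

    Only exit status 0 counts as success. Any other status (the `ls`
    failure for a missing file, or an ssh failure) means "not yet", so
    the caller never proceeds on an authentication error.
    `run` and `sleep` are injectable so the loop can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        result = run(["ssh", host, "ls", "-l", path],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if result.returncode == SSH_FAILURE:
            reason = "ssh failure, will retry"
        else:
            reason = "file not there yet"
        print(f"attempt {attempt}: {reason} (exit {result.returncode})")
        if attempt < attempts:
            sleep(delay)
    return False
```

A caller would continue to the copy-back step only when the function returns True, and fail the build (or alert someone) when it returns False after all attempts.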