| Summary: | Cannot build on build2 | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Community | Reporter: | Nicolas Bros <nicolas.bros> | ||||||
| Component: | CI-Jenkins | Assignee: | CI Admin Inbox <ci.admin-inbox> | ||||||
| Status: | RESOLVED WONTFIX | QA Contact: | |||||||
| Severity: | major | ||||||||
| Priority: | P3 | CC: | d_a_carver, stepper, webmaster | ||||||
| Version: | unspecified | ||||||||
| Target Milestone: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | All | ||||||||
| Whiteboard: | |||||||||
| Bug Depends on: | 315643 | ||||||||
| Bug Blocks: | |||||||||
| Attachments: |
|
||||||||
|
Description
Nicolas Bros
Created attachment 169104 [details]
I get this stacktrace in the job's configuration
And I get this one at the beginning of a build:

Started by user estepper
ln -s 2010-05-19_08-56-50 /opt/users/hudsonbuild/.hudson/jobs/emf-cdo-integration/builds/455 failed: -1
Building remotely on build2
remote file operation failed: /opt/users/hudsonbuild/workspace/emf-cdo-integration at hudson.remoting.Channel@294d294d:build2
Archiving artifacts
ERROR: Failed to archive artifacts: result/site.p2/**
java.io.IOException: SSH channel is closed. (Close requested by remote)
at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:383)
at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1847)
at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(ObjectOutputStream.java)
at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1546)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:335)
at hudson.remoting.Channel.send(Channel.java:417)
at hudson.remoting.Request.call(Request.java:105)
at hudson.remoting.Channel.call(Channel.java:551)
at hudson.EnvVars.getRemote(EnvVars.java:196)
at hudson.model.Computer.getEnvironment(Computer.java:736)
at hudson.model.Run.getEnvironment(Run.java:1643)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:663)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:560)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
at hudson.model.Build$RunnerImpl.post2(Build.java:152)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
at hudson.model.Run.run(Run.java:1267)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:122)

Yeah, it appears that build2 can't see the NFS drive again. Webmasters will need to take care of this.

This is not NFS-related.

java.io.IOException: SSH channel is closed. (Close requested by remote)

Not sure what would cause this...

The Hudson devs suggested checking the slave's log file:

https://build.eclipse.org/hudson/computer/build2/log

But this didn't seem to have any information that would be useful in this case.

Denis, it seems this bug report might have some workarounds for this particular issue. It seems to happen only if the SSH Hudson Plugin is used; if you execute via a command, it worked better for at least one user.

http://issues.hudson-ci.org/browse/HUDSON-3466?page=com.atlassian.streams.streams-jira-plugin%3Aactivity-stream-issue-tab

I too have this problem with build2 and the job virgo.web.snapshot. I need this to run on build2 for the Sun JDK, and it attempts to run on master as expected. However, on build2 it fails to start the job with the same error as reported here.

Started by user spowell
ln -s 2010-05-20_04-47-22 /opt/users/hudsonbuild/.hudson/jobs/virgo.web.snapshot/builds/7 failed: -1
Building remotely on build2
remote file operation failed: /opt/users/hudsonbuild/workspace/virgo.web.snapshot at hudson.remoting.Channel@c460c46:build2

with similar errors later on reported from the Archive steps (which of course fail anyway).

The file error happens very early on; and other jobs of mine (with very similar setup) run fine on build2.

I cannot see the 'workaround's in the reference JIRA issue (which is closed 'won't fix').

(In reply to comment #7)
> I cannot see the 'workaround's in the reference JIRA issue (which is closed
> 'won't fix').
One user commented (last comment on the JIRA issue) that the bug happened only when using an option called "Launch slave agents on Linux machines via SSH" but it worked for him when using "Launch slave via execution of command on the Master" and giving the ssh command there.

(In reply to comment #8)
> (In reply to comment #7)
> > I cannot see the 'workaround's in the reference JIRA issue (which is closed
> > 'won't fix').
>
> One user commented (last comment on the JIRA issue) that the bug happened only
> when using an option called "Launch slave agents on Linux machines via SSH" but
> it worked for him when using "Launch slave via execution of command on the
> Master" and giving the ssh command there.

Thank you. I do not have any control over how slave agents are launched and only see this on one of my jobs which are launched on build2. Can this be anything to do with it?

(In reply to comment #9)
I take it back -- all my build2 jobs are now failing to build with the same error.

The slave was in an odd state (I seem to be saying that a lot about Hudson) so I restarted it. Can you try your build(s) now?

It now works again, thank you!

Created attachment 169356 [details]
build2 using SSH slave plugin

Currently build2 is configured using the SSH Slave plugin. The alternative is to write a custom script that launches the slave, or launch the slave using the JNLP method.

http://wiki.hudson-ci.org/display/HUDSON/Distributed+builds#Distributedbuilds-Differentwaysofstartingslaveagents

The SSH Slave plugin itself was last updated on May 10.

http://wiki.hudson-ci.org/display/HUDSON/SSH+Slaves+plugin

So at this time, not much should prevent building on build2, right?

I could make a nightly build, but integration builds still fail due to signing (see bug 313722).

The same error is happening again. Could someone please restart build2?

The 500 error is a problem with the master build server, not the slave.
That is being tracked in another bug for which I cannot remember the number. I've set the master in shutdown mode, and it will be restarted once the jobs are done.

It happened again: https://build.eclipse.org/hudson/job/emf-cdo-integration/498/console

;-(

slave.jar decided to take a vacation. I restarted it.

It happened again ;-(

It looks like this error is coming back regularly. Denis, could you try the workaround suggested in Comment 13?

> Denis, could you try the workaround suggested in Comment 13?

I've read comment 13, and it seems to pertain to _launching_ the slave. In our case, the slave launches just fine. It just seems to die/crash/go away after a certain period.

For now, I've restarted it and I'll watch the log in case it dumps anything interesting.

(In reply to comment #6)
> Denis seems this bug report might have some work arounds for this particular
> issue. Seems to only happen if the SSH Hudson Plugin is used, if you execute
> via a command it at least for one user worked better.
>
> http://issues.hudson-ci.org/browse/HUDSON-3466?page=com.atlassian.streams.streams-jira-plugin%3Aactivity-stream-issue-tab

Denis, in comment #6 above I linked to the original issue where this particular symptom was occurring. In that issue, they were having the same problem: after a while the slave would mysteriously stop communicating with the master. They were using the SSH plugin. After switching away from the SSH plugin and launching the slaves either through JNLP or through a launching script, the slaves were much more stable.

See the link in comment #6 and the alternatives for launching slaves in comment #13.

I've updated build2 to be a JNLP slave, and created a simple startup script to connect the slave. If this works I'll look at creating an actual script to start the slave at boot time.

-M.

I tried again to build on build2. My build stayed stuck at the beginning, so I had to kill it.
In the log, I have:

Started by user nbros
ln -s 2010-05-29_07-22-47 /opt/users/hudsonbuild/.hudson/jobs/cbi-modisco-nightly/builds/688 failed: -1
Building remotely on build2
SCM check out aborted
Archiving artifacts
ERROR: Publisher hudson.tasks.ArtifactArchiver aborted due to exception
java.lang.InterruptedException
...

It was stuck on "Building remotely on build2" the whole time (20 minutes) before I cancelled it. Since I see "SCM check out aborted" right afterwards, I deduce it was stuck on the SCM check out.

My build did run fine today. To re-test the checkout phase I wanted to wipe out the job workspace and build again. But even trying to look into the workspace with a browser gets stuck before it is shown:

https://build.eclipse.org/hudson/job/emf-cdo-integration/ws/

I'm watching the console for the hudson slave job and here's something that seems odd(to me):

Jun 1, 2010 10:54:50 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://build.eclipse.org/hudson/]
Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to build.eclipse.org:53106
Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Password:

Anybody know why it's asking for a password?

-M.
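For readers following along: the "simple startup script" for the JNLP slave mentioned earlier in this thread was never posted, so the following is only a rough sketch of what such a wrapper could look like, not the script actually used on build2. It assumes the slave is started with `java -jar slave.jar -jnlpUrl <url>` and that the node's JNLP descriptor lives at `computer/<node>/slave-agent.jnlp` under the Hudson root URL; the function names, the jar location, and the 30-second delay are all hypothetical.

```python
import subprocess
import time


def agent_command(master_url, node_name, jar_path="slave.jar"):
    """Build the command line that connects a JNLP slave to the master.

    Assumption: the JNLP descriptor is served at
    <master_url>/computer/<node_name>/slave-agent.jnlp.
    """
    jnlp_url = "%s/computer/%s/slave-agent.jnlp" % (master_url.rstrip("/"), node_name)
    return ["java", "-jar", jar_path, "-jnlpUrl", jnlp_url]


def keep_connected(command, delay_seconds=30, max_restarts=None):
    """Relaunch the agent each time its process exits.

    Returns the number of launches once max_restarts is reached;
    with max_restarts=None it loops until interrupted.
    """
    launches = 0
    while max_restarts is None or launches < max_restarts:
        subprocess.call(command)   # blocks until the agent process exits
        launches += 1
        time.sleep(delay_seconds)  # give the master time to drop the old channel
    return launches


# Hypothetical invocation for this thread's setup (runs until interrupted):
# keep_connected(agent_command("https://build.eclipse.org/hudson/", "build2"))
```

The delay between launches is a guess at what might help when the master still believes the previous connection is alive; it is not a documented requirement.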
(In reply to comment #27)
> I'm watching the console for the hudson slave job and here's something that
> seems odd(to me):
>
> Jun 1, 2010 10:54:50 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Terminated
> Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Locating server among [https://build.eclipse.org/hudson/]
> Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Connecting to build.eclipse.org:53106
> Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Handshaking
> Jun 1, 2010 10:56:00 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Connected
> Password:
>
> Anybody know why it's asking for a password?
>
> -M.

Do you have Xvnc setup on the slave? Has a password been set for the slave server?

Vnc is installed, but no password has been set for the slave.

-M.

It happened again (or something new?):

Started by user estepper
ln -s 2010-06-02_02-22-46 /opt/users/hudsonbuild/.hudson/jobs/emf-cdo-integration/builds/516 failed: -1
Building remotely on build2
remote file operation failed: /opt/users/hudsonbuild/workspace/emf-cdo-integration at hudson.remoting.Channel@6f8b6f8b:build2
Archiving artifacts
ERROR: Failed to archive artifacts: result/site.p2/**
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:104)
at java.net.SocketOutputStream.write(SocketOutputStream.java:148)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:77)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:121)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1848)
at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1756)
at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1258)
at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1212)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1396)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1159)
at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:335)
at hudson.remoting.Channel.send(Channel.java:417)
at hudson.remoting.Request.call(Request.java:110)
at hudson.remoting.Channel.call(Channel.java:551)
at hudson.EnvVars.getRemote(EnvVars.java:196)
at hudson.model.Computer.getEnvironment(Computer.java:736)
at hudson.model.Run.getEnvironment(Run.java:1643)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:663)
at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:563)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
at hudson.model.Build$RunnerImpl.post2(Build.java:152)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
at hudson.model.Run.run(Run.java:1267)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:122)
Recording test results
ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:104)
at java.net.SocketOutputStream.write(SocketOutputStream.java:148)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:77)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:121)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1848)
at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1756)
at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1258)
at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1212)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1396)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1159)
at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:335)
at hudson.remoting.Channel.send(Channel.java:417)
at hudson.remoting.Request.call(Request.java:110)
at hudson.remoting.Channel.call(Channel.java:551)
at hudson.EnvVars.getRemote(EnvVars.java:196)
at hudson.model.Computer.getEnvironment(Computer.java:736)
at hudson.model.Run.getEnvironment(Run.java:1643)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:663)
at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:117)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:563)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
at hudson.model.Build$RunnerImpl.post2(Build.java:152)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
at hudson.model.Run.run(Run.java:1267)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:122)
Emma: looking for coverage reports in the provided path: result/coverageReport.xml
ERROR: Publisher hudson.plugins.emma.EmmaPublisher aborted due to exception
hudson.util.IOException2: remote file operation failed: /opt/users/hudsonbuild/workspace/emf-cdo-integration/result/coverageReport.xml at hudson.remoting.Channel@6f8b6f8b:build2
at hudson.FilePath.act(FilePath.java:743)
at hudson.FilePath.act(FilePath.java:729)
at hudson.FilePath.exists(FilePath.java:997)
at hudson.plugins.emma.EmmaPublisher.locateCoverageReports(EmmaPublisher.java:71)
at hudson.plugins.emma.EmmaPublisher.perform(EmmaPublisher.java:105)
at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:36)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:563)
at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
at hudson.model.Build$RunnerImpl.post2(Build.java:152)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
at hudson.model.Run.run(Run.java:1267)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:122)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:104)
at java.net.SocketOutputStream.write(SocketOutputStream.java:148)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:77)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:121)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1848)
at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(ObjectOutputStream.java:1885)
at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1546)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:335)
at hudson.remoting.Channel.send(Channel.java:417)
at hudson.remoting.Request.call(Request.java:110)
at hudson.remoting.Channel.call(Channel.java:551)
at hudson.FilePath.act(FilePath.java:736)
... 14 more

Here's what I saw on the console:
Jun 2, 2010 2:22:04 AM hudson.remoting.Engine$2 onDead
INFO: Ping failed. Terminating the socket.
Jun 2, 2010 2:22:04 AM hudson.remoting.Channel$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:162)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2198)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2488)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:856)
Jun 2, 2010 2:22:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://build.eclipse.org/hudson/]
Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to build.eclipse.org:53106
Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: build2 is already connected to this master. Rejecting this connection.
java.lang.Exception: The server rejected the connection: build2 is already connected to this master. Rejecting this connection.
at hudson.remoting.Engine.run(Engine.java:191)
hudsonbuild@build2:~> ps -aef |grep slave
55011 29494 17131 0 08:56 pts/0 00:00:00 grep slave
hudsonbuild@build2:~> ps -aef |grep jnlp
55011 29758 17131 0 08:56 pts/0 00:00:00 grep jnlp
I've restarted the slave client, and setup the vnc password.
-M.
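The console output above follows a recognizable sequence: a failed ping, the channel closing, and then a reconnect attempt that the master rejects. As a minimal sketch (not something anyone in this thread ran), a log like this could be scanned for those two events as follows. It assumes the two-line `java.util.logging` layout seen in the paste, a timestamp header followed by a `LEVEL: message` line; the function names and the wording of the findings are made up for illustration.

```python
import re

# Header line such as "Jun 2, 2010 2:22:04 AM hudson.remoting.Engine$2 onDead"
HEADER = re.compile(r"^[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M ")
# Message line such as "INFO: Ping failed. Terminating the socket."
RECORD = re.compile(r"^(INFO|WARNING|SEVERE): (.*)$")


def parse_records(text):
    """Return (header, level, message) tuples; stack-trace lines are skipped."""
    records = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if HEADER.match(line):
            header = line
            continue
        match = RECORD.match(line)
        if match and header is not None:
            records.append((header, match.group(1), match.group(2)))
            header = None
    return records


def diagnose(records):
    """Flag the two events that show up in the build2 console paste."""
    findings = []
    for _header, _level, message in records:
        if "Ping failed" in message:
            findings.append("channel dropped after a failed ping")
        elif "already connected to this master" in message:
            findings.append("reconnect rejected by the master")
    return findings


# Excerpt copied from the console output in this comment.
SAMPLE = """\
Jun 2, 2010 2:22:04 AM hudson.remoting.Engine$2 onDead
INFO: Ping failed. Terminating the socket.
Jun 2, 2010 2:22:04 AM hudson.remoting.Channel$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Socket closed
Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: build2 is already connected to this master. Rejecting this connection.
"""

for finding in diagnose(parse_records(SAMPLE)):
    print(finding)
```

Seeing both findings about ten seconds apart, as in the paste, would at least be consistent with the master not yet having noticed that the old channel was gone when the slave tried to reconnect; that reading is a guess and is not confirmed anywhere in this report.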
(In reply to comment #29)
> Vnc is installed, but no password has been set for the slave.
>
> -M.

To get rid of the password prompt the slave should have its password set for vnc.

vncpasswd

Could be related to: http://issues.hudson-ci.org/browse/HUDSON-6566
You aren't using the EMMA code coverage plugin for Hudson are you?

(In reply to comment #31)
> Here's what I saw on the console:
>
> Jun 2, 2010 2:22:04 AM hudson.remoting.Engine$2 onDead
> INFO: Ping failed. Terminating the socket.
> Jun 2, 2010 2:22:04 AM hudson.remoting.Channel$ReaderThread run
> SEVERE: I/O error in channel channel
> java.net.SocketException: Socket closed
> at java.net.SocketInputStream.read(SocketInputStream.java:162)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
> at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2198)
> at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2488)
> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:856)
> Jun 2, 2010 2:22:04 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Terminated
> Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Locating server among [https://build.eclipse.org/hudson/]
> Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Connecting to build.eclipse.org:53106
> Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener status
> INFO: Handshaking
> Jun 2, 2010 2:22:14 AM hudson.remoting.jnlp.Main$CuiListener error
> SEVERE: The server rejected the connection: build2 is already connected to this
> master. Rejecting this connection.
> java.lang.Exception: The server rejected the connection: build2 is already
> connected to this master. Rejecting this connection.
> at hudson.remoting.Engine.run(Engine.java:191)
> hudsonbuild@build2:~> ps -aef |grep slave
> 55011 29494 17131 0 08:56 pts/0 00:00:00 grep slave
> hudsonbuild@build2:~> ps -aef |grep jnlp
> 55011 29758 17131 0 08:56 pts/0 00:00:00 grep jnlp
>
> I've restarted the slave client, and setup the vnc password.
>
> -M.
> You aren't using the EMMA code coverage plugin for Hudson are you?
Well, I am!
(In reply to comment #34)
> > You aren't using the EMMA code coverage plugin for Hudson are you?
>
> Well, I am!

Try turning off the EMMA Code Coverage report plugin for Hudson for that job, and see what happens.

I can try it, but I would like to mention that I have evidence of tons of good builds *with* this plugin enabled for my job.

My build (modisco-nightly) now hangs at the beginning when I try to build on the slave. Maybe build2 needs a restart?

It now works again. No need to restart after all.

Marking as resolved, since the new hudson.eclipse.org is up and running. |