Bug 328286

Summary: network issues on windows7tests
Product: Community
Reporter: Kim Moir <kim.moir>
Component: CI-Jenkins
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 295393    

Description Kim Moir CLA 2010-10-20 15:38:47 EDT
It looks like windows7tests can't check out code from dev.eclipse.org.  Is there a network issue going on, or does the slave need to be rebooted?

I terminated this build because it couldn't check out the code.  It usually takes about 30 seconds to do this...

https://hudson.eclipse.org/hudson/job/eclipse-JUnit/72/console
Comment 1 Eclipse Webmaster CLA 2010-10-20 16:35:48 EDT
I'm restarting the slave.  I was able to check out the basebuilder module fine on the command line (that sounds wrong, but 'in the DOS window' isn't any better).

-M.
Comment 2 Kim Moir CLA 2010-10-20 17:04:10 EDT
I changed the build to do just the CVS checkout:

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit/76/console

It still has the same issue: it can't check out the projects from CVS.
Comment 3 Kim Moir CLA 2010-10-21 13:57:17 EDT
The build was still proceeding today after failing to check out.  I just tried to run another one, and the error message I see is below.

Looks like the slave is still unhappy :-(
	

Started by user kmoir
Building remotely on windows7tests
hudson.util.IOException2: remote file operation failed: c:\Users\HUDSONBUILD\hudson\workspace\eclipse-JUnit at hudson.remoting.Channel@4de5a0cb:windows7tests
	at hudson.FilePath.act(FilePath.java:749)
	at hudson.FilePath.act(FilePath.java:735)
	at hudson.FilePath.mkdirs(FilePath.java:801)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1059)
	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
	at hudson.model.Run.run(Run.java:1273)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:129)
Caused by: java.net.SocketException: Broken pipe
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1838)
	at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(ObjectOutputStream.java:1876)
	at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1537)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:329)
	at hudson.remoting.Channel.send(Channel.java:419)
	at hudson.remoting.Request.call(Request.java:105)
	at hudson.remoting.Channel.call(Channel.java:557)
	at hudson.FilePath.act(FilePath.java:742)
	... 9 more
ERROR: Publisher hudson.tasks.Mailer aborted due to exception
Comment 4 Kim Moir CLA 2010-10-21 15:05:49 EDT
One build worked, but now I'm getting timeouts again:

cvs [checkout aborted]: connect to dev.eclipse.org(172.25.25.51):2401 failed: Connection timed out
FATAL: CVS failed. exit code=1
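
A connectivity check along these lines could help tell a network timeout apart from a hung slave. It is only a sketch: the project name, target name, and the cvs.unreachable property are made up for illustration, the host and port are taken from the error above, and it assumes Ant is available on the slave. It relies on Ant's waitfor task with the socket condition. A single blocked connection attempt may itself run longer than maxwait, depending on the operating system's TCP timeout.

```xml
<project name="cvs-precheck" default="check-cvs" basedir=".">
	<!-- Hypothetical diagnostic target; not part of the eclipse-JUnit job. -->
	<target name="check-cvs">
		<!-- Retry a TCP connect to the CVS pserver port for up to 30 seconds. -->
		<waitfor maxwait="30" maxwaitunit="second"
		         checkevery="5" checkeveryunit="second"
		         timeoutproperty="cvs.unreachable">
			<socket server="dev.eclipse.org" port="2401" />
		</waitfor>
		<!-- Fail fast with a clear message instead of hanging in the checkout. -->
		<fail if="cvs.unreachable"
		      message="Could not open a TCP connection to dev.eclipse.org:2401." />
	</target>
</project>
```

Because Hudson performs its own checkout before any build step runs, a check like this would be more useful as a separate diagnostic job than as part of the failing build.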
Comment 5 Eclipse Webmaster CLA 2010-10-21 15:26:27 EDT
Hmm, that error was probably thrown as I was tweaking the DNS rules to make sure it was pointing to the right server.

-M.
Comment 6 Kim Moir CLA 2010-10-21 17:13:43 EDT
Thanks, it seems to be working now.

I have a question.  Running my build, if I echo %JAVA_HOME% it points to a unix install directory, which seems weird because it's a windows install.

c:\Users\HUDSONBUILD\hudson\workspace\eclipse-JUnit>echo JAVA_HOME /shared/common/sun-jdk1.6.0_21_x64 
JAVA_HOME /shared/common/sun-jdk1.6.0_21_x64
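
A build could guard against an inherited JAVA_HOME like this one by checking that it actually contains a Java launcher on the machine where the job runs. The following is a minimal sketch, assuming an Ant-based build; the project name, target name, and the java.home.ok property are hypothetical, and the `env` prefix is simply the name chosen here for Ant's environment properties.

```xml
<project name="java-home-check" default="check-java-home" basedir=".">
	<!-- Expose environment variables as env.* properties. -->
	<property environment="env" />

	<target name="check-java-home">
		<echo message="JAVA_HOME=${env.JAVA_HOME}" />
		<!-- Set java.home.ok only if a launcher exists under JAVA_HOME. -->
		<condition property="java.home.ok">
			<or>
				<available file="${env.JAVA_HOME}/bin/java.exe" />
				<available file="${env.JAVA_HOME}/bin/java" />
			</or>
		</condition>
		<fail unless="java.home.ok"
		      message="JAVA_HOME does not point to a Java install on this machine." />
	</target>
</project>
```

With the value shown above, neither file would exist on the Windows slave, so the build would stop with the failure message rather than continue with a path inherited from the master.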
Comment 7 Kim Moir CLA 2010-10-21 17:46:17 EDT
Another issue

Could you copy all the vms on this machine to a directory that doesn't have a space in it like "Program Files" does? (https://bugs.eclipse.org/bugs/show_bug.cgi?id=296290#c30)

The issue is that we run the p2.director like this to install the test bundles:

<java jar="${launcherPath}" failonerror="false" dir="${eclipse-home}" timeout="900000" fork="true" output="${basedir}/director.log" resultproperty="directorcode">
			<arg line="-vm ${java.home}/bin/java" />
			<arg line="-application org.eclipse.equinox.p2.director" />
			<arg line="-consoleLog" />
			<arg line="-flavor tooling" />
			<arg line="-installIUs ${testPlugin},org.eclipse.test,org.eclipse.ant.optional.junit,org.eclipse.test.performance,org.eclipse.test.performance.win32,org.easymock" />
			<arg line="-p2.os ${os}" />
			<arg line="-p2.ws ${ws}" />
			<arg line="-p2.arch ${arch}" />
			<arg line="-roaming" />
			<arg line="-profile SDKProfile" />
			<arg line="-repository file:${repoLocation}" />
			<arg line="-destination ${eclipse-home}" />
			<arg line="-bundlepool ${eclipse-home}" />
		</java>

and it sees any space as the indicator of a new argument. I can't find a way to escape this that works.
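
One possible alternative, not tried in this bug, is to avoid `line` altogether. The Ant manual describes `line` as a space-delimited list of arguments, whereas `value` passes its content as a single argument that may contain spaces. The sketch below rewrites the same task that way, using the same properties as the original. Whether the Eclipse launcher then accepts a -vm path containing spaces is an assumption that would need testing.

```xml
<java jar="${launcherPath}" failonerror="false" dir="${eclipse-home}" timeout="900000" fork="true" output="${basedir}/director.log" resultproperty="directorcode">
	<!-- Each value is handed to the JVM as exactly one argument, spaces included. -->
	<arg value="-vm" />
	<arg value="${java.home}/bin/java" />
	<arg value="-application" />
	<arg value="org.eclipse.equinox.p2.director" />
	<arg value="-consoleLog" />
	<arg value="-flavor" />
	<arg value="tooling" />
	<arg value="-installIUs" />
	<arg value="${testPlugin},org.eclipse.test,org.eclipse.ant.optional.junit,org.eclipse.test.performance,org.eclipse.test.performance.win32,org.easymock" />
	<arg value="-p2.os" />
	<arg value="${os}" />
	<arg value="-p2.ws" />
	<arg value="${ws}" />
	<arg value="-p2.arch" />
	<arg value="${arch}" />
	<arg value="-roaming" />
	<arg value="-profile" />
	<arg value="SDKProfile" />
	<arg value="-repository" />
	<arg value="file:${repoLocation}" />
	<arg value="-destination" />
	<arg value="${eclipse-home}" />
	<arg value="-bundlepool" />
	<arg value="${eclipse-home}" />
</java>
```

Splitting each option and its value into separate `value` elements is more verbose, but it removes the dependence on how a single line is tokenized.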
Comment 8 Eclipse Webmaster CLA 2010-10-22 10:23:32 EDT
> I have a question.  Running my build, if I echo %JAVA_HOME% it points to a unix
> install directory which seems weird because it's a windows install

That's the setting from the master.  If you tell me which Java install should be 'JAVA_HOME' on the windows slave I'll set its environment to match.

(In reply to comment #7)

> Could you copy all the vms on this machine to a directory that doesn't have a
> space in it like "Program Files" does.

OK, I've copied them into c:\java (and changed the slave configuration to match).

-M.
Comment 9 Kim Moir CLA 2010-10-22 12:03:08 EDT
Thanks!

>>That's the setting from the master.  If you tell me which Java install should
be 'JAVA_HOME' on the windows slave I'll set it's environment to match.

The jdk1.6.0_20 install please.

Can you also remove the "c:\program files" java location from the PATH environment variable and replace it with the "c:\java" one?

thanks, this is very helpful
Comment 10 Eclipse Webmaster CLA 2010-10-22 13:25:30 EDT
Done.

-M.
Comment 11 Kim Moir CLA 2010-10-22 15:04:25 EDT
Thanks for the changes!

The network problems with dev.eclipse.org have appeared again. The build just hangs...

https://hudson.eclipse.org/hudson/job/eclipse-JUnit/97/console
Comment 12 Kim Moir CLA 2010-10-25 11:21:20 EDT
The windows slave seems to be offline again. Could you please bring it back online?
Comment 13 Eclipse Webmaster CLA 2010-10-25 11:25:16 EDT
I've restarted the slave.  Not sure what the source of the network issues is, though.

-M.
Comment 14 Kim Moir CLA 2010-10-26 15:19:32 EDT
I've successfully run our JUnit tests on the windows slave. Thanks for your help!  One question:  How much CPU and memory does the Windows 7 virtual instance have?  If I run jobs on both Windows hudson instances, I assume we still have the same amount of resources and this won't make things faster.  Is that correct? (The tests take about as long to complete as they do on our test box here so I'd like to make them faster).
Comment 15 Eclipse Webmaster CLA 2010-10-26 15:31:09 EDT
Right now it has 2 CPUs and 2 GB of RAM.  I can provide more RAM, but I don't have any CPUs to spare.

-M.
Comment 16 Kim Moir CLA 2010-10-26 15:53:32 EDT
Today at IBM we have two Windows JUnit machines, each with 3 GB of RAM.  So if the eclipse.org Windows 7 install could have >= 6 GB of RAM, we could see how much faster the tests run.  I assume this requires bringing down the Hudson install, so if you do this, could you wait until my current job has finished?  I'd like to check the test results.  They have been running for four hours already :-)
Comment 17 Eclipse Webmaster CLA 2010-10-27 09:52:12 EDT
I've upped the RAM to 8 GB and restarted things.

-M.
Comment 18 Kim Moir CLA 2010-10-28 16:58:00 EDT
Seems to be having a network issue again

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit/116/console
Comment 19 Kim Moir CLA 2010-11-05 11:09:02 EDT
Windows slave seems to be having a network issue again - could you restart it?

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit/124/console
Comment 20 Eclipse Webmaster CLA 2010-11-05 11:46:54 EDT
I've restarted the slave.

-M.
Comment 21 Kim Moir CLA 2010-11-19 16:32:34 EST
This machine seems hung again. I can't see the workspace (just hangs) and a new job I started also just hangs and doesn't actually list any activity in the console output. Can you restart it? thanks :-)
Comment 22 Eclipse Webmaster CLA 2010-11-22 13:07:28 EST
I've restarted the machine.

-M.
Comment 23 Kim Moir CLA 2010-11-22 15:24:26 EST
I just checked it and it says it's offline :-(
Comment 24 Eclipse Webmaster CLA 2010-11-22 16:21:22 EST
OK that's irritating.  Apparently if hudson thinks the slave has timed out but not 'disconnected' the auto-service startup fails.

It's back now.

-M.
Comment 25 Kim Moir CLA 2010-11-22 17:29:27 EST
Is there a process that's holding on to the files?

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit/130/console

It looks like the new build can't delete the old files in the hudson workspace.
Comment 26 Eclipse Webmaster CLA 2010-11-23 13:52:27 EST
(In reply to comment #25)
> Is there a process that's holding on to the files?

I don't see any.  I can stop the slave service, clear the workspace, and start it again if you'd like.

-M.
Comment 27 Kim Moir CLA 2010-11-23 13:59:12 EST
please do, thanks Matt.
Comment 28 Eclipse Webmaster CLA 2010-11-23 14:30:19 EST
Done.

-M.
Comment 29 Eclipse Webmaster CLA 2011-01-03 15:17:40 EST
Has this been resolved?

-M.
Comment 30 Kim Moir CLA 2011-01-03 20:14:13 EST
Well, the windows slaves still seem to go offline every so often.  Not sure what the root cause of this is...
Comment 31 Kim Moir CLA 2011-01-06 14:51:49 EST
Two things:

The windows hudson instance is unresponsive again. Could you please restart it?

Also, please rename the job eclipse-JUnit to eclipse-JUnit-Windows.

thanks :-)
Comment 32 Eclipse Webmaster CLA 2011-01-06 16:15:26 EST
I didn't restart the host, just the hudson service, and it seems to be back now.

I've renamed the job for you.

-M.
Comment 33 Kim Moir CLA 2011-02-23 21:28:06 EST
I think this can be closed. The windows machine has been very stable lately.  Stay that way please :-)