
Bug 366672

Summary: "remote" access problems and classcast exceptions on slaves
Product: Community
Reporter: David Williams <david_williams>
Component: CI-Jenkins
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED WONTFIX
QA Contact:
Severity: blocker
Priority: P3
CC: denis.roy, john.arthorne, matthias.sohn, mheitz
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Whiteboard:

Description David Williams CLA 2011-12-14 03:27:01 EST
As reported on the cross-project list, early in the morning of 12/14 neither slave 1 nor Fastlane seems to be working.

For example, job

https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runAggregator/177/console

failed with 

FATAL: cannot assign instance of hudson.scm.CVSSCM$1 to field hudson.FilePath$FileCallableWrapper.callable of type hudson.FilePath$FileCallable in instance of hudson.FilePath$FileCallableWrapper

Job
https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runAggregator/178/console

failed with 
hudson.util.IOException2: remote file operation failed: /opt/users/hudsonbuild/workspace/juno.runAggregator at hudson.remoting.Channel@547dd60c:Fastlane
	at hudson.FilePath.act(FilePath.java:754)

While "master" still seems to be working, I'm marking this as a "blocker" since I doubt that moving everyone to "master" is a viable workaround.

Just wanted to be sure the issue was captured in a bugzilla, for tracking.
Comment 1 David Williams CLA 2011-12-14 08:11:31 EST
At least some people seem to be building on slave 1 this morning ... so I'm not sure whether this was a temporary problem or something specific to the "job" I run. Either way, it seems more like a "normal" bug than a "blocker".

(Later today I could try mine again on slave 1, if that helps narrow things down.)
Comment 2 David Williams CLA 2011-12-14 10:33:41 EST
Yes, I think this is (still) a blocking issue. I tried moving back to slave-1 and the job still fails immediately. 

From the short stack trace, I can't imagine it's anything out of the ordinary that my particular job is doing. Maybe some "CVS connection pool" is full or blocked ... but nothing under my control.

https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runAggregator/182/console


Started by an SCM change
Started by an SCM change
Started by an SCM change
Building remotely on hudson-slave1
FATAL: cannot assign instance of hudson.scm.CVSSCM$1 to field hudson.FilePath$FileCallableWrapper.callable of type hudson.FilePath$FileCallable in instance of hudson.FilePath$FileCallableWrapper
java.lang.ClassCastException: cannot assign instance of hudson.scm.CVSSCM$1 to field hudson.FilePath$FileCallableWrapper.callable of type hudson.FilePath$FileCallable in instance of hudson.FilePath$FileCallableWrapper
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2032)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1212)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1953)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
	at hudson.remoting.UserRequest.deserialize(UserRequest.java:178)
	at hudson.remoting.UserRequest.perform(UserRequest.java:98)
	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
	at hudson.remoting.Request$2.run(Request.java:283)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
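
One plausible reading of this ClassCastException (not confirmed anywhere in this bug) is a class-loader mismatch. In Java a runtime type is identified by its name together with its defining class loader, so if hudson.FilePath$FileCallable ends up loaded by two different loaders, the result is two unrelated types, and an object implementing one cannot be stored in a field declared as the other. The minimal Java 9+ sketch below shows that general effect with a hypothetical Payload class loaded twice from identical bytecode through a hypothetical IsolatedLoader; it does not reproduce Hudson's RemoteClassLoader.

```java
import java.io.IOException;
import java.io.InputStream;

public class TwoLoadersDemo {

    // Hypothetical class; we only need its bytecode and its public no-arg constructor.
    public static class Payload {}

    // Hypothetical loader that defines exactly one class from the given bytes and
    // otherwise delegates only to the bootstrap loader (parent = null).
    static final class IsolatedLoader extends ClassLoader {
        private final String name;
        private final byte[] bytes;

        IsolatedLoader(String name, byte[] bytes) {
            super(null);
            this.name = name;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String requested) throws ClassNotFoundException {
            if (!requested.equals(name)) {
                throw new ClassNotFoundException(requested);
            }
            return defineClass(requested, bytes, 0, bytes.length);
        }
    }

    // Reads the .class file of c through the loader that loaded it.
    static byte[] bytecodeOf(Class<?> c) throws IOException {
        String resource = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("bytecode not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    // Loads Payload through two independent loaders, then tries to treat an
    // instance of the first copy as an instance of the second copy.
    static String crossLoaderCast() throws Exception {
        String name = Payload.class.getName();
        byte[] bytes = bytecodeOf(Payload.class);
        Class<?> first = new IsolatedLoader(name, bytes).loadClass(name);
        Class<?> second = new IsolatedLoader(name, bytes).loadClass(name);

        Object instance = first.getDeclaredConstructor().newInstance();
        try {
            second.cast(instance); // same name, different defining loader: different type
            return "cast succeeded";
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(crossLoaderCast());
    }
}
```

Running it should print a "Cannot cast ... to ..." message in which both class names are identical, which is the same kind of confusing message seen in the trace above.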
Comment 3 Eclipse Webmaster CLA 2011-12-14 10:43:05 EST
Both Fastlane and Slave1 have been restarted.

-M.
Comment 4 David Williams CLA 2011-12-14 11:07:38 EST
(In reply to comment #3)
> Both Fastlane and Slave1 have been restarted.
> 
> -M.

I just tried again, and it still failed immediately, with an error that appears unrelated to anything my job does.

https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runAggregator/183/console 


Started by user david_williams
Building remotely on hudson-slave1
hudson.util.IOException2: remote file operation failed: /opt/users/hudsonbuild/workspace/juno.runAggregator at hudson.remoting.Channel@5f9b5648:hudson-slave1
	at hudson.FilePath.act(FilePath.java:754)
	at hudson.FilePath.act(FilePath.java:740)
	at hudson.scm.CVSSCM.isUpdatable(CVSSCM.java:439)
	at hudson.scm.CVSSCM.checkout(CVSSCM.java:310)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1229)
	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:507)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:424)
	at hudson.model.Run.run(Run.java:1367)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:145)
Caused by: java.io.IOException: Remote call on hudson-slave1 failed
	at hudson.remoting.Channel.call(Channel.java:659)
	at hudson.FilePath.act(FilePath.java:747)
	... 10 more
Caused by: java.lang.LinkageError: loader (instance of  hudson/remoting/RemoteClassLoader): attempted  duplicate class definition for name: "hudson/model/AbstractProject"
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
	at java.lang.Class.getDeclaredMethod(Class.java:1935)
	at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1382)
	at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:52)
	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:438)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:413)
	at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:310)
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:547)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1583)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1732)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
	at hudson.remoting.UserRequest.deserialize(UserRequest.java:178)
	at hudson.remoting.UserRequest.perform(UserRequest.java:98)
	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
	at hudson.remoting.Request$2.run(Request.java:283)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
Sending e-mails to: david_williams@us.ibm.com
[DEBUG] Skipping watched dependency update for build: juno.runAggregator #183 due to result: FAILURE
Finished: FAILURE
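
The innermost cause above, "attempted duplicate class definition", is the JVM refusing to let a single class loader define the same class name twice; here the slave's RemoteClassLoader tried to define hudson/model/AbstractProject a second time. Why it did so is not established in this bug. The minimal Java 9+ sketch below triggers the same LinkageError with a hypothetical Payload class and a hypothetical RedefiningLoader; the exact message wording differs between JDK versions.

```java
import java.io.IOException;
import java.io.InputStream;

public class DuplicateDefinitionDemo {

    // Hypothetical class; we only need its bytecode.
    public static class Payload {}

    // Hypothetical loader that exposes the protected defineClass so it can be called twice.
    static final class RedefiningLoader extends ClassLoader {
        RedefiningLoader() {
            super(null); // delegate only to the bootstrap loader
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Reads the .class file of c through the loader that loaded it.
    static byte[] bytecodeOf(Class<?> c) throws IOException {
        String resource = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("bytecode not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    // Returns the LinkageError message, or null if the second definition was accepted.
    static String defineTwice() throws IOException {
        byte[] bytes = bytecodeOf(Payload.class);
        String name = Payload.class.getName();
        RedefiningLoader loader = new RedefiningLoader();

        loader.define(name, bytes); // first definition succeeds
        try {
            loader.define(name, bytes); // same loader, same name: rejected by the JVM
            return null;
        } catch (LinkageError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(defineTwice());
    }
}
```

A well-behaved loader avoids this by calling findLoadedClass before defineClass (the default ClassLoader.loadClass does exactly that), so a duplicate definition usually points to two code paths racing or bypassing that check.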
Comment 5 David Williams CLA 2011-12-14 13:14:05 EST
Still failing. 

https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runAggregator/183/console

I'm ... guessing ... you expect me to move back to "master" for now, and that you don't think a restart (of the slaves and master) would help?

One thought, since the exception mentions "remote file operation" and CVSSCM:

Building remotely on hudson-slave1
hudson.util.IOException2: remote file operation failed: /opt/users/hudsonbuild/workspace/juno.runAggregator at hudson.remoting.Channel@5f9b5648:hudson-slave1
	at hudson.FilePath.act(FilePath.java:754)
	at hudson.FilePath.act(FilePath.java:740)
	at hudson.scm.CVSSCM.isUpdatable(CVSSCM.java:439)
	at hudson.scm.CVSSCM.checkout(CVSSCM.java:310)


The recent changes (or others') wouldn't affect our ability to get to CVS using :local:, would they? I do use some "rewrite" rules to use :local: when running on build.eclipse.org, which might explain why I fail and others don't. (I'd offer to test it ... but I hate to be in "test mode" on the last day of M4.) :(

I'll move back to "master" for now, and hope it can keep up.
Comment 6 Denis Roy CLA 2011-12-14 13:26:59 EST
> I'm ... guessing ... you expect me to move back to "master" for now? That you
> don't think a restart (of slaves and master) would help. 

I believe Matt has scheduled a complete restart.

> the recent changes (or others) would not effect our ability to get to cvs using
> :local: would it? 

To Hudson, "remote" means "a command run on another computer, through an SSH connection". At this point, I believe the master's persistent SSH link with its slaves is borked.

Stay tuned.
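
That description matches the stack traces: hudson.FilePath.act hands an object (here one created by CVSSCM) to Channel.call, and the slave rebuilds that object in UserRequest.deserialize before running it. The sketch below imitates that round trip inside a single JVM with plain Java serialization. It assumes a made-up WorkspaceTask interface in place of Hudson's FilePath.FileCallable and a made-up WorkspaceIsDirectory task; it is not Hudson's remoting code and leaves out the SSH link entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RemoteCallSketch {

    // Hypothetical stand-in for a serializable "file callable"; not Hudson's real interface.
    interface WorkspaceTask<T> extends Serializable {
        T invoke(File workspace) throws IOException;
    }

    // Hypothetical example task: is the workspace an existing directory?
    static final class WorkspaceIsDirectory implements WorkspaceTask<Boolean> {
        private static final long serialVersionUID = 1L;

        @Override
        public Boolean invoke(File workspace) {
            return workspace.isDirectory();
        }
    }

    // "Master" side: turn the task object into the bytes that would travel over the link.
    static byte[] serialize(WorkspaceTask<?> task) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(task);
        }
        return buffer.toByteArray();
    }

    // "Slave" side: rebuild the task from the bytes, then run it against a local path.
    // The slave-side failures in this bug happen at the equivalent rebuild step
    // (UserRequest.deserialize), before any file in the workspace is touched.
    static Object deserializeAndRun(byte[] wire, File workspace)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            WorkspaceTask<?> task = (WorkspaceTask<?>) in.readObject();
            return task.invoke(workspace);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = serialize(new WorkspaceIsDirectory());
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(deserializeAndRun(wire, tmp));
    }
}
```

Because the task's class must be resolvable on the receiving side, anything that confuses class loading there breaks every remote file operation at once, whichever job submits it.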
Comment 7 David Williams CLA 2011-12-14 15:59:01 EST
(In reply to comment #6)

> 
> Stay tuned.

Even after the Hudson restart, it fails with the same error on slave 1. Back to master, I guess.

https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runAggregator/184/console

Started by user david_williams
Building remotely on hudson-slave1
hudson.util.IOException2: remote file operation failed: /opt/users/hudsonbuild/workspace/juno.runAggregator at hudson.remoting.Channel@33264c47:hudson-slave1
	at hudson.FilePath.act(FilePath.java:754)
	at hudson.FilePath.act(FilePath.java:740)
	at hudson.scm.CVSSCM.isUpdatable(CVSSCM.java:439)
	at hudson.scm.CVSSCM.checkout(CVSSCM.java:310)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1229)
	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:507)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:424)
	at hudson.model.Run.run(Run.java:1367)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:145)
Caused by: java.io.IOException: Remote call on hudson-slave1 failed
	at hudson.remoting.Channel.call(Channel.java:659)
	at hudson.FilePath.act(FilePath.java:747)
	... 10 more
Comment 8 Eclipse Webmaster CLA 2011-12-14 16:05:17 EST
This problem sounds like http://issues.hudson-ci.org/browse/HUDSON-6604, which is marked as fixed. 

I also see that we've run into this before (https://bugs.eclipse.org/bugs/show_bug.cgi?id=362929), and that a restart did the trick.

-M.
Comment 9 David Williams CLA 2011-12-14 16:14:00 EST
(In reply to comment #8)
> This problem sounds like http://issues.hudson-ci.org/browse/HUDSON-6604, which
> is marked as fixed. 

The way I read it, it's marked as "resolved" with "Cannot Reproduce".

> 
> I also see that we've run into this
> before(https://bugs.eclipse.org/bugs/show_bug.cgi?id=362929 ) and that a
> restart did the trick.
> 

Yeah, that bug also mentions that a restart often fixes it for a while ... if it's the same thing.
Comment 10 Matthias Sohn CLA 2011-12-22 19:50:29 EST
Hit the same problem here
https://hudson.eclipse.org/hudson/job/jgit/921/console
Comment 11 David Williams CLA 2011-12-22 20:18:50 EST
Changing the title to be a little more descriptive than "2 slaves down" (which was the original concrete problem).
Comment 12 Eclipse Webmaster CLA 2011-12-23 09:22:36 EST
Well, my conversation with the JBoss folks indicates that slaves eventually space out and need to be restarted. I've also turned the executor count down further based on their comments (specifically, that more than 2 executors seems to aggravate this kind of problem, in their experience at least). If things get 'better', I'll look into 'splitting' slave1 into 2-3 slaves.

-M.
Comment 13 Denis Roy CLA 2013-10-18 16:31:41 EDT
We are not going to pursue this further, as HIPP removed the need for multiple slaves on one monolithic infra.
Comment 14 Denis Roy CLA 2013-10-18 16:33:27 EDT
*** Bug 370733 has been marked as a duplicate of this bug. ***