
Bug 334069

Summary: Polling error against scm
Product: Community
Component: CI-Jenkins
Status: RESOLVED FIXED
Severity: normal
Priority: P3
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Mac OS X - Carbon (unsup.)
Whiteboard:
Reporter: Chris Frost <eclipse>
Assignee: Eclipse Webmaster <webmaster>
QA Contact:
CC: violeta.georgieva, zteve.powell

Description Chris Frost CLA 2011-01-12 06:23:06 EST
Hi,

One of our builds is building every 30 minutes (the polling interval) due to a source control change, but we aren't making any changes. If you look at the SCM hash, it is the same for every build over the last few hours.

This one,
https://hudson.eclipse.org/hudson/job/virgo.gemini-web-container.snapshot/

I know Hudson was rebooted in the last few hours but it doesn't seem to have had an effect.

Chris.
Comment 1 Steve Powell CLA 2011-01-13 11:16:00 EST
virgo.gemini-web-container.snapshot is building every 30 mins (Here is the GIT POLLING LOG):

Started on Jan 13, 2011 11:00:17 AM
Using strategy: Default
[poll] Last Build : #222
[poll] Last Built Revision: Revision d59a4328e9c63df5f9886ea0f0ea497fe76db78b (origin/master)
Last build was not on tied node, forcing rebuild.
Done. Took 0.49 sec
Changes found

I don't know what the problem is, but can you fix it please?

I note that the build machine is hudson-slave2 -- I'm about to change it to build2 -- but I don't know what this 'tied node' stuff is about.  I would assume that if I can schedule a build to a machine then I should be tied to it for the purposes of SCM polling???
Comment 2 Steve Powell CLA 2011-01-13 11:40:03 EST
Hi, the polling log is back to normal, and the job has stopped being kicked off every 30 mins.

The only change was to the job configuration: it now uses build2 instead of hudson-slave2.

The reason we often switch to hudson-slave2 (an explicit machine) is that sometimes one of the slaves goes bad and all jobs sent to it fail. However, the job scheduler doesn't realise this and continues to send builds to it, so an unpredictable set of our builds fail (all of them, if we're unlucky). We change build2 to hudson-slave2 (or hudson-slave1 if hudson-slave2 is the problem) and our builds run OK.

It appears that the build2 node (it is a fictional node) is used to locate the workspace.  Thus the git polling cannot determine the previous build -- or access the workspace -- or something.

1) The particular build machine in the configuration ought not to influence the git polling mechanism -- the last build location ought to be available (and used).

2) It ought to be possible to trigger a build AND SPECIFY THE BUILD NODE TO USE at the time of triggering, and without modifying the configuration.

Thank you.
Comment 3 Chris Frost CLA 2011-01-19 05:59:40 EST
Steve, any objection to closing this issue, as the polling issue is no longer occurring?

Chris.
Comment 4 Steve Powell CLA 2011-01-19 09:51:17 EST
Chris, I'd like the webmaster/buildmeister to at least LOOK at this problem -- there are questions it would be nice to have the answers to in here.
Comment 5 Eclipse Webmaster CLA 2011-01-19 14:59:03 EST
Well, I've been through the logs and there isn't a whole lot there. I see lots of:

Jan 13, 2011 2:30:17 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in virgo.gemini-web-container.snapshot. Triggering  #206
Jan 13, 2011 2:35:26 AM hudson.model.Run run
INFO: virgo.gemini-web-container.snapshot #206 main build action completed: SUCCESS
...
Jan 13, 2011 3:30:17 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in virgo.gemini-web-container.snapshot. Triggering  #208
Jan 13, 2011 3:35:30 AM hudson.model.Run run
INFO: virgo.gemini-web-container.snapshot #208 main build action completed: SUCCESS

So it wasn't logging any errors.

Perhaps the parameterized build trigger options would let you automate the 'switch' between the 'group' (build2) and the real machines.
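Comment 2 asks for a way to choose the build node at trigger time without editing the job configuration, and parameterized builds are one route to that. Hudson's remote-access API accepts requests to `job/<name>/buildWithParameters` for parameterized jobs. The minimal sketch below only assembles such a URL. It assumes the job has been given a hypothetical parameter named `NODE` that some node-selection mechanism (for example, a node/label parameter plugin) uses to pick the machine; neither that parameter nor such a plugin is part of the job as described here. Actually sending the request, with whatever HTTP method and authentication the server expects, is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: assembles a buildWithParameters URL for a parameterized job.
// The parameter name (e.g. "NODE") is a HYPOTHETICAL job parameter assumed
// to select the build node; it is not part of the original job config.
public class TriggerWithNode {

    static URI buildWithParametersUri(String hudsonBase, String job,
                                      String paramName, String nodeName) {
        String base = hudsonBase.endsWith("/") ? hudsonBase : hudsonBase + "/";
        // Query parameters are form-encoded (a space becomes '+'). The job
        // name is assumed to be URL-safe already, as the one in this bug is.
        String query = URLEncoder.encode(paramName, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(nodeName, StandardCharsets.UTF_8);
        return URI.create(base + "job/" + job + "/buildWithParameters?" + query);
    }

    public static void main(String[] args) {
        URI uri = buildWithParametersUri(
                "https://hudson.eclipse.org/hudson",
                "virgo.gemini-web-container.snapshot",
                "NODE", "hudson-slave1");
        // The URL a script could request to start the job on hudson-slave1,
        // assuming the job honours the hypothetical NODE parameter.
        System.out.println(uri);
    }
}
```

With this approach the job's configured label could stay on build2, and a script would pick hudson-slave1 or hudson-slave2 only when one of them is known to be bad.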

Beyond that, this seems to be an issue with Hudson itself, so reporting this bug to the Hudson project would probably be the best solution.

-M.
Comment 6 Steve Powell CLA 2011-01-20 06:41:16 EST
(In reply to comment #5)
As reported above (Comment #1), the GIT POLLING LOG had this in it:

    Started on Jan 13, 2011 11:00:17 AM
    Using strategy: Default
    [poll] Last Build : #222
    [poll] Last Built Revision: Revision d59a4328e9c63df5f9886ea0f0ea497fe76db78b (origin/master)
    Last build was not on tied node, forcing rebuild.
    Done. Took 0.49 sec
    Changes found

and this is still there -- the job is now building every 30 mins.
Can you tell us what the line:

    Last build was not on tied node, forcing rebuild.

means, please?  And why did you not see this in your logs?

This is the trigger for the multiple (successful) builds every 30 mins that your log trawl identifies. If you think this is a Hudson bug, can you report it, please?
Comment 7 Eclipse Webmaster CLA 2011-01-21 16:01:25 EST
(In reply to comment #6)
> Can you tell us what the line:
> 
>     Last build was not on tied node, forcing rebuild.
> 
> means, please? 

Well, the Git plugin for Hudson has this to say:

    // If this project is tied onto a node, it's built always there. On other cases,
    // polling is done on the node which did the last build.
    //
    if (label != null && label.isSelfLabel()) {
        if (label.getNodes().iterator().next() != project.getLastBuiltOn()) {
            listener.getLogger().println("Last build was not on tied node, forcing rebuild.");
            return true;
        }
        // ... (remainder of the method not quoted)
    }

So my guess at what's happening is that when you 'switched' from using a specific slave to the 'generic' container, the code couldn't determine where the last build was, logged this message, and then failed to build the job correctly.
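The tied-node check quoted above can be modeled in isolation. The sketch below is a simplified, hypothetical model: `Node`, `Label`, `Decision` and `decide()` are illustrative stand-ins, not the real Hudson or Git plugin classes. It treats a "self label" as a label that names exactly one node, and it assumes that a tied job last built on its own node simply polls there, which the quoted excerpt does not show.

```java
import java.util.List;
import java.util.Objects;

// HYPOTHETICAL, simplified model of the tied-node check quoted above.
// Node, Label, Decision and decide() are illustrative stand-ins, not the
// real Hudson or Git plugin classes.
public class TiedNodeCheck {

    record Node(String name) {}

    // A label resolves to one or more nodes. In this sketch a "self label"
    // is a label that names exactly one node (e.g. "hudson-slave2"); a group
    // label such as "build2" may resolve to several nodes.
    record Label(String name, List<Node> nodes) {
        boolean isSelfLabel() {
            return nodes.size() == 1 && nodes.get(0).name().equals(name);
        }
    }

    interface Decision {}
    record ForceRebuild() implements Decision {}
    record PollOn(Node node) implements Decision {}

    static Decision decide(Label label, Node lastBuiltOn) {
        if (label != null && label.isSelfLabel()) {
            Node tiedNode = label.nodes().get(0);
            // The quoted plugin code uses != (reference comparison);
            // this sketch compares nodes by value instead.
            if (!Objects.equals(tiedNode, lastBuiltOn)) {
                return new ForceRebuild();
            }
            // Assumption: a tied job last built on its own node polls there.
            return new PollOn(tiedNode);
        }
        // Not tied to a single node: poll where the last build ran.
        return new PollOn(lastBuiltOn);
    }

    public static void main(String[] args) {
        Node slave1 = new Node("hudson-slave1");
        Node slave2 = new Node("hudson-slave2");
        Label group = new Label("build2", List.of(slave1, slave2));
        Label tied = new Label("hudson-slave2", List.of(slave2));

        // Tied to hudson-slave2, last build ran on hudson-slave1: forced rebuild.
        System.out.println(decide(tied, slave1));
        // Tied to hudson-slave2 and last built there: normal polling there.
        System.out.println(decide(tied, slave2));
        // On the build2 group: poll on whichever node ran the last build.
        System.out.println(decide(group, slave1));
    }
}
```

Under this model a label switch forces at most one extra build: once that build runs on the tied node, later polls take the normal path. The repeated forced rebuilds in the polling logs are therefore not explained by the logic alone. One difference from this sketch is that the plugin compares nodes with `!=` (by reference) rather than by value.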

> And why did you not see this in your logs?

No idea, but presumably because this doesn't actually report a 'fault', it's not logged (by the master).

>  If you think this is a Hudson bug, then can you
> report it please?

Sure. I built a quick test job to try to replicate this. I used the same repo as your build, set the checkout time to 5 minutes, and then swapped the label between the slave group and specific slaves, but so far I haven't seen this issue.

Is there another step I might be missing?

-M.
Comment 8 Steve Powell CLA 2011-01-25 05:27:21 EST
To understand this, I need a better concept of what a 'node' really means in relation to the workspace. If Hudson conflates the 'slave group name' with the 'node' and the 'workspace' names, then we are in trouble.

If it doesn't, then it ought to be able to work out where the last build was done independently of what slave or slave-group the job was run on.

I've no idea why this is happening, so cannot help you with extra steps, sorry.
Comment 9 Eclipse Webmaster CLA 2012-03-16 16:41:52 EDT
This seems like a precursor to bug 372755. We made some changes (as discussed on cross-project) and it seems like this was resolved. If you're still seeing it, please reopen.

-M.