Bug 363891

Summary: builds frequently triggered due to "content change" when content has not changed
Product: [Technology] Hudson
Reporter: David Williams <david_williams>
Component: Core
Assignee: Winston Prakash <winston.prakash>
Status: REOPENED ---
QA Contact:
Severity: major
Priority: P3
CC: denis.roy, mistria, mknauer, thanh.ha
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard:
Attachments:
  Description: contents of jobs/indigo.runReports
  Flags: none

Description David Williams CLA 2011-11-16 00:11:19 EST
I have some Hudson jobs that are (supposed to be) triggered by a file's content changing. Hudson "reads" the content via URL ... not sure exactly what mechanism it uses.

Problem is, occasionally I've noticed it just builds and rebuilds when the content has not changed. 

For example, see 
https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/indigo.runReports/

Today it built 10 times (on 11/15) even though content has not changed since 11/1. Since 11/1, it has built a few times nearly every day, apparently. 

While this may be simply "a Hudson bug", I thought I'd report it here in infrastructure, in case there were any thoughts on what might make the file "appear" to change. As far as I can tell, the file's access/modify/change times have not changed. Is there anything that could make them change "temporarily" and then change back?

full path is 
/home/data/httpd/download.eclipse.org/releases/maintenance/content.jar

$ stat content.jar
  File: `content.jar'
  Size: 2334482         Blocks: 4568       IO Block: 32768  regular file
Device: 19h/25d Inode: 149930006   Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 7336/david_williams)   Gid: ( 8403/callistoadmin)
Access: 2011-11-01 13:42:51.000000000 -0400
Modify: 2011-11-01 13:42:51.000000000 -0400
Change: 2011-11-01 13:42:51.000000000 -0400

I killed one job, then did a "clean/remove workspace" in an effort to set things right ... guess we'll see if it keeps happening.
Comment 1 David Williams CLA 2011-11-16 02:00:30 EST
Killing one job and wiping the workspace did not seem to help ... it has started rebuilding (nearly) continuously. I'll let it run like that for a bit, in case it helps you spot anomalies in the logs, or something, but otherwise will disable it on Wednesday ... give it a rest for a while.
Comment 2 Denis Roy CLA 2011-11-16 11:31:10 EST
I get the feeling that Hudson may not be optimized to operate on NFS-mounted filesystems.  Bug 363652 is another filesystem issue.  Moving to the Hudson team for consideration.
Comment 3 David Williams CLA 2011-11-16 11:52:04 EST
Ok, thanks. I have disabled the job for now ... since it has run 10 times already today. Since this effectively prevents me from using the "automatic" build, I'll change severity to "major".

It seems to affect similar jobs in a similar way, only not as badly ... they build only a couple of times per day. For example, see
https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runReports/
Comment 4 Winston Prakash CLA 2011-11-16 13:48:54 EST
David could you add your job configuration file. BTW, I guess this happens only when the files are in a NFS filesystem right?
Comment 5 Winston Prakash CLA 2011-11-16 13:49:42 EST
(In reply to comment #4)
> David could you add your job configuration file. BTW, I guess this happens only
> when the files are in a NFS filesystem right?

I meant attach your job configuration file for me to take a look at it.
Comment 6 David Williams CLA 2011-11-16 14:19:30 EST
Created attachment 207111 [details]
contents of jobs/indigo.runReports

I'll attach the whole contents of /shared/jobs/indigo.runReports, in case it helps, though you only asked for the config.xml.

I did find looking in that directory interesting, as there is a file named 
url-change-trigger-oldmd5
so that implies it is using md5 to decide if/when the file has changed. 

And ... yes, as far as I know only NFS ... I have a parallel "test system" that does not have this problem ... and on it, all the files are on the same, literal, file system (though, of course, there are several differences ... such as, my test system is not "up and running" nearly as often).
Comment 7 David Williams CLA 2011-11-22 01:09:51 EST
Just to give some observations ... the "juno.runReports" job had been running "continuously" quite a bit the last few days (even though there was no change in content). When I changed the content, it triggered a build, but then others were triggered after that, when the content didn't change. I "manually" checked md5sum on
/home/data/httpd/download.eclipse.org/releases/staging/content.jar
and it matched the value in 
/shared/jobs/juno.runReports/url-change-trigger-oldmd5
So, neither changed, but kept building as though it had. 
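
The manual check described above can be sketched in Python. This is a minimal sketch, assuming the `url-change-trigger-oldmd5` file holds the digest as hex text (the plugin's real on-disk format is not confirmed here); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_stored_md5(content_path, stored_md5_path):
    """Compare a file's MD5 with the value saved in the job directory.

    Assumption: the stored file contains a hex digest as text, possibly
    with surrounding whitespace.
    """
    stored = Path(stored_md5_path).read_text().strip().lower()
    return file_md5(content_path) == stored
```

A result of `True` while builds keep firing would suggest that the file on disk is stable and that the difference arises in what the trigger reads at poll time.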

I then noticed the "URL" I was using wasn't quite right. In the configuration page it was
file:/home/data/httpd/download.eclipse.org/releases/staging/content.jar
but I think the proper form is 
file:///home/data/httpd/download.eclipse.org/releases/staging/content.jar
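
As a side check on the two URL spellings, the sketch below parses both with Python's standard `urllib.parse`. It only shows how one common parser treats them; it says nothing about how Hudson itself handles the value, and the helper name is hypothetical.

```python
from urllib.parse import urlparse


def file_url_parts(url):
    """Return (host, path) for a file: URL, rejecting other schemes."""
    parts = urlparse(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: " + url)
    return parts.netloc, parts.path


short_form = "file:/home/data/httpd/download.eclipse.org/releases/staging/content.jar"
long_form = "file:///home/data/httpd/download.eclipse.org/releases/staging/content.jar"

# Both spellings yield an empty host and the same absolute path.
same_target = file_url_parts(short_form) == file_url_parts(long_form)
```

Under this parser the two forms name the same local path, so a rewrite from one spelling to the other is probably cosmetic rather than the cause of the extra builds.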

So, I changed it, saved it, and that seemed to stop the "continuous builds" (at least in the hour or so I watched it ... not long). So, I thought maybe I'd "fixed" it by improving the URL syntax ... but then, to my surprise, I see that Hudson, somewhere along the line, changed it back to
file:/home/data/httpd/download.eclipse.org/releases/staging/content.jar

So ... no closer to understanding causes/reasons, but, thought I'd report on what I'd observed and tried.
Comment 8 David Williams CLA 2011-11-28 14:44:37 EST
FYI, after noticing today that these jobs were still "runaway jobs", building over and over again with no change in content, I tried changing the "URL" to the HTTP form, such as

http://download.eclipse.org/releases/staging/content.jar 

but still no joy. Over and over. 

I have had to disable the jobs. In case it's not obvious, this used to work just fine ... something seems to have changed in recent releases?
Comment 9 Winston Prakash CLA 2012-02-10 20:31:15 EST
The issue seems to be due to the Gerrit Plugin, which is triggering the builds incorrectly. Matt disabled the Gerrit trigger plugin, so the spurious builds should not happen. If it happens again, let me know.

Meanwhile, I'm looking at the Gerrit Plugin to find out why it triggers like that.
Comment 10 David Williams CLA 2012-02-22 12:11:41 EST
I don't think this was fixed by the Gerrit Plugin removal (or, it's been added back in?).

I tried to verify by re-enabling, "trigger on content change" in the "juno.runReports" job: 


https://hudson.eclipse.org/hudson/view/Repository%20Aggregation/job/juno.runReports/ 

You can see it runs numerous times per day, even though the critical file, content.jar, has not changed since the 16th: 

       [12:00:57] david_williams@build:/home/data/httpd/download.eclipse.org/releases/staging
 
$ stat content.jar
  File: `content.jar'
  Size: 2111379         Blocks: 4136       IO Block: 32768  regular file
Device: 19h/25d Inode: 95412227    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 7336/david_williams)   Gid: ( 8403/callistoadmin)
Access: 2012-02-16 23:55:47.000000000 -0500
Modify: 2012-02-16 23:55:47.000000000 -0500
Change: 2012-02-16 23:55:47.000000000 -0500

Want me to leave enabled so you can 'investigate' easier? Soon? Else, I'll switch back to the more "manual" mode so as to not eat up resources.
Comment 11 Denis Roy CLA 2012-02-22 13:40:30 EST
David, you're using this as the URL for the "Build when a URL's content changes":

file:/home/data/httpd/download.eclipse.org/releases/staging/content.jar

That filesystem is mounted on the Hudson master, but I'm not sure if it's mounted on all the slaves.  In this case, would it make more sense to use http:// ?
Comment 12 David Williams CLA 2012-02-22 14:17:05 EST
(In reply to comment #11)
> David, you're using this as the URL for the "Build when a URL's content
> changes":
> 
> file:/home/data/httpd/download.eclipse.org/releases/staging/content.jar
> 
> That filesystem is mounted on the Hudson master, but I'm not sure if it's
> mounted on all the slaves.  In this case, would it make more sense to use
> http:// ?

Ok ... used to work as file:// (months ago) and I have tried http:// in the past (when first having the problem) ... but, given many other changes and complications it makes sense to try http:// now, so, I've changed it to 

http://download.eclipse.org/releases/staging/content.jar

We should know more by end of day (since I don't plan on changing it today) ... it might build once, with "changed URL" ... but should stop after that.

Thanks.
Comment 13 David Williams CLA 2012-02-22 18:31:37 EST
Interesting results ... sort of. 

When I changed to "http://" I canceled the currently running job (and one was pending that I also canceled) ... no new builds were started (for the hour or so that I checked). Good so far. But being one to not leave well enough alone, I "manually" started a job to see what would happen. That seems to have restarted the build-for-no-reason pattern over again.

I do notice that a "pending" job starts before the current job finishes. I wonder if there is some check or state that goes astray "while it's building" (that is, if it was a "real quick" job, maybe this wouldn't happen). I do not know how often Hudson checks for a change, but this job takes about an hour to complete. (FYI, it does "read" from the content.jar, but would not change it, in case that helps.)

I'll switch back to "manual" soon to avoid the "runaway" jobs, but feel free to say if/when you want me to re-enable "content change" so it can be "watched" while it's running (by someone who knows what to watch for).
Comment 14 David Williams CLA 2012-02-22 18:50:44 EST
Another (possible) "hint" ... this particular file that's checked is fairly large, about 2 MB. It is used as the "indicator" since it is the file "that matters". I suppose there could be bugs with handling files that large? In my spare time, I could test that by creating/using a small "indicator-only" file.

Just wanted to mention that size variable.
Comment 15 Winston Prakash CLA 2012-02-22 21:55:23 EST
This functionality is provided by the URL Change plugin, which downloads the file every minute and checks whether there is any change. However, this plugin is deprecated in favor of another plugin called the URLTrigger plugin, which is more accurate and gives the option to check the last modification date or the URL content. However, the URLTrigger plugin is a Jenkins plugin. If needed, we could fork the plugin and make it work for Hudson.
Comment 16 David Williams CLA 2012-02-23 01:54:09 EST
(In reply to comment #15)
> This functionality is provided by URL Change plugin, which downloads the file
> every minute and check if there is any change. However, this plugin is
> deprecated in favor of another plugin called URLTrigger plugin. This plugin is
> more accurate and gives option to check last modification date or URL content.
> However URLTrigger plugin is a Jenkins plugin. If needed we could fork the
> plugin and make it work for Hudson

Well, I've moved back to my "semi-manual" method, so I can't say I NEED it. 

But it sounds like the URLTrigger function would be useful, and I am fine carrying that as a "feature request".

But, I will ask ... should the URLChange plugin be uninstalled/not used? Perhaps it works for some people ... and, recall, it did use to work for me ... for years ... but obviously something's changed. Is there a way to find out who uses it, by querying on the cross-project list or, better, by grepping the config.xml files? If no one uses it, I'd say remove it and avoid the chance of "runaway jobs" ... but, if some use it currently, and it truly works for them, then I guess there's not too much harm leaving it there for them. (In short, I'd lean toward removing ... but don't think it's critical to remove.)
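
The config.xml search suggested above could look roughly like the following. This is a sketch under assumptions: the `jobs/<job-name>/config.xml` layout is inferred from the paths mentioned in this bug, and the search string `UrlChangeTrigger` is a guess at how the trigger appears in the XML, so it should be checked against a job known to use the plugin.

```python
from pathlib import Path


def jobs_using_trigger(jobs_dir, needle="UrlChangeTrigger"):
    """Return sorted job names whose config.xml mentions `needle`.

    Assumptions: each job lives in <jobs_dir>/<job-name>/config.xml, and
    `needle` (a guessed marker string) appears in the XML of jobs that
    use the URL change trigger.
    """
    matches = []
    for config in Path(jobs_dir).glob("*/config.xml"):
        try:
            text = config.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable config: skip it rather than abort the scan
        if needle in text:
            matches.append(config.parent.name)
    return sorted(matches)
```

For example, `jobs_using_trigger("/shared/jobs")` would list the affected job directories, assuming that is where the job folders live.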

I am a little glad that one Gerrit Plugin wasn't causing all the unrelated job triggers by itself ... that sounds like a bigger bug!
Comment 17 David Williams CLA 2013-05-08 17:12:41 EDT
For the record, saw this bug again today ... a couple of jobs "running continuously". Thanh will look into other ways of triggering the jobs ... but "URL Change" should be disabled ... or come with a warning ... or be replaced. 

How's progress on that Jenkins plugin? :)
Comment 18 David Williams CLA 2013-05-08 17:16:24 EDT
I don't know if it was a coincidence of something else webmasters did, but after disabling that job ... which was "constantly" checking a CGit URL ... suddenly Hudson was responsive again!
Comment 19 Winston Prakash CLA 2013-05-08 22:46:06 EDT
(In reply to comment #18)

Hi David, since I had not heard from you in over a year, I thought there was no problem and didn't look into it. I have looked at the code now.

This is how the URL Change Trigger works:

- Every minute, the file content is downloaded by the plugin
- MD5 is computed from the file content and the content itself is discarded
- The MD5 is then written to a file <hudson-home>/jobs/<job-folder>/url-change-trigger-oldmd5
- If the newly computed MD5 is different than the old one written to the file, then the job is triggered.
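
The steps above can be sketched as follows. This is an illustration in Python, not the plugin's actual (Java) code; the function name is hypothetical, and treating the very first poll as "no change" is an assumption.

```python
import hashlib
from pathlib import Path


def content_changed(content: bytes, state_file: Path) -> bool:
    """One polling step: hash the downloaded bytes and compare to the last hash.

    Assumption: the first poll (no stored hash yet) does not trigger a build.
    """
    new_md5 = hashlib.md5(content).hexdigest()
    old_md5 = state_file.read_text().strip() if state_file.exists() else None
    # Only the digest is kept; the downloaded content itself is discarded.
    state_file.write_text(new_md5)
    return old_md5 is not None and new_md5 != old_md5
```

One consequence of this design is that any difference in the downloaded bytes, including a truncated or failed download, looks the same as a real content change, and the stored digest is overwritten either way.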

The question we need to ask and answer is:

Why is the MD5 of the file computed differently every minute if the content of the file did not change? Could a network issue have caused the contents to be downloaded incorrectly, causing the MD5 to be computed differently?
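
One way to probe that question is to fetch the URL several times and record the digest and byte count of each download. The sketch below is a hypothetical diagnostic, not part of the plugin; the function names and parameters are made up for illustration.

```python
import hashlib
import time
from collections import Counter
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download the full body of `url` (http:, https: or file:)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def sample_digests(url, attempts=5, delay_seconds=0.0, fetch=fetch_bytes):
    """Fetch `url` repeatedly and count each distinct (md5, length) pair.

    Failed fetches are counted under an ("error: <type>", 0) key so that
    intermittent network problems show up in the result.
    """
    seen = Counter()
    for attempt in range(attempts):
        try:
            body = fetch(url)
            key = (hashlib.md5(body).hexdigest(), len(body))
        except OSError as error:
            key = ("error: " + type(error).__name__, 0)
        seen[key] += 1
        if delay_seconds and attempt < attempts - 1:
            time.sleep(delay_seconds)
    return seen
```

If the file is truly unchanged and downloads are reliable, the result should contain a single key. More than one key, or differing lengths, would point toward truncated or failed downloads rather than real content changes.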