| Summary: | Install the build timeout plugin on hudson | ||
|---|---|---|---|
| Product: | Community | Reporter: | Antoine Toulmé <antoine> |
| Component: | CI-Jenkins | Assignee: | CI Admin Inbox <ci.admin-inbox> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | d_a_carver, eclipse, stepper, webmaster, zteve.powell |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Mac OS X - Carbon (unsup.) | ||
| Whiteboard: | |||
|
Description
Antoine Toulmé
Are we running so low on executor threads that we need to 'force' timeouts on projects? Or do we just need to encourage more projects to make use of the slave(s)?

-M.

It's more of a good practice to avoid problems with builds staying stale for 2 days. It happens on the Apache infra. It may not happen on Eclipse infra (I could bring that up on cross to be sure). If it's not needed, you could keep it around as a tool. It may be of help.

(In reply to comment #1)
> Are we running so low on executor threads that we need to 'force' timeouts on
> projects? Or do we just need to encourage more projects to make use of the
> slave(s)?

Can we expect bug 316883 not to happen again? I was so sick of switching my config back and forth between master and slave that I decided to stick with the master ;-(

(In reply to comment #1)
> Are we running so low on executor threads that we need to 'force' timeouts on
> projects? Or do we just need to encourage more projects to make use of the
> slave(s)?
>
> -M.

It's not a matter of executors per se, but of builds that get stuck for various reasons. When this happens, the stuck build holds an executor and no other job can use it. There have been times when jobs have held an executor for 15 hrs because they were stuck (bad tests, waiting for signing, perm gen issues, etc.). That is why it is good practice to set a limit on how long a job can run before it is automatically cancelled.

(In reply to comment #3)
> (In reply to comment #1)
> > Are we running so low on executor threads that we need to 'force' timeouts on
> > projects? Or do we just need to encourage more projects to make use of the
> > slave(s)?
>
> Can we expect bug 316883 not to happen again? I was so sick of switching my
> config back and forth between master and slave that I decided to stick with the
> master ;-(

The webmasters are working on a new master/slave configuration that should help with this in the near term.

*** Bug 332729 has been marked as a duplicate of this bug. ***

Ok, I'll install the build timeout plugin on the sandbox and we can test it to see how it behaves.
I'm going to copy the caveats from the plugins page here though:
Because Java only allows threads to be interrupted at a set of fixed locations, depending on how a build hangs, the abort operation might not take effect. For example,
* if Hudson is waiting for child processes to complete, it can abort right away.
* if Hudson is stuck in an infinite loop, it can never be aborted.
* if Hudson is doing a network or file I/O within the Java VM (such as lengthy file copy or SVN update), it cannot be aborted.
-M.
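
The first two caveats above can be reproduced outside Hudson with plain Java threads. What follows is only a minimal sketch, not Hudson's or the plugin's actual code: the class `InterruptDemo` and its members are hypothetical names made up for this illustration, and `Thread.sleep` stands in for "waiting for a child process" on the assumption that both are blocking calls that respond to `Thread.interrupt()`. The I/O caveat is not covered here.

```java
// Hypothetical demo class; none of these names come from Hudson or the plugin.
class InterruptDemo {

    // Starts the task on a daemon thread, interrupts it, and reports
    // whether the thread has stopped within waitMillis.
    static boolean stopsWhenInterrupted(Runnable task, long waitMillis)
            throws InterruptedException {
        Thread worker = new Thread(task);
        worker.setDaemon(true);   // a thread that never stops must not keep the JVM alive
        worker.start();
        Thread.sleep(100);        // let the task reach its blocking call or loop
        worker.interrupt();       // the same signal an abort would send
        worker.join(waitMillis);  // wait a bounded time for the thread to finish
        return !worker.isAlive();
    }

    // Blocked in an interruptible call: the interrupt surfaces as an exception.
    static final Runnable BLOCKED_WAIT = () -> {
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            // Interrupted while blocked: fall through and let the thread end.
        }
    };

    // Spins without ever checking the interrupt flag, so the interrupt is ignored.
    static final Runnable BUSY_LOOP = () -> {
        long counter = 0;
        while (true) {
            counter++;
        }
    };

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked wait stopped: " + stopsWhenInterrupted(BLOCKED_WAIT, 2000));
        System.out.println("busy loop stopped: " + stopsWhenInterrupted(BUSY_LOOP, 2000));
    }
}
```

Run as written, the blocked wait should report `true` and the busy loop `false`. That matches the caveat: an abort can only take effect at a point where the hung thread actually notices the interrupt.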

(In reply to comment #7)
> Ok, I'll install the build timeout plugin on the sandbox and we can test it to
> see how it behaves.
>
> I'm going to copy the caveats from the plugins page here though:
>
> Because Java only allows threads to be interrupted at a set of fixed
> locations, depending on how a build hangs, the abort operation might not take
> effect. For example,
>
> * if Hudson is waiting for child processes to complete, it can abort right
> away.
> * if Hudson is stuck in an infinite loop, it can never be aborted.
> * if Hudson is doing a network or file I/O within the Java VM (such as
> lengthy file copy or SVN update), it cannot be aborted.
>
> -M.

I've been using this particular plugin for over a year, and have never had a problem with it. It is critical because people do not pay attention to their jobs; I've seen jobs be stuck for days.

How has the trial on the sandbox gone?

In response to your points Webmaster (Comment#7), what makes these sorts of failures specific to the timeout plugin? If they occur, is it still possible to abort the job manually? And in any case, how is this worse than the present situation?

(In reply to comment #9)
> How has the trial on the sandbox gone?
>
> In response to your points Webmaster (Comment#7), what makes these sorts of
> failures specific to the timeout plugin? If they occur is it still possible to
> abort the job manually -- and in any case how is this worse than the present
> situation?

Can we get this to the production Hudson server?

(In reply to comment #10)
>
> Can we get this to the production Hudson server?

Sure, care to help test it first? So far it hasn't caused the sandbox to fall over (just by existing).

(In reply to comment #9)
>
> In response to your points Webmaster (Comment#7), what makes these sorts of
> failures specific to the timeout plugin?

I don't think they are, but as I said in my post those caveats are from the plugins page (so it was to help set expectations). I presumed they would mean more to folks that spend more time than I do dealing with 'rogue' java processes.

-M.

(In reply to comment #11)
> (In reply to comment #10)
> >
> > Can we get this to the production Hudson server?
>
> Sure, care to help test it first? So far it hasn't caused the sandbox to fall
> over(just by existing).

Give me the link, and I'll enable one of the XSL jobs to use it. Of course these are well-behaved jobs, so they don't time out already. :)

Just give me the link to the sandbox and I'll test it out.

It's: https://hudson.eclipse.org/sandbox . I've created a job called 'timeout-test' you can use.

-M.

(In reply to comment #13)
> It's: https://hudson.eclipse.org/sandbox . I've created a job called
> 'timeout-test' you can use.
>
> -M.

Okay, I tested this on the sandbox instance.

Test 1 set the timeout to 3 minutes (it takes longer than that to check out the code). It timed the job out and failed the job.

Test 2 changed the timeout value on the job to 40 minutes and re-ran the job. It ran to completion successfully with no timeouts.

So it seems to work for me.

The plugin has been installed.

-M.

(In reply to comment #15)

Thank you. |