Bug 339247 - Hudson Job/Executor Management
Summary: Hudson Job/Executor Management
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Eclipse Webmaster
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-08 11:44 EST by David Carver
Modified: 2011-08-12 16:07 EDT
CC List: 1 user

See Also:


Attachments

Description David Carver 2011-03-08 11:44:50 EST
In looking at how to keep things running, one thing I noticed about the configuration of the master node is that it isn't in the "Build2, hudson-slave1, or hudson-slave2" groups.  So unless somebody specifically ties to master, their job will never run on master.

I realize that part of the disk space issue people were running into was caused by extra workspaces being left around.  I still think it would be good to look into the Distributed Workspace cleanup plugin.

While I think we have a working solution for disk space, my concern about the current hudson-slave1 configuration is long-running jobs, or jobs that get stuck (i.e. jobs that don't have the build timeout plugin enabled).

If you get 6 long-running jobs on hudson-slave1, we are going to run into contention issues.  This will only get worse as the release train deadline approaches.

While we have the fast lane, I can see that getting clogged up as well.

It might be nice to be able to ramp up some extra slaves on demand, but then you have the disk space issue to contend with as well.

Also, should the Locks and Latches plugin be re-installed?  Locks and Latches allows finer control over the order in which jobs are run.

http://wiki.hudson-ci.org/display/HUDSON/Locks+and+Latches+plugin

We need to figure out how to make sure that long-running jobs don't monopolize all the executors, so that other jobs can complete.  I've seen up to 20 jobs backed up waiting for executors during peak periods.
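
For keeping an eye on this, something like the following in the Hudson script console (a rough, untested sketch against the standard script console API) would show queue depth and how busy each node's executors are:

import hudson.model.Hudson

def h = Hudson.instance

// how many builds are waiting for an executor right now
println "Queued items: ${h.queue.items.length}"

// busy vs. total executors per node
h.computers.each { c ->
    println "${c.displayName}: ${c.countBusy()}/${c.countExecutors()} executors busy"
}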
Comment 1 Denis Roy 2011-03-08 15:22:28 EST
(In reply to comment #0)
> So unless somebody specifically ties to master, their
> job will never run on master.

Yes, we decided some time ago that we want to dedicate the master to being the master only, for better stability.

> I still think it would be
> good to look into the Distributed Workspace cleanup plugin.

The Distributed Workspace cleanup plugin is installed.  I didn't see anything to configure at the global level, though...  What needs to be done?


> I've seen up to 20 jobs
> backed up waiting for executors during peak periods.

The only times I've seen 20 jobs backed up are when there was a problem -- Hudson was in a sad state and was not 'letting go' of finished jobs.
Comment 2 David Carver 2011-03-08 18:21:03 EST
(In reply to comment #1)
> (In reply to comment #0)
> > So unless somebody specifically ties to master, their
> > job will never run on master.
> 
> Yes, we decided some time ago that we want to dedicate the master to being the
> master only, for better stability.
> 
> > I still think it would be
> > good to look into the Distributed Workspace cleanup plugin.
> 
> The Distributed Workspace cleanup plugin is installed.  I didn't see anything
> to configure at the global level, though...  What needs to be done?
> 

It is probably a per-job configuration.  You'll need to go through and update all of the jobs.

Going forward, I would set up a default template job that already has this configured, to be used when creating new jobs.
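
For example, something like this in the script console could stamp out a new job from the template (the job names here are only placeholders):

import hudson.model.Hudson

def h = Hudson.instance

// "job-template" and "my-new-build" are placeholder names
def template = h.getItem("job-template")
def copy = h.copy(template, "my-new-build")
println "created ${copy.name} from ${template.name}"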


> 
> > I've seen up to 20 jobs
> > backed up waiting for executors during peak periods.
> 
> The only times I've seen 20 jobs backed up are when there was a problem --
> Hudson was in a sad state and was not 'letting go' of finished jobs.

We'll see how things go as we get closer and closer to release time.  From past experience, Hudson usually starts getting pretty busy during the months of April through June.
Comment 3 Eclipse Webmaster 2011-03-09 14:12:13 EST
Thanks to the blog post that Antoine pointed out in bug 319909, I put together a Groovy script to walk our jobs and add timeouts and the cleanup flag.
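
Roughly along these lines (a from-memory sketch, not the exact script; the BuildTimeoutWrapper constructor arguments are an assumption for the plugin version we're running):

import hudson.model.Hudson
import hudson.model.Project
import hudson.plugins.build_timeout.BuildTimeoutWrapper

Hudson.instance.items.each { job ->
    if (!(job instanceof Project)) return     // skip anything without build wrappers
    def wrappers = job.buildWrappersList
    if (wrappers.get(BuildTimeoutWrapper) == null) {
        // assumed constructor: (timeout in minutes, fail the build on timeout)
        wrappers.add(new BuildTimeoutWrapper(180, false))
        job.save()
        println "added timeout to ${job.name}"
    }
    // the workspace cleanup wrapper gets added the same way
}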

-M.
Comment 4 Denis Roy 2011-08-12 16:07:46 EDT
> We'll see how things go as we get closer and closer to release time.

Release has come and gone.  I think our setup did quite well.