
Bug 332552

Summary: Set up a high-priority slave
Product: Community
Component: CI-Jenkins
Reporter: Denis Roy <denis.roy>
Assignee: Denis Roy <denis.roy>
QA Contact:
Status: RESOLVED FIXED
Severity: normal
Priority: P3
CC: david_williams, d_a_carver, kim.moir, webmaster
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard:
Attachments: screenshot (flags: none)

Description Denis Roy CLA 2010-12-14 12:26:08 EST
Currently, slave1 and slave2's build queues are full with build jobs of all kinds, including nightlies.  If a project needs to put out a release build, they must wait.

Let's set up a "slave3" but dedicate it to fast-lane builds.  I'm not sure what we should call it, but we need to make it obvious that nightly builds and tests shouldn't run here -- this is the fast lane.  We'll run this "slave3" on the same physical machine as the relatively idle master, so it will have access to more idle CPU time.
Comment 1 David Carver CLA 2010-12-14 12:51:40 EST
Only issue I can see here...when Release time comes around, every project that is on the release train is going to try and use Slave 3.  So you will get a backlog happening there.
Comment 2 David Carver CLA 2010-12-14 12:52:28 EST
Additional concerns, how many concurrent executors will Slave3 have?   If a job takes 6 hrs to run or longer, it'll tie an executor up for that entire time.
Comment 3 Kim Moir CLA 2010-12-15 09:40:29 EST
It would be great to have another slave.  Not sure how you will enforce "fast lane" use given that by default all slaves in the install are available for use.  Another regular slave might just solve the resource issue.
Comment 4 David Carver CLA 2010-12-15 11:04:35 EST
I agree with Kim, just add another regular slave to the existing "build" group, and if projects are set up right, their jobs will try to run on any available slave in that group.
Comment 5 Denis Roy CLA 2010-12-15 11:12:00 EST
That's all swell, but there is concern that during a release, while everyone is trying to put bits out, a bunch of nightlies or CI builds (which could perhaps otherwise wait a while) will just occupy the queue and processing resources.
Comment 6 Denis Roy CLA 2010-12-15 11:13:07 EST
> Not sure how you will enforce "fast lane" 

I didn't intend to.  I figured social convention would take care of that.

Closing as RESOLVED/STUPID IDEA.
Comment 7 Kim Moir CLA 2010-12-15 11:25:29 EST
It wasn't a stupid idea :-)  I just wasn't sure how you were going to enforce it. If I really needed a slave during a release and the queue was full of nightly builds, I would personally just send a note to cross project and ask if a nightly build could be stopped to allow me to build :-)
Comment 8 David Carver CLA 2010-12-15 11:29:28 EST
Not sure if this is doable, but if release time is a concern, then you might actually want a separate farm of slaves, just for release.  They can be managed by the same Hudson instance, but a new release-specific job can be created and tied to that particular farm.

It's like having test, QA, and release environments but with build servers.  It all depends on how complex we want to get.
Comment 9 David Williams CLA 2010-12-16 11:39:10 EST
I actually like this idea. I'm not sure it would solve the entire problem of "crunch time", but I think it might be a step in the right direction. 

We committers and release engineers must start to develop a culture that includes priority and limited resources. I know it usually seems like we have unlimited resources ... because Denis does such a good job of accommodating us :) ... but, every year we end up filling up all available resources at some point, and at some point, it will make us miss our dates, or at least add a lot of last minute stress, confusion and uncertainty.

As one example, almost no one sets "niceness" ... I understand no one wants to "be last", and also, I've heard, there's nothing in Hudson that helps with priorities ... so it seems doing priority-by-slave might be a start.

And, I'd say it'd be fine to have some strict rules, in addition to "social conventions". For example, I think we could say the high-priority slave is for builds completing in under 20 minutes, at most, and jobs over that time will be automatically killed. (Then, we'd probably set the auto-time-limit to 30 minutes, just to accommodate variability.) And then there would still be social conventions to say nightlies should not be run there (even if short and quick).
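
As a rough illustration of the time-limit rule described above, here is a minimal Python sketch. It assumes the 20-minute target and the 30-minute kill limit mentioned in this comment. The `RunningBuild` type and the `fastlane_action` function are hypothetical names made up for this example; they are not part of Hudson, and this says nothing about how Hudson itself would enforce a timeout.

```python
from dataclasses import dataclass

# Assumed values, taken from the comment above: builds should finish in
# under 20 minutes, and the hard limit is 30 minutes to absorb variability.
TARGET_MINUTES = 20
KILL_AFTER_MINUTES = 30


@dataclass
class RunningBuild:
    job_name: str           # hypothetical job name, for illustration only
    elapsed_minutes: float  # how long the build has been running


def fastlane_action(build: RunningBuild) -> str:
    """Return 'ok', 'warn', or 'kill' for a build on the fast-lane slave."""
    if build.elapsed_minutes >= KILL_AFTER_MINUTES:
        return "kill"   # past the hard limit: abort the build
    if build.elapsed_minutes > TARGET_MINUTES:
        return "warn"   # over the target, still inside the grace window
    return "ok"
```

The gap between the target and the kill limit is the grace window: a build that overruns slightly is flagged but not aborted.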

I think the benefit of this approach is that there is some "reward" to encourage people to do the right/best thing ... short quick builds can be done on HOV fast lane server.  

If we don't do something like this, I think we'll end up with the stick instead of the carrot ... such as we'll tell that pesky webtools team their builds just take too much resource and we'll have to publicly shame them into making improvements :) Or, we'll scurry around at the last minute asking people if we can kill those equinox nightly tests :)

So, again, I don't think this would completely solve the whole problem of having traffic jams at releases and milestones ... but, might help get people thinking in terms of the resources they use and how to prioritize their work. 

Another approach -- to increase awareness -- might be to track how much each project uses the build server's CPU or resources ... similar to how we track disk space usage. But ... not sure how to do that sort of tracking, exactly. Might be hard to implement anything meaningful.
Comment 10 Denis Roy CLA 2011-01-24 09:31:15 EST
Created attachment 187424: screenshot

Matt has set up a slave called 'fastlane' which resides on the same host as the master, so it has plenty of CPU cycles.  However, since most jobs are configured to use any slave in the group, some nightly jobs are using the fastlane as depicted in the screenshot.

Matt, is there a way we can mark fastlane as not being part of the regular node group, so that jobs must be tied to it explicitly?
Comment 11 Eclipse Webmaster CLA 2011-01-24 11:17:10 EST
I've set its availability to 'Leave this machine for tied jobs only', which seems to be our only option.

I've taken a look at the jobs that have run, and I don't see any indication that they are 'tied' to this node, so they should simply run elsewhere.

-M.
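
To make the effect of that setting concrete, here is a minimal Python sketch of the scheduling rule it implies: a node marked for tied jobs only is skipped by jobs that can run anywhere, and is used only by jobs tied to it by name. This is a simplified model, not Hudson's actual scheduler. The `Node`, `Job`, and `eligible_nodes` names are hypothetical, and the node names come from this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Models the 'Leave this machine for tied jobs only' availability setting.
    tied_jobs_only: bool = False


@dataclass
class Job:
    name: str
    # Name of the node this job is explicitly tied to, or None for "any node".
    tied_to: Optional[str] = None


def eligible_nodes(job: Job, nodes: List[Node]) -> List[str]:
    """Return the names of the nodes this job may run on (simplified model)."""
    if job.tied_to is not None:
        # A tied job runs only on the node it names.
        return [n.name for n in nodes if n.name == job.tied_to]
    # An untied job may use any node that is not reserved for tied jobs.
    return [n.name for n in nodes if not n.tied_jobs_only]


nodes = [Node("slave1"), Node("slave2"), Node("fastlane", tied_jobs_only=True)]
nightly = Job("nightly-build")                   # hypothetical untied job
release = Job("release-build", tied_to="fastlane")  # hypothetical tied job
```

Under this model the untied nightly job is eligible only for slave1 and slave2, while the release job tied to fastlane runs there and nowhere else.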
Comment 12 Denis Roy CLA 2011-02-23 08:44:23 EST
We're done here.  Thanks Matt.