Hudson on build.eclipse.org has been unstable for some time, and we don't really have an understanding of why. This bug is to track the post-Helios effort of fixing it, and "promoting" it to a first-class service at Eclipse.org. Here are some of the strategies I can think of to make this happen:

1. Move the master to an x86_64 box. I think a good tactic here would be to run the master in a VM on build3, where it can do very little work other than being a master.

2. Instead of running a slave on build2's bare iron, perhaps we should run a pair of virtualized slaves on build2, since we can likely reuse the same VM image as the master. In setting up build2 we've learned that there are a lot of moving parts that need to be configured, and it was a seemingly endless trial-and-error process to get it working.
--> this image could then be used on additional x86_64 hardware and allow us to easily scale Hudson for Indigo.
--> this image will also allow us to sandbox entire master/slave setups to test migrations, plugins, etc.

3. Webmaster will assume the entire administration duties of Hudson (i.e., no committer admins). The shared administration has allowed us to get up and running quickly to discover the possibilities made available by Hudson, and I thank the volunteers for their hard work so far. But we're at a point where stability is more important than a constant stream of features.
--> this means that, like every other piece of software we run, we won't upgrade Hudson until a) there is a definite need to do so, b) it won't interfere (too much) with committers' ability to get stuff done and c) it's been proven to work
--> the above also means that, if you ask for a "toy" plugin that won't add much value to the system as a whole (a picture of Chuck Norris, a moody Hudson, green balls), the answer will be no
--> likewise, I highly doubt we'll run any of those plugins that notify you of a build via Tweeter, Facebank, Ourspace, IM, ICQ, IRC, Skype, cell phone call, smoke signal or Morse code
--> the above also means that if you ask to upgrade pluginX from 1.3.1.1 to 1.3.1.2 just because you have an itch for the bleeding edge, the answer will be no
--> the above restrictions are not meant to be 'mean' or irrational dogma... While some plugins offer cool functionality, I'd like to keep the install to a strict minimum to ensure stability, security and performance. The statements above should only serve to set expectations.

That's the vision I have so far. Feel free to comment.
Sounds great, Denis. I am suspicious of the PPC JVM on build: we get different behaviors in our JUnit tests when we run our builds there, and this could be what's happening to Hudson. But that's just a guess. Stability is key and I'm glad you're grabbing control of it. +1.
> That's the vision I have so far. Feel free to comment.

I'll just say that several of the plugin upgrades, like git, maven, etc., have been done to address issues. I do think we need to come to an agreed-upon list of plugins that are "Critical for Production use" of builds. I for one think that plugins providing the ability to view various reports (Code Coverage, Number of Bugs, Warnings, etc.) are critical to a CI build.

I also think plugins that communicate build status to committers in different ways are critical, because there needs to be a good way to communicate when a build fails. I personally use the RSS feeds quite often, and don't particularly like email notification. I hang out on IRC quite often, so I would find IRC notification useful as I'd get the information faster. I also think the Extended Email notification plugin is way better than the default email notification that comes with Hudson, as it can send you more detailed error information in an email, if that is your preferred notification strategy.

So, before making arbitrary decisions on what plugins go and what plugins stay, we need to know which plugins are the most used. While seeing blue or green balls may seem cosmetic, showing good builds as green (not blue) is actually a standard practice that Hudson deviated from. In general, the ones we have enabled in the installed plugins are probably the ones we need to keep. The rest are nice to have but don't provide critical functionality; they're things somebody needed at one time over the last year and a half.

Also, you may want to still allow committers themselves to create new jobs. Otherwise, you will need to watch for and create the necessary jobs for every project. As it is right now, only the Hudson admins have rights to create new jobs; part of being a committer should be the ability to create jobs for your project and manage them.

If we take away some of the reporting plugins, then my next request is going to be to ask for a Sonar server to be set up, or that Eclipse projects can participate in using the public Nemo Sonar server.

http://www.sonarsource.org/
http://nemo.sonar.codehaus.org/
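For anyone who, like the comment above, leans on the feeds rather than email, a minimal sketch of consuming a job's feed from a personal script -- the job name below is a hypothetical placeholder, and while the per-job /rssAll Atom feed is standard Hudson, the exact path should be checked against our instance:

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical job name; /rssAll is Hudson's standard per-job Atom feed.
    FEED = "https://hudson.eclipse.org/hudson/job/example.project.nightly/rssAll"

    ATOM = "{http://www.w3.org/2005/Atom}"

    def latest_entries(url=FEED):
        """Yield (title, link) for each build entry in the job's feed."""
        with urllib.request.urlopen(url) as resp:
            root = ET.parse(resp).getroot()
        for entry in root.findall(ATOM + "entry"):
            title = entry.findtext(ATOM + "title", default="")
            link = entry.find(ATOM + "link")
            yield title, (link.get("href") if link is not None else "")

    if __name__ == "__main__":
        for title, href in latest_entries():
            # Hudson includes the build result in the entry title.
            print(title, href)

Something this small can be run from cron in a shell account and piped wherever a committer wants their notifications, without touching the shared server configuration.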
(In reply to comment #2)
> I do think we need to come to an agreed-upon list of plugins that are "Critical
> for Production use" of builds.

Of course. But we'll also need to concede that our Hudson instance cannot fully satisfy every need of every committer, otherwise we'll be constantly tinkering with it, and in my experience that leads to an unstable system. In other words, some specific features requested by some committers will need to be implemented outside of Hudson, using the committers' own resources (time, mainly). For instance, since most release engineers have shell access, you can run a mail->IRC script within your own home directory. If your script explodes, it won't take the entire Hudson instance down. We don't have a Twitter plugin for Bugzilla or MediaWiki, nor for Git or CVS, so I don't see why we would *need* one for Hudson.

Also, let's not forget that there are other options for building, should a shared Hudson instance not meet someone's specific needs:
- install/run what you need inside your shell account on build
- use your own hardware at your own facility

As usual, we'll do our best to meet the requirements shared by the group.

> So, before making arbitrary decisions on what plugins go and what plugins
> stay, we need to know which plugins are the most used. While seeing blue or
> green balls may seem cosmetic, showing good builds as green (not blue) is
> actually a standard practice that Hudson deviated from.

A wise man once told me we should work with the Hudson developers to get Hudson issues fixed. Seems like a plugin is a band-aid at best :-)

> Also, you may want to still allow committers themselves to create new jobs.

That is a good consideration.

> If we take away some of the reporting plugins

Reporting is hardly what I'd consider a "toy" plugin. I'm sorry if I left you with the impression that I would simply axe everything, but that is not the case. After Helios, we will have the freedom to migrate our master and install a bunch of plugins without the looming deadlines. The table will be open. We'll be able to try stuff.

But once the system is stable, we'll lock it down. It doesn't mean we won't upgrade or install plugins -- we'll just evaluate the risk/benefit, be really cautious about what we choose to install, and only do one change at a time. But that takes time, and sometimes "y'all want it now", hence the reason to set expectations.

Remember, if we have a better setup (sandbox, virtualization, VM images etc.) we should be able to better test plugins to ensure they have no negative impact on our setup. And if they _do_ have a negative impact, we should be able to revert easily enough.
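To make the "run it in your own home directory" suggestion concrete, here is a minimal sketch of such a mail->IRC bridge: it reads one build-notification mail on stdin (for example from a personal mail filter) and announces the subject on a channel. The IRC server, channel and nick are placeholders, and a robust version would also answer server PINGs and handle errors:

    #!/usr/bin/env python
    # Minimal sketch of a personal mail->IRC notifier: reads one email from
    # stdin, extracts the Subject, and announces it on an IRC channel.
    import sys
    import socket
    from email import message_from_file

    IRC_SERVER = ("irc.example.org", 6667)   # placeholder server
    CHANNEL = "#myproject-builds"            # placeholder channel
    NICK = "hudson-notify"                   # placeholder nick

    def announce(text):
        s = socket.create_connection(IRC_SERVER, timeout=30)
        send = lambda line: s.sendall((line + "\r\n").encode())
        send("NICK " + NICK)
        send("USER %s 0 * :build notifier" % NICK)
        # Wait for the server welcome (001) so the JOIN/PRIVMSG aren't dropped;
        # a real script would also answer PING here.
        buf = b""
        while b" 001 " not in buf:
            buf += s.recv(4096)
        send("JOIN " + CHANNEL)
        send("PRIVMSG %s :%s" % (CHANNEL, text))
        send("QUIT :done")
        s.close()

    if __name__ == "__main__":
        msg = message_from_file(sys.stdin)
        announce("Hudson: " + msg.get("Subject", "(no subject)"))

Wired into a personal mail filter, a script like this stays entirely inside a committer's shell account, so a crash there never touches the shared Hudson instance.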
I'm happy to hear that Hudson will be a first class service at eclipse.org. We would like to move our builds to eclipse.org hardware completely in the 3.7 timeframe and stability is key for us. I agree with Dave that continuing to have committers other than the webmaster with the ability to create jobs would be useful.
(In reply to comment #3)
> Reporting is hardly what I'd consider a "toy" plugin. I'm sorry if I left you
> with the impression that I would simply axe everything, but that is not the
> case. After Helios, we will have the freedom to migrate our master and install
> a bunch of plugins without the looming deadlines. The table will be open.
> We'll be able to try stuff.
>
> But once the system is stable, we'll lock it down. It doesn't mean we won't
> upgrade or install plugins -- we'll just evaluate the risk/benefit, be really
> cautious about what we choose to install, and only do one change at a time.
> But that takes time, and sometimes "y'all want it now", hence the reason to set
> expectations.
>
> Remember, if we have a better setup (sandbox, virtualization, VM images etc.) we
> should be able to better test plugins to ensure they have no negative impact on
> our setup. And if they _do_ have a negative impact, we should be able to
> revert easily enough.

Yep, we should work with the Hudson developers and the plugin developers. Hudson is designed to have plugins, and much of the core functionality that comes with Hudson is developed as plugins -- it's the same concept as Eclipse. I wouldn't call plugins band-aids, just something that provides additional features. We do need to work with the Hudson community to help address issues that we run into, and that includes what we consider must-have plugins that go beyond what the core of Hudson provides.

(In reply to comment #4)
> I'm happy to hear that Hudson will be a first class service at eclipse.org. We
> would like to move our builds to eclipse.org hardware completely in the 3.7
> timeframe and stability is key for us. I agree with Dave that continuing to
> have committers other than the webmaster with the ability to create jobs would
> be useful.

Woo hoo... I'm not completely nuts, just mostly. :)
+1 for having a stable build service. My additional comments:

- master/slaves/virtualization: I use Hudson in my own company and at Eclipse, and there is always a common issue: deadlines. During Helios we hit maximum load and needed perfect stability; the problem is that everybody runs simultaneous builds, which consume more resources than nightlies. Would it be possible, or a good idea, to separate builds onto different slaves according to their purpose? Say, a 'signing' slave, a 'nightly' slave just for nightly builds, maybe a 'helios' and an 'indigo' slave? This is just a guess. There will be more and more jobs on Hudson, and we need a stable 'release train' instance that can handle the whole load without us worrying about stability. For nightlies, an instance that provides builds with some metrics may be a good idea. Separating concerns is a good thing.

- plugins: I think it is good to keep the current way of doing things, that is, open a request and vote for it. Don't forget that the more plugins are used, the more time is spent on builds.

- committers can create jobs: I am not sure it is a good idea to give committers creation rights; having a good process is better in my opinion.

Again, this is my humble opinion about a build system. I can easily understand it could be really difficult for the admins to manage.
Stability is good and I'm happy to hear that is a goal. In terms of plugins, the various ones that support the build approaches used by projects (Ant, Maven/Tycho, Buckminster) are important. Notification of build status is important too -- emails with pertinent and configurable message content to the -dev list for the committers. I would also like to see Twitter update capabilities, for consumers that are not on the dev list (yes, we recommend they be on the dev list, and it makes sense for them to be on the dev list, but you can't make 'em subscribe) and who only want to know about build updates.

I'm very much in favo[u]r of green/red status indicators.

I'm in favo[u]r of committers being able to create/remove jobs -- purely to make the interaction as frictionless as possible.
(In reply to comment #7)
> I'm very much in favo[u]r of green/red status indicators.

Keep in mind there might be color-blind people using build.e.org. You don't want Indigo to fail just because somebody couldn't tell if the build is red or green.
(In reply to comment #8)
> (In reply to comment #7)
> > I'm very much in favo[u]r of green/red status indicators.
>
> Keep in mind there might be color-blind people using build.e.org. You don't
> want Indigo to fail just because somebody couldn't tell if the build is red or
> green.

Isn't that what Chuck Norris is for? No mistaking the status that way. :)
> favo[u]r

Watch it, that type of foul language will not be tolerated here :-)

Seriously, thanks for all the feedback so far. Dave, Nick & co. did a great job of getting us going. With the new servers and a more conservative approach we should be able to smooth this thing out with minimal pain.

Matt: let's create a new host, hudson.eclipse.org, as the master on 206.191.52.xx (you decide) on build3. That will make it much easier for folks to migrate from build.eclipse.org/hudson to hudson.eclipse.org.
> Keep in mind there might be color-blind people using build.e.org. You don't
> want Indigo to fail just because somebody couldn't tell if the build is red or
> green.

Even as I typed that sentence I knew someone was going to (rightfully) ding me for it!

I've also found the 'weather' indicators useful - when you see stormy weather all the way along for a project, it gets you kinda cautious about consuming the artifacts...
(In reply to comment #11)
> I've also found the 'weather' indicators useful - when you see stormy weather
> all the way along for a project, it gets you kinda cautious about consuming
> the artifacts...

Weather indicators are built into Hudson, no plugin involved there.
My plan is to host the master and sandbox images on build3, along with slaves for both Linux (32-bit) and Windows (32-bit). Once the initial setup is done I'll take the current slave offline and convert it into a VM host, which will provide us with space to grow and an x86_64 slave. I'll also create a slave on build to provide us with PPC capacity.

To keep things as clean as possible on the new master we're not going to import any jobs from the current setup; instead we'll ask projects (via cross-project) to file a bug requesting the move, to make it clear this is a 'new' setup and that they (may) have to adjust things.

I've created bug 320543 to discuss which plugins should be available.

-M.
Matt, out of curiosity, what is your timeline for rolling this out? When do you think Dave can test drive it/kick the tires?
> Matt, out of curiosity, what is your timeline for rolling this out?

I'm getting a bit bored watching all the issues with build2. It does not work, and it is a huge time sink. Here is my proposed timeline:

1. Projects on build2 are to be migrated to the "new Hudson" immediately. Since they are broken half the time, they should be ideal candidates to be migrated.
--> Matt, where is the documentation that explains where the "new Hudson" is, and how to migrate to it? This is Job 1 at this time.

2. We disable build2 as a Hudson slave ASAP. Friday. Let's move on this.

3. We fix issues on "new Hudson" and reach uber stability (if such a thing is possible with Hudson).

4. Aug. 23: We announce deprecation of build.eclipse.org/hudson in favour of "new Hudson" and announce "new Hudson"'s availability, options (sandbox) and caveats (plugins and restrictions).

5. We all enjoy our lives while we do something more productive than kicking Hudson.

6. October: We remove Hudson from build.eclipse.org/hudson and designate build.eclipse.org as a standalone build server, for those projects who don't want to use Hudson.

Thoughts?
Removing the security advisories group, and upping priority to prove that I'm mad.
Agreed... Also Denise, we will want to grab the build of Hudson that contains the resolution for the infamous IBM 500 issue. Since Matt applied the workaround we haven't had to reboot, but the final fix should be in the next Hudson release. I'll monitor it and let you know when it's there.

Also, I've been running several slaves (Windows and Linux) at work with a master Hudson instance, and so far no connection issues between them. This is with the latest version of Hudson and a limited set of plugins.

The current culprit with build2 appears to be the EMMA Hudson plugin. Recommendation is not to install this plugin at this time on the new servers.

(In reply to comment #15)
> > Matt, out of curiosity, what is your timeline for rolling this out?
>
> I'm getting a bit bored watching all the issues with build2. It does not work,
> and it is a huge time sink. Here is my proposed timeline:
>
> 1. Projects on build2 are to be migrated to the "new Hudson" immediately.
> Since they are broken half the time, they should be ideal candidates to be
> migrated.
> --> Matt, where is the documentation that explains where the "new Hudson" is,
> and how to migrate to it? This is Job 1 at this time.
>
> 2. We disable build2 as a Hudson slave ASAP. Friday. Let's move on this.
>
> 3. We fix issues on "new Hudson" and reach uber stability (if such a thing is
> possible with Hudson).
>
> 4. Aug. 23: We announce deprecation of build.eclipse.org/hudson in favour of
> "new Hudson" and announce "new Hudson"'s availability, options (sandbox) and
> caveats (plugins and restrictions).
>
> 5. We all enjoy our lives while we do something more productive than kicking
> Hudson.
>
> 6. October: We remove Hudson from build.eclipse.org/hudson and designate
> build.eclipse.org as a standalone build server, for those projects who don't
> want to use Hudson.
>
> Thoughts?
Ack, meant Denis not Denise... stupid fingers!!!
> The current culprit with build2 appears to be the EMMA Hudson plugin.
> Recommendation is not to install this plugin at this time on the new servers.

Gotcha, thanks.

> Ack, meant Denis not Denise... stupid fingers!!!

Watch it -- calling me Denise makes me mad! :)
Can't fault the goal here -- you get my vote for a stable build system! I agree with most of the direction here -- though of course I'd quibble about the plug-ins.

What we really need is a way of having a rock-solid and simple master to schedule and keep stats about jobs, while allowing anything experimental to be isolated to slaves. Is there a way of having certain plug-ins, for example, limited to certain slaves? Some sort of strategy for managing instability is required, since complete stability is unlikely ever to be entirely attained.

The added flexibility of making all slaves virtual machines means that we can, in principle, 'clone' slaves for capacity relief; move slaves to different hardware for capacity and demand management; and use the virtual machine boundary to isolate experimental or otherwise temperamental build environments. We ought to be able to do this without disrupting other operational slaves and without bringing the master down.

Plug-ins that necessarily interfere with master stability (EMMA plug-in, anyone?) should be vetoed until they are proven -- which means that a Hudson test-bed ought to be run in parallel. By making the master and slaves VMs, this test-bed could be attempted with little or no drain on hardware resources, and allow us to trial builds (at much-reduced performance, of course) to help preserve the stability of the main production build systems.

Whoops -- more than two penn'orth. Thanks for the great staff-work Denis[e] :-)
Matt, during testing of the new system it looks like XVNC is not set up correctly. The following is being displayed:

You will require a password to access your desktops.
Password: Password too short
Starting xvnc
[workspace] $ vncserver :11
You will require a password to access your desktops.
Password: Password too short
Starting xvnc
[workspace] $ vncserver :12
You will require a password to access your desktops.
Password: Password too short
Starting xvnc
[workspace] $ vncserver :13
You will require a password to access your desktops.
Password: Password too short
FATAL: Failed to run 'vncserver :13' (exit code 1), blacklisting display #13; consider checking the "Clean up before start" option
java.io.IOException: Failed to run 'vncserver :13' (exit code 1), blacklisting display #13; consider checking the "Clean up before start" option
	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:100)
	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:98)
	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:98)
	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:98)
	at hudson.plugins.xvnc.Xvnc.setUp(Xvnc.java:73)
	at hudson.model.Build$RunnerImpl.doRun(Build.java:132)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
	at hudson.model.Run.run(Run.java:1257)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:129)

In the past we had to su to the Hudson userid and set the xvnc password in the .rc files. Unless this is fixed, no UI tests that want to use the xvnc plugin will run.
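For reference, a minimal sketch of that manual fix -- seeding the VNC password for the build user so 'vncserver :N' stops prompting -- to be run as the Hudson user after su'ing to that account. It assumes a vncpasswd binary that supports the -f filter mode, and the password below is only a placeholder:

    # Minimal sketch: create ~/.vnc/passwd for the build user so the Xvnc
    # plugin's 'vncserver :N' calls stop prompting and failing.
    # Assumes 'vncpasswd -f' (plaintext on stdin, obfuscated password on
    # stdout) is available on the slave.
    import os
    import stat
    import subprocess

    vnc_dir = os.path.expanduser("~/.vnc")
    os.makedirs(vnc_dir, exist_ok=True)

    secret = "changeme1"  # placeholder; short passwords trigger "Password too short"
    obfuscated = subprocess.run(
        ["vncpasswd", "-f"], input=secret.encode(),
        stdout=subprocess.PIPE, check=True).stdout

    passwd_file = os.path.join(vnc_dir, "passwd")
    with open(passwd_file, "wb") as f:
        f.write(obfuscated)
    # vncserver refuses password files readable by anyone else.
    os.chmod(passwd_file, stat.S_IRUSR | stat.S_IWUSR)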
> What we really need is a way of having a rock-solid and simple master to
> schedule and keep stats about jobs, while allowing anything experimental to be
> isolated to slaves. Is there a way of having certain plug-ins, for example,
> limited to certain slaves?

I think having slaves with specific plugin sets makes sense, provided the slave doesn't affect the master's stability.

Ultimately, what I want to avoid is spending tons of time troubleshooting build issues. That is all. Matt and I are but a team of two people, and the constant struggle with Hudson is not productive for either us or for you.

> which means that a hudson test-bed ought to be run in parallel

Again, such a setup will incur a demand for human resources. We intend to run a sandbox Hudson to test plugins before deploying them to the production area, but running a parallel master/slave setup which is open to committers, and where we install plugins promiscuously, will bring us back to square one real quick. Everyone will want to use the test-bed for production builds because it is feature-rich, yet be frustrated by its lack of stability.

In the end, we must acknowledge that the Eclipse.org-supported Hudson instance will not be able to satisfy the itch of every committer.
(In reply to comment #22)
> Again, such a setup will incur a demand for human resources. We intend to run
> a sandbox Hudson to test plugins before deploying them to the production area,
> but running a parallel master/slave setup which is open to committers, and
> where we install plugins promiscuously, will bring us back to square one real
> quick. Everyone will want to use the test-bed for production builds because it
> is feature-rich, yet be frustrated by its lack of stability.

I thoroughly agree. I was reasoning out loud to argue the case for the Hudson master to be a VM as well as the slaves. Your test-bed is sensible -- but making the Hudson master a VM means you can create and tear down your 'test-bed' as and when you like; you don't have to maintain it -- it can always be a clone of the real one -- and you can fiddle with it and break it as much as you like. I would certainly suggest you keep it to private access only. Experiments would be run by prior arrangement and for the purposes of testing Hudson plug-in and feature enhancements only.

It is this sort of need (which is simply a process by which new features can be introduced without seriously interrupting the flow of existing builds) that makes me suggest running the master on a VM as well as the slaves -- to allow this sort of play as and when the resources are found to make an enhancement. It also reduces your workload -- the test-bed need not be maintained beyond its need.

As a committer I agree that all of my itches do not need to be scratched (provided the other committers make the same agreement). I have a vested interest in keeping the workload of the administrators as small as possible -- but no smaller.
(In reply to comment #22)
> > What we really need is a way of having a rock-solid and simple master to
> > schedule and keep stats about jobs, while allowing anything experimental to be
> > isolated to slaves. Is there a way of having certain plug-ins, for example,
> > limited to certain slaves?
>
> I think having slaves with specific plugin sets makes sense, provided the slave
> doesn't affect the master's stability.
>
> Ultimately, what I want to avoid is spending tons of time troubleshooting build
> issues. That is all. Matt and I are but a team of two people, and the constant
> struggle with Hudson is not productive for either us or for you.

Just a clarification: the master controls which plugins can be run. The slaves don't control it beyond simple configuration of where the appropriate binaries are located on that slave machine (i.e. Java runtimes, git, etc.). Everything else is serialized from Hudson to the slave if the plugin needs to run on the slave. If a plugin shouldn't be allowed to run on a particular slave (e.g. XVNC can't run on Windows boxes), it has to specify that explicitly.
(In reply to comment #15)
> --> Matt, where is the documentation that explains where the "new Hudson" is,
> and how to migrate to it? This is Job 1 at this time.

I've updated the wiki (wiki.eclipse.org/Hudson) to explain where the new instance is hosted (https://hudson.eclipse.org/hudson) and how to get your job(s) moved.

> 2. We disable build2 as a Hudson slave ASAP. Friday. Let's move on this.

OK, I'll disable the slave Friday at 1pm (EDT).

> 3. We fix issues on "new Hudson" and reach uber stability (if such a thing is
> possible with Hudson).

Dave and Kim are currently helping with shakedown testing and filing bugs for issues they encounter.

-M.
> > 2. We disable build2 as a Hudson slave ASAP. Friday. Let's move on this.
>
> OK, I'll disable the slave Friday at 1pm (EDT).

Well, to be fair to the projects on that slave, we should notify them first. I think a message to cross-project-issues-dev, outlining the problem, the above doc and the migration path, is what's needed. It will also serve as an introduction to the New Hudson. At this point we should push back build2's demise to Wednesday, or next Friday. Agree?

Thanks for posting an update.
(In reply to comment #17)
> Also Denise, we will want to grab the build of Hudson that contains the
> resolution for the infamous IBM 500 issue.

Since we didn't have this workaround when the new instance was set up, I chose to run Hudson with the Sun 1.6r21 JDK.

> The current culprit with build2 appears to be the EMMA Hudson plugin.
> Recommendation is not to install this plugin at this time on the new servers.

I agree.

(In reply to comment #26)
> Well, to be fair to the projects on that slave, we should notify them first. I
> think a message to cross-project-issues-dev, outlining the problem, the above
> doc and the migration path, is what's needed. It will also serve as an
> introduction to the New Hudson. At this point we should push back build2's
> demise to Wednesday, or next Friday. Agree?

Sure, I'll send out a notice.

-M.
> Here is my proposed timeline:
>
> 1. Projects on build2 are to be migrated to the "new Hudson" immediately.
> Since they are broken half the time, they should be ideal candidates to be
> migrated.
> --> Matt, where is the documentation that explains where the "new Hudson" is,
> and how to migrate to it? This is Job 1 at this time.

STEP 1 is done.

> 2. We disable build2 as a Hudson slave ASAP. Friday. Let's move on this.

STEP 2 is done.

> 3. We fix issues on "new Hudson" and reach uber stability (if such a thing is
> possible with Hudson).

STEP 3 is done, right? hudson.eclipse.org seems to be stable?

> 4. Aug. 23: We announce deprecation of build.eclipse.org/hudson in favour of
> "new Hudson" and announce "new Hudson"'s availability, options (sandbox) and
> caveats (plugins and restrictions).

That was yesterday. Are we there yet?
> STEP 3 is done, right? hudson.eclipse.org seems to be stable?

I believe so. I've just been waiting to hear back from Steve Powell about Ivy, as that's the only issue I'm aware of. At this time 99% of jobs should just work.

> That was yesterday. Are we there yet?

If we can fix the Ivy issue 'later' then yeah, we're there.

-M.
Dear -M,

Don't hold off closing this bug on my account! It's true that none of my builds are working, but that's OK because we have to get round to enabling the use of a proxy for our S3 dependency downloads. It is not something we have ever had to do, so we don't know how it works, but I'm sure we can arrange something -- it's another bug in any case.

Where did Step 4 get done? (I missed it.)
> Where did Step 4 get done? (I missed it.)

It hasn't -- but we've confirmed that we are there. We are so there.
Matt -- for convenience's sake, can we:

a) enable http://hudson.eclipse.org -> https://hudson.eclipse.org/hudson ? Right now I just get a timeout

b) enable https://hudson.eclipse.org -> https://hudson.eclipse.org/hudson ? Right now I get a 403 Forbidden
(In reply to comment #32)
> Matt -- for convenience's sake, can we:
>
> a) enable http://hudson.eclipse.org -> https://hudson.eclipse.org/hudson ?
> Right now I just get a timeout
>
> b) enable https://hudson.eclipse.org -> https://hudson.eclipse.org/hudson ?
> Right now I get a 403 Forbidden

Done.

(In reply to comment #31)
> It hasn't -- but we've confirmed that we are there. We are so there.

I'm writing the notice now.

-M.
I'm going to close this bug as it has served its purpose. We do have a few teething issues, but those are being worked on via the respective bugs.

-M.
So close it.