I'm not sure what the foundation policy is on this, but it would be nice if we could leverage the Eclipse community and allow a community build swarm to be developed. This would help the community as a whole, and particularly the main Eclipse platform (see bug 293830). Community swarm instances could come and go as needed, and in general would cost very little to maintain and run. If a spare older machine is available, it could be used. If a developer machine isn't being used for much beyond development, it might be able to spare a few CPU cycles for a build or two.
I may have a couple of available machines, but would need to know what is required before I commit them. Machine time is cheap, people time to futz with configurations is pretty scarce.
(In reply to comment #1)
> I may have a couple of available machines, but would need to know what is
> required before I commit them. Machine time is cheap, people time to futz with
> configurations is pretty scarce.

There are a few different ways to get going, all fairly simple for getting a slave up and running:
http://wiki.hudson-ci.org/display/HUDSON/Distributed+builds

Then we can enable the SWARM plugin for the slaves to allow them to discover the Master server:
http://wiki.hudson-ci.org/display/HUDSON/Swarm+Plugin

I'm in the process of reconstructing an old machine that I should be able to contribute to the effort as well.
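For reference, the simplest manual slave launch described on the Distributed builds page looks roughly like this (only a sketch; the master host and node name are placeholders, and slave.jar is downloaded from the master itself):

  java -jar slave.jar -jnlpUrl http://<master-host>/computer/<node-name>/slave-agent.jnlp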
I have a new quad-core desktop machine that can support this effort. It's my server testbed, and is under-used at the moment, so it has plenty of cycles to spare. It's already running VirtualBox so I can experiment with various OS and server process configurations, so I can load up one or more environments to run builds and tests as needed. What environments would be of the most use for this effort?
(In reply to comment #3)
> I have a new quad-core desktop machine that can support this effort. It's my
> server testbed, and is under-used at the moment, so it has plenty of cycles to
> spare. It's already running VirtualBox so I can experiment with various OS and
> server process configurations, so I can load up one or more environments to run
> builds and tests as needed.
>
> What environments would be of the most use for this effort?

I'm adding Kim on this, as she can provide some more details on the exact requirements she needs to help offload some of the tests.
I'll have a machine soon that I can donate to running builds. I'm in the process of replacing it now and it should be ready for non-critical tasks within a week or two...
Here's what we have today:

JUnit:
- 2 Windows XP
- 2 Linux (RHEL 5.3)
- 1 Mac (Leopard)
- 1 CVS test machine (RHEL) - could reside on the same machine as the database

JUnit performance:
- 2 Windows XP
- 2 Linux (RHEL 5.3, SUSE 10)
- 1 performance database machine (RHEL running Derby)

Most of the machines are dual-core 3.0 GHz machines with 3 GB of memory. The performance machines are supposed to replicate what a developer would have on their desktop and thus reveal performance issues that we wouldn't see on the latest machines.

Today, each test run takes between 6 and 8 hours per machine. This is too long. We have 54,000 tests. More machines = faster test results. Also, our platform test coverage is limited to three platforms...soon we'll need a Windows 7 machine. Other people have requested coverage on linux.gtk.ppc, for instance, but we simply lack the hardware.

The build user also needs to control the display on all the machines because we run UI tests, where there can't be any other UI events occurring to interfere with the tests. So the uid that the tests run as is also the uid of the person logged into X. The machines also need a mechanism to be managed remotely so they can be rebooted, etc.

The performance tests should be run on the same hardware every time, so that variations in hardware don't impact the performance results. We run a baseline every week on the performance hardware. Also, they stay at the same OS level so patches don't interfere with the performance results, which tend to be very sensitive.
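For illustration, on a Linux test machine one common way to give the build user a display it fully controls is a virtual framebuffer (this is only a rough sketch, not our actual test scripts; the display number is arbitrary):

  Xvfb :2 -screen 0 1024x768x24 &
  export DISPLAY=:2
  # run the JUnit UI suites against this display so no other X clients can
  # generate interfering events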
I also have to say thank you to all of you willing to donate hardware. It will be very cool if this works out, and it will help other teams who want to run tests, not just the Eclipse & Equinox teams.
This is a great idea, Dave. I'm not sure if we could use a swarm for release builds, or for any builds, but I certainly cannot see why it can't be used for tests. From what I hear, building is not the hardest part -- it's the tests. Thanks for the initiative! I'll bounce this off the folks at the EMO to see what sticks.
(In reply to comment #8)
> This is a great idea, Dave. I'm not sure if we could use a swarm for release
> builds, or for any builds, but I certainly cannot see why it can't be used for
> tests. From what I hear, building is not the hardest part -- it's the tests.
>
> Thanks for the initiative!
>
> I'll bounce this off the folks at the EMO to see what sticks.

Even doing a two-stage build (build on Hudson, then run the tests on the swarm) would be a help. I'm at the stage of testing out the SWARM plugin and having a slave auto-configure and connect to it...as soon as I get my wireless card working on the slave linux box. :)
I did a bit of testing with the SWARM plugin, having a slave auto-configure itself and attach to the Master. Once I got beyond my typical headaches of configuring weird hardware for a new linux machine, getting the Hudson Master and the slaves talking to each other was not too difficult.

Basically, on the Master instance you need to install the SWARM plugin for Hudson. Make sure you configure Hudson to allow auto installation of the various installers that are needed (i.e. ANT, MAVEN, JDKs, etc.); this will allow for easier configuration of the slave instances later. Once the plugin is installed, restart your Hudson master, and it should be monitoring for new connections in the background as well as broadcasting its UDP packets to let the slaves know it's available.

Installing a slave is very simple. Download the latest CLI slave client from:
http://wiki.hudson-ci.org/display/HUDSON/Swarm+Plugin

Then issue the following command after downloading swarm-client-1.1-jar-with-dependencies.jar:

  java -jar path/to/swarm-client-1.1-jar-with-dependencies.jar -description "Detailed Description of this SWARM Slave" -executors 3 -fsroot /tmp -labels linux-32bit-gtk -name NameOfThisSlave

For the community swarm we will probably need to have slaves pass in the -master option with the host name or IP address of the Master hudson instance so they can connect without searching (since they probably won't be nearby to hear the broadcast).

The -executors option should be set to the number of jobs that you want to run on a particular slave machine, typically the number of CPUs or cores that you have. Projects by default will run on the Master and have to be told which slaves they can use.

It may take a bit to work some kinks out of the slave configuration, but really, after I got my network connection issue straightened out, I had the Master and SWARM slave talking to each other in less than 3 minutes.
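For a slave that can't hear the UDP broadcast, the invocation would look roughly like this (only a sketch; the master URL, description, labels, and node name below are placeholder values, not final ones):

  java -jar swarm-client-1.1-jar-with-dependencies.jar \
       -master http://build.eclipse.org/hudson/ \
       -description "Linux test slave, 2 cores" \
       -executors 2 -fsroot /var/tmp/hudson-slave \
       -labels linux-32bit-gtk -name example-test-slave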
That's great news Dave.

Yes, definitely still run the build on build.eclipse.org to take advantage of the fast network access to the eclipse.org servers.

Looking at this presentation, it looks like you can label the slaves and then tie jobs to labels. This would allow us to run specific tests on specific machines.

See https://slx.sun.com/1179275729 at the 28 minute mark.
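To make that concrete, here's a rough sketch (the slave name and label are made up, and the job-side element name is from memory, so treat it as an assumption): each slave announces a label when it joins, and each test job is pinned to a label.

  # a slave announces a platform label when it joins the swarm
  java -jar swarm-client-1.1-jar-with-dependencies.jar -master http://build.eclipse.org/hudson/ \
       -executors 2 -fsroot /tmp -labels linux-gtk-x86 -name lin-test-1
  # each test job is then tied to a label in its configuration, which ends up in
  # the job's config.xml as something like <assignedNode>linux-gtk-x86</assignedNode>,
  # so that job only runs on slaves carrying that label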
We'd be willing to permit use of about 5 machines (win xp, vista, osx 10.5, 10.6) in idle hours for this. Are you folks ready for the community to try it?
(In reply to comment #12)
> We'd be willing to permit use of about 5 machines (win xp, vista, osx 10.5,
> 10.6) in idle hours for this. Are you folks ready for the community to try it?

I think the only thing keeping us back at the moment is permission from the powers that be (i.e. EMO), and having the necessary SWARM plugin installed. The nice thing about the SWARM is that the slaves can come and go as best fits the people contributing the machines. Anything more permanent should probably be hosted by the foundation or donated by members.
(In reply to comment #11)
> That's great news Dave.
>
> Yes, definitely still run the build on build.eclipse.org to take advantage of
> the fast network access to the eclipse.org servers.
>
> Looking at this presentation, it looks like you can label the slaves and then
> tie jobs to labels. This would allow us to run specific tests on specific
> machines.
>
> See https://slx.sun.com/1179275729 at the 28 minute mark.

Yes, the labels are the indicators, along with the descriptions of what the slaves support (e.g. OS, JRE versions, etc.). Projects pick and choose which slaves their jobs are allowed to run on. Some may never use a slave machine, but for projects like the platform that need to parallelize their unit testing, it can be of great benefit.
> I think the only thing keeping us back at the moment is permission from the
> powers that be (i.e. EMO)

The EMO agrees that a swarm for _testing_ is a great idea, but insists that building activities (including producing ZIP/JAR consumables for end-users) be performed on machines that committers and project leads/PMCs control and trust.

> and having the necessary SWARM plugin installed.

Matt is our resident Hudson expert; he'll be able to investigate that for us.
(In reply to comment #15)
> > I think the only thing keeping us back at the moment is permission from the
> > powers that be (i.e. EMO)
>
> The EMO agrees that a swarm for _testing_ is a great idea, but insists that
> building activities (including producing ZIP/JAR consumables for end-users) be
> performed on machines that committers and project leads/PMCs control and trust.

Agreed...and this actually opens up the possibility of a swarm/build ranking system. If a committer finds a system that is reliable, stable, and responsive, it could get promoted to one that not only runs tests but builds as well. But I agree: this first phase needs to be primarily about distributing the tests for those long-running projects so the overall community benefits.
Would the machines need to be real hardware, or could they be virtual machines? If virtual machines are okay, I could contribute a Win7 Professional machine.
(In reply to comment #17)
> Would the machines need to be real hardware, or could they be virtual machines?
> If virtual machines are okay, I could contribute a Win7 Professional machine.

Virtual machines are fine. In fact they may be preferred in some cases, as they tend to be isolated and can be brought up and down as needed.
I went ahead and downloaded the SWARM plugin through Hudson; however, Hudson needs a restart in order for the plugin to be enabled. There is also a node labeler plugin we might want to consider, which gives details about a particular node (i.e. what OS and specs it has).
We should hopefully have the SWARM plugin enabled by Wednesday. Denis, I believe, needs to reboot the build machine, which will effectively restart Hudson. Once this is done, we should be able to start doing some limited tests of getting some additional slave Hudson instances up and running.
Hudson has been restarted with the restart of the build server. -M.
Thanks. I have verified that the plugin is installed. I'll see if I can get my test server connected to it later this evening. Kim, feel free to play around with it as well. By default all existing projects are tied to the master server; they will have to choose which swarm machines they want to run tests on (which may need to be a separate build).
With the restart done, I've been trying to find documentation on the security/job options for this plugin, and I can't seem to locate any. Dave, can you or anyone else point me in the right direction? -M.
Here is some general information.

SWARM plugin:
http://wiki.hudson-ci.org/display/HUDSON/Swarm+Plugin
http://weblogs.java.net/blog/2009/05/23/hudson-swarm-slave-plugin

If you don't want it to be broadcasting via UDP to allow auto discovery, then you should be able to have the firewall block those UDP broadcasts, and we can force people to connect their slaves via a direct host connection (which we probably need to do anyway).

Hudson distributed builds:
http://wiki.hudson-ci.org/display/HUDSON/Distributed+builds

In particular you want to read the excellent blog articles at the bottom of the wiki entry:
http://www.sonatype.com/people/2009/01/the-hudson-build-farm-experience-volume-i/
http://www.sonatype.com/people/2009/02/the-hudson-build-farm-experience-volume-ii/
http://www.sonatype.com/people/2009/02/the-hudson-build-farm-experience-volume-iii/
http://www.sonatype.com/people/2009/02/the-hudson-build-farm-experience-volume-iv/

I'm sure there is more we will need to do as we progress with this. Hope the above helps.
Thanks Dave, I'll play with it once you have your test box connected. I can't see how to configure what swarm machines to use, since there aren't any available right now. I've created bug 295393 against me to create a build to run the JUnit tests on one platform. It's interesting that the Sonatype blog posts describe the same challenges that we face - it's difficult to run tests that require you to control the display on Windows. Currently we use RSH but this obviously isn't a solution going forward.
(In reply to comment #25)
> Thanks Dave, I'll play with it once you have your test box connected. I can't
> see how to configure what swarm machines to use, since there aren't any
> available right now. I've created bug 295393 against me to create a build to
> run the JUnit tests on one platform. It's interesting that the Sonatype blog
> posts describe the same challenges that we face - it's difficult to run tests
> that require you to control the display on Windows. Currently we use RSH but
> this obviously isn't a solution going forward.

It looks like we need to configure Hudson so that it communicates with slaves on a particular open port. I believe the current eclipse firewall configuration is blocking some of the requests, but we will need the webmasters' opinion on this.

Here is what I tried tonight:

1. Created a slave manually and had it connect. Hudson tried to have the slave connect on port 59920, but it was unsuccessful. Locally on my own network I could easily connect through the JNLP launch option for the configured test slave.

2. Tried to connect to build.eclipse.org using the SWARM plugin, but it never connected. Again, I think this is due to some firewall blocking rules that are in place.

So there are a couple of options. Manually configured slaves should always connect through a standard port that is known and has access. We need some help from the webmasters to diagnose the connection issues with a SWARM instance connecting to build.eclipse.org.
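One thing worth checking (an assumption on my part, I haven't verified it on build.eclipse.org): the master's "TCP port for JNLP slave agents" setting can apparently be pinned to a fixed value instead of a random one, which would let the webmasters open exactly one known port for slaves.

  # On the master, the setting lands in Hudson's config.xml; 49187 below is only
  # an example value, not a decision:
  #   <slaveAgentPort>49187</slaveAgentPort>
  grep slaveAgentPort $HUDSON_HOME/config.xml   # see what it is currently set to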
Here are the port numbers I'm seeing that Hudson uses for communication with the slave servers:

UDP broadcasts: 33848
JNLP slave listener: random - controlled by the Hudson configuration

Also, it looks like by default the SWARM plugin only allows nearby connections, and may not necessarily allow external connections. The source might have to be modified (there is one if statement to comment out) in order to allow external connections.
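If the webmasters want to shut off auto-discovery entirely while still allowing manually pointed slaves, something along these lines should work (a rough sketch only; 49187 stands in for whatever fixed JNLP port gets chosen):

  # drop the UDP auto-discovery traffic on the master
  iptables -A INPUT -p udp --dport 33848 -j DROP
  # allow slaves to reach the fixed JNLP slave port
  iptables -A INPUT -p tcp --dport 49187 -j ACCEPT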
After reading through the suggested links here's what I've found:

- This tooling is meant to link slaves on networks local to the master (re: recompiling the code - we (try to) install the least amount of hacked and compiled software)
- Process/job control is patchy (no way to limit slaves to tests only)
- Our server requires ssh access to your slave. That probably violates your local network security policy.
- Requires consideration of who 'manages' the slave, as it's non-local to any of our infra.

In the end I don't see this as a workable solution at this time. -M.
(In reply to comment #28)
> After reading through the suggested links here's what I've found:
>
> - This tooling is meant to link slaves on networks local to the master (re:
> recompiling the code - we (try to) install the least amount of hacked and
> compiled software)
> - Process/job control is patchy (no way to limit slaves to tests only)
> - Our server requires ssh access to your slave. That probably violates your
> local network security policy.
> - Requires consideration of who 'manages' the slave, as it's non-local to any of
> our infra.
>
> In the end I don't see this as a workable solution at this time.

What we can do then is gather requirements for what we would need; you outlined some items above. Hudson has a plugin architecture that allows you to extend its functionality. If the current mechanism for allowing slave machines and swarms of servers isn't the correct approach, then I think we need to start some more development in Dash to help.

There is a real need to have alternate build/testing servers, and it will only become more critical as more projects start to use Hudson. Any possibility of Eclipse itself adding a few more potential slave machines to the network cluster that you control?
I opened bug 295761, to consider some plugin options for Hudson and maybe developing particular plugins for the eclipse infrastructure.
Another option would be to ask the community for direct hardware donations. If the Eclipse foundation had hardware on the same network as build.eclipse.org, there wouldn't be the security issues and a new Hudson plugin wouldn't have to be developed.

I think the Apache foundation accepts hardware donations:
http://www.apache.org/dev/hardware-wish-list.html

You can even donate your car to them for money :-) but I think this is a US thing:
http://www.apache.org/foundation/contributing.html#CarProgram

Another alternative would be to use Hudson's existing Amazon EC2 plugin to run tests in the cloud.
> There is a real need to have alternate build/testing servers.

How do you figure? Right now the build server is idle. From midnight to about 2:00am ET the load average was a paltry 0.11. On the weekends the machine really doesn't do much, and even during the week the machine often spends hours with load < 2.00.

Hardware donations are nice, but we do need to pay for rackspace, power and cooling. For Hudson slaves, I'd suggest 1U single power supply dual-cpu six-core units with gobs of RAM. Actually, a single machine configured as such can accomplish an incredible amount of work in 24 hours.

As for the EC2 cloud, that is a great idea. However, when given a vast and seemingly endless amount of resources, they tend to be used with little concern for frugality.
(In reply to comment #32)
> > There is a real need to have alternate build/testing servers.
>
> How do you figure? Right now the build server is idle. From midnight to about
> 2:00am ET the load average was a paltry 0.11. On the weekends the machine
> really doesn't do much, and even during the week the machine often spends hours
> with load < 2.00.

The need comes from projects like the platform, which need to test and build on multiple operating systems. Building source code is one thing; testing that the product works on multiple operating systems is another. Kim can give the gory details on the issues and the time it takes to get the eclipse platform project tested. Being able to distribute the load to other servers, managed by a central server, is something we need to address.

Right now you also have projects that are building "official" releases on non-eclipse-controlled systems. While that is fine, it doesn't provide a central place the community can go to for those builds. Also, as soon as a few more of the really long-running project builds (like the eclipse platform, webtools, and other jobs that take 6 to 8 hours to finish) start using Hudson, you will start experiencing build backlogs and queue delays as the number of executors is used up. Sure, we can add more executors, but there is a point of diminishing returns.

I'd personally like to be able to test the XSL Tools build not only on the PPC build machine, but also on a Windows and a Mac machine, just to be sure that there are no odd issues that happen with the tests on those machines. That gives me and the community an added level of quality assurance.
That all makes sense, Dave. Thanks for the clarification. Is any/all of that possible using EC2?
Denis, we need to run tests on three platforms (win, lin, mac), so this can't be accommodated on the current build.eclipse.org server. The load average on build.eclipse.org today isn't the issue. The issue is that there aren't test machines with different os configurations available for JUnit tests.

I agree that running the tests in the cloud is problematic in terms of resource consumption, given that the Hudson EC2 plugin as it stands today can only be configured with one certificate, which is tied to a single credit card. On the other hand, the costs are low and variable, not fixed as they would be if you bought additional rack space. And it's not like most teams need this functionality today - many are happily running their builds on hudson without additional test machines.

People in the Eclipse community always complain that our build
1) Takes too long
2) Only IBMers can initiate it

If we had sufficient test machines at eclipse, these issues could be resolved.
In addition to Kim's comments.

(In reply to comment #34)
> That all makes sense, Dave. Thanks for the clarification.
>
> Is any/all of that possible using EC2?

Partially: the Amazon EC2 plugin allows you to spin up any Unix-related OS, but currently it doesn't support spinning up a Windows OS, which is where we need a set of slave virtual machine instances that can be used for testing purposes. For testing on a variety of Linux, FreeBSD, or OpenSolaris platforms it would work well. Kim has the most experience with this, so she knows the downsides as well as the upsides.

As Kim said, we need to put these builds where anybody on a project can potentially run and maintain them. Having knowledge silos in an open source community is not a good thing.
We need to be able to own the display when we run many of our ui tests and this is difficult in a virtualized Windows environment. I also haven't found how to run a Mac environment on EC2.
From the "For What It's Worth" dept...

> On the other hand, the costs are low

They are low because you're using it reasonably. When faced with a virtually limitless (and fast) CPU resource, suddenly builds and tests are no longer 'expensive' and can be done continuously -- whether or not the produced results are actually used or consumed.

> [costs are] variable, not fixed

Accountants much prefer fixed, predictable costs.
For the sake of experimenting, and to kick-start the process in the most inexpensive fashion, here's what I can do: I have an unused desktop PC at home. I'll bring it in and set it up inside the Foundation's office, and grant someone access to it. That person can then set it up as a trusted machine for running tests.

Please write your name here if you want to be that person: _______________________

I can install a barebones Linux on it -- just tell me what you want. I do not have licenses for (ugh) Windows, but if you want Windows (*gasp*), I can buy a copy (ugh) and (ugh) install it (*faint*). Again, tell me what you want.
Denis, please put my name down to test the machine. Linux in runlevel 5, 1.5 and 1.6 VMs, and ssh installed, please.

I'm not a big fan of windows either. However, according to the download stats, windows is the most popular Eclipse platform, so we need to test that platform so that we have happy users. Unlike most of the other eclipse projects, we have os-ws-arch specific fragments + native launchers that need to be tested. Otherwise, we won't find the bugs that are specific to each platform.

I know that money is a concern. Unfortunately, I work in the "give away free software" department, not the "get new hardware" department. I'm going to raise these issues with my committer reps and let them know that we need test resources at the foundation, and see what happens. If the community wants the Eclipse build process to be more open, they need to open their wallets :-)

You say that continuous integration resources could be wasted. Catching bugs earlier is never a waste; it's a tradeoff. Today, the Eclipse and Equinox projects don't have continuous integration, and it's really hurting us. Other projects at Eclipse can run their builds on Hudson in a continuous manner because they have smaller builds. We build for 13 platforms and run 54,000 JUnit tests. That takes more time than building a single feature does.
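Just to spell out what I mean by that setup (a rough sketch; the JDK paths are placeholders, wherever Denis ends up installing them):

  runlevel                            # should report 5, so the tests can own a real display
  /opt/jdk1.5.0/bin/java -version     # one 1.5 VM...
  /opt/jdk1.6.0/bin/java -version     # ...and one 1.6 VM
  service sshd status                 # ssh up so the box can be managed remotely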
> Denis, please put my name down to test the machine. Linux in runlevel 5,
> 1.5 and 1.6 VMs, and ssh installed, please.

I was hoping you wouldn't say this. Wouldn't a Windows machine be more useful for you? I was just guessing that since build.eclipse.org is a Linux machine (albeit ppc) with a usable display via Xvfb, you wouldn't need yet another Linux box.
(In reply to comment #41)
> I was hoping you wouldn't say this. Wouldn't a Windows machine be more useful
> for you? I was just guessing that since build.eclipse.org is a Linux machine
> (albeit ppc) with a usable display via Xvfb, you wouldn't need yet another
> Linux box.

Actually, I would ask the powers that be at the foundation to pony up for a licensed version of Windows XP, Vista, or Windows 7 for the new build machine. In the long run we need VMs running Linux, Windows, Mac, and Solaris, as well as the other testing environments that Kim and the platform and E4 projects are going to have to test against.
But for the initial testing, I agree with Kim that Linux should do, as we need to make sure the slave can talk to the master and work out the various issues that can occur with slave setup.
I'd like to try a Linux box first because it's easier to test than a Windows one. According to the Sonatype blog articles on running slaves, configuring Windows ones is quite dicey. Once we get the slave working on Linux, we could try other platforms.

Today, we only run tests on Windows XP, RHEL 5 and Mac Leopard. Eventually we will run them on Windows 7. We build for more than a dozen platforms, but only test on the three that make up the vast majority of downloads. Tradeoffs again :-)
> I'd like to try a Linux box first because it's easier to test than a Windows one.

I see, that makes sense. In the end, once you're comfortable with setting up a slave, I would like this box to be used for Windows testing, since you can already do Linux testing on build.eclipse.org and we don't have any Windows hardware right now. Make sense?

> Actually, I would ask the powers that be at the foundation to pony up for a
> licensed version of Windows XP, Vista, or Windows 7 for the new build machine.

I'd like to clarify that the ultimate purpose of this new Windows machine would be for tests only. You can use Linux to build everything, right?
Yes, we build everything on Linux. Tests only on the slave.
(In reply to comment #46)
> Yes, we build everything on Linux. Tests only on the slave.

Including the JNI native code from SWT?
The native SWT and launcher code is compiled on the various platforms and contributed to CVS in binary format. These binary bits are then consumed by the Eclipse build. The SWT team actually uses a hudson server to do this. See http://dev.eclipse.org/blogs/iamkevb/2009/08/19/building-native-swt-libraries-with-hudson/
Due to valid security concerns, the community swarm will not come online. However, with the new build machines we will get additional slave machines, so I'm going to resolve this one.