As a 3.7 plan item, I've been working to run the Eclipse/Equinox build and JUnit tests on eclipse.org hardware:

Build at eclipse.org: https://bugs.eclipse.org/bugs/show_bug.cgi?id=325997

I'm now at the point where 90% of our JUnit tests run on the new Linux, Windows 7 and Mac Hudson instances at the Foundation, and I'm still working on addressing the remaining issues. Thanks to the webmaster team for all their hard work getting these up and running.

Another portion of the build that we run on IBM machines today is the performance tests. Currently we run performance tests on six builds a week (one integration, two nightly, one maintenance, two baselines). These test results are compared with a baseline performance run so that we can identify any regressions in the current streams, find the root cause and fix it. For instance, here are the bugs that track all the performance issues that have been addressed in the 3.6 and ongoing 3.7 releases:

[perfs] Root bug to track 3.7 work on performance: https://bugs.eclipse.org/bugs/show_bug.cgi?id=313891
[perfs] Root bug to track 3.6 work on performance: https://bugs.eclipse.org/bugs/show_bug.cgi?id=286955

Currently we have four machines (two Windows, two Linux: SLES and RHEL) that run performance tests. We have two machines for each platform so that we can compare the results on each platform and determine if there are outliers on a single machine that can be ignored. There is also a database that stores the performance results; it could reside on existing eclipse.org hardware.

This bug is a request for this hardware at the Foundation. The machines cannot be virtualized. Because they are running performance tests, the operating system must be installed in an environment that replicates the environment a typical Eclipse developer would have on their desktop (in rack-mounted form :-)
This money is not intended for hardware that is to be maintained by the Eclipse Webmasters; that sort of hardware has an ongoing cost. The Webmaster team does, however, have a budget of their own (some of which comes from Friends of Eclipse donations), so this really needs to be handled by them. Unfortunately, I can't identify a reasonable component to reassign this to. Webmaster, I assume that this is already on your list; where/how should we categorize this bug?
It should be noted that our (IBM) lab will be moving to a new office building very early in 2012. At that point we won't have an Eclipse lab in our building, with the expectation that the build will have been completely transitioned to Eclipse Foundation hardware. Thus, if the community would like to see performance tests continue on the Eclipse and Equinox projects, we need machines :-) They are quite valuable in making our software more responsive, as you can see below:

[perfs] Root bug to track 3.7 work on performance: https://bugs.eclipse.org/bugs/show_bug.cgi?id=313891
[perfs] Root bug to track 3.6 work on performance: https://bugs.eclipse.org/bugs/show_bug.cgi?id=286955
Having a machine dedicated to a single task is costly in terms of power consumption and rack space. With Xen, I can pin a specific CPU core (or cores) to a virtual server, which means the host OS (hypervisor) will not use those cores for other vservers. Since RAM is dedicated to a vserver as well, this should provide an environment where your test results are consistent regardless of host activity. If you'd like to try this out, I can set up a vserver this week so that you can see how it goes. How much RAM and disk space would you need, knowing that Linux vservers will have /shared mounted locally?
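For context, the pinning amounts to a couple of lines in the guest's Xen config. A sketch, where the core number and memory size are placeholders, not our actual values:

# Excerpt from a Xen domU config file -- values are illustrative.
vcpus  = 1
cpus   = "3"      # the guest's vCPU runs only on physical core 3;
                  # dom0 and the other guests are kept off that core
memory = 2048     # RAM dedicated to the guest, in MB

The same pinning can also be applied to a running guest with 'xm vcpu-pin'.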
Should we move this discussion to Community/Servers?
I'll bring your proposal up at our Eclipse team planning call tomorrow. In the past, the PMC has always stated that they need to run performance tests on bare-metal hardware because there is too much variation between test runs on virtualized hardware. This isn't a high-priority request; we don't plan to run performance tests on Foundation hardware for 3.7. I just opened it early because I know it takes a long time to order, set up and configure hardware. Our goal for 3.7 is to run just the JUnit tests on Foundation hardware.
I think we would need to run the baseline tests a few times and see what the variability looks like. I have no idea how the UI tests would behave in such an environment.
The UI tests run against an emulated display via Xvnc on Linux. We are still having issues running JUnit tests on the Windows 7 box. As an aside, webmasters: the results for the performance tests are stored in a database that we use to hold both baseline results and build results, and to run comparisons between them. Currently, we use Derby.
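For anyone curious, comparisons like that are plain SQL against an embedded Derby database. A minimal sketch; the table layout and scenario name below are made up for illustration and are not our actual schema (needs derby.jar on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PerfResultsSketch {
    public static void main(String[] args) throws Exception {
        // Load the embedded Derby driver (required on pre-JDBC-4 JREs).
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");

        // Embedded Derby keeps the database in a local directory; no server process.
        Connection con = DriverManager.getConnection("jdbc:derby:perfresults;create=true");

        // First run only: an illustrative table for per-build timings.
        con.createStatement().execute(
                "CREATE TABLE RESULTS (BUILD_ID VARCHAR(32), SCENARIO VARCHAR(128), ELAPSED_MS INT)");

        // Pull the elapsed times recorded for one scenario so a build can be
        // compared against the stored baseline run.
        PreparedStatement ps = con.prepareStatement(
                "SELECT BUILD_ID, ELAPSED_MS FROM RESULTS WHERE SCENARIO = ? ORDER BY BUILD_ID");
        ps.setString(1, "some.performance.TestScenario");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString("BUILD_ID") + ": " + rs.getInt("ELAPSED_MS") + " ms");
        }
        rs.close();
        ps.close();
        con.close();
    }
}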
> I think we would need to run the baseline tests a few times

Matt and I will set up an environment to determine whether what I'm claiming is even possible, so that we don't waste your time. If we can get accurate, consistent performance results from the same tests running on a vserver regardless of the load on the host, then we'll feel confident that it will work for you too.
> Matt and I will set up an environment to determine whether what I'm claiming
> is even possible, so that we don't waste your time.

Thanks! Much appreciated. Like I mentioned before, it would be nice to have the issues related to the JUnit tests resolved first, before we start looking at performance tests. For instance, bug 329830 (install pkill on the Mac slave) and bug 335196 (shorten the Hudson workspace path so we don't hit file path length limitations on Windows). :-)
Why is this a no (see bug 336561)? I think this is a valid use of funds.
(In reply to comment #10)
> Why is this a no (see bug 336561)?
>
> I think this is a valid use of funds.

Because there is an ongoing cost associated with it. Space in our rack and Webmaster's time cost money.
(In reply to comment #11)
> Because there is an ongoing cost associated with it. Space in our rack and
> Webmaster's time cost money.

Fair enough, but can't we just petition for a decent amount of money to help offset those ongoing costs? When I envisioned the FoE Funds program, I thought things like this would be perfect for it.
I would expect the FoE money could still be used to offset the initial capital cost of new hardware. However, Denis suggested the vserver route might be more cost-effective, so I think it's worth exploring that before looking at new physical hardware (see comment #3).
(In reply to comment #13)
> I would expect the FoE money could still be used to offset the initial capital
> cost of new hardware.

Agreed. But nothing goes into Webmaster's rack without his approval.
I'm all for rephrasing this proposal to give Denis more money to put stuff in his rack so that we can have more and better hardware to run tests across the eclipse.org infrastructure. I know Denis recently had a problem with disk space that was causing some issues with Hudson builds too.
(In reply to comment #15)
> I'm all for rephrasing this proposal to give Denis more money to put stuff in
> his rack so we can have more and better hardware to run tests across the
> eclipse.org infrastructure.

That's worth considering. However, this is probably more a case of needing more time (time == money) from Webmaster, and I'm not sure how much time they have to give.

> I know Denis recently had a problem with disk space that was causing some
> issues with Hudson builds too.

Funding Webmaster for additional disk space could probably work, though I'll have to let webmaster answer that one.
Regarding comment 13: agreed, and I understand that Denis and Matt are investigating vservers. My point in bug 336561 was that we need new hardware to support our builds and funding seems to be an issue, so I was trying to suggest creative ways to raise money :-)
(In reply to comment #16)
> Funding Webmaster for additional disk space could probably work, though I'll
> have to let webmaster answer that one.

I'll gladly file a request if he needs more hardware (see bug 335809), from hard drives to memory to iPads. Ok, maybe not iPads.
Created attachment 188483 [details]
Screenshot

I'm in the process of running a simple test on a vserver which has one CPU core dedicated to it. It is running on the same host as the Hudson slaves, so load on the actual machine will vary plenty throughout the day. If after 24 hours I get the same results regardless of the time of day, that should indicate your performance results would also be consistent. One caveat, however: the underlying disk subsystem is shared, so any 'load from disk' test will not necessarily be consistent.
> I'm in the process of running a simple test on a vserver which has one CPU
> core dedicated to it. It is running on the same host as the Hudson slaves, so
> load on the actual machine will vary plenty throughout the day.

The results are in. I've run a test (an md5sum of a large file) throughout the day. As mentioned, the physical host is hosting both Hudson slaves, but one (of eight) CPU cores was dedicated to this test vserver.

The test was run 1167 times.
Fastest time: 3.29s
Slowest time: 3.40s
Average time: 3.31s
95th percentile: 3.34s (only 5% were above this value)

That means there's less than a 1% difference from the average to the 95th percentile, and about a 3% spread from fastest to slowest, for a test that takes 3.3 seconds to run. Over 60% of the samples were within 2% of the average. In your opinion, is this accurate and consistent enough to warrant further testing?
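In case it's useful, the summary numbers above were computed along these lines; a rough sketch of the method, not the actual script I used:

import java.util.Arrays;

public class TimingSummary {
    // Summarize elapsed times (in seconds): min, max, average and 95th percentile.
    static void summarize(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (double s : sorted) {
            sum += s;
        }
        double average = sum / sorted.length;
        // Nearest-rank 95th percentile: only 5% of samples are above this value.
        int rank = (int) Math.ceil(sorted.length * 0.95);
        double p95 = sorted[rank - 1];
        System.out.printf("runs=%d min=%.2fs max=%.2fs avg=%.2fs p95=%.2fs%n",
                sorted.length, sorted[0], sorted[sorted.length - 1], average, p95);
    }

    public static void main(String[] args) {
        // Tiny illustrative sample, not the real data set.
        summarize(new double[] { 3.29, 3.30, 3.31, 3.31, 3.32, 3.34, 3.40 });
    }
}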
Created attachment 188639 [details] Test data, in Open Document Spreadsheet format
Mike, comment 20 is FYI.
> In your opinion, is this accurate and consistent enough to warrant further
> testing?

If I understood correctly, the test only involved disk I/O. How does it behave when running UI tests? Would it be possible to test with "real" (I really mean our) tests? Another important data point is what the deviation looks like for shorter and longer tests (e.g. 100 ms and 1 s).
> If I understood correctly, the test only involved disk I/O.

No, quite the opposite. This does not touch disk I/O, only memory bandwidth and CPU cycles, since the file being md5sum'd stays in RAM after the first read.
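To make that concrete, the test amounts to a loop like the following; a sketch, with a placeholder file path:

import java.io.FileInputStream;
import java.security.MessageDigest;

public class Md5Timing {
    public static void main(String[] args) throws Exception {
        // Placeholder path to a large file. After the first pass the file sits in
        // the OS page cache, so later iterations measure CPU and memory bandwidth
        // rather than disk I/O.
        String path = args.length > 0 ? args[0] : "/tmp/large-test-file";
        byte[] buffer = new byte[64 * 1024];
        for (int run = 1; run <= 10; run++) {
            long start = System.nanoTime();
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            FileInputStream in = new FileInputStream(path);
            int n;
            while ((n = in.read(buffer)) != -1) {
                md5.update(buffer, 0, n);
            }
            in.close();
            md5.digest();
            System.out.printf("run %d: %.2fs%n", run, (System.nanoTime() - start) / 1e9);
        }
    }
}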
It does sound promising, at least from a CPU perspective. I think our next step would be attempting to run some of our performance tests in such a setup to see how it behaves. What would that involve - setting up a separate Hudson slave that runs on that dedicated CPU and only allows one job to run at once?
I'd like to replicate your current performance test environment so that, if we run performance tests on both the current and the new environment, we can make a valid comparison between the two. Are your current performance tests being run on Hudson? If not, can you describe the environment we'd need to set up?
(In reply to comment #26)
> I'd like to replicate your current performance test environment so that, if we
> run performance tests on both the current and the new environment, we can make
> a valid comparison between the two.
>
> Are your current performance tests being run on Hudson? If not, can you
> describe the environment we'd need to set up?

Just waiting for a reply on this. Do you want us to go ahead and provision a Hudson slave in a dedicated CPU environment, with a single job queue?
Sorry, this bug got lost in my inbox while I was away last week :-)

> Just waiting for a reply on this. Do you want us to go ahead and provision a
> Hudson slave in a dedicated CPU environment, with a single job queue?

Today our performance tests aren't run on Hudson. However, like the JUnit tests, we'd like to run them on Hudson, so if you could provision a dedicated instance that would be great. Thanks!
Kim, I was thinking of setting this up with a 20G disk. You'd have about 17G usable. Is this enough? I can't imagine running performance tests would require tons of space.
Each performance run consumes 2GB of space, and we run performance tests about five times a week, so 17G gives us room for roughly eight runs' worth of results at a time. That should be okay.
hudson-perf1 has been set up as a slave. It has one dedicated CPU core (Intel(R) Xeon(R) CPU E5504 @ 2.00GHz) and 1.2G of RAM. We'll increase the RAM to something more realistic as we adjust the RAM of the other VMs on that host. Give it a shot, and I'll close this as fixed. If you encounter issues, please file bugs against Community/Hudson.
(In reply to comment #31)
> hudson-perf1 has been set up as a slave.

I realize this slave has been down since our outage last week. Have you had a chance to test it out? Do you want us to power it back on?
I haven't had a chance to test it out yet. I'll be looking at running the performance tests once I have all the issues worked out running the regular JUnit tests in bug 295393. So I would say leave it off until it's needed.