
Bug 333594

Summary: new hardware to run eclipse/equinox performance tests
Product: Community
Reporter: Kim Moir <kim.moir>
Component: Servers
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: caniszczyk, daniel_megert, david_williams, gunnar, john.arthorne, mike.milinkovich, ob1.eclipse, remy.suen, wayne.beaton, webmaster
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 333655    
Attachments:
Screenshot (flags: none)
Test data, in Open Document Spreadsheet format (flags: none)

Description Kim Moir CLA 2011-01-05 14:02:18 EST
As a 3.7 plan item, I've been working to run the Eclipse/Equinox build and JUnit tests on eclipse.org hardware.  

Build at eclipse.org
https://bugs.eclipse.org/bugs/show_bug.cgi?id=325997

I'm now at the point where 90% of our JUnit tests run on the new Linux, Windows 7 and Mac Hudson instances at the foundation.  I'm still working on addressing the remaining issues.  Thanks to the webmaster team for all their hard work getting these up and running.

Another portion of the build that we run on IBM machines today is the performance tests.  Currently we run performance tests on six builds a week (one integration, two nightly, one maintenance, two baselines).  These test results are compared with a baseline performance run so that we can identify any regressions in the current streams, find the root cause, and fix it.

For instance, here are the bugs that track all the performance issues that have been addressed in the ongoing 3.6 and 3.7 releases:

[perfs] Root bug to track 3.7 work on performance
https://bugs.eclipse.org/bugs/show_bug.cgi?id=313891

[perfs] Root bug to track 3.6 work on performance
https://bugs.eclipse.org/bugs/show_bug.cgi?id=286955

Currently we have four machines (two Windows, two Linux (SLES and RHEL)) that run performance tests. We have two machines for each platform so that we can compare the results on each platform and determine if there are outliers on a single machine that can be ignored.  Also, there is a database that stores the performance results.  The database could reside on existing eclipse.org hardware.

This bug is a request for this hardware at the foundation.  The machines cannot be virtualized.  Because they are running performance tests, the operating system must be installed in an environment that replicates what a typical Eclipse developer would have on their desktop (in rack-mounted form :-)
Comment 1 Wayne Beaton CLA 2011-01-17 10:36:27 EST
This money is not intended for hardware that is to be maintained by the Eclipse Webmasters. That sort of hardware has an ongoing cost. The Webmaster team does, however, have a budget of their own (some of which comes from Friends of Eclipse donations). This really needs to be handled by them.

Unfortunately, I can't identify a reasonable component to reassign this to. 

Webmaster, I assume that this is already on your list; where/how should we categorize this bug?
Comment 2 Kim Moir CLA 2011-01-31 17:41:46 EST
It should be noted that our (IBM) lab will be moving to a new office building very early in 2012.  At that point, we won't have an Eclipse lab in our building; the expectation is that the build will have been completely transitioned to Eclipse Foundation hardware.  Thus, if the community would like to see performance tests continue on the Eclipse and Equinox projects, we need machines :-)

These performance tests are quite valuable in making our software more responsive, as you can see below:

[perfs] Root bug to track 3.7 work on performance
https://bugs.eclipse.org/bugs/show_bug.cgi?id=313891

[perfs] Root bug to track 3.6 work on performance
https://bugs.eclipse.org/bugs/show_bug.cgi?id=286955
Comment 3 Denis Roy CLA 2011-01-31 20:16:24 EST
Having a machine dedicated to a single task is costly in terms of power consumption and rackspace.

With Xen, I can pin a specific CPU core (or cores) to a virtual server.  This means the host OS (hypervisor) will not use those CPUs for other vservers.  Understanding that RAM is dedicated to a vserver as well, this should provide an environment where your test results are consistent regardless of host activity.
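
For illustration, a minimal sketch of what that pinning looks like with the classic xm toolstack (the domain name perf-test and the core/RAM values here are hypothetical):

  # In the domain config (e.g. /etc/xen/perf-test.cfg):
  #   vcpus  = 1       # one virtual CPU for the guest
  #   cpus   = "3"     # restrict that vCPU to physical core 3
  #   memory = 2048    # RAM dedicated to the vserver, in MB
  # Pinning can also be adjusted on a running domain:
  xm vcpu-pin perf-test 0 3    # pin vCPU 0 of perf-test to physical core 3
  xm vcpu-list perf-test       # verify the pinning took effect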

If you'd like to try this out, I can set up a vserver this week so that you can see how it goes.  How much RAM and disk space would you need, knowing that Linux vservers will have /shared mounted locally?
Comment 4 Wayne Beaton CLA 2011-01-31 22:20:03 EST
Should we move this discussion to Community/Servers?
Comment 5 Kim Moir CLA 2011-02-01 11:18:33 EST
I'll bring your proposal up at our Eclipse team planning call tomorrow.  In the past, the PMC has always stated that they need to run performance tests on bare-metal hardware because there is too much variation between test runs on virtualized hardware.  This isn't a high-priority request; we don't plan to run performance tests on foundation hardware for 3.7.  I just opened it early because I know it takes a long time to order/set up/configure hardware.  Our goal for 3.7 is to run just the JUnit tests on foundation hardware.
Comment 6 John Arthorne CLA 2011-02-01 13:35:02 EST
I think we would need to run the baseline tests a few times and see what the variability looks like. I have no idea how the UI tests would behave in such an environment.
Comment 7 Kim Moir CLA 2011-02-01 14:12:07 EST
The UI tests are emulated via Xvnc on Linux.  We are still having issues running JUnit tests on the Windows 7 box.  As an aside, webmasters: the results of the performance tests are stored in a database that we use to store both baseline results and build results, and to run comparisons.  Currently, we use Derby.
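
Roughly, the Xvnc setup looks like this (the display number and geometry are illustrative; the real invocation is our own test harness):

  Xvnc :42 -geometry 1024x768 -depth 24 &   # virtual X server, no physical display needed
  export DISPLAY=:42                        # point the UI tests at it
  # ...then launch the JUnit UI test suite as usual...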
Comment 8 Denis Roy CLA 2011-02-01 15:21:34 EST
> I think we would need to run the baseline tests a few times

Matt and I will set up an environment to determine whether what I'm claiming is even possible and whether it actually works, so that we don't waste your time.

If we can get accurate, consistent performance results from the same tests running on a vserver regardless of the load on the host, then we'll feel confident that it will work for you too.
Comment 9 Kim Moir CLA 2011-02-01 16:14:29 EST
> Matt and I will set up an environment to determine whether what I'm claiming
> is even possible and whether it actually works, so that we don't waste your
> time.

Thanks!  Much appreciated.  As I mentioned before, it would be nice to have the issues related to the JUnit tests resolved before we start looking at performance tests, for instance bug 329830 (install pkill on the Mac slave) and bug 335196 (shorten the Hudson workspace path so we don't hit file path limitations on Windows). :-)
Comment 10 Chris Aniszczyk CLA 2011-02-07 16:08:16 EST
Why is this a no (see bug 336561)?

I think this is a valid use of funds.
Comment 11 Wayne Beaton CLA 2011-02-07 16:15:11 EST
(In reply to comment #10)
> Why is this a no (see bug 336561)?
> 
> I think this is a valid use of funds.

Because there is an ongoing cost associated with it. Space in our rack and Webmaster's time cost money.
Comment 12 Chris Aniszczyk CLA 2011-02-07 16:40:21 EST
(In reply to comment #11)
> Because there is an ongoing cost associated with it. Space in our rack and
> Webmaster's time cost money.

Fair enough, but can't we just petition for a decent amount of money to help alleviate those ongoing costs?  When I envisioned the FoE Funds program, I thought things like this would be perfect for it.
Comment 13 John Arthorne CLA 2011-02-07 16:54:51 EST
I would expect the FoE money could still be used to offset the initial capital cost of new hardware. However Denis suggested the vserver route might be more cost-effective, so I think it's worth exploring that before looking at new physical hardware (see comment #3).
Comment 14 Wayne Beaton CLA 2011-02-07 16:59:12 EST
(In reply to comment #13)
> I would expect the FoE money could still be used to offset the initial capital
> cost of new hardware. 

Agreed. But nothing goes into Webmaster's rack without his approval.
Comment 15 Chris Aniszczyk CLA 2011-02-07 17:01:35 EST
I'm all for rephrasing this proposal to give Denis more money to put stuff in his rack so we can have more and better hardware to run tests across the eclipse.org infrastructure.

I know Denis recently had a problem with disk space that was causing some issues with Hudson builds too.
Comment 16 Wayne Beaton CLA 2011-02-07 17:05:37 EST
(In reply to comment #15)
> I'm all for rephrasing this proposal to give Denis more money to put stuff in
> his rack so we can have more and better hardware to run tests across the
> eclipse.org infrastructure.

That's worth considering. However, this is probably more a case of needing more time (time == money) from Webmaster, and I'm not sure how much time they have to give.

> 
> I know Denis recently had a problem with disk space that was causing some
> issues with hudson builds too.

Funding Webmaster for additional disk space could probably work; though I'll have to let webmaster answer that one.
Comment 17 Kim Moir CLA 2011-02-07 17:07:06 EST
Regarding comment 13: agreed, and I understand that Denis and Matt are investigating vservers.

My point in bug 336561 was that we need new hardware to support our builds and funding seems to be an issue.  So I was trying to suggest creative ways to raise money :-)
Comment 18 Chris Aniszczyk CLA 2011-02-07 17:10:06 EST
(In reply to comment #16)
> Funding Webmaster for additional disk space could probably work; though I'll
> have to let webmaster answer that one.

I'll gladly file a request if he needs more hardware (see bug 335809), from hard drives to memory to iPads. Ok, maybe not iPads.
Comment 19 Denis Roy CLA 2011-02-07 17:10:40 EST
Created attachment 188483 [details]
Screenshot

I'm in the process of running a simple test on a vserver which has one CPU core dedicated to it. It is running on the same host as the Hudson slaves, so load on the actual machine will vary plenty throughout the day.

If after 24 hours I get the same results regardless of the time of day, that should indicate that your performance results will also be consistent.

One caveat, however, is that the underlying disk subsystem is shared, so any 'load from disk' test will not necessarily be consistent.
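
The test loop is along these lines (the file and log names are placeholders):

  # Repeatedly time an md5sum of a large file, logging elapsed seconds.
  for i in $(seq 1 1200); do
      /usr/bin/time -f "%e" md5sum /tmp/bigfile > /dev/null 2>> timings.log
  done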
Comment 20 Denis Roy CLA 2011-02-09 20:06:59 EST
> I'm in the process of running a simple test on a vserver which has one CPU core
> dedicated to it. It is running on the same host as the Hudson slaves, so load
> on the actual machine will vary plenty throughout the day.

The results are in.  I've run a test (an md5sum of a large file) throughout the day.  As mentioned, the physical host also hosts the Hudson slaves, but one of its eight CPU cores was dedicated to this test vserver.

The test was run 1167 times

Fastest time was:	3.29s
Slowest time was:	3.40s
Average time was:	3.31s
95th percentile:	3.34s  (only 5% were above this value)

That means there's roughly a 1% difference from the average to the 95th percentile (and about a 3% spread from fastest to slowest), for a test that takes 3.3 seconds to run.  Over 60% of the samples were within 2% of the average.
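
For what it's worth, these numbers can be recomputed from a one-value-per-line log with something like this (timings.log is the placeholder log from the loop above):

  sort -n timings.log | awk '
      { v[NR] = $1; sum += $1 }
      END {
          printf "fastest %.2fs  slowest %.2fs  average %.2fs  95th pct %.2fs\n",
                 v[1], v[NR], sum / NR, v[int(NR * 0.95)]
      }'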

In your opinion, is this accurate and consistent enough to warrant further testing?
Comment 21 Denis Roy CLA 2011-02-09 20:07:31 EST
Created attachment 188639 [details]
Test data, in Open Document Spreadsheet format
Comment 22 Denis Roy CLA 2011-02-09 20:08:54 EST
Mike, comment 20 is FYI.
Comment 23 Dani Megert CLA 2011-02-10 02:46:41 EST
> In your opinion, is this accurate and consistent enough to warrant further
> testing?
If I understood correctly, the test only involved disk I/O. How does it behave when running UI tests? Would it be possible to test with "real" (I really mean our) tests?

Other important data points are how the deviation looks for shorter and longer tests (e.g. 100 ms and 1 s).
Comment 24 Denis Roy CLA 2011-02-10 09:13:35 EST
> If I understood correctly the test only involved disk I/O.

No, quite the opposite.  This doesn't touch disk I/O at all, only memory bandwidth and CPU cycles, since the file being md5sum'd stays in RAM (in the page cache) after the first read.
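
If we ever wanted a run to actually exercise the disk, the page cache would have to be dropped between runs (standard Linux, requires root):

  sync                                # flush dirty pages to disk first
  echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes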
Comment 25 John Arthorne CLA 2011-02-10 14:04:55 EST
It does sound promising, at least from a CPU perspective. I think our next step would be attempting to run some of our performance tests in such a setup to see how it behaves. What would that involve - setting up a separate Hudson slave that runs on that dedicated CPU and only allows one job to run at once?
Comment 26 Denis Roy CLA 2011-02-10 14:45:13 EST
I'd like to replicate your current performance test environment so that, if we run performance tests on both the current and the new environment, we can make a valid comparison between the two.

Are your current performance tests being run on Hudson?  If not, can you describe the environment we'd need to set up?
Comment 27 Denis Roy CLA 2011-02-17 15:45:53 EST
(In reply to comment #26)
> I'd like to replicate your current performance test environment so that, if we
> run performance tests on both the current and the new environment, we can make
> a valid comparison between the two.
> 
> Are your current performance tests being run on Hudson?  If not, can you
> describe the environment we'd need to set up?

Just waiting for a reply on this.  Do you want us to go ahead and provision a Hudson slave in a dedicated CPU environment, with a single job queue?
Comment 28 Kim Moir CLA 2011-02-17 15:55:32 EST
Sorry, this bug got lost in my inbox when I was away last week :-)

> Just waiting for a reply on this.  Do you want us to go ahead and provision a
> Hudson slave in a dedicated CPU environment, with a single job queue?

Today our performance tests aren't run in Hudson. However, like the JUnit tests, we'd like to run them on Hudson, so if you could provision a dedicated instance that would be great.  Thanks!
Comment 29 Denis Roy CLA 2011-03-03 11:38:35 EST
Kim, I was thinking of setting this up with a 20G disk.  You'd have about 17G usable.  Is this enough?  I can't imagine running performance tests would require tons of space.
Comment 30 Kim Moir CLA 2011-03-03 13:34:55 EST
Each performance run consumes 2GB of space, and we run performance tests about five times a week, so roughly 10GB per week.  That should be okay as long as older results are cleaned up periodically.
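
For instance, a hypothetical weekly cleanup job (the results path is made up):

  # Remove run directories older than 7 days; at 5 runs x 2GB that's
  # roughly 10GB reclaimed per week.
  find /opt/public/perf-results -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +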
Comment 31 Denis Roy CLA 2011-03-04 09:34:42 EST
hudson-perf1 has been set up as a slave.  It has one dedicated CPU core (Intel(R) Xeon(R) CPU E5504 @ 2.00GHz) and 1.2G of RAM.  We'll increase the RAM to something more realistic as we adjust the RAM of the other VMs on that host.
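
A few standard commands to confirm what the slave actually sees (nothing Hudson-specific):

  grep -c ^processor /proc/cpuinfo   # should report 1, the dedicated core
  free -m                            # confirms the ~1.2G of RAM
  df -h /                            # shows the usable disk space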

Give it a shot, and I'll close this as fixed.  If you encounter issues, please file bugs against Community/Hudson.
Comment 32 Denis Roy CLA 2011-06-08 15:34:46 EDT
(In reply to comment #31)
> hudson-perf1 has been set up as a slave

I realize this slave has been down since our outage last week.  Have you had a chance to test it out?  Do you want us to power it back on?
Comment 33 Kim Moir CLA 2011-06-09 10:32:29 EDT
I haven't had a chance to test it out yet. I'll be looking at running the performance tests once I have all the issues worked out running the regular JUnit tests in bug 295393. So I would say leave it off until it's needed.