Bug 71383 - Requirements for Performance tools for Eclipse
Summary: Requirements for Performance tools for Eclipse
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 3.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: Platform-Releng-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 71123
Reported: 2004-08-04 11:04 EDT by Michael Van Meekeren CLA
Modified: 2005-09-14 15:11 EDT
CC List: 27 users

See Also:


Attachments
Performance test result file. (3.37 KB, text/plain)
2004-08-10 14:25 EDT, Sonia Dimitrov CLA

Description Michael Van Meekeren CLA 2004-08-04 11:04:36 EDT
This bug should track the discussions regarding what tools will be used to test
performance as well as how the result of those tests will be processed and
displayed.
Comment 1 Ed Burnette CLA 2004-08-04 13:26:54 EDT
I've been trying JUnitPerf on one of my projects and it works as advertised: 
http://www.clarkware.com/software/JUnitPerf.html. So far I've just used 
TimedTest, which you can wrap around a normal TestCase to specify the maximum 
time the test is allowed to take. There's also a LoadTest class you can use 
that will start up multiple concurrent threads to run the TestCase. TimedTest, 
RepeatedTest, and LoadTest can be combined.
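
To illustrate, wrapping an existing test looks roughly like this (MyParserTest 
and the time limits are placeholders for whatever tests you already have; check 
the JUnitPerf javadoc for the exact constructors):

import junit.framework.Test;
import junit.framework.TestSuite;
import junit.extensions.RepeatedTest;
import com.clarkware.junitperf.LoadTest;
import com.clarkware.junitperf.TimedTest;

public class ExamplePerfSuite {

	public static Test suite() {
		TestSuite suite = new TestSuite();

		// MyParserTest stands in for any existing TestCase; fail it
		// if a single run takes longer than 100 ms.
		suite.addTest(new TimedTest(new MyParserTest("testParse"), 100));

		// Same test, 5 repetitions on each of 10 concurrent threads,
		// with an overall limit of 2 seconds.
		Test repeated = new RepeatedTest(new MyParserTest("testParse"), 5);
		Test load = new LoadTest(repeated, 10);
		suite.addTest(new TimedTest(load, 2000));

		return suite;
	}
}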

One advantage of the JUnitPerf style is that existing tests can be used for 
multiple purposes without change. You don't write special timing tests; you 
just write and use unit tests the same as always.

For example, I had some 400-odd unit tests in this project, and I changed the 
main program that invoked them so that they would fail if any took more than 
100 ms. This quickly pointed out about 14 tests that I needed to look at in 
more detail. What would be more difficult and tedious would be to have a 
different time limit for each test. So far I haven't needed that, but I've 
only been using it for a couple of weeks.

If you need that level of detail, you might be better off with something that 
records the time of each unit test in a table or database. Then on each run it 
could compare the time with the last time or the average time and flag any 
variations above some threshold. Some profiling tools I've used like Quantify 
have this kind of differencing built in.
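
A rough sketch of that kind of differencing, using JUnit's TestListener (the 
class name, baseline map, and threshold here are made up, and persistence of 
the recorded times is left out):

import java.util.HashMap;
import java.util.Map;

import junit.framework.AssertionFailedError;
import junit.framework.Test;
import junit.framework.TestListener;

// Times each test and flags runs that are noticeably slower than a
// previously recorded baseline.
public class TimingListener implements TestListener {

	private final Map startTimes = new HashMap();   // Test -> Long
	private final Map baseline;                     // test name -> Long (ms)
	private static final double THRESHOLD = 1.10;   // flag >10% slowdowns

	public TimingListener(Map baseline) {
		this.baseline = baseline;
	}

	public void startTest(Test test) {
		startTimes.put(test, new Long(System.currentTimeMillis()));
	}

	public void endTest(Test test) {
		long start = ((Long) startTimes.remove(test)).longValue();
		long elapsed = System.currentTimeMillis() - start;
		Long previous = (Long) baseline.get(test.toString());
		if (previous != null && elapsed > previous.longValue() * THRESHOLD) {
			System.err.println("Possible regression: " + test + " took "
					+ elapsed + " ms (baseline " + previous + " ms)");
		}
		// A real implementation would also persist 'elapsed' for the next run.
	}

	public void addError(Test test, Throwable t) {
		// not relevant for timing
	}

	public void addFailure(Test test, AssertionFailedError t) {
		// not relevant for timing
	}
}

The listener would be registered on the TestResult (via addListener) before 
running the suite.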

Also, don't forget memory usage as a performance indicator. Some profiling 
tools like YourKit have a way to diff memory usage between runs. I don't know 
whether that can be done in batch mode. JUnitPerf doesn't address this, but if 
you can examine your own memory usage at a fine level of granularity, it seems 
technically possible to add that.
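
As a crude starting point (the class name is made up and the numbers are 
approximate at best, since GC timing is non-deterministic), a test could diff 
heap usage itself with the standard Runtime API:

public class MemorySnapshot {

	// Approximate heap currently in use, in bytes.
	public static long usedHeap() {
		Runtime rt = Runtime.getRuntime();
		rt.gc();
		return rt.totalMemory() - rt.freeMemory();
	}
}

Taking a snapshot before and after a scenario and reporting the difference 
would at least make large regressions visible.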

Although it wouldn't necessarily help automated performance testing, some 
additional support for performance numbers in the built-in JUnit view, maybe 
even direct support for JUnitPerf there (it has a BSD license), would be 
helpful for day-to-day use.
Comment 2 Michael Van Meekeren CLA 2004-08-04 13:47:56 EDT
My feeling is that this area is huge and has a lot of potential, but I think we
must also support the most basic, simple scenario first.  That is: 

It should be possible to take any current Eclipse JUnit test and run it as a
performance test; the results should be compared against some baseline (e.g.
Eclipse 2.1 or R3.0), and the data graphed per build and posted as a link with
the build.  This allows for rudimentary, quick, and frequent review of the
results, and it lowers the bar to entry for teams that want to join in the
testing.  

This likely requires a few things: 
1) performance monitoring of tests must not include setup/teardown 
2) graphs autogenerated from the data and posted by releng 
3) possibly some other basic support to ensure the workbench is idle, or
otherwise in the "correct" state, for a certain test 
4) possibly running performance tests more than once (10x?) to generate better
results

Comment 3 Kai-Uwe Maetzel CLA 2004-08-05 10:09:34 EDT
Here is a first dump of what the text team learned from its initial 
performance test efforts:

Performance tests differ from correctness tests in that it is a more complex 
process to decide whether a test run was successful or not. However, in the 
end there should be a yes/no result. The performance test "Open Java editor", 
for example, can be considered successful when the execution of the test is 
not more than 2% slower than a given reference. A different success criterion 
could be that "Open Java editor" may not take more than 140% of the execution 
time of "Open text editor". Deciding whether a performance test is successful 
then amounts to checking a set of success criteria, where each criterion is 
either independent of the tested scenario (apart from retrieving the reference 
time) or specific to it. Thus, it seems useful to have a way to grow the 
number of scenario-independent success criteria without needing to touch any 
of the implemented tests.
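
To sketch what such a criterion could look like in code (the names are invented 
and do not refer to an existing API):

// A criterion compares a measured value against a reference value and
// yields a yes/no answer.
public interface SuccessCriterion {
	boolean isSatisfied(long measuredMillis, long referenceMillis);
}

// Scenario-independent example: not more than 2% slower than the reference.
class RelativeSlowdownCriterion implements SuccessCriterion {
	public boolean isSatisfied(long measuredMillis, long referenceMillis) {
		return measuredMillis <= referenceMillis * 1.02;
	}
}

The 140% criterion from the example above would be scenario-specific, since 
its reference value is the measured time of a different scenario.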

Implementing performance tests as JUnit tests gives us the following blueprint 
for a single performance test:

public void testScenarioA() {
	// perform test-specific setup
	performanceMeter.start();
	runScenarioA();
	performanceMeter.stop();
	// perform test-specific tear down
	assertTrue(runScenarioASpecificChecks(performanceMeter));
	assertTrue(runScenarioIndependentChecks(performanceMeter));
}

The performance meter is the abstraction of a performance data collector. 
Tests should be independent of the concrete implementation of the performance 
meter. However, it is necessary to specify those properties of the performance 
meter that directly affect the design of the scenario. For example, if the 
performance meter is capable of measuring thread-specific times, the design of 
a scenario might look different from the design needed for a meter that 
measures VM execution times. In addition, we need to specify the influence of 
the performance meter on the measured data. This is particularly necessary to 
cover cases in which an active performance meter, for example, slows down the 
execution of one action by 10% and that of a different action by 20%.

In order to implement the success criteria, the performance meter must define 
a data model for the collected data and provide access to that data. 
Additionally, access to the reference data is required. The reference data is 
not static over time; e.g., the reference time might be the execution time of 
the test with the last release, the last integration build, or the average of 
the last three integration builds (all these examples are scenario 
independent). Thus, a concrete implementation of the performance meter will 
usually be backed by a database or some other kind of persistent data store, 
as is the case with the current version of the performance plug-ins.
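
To make the shape of this concrete, a minimal sketch of such an abstraction 
could look as follows (all names are invented and do not refer to the existing 
plug-ins):

public interface PerformanceMeter {

	// Bracket the measured part of the scenario.
	void start();
	void stop();

	// Data model for the samples collected in the current run.
	PerformanceData getData();

	// Reference data for a scenario, e.g. from the last release or the
	// average of the last three integration builds, typically read from
	// a persistent store.
	PerformanceData getReferenceData(String scenarioId);
}

// Placeholder for the collected measurements of one run.
interface PerformanceData {
	long getElapsedTimeMillis();
	long getUsedHeapBytes();
}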

The following additional requirements should be met by the performance test 
infrastructure:
- Test execution and result evaluation happen inside Eclipse.
- Allow simply adding scenario-independent success criteria
  for all performance tests.
- Allow simply changing the performance meter and the scenario-independent
  success criteria for all performance tests.
- Allow different test execution and data storage setups, such as on
  your local machine, a machine at your site, or during the build process. 
	- There should be no dependency on one central server. 
	- Which setup to use is configurable.
- Enable graphical evaluation of test results inside Eclipse.


The minimal requirement when running performance tests is to indicate whether 
performance improved or deteriorated compared to the reference data. A 
sufficiently broad test base offers a way to identify suspects that cause 
deteriorated performance. As with correctness tests, a sensible clustering of 
the tests is key. There is no initial need to instrument Eclipse itself for 
that.

The following aspects must be considered when implementing and running 
performance tests:
	- stable test execution and data storage setup
	- class loading
	- plug-in activation
	- JIT
	- running scenarios in different configurations (i.e. with
          different preference settings)
	- realistic scenarios (workspace size and content, action sequence)
	- influence of the performance meter on the measured data
Comment 4 Sonia Dimitrov CLA 2004-08-10 14:19:03 EDT
I have generated some basic graphs from raw data from some recent Nightly and 
Integration build performance test results.  These graphs are available here:
http://download.eclipse.org/downloads/graphs.html

Each point on the graph represents the diff of a specified measurement taken 
before and after a test run, or more specifically the diff between measurements 
taken in steps 1 and 2 (see the attached test.xml) for a particular build.

The reference build/value is marked in green.  The Y-axis covers the range of 
values needed to plot the data, plus 10%.

Some ideas for refinements include the following:
1.  Weed out uninteresting tests and measurements.
2.  Thumbnails and enlargements will be generated when necessary, as data 
accumulates.
3.  Superimpose different measurements on a single graph.

For the time being, I will update the graphs as new data becomes available 
with each build.

Thoughts or comments?
Comment 5 Sonia Dimitrov CLA 2004-08-10 14:25:46 EDT
Created attachment 13851 [details]
Performance test result file.
Comment 6 Michael Van Meekeren CLA 2004-08-10 14:39:06 EDT
These graphs look like a great start.  This would be a good place to go when
digging deeper into a performance problem, especially when one knows the
specific tests they want to see.

As a refinement to this, I would like to suggest that we agree on a very small
number (3?) of key performance charts that are worth posting on the download
page for every build.  The rationale being that everyone would see the results
whenever downloading the build, and that we care about these things and do not
want them to degrade.  Some sample graphs would be:

1) build zip size (when did that extra 5 MB get added?)
2) time to start the same large workspace/small workspace
3) time to open a specific Java file in the Java editor
4) memory footprint used by test 2 or 3 (or other) 

In some cases they could even be overlaid on a single chart to save space or 
to provide other interesting data points, such as the tradeoff of memory vs. 
time.
Comment 7 Gary Karasiuk CLA 2004-08-10 15:21:32 EDT
In addition to the 4 things that Michael mentioned in note 6, I would add 
rebuilding a large workspace.

Also, I think for the startup tests you want two variations: startup after a 
reboot and a warm startup.
Comment 8 Dorian Birsan CLA 2004-08-19 14:48:12 EDT
Has anybody investigated the use of AspectJ to collect performance data during 
test execution? While I have only seen a presentation on this technology and 
not actually used it, it appears we could write some "aspects" that collect 
timing info at various points during method execution and then pass it to our 
performance test framework for analysis/rendering.
Comment 9 Erich Gamma CLA 2004-08-20 05:34:25 EDT
Given that this topic is of general interest to platform developers (and is a 
3.1 plan item), we will continue the discussion on the platform-dev mailing 
list. 

I'll post an update on this mailing list.

Comment 10 Sonia Dimitrov CLA 2005-09-14 15:11:07 EDT
Closing.