This bug should track the discussion of which tools will be used to test performance, as well as how the results of those tests will be processed and displayed.
I've been trying JUnitPerf on one of my projects and it works as advertised: http://www.clarkware.com/software/JUnitPerf.html

So far I've just used TimedTest, which you can wrap around a normal TestCase to specify the maximum time the test is allowed to take. There's also a LoadTest class you can use that will start up multiple concurrent threads to run the TestCase. TimedTest, RepeatedTest, and LoadTest can be combined.

One advantage of the JUnitPerf style is that existing tests can be used for multiple purposes without change. You don't write special timing tests; you just write and use unit tests the same as always. For example, I had some 400-odd unit tests in this project, and I changed the main program that invoked them so that they would fail if any took more than 100 ms. This quickly pointed out about 14 tests that I needed to look at in more detail.

What would be more difficult and tedious would be to have a different time limit for each test. So far I haven't needed that, but I've only been using it for a couple of weeks. If you need that level of detail, you might be better off with something that records the time of each unit test in a table or database. Then on each run it could compare the time with the last time or the average time and flag any variation above some threshold. Some profiling tools I've used, like Quantify, have this kind of differencing built in.

Also, don't forget memory usage as a performance indicator. Some profiling tools like YourKit have a way to diff memory usage between runs; I don't know whether that can be done in batch mode or not. JUnitPerf doesn't address this, but if you can examine your own memory usage at a fine level of granularity, it seems technically possible to add that.

Although it wouldn't necessarily help automated performance testing, some additional support for performance numbers in the built-in JUnit view, maybe even direct support for JUnitPerf there (it has a BSD license), would be helpful for day-to-day use.
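For illustration only, here is roughly what wrapping an existing test could look like; MyExistingTest and testSomething are made-up placeholders for any ordinary JUnit TestCase, not real classes in this project:

    import com.clarkware.junitperf.LoadTest;
    import com.clarkware.junitperf.TimedTest;
    import junit.extensions.RepeatedTest;
    import junit.framework.Test;
    import junit.framework.TestSuite;

    public class ExamplePerformanceSuite {

        public static Test suite() {
            TestSuite suite = new TestSuite();

            // MyExistingTest is a placeholder for any ordinary JUnit TestCase.
            Test test = new MyExistingTest("testSomething");

            // Fail the unchanged unit test if it takes longer than 100 ms.
            suite.addTest(new TimedTest(test, 100));

            // Run the same test 5 times in each of 10 concurrent threads,
            // and fail if the whole run takes longer than 1500 ms.
            suite.addTest(new TimedTest(new LoadTest(new RepeatedTest(test, 5), 10), 1500));

            return suite;
        }
    }

The existing test itself is untouched; only the suite that assembles the decorators knows about timing and load.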
My feeling is that this area is huge and has a lot of potential, but I think we must also support the most basic, simple scenario first. That is: it should be possible to take any current Eclipse JUnit test and run it as a performance test; the results should be compared against some baseline (e.g. Eclipse 2.1 or R3.0) and the data graphed per build and posted as a link with the build. This allows for rudimentary, quick, and frequent review of the results, and it lowers the bar to entry for teams joining in with a desire to do testing. This likely requires a few things:
1) performance monitoring of tests must not include setup/teardown
2) graphs autogenerated from the data and posted by releng
3) might need some other basic support to ensure the workbench is idle, etc., or in the "correct" state for a certain test
4) might need to run performance tests more than once to generate better results (10x? see the sketch below)
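As a rough illustration of point 4, a runner could execute each scenario several times and report the average; the class and method names here are made up, not an existing API:

    public final class PerformanceTestRunner {

        // Run a scenario the given number of times and return the average
        // elapsed wall-clock time in milliseconds, which helps smooth out
        // noise from GC, JIT compilation, and background activity.
        public static long averageElapsedTimeMillis(Runnable scenario, int runs) {
            long total = 0;
            for (int i = 0; i < runs; i++) {
                long start = System.currentTimeMillis();
                scenario.run();
                total += System.currentTimeMillis() - start;
            }
            return total / runs;
        }
    }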
Here is a first dump of what the Text team learned from its initial performance test efforts:

Performance tests differ from correctness tests in that it is a more complex process to decide whether a test run was successful or not. However, at the end there should be a yes/no result. The performance test "Open Java editor", for example, can be successful when the execution of the test is not more than 2% slower than a given reference. A different success criterion could be that "Open Java editor" may not take more than 140% of the execution time of "Open text editor". Deciding whether a performance test is successful is therefore a matter of checking a set of success criteria, whereby a criterion is either independent of the tested scenario (apart from the retrieval of the reference time) or specific to the tested scenario. Thus, it seems useful to have a way to grow the number of scenario independent success criteria without needing to touch any of the implemented tests.

Implementing performance tests as JUnit tests, this gives us the following blueprint for a single performance test:

    public void testScenarioA() {
        perform test specific setup;
        performanceMeter.start();
        run scenario A;
        performanceMeter.stop();
        perform test specific tear down;
        assertTrue(runScenarioASpecificChecks(performanceMeter));
        assertTrue(runScenarioIndependentChecks(performanceMeter));
    }

The performance meter is the abstraction of a performance data collector. Tests should be independent of the concrete implementation of the performance meter. However, it is necessary to specify those properties of the performance meter that directly affect the design of the scenario. E.g., if the performance meter is capable of measuring thread-specific times, the design of a scenario might look different from the design needed for a meter that measures VM execution times. In addition, we need to specify the influence of the performance meter on the measured data. This is particularly necessary to cover cases in which an active performance meter, for example, slows down the execution speed of one action by 10% and that of a different action by 20%.

In order to implement the success criteria, the performance meter must define a data model for the collected data and provide access to it. Additionally, access to the reference data is required. The reference data is not static over time; e.g., the reference time might be the execution time of the test with the last release, the last integration build, or the average of the last three integration builds (all these examples are scenario independent). Thus, a concrete implementation of the performance meter will usually be backed by a database or some other kind of persistent data store, as is the case with the current version of the performance plug-ins.

The following additional requirements should be met by the performance test infrastructure:
- Test execution and result evaluation happen inside Eclipse.
- Allow scenario independent success criteria to be simply added for all performance tests.
- Allow the performance meter and the scenario independent success criteria to be simply changed for all performance tests.
- Allow different test execution and data storage setups, such as on your local machine, a machine at your site, or during the build process.
- There should be no dependency on one central server.
- Which setup to use is configurable.
- Enable graphical evaluation of test results inside Eclipse.
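To make the blueprint above a bit more concrete, here is a minimal sketch of what the performance meter abstraction could look like; all interface, class, and method names are hypothetical, not an existing API:

    // Abstraction of a performance data collector, as described above.
    interface PerformanceMeter {
        void start();
        void stop();
        // Data model for the collected data, needed to evaluate success criteria.
        PerformanceData getData();
        // Reference data, e.g. from the last release or the last integration
        // build, typically backed by a database or other persistent store.
        PerformanceData getReferenceData();
    }

    // Minimal data model; a real one would carry more than elapsed time.
    interface PerformanceData {
        long getElapsedTimeMillis();
    }

    // Example of a scenario independent success criterion: the run is
    // successful if it is not more than 2% slower than the reference.
    class ScenarioIndependentChecks {
        static boolean check(PerformanceMeter meter) {
            long measured = meter.getData().getElapsedTimeMillis();
            long reference = meter.getReferenceData().getElapsedTimeMillis();
            return measured <= reference * 1.02;
        }
    }

Tests would only depend on the PerformanceMeter interface, so the concrete meter (and its data store) can be swapped without touching the tests.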
The minimal requirement when running performance tests is to indicate whether performance improved or deteriorated compared to the reference data. A sufficiently broad test base offers a way to identify suspects that cause deteriorated performance. As with correctness tests, a sensible clustering of the tests is key. There is no initial need to instrument Eclipse itself for that.

The following aspects must be considered when implementing and running performance tests:
- stable test execution and data storage setup
- class loading
- plug-in activation
- JIT compilation
- running scenarios in different configurations (i.e. with different preference settings)
- realistic scenarios (workspace size and content, action sequence)
- influence of the performance meter on the measured data
I have generated some basic graphs from raw data from recent Nightly and Integration build performance test results. These graphs are available here: http://download.eclipse.org/downloads/graphs.html

Each point on a graph represents the diff of a specified measurement taken before and after a test run, or more specifically the diff between a measurement taken in steps 1 and 2 (see the attached test.xml) for a particular build. The reference build/value is marked in green. The Y-axis covers the range of values needed to plot the graph, plus 10%.

Some ideas for refinements include the following:
1. Weed out uninteresting tests and measurements.
2. Generate thumbnails and enlargements as necessary, as data accumulates.
3. Superimpose different measurements on a single graph.

For the time being, I will update the graphs as new data becomes available with each build. Thoughts or comments?
Created attachment 13851 [details] Performance test result file.
These graphs look like a great start. This would be a good place to go when digging deeper into a performance problem, especially when one knows the specific tests one wants to see.

As a refinement to this, I would like to suggest that we agree on a very small number (3?) of key performance charts that are worth posting on the download page for every build. The rationale is that everyone would see the results whenever downloading the build, and that we care about these things and do not want them to degrade. Some sample graphs would be:
1) build zip size (when did that extra 5 MB get added?)
2) time to start up the same large workspace / small workspace
3) time to open a specific Java file in the Java editor
4) memory footprint used by test 2 or 3 (or another)

In some cases they could even be overlaid on a single chart to save space or to provide other interesting data points, such as the tradeoff of memory vs. time.
In addition to the four things that Michael mentioned in note 6, I would add rebuilding a large workspace. Also, I think for the startup tests you want two variations: startup after a reboot and a warm startup.
Has anybody investigated the use of AspectJ to collect performance data during test execution? While I have only seen a presentation on this technology and have not actually used it, it appears we could write some aspects that collect timing info at various points during method execution and then pass it to our performance test framework for analysis/rendering.
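I'm only guessing at what this might look like; a sketch along these lines, using AspectJ's annotation style, with a made-up pointcut expression and with plain printing standing in for handing the data to the performance framework:

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;

    @Aspect
    public class MethodTimingAspect {

        // Hypothetical pointcut: time every public method in the code under test.
        @Around("execution(public * org.eclipse.example..*(..))")
        public Object timeMethod(ProceedingJoinPoint pjp) throws Throwable {
            long start = System.currentTimeMillis();
            try {
                return pjp.proceed();
            } finally {
                long elapsed = System.currentTimeMillis() - start;
                // A real setup would feed this into the performance test
                // framework's data store instead of printing it.
                System.out.println(pjp.getSignature() + " took " + elapsed + " ms");
            }
        }
    }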
Given that this topic is of general interest to platform developers (and is a 3.1 plan item), we will continue the discussion on the platform-dev mailing list. I'll post an update on that mailing list.
Closing.