| Summary: | create performance test harness |
|---|---|
| Product: | z_Archived |
| Component: | Mylyn |
| Status: | RESOLVED FIXED |
| Severity: | enhancement |
| Priority: | P4 |
| Version: | 0.4 |
| Target Milestone: | 3.3 |
| Hardware: | PC |
| OS: | Windows XP |
| Reporter: | Mik Kersten <mik.kersten> |
| Assignee: | Steffen Pingel <steffen.pingel> |
| CC: | jingweno, mik.kersten, mjmeijer, robert.elves, shawn.minto, steffen.pingel |
| Keywords: | helpwanted |
| Bug Depends on: | 213100 |
Description
Mik Kersten
This presentation offers a great guide: http://www.eclipsecon.org/2005/presentations/EclipseCon2005_13.2ContinuousPerformance.pdf

For a sample of the output see: http://download.eclipse.org/eclipse/downloads/drops/S-3.4M3-200711012000/performance/performance.php

For bug 116487 to be worked on and resolved I see the following prerequisites:

1. We need a naming convention and packaging structure for performance tests, separate from standard JUnit tests and split into UI and core (non-UI), so that we are able to run them in batches.
2. We need a place to store all data and a way to access it: a remote Derby server on eclipse.org? I see security issues there. A potential solution is to create a special launch configuration that handles the connection and authentication for the performance DB. Registration of users is therefore desired; maybe eclipse.org Bugzilla users?
3. There must be a web interface to see the results, failures, and hot spots to work on, as there are no Eclipse-internal viewers.
4. We need access to a range of machines to see platform differences, or all developers must be able to contribute to one central DB (see 2).
5. Performance tests must be optional on local dev machines, as not everybody will have the infrastructure to run them.
6. A set of code templates must be added to mylyn-settings-templates.xml or to content assist to make creating performance tests easier.

> 1. We need a naming convention and packaging structure for performance tests,
> separate from standard JUnit tests and split into UI and core (non-UI), so that
> we are able to run them in batches.

We can add the tests to the current test plug-ins in a performance package and create suites as needed to group these tests (it seems that is the way JDT does it). I'll worry about setting up the infrastructure for running tests on mylyn.eclipse.org once we have tests to run.

> 5. Performance tests must be optional on local dev machines, as not everybody
> will have the infrastructure to run them.

We will not include them in the All*Tests suites.

> 6. A set of code templates must be added to mylyn-settings-templates.xml or to
> content assist to make creating performance tests easier.

Wouldn't copying and pasting existing performance tests be sufficient? As far as I can see, extending PerformanceTestCase is all that is required (see the sketch after the attachments below).

Created attachment 83034 [details]
performance test
Created attachment 83035 [details]
mylyn/context/zip
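For new contributors, here is a minimal sketch of what such a test could look like. Only the PerformanceTestCase API (startMeasuring, stopMeasuring, commitMeasurements, assertPerformance, tagAsSummary) is from the org.eclipse.test.performance plug-in; the class name, scenario, and iteration count are made up for illustration.

```java
import org.eclipse.test.performance.Dimension;
import org.eclipse.test.performance.PerformanceTestCase;

// Hypothetical example; class and scenario names are placeholders.
public class SampleTaskListPerformanceTest extends PerformanceTestCase {

	public void testAddTasks() {
		// optionally tag the scenario so it shows up in the global summary
		tagAsSummary("Add tasks to task list", Dimension.ELAPSED_PROCESS);

		// repeat the scenario so the framework can average over several samples
		for (int i = 0; i < 10; i++) {
			startMeasuring();
			runScenario(); // the code under test goes here
			stopMeasuring();
		}
		commitMeasurements();
		assertPerformance();
	}

	private void runScenario() {
		// placeholder for the operation being measured
	}
}
```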
Fantastic to see progress in this area. To catch regressions we have to think about how to share performance data. I'm using a Mac, so I only get to see System Time and Used Java Heap in the output; apparently the jnilib for performance testing doesn't exist on the Mac (bug 68111 again?).

I've played a little with org.eclipse.test.internal.performance.db.View to get stuff from the database again. It is vital that all config parameters are specified in order to compare scenarios across builds and get output. So what config parameters are we going to store in the database? I think build, platform, jvm. Any more? We need some guidance, because otherwise we cannot share our findings. As build configurations are not importable/exportable they cannot easily be shared; they should be described per platform. (A hedged example of the relevant launch parameters follows after the attachments below.)

Also, data cannot be compared across different developers' machines due to differences in hardware, so all results comparison is by necessity relative and per developer. Therefore developers should build up their own local data set to run the comparisons against: check out a certain build, run the performance tests to store results in the local DB, check out the next build, run the tests again, etc., to build up a history to compare current development against. I wonder if that can be easily scripted...

The first step will be to deploy the tests on the server and have them run periodically so we can start gathering comparable results. In a second step we can consider adding other platforms such as Windows, which seem to provide additional details. I agree that it does not make much sense to collect performance data from developer machines, but providing instructions in the wiki on how to set up and run performance tests on local machines would be very helpful.

Provide performance tests for the following bugs? bug 207659, bug 208488, bug 197942, bug 199430, bug 211907, bug 197395, bug 205357, bug 207602

Yes, all issues marked with [performance] would be a good start. Rob and I will look into running performance tests as part of the weekly builds in the next few days, so hopefully we'll have a suite ready before the next release.

Awesome!

When I use the tests as in CVS, I always get:

Scenario 'org.eclipse.mylyn.tasks.tests.performance.TaskContainerTest#testContains()' (average over 10 samples):
System Time: 6ms (95% in [1ms, 12ms]) Measurable effect: 9ms (1.3 SDs) (required sample size for an effect of 5% of stdev: 6400)
Used Java Heap: 29.63K (95% in [-305.83K, 365.09K]) Measurable effect: 593.16K (1.3 SDs) (required sample size for an effect of 5% of stdev: 6400)

So apparently the sample size (number of loops) is too small. Increasing it to the recommended 6400 gives:

Scenario 'org.eclipse.mylyn.tasks.tests.performance.TaskContainerTest#testContains()' (average over 6400 samples):
System Time: 4ms (95% in [4ms, 4ms]) Measurable effect: 0ms (0.0 SDs)
Used Java Heap: 243 (95% in [-11.75K, 12.23K]) Measurable effect: 24.17K (0.0 SDs)

Before further extending this set of tests, I propose that the loop counter be externalized, for example in a properties file. What do you think?

Please also allow specifying things like that through system properties, so one could just declare them in the launch configuration or on the command line.

Created attachment 92635 [details]
ActivateTaskPerformanceTest
undone, more to come
Created attachment 92636 [details]
mylyn/context/zip
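For reference, the org.eclipse.test.performance framework reads the database location and the scenario "variations" (the config parameters asked about above) from system properties, so they could be shared as VM arguments in a common launch configuration. A hedged sketch; the property names are taken from the platform performance test documentation as I recall it, and all values below are invented examples:

```
-Declipse.perf.dbloc=/path/to/local/derby/db
-Declipse.perf.config=build=weekly-20080104;host=build1;jvm=sun1.5
-Declipse.perf.assertAgainst=build=baseline-3.2.0
```

This would address both the "what do we store" question (build, host/platform, jvm) and Eugene's request to configure everything from the launch configuration or command line.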
Some performance tests, hope they help; more to come.

Created attachment 92637 [details]
TaskListPerformanceTest
Created attachment 92638 [details]
mylyn/context/zip
Created attachment 92639 [details]
RetrieveTaskListPerformanceTest
Created attachment 92640 [details]
mylyn/context/zip
Created attachment 92641 [details]
ContextPerformanceTest
more typical test data is needed; I used my local activity data (280 entries)
Created attachment 92642 [details]
mylyn/context/zip
Created attachment 92643 [details]
activity data
Created attachment 92644 [details]
mylyn/context/zip
Thanks a lot for contributing the test cases, Owen! I have merged ActivateTaskPerformanceTest, RetrieveTaskListPerformanceTest and the added test case for TaskListPerformanceTest with minor modifications. I have added an @author tag with your name to the header of the files that you modified. Other changes I made:

- I kept the number of times tests were run at 10. Running each test 6401 times seems quite high and takes a lot of time on my system. Could you elaborate on why you changed that? Did you get better or different results when running the tests more often? Did you look at other performance tests in the platform? How often do they repeat tests?
- I didn't include the call to taskListManager.saveTaskList() in TaskListPerformanceTest.setUp(). What was the reason for adding this call?

I wasn't able to add the ContextPerformanceTest. It seems to be missing MockInteractionContextManager. Could you attach that as well? It is much easier for me to review and apply your contributions if you submit them as patches. Please take a look at the section about creating patches on this wiki page:

http://wiki.eclipse.org/Mylyn_Contributor_Reference#Patches

Here is a large context that is causing performance problems and might be good input for a test case:

222773: [performance] Mylyn monitor appears to cause hang on startup in 3.4.0
https://bugs.eclipse.org/bugs/show_bug.cgi?id=222773

(In reply to comment #26)
> - I kept the number of times tests were run at 10. Running each test 6401 times
> seems quite high and takes a lot of time on my system. Could you elaborate on why
> you changed that? Did you get better or different results when running the tests
> more often? Did you look at other performance tests in the platform? How often do
> they repeat tests?

Maybe look again at the remarks in comment #13. Running only 10 times is not statistically significant... Better to make the number of loops a system property, as per Eugene's remarks in comment #14.

Created attachment 95149 [details]
Performance test for mylyn.context
ContextPerformanceTest
MockInteractionContextManager
sorry for the late reply Steffen :)

(In reply to comment #26)
> Thanks a lot for contributing the test cases, Owen! I have merged
> ActivateTaskPerformanceTest, RetrieveTaskListPerformanceTest and the added test
> case for TaskListPerformanceTest with minor modifications. I have added an
> @author tag with your name to the header of the files that you modified. Other
> changes I made:

yeah, you can make whatever changes you want

> - I kept the number of times tests were run at 10. Running each test 6401 times
> seems quite high and takes a lot of time on my system. Could you elaborate on why
> you changed that? Did you get better or different results when running the tests
> more often? Did you look at other performance tests in the platform? How often do
> they repeat tests?

As mm105@xs4all.nl (hi, I don't know your name yet) mentioned, the performance tests basically need to run 6401 times to become statistically significant. Detailed descriptions are available in the wiki page that I wrote:

http://wiki.eclipse.org/Mylyn/Performance_Testing

The basic idea is that when your sample is not large enough (not running enough times), the confidence interval will be wide, so if there is a minor performance drop in the next version, the test cases are not sensitive enough to detect it and no performance loss will be reported. The same applies to performance gains. As you can see in the wiki (link provided above), the larger the sample size, the smaller the confidence interval, and hence the more sensitive (more useful?) the performance test cases become. (A short note on the statistics follows after the attachments below.) I also understand it's not realistic to run all test cases 6401 times; I will try to find a reasonable and acceptable number in the near future.

> - I didn't include the call to taskListManager.saveTaskList() in
> TaskListPerformanceTest.setUp(). What was the reason for adding this call?

My bad, a typo; can you see the commented-out statement above it? I believe taskListManager.saveTaskList() came with it (I forgot to comment it out as well).

> I wasn't able to add the ContextPerformanceTest. It seems to be missing
> MockInteractionContextManager. Could you attach that as well? It is much easier
> for me to review and apply your contributions if you submit them as patches.
> Please take a look at the section about creating patches on this wiki page:
>
> http://wiki.eclipse.org/Mylyn_Contributor_Reference#Patches

yeah, I have made a patch and uploaded it, please check that out

> Here is a large context that is causing performance problems and might be good
> input for a test case:
>
> 222773: [performance] Mylyn monitor appears to cause hang on startup in 3.4.0
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=222773

No problem, I will take a look at it this week; I will also use the profiler to guide me in writing the performance tests.

BTW, do you guys happen to know anyone else writing this kind of performance tests? How many times do they normally run their tests?

Hi Jingwen,

mm105 is my email address, I'm the Maarten Meijer shown in comments #3, #8 and #13.
Please leave me on CC.

Maarten

(In reply to comment #30)
> Hi Jingwen,
>
> mm105 is my email address, I'm the Maarten Meijer shown in comments #3, #8 and #13.
> Please leave me on CC.
>
> Maarten

sorry, Maarten, some stupid mistakes

Played with JProfiler to find some startup performance hot spots (Eclipse 3.3); it indicates GoToUnreadTaskAction takes some time, so I gave it a performance test.
How come I can't simply attach a file? The task editor keeps warning "commit edit or synchronize task..."; I think I have already done that...

Created attachment 95791 [details]
mylyn/context/zip
Created attachment 95792 [details]
TaskListViewPerformanceTest
wrote a performance test after profiling startup
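As an aside, the statistics behind the sample-size argument above can be made explicit. A hedged note: this assumes the framework reports a standard Student-t confidence interval around the mean, which matches the "95% in [...]" output in comment #13.

```latex
\text{CI half-width} \;=\; t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```

The half-width shrinks like 1/sqrt(n), so going from 10 to 6400 samples narrows the interval by a factor of roughly sqrt(640), about 25, which is consistent with [1ms, 12ms] collapsing to [4ms, 4ms] in the output above.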
> As you can see in the wiki (link provided above), the larger the sample size,
> the smaller the confidence interval, and hence the more sensitive the performance
> test cases become. I also understand it's not realistic to run all test cases
> 6401 times; I will try to find a reasonable and acceptable number in the near
> future.

Thanks for the info. It might indeed be a good idea to make this configurable, so that it is easy to run the tests more often in an automated environment and less often on a local machine. It would be good to find out how other Eclipse (platform) projects do this.

> > I wasn't able to add the ContextPerformanceTest. It seems to be missing
> > MockInteractionContextManager.
>
> yeah, I have made a patch and uploaded it, please check that out

Is MockInteractionContextManager a copy of InteractionContextManager with slight modifications to make it testable? It is important to make changes of that sort in the original class, e.g. by extracting the relevant pieces of the code you want to test into a separate method.

> BTW, do you guys happen to know anyone else writing this kind of performance
> tests? How many times do they normally run their tests?

I don't know anyone else personally who is working on performance tests. I took a quick look at ActivateEditorTest in org.eclipse.jdt.text.tests. It seems that by default the code is run 10 times for warm-up and 5 times to measure the performance, but they allow overriding that through a debug option.

(In reply to comment #32)
> Played with JProfiler to find some startup performance hot spots (Eclipse 3.3);
> it indicates GoToUnreadTaskAction takes some time, so I gave it a performance test.

Yes, this action traverses the Tree control and expands and collapses nodes, so it can be slow on task lists with many (collapsed) nodes. I looked at the test case and it only seems to test the handler activation (which takes < 3ms) but does not run the actual command. You could try creating a new instance of GoToUnreadTaskAction and calling the run() method on it instead.

(In reply to comment #36)
> Is MockInteractionContextManager a copy of InteractionContextManager with slight
> modifications to make it testable? It is important to make changes of that sort
> in the original class, e.g. by extracting the relevant pieces of the code you
> want to test into a separate method.

Right, I modified InteractionContextManager to make two of the methods (loadActivityMetaContextFromFile & getFileForContext) more accessible for testing purposes, e.g. loading a context from a specified file path.

> I don't know anyone else personally who is working on performance tests. I took
> a quick look at ActivateEditorTest in org.eclipse.jdt.text.tests. It seems that
> by default the code is run 10 times for warm-up and 5 times to measure the
> performance, but they allow overriding that through a debug option.

I created two helper classes (TaskPerformanceTestCase & InvocationCountPerformanceMeter) copying their ideas, but using reflection. Now we can do the warm-up "exercises" and the debug stuff too :), and the number of runs becomes pretty flexible, though reflection introduces minor overhead. Since we normally compare the results with previous builds to detect performance drops, I don't think the overhead will be a big problem. I modified two performance test cases (TaskListPerformanceTest & TaskContainerTest) to show the usage (a sketch of the configurable pattern follows after the attachment below). Please let me know whether you guys like the idea or not.
If you do, I can make the helper classes more general for all the performance test cases.

Created attachment 96692 [details]
Performance test helper classes
Check TaskContainTest1 for usage; needs org.eclipse.jdt.debug as a dependency.
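To make the run counts configurable as discussed above, a base class could read them from system properties so a launch configuration or the command line can override the defaults. This is a hedged sketch of the warm-up/measure pattern borrowed from the JDT tests, not the attached helper classes themselves; the property names are invented for illustration.

```java
import org.eclipse.test.performance.PerformanceTestCase;

// Hypothetical base class; the property names are placeholders.
public abstract class ConfigurablePerformanceTestCase extends PerformanceTestCase {

	// e.g. -Dorg.eclipse.mylyn.tests.performance.warmups=10
	protected static final int WARM_UP_RUNS =
			Integer.getInteger("org.eclipse.mylyn.tests.performance.warmups", 10);

	// e.g. -Dorg.eclipse.mylyn.tests.performance.runs=5 locally, or a much
	// higher value on a build server where long runs are acceptable
	protected static final int MEASURED_RUNS =
			Integer.getInteger("org.eclipse.mylyn.tests.performance.runs", 5);

	protected void measure(Runnable scenario) {
		for (int i = 0; i < WARM_UP_RUNS; i++) {
			scenario.run(); // warm up caches and JIT; not measured
		}
		for (int i = 0; i < MEASURED_RUNS; i++) {
			startMeasuring();
			scenario.run();
			stopMeasuring();
		}
		commitMeasurements();
		assertPerformance();
	}
}
```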
(In reply to comment #37)
> Yes, this action traverses the Tree control and expands and collapses nodes, so
> it can be slow on task lists with many (collapsed) nodes. I looked at the test
> case and it only seems to test the handler activation (which takes < 3ms) but
> does not run the actual command. You could try creating a new instance of
> GoToUnreadTaskAction and calling the run() method on it instead.

Done (a sketch of the measurement follows after the attachment below). It "is" a hotspot with a performance issue (may need fixing): imagine the unread task is at the bottom of the tree, it will be a pain to dig it out (finding an unread task second from the top of the tree surprisingly takes 400ms on my machine). BTW, a more typical data sample is needed to run this performance test, since in the existing sample the first unread task is second from the top.

Created attachment 96728 [details]
GoToUnreadTaskAction Performance Test
more typical test data is needed
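Measuring the actual command, as Steffen suggested, could look roughly like this. A hedged sketch: it assumes the action can be instantiated directly and that the task list has already been populated with a suitable data set in setUp(); state changes between runs (the found task being marked read) are glossed over.

```java
public void testGoToUnreadTask() {
	// assumption: a plain constructor; the real class may need more setup
	GoToUnreadTaskAction action = new GoToUnreadTaskAction();
	for (int i = 0; i < 10; i++) {
		startMeasuring();
		action.run(); // exercises the tree traversal, not just handler activation
		stopMeasuring();
	}
	commitMeasurements();
	assertPerformance();
}
```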
Created attachment 97960 [details]
6 performance tests
performance tests, from the little popup window to the TaskFormPage
Removing milestone as this will be ongoing work.

Created attachment 146251 [details]
patch for performance tools
A very basic performance test suite now exists in the org.eclipse.mylyn.tests.performance plug-in. The baseline suite (3.2.0 at the moment) is run weekly, and performance tests from HEAD are run nightly. The nightly builds generate performance reports.

The next step is to extend the suite with additional tests. Performance reports will be published with each release. I'll mark this as resolved; we can create additional bugs to implement performance tests for identified hotspots.
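For anyone extending the suite, a grouping suite in the new plug-in might look like this, following the JUnit 3 conventions used elsewhere in the Mylyn tests at the time. A sketch: the suite class name is a placeholder, and the added test classes (taken from this bug's attachments) are assumed to be on the classpath.

```java
import junit.framework.Test;
import junit.framework.TestSuite;

public class AllPerformanceTests {

	public static Test suite() {
		TestSuite suite = new TestSuite(AllPerformanceTests.class.getName());
		// performance tests are grouped here and deliberately kept out of the
		// regular All*Tests suites so local builds stay fast
		suite.addTestSuite(TaskListPerformanceTest.class);
		suite.addTestSuite(TaskContainerTest.class);
		return suite;
	}
}
```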