I am opening this bug as a place for discussion about improving our performance testing infrastructure to support reliable measurement of memory usage. This is a follow-up to an initial discussion in this week's architecture meeting.

Our current state: we have a performance test infrastructure that supports measuring various dimensions of memory usage (from org.eclipse.test.performance.Dimension):

- WORKING_SET: the amount of memory in the working set of this process.
- USED_JAVA_HEAP: the amount of memory used in the JVM.
- WORKING_SET_PEAK: the maximum amount of memory in the working set of this process at any point in time.
- COMMITTED: the total amount of committed memory (for the entire machine).

We captured data for these dimensions using a combination of native OS calls and methods on java.lang.Runtime (a minimal example of the latter is sketched below).

However, our performance tests are automated tests that run in every build. We have a very large number of tests, so we don't check the individual results in every build. What we look for are trends in the performance graph that may point to particular builds where regressions occurred. We found that the memory measurements were far too volatile and inconsistent from build to build, making the trend lines not very useful. For example, garbage collection or other background activity would vary across runs, producing significant memory differences even when there were no code changes in the tested components at all. We ended up disabling automated memory testing because the false positives outweighed the real regressions. Instead, we periodically do manual testing with profilers such as YourKit, looking for big leaks or memory hogs that have cropped up since we last tested.
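For reference, a used-heap sample via java.lang.Runtime looks roughly like this (a minimal sketch, not the actual infrastructure code); its dependence on when the last GC happened is one reason such numbers are volatile:

// Minimal sketch: sampling used Java heap via java.lang.Runtime.
// The real performance infrastructure may force a GC and average several
// samples; this only illustrates why the raw numbers are noisy.
public final class UsedHeapSample {

    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() - freeMemory() depends heavily on when the last GC ran,
        // which is one source of the build-to-build volatility described above.
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("Used heap: " + usedHeapBytes() / (1024 * 1024) + " MB");
    }
}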
I am working on the Eclipse Memory Analyzer project (tools.mat) and have already spent some time thinking about how MAT could be used not only as a post-mortem analysis tool, but also for memory measurements and checks during the development process. I think it should be possible to take heap dumps during the tests and build some tools on top of (or within) MAT to analyze them.

Roughly, the process could look like this:
1. Define some kind of checkpoints within the tests which are executed, and give them a description
2. During the test execution, heap dumps are generated at the selected checkpoints
3. After the test execution, the heap dumps are analyzed (what exactly this means still needs to be defined) and a summary is generated for every checkpoint

The most interesting question here is what the analysis will do. From my experience so far, one useful number to measure is the retained size of a group of objects, where the group can be all objects of a class, or all objects of classes belonging to certain packages. The retained size is the amount of Java heap which could be GC-ed if we assume the group of objects goes away. So, sticking to the retained size as an example, the results of the analysis would tell us that package org.eclipse.mat.* used X MB at the first checkpoint, Y MB at the second checkpoint, and so on… And if these numbers are observed over time, then I believe the information will be valuable for the developers and would allow them to recognize improvements / regressions.

Measuring the retained size of object groups is just one idea. A heap dump contains the whole object graph, i.e. all the objects, their fields and values. MAT exposes this information as API, and one could build arbitrary, very specific checks using this API, like:
- assert that there is not more than 1 object of class X
- check that objects of type X are only referenced by objects of type Y
- and so on…
(a rough sketch of such an analysis with the MAT API follows below)

Having all this info however has its downsides. Heap dumps will be about as big as the heap of the process which was dumped. I have no idea how big the processes are during the tests - what is the experience there?

Any comments on these thoughts? Does this (although very roughly sketched) sound reasonable and applicable to the problems you are trying to solve?

Unfortunately, I have so far no experience with the tooling you use for performance testing. Can you give me some hints where to read about it and try it out?
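To make the retained-size idea a bit more concrete, such an analysis could look roughly like this with the MAT snapshot API (just a sketch; the RetainedSizeCheck class is hypothetical and error handling is simplified):

// Sketch: open an existing heap dump with MAT and compute the retained size
// of all instances of classes whose name matches a package pattern.
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

import org.eclipse.mat.SnapshotException;
import org.eclipse.mat.snapshot.ISnapshot;
import org.eclipse.mat.snapshot.SnapshotFactory;
import org.eclipse.mat.snapshot.model.IClass;
import org.eclipse.mat.util.IProgressListener;
import org.eclipse.mat.util.VoidProgressListener;

public class RetainedSizeCheck {

    public static long retainedSizeOfPackage(File heapDump, String packagePattern)
            throws SnapshotException {
        IProgressListener listener = new VoidProgressListener();
        ISnapshot snapshot = SnapshotFactory.openSnapshot(heapDump, listener);
        try {
            Collection<IClass> classes =
                    snapshot.getClassesByName(Pattern.compile(packagePattern), false);
            if (classes == null)
                return 0L;

            // collect the object ids of all instances of the matching classes
            List<int[]> perClass = new ArrayList<int[]>();
            int total = 0;
            for (IClass clazz : classes) {
                int[] ids = clazz.getObjectIds();
                perClass.add(ids);
                total += ids.length;
            }
            int[] objectIds = new int[total];
            int offset = 0;
            for (int[] ids : perClass) {
                System.arraycopy(ids, 0, objectIds, offset, ids.length);
                offset += ids.length;
            }

            // retained set = everything that could be GC-ed if these objects went away
            int[] retainedSet = snapshot.getRetainedSet(objectIds, listener);
            return snapshot.getHeapSize(retainedSet);
        } finally {
            SnapshotFactory.dispose(snapshot);
        }
    }
}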
This all sounds reasonable/interesting. I suspect that many people would be surprised if they started asserting some things about their object structure :-) The heaps might be big but they are also transient -- dump, dump, dump, analyze, toss.
> Unfortunately I have so far no experience with the tooling you use for the
> performance testing. Can you give me some hints where to read about it and try
> it out?

I'd start here: http://wiki.eclipse.org/Platform-releng-faq#How_do_I_set_up_performance_tests.3F
Dani, thanks for the hint. In the last days I played a little bit with the performance testing tools and had a sample test of my own running, so I think I understand them a bit better now. Below I'll try to describe my current ideas and hope to get some feedback. I think the major questions to be discussed / addressed are:

* Where (in the test code) and how to trigger heap dumps
* When to analyze the heap dumps
* What to analyze / what could be interesting numbers to derive
* How to integrate all steps in a usable way

-----------------------------------------------------------------------
Where (in the test code) and how to trigger heap dumps
-----------------------------------------------------------------------
At the moment the heap dump acquiring features in Memory Analyzer are designed to be used by our GUI, and are not easily usable programmatically. I can try to wrap this and provide a method similar to Performance.createPerformanceMeter(String id). It should be possible that the test writer just says "create now a dump with id [my-sample-id]" within his tests (a rough sketch of such a trigger is at the end of this comment).

Here I had also some other thoughts - for the performance tests one needs to write dedicated tests and specify when the measurement starts and when it stops. For analysis based on heap dumps I was hoping to be able to reuse some existing tests (I guess certain scenario or integration tests will be suitable). All you need to say is "dump now", so if there is a test which, for example, opens a number of files in an editor, the developer can add a line to trigger a dump when all files are open.

-----------------------------------------------------------------------
When to analyze the heap dumps
-----------------------------------------------------------------------
My initial idea was that the heap dumps are just triggered during the tests, and that the whole analysis is done afterwards. As I've seen the Performance.assertPerformance() method within the test code, I thought it would be nice to be able to make similar asserts within the memory tests, e.g. something like "assert that the retained heap for class X is not much worse than last time" or "assert that there is only one instance of class Y". This would have the positive effect that the JUnit tests report errors if there are big deviations, but then the waiting for the dump to be processed and analyzed happens within the test. This should be fine for dedicated memory tests, but not good if one creates the dumps as part of some existing integration tests.
Any thoughts on this? I still think the analysis should be done as a separate step after the tests are executed.

-----------------------------------------------------------------------
What to analyze / what could be interesting numbers to derive
-----------------------------------------------------------------------
I think a good beginning would be to provide some functionality which enables the developers to say "I'd like to see the retained size for packages (and subpackages) of org.eclipse.X in the heap dump with id [my-sample-id]." Then the tool should write this data in a certain format, and the performance test tools can use this data. I can build something like this in a new bundle under the Memory Analyzer project. I think that if I provide this and give you some code for a sample test, then you should be able to try it out on some components you are interested in, see if the numbers are helpful, and observe how stable they are.
-----------------------------------------------------------------------
How to integrate all steps in a usable way
-----------------------------------------------------------------------
I suggest that we start with having the analysis done as a separate step, maybe even executed manually for the moment. Then we have something to play with, can collect new ideas, find some reasonable numbers to measure, etc. Once we have some good results, I believe you'll already have some ideas how the whole thing can be integrated into one process.

Well, this is what I currently have in my head. Any feedback is welcome. If you think the proposal is reasonable, then I can start preparing the first building blocks (like triggering the dumps, etc.). Any help with coding is also appreciated :)
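Roughly, the "dump now with an id" trigger could look like this (just a sketch - the HeapDumper name and the heapdumps.dir property are placeholders; the dump itself is requested from the standard HotSpot diagnostic MXBean, so this works on HotSpot VMs only):

import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public final class HeapDumper {

    /** Writes an .hprof file named after the given id and returns its location. */
    public static File createHeapDump(String id) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        File dir = new File(System.getProperty("heapdumps.dir", "."));
        File dump = new File(dir, id + ".hprof");
        // "true" restricts the dump to live objects, i.e. a GC is run first
        bean.dumpHeap(dump.getAbsolutePath(), true);
        return dump;
    }
}

// Inside a test, the writer would then only add a single line, e.g.:
//   HeapDumper.createHeapDump("editors-open");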
> -----------------------------------------------------------------------
> Where (in the test code) and how to trigger heap dumps
> -----------------------------------------------------------------------
> At the moment the heap dump acquiring features in Memory Analyzer are designed
> to be used by our GUI, and are not easily usable programmatically. [...]
> All you need to say is "dump now", so if there is a test which tests
> for example opening a number of files in an editor, the developer can add a
> line to trigger a dump when all files are open.

The situation is similar for timed tests. Sometimes there are particular blocks you want to time; in other situations you may just time a whole test. You could envision just setting an option and measuring certain things around any test.

> -----------------------------------------------------------------------
> When to analyze the heap dumps
> -----------------------------------------------------------------------
> My initial idea was that the heap dumps are just triggered during the tests,
> and that the whole analysis is done afterwards. [...]
> Any thought on this? I still think the analysis should be done as a separate
> step after the tests are executed.

After-the-fact analysis may be ok. One of the convenient things about doing it inline is that the semantics are easy to express. If it is done after the fact, then someone has to spec which dumps should be compared and in what way. Inline, you just have a series of asserts. You still want a test that comes out pass or fail, so you still have to wait for the analysis. It may be simpler just to do this inline with some new asserts and trigger methods that test writers can use. That makes the tests specific to memory performance, but for the most part I suspect they will be anyway. In any event, whatever path works best to get something running will be fine.
(In reply to comment #6)
> After the fact analysis may be ok. One of the convenient things about doing it
> inline is that the semantics are easy to express. [...] In any event, whatever
> path works best to get something running will be fine.

I guess there are different use cases. Sometimes it would be nice to wait and assert a condition; other times it would be enough to just trigger a dump and report some numbers after the tests (e.g. retained size of org.eclipse.xyz...) which are persisted by a tool for later reference. What bothers me is that with inlined assertions people will be tempted to create many more dumps, and that they may have a negative experience with the delays. I think what could help is to have the possibility to trigger the dumps at any time, and then have access to each of them (by id) throughout the test. Then it should be possible to assert some things immediately, and to implement separate tests doing more complex analysis which run at a later moment.

I think it will be best if we have some prototype to play with. Therefore, I will try to write something in the next days, just to provide the basics:
- to easily trigger a heap dump and give it an identifier
- to be able to say "report mem. usage of org.eclipse.xyz..."
- to be able to do a simple assert, i.e. "assert number of objects is less than ..."
- to give the possibility to access any of the heap dumps and use MAT's API to build arbitrary analysis.

I'm not sure how much time this will take me, hopefully not much. Then you'll have the possibility to play with the prototype and I hope we then develop further ideas together.
(In reply to comment #7)
FYI: We already have some simple leak tests that count instances of a given class and report the backlink paths if the count is not as expected. See LeakTestCase and its subclasses in the org.eclipse.jdt.ui.tests project. Those tests are quite slow, since the implementation is based on reflection, and they miss instances that are only referenced by local variables or by native roots.
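For illustration only (this is not the LeakTestCase implementation, but a much simpler style of leak check): hold only a WeakReference to the object under test and assert that it can be collected. It is cheap, but it tells you nothing about why an object is still reachable:

import java.lang.ref.WeakReference;
import junit.framework.TestCase;

public class SimpleLeakTest extends TestCase {

    public void testObjectCanBeCollected() {
        Object subject = new Object();        // stands in for the editor/model under test
        WeakReference<Object> ref = new WeakReference<Object>(subject);
        subject = null;                       // drop the strong reference

        // encourage the VM to collect; a few GC rounds are usually enough
        for (int i = 0; i < 5 && ref.get() != null; i++) {
            System.gc();
            System.runFinalization();
        }
        assertNull("object is still strongly reachable somewhere", ref.get());
    }
}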
Created attachment 185717 [details]
first prototype for memory testing tool

Thanks for the hint with the JDT leak tests. I used them as a basis for testing my prototype. What I am attaching is really just an initial proposal. I am hoping it will give you something to play with, and will make it easier for us to continue the discussion with concrete experience, new ideas, etc...

-------------------------------------
prototyping the tool
-------------------------------------
The code is just attached here as a sample; I haven't submitted it to MAT. What I did was:

1) implemented a class which lets you create a heap dump - MemoryTest
2) implemented a MemoryTestSnapshot class which represents a heap dump triggered by the MemoryTest.createSnapshot() methods. Inside I put some of the possible checks/reports:
- assertInstanceCount(String className, int count)
- reportRetainedSize(Pattern, boolean) - this one just finds all instances of classes matching the name pattern (possibly also subclasses - the second parameter) and calculates the retained size of all of them together. The size is currently just printed to stdout and returned as a result from the method
- reportRetainedSizePerClass(Pattern, boolean) - this method will print to a file one line for every class matching the pattern, stating how many instances of it there are and what the shallow/retained sizes are. The retained size of some of the classes will be counted into the retained size of other classes, i.e. this is really a flat view of all classes. The file is stored in the folder where heap dumps are produced.
- reportTopConsumerClasses(Pattern, boolean) - this method will remove the "double-counting" of the size. It will first find in the dominator tree only the instances which are independent of each other (i.e. not nested) and will then report the sizes (per class) only for these "top-level" objects.

I am sure there are also other numbers which can be interesting, but hopefully you can start with these ones and then give me your feedback. The two classes are attached (as an Eclipse project, zipped). Just unzip and import them into your workspace. You'll need to have Memory Analyzer in the workspace for them to work; you can take the one from Indigo M4.

Remarks:
- To specify where heap dumps are written, use -Dheapdumps.dir=... in the plug-in test VM arguments
- I am currently using assert, so you'll need to add -ea to the VM arguments to enable assertions
- Dumps are triggered using the MBean com.sun.management:type=HotSpotDiagnostic. This will most probably (?) not work on IBM VMs.

---------------------------------------
The JDT leak test as a sample
---------------------------------------
I then modified the org.eclipse.jdt.ui.tests.leaks.JavaLeakTest class and attached it here as JavaLeakTestReworked. First, I simply used the instance count assertions based on heap dumps, leaving the methods as they are. This however creates over 50 heap dumps (about 3.6 GB) while running the test, and it is even slower (although not much) than using the reflection approach. I also print the consumption of org.eclipse.jdt for every dump. I think this is too slow. Therefore I added one more test - testEditorsClose() - which could replace several of the others. Instead of testing after each editor creation/closing, I packed the opening and closing of different editors together and do similar checks with a few dumps only. This is an approach I can imagine will be more usable.
Additionally, I added one test - testConsumptionOfEditors() - which opens 100 editors of each type (text, properties, java) and then reports the JDT memory usage. Here I used all of the report* methods, so that you can get an idea of what each of them produces. Have a look at the numbers and think about whether they are understandable, whether they are something that can be compared against, etc... This one takes several minutes to complete as it calculates too many retained sets; I guess the top-ten classes would be enough.

Finally, I added a test asserting the instance count of a class which fails, just to show that you can get a path to the object (currently only one path and not nicely printed :-) but this can be changed). Field names should be added.

----------------------------
There are many other things which can be checked using heap dumps and which affect the quality / memory consumption of an application, for example the memory waste in empty collections, the memory waste in redundant String objects, the usage of soft/weak references, etc... Try opening a heap dump in MAT and then running the "component report" - it will provide numbers like this for a component you specify. One can get the underlying MAT ISnapshot by calling MemoryTestSnapshot.getSnapshot() and then implement arbitrary checks. Some rough description (work in progress) of what can be read out of a heap dump is available here: http://wiki.eclipse.org/MemoryAnalyzer/Reading_Data_from_Heap_Dumps

I'll be waiting for your feedback / further ideas.
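For orientation, a test using the prototype could look roughly like this. The method names follow what is described above (MemoryTest, MemoryTestSnapshot); the exact signatures and packages in the attachment may differ, and the class name used in the assert is only an example:

import java.util.regex.Pattern;

import junit.framework.TestCase;

// MemoryTest and MemoryTestSnapshot come from the attached prototype project.
public class EditorConsumptionSampleTest extends TestCase {

    public void testConsumptionOfEditors() throws Exception {
        // ... open the editors under test here (omitted) ...

        // trigger a heap dump and wrap it for analysis
        MemoryTestSnapshot snapshot = MemoryTest.createSnapshot("editors-open");

        // report the retained size of everything under org.eclipse.jdt (including subclasses)
        snapshot.reportRetainedSize(Pattern.compile("org\\.eclipse\\.jdt\\..*"), true);

        // a simple inline assertion on the number of instances of a class
        snapshot.assertInstanceCount(
                "org.eclipse.jdt.internal.ui.javaeditor.CompilationUnitEditor", 100);
    }
}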
Created attachment 185719 [details]
Modified copy of org.eclipse.jdt.ui.tests.leaks.JavaLeakTest

Here is also the modified JavaLeakTest class. For samples of using the prototype, look at testEditorsClose(), testConsumptionOfEditors() and testFailingInstanceCountTest(). See also the previous (rather lengthy) comment for more details.
Sounds very reasonable to me :-)

In fact, back in 2006/2007 (at SAP) I developed a simple tool for MAT that would gather certain key performance indicators (KPIs) from heap dumps. I believe for enterprise server applications it's more useful to execute this kind of test as part of load tests, because you may also want to know how certain KPIs change with the number of users on the system. We would execute such memory usage tests regularly to see whether and where memory usage had increased (or, hopefully, decreased).

My simple tool would just compute the retained size for certain sets of objects, mainly based on package names, using regular expressions. It would then dump these numbers to a CSV file/database to be able to compare them with earlier values (a small sketch of such CSV logging follows below). There are more KPIs you could measure, but I believe that most of the time you would want to use package names rather than more complicated descriptions of sets of objects. Everything else is just too complicated to compare (in regression testing at least), and usually the goal is to find the right engineer to look into the issue in detail, for which package names are very convenient. Instead of manually configuring package names, one could also use the dominator tree aggregated by package to find "interesting" (i.e. large) packages automatically. Other KPIs you could measure are the amount of duplicated Strings per package (max and average), the number of empty collections, etc.

While I strongly believe that this approach makes a lot of sense for servers, I believe it's even more interesting for mobile (Android) applications. The reason is that those heap dumps are much smaller and therefore much faster to analyze, which means you could take heap dumps more often and execute more tests. It's also obvious, I think, that the pressure to reduce memory usage is much higher on a mobile device. With more heap dumps available you could also more easily test for memory leaks: you would just check whether a certain package's size grows constantly.

Regards, Markus

(In reply to comment #1)
> I am working on the Eclipse Memory Analyzer project (tools.mat) and have
> already spent some time thinking how MAT could be used not only as a
> post-mortem analysis tool, but also for memory measurements and checks during
> the development process. [...]
> Unfortunately I have so far no experience with the tooling you use for the
> performance testing. Can you give me some hints where to read about it and try
> it out?
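Such a KPI log can be as simple as appending one CSV line per build, so earlier values stay available for comparison. This is just a sketch; the file layout and the semicolon separator are arbitrary choices:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public final class KpiLog {

    /** Appends "buildId;kpiName;bytes" to a semicolon-separated CSV file. */
    public static void append(String csvFile, String buildId, String kpiName, long bytes)
            throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(csvFile, true));
        try {
            out.println(buildId + ";" + kpiName + ";" + bytes);
        } finally {
            out.close();
        }
    }
}

// Example: KpiLog.append("kpi.csv", "I20101215-0800", "retained:org.eclipse.jdt", 12345678L);
// A later step (or a spreadsheet) can then plot the values per build and flag big jumps.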
Sounds like great progress and promise.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. If the bug is still relevant, please remove the "stalebug" whiteboard tag.