| Summary: | OSX Cocoa - Things get slower over time when running long sequence of tests | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Francis Upton IV <francisu> | ||||||||||||
| Component: | SWT | Assignee: | Platform-SWT-Inbox <platform-swt-inbox> | ||||||||||||
| Status: | CLOSED WONTFIX | QA Contact: | |||||||||||||
| Severity: | normal | ||||||||||||||
| Priority: | P3 | CC: | cocoakevin, lshanmug, sgandon, Silenio_Quarti, skovatch | ||||||||||||
| Version: | 4.3 | Keywords: | triaged | ||||||||||||
| Target Milestone: | --- | ||||||||||||||
| Hardware: | PC | ||||||||||||||
| OS: | Mac OS X | ||||||||||||||
| Whiteboard: | stalebug | ||||||||||||||
| Attachments: |
|
||||||||||||||
|
Description
Francis Upton IV
Created attachment 157688 [details]
Several stack traces
Also, this is on a 10.6 system running with a 32-bit Java 1.5 VM which was taken from a 10.5 system.
The reason I do this that what my product uses Hibernate which I build from source and Hibernate does not build on JDK 1.6.
I saw the posting about the problems associated with using a 10.5 JRE 1.5 JVM on a 10.6 system and as far as I can tell they have to do with AWT use. I ran into these when my RCP application would not even boot because it hung looking for the AWT libraries. I fixed the application to remove all use of AWT and I have not had any problems with the app at all, so I do not think this current problem is related to that issue.
The Eclipse version is 3.5.1.
Created attachment 157689 [details]
More stack traces later in the same run
They are starting to look more the same, and the run is getting even slower, tests cases taking about 5x as they would on the other platforms.
Francis, do you call Display#readAndDispatch during your test runs? We create and dispose NSAutoreleasePool objects in that function. (In reply to comment #3) > Francis, do you call Display#readAndDispatch during your test runs? We create > and dispose NSAutoreleasePool objects in that function. Yes I do, I use it when I want to make sure all outstanding events have been processed. I call it between test cases and at other times. Is it bad that I do that, should I be doing something else? > Is it bad that I do that, should I be doing something else?
No, that's good. I was going to suggest adding that code if you didn't have it. Cocoa has a ref counting system for memory management. When you create objects you have the option of putting them into an NSAutoreleasePool instead of managing the releasing of the object yourself. The unfortunate side affect of this is that you have to release your pools frequently, or the objects in them stick around. In SWT we manage the pools for you in read and dispatch, and Shell#setWindowVisible.
We have seen the behavior you describe while running tests when either readAndDispatch isn't called, or it's only called on a modal event loop and shells are never opened.
Is your test code open source?
Created attachment 157935 [details] Test class that uses readAndDispatch() > Is your test code open source? Well the one class that calls it is now open source (per the terms of the bugzilla agreement). Though it depends on a bunch of other junk. I have no objection to releasing it as open source, it's just a question of where and packaging it. Hopefully from this class though you can find what you need. (In reply to comment #5) > > Is it bad that I do that, should I be doing something else? In SWT we manage the pools for you in read and dispatch, and > Shell#setWindowVisible. This is likely the problem; with my tests hundreds of test cases are run and the window is never set to be visible, so this pool then is accumulating all of this stuff. Is there something I can call in my tests to clean this out? I would imagine this problem is going to show up in other situations similar to mine where testing is done in a "sort of headless" way. Or other situations where the heuristic that SWT uses to reset the pool does not match what the app is doing, so having this documented and having explicit control would be good. On the other hand, it's a platform dependent thing which is bad. Of course none of this is telling you anything you don't know so I will stop now. :) From the stack traces, it looks like your tests are run in asyncExec. Do you have one asyncExec per test, or one that runs all of your tests? Can you try adding code like "shell.setVisible(true); shell.setVisible(false);" to your test code somewhere? (In reply to comment #8) > From the stack traces, it looks like your tests are run in asyncExec. Do you > have one asyncExec per test, or one that runs all of your tests? Can you try > adding code like "shell.setVisible(true); shell.setVisible(false);" to your > test code somewhere? I think the platform is started with one big asyncExec, and all of my tests run on the UI thread under that (I just do the same thing the Eclipse UI tests do, use the Junit plugin support to run the tests). I tried adding a setVisible() as you suggested between each test case, and the problem still seems to happen, I'm getting the same stack traces as before, and things progressively slow down. And I can see that the setVisible()s are getting called because the shell becomes visible each time (which sort of defeats the purpose of having it be non-visible in the first place, so it's not really a workable solution for me). One alternative I can try (not sure if it would work) is to use the vfb stuff that Paul (and I use) on Linux. But I don't know if the Mac uses the X stuff or not. This happens on 3.6.1 as well, and it's getting important to me to fix it as I want to move my development to the Mac. I would be happy to put some counters or tracing in the code to try and track this down since I can easily make it happen. Created attachment 231185 [details]
Traces on 4.3M7
And here are some stack traces for 4.3M7, which are substantially the same as they were in 3.6.1. I just did a kill -3 a few times and it's always in msg_send, and the amount of non-Java memory that is consumes seems to grow quite a bit. It's running a suite of about 1400 test cases, and it seems to gradually slow down as it gets further along into the suite, tests that normally take a second to run are running in 3-4 seconds after the first couple of hundred tests and it gets worse.
I have the XCode instrumentation tools and have been profiling a little bit with those. One thing I noticed in my testing with 3.8.2 is that the number of SWTCanvasView objects (as shown by the OSX instruments) seemed to be about twice or more than the number of SWTCanvasView objects as shown in JProfiler memory view. The ones shown in JProfiler roughly corresponded to the Composite objects and seemed to be within normal limits. The ones showing in OSX instruments seemed to be growing much beyond the active Java objects. I'm going to do a complete run with the OSX instrumentation turned on to try and further characterize this.
I would be happy for any guidance about what I can look for using the Mac profiling tools which seem to be pretty good.
Additional data: if I run one portion of the tests my themselves and do a bunch of kill -3 so get the tracing, it's often in the normal Java code (not in a msg send) which is what I would expect. More progress. I have been focused on SWTCanvasView which is clearly leaking when I compare the counts with JProfiler. I have since instrumented the SWT source to show the number of these objects which should be allocated (by having a static count of when they are allocated in Composite.createHandle() and released in Control.releaseHandle(). When I do this running a Snippet (like Snippet98) is behaves as expected when looking at the Allocations instrument, the retainCount field of the SWTCanvasView, and the static counter I added. No matter how many composites I create using that snippet, all of the counts line up. However, when running in my tests, the retainCount and the count according to the Allocations instrument get larger over time which is a leak. The number of Composite objects are not getting larger, so this indicates that something is retaining the SWTCanvasView outside of the lifetime of the Composite object (I'm assuming the Composite goes away at releaseHandle(), I guess I can check this with the Allocations instrument). I also put instrumentation around the addPool/removePool and verified they were being called a lot. And I went further and made it only use one pool all the time (which was removed in removePool) to make sure this was not a pool leak issue. (I added a Shell.setVisible(false) and then true between each test case. I'm now going to try to make a small test case (using my tests) that shows the problem in a short time to try and narrow it down. Does anyone have an idea about why an SWTCanvasView would have a lifetime longer than its enclosing Composite? I have been able to reproduce the problem with a loop closing and recreating one of the EditorParts associated with my product. According to JProfiler, and my own instrumentation added to the SWT code, the Composite and SWTCanvasView are completely stable as editors are closed. However, in the allocations instrument, the SWTCanvasView is growing each time by a considerable amount, much higher than one object per editor (I have not calculated exactly). When releasing the handle (the code in Control), I check the retainCount, and in in the loop creating the editors, this only is higher than 2 (it's a 3) consistently according to the stack trace below, this corresponds to releasing the main Composite associated with the EditorPart (this holds several controls that implement the editor). So it's clear the Java objects Composite and SWTCanvasView are being correctly garbage collected and there is no leak there, but the corresponding native object to SWTCanvasView is leaking quite a lot in this situation. Next step is to try and find out what things can use and retain a reference to the SWTCanvasView object. Any ideas or help is appreciated. Daemon Thread [Thread-1] (Suspended (breakpoint at line 2847 in Control)) owns: RunnableLock (id=212) Composite(Control).releaseHandle() line: 2847 Composite(Scrollable).releaseHandle() line: 311 Composite(Control).destroyWidget() line: 1037 Composite(Widget).release(boolean) line: 1310 Composite(Control).release(boolean) line: 2836 Composite(Widget).dispose() line: 663 EditorPane(PartPane).dispose() line: 182 EditorReference(WorkbenchPartReference).dispose() line: 681 WorkbenchPage.disposePart(WorkbenchPartReference) line: 1810 WorkbenchPage.handleDeferredEvents() line: 1514 WorkbenchPage.deferUpdates(boolean) line: 1498 WorkbenchPage.closeEditors(IEditorReference[], boolean) line: 1472 WorkbenchPage.closeEditor(IEditorPart, boolean) line: 1536 Editor$14.run() line: 1724 UISynchronizer(Synchronizer).syncExec(Runnable) line: 180 UISynchronizer.syncExec(Runnable) line: 150 Display.syncExec(Runnable) line: 4607 Editor(AppEditorImpl).displaySyncExec(Runnable) line: 2100 Editor.closeMainNode(BaseMainNode, boolean) line: 1707 MemoryLeakTest.testSwtLeak() line: 36 NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method] NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39 DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25 Method.invoke(Object, Object...) line: 597 MemoryLeakTest(TestCase).runTest() line: 168 MemoryLeakTest(TransformTestBase).runTest() line: 1168 MemoryLeakTest(TestCase).runBare() line: 134 TestResult$1.protect() line: 110 TestResult.runProtected(Test, Protectable) line: 128 TestResult.run(TestCase) line: 113 MemoryLeakTest(TestCase).run(TestResult) line: 124 TestSuite.runTest(Test, TestResult) line: 243 TestSuite.run(TestResult) line: 238 SuiteMethod(JUnit38ClassRunner).run(RunNotifier) line: 83 JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 50 TestExecution.run(ITestReference[]) line: 38 RemotePluginTestRunner(RemoteTestRunner).runTests(String[], String, TestExecution) line: 467 RemotePluginTestRunner(RemoteTestRunner).runTests(TestExecution) line: 683 RemotePluginTestRunner(RemoteTestRunner).run() line: 390 RemotePluginTestRunner.main(String[]) line: 62 PlatformUITestHarness$1.run() line: 47 RunnableLock.run() line: 35 UISynchronizer(Synchronizer).runAsyncMessages(boolean) line: 135 Display.runAsyncMessages(boolean) line: 3946 Display.readAndDispatch() line: 3623 Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 2701 Workbench.runUI() line: 2665 Workbench.access$4(Workbench) line: 2499 Workbench$7.run() line: 679 Realm.runWithDefault(Realm, Runnable) line: 332 Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 668 PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149 IDEApplication.start(IApplicationContext) line: 124 UITestApplication(NonUIThreadTestApplication).runApp(Object, IApplicationContext, String[]) line: 54 UITestApplication.runApp(Object, IApplicationContext, String[]) line: 41 UITestApplication(NonUIThreadTestApplication).start(IApplicationContext) line: 48 EclipseAppHandle.run(Object) line: 196 EclipseAppLauncher.runApplication(Object) line: 110 EclipseAppLauncher.start(Object) line: 79 EclipseStarter.run(Object) line: 353 EclipseStarter.run(String[], Runnable) line: 180 NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method] NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39 DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25 Method.invoke(Object, Object...) line: 597 Main.invokeFramework(String[], URL[]) line: 629 Main.basicRun(String[]) line: 584 Main.run(String[]) line: 1438 Main.main(String[]) line: 1414 Created attachment 231506 [details]
UI test program that shows the leak
Attached is a JUnit test case adapted from the Eclipse Platform UI test suite that demonstrates the problem:
1) Check out the platform.ui git repo and import the projects
2) Copy this file into the org.eclipse.ui.tests.api package in Eclipse UI Tests
3) Start the test with JProfiler (or the like), then start the Allocation instrument on the test
In your profiler you will see the number of SWT Composite objects stabilize at around 26, but when you look the Allocation instrument you will see the corresponding SWTCanvasView objects monotonically increasing for each editor created, showing the leak.
So now I'm a bit stuck and need some help. My next thought is to put a breakpoint at retain() in SWTCanvasView. To do this, I would need to add the retain method in the Java and then generate the JNI, but in trying this, I can't find out where there is any JNI for this class. It does seem from the profiling that this class exists both in C and Java, but I can't figure out how it exists in C. Can someone guide me a little? (In reply to comment #16) > So now I'm a bit stuck and need some help. My next thought is to put a > breakpoint at retain() in SWTCanvasView. To do this, I would need to add the > retain method in the Java and then generate the JNI, but in trying this, I > can't find out where there is any JNI for this class. You can either copy or override the retain method from NSObject to SWTCanvasView.java. No need to generate any JNI. NSObject.retain() uses OS.objc_msgSend(this.id, OS.sel_retain) and the JNI for the method already exists. (In reply to comment #15) > Created attachment 231506 [details] > UI test program that shows the leak > > Attached is a JUnit test case adapted from the Eclipse Platform UI test > suite that demonstrates the problem: > > 1) Check out the platform.ui git repo and import the projects > 2) Copy this file into the org.eclipse.ui.tests.api package in Eclipse UI > Tests > 3) Start the test with JProfiler (or the like), then start the Allocation > instrument on the test > > In your profiler you will see the number of SWT Composite objects stabilize > at around 26, but when you look the Allocation instrument you will see the > corresponding SWTCanvasView objects monotonically increasing for each editor > created, showing the leak. Lakshmi, can you try this program to duplicate the problem? I'm still having it. Bug triaged, visit https://wiki.eclipse.org/SWT/Devel/Triage for more information. This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie. |