| Summary: | IDE no longer responsive when junit tests are running | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] JDT | Reporter: | Philipe Mulet <philippe_mulet> | ||||||||||
| Component: | UI | Assignee: | Markus Keller <markus.kell.r> | ||||||||||
| Status: | VERIFIED FIXED | QA Contact: | |||||||||||
| Severity: | critical | ||||||||||||
| Priority: | P3 | CC: | benno.baumgartner, daniel_megert, darin.eclipse, jerome_lanneluc, martinae, tobias_widmer | ||||||||||
| Version: | 3.2 | ||||||||||||
| Target Milestone: | 3.2 RC7 | ||||||||||||
| Hardware: | PC | ||||||||||||
| OS: | Windows XP | ||||||||||||
| Whiteboard: | |||||||||||||
| Attachments: |
|
||||||||||||
|
Description
Philipe Mulet
Junit view is configured to only show failures (hierarchical/flat doesn't seem to make any difference) Glancing at windows task manager, it appears that the VM running the tests only get 5% of CPU, where the IDE task consumes 95% of the CPU. Actually, still on the same run, the problem seemed to have disappeared. I wonder if the toggle from hierarchical back to flat did solve the problem (maybe I saw no difference, since it wasn't yet considered... ?) Did you see any blocking/progress dialogs? I had the same feeling once when I wanted to Releng > Release our plug-ins while our tests were still running. There was no progress dialog involved in there. Just plain junit running, and still I was able to click in editor etc... just it was deadly slow to react. Like 10 seconds in between click in editor, and cursor move... I observed this behavior when autoscroll is on. When turning autoscroll off, I can edit files again. Do you still see this? I tried with out tests and a large CU and it looked OK. I'm not seeing the problem either. Code in that area has changed a lot. Please reopen if you see this again. We're seeing it again with N20060518. To reproduce: 1. Check out org.eclipse.jdt.core.tests.compiler v_667 2. Make sure that 'Show Failure Only' is on, and 'Scroll Lock' is off 3. Run org.eclipse.jdt.core.tests.compiler.regression.TestAll as a Junit Plug-in Test Observe: The UI becomes unresponsive after a while. Thread dumps show that some work is done in the UI thread:
main" prio=6 tid=0x00277c00 nid=0xea4 runnable [0x0086e000..0x0086fe64]
java.lang.Thread.State: RUNNABLE
at org.eclipse.jdt.internal.junit.model.TestElement$Status.combineError(TestElement.java:98)
at org.eclipse.jdt.internal.junit.model.TestElement$Status.combineStatus(TestElement.java:82)
at org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus(TestSuiteElement.java:49)
at org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus(TestSuiteElement.java:45)
at org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus(TestSuiteElement.java:48)
at org.eclipse.jdt.internal.junit.ui.TestViewer.autoScrollInUI(TestViewer.java:550)
at org.eclipse.jdt.internal.junit.ui.TestViewer.processChangesInUI(TestViewer.java:459)
at org.eclipse.jdt.internal.junit.ui.TestRunnerViewPart.processChangesInUI(TestRunnerViewPart.java:859)
at org.eclipse.jdt.internal.junit.ui.TestRunnerViewPart.access$13(TestRunnerViewPart.java:843)
at org.eclipse.jdt.internal.junit.ui.TestRunnerViewPart$UpdateUIJob.runInUIThread(TestRunnerViewPart.java:502)
at org.eclipse.ui.progress.UIJob$1.run(UIJob.java:94)
at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:123)
- locked <0x10190b18> (a org.eclipse.swt.widgets.RunnableLock)
at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3325)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:1914)
at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:1878)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:419)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:143)
at org.eclipse.ui.internal.ide.IDEApplication.run(IDEApplication.java:95)
at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:78)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:92)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:68)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:400)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:589)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:336)
at org.eclipse.core.launcher.Main.basicRun(Main.java:280)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at org.eclipse.core.launcher.Main.main(Main.java:952)
I also saw it on N20060517. The behavior is ok if toggling off hierarchical mode. Not only does it make UI non responsive, but it also make the tests run much much slower. I think this is a must fix for 3.2.0 Markus, please investigate Re: comment 14. Actually enabling scroll lock is the workaround. org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus() is called very often and is very expensive (combines status of all tests, see also attached yourkit profile). I guess this can be optimized big time. But I'm not sure if it is a must fix since, at least on my machine, Eclipse is still quite responsive (but I agree that it is a pain to write something) plus there is a workaround. But maybe my machine is just too fast:-) Created attachment 42042 [details]
yourkit hot spot
Method counts after 1150 tests
I used to run with exact same settings (no scroll lock), and it was fine. Something must have changed recently. Currently, Eclipse is not even repainting when running JDT/Core tests, and they take forever (>2hours in this mode). So this is really deadly. Out of the box, I suspect scroll lock is disabled... so users will see it big time I am afraid --> must fix to me. Created attachment 42137 [details]
proposed fix
This is a proposed fix as small as possible, maybe too small? Markus can you have a look at it.
Basic idea is that children inform there parent about status changes and the parent does recalculate its state only if a child state has changed.
I'm not sure why there is a getSuiteStatus() and a getStatus(), IMHO this is not good because it's possible to call setStatus() and then getStatus() right afterwarts and get a diffrent status then the one set, which is very confusing to a client.
Created attachment 42277 [details] Fix2 The proposed fix only fixes the performance problems in some scenarios, but makes it even worse in others (since it calculates the suite status on every change, even if the status is never used afterwards). Here's a better fix that caches the status of suite elements (and flushes the cache whenever a child status has changed). The changes in TestElement and TestSuiteElement fix most of the perfomance problems with the filtered & hierarchical mode. The patch also fixes bug 142190 (in TestViewer and TestSessionLabelProvider) and removes the other perfomance bottleneck by not calling TreeViewer#remove(..) on items that do not exist in the viewer. I made various perfomance tests and could only produce UI lagging if a big number of tests fail. This is hardly avoidable. Most of the time spent there is in TreeViewer.add(..), which could only be optimized further with a completely different update strategy. In realistic scenarios, UI overhead has dropped so much that it is hardly measurable. The hot spot is now in ViewPart#setContentDescription(..), which has been there forever. Tobias has reviewed the fix and gives a "Go". Reviewed patch with Markus. I'll test the patch tomorrow more thoroughly. Benno, can you also do review and testing? Unfortunatly we didn't find a one or two line fix for this problem. Introducing synchronized statements is also something we wanted to avoid but was necessary as the status and children are accessed from two threads. The current implementation keeps the lock time as small as possible and does not make any calls to the outside when inside a lock. Note that this is a regression to 3.1 due to the rewrite of the JUnit view in M6. I think it is a must fix as Philipps comments show: The UI is completly blocked if you have big test suites. Darin, Dani and Philippe, do you approve the patch for 3.2? We will stress test the JUnit view tomorrow to make sure to not run into a regression. I will review it tomorrow. I agree with fixing this for 3.2.0, but have not had a chance to review the code. Created attachment 42878 [details] Simpler fix (propagating upwards only) Fix2 was too dangerous, since the synchronize statements could potentionally have lead to deadlocks. I'm attaching a much simpler fix that relies on the order of status changes (NotRun -> Running (run children) -> Done). Status changes are only propagated towards the test tree root and child stati are cumulated if a child change could have lead to a status change in the parent. Again, the patch also contains fixes for bug 142190, since the optimization for this bug and the rendering of failed suites in filtered mode are mutually dependent. For the records: I've reviewed the patch last week together with Markus and we agreed that 'Fix2' is quite complicated and risky and decided to try to find a better fix. The better solution is now the 'Simpler fix' for which I can vote +1 for 3.2 RC7. I confirm Markus' latest patch is addressing my problem, and bug 142190 as well which I came across as well. I also have the impression that the UI is even better responsive than it was in the past (maybe over enthusiastic here ? <g>) reviewed and tested latest patch, looks good and is simpler than the patch before as it avoids any synchronization issues. +1 for 3.2 RC7. Philippe, Dani and Darin, do you approve? +1 for 3.2RC7 +1 for 3.2 RC7 Released 'Simpler fix' for 3.2RC7. *** Bug 142190 has been marked as a duplicate of this bug. *** Verified in I20060602-0010. *** Bug 129925 has been marked as a duplicate of this bug. *** |