Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 129929

Summary: IDE no longer responsive when junit tests are running
Product: [Eclipse Project] JDT Reporter: Philipe Mulet <philippe_mulet>
Component: UIAssignee: Markus Keller <markus.kell.r>
Status: VERIFIED FIXED QA Contact:
Severity: critical    
Priority: P3 CC: benno.baumgartner, daniel_megert, darin.eclipse, jerome_lanneluc, martinae, tobias_widmer
Version: 3.2   
Target Milestone: 3.2 RC7   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description Flags
yourkit hot spot
none
proposed fix
none
Fix2
none
Simpler fix (propagating upwards only) none

Description Philipe Mulet CLA 2006-03-01 07:54:56 EST
Build 3.2m5

When running the junit tests for JDTCore, I observed that it is almost impossible to further edit some files (e.g. buildnotes) until the tests have completed.
I accept the tests are using lots of CPU, but this is on a different VM instance.

Other applications are still responsive (email client, web browser, etc...).
Is this progress reporting causing grief here ?
Comment 1 Philipe Mulet CLA 2006-03-01 07:55:55 EST
Junit view is configured to only show failures (hierarchical/flat doesn't seem to make any difference)
Comment 2 Philipe Mulet CLA 2006-03-01 08:02:31 EST
Glancing at windows task manager, it appears that the VM running the tests only get 5% of CPU, where the IDE task consumes 95% of the CPU.

Comment 3 Philipe Mulet CLA 2006-03-01 08:04:41 EST
Actually, still on the same run, the problem seemed to have disappeared. I wonder if the toggle from hierarchical back to flat did solve the problem (maybe I saw no difference, since it wasn't yet considered... ?)
Comment 4 Dani Megert CLA 2006-03-01 08:30:41 EST
Did you see any blocking/progress dialogs? I had the same feeling once when I wanted to Releng > Release our plug-ins while our tests were still running.
Comment 5 Philipe Mulet CLA 2006-03-01 09:09:44 EST
There was no progress dialog involved in there. Just plain junit running, and still I was able to click in editor etc... just it was deadly slow to react. Like 10 seconds in between click in editor, and cursor move...
Comment 6 Jerome Lanneluc CLA 2006-03-01 09:20:53 EST
I observed this behavior when autoscroll is on. When turning autoscroll off, I can edit files again.
Comment 7 Dani Megert CLA 2006-05-03 12:00:07 EDT
Do you still see this?
Comment 8 Dani Megert CLA 2006-05-03 12:01:17 EDT
I tried with out tests and a large CU and it looked OK.
Comment 9 Jerome Lanneluc CLA 2006-05-03 12:26:47 EDT
I'm not seeing the problem either.
Comment 10 Markus Keller CLA 2006-05-03 12:31:39 EDT
Code in that area has changed a lot. Please reopen if you see this again.
Comment 11 Jerome Lanneluc CLA 2006-05-18 12:28:09 EDT
We're seeing it again with N20060518.
Comment 12 Jerome Lanneluc CLA 2006-05-18 13:04:10 EDT
To reproduce:
1. Check out org.eclipse.jdt.core.tests.compiler v_667
2. Make sure that 'Show Failure Only' is on, and 'Scroll Lock' is off
3. Run org.eclipse.jdt.core.tests.compiler.regression.TestAll as a Junit Plug-in Test
Observe: The UI becomes unresponsive after a while.
Comment 13 Jerome Lanneluc CLA 2006-05-18 13:04:48 EDT
Thread dumps show that some work is done in the UI thread:

main" prio=6 tid=0x00277c00 nid=0xea4 runnable [0x0086e000..0x0086fe64]
  java.lang.Thread.State: RUNNABLE
       at org.eclipse.jdt.internal.junit.model.TestElement$Status.combineError(TestElement.java:98)
       at org.eclipse.jdt.internal.junit.model.TestElement$Status.combineStatus(TestElement.java:82)
       at org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus(TestSuiteElement.java:49)
       at org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus(TestSuiteElement.java:45)
       at org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus(TestSuiteElement.java:48)
       at org.eclipse.jdt.internal.junit.ui.TestViewer.autoScrollInUI(TestViewer.java:550)
       at org.eclipse.jdt.internal.junit.ui.TestViewer.processChangesInUI(TestViewer.java:459)
       at org.eclipse.jdt.internal.junit.ui.TestRunnerViewPart.processChangesInUI(TestRunnerViewPart.java:859)
       at org.eclipse.jdt.internal.junit.ui.TestRunnerViewPart.access$13(TestRunnerViewPart.java:843)
       at org.eclipse.jdt.internal.junit.ui.TestRunnerViewPart$UpdateUIJob.runInUIThread(TestRunnerViewPart.java:502)
       at org.eclipse.ui.progress.UIJob$1.run(UIJob.java:94)
       at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
       at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:123)
       - locked <0x10190b18> (a org.eclipse.swt.widgets.RunnableLock)
       at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3325)
       at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
       at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:1914)
       at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:1878)
       at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:419)
       at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:143)
       at org.eclipse.ui.internal.ide.IDEApplication.run(IDEApplication.java:95)
       at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:78)
       at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:92)
       at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:68)
       at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:400)
       at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:177)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:589)
       at org.eclipse.core.launcher.Main.invokeFramework(Main.java:336)
       at org.eclipse.core.launcher.Main.basicRun(Main.java:280)
       at org.eclipse.core.launcher.Main.run(Main.java:977)
       at org.eclipse.core.launcher.Main.main(Main.java:952)
Comment 14 Philipe Mulet CLA 2006-05-18 15:12:33 EDT
I also saw it on N20060517. The behavior is ok if toggling off hierarchical mode.
Comment 15 Philipe Mulet CLA 2006-05-18 15:14:18 EDT
Not only does it make UI non responsive, but it also make the tests run much much slower.

I think this is a must fix for 3.2.0
Comment 16 Martin Aeschlimann CLA 2006-05-19 08:51:18 EDT
Markus, please investigate
Comment 17 Philipe Mulet CLA 2006-05-19 09:30:32 EDT
Re: comment 14. Actually enabling scroll lock is the workaround.
Comment 18 Benno Baumgartner CLA 2006-05-19 12:35:48 EDT
org.eclipse.jdt.internal.junit.model.TestSuiteElement.getStatus() is called very often and is very expensive (combines status of all tests, see also attached yourkit profile). I guess this can be optimized big time. But I'm not sure if it is a must fix since, at least on my machine, Eclipse is still quite responsive (but I agree that it is a pain to write something) plus there is a workaround. But maybe my machine is just too fast:-)
Comment 19 Benno Baumgartner CLA 2006-05-19 12:36:36 EDT
Created attachment 42042 [details]
yourkit hot spot

Method counts after 1150 tests
Comment 20 Philipe Mulet CLA 2006-05-19 15:56:06 EDT
I used to run with exact same settings (no scroll lock), and it was fine. Something must have changed recently. Currently, Eclipse is not even repainting when running JDT/Core tests, and they take forever (>2hours in this mode).

So this is really deadly. Out of the box, I suspect scroll lock is disabled... so users will see it big time I am afraid --> must fix to me.
Comment 21 Benno Baumgartner CLA 2006-05-22 06:34:11 EDT
Created attachment 42137 [details]
proposed fix

This is a proposed fix as small as possible, maybe too small? Markus can you have a look at it. 

Basic idea is that children inform there parent about status changes and the parent does recalculate its state only if a child state has changed. 

I'm not sure why there is a getSuiteStatus() and a getStatus(), IMHO this is not good because it's possible to call setStatus() and then getStatus() right afterwarts and get a diffrent status then the one set, which is very confusing to a client.
Comment 22 Markus Keller CLA 2006-05-23 12:33:14 EDT
Created attachment 42277 [details]
Fix2

The proposed fix only fixes the performance problems in some scenarios, but makes it even worse in others (since it calculates the suite status on every change, even if the status is never used afterwards).

Here's a better fix that caches the status of suite elements (and flushes the cache whenever a child status has changed). The changes in TestElement and TestSuiteElement fix most of the perfomance problems with the filtered & hierarchical mode.

The patch also fixes bug 142190 (in TestViewer and TestSessionLabelProvider) and  removes the other perfomance bottleneck by not calling TreeViewer#remove(..) on items that do not exist in the viewer.

I made various perfomance tests and could only produce UI lagging if a big number of tests fail. This is hardly avoidable. Most of the time spent there is in TreeViewer.add(..), which could only be optimized further with a completely different update strategy. In realistic scenarios, UI overhead has dropped so much that it is hardly measurable. The hot spot is now in ViewPart#setContentDescription(..), which has been there forever.

Tobias has reviewed the fix and gives a "Go".
Comment 23 Tobias Widmer CLA 2006-05-23 12:54:56 EDT
Reviewed patch with Markus.

I'll test the patch tomorrow more thoroughly.
Comment 24 Martin Aeschlimann CLA 2006-05-23 12:58:09 EDT
Benno, can you also do review and testing?

Unfortunatly we didn't find a one or two line fix for this problem. Introducing synchronized statements is also something we wanted to avoid but was necessary as the status and children are accessed from two threads.

The current implementation keeps the lock time as small as possible and does not make any calls to the outside when inside a lock.

Note that this is a regression to 3.1 due to the rewrite of the JUnit view in M6.
I think it is a must fix as Philipps comments show: The UI is completly blocked if you have big test suites.
Comment 25 Martin Aeschlimann CLA 2006-05-23 13:00:03 EDT
Darin, Dani and Philippe, do you approve the patch for 3.2?
We will stress test the JUnit view tomorrow to make sure to not run into a regression.
Comment 26 Benno Baumgartner CLA 2006-05-23 13:04:32 EDT
I will review it tomorrow.
Comment 27 Darin Wright CLA 2006-05-23 16:42:24 EDT
I agree with fixing this for 3.2.0, but have not had a chance to review the code.
Comment 28 Markus Keller CLA 2006-05-29 13:53:21 EDT
Created attachment 42878 [details]
Simpler fix (propagating upwards only)

Fix2 was too dangerous, since the synchronize statements could potentionally have lead to deadlocks.

I'm attaching a much simpler fix that relies on the order of status changes (NotRun -> Running (run children) -> Done). Status changes are only propagated towards the test tree root and child stati are cumulated if a child change could have lead to a status change in the parent.

Again, the patch also contains fixes for bug 142190, since the optimization for this bug and the rendering of failed suites in filtered mode are mutually dependent.
Comment 29 Dani Megert CLA 2006-05-30 10:20:27 EDT
For the records: I've reviewed the patch last week together with Markus and we agreed that 'Fix2' is quite complicated and risky and decided to try to find a better fix. The better solution is now the 'Simpler fix' for which I can vote +1 for 3.2 RC7.
Comment 30 Philipe Mulet CLA 2006-05-30 10:22:26 EDT
I confirm Markus' latest patch is addressing my problem, and bug 142190 as well which I came across as well.

I also have the impression that the UI is even better responsive than it was in the past (maybe over enthusiastic here ? <g>)
Comment 31 Martin Aeschlimann CLA 2006-05-30 10:26:03 EDT
reviewed and tested latest patch, looks good and is simpler than the patch before as it avoids any synchronization issues.

+1 for 3.2 RC7.

Philippe, Dani and Darin, do you approve?
Comment 32 Philipe Mulet CLA 2006-05-30 10:31:36 EDT
+1 for 3.2RC7
Comment 33 Darin Wright CLA 2006-05-30 10:32:47 EDT
+1 for 3.2 RC7
Comment 34 Markus Keller CLA 2006-05-30 10:46:58 EDT
Released 'Simpler fix' for 3.2RC7.
Comment 35 Markus Keller CLA 2006-05-30 10:50:08 EDT
*** Bug 142190 has been marked as a duplicate of this bug. ***
Comment 36 Markus Keller CLA 2006-06-02 09:31:38 EDT
Verified in I20060602-0010.
Comment 37 Markus Keller CLA 2006-06-07 12:04:50 EDT
*** Bug 129925 has been marked as a duplicate of this bug. ***