Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 342053

Summary: jdt.ui.refactoring tests are timing out on Hudson Mac OS X 10.6 slave
Product: [Eclipse Project] JDT Reporter: Kim Moir <kim.moir>
Component: UIAssignee: JDT-UI-Inbox <jdt-ui-inbox>
Status: CLOSED DUPLICATE QA Contact:
Severity: normal    
Priority: P3 CC: daniel_megert, david_williams, deepakazad, markus.kell.r
Version: 3.7   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X - Carbon (unsup.)   
Whiteboard:
Bug Depends on:    
Bug Blocks: 295393    
Attachments:
Description Flags
Dump stack before killing test VM none

Description Kim Moir CLA 2011-04-06 13:10:49 EDT
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/ws/ws/2011-04-05_13-35-59/eclipse-testing/results/macosx.cocoa.x86_5.0/org.eclipse.jdt.ui.tests.refactoring.all.AllAllRefactoringTests.txt/*view*/

Stops after this test

org.eclipse.jdt.ui.tests.refactoring.RenameNonPrivateFieldTests::testGenerics3 disabled (BUG_79990_CORE_SEARCH_METHOD_DECL)

We use Xvnc to run the tests.
Comment 1 Dani Megert CLA 2011-04-07 02:16:40 EDT
The link gives: Status Code: 404

>Stops after this test
>org.eclipse.jdt.ui.tests.refactoring.RenameNonPrivateFieldTests::testGenerics3
>disabled (BUG_79990_CORE_SEARCH_METHOD_DECL)
Are you sure? Or is this maybe the last thing you see in the log/console?
Comment 3 Markus Keller CLA 2011-04-07 14:46:03 EDT
Created attachment 192770 [details]
Dump stack before killing test VM

The test terminates fine and needs about 55 min on my Core Duo MacMini.

The logs don't reveal any more information, and there are a lot of test that run silently after RenameNonPrivateFieldTests::testGenerics3, so this also doesn't help much. For the records, here's the end of sysout of a successful test run:

...
org.eclipse.jdt.ui.tests.refactoring.RenameNonPrivateFieldTests::testGenerics3 disabled (BUG_79990_CORE_SEARCH_METHOD_DECL)

org.eclipse.jdt.ui.tests.refactoring.ccp.DeleteTest::testDeleteWithinCu16 disabled (testDeleteWithinCu16 disabled for bug#55221)

org.eclipse.jdt.ui.tests.refactoring.ccp.PasteActionTest::test0 disabled (not implemented yet)
------------------------------------------------------------------------------


If you cannot find any more information on the machine, could you try to run with the attached patch applied to the test framework? This should print out some helpful information shortly before the test gets killed by the Ant task (thread dump, SWT focus control, and active shells).

Unfortunately, I don't know how to run this locally, so be aware that the patch is completely untested.
Comment 4 Markus Keller CLA 2011-04-18 10:18:20 EDT
Kim, please let me know if I can do anything here.

If you can tell me how to run the test locally with an updated org.eclipse.test plug-in, then I could verify that the logging on timeout really works. Otherwise, the only way I see to proceed is that you just try out the patch on a hudson build.

If that works, then we can also extend the patch to take a screenshot before the VM is killed (à la Snippet215).
Comment 5 Kim Moir CLA 2011-04-18 10:28:25 EDT
Sorry I missed the bugzilla message with the patch. Thanks Markus, I'll try a test build with it :-)
Comment 6 Kim Moir CLA 2011-04-18 10:47:09 EDT
I've started a build here
https://hudson.eclipse.org/hudson/job/eclipse-equinox-test-N/344/

It will just run tests on the mac and only the jdt.ui.refactoring suite to save time.
Comment 7 Kim Moir CLA 2011-04-18 13:07:42 EDT
Actually, the test build is here.  Had problems with the build failing due to other bugs earlier.

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-equinox-test-N/347/
Comment 8 Dani Megert CLA 2011-04-19 03:47:09 EDT
(In reply to comment #7)
> Actually, the test build is here.  Had problems with the build failing due to
> other bugs earlier.
> 
> https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-equinox-test-N/347/

No results:
Status Code: 404
Exception:
Stacktrace:

(none)


Generated by Winstone Servlet Engine v0.9.10 at Tue Apr 19 03:46:42 EDT 2011
Comment 9 Kim Moir CLA 2011-04-19 08:02:05 EDT
Yes, that one failed due to a network timeout.

I ran another one last night, try 
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/91/

here is the workspace with the test results
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/ws/ws/2011-04-18_18-38-36/eclipse-testing/results/
Comment 10 Markus Keller CLA 2011-04-19 08:30:27 EDT
(In reply to comment #9)
> here is the workspace with the test results
> https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/ws/ws/2011-04-18_18-38-36/eclipse-testing/results/

I checked the logs from that run, but I didn't see the expected stacktraces. It just stopped at the same place as before.

When looking around for more information, I apparently started another build (didn't intend to do that). Looks like this wiped the previous workspace again.

So either my patch was not working, was not installed correctly, or there was another problem that prevented it from dumping the stack before the timeout happened.

Maybe the VM was just running out of memory?
It could also be problem with the 64 bit VM (I only have a 32 bit machine, and regular tests are also running on 32 bit).
Comment 11 Kim Moir CLA 2011-04-19 08:46:44 EDT
Where is the stacktace supposed to go?  Perhaps it resides outside the hudson workspace?  I have access to the machine's filesystem via vnc, let me know and I can try to connect and look for it.
Comment 12 Markus Keller CLA 2011-04-19 09:24:22 EDT
It's printing to System.err, which AFAIK ends up here:
http://download.eclipse.org/eclipse/downloads/drops/N20110418-2000/testresults/consolelogs/macosx.cocoa_consolelog.txt
Comment 13 Kim Moir CLA 2011-04-19 13:10:38 EDT
I reran the tests using the same artifacts from yesterday and they passed.

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/93/testReport/

I happened to be logged into the machine at the same time via VNC to check the java versions (by the way - they are 32 bit, not 64bit).  I wonder if this is an issue.  

I'm going to rerun them but while not logged in via VNC and see if this makes a difference.
Comment 14 Markus Keller CLA 2011-04-19 13:29:46 EDT
Thanks. I could swear I saw x86_64 in the VM name somewhere in the workspace from comment 9, but I can't find this any more.
Comment 15 Kim Moir CLA 2011-04-19 15:49:57 EDT
So I ran another build where I wasn't logged in via Xvnc.  The test timed out.  So I think we can safely assume that the problem isn't with the jdt.ui.tests.refactoring tests but a problem with XVnc on the mac.  There are similar issues with debug, p2 and p2ui tests on the mac.  I've opened bug 343308 for this issue in the releng bucket.  Thanks for all your help troubleshooting this Markus. Not sure where to go from here, not sure why the Mac is behaving this way when the same tests pass on Linux (run the same way, just different OS).

*** This bug has been marked as a duplicate of bug 343308 ***
Comment 16 Markus Keller CLA 2011-04-20 08:18:43 EDT
Yipee, now the stackdumps from the patch (comment 3) actually worked:

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/ws/ws/2011-04-19_13-10-49/eclipse-testing/macosx.cocoa_consolelog.txt contains the stacktrace below.

It looks like the problem is that nobody watches Schrödinger's cat: Display#sleep() waits forever unless a real display client is connected and sends some OS events once in a while.


java-test:
     [echo] Running org.eclipse.jdt.ui.tests.refactoring.all.AllAllRefactoringTests. Result file: /Users/hudsonbuild/workspace/eclipse-JUnit-mac/ws/2011-04-19_13-10-49/eclipse-testing/results/macosx.cocoa.x86_5.0/org.eclipse.jdt.ui.tests.refactoring.all.AllAllRefactoringTests.xml.
     [java] EclipseTestRunner almost reached timeout '7200000'.
     [java] Thread dump at 2011-04-19 12:21:48 -0700:
     [java] java.lang.Exception: Thread-0
     [java] 	at org.eclipse.swt.internal.cocoa.OS.objc_msgSend_bool(Native Method)
     [java] 	at org.eclipse.swt.internal.cocoa.NSRunLoop.runMode(NSRunLoop.java:42)
     [java] 	at org.eclipse.swt.widgets.Display.sleep(Display.java:4562)
     [java] 	at org.eclipse.jface.operation.ModalContext$ModalContextThread.block(ModalContext.java:174)
     [java] 	at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:388)
     [java] 	at org.eclipse.jface.window.ApplicationWindow$1.run(ApplicationWindow.java:759)
     [java] 	at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
     [java] 	at org.eclipse.jface.window.ApplicationWindow.run(ApplicationWindow.java:756)
     [java] 	at org.eclipse.ui.internal.WorkbenchWindow.run(WorkbenchWindow.java:2642)
     [java] 	at org.eclipse.jdt.ui.tests.refactoring.DocumentChangeTest.testDocumentChange(DocumentChangeTest.java:180)
     [java] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     [java] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     [java] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     [java] 	at java.lang.reflect.Method.invoke(Method.java:597)
     [java] 	at junit.framework.TestCase.runTest(TestCase.java:168)
     [java] 	at junit.framework.TestCase.runBare(TestCase.java:134)
     [java] 	at junit.framework.TestResult$1.protect(TestResult.java:110)
     [java] 	at junit.framework.TestResult.runProtected(TestResult.java:128)
     [java] 	at junit.framework.TestResult.run(TestResult.java:113)
     [java] 	at junit.framework.TestCase.run(TestCase.java:124)
     [java] 	at junit.framework.TestSuite.runTest(TestSuite.java:243)
     [java] 	at junit.framework.TestSuite.run(TestSuite.java:238)
     [java] 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
     [java] 	at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
     [java] 	at junit.framework.TestResult.runProtected(TestResult.java:128)
     [java] 	at junit.extensions.TestSetup.run(TestSetup.java:27)
     [java] 	at junit.framework.TestSuite.runTest(TestSuite.java:243)
     [java] 	at junit.framework.TestSuite.run(TestSuite.java:238)
     [java] 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
     [java] 	at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
     [java] 	at junit.framework.TestResult.runProtected(TestResult.java:128)
     [java] 	at junit.extensions.TestSetup.run(TestSetup.java:27)
     [java] 	at junit.framework.TestSuite.runTest(TestSuite.java:243)
     [java] 	at junit.framework.TestSuite.run(TestSuite.java:238)
     [java] 	at junit.framework.TestSuite.runTest(TestSuite.java:243)
     [java] 	at junit.framework.TestSuite.run(TestSuite.java:238)
     [java] 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
     [java] 	at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
     [java] 	at junit.framework.TestResult.runProtected(TestResult.java:128)
     [java] 	at junit.extensions.TestSetup.run(TestSetup.java:27)
     [java] 	at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:495)
     [java] 	at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:259)
     [java] 	at org.eclipse.test.UITestApplication$2.run(UITestApplication.java:197)
     [java] 	at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
     [java] 	at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:134)
     [java] 	at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3927)
     [java] 	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3604)
     [java] 	at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2696)
...
Comment 17 Kim Moir CLA 2011-04-20 13:56:24 EDT
As an aside, I tagged org.eclipse.test so that this patch will be available in 3.7M7.  Thanks Markus, this is a really useful contribution.  I'm running the other tests on the Hudson Mac slave to generate stack traces for the other test suites (p2, p2.ui and debug) that timeout on the Mac hudson slave.
Comment 18 David Williams CLA 2011-05-21 02:40:52 EDT
We started using this too, in WTP unit tests, in our RC2 builds and tonight had our first hang, and it worked well ... still don't know why the test hung :) ... but really appreciate this capability to capture the state of the threads. It should really help track down the many intermittent hangs and timeouts we get. 

Thanks!
Comment 19 Markus Keller CLA 2011-05-22 12:35:30 EDT
Thanks for the flowers. FWIW, all hangs we've found and fixed up to now were due to calls to Display#sleep().

In some cases, this was easy to see because the stacktrace was hanging there. In an other case (bug 342697), the stacktrace looked completely unrelated, but additional tracing in setUp() methods revealed that some tests were delayed by about 30 minutes by calls to Display#sleep(). Those long delays only occurred on one particular Windows machine with no display connected. The delays disappeared as soon as a remote desktop client was connected.