Bug 343308 - some tests timeout while running via XVnc on the Mac Hudson slaves
Summary: some tests timeout while running via XVnc on the Mac Hudson slaves
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 3.7
Hardware: PC Mac OS X
Importance: P3 normal
Target Milestone: 4.2 M7
Assignee: Platform-Releng-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 342052 342053
Depends on:
Blocks: 295393
Reported: 2011-04-19 15:45 EDT by Kim Moir CLA
Modified: 2015-02-16 21:21 EST (History)

Attachments
mac log from running p2.ui, p2 and debug tests (111.09 KB, text/plain)
2011-04-21 09:14 EDT, Kim Moir CLA
screen shot of hudson mac config (8.39 KB, image/jpeg)
2011-04-21 09:58 EDT, Kim Moir CLA
debug stack trace (9.54 KB, text/plain)
2011-04-25 13:57 EDT, Kim Moir CLA
debug stack trace from 2011-04-27_14-40-09 build (19.87 KB, text/plain)
2011-04-28 10:24 EDT, Kim Moir CLA

Description Kim Moir CLA 2011-04-19 15:45:55 EDT
The tests in question are the jdt.ui refactoring, debug, p2 and p2ui tests.

These tests pass if they are run on Hudson while I am logged in remotely via XVnc to the Mac Hudson slave. They time out if I don't log in via VNC as the Hudson user.

Not sure how to fix this at all.
Comment 1 Kim Moir CLA 2011-04-19 15:49:57 EDT
*** Bug 342053 has been marked as a duplicate of this bug. ***
Comment 2 Markus Keller CLA 2011-04-20 08:25:46 EDT
See bug 342053 comment 16. Looks like the problem is that Display#sleep() blocks forever unless a real display client is connected and sends some OS events once in a while:

java-test:
     [echo] Running org.eclipse.jdt.ui.tests.refactoring.all.AllAllRefactoringTests. Result file: /Users/hudsonbuild/workspace/eclipse-JUnit-mac/ws/2011-04-19_13-10-49/eclipse-testing/results/macosx.cocoa.x86_5.0/org.eclipse.jdt.ui.tests.refactoring.all.AllAllRefactoringTests.xml.
     [java] EclipseTestRunner almost reached timeout '7200000'.
     [java] Thread dump at 2011-04-19 12:21:48 -0700:
     [java] java.lang.Exception: Thread-0
     [java]     at org.eclipse.swt.internal.cocoa.OS.objc_msgSend_bool(Native Method)
     [java]     at org.eclipse.swt.internal.cocoa.NSRunLoop.runMode(NSRunLoop.java:42)
     [java]     at org.eclipse.swt.widgets.Display.sleep(Display.java:4562)
     [java]     at org.eclipse.jface.operation.ModalContext$ModalContextThread.block(ModalContext.java:174)
     [java]     at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:388)
     [java]     at org.eclipse.jface.window.ApplicationWindow$1.run(ApplicationWindow.java:759)
     [java]     at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
     [java]     at org.eclipse.jface.window.ApplicationWindow.run(ApplicationWindow.java:756)
     [java]     at org.eclipse.ui.internal.WorkbenchWindow.run(WorkbenchWindow.java:2642)
     [java]     at org.eclipse.jdt.ui.tests.refactoring.DocumentChangeTest.testDocumentChange(DocumentChangeTest.java:180)
Comment 3 Kim Moir CLA 2011-04-20 09:15:05 EDT
So how would you suggest fixing this?  Would it make sense to try to run the tests without VNC, or is there another way around this?
Comment 4 Markus Keller CLA 2011-04-20 09:56:02 EDT
The problem could also be between ModalContextThread#block() and Display#readAndDispatch() and Display#sleep():

ModalContextThread#run() ends with:
	...
	continueEventDispatching = false;

	// Force the event loop to return from sleep () so that
	// it stops event dispatching.
	display.asyncExec(null);
}


ModalContextThread#block() reads the volatile field continueEventDispatching here and then calls readAndDispatch() and sleep():
	while (continueEventDispatching) {
		// Run the event loop. Handle any uncaught exceptions caused
		// by UI events.
		try {
			if (!display.readAndDispatch()) {
				display.sleep();
			}
			exceptionCount = 0;
		}
		...

ModalContextThread#block() currently assumes that either
a) Display#readAndDispatch() returns true iff there was an event in the queue that got processed (e.g. the asyncExec(null)), or
b) Display#sleep() gets interrupted by an asyncExec(null) that has been called right before sleep() or while sleep() is waiting.

If none of these assumptions is correct, then the code above blocks.

If (b) is not correct, then I guess ModalContextThread#run() should not call "display.asyncExec(null);" at the end, but rather call "display.wake();".

If only (a) is not correct, then a fix could be to turn ModalContextThread#block() line 173 into this:

	if (!display.readAndDispatch() && continueEventDispatching) {

Silenio, what's SWT's take on these assumptions?
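
To make the suspected interleaving concrete, here is a minimal single-threaded sketch. It is a toy model, not SWT or JFace code: ToyDisplay, sleepWouldBlock() and wouldHang() are hypothetical names, and the model simply assumes that a wake-up posted by asyncExec(null) is consumed by the next readAndDispatch() without counting as a dispatched event (that is, that assumption (a) fails).

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the race described above. None of this is SWT or JFace code. */
class GuardedSleepSketch {

    /** Hypothetical stand-in for a display whose wake-ups are not counted as events. */
    static final class ToyDisplay {
        private final Queue<Runnable> queue = new ArrayDeque<>();
        private boolean wakePending;

        /** Queues the runnable (if any) and always posts a wake-up. */
        void asyncExec(Runnable runnable) {
            if (runnable != null) {
                queue.add(runnable);
            }
            wakePending = true;
        }

        /** Assumed behavior: consumes the wake-up, returns true only if a runnable ran. */
        boolean readAndDispatch() {
            wakePending = false;
            Runnable next = queue.poll();
            if (next == null) {
                return false;
            }
            next.run();
            return true;
        }

        /** Reports whether a real sleep() would have nothing left to wake it up. */
        boolean sleepWouldBlock() {
            if (wakePending) {
                wakePending = false;
                return false;
            }
            return true;
        }
    }

    /** Replays one loop iteration in which the worker finishes right after the while-check. */
    static boolean wouldHang(boolean recheckFlag) {
        ToyDisplay display = new ToyDisplay();
        boolean continueEventDispatching = true; // the while-condition was just read as true

        // The worker thread finishes at this point (end of run() as quoted above).
        continueEventDispatching = false;
        display.asyncExec(null);

        // The UI thread continues with the loop body.
        boolean dispatched = display.readAndDispatch();
        boolean callsSleep = recheckFlag
                ? !dispatched && continueEventDispatching
                : !dispatched;
        return callsSleep && display.sleepWouldBlock();
    }

    public static void main(String[] args) {
        System.out.println("current loop would hang: " + wouldHang(false));
        System.out.println("loop with re-check would hang: " + wouldHang(true));
    }
}
```

In this model the current loop reaches a sleep with no wake-up pending, while the variant that re-reads continueEventDispatching skips the sleep. Whether the real Display behaves like ToyDisplay is exactly the open question.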
Comment 5 Kim Moir CLA 2011-04-21 09:14:47 EDT
Created attachment 193815 [details]
mac log from running p2.ui, p2 and debug tests

The debug tests have the same stack trace that the jdt.ui.refactoring tests had.  The p2 and p2 ui tests have different errors; I'll open a separate bug for them.
Comment 6 Markus Keller CLA 2011-04-21 09:48:00 EDT
(In reply to comment #3)
> So how would you suggest fixing this?  Would it make sense to try to run the
> tests without VNC, or is there another way around this?

This statement, comment 0, and the bug summary are out of sync. Shouldn't the bug summary be:
some tests timeout while running without XVnc on the Mac Hudson slaves
                                  ^^^^^^^
Comment 7 Kim Moir CLA 2011-04-21 09:58:05 EDT
No.  We run the tests on the Mac slave with the Xvnc plugin enabled. I'll attach a screen shot.
Comment 9 Felipe Heidrich CLA 2011-04-21 10:28:37 EDT
(In reply to comment #4)

> ModalContextThread#block() currently assumes that either
> a) Display#readAndDispatch() returns true iff there was an event in the queue
> that got processed (e.g. the asyncExec(null)), or
> b) Display#sleep() gets interrupted by an asyncExec(null) that has been called
> right before sleep() or while sleep() is waiting.
>
> Silenio, what's SWT's take on these assumptions?

Assumption b) is correct. 

Assumption a) is not correct on Cocoa. It might be the case on Windows. Try this snippet. Only the first asyncExec makes the next readAndDispatch() return true. I believe if you depend on that behaviour, you will need to use 1) instead of 2) or 3).

import org.eclipse.swt.widgets.*;

public class SyncThread {
public static void main(String[] args) {
	final Display display = new Display();
	Shell shell = new Shell(display);

	new Thread() {
		public void run() {
			while (!display.isDisposed()) {
				//1
				display.asyncExec(new Runnable() {
					public void run() {
					}
				});
				
				//2
//				display.asyncExec(null);
				
				//3
//				display.wake();
				try {
					Thread.sleep(500);
				} catch (Throwable e) {}
			}
		}
	}.start();

	shell.pack();
	shell.open();
	while (!shell.isDisposed()) {
		boolean events = display.readAndDispatch();
		System.out.println("loop=" + events);
		if (!events)
			display.sleep();
	}
	display.dispose();
}
}
Comment 10 Silenio Quarti CLA 2011-04-21 10:39:58 EDT
Previous comment was mine.

I believe we made an effort to guarantee that asyncExec(null) causes the next readAndDispatch() to return true on other platforms. I am going to fix this on Cocoa as well.
Comment 11 Silenio Quarti CLA 2011-04-21 10:55:02 EDT
I opened bug 343560 to address the SWT problem. I am not sure those changes will fix this problem. Please close as a duplicate if/once the tests run successfully.
Comment 12 Kim Moir CLA 2011-04-21 13:11:52 EDT
Thanks Silenio.  I'm running a test build now.
Comment 13 Michael Rennie CLA 2011-04-25 11:47:48 EDT
*** Bug 342052 has been marked as a duplicate of this bug. ***
Comment 14 Kim Moir CLA 2011-04-25 13:56:58 EDT
The SWT patch fixed the jdt.ui.refactoring tests.  It didn't fix the debug tests.  I'll attach a log.
Comment 15 Kim Moir CLA 2011-04-25 13:57:38 EDT
Created attachment 194009 [details]
debug stack trace
Comment 16 Markus Keller CLA 2011-04-25 14:43:01 EDT
The debug tests contain code like this:

        while (!fListener.isFinished()) if (!fDisplay.readAndDispatch ()) fDisplay.sleep ();

I don't think that's safe to use in an automated test. If there's no UI that sends OS events once in a while, and no other thread calls sync/asyncExec(), the call to Display#sleep() can block forever.

You should use

        while (display.readAndDispatch()) { /*loop*/ }

or add a short Thread.sleep(..) to help other threads proceed more quickly.
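
A rough sketch of such a bounded polling loop, written in plain Java so that it runs without SWT. The names waitUntil and pumpEvents are hypothetical; in a real test pumpEvents would presumably wrap display.readAndDispatch(), and the 10 ms nap and the timeout values are arbitrary choices. Whether this is sufficient on the Mac slaves is not established here.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Sketch of a wait loop that never parks in an unbounded Display#sleep(). */
class BoundedWaitSketch {

    /**
     * Polls until finished reports true or the timeout elapses.
     * pumpEvents is a hypothetical stand-in for a non-blocking readAndDispatch():
     * it returns true if it processed an event.
     *
     * @return true if finished became true, false on timeout or interruption
     */
    static boolean waitUntil(BooleanSupplier finished, BooleanSupplier pumpEvents, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!finished.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of blocking forever
            }
            if (!pumpEvents.getAsBoolean()) {
                try {
                    // Short, bounded nap so other threads can make progress.
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50); // simulated background work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.set(true);
        });
        worker.start();

        // No events to pump in this demo, so pumpEvents always reports false.
        System.out.println("worker finished in time: " + waitUntil(done::get, () -> false, 2000));
        System.out.println("never-finishing wait: " + waitUntil(() -> false, () -> false, 100));
    }
}
```

The deadline turns a silent hang into an explicit failure that a test can assert on, at the cost of some polling overhead.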
Comment 17 Kim Moir CLA 2011-04-25 14:50:39 EDT
Michael,   

Can you try changing the debug tests as Markus describes in comment 16? I will then run them again on the Mac and see if this resolves the timeout issue.
Comment 18 Michael Rennie CLA 2011-04-25 17:26:59 EDT
(In reply to comment #17)
> Michael,   
> 
> Can you try changing the debug tests as Markus describes in comment 16 and I
> will try to run them again on the Mac and see if this resolves the timeout
> issue?

Pawel and I will try the suggested changes.
Comment 19 Pawel Piech CLA 2011-04-26 17:08:05 EDT
(In reply to comment #18)
> (In reply to comment #17)
> > Michael,   
> > 
> > Can you try changing the debug tests as Markus describes in comment 16 and I
> > will try to run them again on the Mac and see if this resolves the timeout
> > issue?
> 
> Pawel and I will try the suggested changes.

I updated the tests to use Thread.sleep(0) instead of Display.sleep().  Let's see if this unblocks the tests.
Comment 20 Kim Moir CLA 2011-04-28 10:24:40 EDT
Created attachment 194274 [details]
debug stack trace from 2011-04-27_14-40-09 build

The latest test build on the Mac still had the debug tests time out.

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-mac/ws/ws/2011-04-27_14-40-09/eclipse-testing/results/
Comment 21 Pawel Piech CLA 2011-04-28 11:48:13 EDT
(In reply to comment #20)
> The latest test build on the Mac still had the debug tests time out.

I guess this means that the suggested fix from comment #16 is not sufficient.  Although it seems that other tests in the debug suite that use the same wait loop passed.  

I see no other option but to disable the UI-dependent tests.  Mike, do you agree?
Comment 22 Michael Rennie CLA 2011-04-28 12:07:14 EDT
(In reply to comment #21)
> (In reply to comment #20)
> 
> I see no other option but to disable the UI-dependent tests.  Mike, do you
> agree?

Yes, I agree. We could move them to the 'run locally only' bucket like we do for the eval test suite for the JDT debug tests.
Comment 23 Markus Keller CLA 2011-04-28 12:24:53 EDT
Judging by the new stack traces, it's now blocking in Display#readAndDispatch(). That shouldn't happen.

     [java] java.lang.Exception: Thread-0
     [java] 	at org.eclipse.swt.internal.cocoa.OS.objc_msgSendSuper(Native Method)
     [java] 	at org.eclipse.swt.widgets.Display.applicationNextEventMatchingMask(Display.java:4864)
     [java] 	at org.eclipse.swt.widgets.Display.applicationProc(Display.java:5211)
     [java] 	at org.eclipse.swt.internal.cocoa.OS.objc_msgSend(Native Method)
     [java] 	at org.eclipse.swt.internal.cocoa.NSApplication.nextEventMatchingMask(NSApplication.java:94)
     [java] 	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3596)
     [java] 	at org.eclipe.debug.tests.viewer.model.ContentTests.testLabelUpdatesCompletedOutOfSequence1(ContentTests.java:191)
Comment 24 Pawel Piech CLA 2011-04-28 12:28:54 EDT
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > 
> > I see no other option but to disable the UI-dependent tests.  Mike, do you
> > agree?
> 
> Yes I agree, we could move them to the 'run locally only' bucket like we do for
> the eval test suite for the JDT debug tests.

For now I've moved the UI-dependent tests out of the automated suite.  However, I left the tests which don't depend on the jface viewers but still run the display dispatch loop.  So those tests may still block in readAndDispatch() if there's a problem there.
Comment 25 Markus Keller CLA 2012-05-09 13:21:14 EDT
David, I don't know how the Mac tests are currently running on Hudson, but maybe they need "Run Xvnc during build" checked (comment 8)?
Comment 26 David Williams CLA 2012-05-12 03:22:18 EDT
I think you are right. I would have sworn I'd checked that on all the boxes/jobs, but I had only done so on the Linux ones.
Comment 27 David Williams CLA 2015-02-16 21:21:13 EST
Given the age of this bug, and reading through the comments, especially the last comment, I suspect this was fixed long long ago and have arbitrarily picked "4.2 M7" meaning mostly "fixed during 4.2" (which is my best guess).