Bug 369873 - Display.post test failure in latest build
Summary: Display.post test failure in latest build
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT
Version: 3.8
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 4.3 M3
Assignee: Platform-SWT-Inbox CLA
Blocks: 381873
Reported: 2012-01-26 16:56 EST by Silenio Quarti CLA
Modified: 2012-10-15 08:54 EDT

Attachments
Device manager screenshot (18.67 KB, image/png)
2012-08-31 15:55 EDT, Denis Roy CLA

Description Silenio Quarti CLA 2012-01-26 16:56:20 EST
We are failing in the Display.post test on WinXP. We have tried running this test locally and on the actual test machine and it consistently passes. We are wondering if something has changed in the machine or in how the tests are run.

=======================

junit.framework.AssertionFailedError: null
at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_widgets_Display.test_postLorg_eclipse_swt_widgets_Event(Test_org_eclipse_swt_widgets_Display.java:724)
at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_widgets_Display.runTest(Test_org_eclipse_swt_widgets_Display.java:1175)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:501)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:259)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:36)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:32)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethodWithException(EclipseAppContainer.java:587)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:198)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:352)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:624)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:579)
at org.eclipse.equinox.launcher.Main.run(Main.java:1433)
at org.eclipse.equinox.launcher.Main.main(Main.java:1409)
at org.eclipse.core.launcher.Main.main(Main.java:34)
Comment 1 Markus Keller CLA 2012-08-30 05:58:34 EDT
This test still fails in all branches (master, 4.2.1, 3.8.1) on hudson.eclipse.org, but passes locally.

It looks like the Hudson slave that runs the tests has some setup problems. I don't know anything about Windows remote access or whatever technology Hudson uses to control the slave, but looking at the screenshots I took for bug 379026, maybe the problem is similar to the missing window manager that caused trouble on GTK.

I've added org.eclipse.ui.workbench.texteditor.tests.ScreenshotTest that tries to gather interesting information by taking screenshots, but all screenshots are just 800x600px of light gray (248,248,248).

I'm sure that my screenshot capturing code works fine, since it works locally and it also works on Hudson on Mac and Linux.
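
For illustration only, a blank capture like the one described above (every pixel the same light gray) could be detected automatically. This is a minimal sketch in plain Java and is not taken from ScreenshotTest: the class name BlankScreenshotCheck and both helper methods are hypothetical, and it works on a java.awt.image.BufferedImage rather than on SWT image data.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of SWT or of ScreenshotTest.
public class BlankScreenshotCheck {

    /** Returns true if every pixel has the same color as the top-left pixel. */
    public static boolean isUniform(BufferedImage image) {
        int first = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Builds an in-memory stand-in for a capture: a solid fill of one color. */
    public static BufferedImage solid(int width, int height, int rgb) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, rgb);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        // 800x600 of (248,248,248), the blank result reported in this comment.
        int lightGray = (248 << 16) | (248 << 8) | 248;
        BufferedImage capture = solid(800, 600, lightGray);
        System.out.println("blank capture is uniform: " + isUniform(capture));

        // A single differing pixel is enough to make the check fail.
        capture.setRGB(10, 10, 0x000000);
        System.out.println("after changing one pixel: " + isUniform(capture));
    }
}
```

A check of this kind only says that nothing was drawn; it does not say why the capture came back blank.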
Comment 2 David Williams CLA 2012-08-30 18:27:03 EDT
It's my understanding there is no "window manager" for Windows, and, on Hudson, essentially all apps/tests are running in literally the same Display. I think the failure to get a screen capture might have been because all the cases we tested (sort of by chance) had no workspace open yet?


Seeing the code for the test that fails: 

// Test key events (down/up)
event = new Event();
event.type = SWT.KeyDown;
event.keyCode = -1;  				// bogus key code
assertTrue(display.post(event));	// uses default 0 character
// don't test KeyDown/KeyUp with a character to avoid sending to 
// random window if test shell looses focus

Could it be that Windows 7 doesn't like the "bogus key code"? (Where maybe before, for XP, it didn't care?) 

Could it be, you'd get this effect if there was literally no keyboard (or mouse) attached to that machine? (Though, seems there are similar cases, just not with "bogus key code"?) 

I think this test would be a good one to disable, until if/when someone could sort it out (does not seem to be testing a critical function?)
Comment 3 Markus Keller CLA 2012-08-31 11:04:08 EDT
(In reply to comment #2)
> Its my understanding there is no "window manager" for windows, and, on
> Hudson, essentially all Apps/tests are running in literally the same
> Display. I think the failure to get a screen capture might have been related
> to all the cases we tested (sort of by chance) all had no workspace open
> yet?) 

I'm 100% sure that a workbench window was open in the screenshots I took here:
    download.eclipse.org/eclipse/downloads/drops4/N20120830-2000/logs.php
Search for "ScreenshotTest" and compare the PNGs from other platforms.

> Could it be that windows 7 doesn't like the "bogus key code"? (Where maybe
> before, for XP, it didn't care?) 

No, I added a test that posts valid events (e.g. tries to wiggle the mouse before taking a screenshot):
http://git.eclipse.org/c/platform/eclipse.platform.text.git/tree/org.eclipse.ui.workbench.texteditor.tests/src/org/eclipse/ui/workbench/texteditor/tests/ScreenshotTest.java

testWindowsTaskManagerScreenshots() even presses Ctrl+Shift+Esc on Windows to open the Windows Task Manager, but that also didn't succeed. All display.post(event) calls returned false.

> I think this test would be a good one to disable, until if/when someone
> could sort it out (does not seem to be testing a critical function?)

I agree that this test is not critical, but it shows that something is wrong with the setup of the Windows test environment.

As I said, I don't know how the Windows slave is set up, so I can only speculate on the reasons. Maybe the problem is that Windows is too smart and doesn't run a complete UI session as long as nobody is looking at the screen. I've read that it needs a VNC or RDC client connected to go the full way.
Comment 4 Silenio Quarti CLA 2012-08-31 11:24:24 EDT
(In reply to comment #2)
> Could it be that windows 7 doesn't like the "bogus key code"? (Where maybe
> before, for XP, it didn't care?) 

The test passes on my local Windows 7 machine.

> 
> Could it be, you'd get this effect if there was literally no keyboard (or
> mouse) attached to that machine? (Though, seems there are similar cases,
> just not with "bogus key code"?) 
> 
> I think this test would be a good one to disable, until if/when someone
> could sort it out (does not seem to be testing a critical function?)

I temporarily removed the test that sends a bogus key code, but I suspect it is going to fail in the next test that sends a good key code.

Another reason this could be failing is that we are not allowed to generate fake events due to security. Maybe the connection we have to the machine is restricted somehow.
Comment 5 David Williams CLA 2012-08-31 11:44:11 EDT
> 
> As I said, I don't know how the Windows slave is set up, so I can only
> speculate on the reasons. Maybe the problem is that Windows is too smart and
> doesn't run a complete UI session as long as nobody is looking at the
> screen. I've read that it needs a VNC or RDC client connected to go the full
> way.

Thanks for the extra information. I tend to agree, something isn't quite right. I tried an internet search for "hudson windows slaves UI tests" and didn't find anything current (just lots of old problems doing it :) 

I temporarily had RDP access to the machine to figure out bug 372880 and noticed at that time that my RDP connection would always be dropped after 3 to 15 seconds (so, I had to work quickly! :) I always assumed that "was me", but it might be another sign something isn't working as expected. 

Webmasters, do you have a reference or "how to" for how you set up the Windows slave? Perhaps that would help someone evaluate the setup and/or try to replicate the setup or issue?
Comment 6 David Williams CLA 2012-08-31 15:45:01 EDT
FWIW, that windows slave instance is "virtualized" ... I asked Denis some details and he said "The Windows slave is running as a virtual machine on a SLES 11 server with Xen.  Similar in concept to Windows running on a VMWare host."

As far as I know, that should not affect Eclipse or these tests, in theory, but I can imagine it adds a layer of complication, especially in trying to reproduce on a "similar system" since I doubt we could set that up easily? 

I'm just passing this on as "FYI" in case Silenio or Markus (or, someone else) knows of some specific complications this causes? I myself don't know enough about "virtualization" to be much help, but is perhaps another case where some expert needs to "gain direct access", temporarily, to help diagnose?
Comment 7 Denis Roy CLA 2012-08-31 15:55:56 EDT
Created attachment 220623
Device manager screenshot

> but is perhaps another case
> where some expert needs to "gain direct access", temporarily, to help
> diagnose?

FWIW, "direct access" is via RDP or VNC.  Attaching a screen, keyboard and mouse to the server will give you a Linux prompt.  We actually installed Windows over VNC if memory serves me correctly.

But as far as Windows is concerned, all the hardware it needs is there and is functional.
Comment 8 Markus Keller CLA 2012-08-31 16:58:44 EDT
Yeah, I agree the virtualization shouldn't matter. I think the problem is more in the area of how Hudson starts the test script on the slave. 

If it starts the script like a real Windows user (log in to Windows desktop via RDP or VNC and then start the script), then I have no idea why it would fail.

But if Hudson has a service running on the machine and then launches the test script in a separate user account, then nobody sees the screen of that user session, and Windows may think it can take shortcuts. Is there a way to connect to the session that runs the tests and see the screen? I bet this problem disappears as soon as somebody watches.

Bug 328952 comment 8 sounds like exactly the same problem.
Comment 9 Markus Keller CLA 2012-09-03 09:21:18 EDT
I just reproduced the "false" result from Display#post(..) and the blank screenshot on my Windows 7 machine: I connected via RDP, started a test, and then immediately closed the RDP client (i.e. leaving no screen session connected).

=> The issue is indeed that Windows stops drawing and processing certain UI events when no screen session is open.

Denis: Could another eclipse.org machine open an RDP connection to the Windows slave (using the Hudson slave user) and keep it open?
Comment 10 David Williams CLA 2012-09-04 00:01:15 EDT
(In reply to comment #9)
> I just reproduced the "false" result from Display#post(..) and the blank
> screenshot on my Windows 7 machine: I connected via RDP, started a test, and
> then immediately closed the RDP client (i.e. leaving no screen session
> connected).
> 
> => The issue is indeed that Windows stops drawing and processing certain UI
> events when no screen session is open.
> 
> Denis: Could another eclipse.org machine open an RDP connection to the
> Windows slave (using the Hudson slave user) and keep it open?

Thanks for the concrete investigation. But for my own clarity, is this the case because someone logs in to the Windows machine with an ID other than "hudsonbuild", starts the Hudson slave, and then Hudson runs tests under the user hudsonbuild? I'm just wondering whether, if it all started with webmaster logging in as hudsonbuild and then starting the slave agent, the Display would be "active" when "hudsonbuild" ran a test? Or is this unrelated to who starts the agent?
Comment 11 Markus Keller CLA 2012-09-04 06:02:46 EDT
I did not experiment with a second user, but I don't think that matters.

The important point in my scenario that I didn't write down explicitly is that when I connect via RDP, then my screen session on the real machine is automatically closed, and I see the Windows "Press CTRL+ALT+DELETE..." screen. My login session (running programs) is still open, but I don't see it any more. When I log in again on the real machine, the session resumes and the RDP connection is closed.

(In reply to comment #10)
I guess the scenario on the Hudson slave is a bit different but ends up in the same state, namely that a user is logged in and runs tests, but no (virtual or real) screen is connected. Under Hudson, the slave starts the tests without ever opening a screen session; in my case, I start the tests and then disconnect.

Note that just locking the screen on a real machine (Ctrl+Alt+Delete, Enter) does not have the same effect as stealing the connection via RDP. With a local screen lock, the local screen session is still there, but via RDP, it's gone.
Comment 12 David Williams CLA 2012-09-04 15:33:20 EDT
FWIW, see bug 388710. The Windows slave was restarted today, our I-build tests are running and Denis "sees all kinds of Eclipse stuff going on". We'll see how tests compare with last week. [If different, I'd say it depends on "how started". If same, we may have to arrange a time and place to run one specific test while our remote agent (i.e. Denis or Matt :) watches the screen :)] 

If you are patient (or, is it impatient? :) you can check directly on Hudson under workspace results, but hard to know when relevant test will run so have to check back frequently. 

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/JUnit-win2/ws/workarea/I20120904-0800/eclipse-testing/results/html/
Comment 13 David Williams CLA 2012-09-05 00:37:45 EDT
The test results from today's I build are at 

http://download.eclipse.org/eclipse/downloads/drops4/I20120904-0800/testResults.php

and it seems there were no SWT failures (on Windows). I know you were going to disable that one "post event" test, but apparently none of the others failed? 

Markus, are there any "screen shots" built in that you can check, or do we have to run the tests with a really short time-out to test that function again? 

In any case, if things are better ... we need to make sure we know how Denis started the machine and how "left it running"? (e.g. assuming we can't have screen saver, or "blank the screen" to save power?) [I'm guessing, just hoping we can find a path that works].
Comment 14 Markus Keller CLA 2012-09-05 05:48:05 EDT
The last test run was a full success. SWT's test_postLorg_eclipse_swt_widgets_Event was not disabled and it passed for the first time on Hudson!

I kept ScreenshotTest.testWindowsTaskManagerScreenshots() in org.eclipse.ui.workbench.texteditor.tests, and this one also succeeded, see
http://download.eclipse.org/eclipse/downloads/drops4/I20120904-0800/testresults/win32.win32.x86_7.0/org.eclipse.ui.workbench.texteditor.tests.ScreenshotTest.testWindowsTaskManagerScreenshots1.png and
http://download.eclipse.org/eclipse/downloads/drops4/I20120904-0800/testresults/win32.win32.x86_7.0/org.eclipse.ui.workbench.texteditor.tests.ScreenshotTest.testWindowsTaskManagerScreenshots2.png

=> Screenshots worked, and my Display.post(..) events that pressed Ctrl+Shift+Esc also worked.

> In any case, if things are better ... we need to make sure we know how Denis
> started the machine and how "left it running"? (e.g. assuming we can't have
> screen saver, or "blank the screen" to save power?) [I'm guessing, just
> hoping we can find a path that works].

+1. I just tested the behaviors when the screensaver runs and when I lock the screen (Win+L): In both cases, Display.post(..) returns false (does *not* succeed), but the GC.copyArea(..) I use to take screenshots still works.
Comment 15 Denis Roy CLA 2012-09-05 06:38:31 EDT
> The test results from today's I build are at 
> 
> http://download.eclipse.org/eclipse/downloads/drops4/I20120904-0800/
> testResults.php
> 
> and seemed there were no SWT failures (on windows).


I had left my RDC connection open all night, and I could see Eclipse and all the tests run.

I have just now disconnected my RDC client, but as usual, the hudson build user is logged in.
Comment 16 David Williams CLA 2012-09-05 21:49:49 EDT
Seems there is (and was) a post event failure on 3.8 builds, 

test_postLorg_eclipse_swt_widgets_Event

download.eclipse.org/eclipse/downloads/drops/M20120829-1000/testresults/html/org.eclipse.swt.tests_win32.win32.x86_7.0.html

So, I think the next question is a) are we sure there is no "screen saver" or "power saver" set on the Windows display? (The screen capture tests have not run yet, to compare with Markus's experiments). 

And/or, we need to find a way to leave RDP (RDC?) connection open all the time? to avoid any doubt? 
[I'd offer, but as I previously mentioned, even when I could, the connection would stay open only a few seconds, so I suspect something "closer" to the real box is needed?]
Comment 17 Denis Roy CLA 2012-09-05 21:57:01 EDT
(In reply to comment #16)
> Seems there is (and was) a post event failure on 3.8 builds, 
> 
> test_postLorg_eclipse_swt_widgets_Event

Did that failure occur while I had the RDP connection open or after I had closed it?



> So, I think next question is a) are we sure there is no "screen saver" or
> "power saver" set on the Windows display?

I have not seen any in the 12-or-so hours I had an RDP connection opened.


> And/or, we need to find a way to leave RDP (RDC?) connection open all the
> time? to avoid any doubt?

I'm willing to investigate a solution if we can conclude with some certainty that an opened RDP connection is required.
Comment 18 David Williams CLA 2012-09-05 22:13:05 EDT
> 
> I'm willing to investigate a solution if we can conclude with some certainty
> that an opened RDP connection is required.

Assuming you leave it closed :) we should know for sure Thursday afternoon, once all (same) tests run (again). 

Thanks Denis.
Comment 19 David Williams CLA 2012-09-06 15:28:32 EDT
> Assuming you leave it closed :) we should know for sure Thursday afternoon,
> once all (same) tests run (again). 

I was wrong, we won't know until Friday (the important test(s) run only in our nightly and I builds.)
Comment 20 Denis Roy CLA 2012-09-06 16:04:43 EDT
> I was wrong, we won't know until Friday (the important test(s) run only in
> our nightly and I builds.)

How will we know Friday?  If the tests have been failing, wouldn't we want to try keeping the session opened?
Comment 21 David Williams CLA 2012-09-06 16:20:00 EDT
(In reply to comment #20)
> > I was wrong, we won't know until Friday (the important test(s) run only in
> > our nightly and I builds.)
> 
> How will we know Friday?  If the tests have been failing, wouldn't we want
> to try keeping the session opened?

I just meant we wouldn't be able to "prove" it was due to the session being closed until we ran the tests again, which will be tonight (with the session closed). 

At this point, I think it is either a) we need the session, period. Or b) the session prevents "screen saver" from coming on or "power saver" shutting down the display. 

Our preference would be just to have the RDP open/connected/visible all the time and stop worrying about it ... but ... thought you wanted more "proof" ... more "certainty that an opened RDP connection is required"
Comment 22 Denis Roy CLA 2012-09-06 16:31:03 EDT
I am confused.  So everything worked well when the RDC session stayed open?  Is it just the one test that is failing (out of 5000+)?

I ask because maintaining an open RDC session permanently may be difficult.
Comment 23 David Williams CLA 2012-09-06 17:00:42 EDT
(In reply to comment #22)
> I am confused.  So everything worked well when the RDC session stayed open? 

Yes, everything worked well for the first time ever on build.eclipse.org when the connection stayed open (for these several cases ... there are still unrelated problems on build.eclipse.org). 

> Is it just the one test that is failing (out of 5000+)?

Several tests fail outright (3 or 5?), but, the way they are set up, once one fails, it stops trying to do others in that particular "test case", so I'm not sure what the exact count is. It might be hundreds (not thousands) ... just to give an order of magnitude, I'm guessing. 

> I ask because maintaining an open RDC session permanently may be difficult.

If only we had an X-server for windows :)
Comment 24 Denis Roy CLA 2012-09-06 17:04:36 EDT
> Yes, everything worked well for first time ever on  build.eclipse.org when
> connection stayed open (of these several cases ... still unrelated problems
> on build.eclipse.org. 

Ah.


> 
> > Is it just the one test that is failing (out of 5000+)?
> 
> Several tests fail outright 3 or 5?

Ah.


> > I ask because maintaining an open RDC session permanently may be difficult.
> 
> If only we had an X-server for windows :)

We can install something like the free version of Xming?
Comment 25 Markus Keller CLA 2012-09-06 17:20:17 EDT
Yes, the open RDC session fixes two problems that only show up on Hudson.

I am already convinced that the RDC session was the fix, and I don't need any further proof. I reproduced this locally (comment 9) and it was also shown on Hudson (comment 14). The failure mentioned in comment 16 occurred after comment 15 was posted (I checked the timestamps). I don't think a screensaver or other power saver is currently an issue.

The two problems are:

a) Display.post(..) always returns false:
This indeed only affects one test at this time. But that's because many SWT tests that would use this are disabled (SwtTestCase.fTestConsistency = false), and because we currently don't run the performance tests, which heavily depend on this feature.

Display.post(..) is the API that allows for automated GUI tests that simulate real keyboard and mouse events.

b) GC.copyArea(..) only captures a light gray instead of a full screenshot.
The screenshots proved helpful in tracking down problems that could not be reproduced locally.


> I ask because maintaining an open RDC session permanently may be difficult.

It could probably also be VNC if that's easier. We just need something that convinces Windows that somebody could see the screen. The RDP server is apparently too smart for us. But running a stupid VNC server could already be enough.
Comment 26 David Williams CLA 2012-09-06 19:53:15 EDT
> 
> We can install something like the free version of Xming?

Not familiar with xming, and sounds very putty oriented? 
I'm more familiar with x11vnc and in fact think it's already in use on the Mac. Might check if it's on Windows already? Not sure if it "just running" is enough, and I doubt that Hudson is integrated (with any Windows VNC, from the little bit I've read), but maybe there is some way we can "connect" at the beginning of our tests (and "disconnect" when done)? But, not sure if "connecting to VNC from a headless build" fools it into thinking someone is looking at the screen. But maybe. [And, not sure why we couldn't do that with RDP, if we could do it with VNC.] 


(And, of course, I know nothing of security issues it involves ... guess could do cert-only SSH connection from build ID only from internal network?)
Comment 27 Denis Roy CLA 2012-09-18 14:08:44 EDT
I've got an open VNC connection to the Windows slave from the host server itself.  This is suitable for a permanent solution if it works.
Comment 28 Denis Roy CLA 2012-09-18 14:17:10 EDT
> We just need something that convinces Windows that somebody could see the screen.

While the VNC connection was opened, Matt and I connected as "Administrator" via RDP.  Windows told us that someone else was indeed logged on, and that if we continued, that user would be logged off.

We canceled, but I think Windows is convinced that someone can see the screen.
Comment 29 Markus Keller CLA 2012-09-18 14:29:13 EDT
Sounds very promising, thanks!

The result can be seen on the Console Output Logs page for I-builds: Search for testWindowsTaskManagerScreenshots2.png on that page. Here, it's still empty:
http://download.eclipse.org/eclipse/downloads/drops4/I20120917-0800/logs.php
On success, it would show a shiny Windows Task Manager.
Comment 30 Denis Roy CLA 2012-09-19 14:00:34 EDT
> On success, it would show a shiny Windows Task manager.

I guess we are not successful?  All I see is a white screen.
Comment 31 Markus Keller CLA 2012-09-19 16:44:24 EDT
On the contrary: The VNC solution is a full success!

My link was to I20120917-0800, which happened before you opened the VNC connection.

The Windows results for the next build (I20120919-0330) are not yet available on the download page, but I see a good screenshot when looking at Hudson right now (note that this link will expire as soon as the next test run starts):
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/JUnit-win2/ws/workarea/I20120919-0330/eclipse-testing/results/win32.win32.x86_7.0/org.eclipse.ui.workbench.texteditor.tests.ScreenshotTest.testWindowsTaskManagerScreenshots2.png
Comment 32 Markus Keller CLA 2012-10-15 06:23:33 EDT
(In reply to comment #27)
> I've got an open VNC connection to the Windows slave from the host server
> itself.  This is suitable for a permanent solution if it works.

This makes the tests pass, closing this bug.
Comment 33 David Williams CLA 2012-10-15 08:54:57 EDT
(In reply to comment #32)
> (In reply to comment #27)
> > I've got an open VNC connection to the Windows slave from the host server
> > itself.  This is suitable for a permanent solution if it works.
> 
> This makes the tests pass, closing this bug.

It turns out that having a VNC connection open on one of these test boxes appears to slow down the whole machine considerably. See bug 389857 comment 57. 
We may have to reconsider this VNC connection requirement if it is done primarily for this one test.