Bug 401307 - Test failures in I20130219-1600 due to missing window manager
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.3
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: 4.3 M6
Assignee: David Williams CLA
Reported: 2013-02-20 08:08 EST by Dani Megert CLA
Modified: 2013-05-24 09:26 EDT
Description Dani Megert CLA 2013-02-20 08:08:34 EST
The window manager is missing or died.
Comment 2 Markus Keller CLA 2013-02-20 08:27:02 EST
http://download.eclipse.org/eclipse/downloads/drops4/I20130219-1600/testresults/consolelogs/linux.gtk.x86_64_6.0_consolelog.txt says:

---------------------------------------------------------------------------
Check if any window managers are running (xfwm|twm|metacity|beryl|fluxbox|compiz):
Window Manager processes: wtpBuild 25007     1  0 Feb11 ?        00:00:09 metacity --display=:9 --replace --sm-disable
55011    29040     1  0 12:14 ?        00:00:01 metacity --replace --sm-disable

Existing window manager found running, so did not force start of metacity

Current metacity processes running (check for accumulation):
wtpBuild 25007     1  0 Feb11 ?        00:00:09 metacity --display=:9 --replace --sm-disable
55011    29040     1  0 12:14 ?        00:00:01 metacity --replace --sm-disable
---------------------------------------------------------------------------

But later, there are console entries like this:

Xlib:  extension "RANDR" missing on display ":100.0".


Maybe you were running tests in parallel on the same machine but different displays (:9 vs. :100)? Does the check at the beginning of the test script ensure the window manager process is running for the right display? If the test script doesn't start a window manager because one is running on another display, then our tests can be executed without a window manager.
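
One way to make the start-up check display-aware is to match the --display argument in the process listing rather than only the process name. A minimal sketch follows. The function name "wm_running_for_display" is hypothetical, and the approach assumes the window manager was launched with an explicit --display=<value> argument. A window manager that inherited its display from the environment (like the second metacity process in the log above) would not be detected this way.

```shell
#!/bin/sh
# Hypothetical helper (not part of the real test script).
# Reads a "ps -ef" style listing on stdin and succeeds only if a known
# window manager was started with --display=<display given as $1>.
wm_running_for_display() {
    awk -v want="--display=$1" '
        /xfwm|twm|metacity|beryl|fluxbox|compiz/ {
            for (i = 1; i <= NF; i++)
                if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (assumes DISPLAY is set by Xvnc or xvfb):
#   if ps -ef | wm_running_for_display "$DISPLAY"; then
#       echo "window manager already running on $DISPLAY"
#   fi
```

The comparison is an exact string match, so ":9" and ":9.0" are treated as different displays. A real script would probably need to normalize the display name first.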
Comment 3 David Williams CLA 2013-02-20 15:02:48 EST
> But later, there are console entries like this:
> 
> Xlib:  extension "RANDR" missing on display ":100.0".
> 
> 
> Maybe you were running tests in parallel on the same machine but different
> displays (:9 vs. :100)? Does the check at the beginning of the test script
> ensure the window manager process is running for the right display? If the
> test script doesn't start a window manager because one is running on another
> display, then our tests can be executed without a window manager.

The check will start metacity if no window manager is found, but that logic is really just a holdover from when I used to run UI tests using only xvfb. With xvfb, you did have to start it on a specific display and start your own window manager.

Xvnc, which Hudson uses, is supposed to manage all that for you, drawing from a pool of displays. I imagine if we tracked it, we'd see a different "display" number each time.

So, my guess is, it just died this run, for some reason? 

Do you want me to re-run those linux tests? It should simply "replace" the test results from that I20130219-1600 build.
Comment 4 David Williams CLA 2013-02-20 15:44:35 EST
> 
> Do you want me to re-run those linux tests? It should simply "replace" the
> test results from that I20130219-1600 build.

Since we want a "final PDE build" to be relatively clean, I have restarted those tests. So, if you specifically do NOT want the current test results replaced, be sure to say so in the next 4 or 5 hours!
Comment 5 Markus Keller CLA 2013-02-20 16:12:05 EST
Either way is fine for me.

I don't know whether other builds are running on the same machine, nor whether you can control that. I still assume this is the root of the problem.

For all other recent builds (I, N, CBI), the GTK log said:
"No window managers processes found running, so will start metacity"

So it looks like it's still necessary to start a window manager manually. Maybe runtests.sh should use "wmctrl -m" to see if a window manager is running. But I don't know whether that also lists all running WMs or just the one for the current $DISPLAY.
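
As a rough sketch of that idea: wmctrl connects to the X server named by $DISPLAY, so "wmctrl -m" should describe only the window manager on the current display. The sketch below assumes wmctrl is installed and that it exits with a non-zero status when no window manager answers; both assumptions should be verified on the build machine. The function name and the WM_QUERY_CMD override are hypothetical and exist only so the check can be tried without wmctrl.

```shell
#!/bin/sh
# Hypothetical check: succeed if a window manager answers on the current $DISPLAY.
# WM_QUERY_CMD is an illustrative override; by default "wmctrl -m" is run.
wm_present_on_display() {
    # Intentionally unquoted so "wmctrl -m" splits into command and argument.
    ${WM_QUERY_CMD:-wmctrl -m} >/dev/null 2>&1
}

# Usage:
#   if ! wm_present_on_display; then
#       metacity --display="$DISPLAY" --sm-disable &
#   fi
```

If wmctrl is not installed, the command lookup itself fails, so this check would report "no window manager" and the script would start one.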
Comment 6 Dani Megert CLA 2013-02-21 09:58:16 EST
(In reply to comment #4)
> > 
> > Do you want me to re-run those linux tests? It should simply "replace" the
> > test results from that I20130219-1600 build.
> 
> Since we want a "final PDE build" to be relatively clean, I have restarted
> those tests. So, if you specifically do NOT want the current test results
> replaced, be sure to say so in the next 4 or 5 hours!

Did they run? The tests are still indicated as failed.
Comment 7 David Williams CLA 2013-02-21 10:21:24 EST
(In reply to comment #6)
> (In reply to comment #4)
> > > 
> > > Do you want me to re-run those linux tests? It should simply "replace" the
> > > test results from that I20130219-1600 build.
> > 
> > Since we want a "final PDE build" to be relatively clean, I have restarted
> > those tests. So, if you specifically do NOT want the current test results
> > replaced, be sure to say so in the next 4 or 5 hours!
> 
> Did they run? The tests are still indicated as failed.

They ran, but did not finish before "build.eclipse.org" (and Hudson) started having trouble due to hardware failure. I plan to try again later today, but unless you say otherwise, I don't plan to delay moving to CBI builds.
Comment 8 David Williams CLA 2013-02-21 10:24:54 EST
(In reply to comment #5)

> 
> So it looks like it's still necessary to start a window manager manually.
> Maybe runtests.sh should use "wmctrl -m" to see if a window manager is
> running. But I don't know whether that also lists all running WMs or just
> the one for the current $DISPLAY.

wmctrl looks interesting and potentially useful, but doesn't seem to be part of "standard installs". So, I hate to ask for it to be installed on each Hudson, unless this turns out to be a frequent problem.
Comment 9 David Williams CLA 2013-02-23 23:21:57 EST
From the number of failures, I see that build I20130222-2000 had a similar problem.

http://download.eclipse.org/eclipse/downloads/drops4/I20130222-2000/testResults.php

For the next build, I changed the slave from '2' to '1' to see if it makes a difference ... but a) that machine seems very slow, and b) its initial log indicated a window manager was "already running", so I'd expect it to be a problem again.

I think I'll try two things. I guess in two steps, just to be systematic. 

First, I'll put a simple 'env' in that file so all environment variables are captured in the log. Maybe that'll give some insight.

Second, what I think might solve it is not to "check if one is running", but to simply always call metacity, without the --replace option. That way, if there really is one running, the metacity call should fail saying there already is one for that display. Otherwise it will start one up.

I suppose the only case that won't solve is if there actually is a window manager running for our DISPLAY, such as twm, and it is just that twm is inadequate for our tests. If that is the case, then we will need to "replace", in which case a check of the environment would help (to make sure we have a DISPLAY), and perhaps we could even let metacity fail once and, if it fails, then call it with --replace.

Besides removing the "if" logic, the call would become
metacity --display=$DISPLAY --sm-disable  &

I'm sure "current display" is default, I'm just including it so it shows up in later ps query. 
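
The "always start, fall back to --replace" idea could be sketched as follows. This is only an illustration, not the change that was committed: the function name, the WM_CMD and WM_SETTLE_SECS overrides, and the WM_MODE result variable are all hypothetical, and the two-second settle time is a guess. It assumes metacity exits promptly when another window manager already owns the display.

```shell
#!/bin/sh
# Hypothetical sketch: start a window manager without --replace first,
# and retry with --replace only if the first attempt exits early.
start_wm() {
    wm="${WM_CMD:-metacity}"          # WM_CMD is an illustrative override
    if [ -z "$DISPLAY" ]; then
        echo "DISPLAY is not set; cannot start $wm" >&2
        return 1
    fi

    "$wm" --display="$DISPLAY" --sm-disable &
    wm_pid=$!
    # Give the window manager a moment to either settle or give up.
    sleep "${WM_SETTLE_SECS:-2}"

    if kill -0 "$wm_pid" 2>/dev/null; then
        WM_MODE=fresh
        echo "Started $wm (pid $wm_pid) on $DISPLAY"
        return 0
    fi

    echo "$wm did not stay up on $DISPLAY; retrying with --replace"
    "$wm" --display="$DISPLAY" --replace --sm-disable &
    wm_pid=$!
    WM_MODE=replace
}
```

Checking the process with "kill -0" after a short pause is a heuristic. A slow machine might need a longer settle time, and the sketch does not confirm that the --replace attempt succeeded.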

Oh, I also turned on "capture screen at end", in Hudson. That only stays on build machine (currently) but might give some insight if there is something funny about our exiting state ... such as, I wonder if "we" are leaving a window open, even after our tests complete? 

Comments welcome. 

The "env" log should be in 0224-2000 build.
Comment 10 David Williams CLA 2013-02-23 23:34:49 EST
Meant to mention ... I do see these failures on my home setup too, but apparently not every time (I haven't studied that systematically). My "env" is fairly different from build.eclipse.org (Hudson 3.0, Ubuntu 12.04), and from the console log, I think the "check if already running" logic is simply the wrong approach for an Xvnc environment on Hudson. (It's obviously listing my "desktop", but the tests do not run on my desktop ... I never see them :) -- unless I turn off Xvnc.) If I have time, I'll try the new logic on my home system, and if it seems to work, then put in the fixes for Sunday night's build.

= = = = 

Check if any window managers are running (xfwm|twm|metacity|beryl|fluxbox|compiz):
Window Manager processes: davidw    3334  3262  0 02:32 ?        00:11:08 compiz
davidw    3456  3334  0 02:32 ?        00:00:00 /bin/sh -c /usr/bin/compiz-decorator

Existing window manager found running, so did not force start of metacity

Current metacity processes running (check for accumulation):

Triple check if any window managers are running (at least metacity should be!):
Window Manager processes: davidw    3334  3262  0 02:32 ?        00:11:08 compiz
davidw    3456  3334  0 02:32 ?        00:00:00 /bin/sh -c /usr/bin/compiz-decorator
Comment 11 David Williams CLA 2013-02-24 15:03:33 EST
The changes I tried didn't help the tests (on my local machine) ... 

http://git.eclipse.org/c/platform/eclipse.platform.releng.aggregator.git/commit/?id=bb5ca7de83ecf9e2b3355c533aeec1194b5329a8

But might help the diagnosis on Hudson. (And ... might even help there?) 

The first try, just listing the variables and taking a screenshot at the end, didn't change the test results much (as expected), but the final screenshot showed a "pure console" with a message about "not being able to run the screen saver". I thought that might be a clue, so I turned off the screen saver completely.

The next run, where I also started a WM without --replace, did in fact start one (i.e. none were running) ... but it still had the same failures. This time, though, the "final screenshot" had a clear "desktop" in the image, with a modal dialog about "you need to enter your password for keyring" (often the case for "VNC sessions"). In theory, that might have been "left over" from before, when trying to unlock the screen saver or something. In both runs, it was using "Display :10" on my local system.

So ... I'll leave the "runTests" script as is, and see if we get better diagnostics, if nothing else. (I do think it's "more correct" the way it is.)
Note that this "runTests" script is only the one we use on the production machine, not the one that's "shipped" in the test framework zip.
I also recalled why I wrote code like this to begin with ... In the past, when running tests on my own machine, not using Xvnc or xvfb, the window manager would always be replaced, even if I already had one, like "unity", running. So even if I could then see the tests run ... it'd "mess up" my desktop until I could restart.

Hope this analysis helps. 
Advice welcome.
Comment 12 David Williams CLA 2013-02-25 03:02:43 EST
GRRRR ... I forgot, Hudson slaves do not support "take snapshot at end"  (bug 389378) and Hudson has the most unfriendly reaction to "fail the build"! (bug 389451).

At this time of night, it's easier for me to restart with that turned off, rather than "tweeze out" the test results by hand, but there were 191 fewer failures, so I assume simply starting a WM (without --replace) helped.

You might peek at Hudson's overall log. 
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-lin64/505/consoleFull

With so many "gnome errors", is there a better window manager to use? (In my experience, though, these are pretty common ... usually not so many ... but then again, I'm not running 1000's of windows :)

You can see our normal "console log" directly on Hudson, 

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-lin64/505/artifact/workarea/I20130224-2000/eclipse-testing/results/consolelogs/linux.gtk.x86_64_6.0_consolelog.txt

And, even the test results
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-lin64/505/testReport/

If you want a "quick peek" before the next run completes and is published in familiar summary form.
Comment 13 Dani Megert CLA 2013-02-26 11:03:32 EST
Looks good on latest CBI build. Closing for now. If it happens again, we can find a more durable solution.
Comment 14 David Williams CLA 2013-02-26 17:02:28 EST
For the record, about that problem I was seeing on my local machine (running its own version of Hudson): the default install of Hudson (all I ever use :) "automatically" finds my ordinary VNC, which, while configured to be accessible only from localhost (no pw), is configured to start my normal desktop. That is why the "key ring password" was being required.

Just so I better know what to look for in future.
Comment 15 David Williams CLA 2013-03-09 14:36:06 EST
FYI, for the 0309-1500 build I added a "bit bucket pipe" for metacity, in hopes it will prevent the many (thousands of) warnings from being written to the Hudson logs.

http://git.eclipse.org/c/platform/eclipse.platform.releng.aggregator.git/commit/?id=bdd50da3b1e3cb00b9c7ab65bf88ade6f5d59c0a
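
The linked commit is not reproduced here. As a hedged illustration, a "bit bucket" redirect for a background metacity start usually looks like the sketch below, which discards both stdout and stderr. Whether the actual change discards both streams is an assumption; the function name and the WM_CMD override are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: start the window manager in the background and
# discard everything it prints, so warnings never reach the build log.
start_wm_quietly() {
    # WM_CMD is an illustrative override; metacity is the default.
    "${WM_CMD:-metacity}" --display="$DISPLAY" --sm-disable >/dev/null 2>&1 &
}
```

The trade-off is that genuine start-up errors are discarded along with the warnings, so a failed start would have to be detected some other way, for example by checking the process afterwards.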