Bug 450388

Summary: Different executors on same slave can share the same display
Product: [Technology] Hudson
Component: Plugins
Version: 3.0.0
Status: RESOLVED FIXED
Severity: major
Priority: P3
Target Milestone: ---
Hardware: All
OS: Linux
Reporter: Bob Foster <bobfoster>
Assignee: Bob Foster <bobfoster>
QA Contact: Latha Amujuri <lamujuri>
CC: bobfoster, david_williams, eclipse.org, lamujuri, lidiam, malaperle, mygwaymark, rovarghe
Whiteboard:
Attachments:
  xvnc plugin with display allocator per node (flags: none)
  Xvnc plugin with private transient final bug fix (flags: none)

Description Bob Foster CLA 2014-11-06 13:33:25 EST
From xxx:

I noticed something interesting while looking at the console logs that could explain some of our build failures. It seems that two Hudson executors can sometimes share the same display, which looks like a bug in the Hudson Xvnc plugin. For example:

https://hudson.eclipse.org/linuxtools/job/linuxtools-gerrit/9140/consoleFull
and
https://hudson.eclipse.org/linuxtools/job/linuxtools-gerrit/9141/consoleFull

They were running at the same time. In build #9140, the display was :76
...
[workspace_3] $ Xvnc :76 -geometry 1024x768 -depth 24 -ac
...

In build #9141, the same command is executed but fails with an error:

[workspace] $ Xvnc :76 -geometry 1024x768 -depth 24 -ac
Fatal server error:
Server is already active for display 76
If this server is no longer running, remove /tmp/.X76-lock
and start again.

At the end of both console logs, I can see that both builds ended up using the same display:
Xlib: extension "RANDR" missing on display ":76.0".

Also, I have found a somewhat similar bug report in Jenkins here: https://issues.jenkins-ci.org/browse/JENKINS-17550
With the fix here: jenkinsci/xvnc-plugin@6bdd609

Would it be possible/acceptable to port the fix from the Jenkins version of the plugin?
Comment 1 Bob Foster CLA 2014-11-06 13:35:33 EST
Oops, xxx should be Marc-Andre Laperle. Sorry. 

Further comment from Marc-Andre Laperle:

I also noticed there was a subsequent fix related to this problem:
jenkinsci/xvnc-plugin@3627859
Comment 2 Bob Foster CLA 2014-11-06 13:44:47 EST
Created attachment 248474 [details]
xvnc plugin with display allocator per node
Comment 3 Bob Foster CLA 2014-11-06 13:45:52 EST
The fix is to keep a static map of node to DisplayAllocator, so that active xvnc sessions on the same node never get assigned the same display number.

This one might be tricky to test. In the previous code, display numbers were randomly assigned from the allowable range. Rarely, it could happen that two executors on the same node drew the same random number while they were both building.
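
A minimal sketch of the per-node idea described above, not the plugin's actual code. The DisplayAllocator name comes from this bug, but the method names, the string node key, the lowest-free-number strategy, and the NodeAllocators helper are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical allocator: tracks the display numbers currently in use on one node. */
final class DisplayAllocator {
    private final Set<Integer> inUse = new HashSet<>();

    /** Returns the lowest free display in [min, max]; never hands out one that is in use. */
    synchronized int allocate(int min, int max) {
        for (int n = min; n <= max; n++) {
            if (inUse.add(n)) {
                return n;
            }
        }
        throw new IllegalStateException("No free display in " + min + ".." + max);
    }

    /** Releases a display once the Xvnc session that used it has ended. */
    synchronized void free(int display) {
        inUse.remove(display);
    }
}

/** Hypothetical static registry: every executor on a node shares that node's allocator. */
final class NodeAllocators {
    private static final Map<String, DisplayAllocator> BY_NODE = new HashMap<>();

    private NodeAllocators() {}

    static synchronized DisplayAllocator forNode(String nodeName) {
        return BY_NODE.computeIfAbsent(nodeName, k -> new DisplayAllocator());
    }
}
```

In this sketch, two executors that ask for NodeAllocators.forNode("slave-1") get the same allocator, so concurrent builds on that node cannot draw the same number. With the minimum and maximum set to the same value, a second concurrent allocate call throws instead of reusing the display; how the real plugin reacts when the range is exhausted is not stated in this bug.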
Comment 4 Marc-André Laperle CLA 2014-11-10 11:28:29 EST
Hi Bob. What are the next steps? Would you like me to help by testing the new plugin?
Comment 5 Bob Foster CLA 2014-11-10 12:24:47 EST
Marc, please do. The more eyes the better.

Latha, it should be easy to test the end case. Simply set the minimum and maximum display number to the same value, e.g., 10, and build two different xvnc-using jobs at the same time on the same slave or master.
Comment 6 Bob Foster CLA 2014-11-10 13:38:39 EST
Latha:

It's looking good in that case.

I just gave it a try after setting min and max display numbers to the same value.
Comment 7 Bob Foster CLA 2014-11-10 14:24:47 EST
Released as 1.13-h-2.
Comment 8 David Williams CLA 2014-12-22 17:05:50 EST
Mind if I ask, where is the source for this plugin? 

I looked "on eclipse.org", in 
http://git.eclipse.org/c/hudson/
But it didn't seem to be there? 

I ask because on an Eclipse.org Hudson instance, I "suddenly" started to get this error (see bug 455161 for long rambling issues). 

FATAL: null
java.lang.NullPointerException
	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:83)
	at hudson.plugins.xvnc.Xvnc.setUp(Xvnc.java:73)
	at hudson.model.Build$RunnerImpl.doRun(Build.java:129)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:524)
	at hudson.model.Run.run(Run.java:1493)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:82)
	at hudson.model.Executor.run(Executor.java:137)

At first I assumed some "config error" or "corruption" due to having "two instances running", but then, lo and behold, I started to get the same error on my home test machine (after a reboot of the whole machine).

So ... I just wanted to see the source, to better know what to look at to debug the issue. One thing that's common between the two setups (the perf1test machine at eclipse.org, and my home test machine) is that neither has "slaves", only "master". But, obviously, it could be many other things as well.
Home system is Hudson 3.2.1, and perf1test at eclipse.org is 3.1.2 -- but, I *think* another thing in common is that I had recently updated both (using "auto update") to update the Xvnc plugin to 1.13-h-2.

Thanks for any pointers (to source, and/or "how to debug").
Comment 9 David Williams CLA 2014-12-23 17:04:09 EST
I am re-opening, because I think the fix is "incomplete" in some way -- perhaps only for "master only" installations?

The NPE I mentioned in comment 8 "went away" if I downgraded back to Xvnc plugin level 1.13-h-1.
Comment 10 David Williams CLA 2014-12-23 17:15:39 EST
I wanted to mention, too, that after I back leveled (and restarted everything) I got a message about "junk" in the config, and it mentioned

hudson.model.Hudson CannotResolveClassException: hudson.plugins.xvnc.DisplayAllocator$Property

I suspect this is from the 1.13-h-2 version, but I am surprised plugins do not "clean up" after themselves (if that's the right word), which in turn makes me wonder if the 1.13-h-2 version updates itself properly. Perhaps the bug leading to an NPE is in the "update" process/code, rather than the plugin code, per se?
Comment 11 Bob Foster CLA 2014-12-24 02:43:45 EST
The source for the plugin is at https://github.com/hudson3-plugins/xvnc-plugin

Seriously, this should be fixed.
Comment 12 David Williams CLA 2014-12-24 04:08:54 EST
(In reply to Bob Foster from comment #11)
> The source for the plugin is at
> https://github.com/hudson3-plugins/xvnc-plugin
> 
> Seriously, this should be fixed.

"should" be? Was it tested on a Hudson with "master only"? 

Any ideas what else might lead to the NPE reported above after updating to this version? And to the NPE going away after down-leveling?

Just askin'.
Comment 13 Bob Foster CLA 2014-12-24 12:48:16 EST
It isn't at all obvious to me how an NPE on line 83 of Xvnc in the current source is possible. Nor that it has anything at all to do with "master only".

Please do look at the source.
Comment 14 Bob Foster CLA 2014-12-24 13:03:50 EST
Looking again at this class:

    /*package*/ static final class Property extends NodeProperty<Node> {
        private transient final DisplayAllocator allocator = new DisplayAllocator();
        /*package*/ DisplayAllocator getAllocator() {
            return allocator;
        }
    }

"private transient final" is a bad combination. If this were saved and restored by XStream, allocator would be null. And sure enough this bug is fixed in latest Jenkins version.
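
To illustrate the point, here is a hedged sketch that uses plain java.io serialization as a stand-in for XStream (an assumption: XStream is not used here, but the effect on a transient field is analogous because the field initializer is not re-run on restore). All class names below are hypothetical, and the lazy getter is only one possible repair, not necessarily the one used in the Jenkins or Hudson fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for the real allocator; it is never serialized itself. */
final class Allocator {}

/** Same shape as the problematic class: the initializer is skipped when restoring. */
final class BrokenProperty implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient final Allocator allocator = new Allocator();

    Allocator getAllocator() {
        return allocator; // null after a save/restore round trip
    }
}

/** One possible repair: drop "final" and recreate the allocator on first use. */
final class LazyProperty implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient Allocator allocator;

    synchronized Allocator getAllocator() {
        if (allocator == null) {
            allocator = new Allocator();
        }
        return allocator;
    }
}

/** Helper that serializes a value to bytes and reads it back. */
final class RoundTrip {
    private RoundTrip() {}

    static <T extends Serializable> T copy(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T restored = (T) in.readObject();
            return restored;
        }
    }
}
```

Calling RoundTrip.copy(new BrokenProperty()).getAllocator() returns null, which is the kind of value that would then trigger an NPE in the caller, while the same round trip on LazyProperty returns a fresh allocator.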
Comment 15 Marc-André Laperle CLA 2015-01-27 23:39:50 EST
We just started seeing the NPE on our Hudson instance, I created bug 458602.
Comment 16 Bob Foster CLA 2015-01-28 12:42:55 EST
Created attachment 250315 [details]
Xvnc plugin with private transient final bug fix
Comment 17 Bob Foster CLA 2015-01-28 12:46:11 EST
I have attached an xvnc.hpi with a fix for the private transient final bug. I'm kind of tied up right now. If someone could test this version and verify that it fixes the NPE problem, I can release it forthwith. If you want to wait for me to test it, it will take a few days.
Comment 18 Peter Janes CLA 2015-02-03 12:07:15 EST
The new hpi fixes the issue for me.
Comment 19 Lidia Marchioni CLA 2015-02-12 14:21:46 EST
We also hit this issue:

 java.lang.NullPointerException
	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:83)

and the attached xvnc plugin patch worked for us too.
Comment 20 Bob Foster CLA 2015-02-12 16:02:41 EST
Fixed the NPE. See bug 458602.