Bug 367238 - improve Hudson stability and performance
Summary: improve Hudson stability and performance
Status: RESOLVED WORKSFORME
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: PC Windows 7
Importance: P3 critical
Target Milestone: ---
Assignee: Eclipse Webmaster CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 335809 339416 360666 367491 367772 367774 369496 371039 371093 371094 371109 372343 372358 385909
Blocks:
Reported: 2011-12-20 13:35 EST by Kim Moir CLA
Modified: 2012-07-25 02:57 EDT
CC List: 16 users

See Also:


Description Kim Moir CLA 2011-12-20 13:35:23 EST
Right now, the Hudson UI is unusable.  Just clicking a link to look at a build configuration takes a minute or so.

Andrew Overholt mentioned at the Architecture Council that he had asked the folks who manage the JBoss Hudson servers to work with the webmasters and try to address some of the ongoing Hudson issues we have been seeing. It seems there are significant performance problems that necessitate frequent restarts. I hope their discussions can help address the underlying issues that destabilize our Hudson install, because restarting Hudson frequently isn't addressing the root cause.
Comment 1 Matthias Sohn CLA 2011-12-21 04:09:40 EST
Yes, Hudson UI is painfully slow. 

Opening the job configuration page is especially slow. It takes at least a minute before the configuration page starts rendering; then it shows "Loading" for a long time while additional widgets pop up every minute or so, and it takes ages (10-15 minutes) before I can change a parameter on the page.

Bumping up the priority, as this is getting in the way of releasing JGit and EGit 1.2.
Comment 2 Jesse McConnell CLA 2011-12-21 08:06:32 EST
I'll add that tweaking build configurations is my biggest pain point as well. Waiting 5-10 minutes for a build configuration page to update before you can run the build is very rough.
Comment 3 Eclipse Webmaster CLA 2011-12-21 10:11:48 EST
Well, I'm seeing some 'proxy' issues in the logs (between the Apache front end and the servlet), so I'm wondering if Winstone is simply unable to handle the current load.

I finally started seeing the interface slow down today, so I restarted the master process and it seems much happier, and the logs are currently 'empty' of the above error.

I'm going to look into moving the sandbox to something more 'recent' (like Jetty), as suggested by the folks at JBoss.

I've also sent out some questions to some other OSS infra admins to see what their experiences may provide.

-M.
Comment 4 Adolfo Sanchez-Barbudo Herrera CLA 2011-12-21 11:00:26 EST
Much, much better now.

Thanks for taking care of this.

Regards,
Adolfo.
Comment 5 Nicolas Bros CLA 2011-12-21 11:09:06 EST
I am now getting a connection timeout error each time I try to access https://hudson.eclipse.org/hudson/
Comment 6 Denis Roy CLA 2011-12-21 11:51:54 EST
(In reply to comment #3)
> I restarted the master process and it seems much happier

(In reply to comment #4)
> Much, much better now.
> 


So, if restarting the master process makes things much better (and they are -- Hudson is much more responsive today than yesterday) I'm wondering what it is that we, the humble webmaster, can actually do to resolve the issue, other than automatic restarts every X days.  

As Linux sysadmins, we find these constant restarts a foreign concept; some of our services are only restarted every few years, when the underlying hardware fails. We can't help but assume there are some garbage collectors within Hudson that aren't working very hard. Or there's a memory leak, or something.

Back when Matt set up the current Hudson infra, our intention was to provision a fairly powerful master and have it devoted to being a master.  Since then, more and more projects are switching their builds to use the master only.  This is not helpful.

We will engage with the JBoss guys. One big issue comes to mind:

NFS.

The master's workspace and scratchpad are on NFS (/shared), and I'm wondering why that is. I'm curious to know how many other Hudson build farms use NFS to share files, when Hudson is designed to work in a "disconnected" mode: it uses SSH for all slave communications. Such a reliance on /shared also precludes us from building "in the cloud", as some would say. What happens if we get some servers in a remote location somewhere? Why can't we use SSH for inter-server communications?

Don't get me wrong -- NFS is a proven performer.  But I'm wondering if Hudson+NFS is.
Comment 7 Nicolas Bros CLA 2011-12-22 09:06:35 EST
Hi,
I haven't been able to access Hudson at all since yesterday from my IP: "82.234.4.64".
But it works well when using my corporate VPN, with this IP : "80.118.144.195"
It is only Hudson that I cannot access; other eclipse.org servers work as expected.
Comment 8 Eclipse Webmaster CLA 2011-12-22 09:50:44 EST
Looks like a local firewall block on the hudson master server.  I've cleared it.

-M.
Comment 9 Nicolas Bros CLA 2011-12-22 10:07:15 EST
(In reply to comment #8)
> Looks like a local firewall block on the hudson master server.  I've cleared
> it.
It works now. Thank you!
Comment 10 Benjamin Bentmann CLA 2011-12-30 14:23:58 EST
Regarding stability, the LinkageError mentioned in https://bugs.eclipse.org/bugs/show_bug.cgi?id=339027#c35 seems to be a recurring problem. From the stack trace, this might be something to take up with the Hudson developers.
Comment 11 Jesse McConnell CLA 2012-01-17 14:56:42 EST
I am seeing this problem again...

It really makes using Hudson intolerable.
Comment 12 Eclipse Webmaster CLA 2012-01-17 15:14:47 EST
I've restarted the master process.

-M.
Comment 13 Nicolas Bros CLA 2012-02-02 08:41:05 EST
Hudson is becoming very slow again when trying to save a job configuration. After I clicked "Save" it took more than a minute and then I got a "Bad Gateway" error and my changes were lost.

Maybe it needs to be restarted again?
Comment 14 Eclipse Webmaster CLA 2012-02-02 10:38:28 EST
I've restarted it. 

-M.
Comment 15 Denis Roy CLA 2012-06-18 10:02:09 EDT
Matt, I'm guessing now is a great time to start planning for a Hudson overhaul (upgrade, etc).
Comment 16 Eclipse Webmaster CLA 2012-07-04 11:03:35 EDT
OK, I have a plan for the upgrade (bug 371039).

Overall I think things have been better since we switched to Jetty.  As such I'll close this.  If I'm being premature please reopen.

-M.