| Summary: | improve Hudson stability and performance | ||
|---|---|---|---|
| Product: | Community | Reporter: | Kim Moir <kim.moir> |
| Component: | CI-Jenkins | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | critical | ||
| Priority: | P3 | CC: | adolfosbh, bentmann, christian.campo, david_williams, dennis.huebner, eric.gwin, gdupe, gunnar, hrr, jesse.mcconnell, matthias.sohn, mistria, mknauer, nicolas.bros, sbouchet, stepper |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Windows 7 | ||
| See Also: | https://bugs.eclipse.org/bugs/show_bug.cgi?id=339027 https://bugs.eclipse.org/bugs/show_bug.cgi?id=370917 | ||
| Whiteboard: | |||
| Bug Depends on: | 335809, 339416, 360666, 367491, 367772, 367774, 369496, 371039, 371093, 371094, 371109, 372343, 372358, 385909 | ||
| Bug Blocks: | |||
|
Description
Kim Moir
Yes, Hudson UI is painfully slow. Opening the job configuration page is especially, horribly slow: it takes at least a minute until the page starts rendering, then it shows "Loading" for a long time while additional widgets pop up every minute or so, and it takes ages (10-15 minutes) until I can change a parameter on the configuration page.

Bumping up the priority as this is in our way to release JGit and EGit 1.2.

I'll toss in that tweaking build configuration is where my largest pain point is as well. Taking 5-10 minutes for a build configuration page to update so you can run the build is very rough.

Well, I'm seeing some 'proxy' issues in the logs (between the Apache front end and the servlet), so I'm wondering if Winstone is simply unable to handle the current load. I finally started seeing the interface slow down today, so I restarted the master process and it seems much happier, and the logs are currently 'empty' of the above error. I'm going to look into perhaps moving the sandbox to something more 'recent' (like Jetty) as suggested by the folks at JBoss. I've also sent out some questions to some other OSS infra admins to see what their experiences may provide. -M.

Much, much better now. Thanks for taking care of this. Regards, Adolfo.

I am now getting a connection timeout error each time I try to access https://hudson.eclipse.org/hudson/

(In reply to comment #3)
> I restarted the master process and it seems much happier

(In reply to comment #4)
> Much, much better now.

So, if restarting the master process makes things much better (and they are -- Hudson is much more responsive today than yesterday), I'm wondering what it is that we, the humble webmaster, can actually do to resolve the issue, other than automatic restarts every X days.
As Linux sysadmins, these constant restarts are a foreign concept to us, where some services are only restarted every few years because the underlying hardware has failed. We can't help but assume there are some garbage collectors within Hudson that aren't working too hard. Or there's a memory leak -- or something.

Back when Matt set up the current Hudson infra, our intention was to provision a fairly powerful master and have it devoted to being a master. Since then, more and more projects are switching their builds to use the master only. This is not helpful. We will engage with the JBoss guys.

One big issue comes to mind: NFS. The master's workspace and scratchpad are on NFS (/shared) and I'm wondering why that is. I'm curious to know how many other Hudson build farms use NFS to share files, when Hudson is designed to work in a "disconnected" mode -- it uses SSH for all slave communications. Having such a reliance on /shared also precludes us from building "in the cloud", as some would say. What happens if we get some servers in a remote location somewhere? Why can't we use SSH for inter-server communications?

Don't get me wrong -- NFS is a proven performer. But I'm wondering if Hudson+NFS is.

Hi, I cannot access Hudson at all since yesterday from my IP: "82.234.4.64". But it works well when using my corporate VPN, with this IP: "80.118.144.195". It is only Hudson that I cannot access; other eclipse.org servers work as expected.

Looks like a local firewall block on the Hudson master server. I've cleared it. -M.

(In reply to comment #8)
> Looks like a local firewall block on the hudson master server. I've cleared
> it.

It works now. Thank you!

Regarding stability, the LinkageError mentioned in https://bugs.eclipse.org/bugs/show_bug.cgi?id=339027#c35 seems to be a recurring problem. From the stack trace, this might be something to take up with the Hudson developers.

I am seeing this problem again... it really makes using Hudson intolerable.

I've restarted the master process. -M.

Hudson is becoming very slow again when trying to save a job configuration. After I clicked "Save" it took more than a minute and then I got a "Bad Gateway" error and my changes were lost. Maybe it needs to be restarted again?

I've restarted it. -M.

Matt, I'm guessing now is a great time to start planning for a Hudson overhaul (upgrade, etc).

OK, I have a plan for the upgrade (371039). Overall I think things have been better since we switched to Jetty. As such I'll close this. If I'm being premature, please reopen. -M. |