| Summary: | New disk array for Hudson and build workspace storage | | |
|---|---|---|---|
| Product: | Community | Reporter: | Denis Roy <denis.roy> |
| Component: | FoE Disbursements | Assignee: | Eclipse FOE Disbursements <foe-disbursements-inbox> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | caniszczyk, contact, david_williams, d_a_carver, gunnar, holger.staudacher, jesse.mcconnell, kim.moir, Mike_Wilson, nicolas.bros, Olivier_Thomann, pwebster, remy.suen, sbouchet, slewis, wayne.beaton |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 337068 | | |
Description
Denis Roy
I hope I am not being rude for asking ... and please do not spend more than 3 minutes educating naive desktop users such as myself ... but it seems the immediate need is to "beef up" each build machine or slave. (I think of a 10T "disk array" as something needed for a massive database, such as Netflix, or for IT shops that are supporting many different, always-changing websites, where much flexibility/future growth is needed.)
I ask only because $5.5K sounds like a lot. Whereas a 6 Gb/s 2-terabyte drive by itself would be maybe $300? So ... 5x2 == 10 .... $1500 < $5500.
Of course, if disk controllers, etc., on the current individual machines are already maxed out, then that'd explain why a "central" disk-array solution might be required, not just better. And, I am certainly not questioning your judgment; I really am just curious, asking for a small bit of education ... if you can do it in 3 minutes or less ... maybe 30 words or less :)
If not ... if you can tell already it'd be over my head ("if you don't already know, I can't explain it"?) ... then please do not feel obligated to explain detail. I'm all for the best possible solution.
Much thanks in advance for working on fixing this space issue.
> I hope I am not being rude for asking

It is not any more rude than my asking why projects need 1+ GB workspaces, instead of storing build artifacts on /shared. I still have not received an answer. That strategy would cost us 0 dollars. But I will answer your questions, since it is often a misconception from non-sysadmins that "disk space is dirt cheap".

> immediate need is to "beef up" each build machine or slave. I think of a 10T
> "disk array" as something needed for a massive database, such as Netflix, or
> for IT shops that are supporting many different, always-changing websites,
> where much flexibility/future growth is needed.

Is future growth not needed here? If I add up all the space used by builds now, I see close to 1TB already. Platform hasn't fully moved their build and test processes to Eclipse.org, and the number of Hudson jobs and Eclipse projects using our build services is growing weekly. I'm planning ahead to not go through this exercise again in 2 years.

> I ask only because $5.5K sounds like a lot. Whereas a 6 Gb/s 2-terabyte
> drive by itself would be maybe $300? So ... 5x2 == 10 .... $1500 < $5500.

$5500 for about 10T of usable, fully-redundant storage is $550 per terabyte. A single 2T disk is, well, a single disk. It works great on a desktop with one user, but with a bunch of projects building at the same time, it will be swamped. And what will happen the day it fails?

5x2 SATA drives sounds like a great idea -- but where do I put them? The current build server has 3 SATA slots, all used. The two IBM build machines hosting the Hudson infra use 2.5" SAS drives, which are much more expensive, much smaller in terms of capacity (146G), and would require the purchase of hot-swap caddies from IBM. So for 5x2 drives I'd need a 2U server chassis to put them in. For an extra $300, why not get a 6th drive and have some fault tolerance? Oh, and I'd need a RAID controller to hook those 6 drives up to. So you see where this is going... *sigh*

In the end, this is why, in my wild fantasy world, folks wouldn't store 15GB of build artifacts in their workspace _on each slave_.

I just cleared out my workspace on Hudson. That should free up 5MB or so. It's the little victories, right?

Well, people that use Tycho tend to use private repositories, which means they essentially get Eclipse + their deps in a Maven repo within their workspace. More seriously, I believe that there is a tradeoff. By keeping the workspace, incremental builds should happen faster, require less network and disk access (as artifacts don't have to be pulled down), etc. I'm not sure that it's reasonable to ask folks to clear out all workspaces. Maybe some of them (like the UDC one, which only runs every once in a while) should be encouraged to clear out workspaces, but I don't think a broad mandate (or an automated process) serves the general case very well.

(In reply to comment #2)
> ... why projects need 1+ GB workspaces,
> instead of storing build artifacts on /shared.

I do not know, but I suspect 80% of the answer is that it would take some sort of extra work from each project's releng "team" ... write some scripts? learn how to use Hudson the right way? set some non-default flag settings (such as "save artifacts")? I am sure, over time, we could learn to do better ... but it will (probably) take some education and learning. (For example, the "private [cached] repository" mentioned in another comment sounds like the perfect thing to keep in /shared ... but might take some customization of some scripts/settings somewhere?)
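One concrete example of the "customization of some scripts/settings" this would take: a job that builds with Maven/Tycho could, at least in principle, point its local repository at a path under /shared instead of keeping a private copy inside every workspace, and then prune bulky leftovers when the build finishes. This is only a sketch -- the /shared/common/m2-repo path and the downloads subdirectory are made-up examples, and as noted later in this thread a shared repository can pick up unexpected artifacts, so it is a trade-off rather than a recommendation:

    # Hypothetical Hudson build step; /shared/common/m2-repo is an invented path.
    # -Dmaven.repo.local is a standard Maven property that relocates the local repository.
    mvn clean verify -Dmaven.repo.local=/shared/common/m2-repo

    # Optionally remove bulky downloads once the build has finished
    # ($WORKSPACE is set by Hudson for each job; the subdirectory is illustrative):
    rm -rf "$WORKSPACE"/build/downloads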
> > I'm planning ahead to not
> > go through this exercise again in 2 years.

Oh Denis ... won't you ever learn ... by then I'm sure we'll be insisting we absolutely need 20T of the latest in solid state drives. (Ha ha, just kidding ... it sounds like your proposed system could take that on ... but I'm sure we'll come up with something creative :)

Thanks for the extended explanation. I do think we need more storage (and I like your long-term thinking) ... and I think we are all willing to learn how to use our limited resources better. Perhaps some expert Hudson users should put some "tips and tricks" on a wiki? Maybe a "confessions" page of "silly mistakes we've made and you can avoid" sort of tips? So, we appreciate the parallel efforts. If it seems to you, Denis, that people are not learning, over time, to use Hudson effectively -- say, in a couple of months -- I'd propose a cross-project team of committer release engineers be formed to focus on the problem, and investigate some "case studies" of the worst offenders to see if improvements could be made, or if there are valid reasons for the need. Thanks again.

Just to be clear: I am in no way assuming, or pretending, that committers do not use Hudson efficiently. Perhaps it is I who simply does not understand. I understand the need to keep external dependencies to avoid downloading the world for each build, and I appreciate that. But if each project will maintain multiple local copies of the same bits, I can't help but believe we can optimize that. Then again, perhaps the hassle of optimizing further is simply offset by large disk space. But I have to question this:

hudson-slave1:/opt/users/hudsonbuild/workspace # find . -name 'eclipse-SDK*' ! -wholename '*eclipse-equinox-test*'
./cbi-scout_rt-3.7.0-nightly/org.eclipse.scout.builder/download/eclipse-SDK-3.7M3-win32.zip
./cbi-mat-nightly/build/downloads/eclipse-SDK-3.6-linux-gtk-x86_64.tar.gz
./eclipse-JUnit-Linux/ws/2011-02-07_19-58-38/eclipse-testing/eclipse-SDK-N20110207-1511-linux-gtk.tar.gz
./cbi-wtp-wst.xml/build/downloads/eclipse-SDK-3.7M2a-linux-gtk-x86_64.tar.gz
./cbi-soa-jwt-integration/build/downloads/eclipse-SDK-3.7M5-linux-gtk-x86_64.tar.gz
./swtbot-e37/org.eclipse.swtbot.releng/externals/eclipse-SDK-3.7M4-linux-gtk-x86_64.tar.gz
./cbi-soa-jwt-stable/build/downloads/eclipse-SDK-3.6.1-linux-gtk-x86_64.tar.gz
./gef-latest-release/eclipse-SDK-3.6.1-linux-gtk.tar.gz
./cbi-tm-3.2-nightly/build/downloads/eclipse-SDK-3.7M4-linux-gtk-x86_64.tar.gz

hudson-slave2:/opt/users/hudsonbuild/workspace # find . -name 'eclipse-SDK*' ! -wholename '*eclipse-equinox-test*'
./cbi-scout-3.7/org.eclipse.scout.releng/download/eclipse-SDK-3.6.1-win32.zip
./cbi-scout-3.7/org.eclipse.scout.releng/download/eclipse-SDK-3.7M5-win32.zip
./cbi-tm-3.2-nightly/build/downloads/eclipse-SDK-3.7M4-linux-gtk-x86_64.tar.gz
./gef-latest-release/eclipse-SDK-3.6.1-linux-gtk.tar.gz
./cbi-wtp-wst.xml/build/downloads/eclipse-SDK-3.7M2a-linux-gtk-x86_64.tar.gz
./cbi-mat-nightly/build/downloads/eclipse-SDK-3.6-linux-gtk-x86_64.tar.gz
./emf-eef-0.8/build/downloads/eclipse-SDK-3.6-linux-gtk.tar.gz
./swtbot-e35/org.eclipse.swtbot.releng/externals/eclipse-SDK-3.5-linux-gtk-x86_64.tar.gz
./swtbot-e34/org.eclipse.swtbot.releng/externals/eclipse-SDK-3.4.2-linux-gtk-x86_64.tar.gz
./swtbot-e36/org.eclipse.swtbot.releng/externals/eclipse-SDK-3.6-linux-gtk-x86_64.tar.gz

So CBI-SCOUT has at least three copies of the 175+ MB SDK in its workspace, and I have every reason to believe that each of these zip files lives in an unzipped directory somewhere in the workspace. These ZIP files are accessible from all of our servers directly from the file system at /home/data/httpd/download.eclipse.org/eclipse/downloads/drops...

In the end, if this is how Hudson is supposed to be used, that's fine -- it will simply cost us disk space. I am only asking questions here, nothing more.

(In reply to comment #2)
> It is not any more rude than my asking why projects need 1+ GB workspaces,
> instead of storing build artifacts on /shared.

Denis, can you explain a bit what you mean by "/shared"? Is it a separate disk array that projects can use? If yes, where should a project put its things there? Just wondering, because I know I have a folder in download-staging.priv for my project but not outside.

Anyway, a build workspace is a temporary thing. I'm pretty sure that none of the build scripts out there perform any kind of cleanup. From an administrator's point of view, is it possible to limit the space from Hudson and also to limit the number of builds kept by Hudson?

On the other hand, I'm not sure it's always the committers who are to blame. The tooling side is pretty bad in this case. Most build tools assume a workflow of [start > download the whole web > build]. They do allow customization of the workflow, but...

Oh, and for the record ... +1 to improving the current situation. I think the disk array is the best option. We must also consider webmasters' time when installing/deploying a solution. Thus, even if the hardware looks expensive, it should save more valuable webmaster time in the end than unscrewing and re-assembling hardware and re-sizing/formatting file systems.

> > It is not any more rude than my asking why projects need 1+ GB workspaces,
> > instead of storing build artifacts on /shared.
>
> Denis, can you explain a bit what you mean by "/shared"? Is it a separate disk
> array that projects can use?

Yes, it is a separate array. I know our storage layout is not ideal, but in my fantasy world, this is how the data flows:

1. Build in Hudson's local workspace (fast)
2. Move build artifacts to /shared
3. Promote builds to download.eclipse.org
4. Archive releases to archive.eclipse.org

Here is a diagram I did some time ago to help illustrate this: http://wiki.eclipse.org/images/1/1f/Build_infra_layout.png

> If yes, where should a project put its things
> there? Just wondering because I know I have a folder in download-staging.priv
> for my project but not outside.

I've created /shared/technology/gyrex for you -- all the build and Hudson servers can access that path.
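As a rough illustration of steps 1 and 2 of that flow, a job's post-build shell step could move its heavy output onto /shared and then prune the local workspace, so that little more than the sources stays on the slave. This is a minimal sketch only: the project directory and the build/output path below are invented, while $WORKSPACE and $BUILD_ID are environment variables Hudson provides to each job.

    # Hypothetical post-build step: keep heavy artifacts on /shared, not on the slave.
    DEST=/shared/technology/myproject/nightly/$BUILD_ID
    mkdir -p "$DEST"
    cp -p "$WORKSPACE"/build/output/*.zip "$DEST"/
    # ...then clear what is no longer needed so the per-slave workspace stays small:
    rm -rf "$WORKSPACE"/build/output

From /shared, the artifacts could then be promoted to download.eclipse.org and eventually archived, per steps 3 and 4.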
This disk array should also provide us with the space we need to implement Bug 337068, "Please set up maven.eclipse.org".

What would be the difference in possibly looking at using Amazon EC2 for storage space and additional slave machines?

(In reply to comment #11)
> What would be the difference in possibly looking at using Amazon EC2 for
> storage space and additional slave machines?

In addition, we might be able to leverage S3 as well. http://aws.amazon.com/s3/

Amazon EC2 instances don't support Windows 7 or Macs (I've never found a Mac in the cloud, maybe others know where this is supported), which are needed to run our JUnit tests. http://aws.amazon.com/ec2/#os

Dave speaks a lot of sense. Unless there is some requirement that Eclipse always own the bare metal for everything, moving certain services out into AWS could make a lot of sense, and I suspect Amazon would love the press it would yield and make a deal.

(In reply to comment #13)
> Amazon EC2 instances don't support Windows 7 or Macs (I've never found a Mac in
> the cloud, maybe others know where this is supported), which are needed to run
> our JUnit tests.
>
> http://aws.amazon.com/ec2/#os

You could use S3 for storage, though. Sure, you have to look at things on a case-by-case basis, but if we are talking about storage, you can use S3 on any of the platforms.

(In reply to comment #13)
> Amazon EC2 instances don't support Windows 7 or Macs (I've never found a Mac in
> the cloud, maybe others know where this is supported), which are needed to run
> our JUnit tests.
>
> http://aws.amazon.com/ec2/#os

Oh, and I just found this link for Windows on EC2: http://aws.amazon.com/windows/

In all honesty, it's hard to justify using Amazon AWS when we get lots of hardware for free. I fear using S3 for storage when the actual servers are at Eclipse.org would introduce latency in the builds and increase our bandwidth consumption noticeably. Moving the entire Hudson farm to AWS would be interesting, excluding the fact that we'd need to do something about centralized authentication. But I'm not convinced it would be cheaper. If anyone wants to take a stab at annual pricing, please be my guest.

Regarding comment #16 and Windows instances at Amazon: Dave, they only support Windows Server, not Windows 7. We want to test Eclipse on the operating system that a developer will have on his or her desktop, not the server. As an aside, I tried using AWS to run JUnit tests about a year ago, and found that the time to copy files from eclipse.org to my Amazon instance negated all the value of having a build in the cloud. I didn't use any of their persistent storage, and maybe the connection speeds have improved in the interim. There is real value in having the eclipse filesystem on the same network as the build machines.

Here is my concern, regardless of whether this storage array is bought or not: the Foundation is always saying we have limited resources to maintain things, but there are more and more demands from the community being put on those limited resources. Budgets, from my understanding, are still tight, and new staff in general is not being hired to replace those that have left. So instead of trying to do everything itself, I'm suggesting that partnerships should be pursued. Take my comments with a grain of salt, though, as I haven't donated to Friends of Eclipse (I give enough time to the community in other ways that I think it balances out), but I think we should look at all alternatives available for storage.
The other question is: hardware eventually fails. What happens when the 10TB array gets a hardware glitch and has to be replaced? Where does the money come from? I just want to make sure we are exploring all options for the amount of money we are talking about spending, and making sure we are getting the most value for the money we spend.

FWIW, I'm against spending FOE money on a disk array. Not because it isn't needed in an absolute sense, but rather because:

1) Traditionally for the EF, things like hw for shared EF computing resources have been made through technology company contributions... e.g. from IBM, Google, etc... and I don't see why that can't/shouldn't continue... rather than using FOE resources instead. It's relatively easy and cheap for hw and sw vendors to contribute those things.

2) A number of projects (mine at least) don't even use the Hudson build server... and provide our own hw, network, and build resources... and so those projects would receive no benefit from a new disk array.

3) It seems to me that lots more 'bang for the small buck' would be gotten from targeting assistance with releng for SR projects that don't have dedicated releng resources... or perhaps for all SR projects. For example, someone to help David Williams with coordinating/solving SR builder issues (for all SR projects)... or someone to work with Denis and David to come up with and implement disk space efficiency improvements in the use of the existing builder disk space.

My vote is to buy the new storage array today. We currently have significant storage issues when running builds. A new disk array costs ~$6000. It doesn't take long for the community to rack up $6000 in people costs due to broken and delayed builds and the resulting decline in productivity.

As Denis says, if you are interested in AWS, do some estimates on pricing. The reality is that switching our builds to run in the cloud will require non-trivial amounts of testing time, so it's not an immediate solution.

(In reply to comment #21)
> My vote is to buy the new storage array today. We currently have significant
> storage issues when running builds. A new disk array costs ~$6000. It doesn't
> take long for the community to rack up $6000 in people costs due to broken and
> delayed builds and the resulting decline in productivity.
>
> As Denis says, if you are interested in AWS, do some estimates on pricing.
> The reality is that switching our builds to run in the cloud will require
> non-trivial amounts of testing time, so it's not an immediate solution.

There are a couple of things that we can also do from a community aspect. Make sure the builds are cleaning up after themselves. I will say that Maven builds, when using local repositories (i.e. each build uses its own local repo), can consume a lot of disk space. The problem with using a shared repo is that you can get unexpected artifacts in the repo, causing unpredictable results at times.

I did some initial calculations for AWS S3, and you are looking at about $1000 for a year, if you get a 500 TB storage space and calculate in data transfer rates. As I said, if the Foundation was interested, they could possibly negotiate better rates with Amazon, but I'm not involved with that. An EC2 instance running 24 hrs a day at their standard rate would be about $845, but EC2 instances don't need to run 24 hrs a day. Again, negotiation with Amazon could possibly bring rates down even more.
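For anyone who wants to take a stab at the annual pricing Denis asked for, the back-of-the-envelope arithmetic is simple enough to script. The rates below are placeholders rather than actual AWS prices -- they would need to be filled in from the current price list -- and the 1 TB figure is the rough amount of build data mentioned earlier in this thread:

    # Hedged estimate only: the rate variables are placeholders, not real AWS prices.
    ec2_hourly=0.10      # $/hour for one always-on EC2 instance (placeholder)
    s3_gb_month=0.15     # $/GB-month of S3 storage (placeholder)
    build_data_gb=1000   # roughly the 1 TB of build data mentioned above

    awk -v r="$ec2_hourly" 'BEGIN { printf "EC2, 24x7 for a year: $%.2f\n", r * 24 * 365 }'
    awk -v r="$s3_gb_month" -v g="$build_data_gb" 'BEGIN { printf "S3 storage for a year: $%.2f\n", r * g * 12 }'

Data transfer in and out is billed separately and, as the later comments show, can easily dominate the storage line.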
While I like disk space myself, I think the people using Hudson need to do a better job of following good release engineering practices. Too many don't consider that the space is being used by more than themselves.

Also, there is a plugin for Hudson that can help with cleanup of jobs on the slaves: http://wiki.hudson-ci.org/display/HUDSON/Hudson+Distributed+Workspace+Clean+plugin

Hudson itself will clean up a workspace on a slave machine if it hasn't been used in over a month. But it sounds like the above plugin might be of use.

first 1TB ~ $0.93 per month
another 9TB ~ $7.47 per month

$100.80 per year for 10TB. That's cheap indeed. But I don't see how this could be technically useful. I know there is an S3 FUSE driver ([1]) but that thing has a lot of warnings on its site and this method isn't even supported by Amazon. I also doubt that people will rewrite their build scripts to download and upload stuff to S3.

Cleaning workspaces is good and necessary. I just hit a build issue today with old stuff in a workspace on a different slave. However, the only problem with plug-ins is that it's impossible for webmasters to force them enabled/configured on jobs.

[1] http://code.google.com/p/s3fs/wiki/FuseOverAmazon

(In reply to comment #22)
> I did some initial calculations for AWS S3, and you are looking at about $1000
> for a year, if you get a 500 TB storage space

(In reply to comment #23)
> first 1TB ~ $0.93 per month
> another 9TB ~ $7.47 per month
>
> $100.80 per year for 10TB. That's cheap indeed.

Yeah, it sounds good and cheap, doesn't it? Until you start using it.

Before we talk about terabytes, let's look at August 2010 -- this is our actual use case. We'd do an rsync every 2 hours, moving download.eclipse.org -> AWS -- everything, including the many nightly builds. This does not include bulk client downloads from CloudFront -- just xfer to S3.

    Amazon S3 service - US Standard Region
    $0.150 per GB - first 50 TB / month of storage used           360.831 GB-Mo          54.12
    $0.01 per 1,000 PUT, COPY, POST, or LIST requests             2,072,162 Requests     20.72
    $0.01 per 10,000 GET and all other requests                   2,570,322 Requests      2.57
    TOTAL August 2010: 77.41

    AWS data transfer - US East (Northern Virginia) and US Standard Regions
    $0.000 per GB - data transfer in (until October 31st, 2010)   471.283 GB              0.00
    $0.000 per GB - first 1 GB / month data transfer out          1.000 GB                0.00
    $0.150 per GB - up to 10 TB / month data transfer out         649.269 GB             97.39
    TOTAL August 2010: 97.39

So that's $174 USD per month, for 360G, including transit for nightly builds. That's $2100/year -- for 360G. Yes, the transit is actually more expensive than the storage. After three years of this, we'd have shelled out $6300, we'd still only have 360G, and we'd still be on the hook for paying out $174/month.

Right now we have close to 1 terabyte of build data. If we talk about even 2 terabytes, we need to scale up the transit costs as well, because we'll be moving a lot of bytes across the Internet at that scale, which would also necessitate increasing Eclipse.org's bandwidth.

(In reply to comment #14)
> Amazon would love the press it would yield and make a deal.

I'm not as optimistic as you are about that. Please don't get me wrong, I don't want to spend money. Personally, I don't want a new disk array. If we can keep the local Hudson workspaces clean and small, we can go back to fast builds and zero expense for a good while.

(In reply to comment #23)
> Cleaning workspaces is good and necessary.
> I just hit a build issue today with old stuff in a workspace on a different
> slave. However, the only problem with plug-ins is that it's impossible for
> webmasters to force them enabled/configured on jobs.

With Hudson, new jobs can be set up based on templates of existing jobs. If the plugin is installed, you can go through the jobs and make sure the plugin is enabled. Yes, this means going through all existing jobs and setting the plugin to be enabled. In the past the community maintained this, but now it's the webmasters' responsibility.

Basically, I'd prefer getting people to clean up their workspaces and, if they need resources after a build is done, use the Archive Artifacts option within Hudson to do it. A workspace should not be 6GB when a build completes: either the workspace should be deleted, or it should clean up what it doesn't need to keep around. It's just good build practices when you are sharing a community system.

(In reply to comment #24)
> Before we talk about terabytes, let's look at August 2010 -- this is our actual
> use case. We'd do an rsync every 2 hours, moving download.eclipse.org -> AWS
> -- everything, including the many nightly builds. This does not include bulk
> client downloads from CloudFront -- just xfer to S3.
>
> Amazon S3 service - US Standard Region
> $0.150 per GB - first 50 TB / month of storage used           360.831 GB-Mo          54.12
> $0.01 per 1,000 PUT, COPY, POST, or LIST requests             2,072,162 Requests     20.72
> $0.01 per 10,000 GET and all other requests                   2,570,322 Requests      2.57
> TOTAL August 2010: 77.41
>
> AWS data transfer - US East (Northern Virginia) and US Standard Regions
> $0.000 per GB - data transfer in (until October 31st, 2010)   471.283 GB              0.00
> $0.000 per GB - first 1 GB / month data transfer out          1.000 GB                0.00
> $0.150 per GB - up to 10 TB / month data transfer out         649.269 GB             97.39
> TOTAL August 2010: 97.39

Denis, you are forgetting one thing: there are no transit costs from EC2 to S3 if done within the same region. So if the master and slaves are all in EC2, then there are no transit costs.

Anyway, my suggestion is to tackle the root problem and not the symptom. Get people to be better build maintainers, install the necessary plugin to help manage things, and proceed forward from there.

> A workspace should not be 6GB when a build completes: either the workspace
> should be deleted, or it should clean up what it doesn't need to keep
> around. It's just good build practices when you are sharing a community
> system.
Last night's Eclipse and Equinox build on my local build machine (not eclipse.org) generated 12G of build artifacts. This number doesn't include the zips that are available for download.
-bash-3.00$ pwd
/builds
-bash-3.00$ du -sh N201102162000
12G N201102162000
I have removed several platforms in 3.7 but there are still 15 platforms to build on our plan. This consumes disk space. I keep my build workspace after the build completes because if there are problems, it's useful to look at the build artifacts for troubleshooting purposes. Of course, I've been cleaning my hudson workspace regularly when I've been testing the build running on eclipse.org hardware. However, once the build moves to eclipse.org, I will need more disk space. And since people have been complaining for years that the Eclipse project build is not open, it would be nice if we had adequate disk space to become more transparent :-)
As for AWS transport charges, we would still need to get the source from eclipse.org to build it, and then transport the build result back to eclipse.org. So I question whether the cost is exactly zero :-)
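For what it's worth, before deciding what to clean (or mandate), it is easy to see where the space on a slave actually goes by ranking the job workspaces by size. A minimal sketch, using the workspace path from the listings earlier in this bug; the output format and the cutoff are just an example:

    # Largest job workspaces first; sizes are in kilobytes (du -sk is portable).
    du -sk /opt/users/hudsonbuild/workspace/* | sort -rn | head -20

That kind of listing would also make it easier to pick the "case studies" of the worst offenders suggested above.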
I'm not saying you aren't cleaning up, but in addition to the 12 GB, your workspace still ends up being 15GB when the build completes. This 15GB stays around, and if the build runs on multiple slaves, that is 15GB + 12GB (being taken up on the master for the archived artifacts). What we need to do is educate the rest of the community that just blindly leaves things as is and doesn't do the necessary cleanup of their workspaces. The plugin I mentioned for Hudson will help but isn't the ultimate solution. Educating those that are doing the builds, and making sure their builds are doing the necessary cleanup, is something the Architecture Council should strive to do. Eventually we will need more disk space, but we need to make sure the builds are playing nice with the resources we have.

(In reply to comment #28)
> I'm not saying you aren't cleaning up, but in addition to the 12 GB, your workspace still
> ends up being 15GB when the build completes. This 15GB stays around, and if the
> build runs on multiple slaves, that is 15GB + 12GB (being taken up on the master

This is something we've recently learned as well. If we have eight slaves, we'll end up with lots of duplicate workspaces.

(In reply to comment #20)
> 1) traditionally for the EF things like hw for shared EF computing resources
> have been made through technology company contributions

That is true, and the solicitations for such hardware started long ago.

So, I have a proposition. Since slave1 and slave2 are served off the same physical host (with 4 CPU cores for each slave), why don't we:

- eliminate slave2
- allocate slave2's CPU cores and RAM to slave1
- increase slave1's threads from 4 to 8
- allocate all of slave2's disk space, and any other remaining storage on that host, to slave1

BENEFITS:
- Zero immediate cost (other than some webmaster time)
- Local workspace on local disks = fast builds
- Available workspace disk space would be in excess of 200G -- actually, close to 300G if we play this right
- Easier troubleshooting -- no more 'it works on slave1 but not on slave2'
- Less administration -- one fewer slave to maintain
- Reduced number of local workspaces means less disk waste
- Buys us time to solicit hardware and investigate other options

DRAWBACKS:
- Even 300G will fill up eventually (but if we all play nice, it should last us a long time)

How about it?

(In reply to comment #29)
> How about it?

Disclaimer: I know nothing about the builds. If we consolidate the two slaves, does that mean that if slave1 dies it's pretty much game over, with all the load on the master? Whereas having two slaves means a better distribution of work even if one slave dies?

I'm not sure what you mean by slave1 'dying'. Both slaves are currently on the same physical machine, so right now, if that machine dies, both slaves go away. We do keep a 'blank slave' server image handy. This means we can spin up additional slaves on different hardware in under an hour.

But to answer your question: should the consolidated slave1 die, we'd still have the master and "Fastlane" for builds until we can either fix slave1 or simply discard it and bring a new one back to life. Again, we're talking hours here, not days.

(In reply to comment #31)
> Both slaves are currently on the same physical machine, so right now, if that
> machine dies, both slaves go away.

Oh, they're actually on the same physical machine. Well then.

> We do keep a 'blank slave' server image handy. This means we can spin up
> additional slaves on different hardware in under an hour.
>
> But to answer your question: should the consolidated slave1 die, we'd still
> have the master and "Fastlane" for builds until we can either fix slave1 or
> simply discard it and bring a new one back to life. Again, we're talking hours
> here, not days.

Okay, that addresses my concerns. Thank you for your information, Denis.
> How about it?
+1, I think that is a good idea.
IMO, slaves should be used to build/test artifacts on different platforms (mac/win/linux, 32/64, fedora/ubuntu, whatever...).
It's a Hudson best practice to scale horizontally versus vertically: http://www.slideshare.net/cloudbees/7-ways-to-optimize-hudson-in-production
However, given that we lack hardware resources, it seems like a good compromise to reduce the number of slaves by one to conserve disk space. +1

Plus, I would still recommend installing the Distributed Workspace Clean plugin -- at least test it out somewhere in the sandbox or something.

Now would be a good time to try re-configuring, now that Helios SR2 is done, if that still seems like a good idea (and it does to me). Of course, M6 +0 is coming up soon (about 2 weeks, 3/11), so we wouldn't want to get too close to that. IMHO, some time between 2/28 and 3/4 would probably be ok ... with the usual "notice to community", etc. Plus ... be sure to allow time to revert ... there's always some chance we might find something else is worse or broken with that configuration, I guess?

Some users will have to 'react' to the change, though, right? For example, if they have configured their job to use "slave2" and now suddenly there is no slave2, I'm not sure if their job just won't run ... or something worse? In an ideal world, some little sed script could be written to stream through the config.xml files in the jobs and correct some of the common cases ... or maybe use grep to list the config files that mention slave2, so people will know whether they need to react or not? I guess that'd be nearly everyone ... so maybe just give notice and see what happens?

But mostly I just wanted to say that I think we have about a week's window to "experiment" ... after that, we should probably wait until after M6 ... which ends on 3/18 ... then there's EclipseCon ... so, up to you ... I'm just pointing out the nearby significant dates.

I'm withdrawing this request. Matt has eliminated hudson-slave2, and hudson-slave1 has been stretched to support 350G of local disk space. It has also been boosted to 28G of RAM and it currently has 6 executor threads. If projects play nice, the 350G for local workspaces should last us for a long time while offering us decent build performance.