Bug 335809 - How do we keep Hudson clean to avoid "No space left on device"
Summary: How do we keep Hudson clean to avoid "No space left on device"
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: PC Linux
Importance: P2 major
Target Milestone: ---
Assignee: Eclipse Webmaster CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 367238
 
Reported: 2011-01-31 02:20 EST by Eike Stepper CLA
Modified: 2014-02-11 10:22 EST
CC: 14 users

See Also:


Description Eike Stepper CLA 2011-01-31 02:20:04 EST
https://hudson.eclipse.org/hudson/job/emf-cdo-integration/1100/console :

[java] [ant] You can check signing status by tailing /home/data/httpd/download-staging.priv/arch/signer.log
[java] [ant] Waiting for signing to complete. This may take more then 60 minutes
[java] [ant] Obtaining signed file from staging area
[java] An exception occurred while writing to the platform log:
[java] java.io.IOException: No space left on device
Comment 1 Denis Roy CLA 2011-01-31 08:49:28 EST
Current disk hog is eclipse-equinox:

==== hudson-slave1.eclipse.org ====
/dev/xvda1             55G   55G   92M 100% /
-> Usage exceeding 1GB for: Hudson workspace on hudson-slave1 (50G capacity) (2011-01-31T08:21)
  15.3G eclipse-equinox-test-N
   1.8G cbi-papyrus-integration
   1.8G cbi-papyrus-0.7-nightly
   1.4G virgo.kernel.snapshot
   1.3G eclipse-JUnit-Linux
   1.3G cbi-mat-nightly
   1.1G cdt-nightly
   1.1G cdt-release
   1.0G cbi-wtp-wst.xml
==== END: hudson-slave1.eclipse.org ====
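
For illustration only, a minimal sketch of how a per-workspace usage report like the one above could be produced on a slave; the workspace root below is an assumption, not the actual monitoring script:

#!/bin/sh
# Sketch: report overall disk usage and the largest job workspaces on a slave.
# WS_ROOT is a hypothetical location for the slave's job workspaces.
WS_ROOT=/opt/hudson/workspace
df -h /
# Per-job workspace sizes in MB, largest first.
du -sm "$WS_ROOT"/* 2>/dev/null | sort -rn | head -20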
Comment 2 Kim Moir CLA 2011-01-31 09:06:16 EST
I only keep three builds from each of my streams. I just deleted some so they are down to 2 builds kept.  I think we need more disk space.  During milestone weeks, we will have several builds running simultaneously. Each of our builds consumes at least 6GB.  I can't store the build artifacts in /shared/eclipse because the mac and windows test machines don't have access to these volumes.
Comment 3 Denis Roy CLA 2011-01-31 09:09:32 EST
(In reply to comment #2)
> I can't store the build artifacts in /shared/eclipse
> because the mac and windows test machines don't have access to these volumes.

Sure they can: http://build.eclipse.org/eclipse/
Comment 4 Denis Roy CLA 2011-01-31 09:10:49 EST
> Each of our builds consumes at least 6GB


I thought that was for Platform... This is equinox we're talking about.
Comment 5 Kim Moir CLA 2011-01-31 09:25:55 EST
Eclipse and Equinox are the same build.  

I didn't know about http://build.eclipse.org/eclipse/.  Last time I asked about it, I was told there wasn't a way to access /shared/eclipse from the non-linux slaves :-)

https://bugs.eclipse.org/bugs/show_bug.cgi?id=329830#c21
Comment 6 Denis Roy CLA 2011-01-31 09:32:50 EST
> I didn't know about http://build.eclipse.org/eclipse/.  Last time, I asked
> about I was told there wasn't a way to access /shared/eclipse from the
> non-linux slaves :-)

The docs always take precedence over what we say  ;)

http://wiki.eclipse.org/IT_Infrastructure_Doc#Builds  



Marking as fixed ... for now.
Comment 7 Eike Stepper CLA 2011-02-01 02:21:49 EST
Blocker: It's not fixed: https://hudson.eclipse.org/hudson/job/emf-cdo-integration/1108/console
Comment 8 Nicolas Bros CLA 2011-02-01 03:41:33 EST
I confirm the issue. I also get:
java.io.IOException: No space left on device

This is a blocker for both Indigo M5 and Helios SR2 RC2. Today (Tuesday) is +2 and tomorrow +3.
Comment 9 Bouchet Stéphane CLA 2011-02-01 03:57:40 EST
Same here for ATL.
Comment 10 Nicolas Bros CLA 2011-02-01 04:29:06 EST
Which disk exactly is full? When I run "df" while connected to build.eclipse.org over ssh, I don't see anything near 100%. Is the full disk not mounted on build.eclipse.org?

Anyway, I wiped my workspace and removed old builds, and I could start a build afterwards. So, the problem seems solved, at least temporarily.
Comment 11 Denis Roy CLA 2011-02-01 08:04:57 EST
> Which disk exactly is full? When I type "df" when connected to
> build.eclipse.org in ssh, I don't see anything near 100%. It's not mounted on
> build.eclipse.org?

Each Hudson slave has its own storage for the workspace.  This storage is only meant to be temporary, since build artifacts should either be moved to /shared or promoted to download.

Please see this diagram for a better view:

http://wiki.eclipse.org/Image:Build_infra_layout.png
Comment 12 Denis Roy CLA 2011-02-01 08:28:46 EST
I'll rename this bug to capture the bigger problem: Now that more and more projects are jumping on Hudson, what can we do to ensure the place stays clean?

Here are the three problems we'll encounter:


1. Large workspaces spanning multiple slaves.


2. Look at how many jobs are on the Hudson home page.  I have no way of knowing if any jobs are orphaned.


3. Multiple jobs for the same project, all with "small" workspaces, add up


Potential solutions for 1)

- I wipe _all_ workspaces clean every Sunday.  No exceptions.
- We get a 10TB disk array and have no problems for a couple of... months? years?


Potential solutions for 2)

- I automatically delete jobs that have not run in 60 days


Potential solutions for 3)

- We limit the number of jobs one project can have
- We get a 10TB disk array and have no problems for a couple of... months? years?
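
For the "not run in 60 days" check, a rough sketch of what the detection could look like (the Hudson jobs directory is an assumption; jobs record their runs in builds/ subdirectories):

#!/bin/sh
# Sketch: list jobs whose newest build directory is older than 60 days.
# JOBS_ROOT is an assumed path to the Hudson jobs directory.
JOBS_ROOT=/shared/hudson/jobs
for job in "$JOBS_ROOT"/*/; do
    recent=$(find "$job/builds" -maxdepth 1 -mtime -60 -print -quit 2>/dev/null)
    [ -z "$recent" ] && echo "stale: $(basename "$job")"
done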
Comment 13 Christian Campo CLA 2011-02-01 08:41:34 EST
So I am all for removing artifacts (not the jobs) after 30 days    +1

And let's start collecting money for that 10 TB disk    +1

Wiping all workspaces on Sunday is, I believe, not so good

- christian

(In reply to comment #12)
> I'll rename this bug to capture the bigger problem: Now that more and more
> projects are jumping on Hudson, what can we do to ensure the place stays clean?
> 
> Here are the three problems we'll encounter:
> 
> 
> 1. Large workspaces spanning multiple slaves.
> 
> 
> 2. Look at how many jobs are on the Hudson home page.  I have no way of knowing
> if any jobs are orphaned.
> 
> 
> 3. Multiple jobs for the same project, all with "small" workspaces, add up
> 
> 
> Potential solutions for 1)
> 
> - I wipe _all_ workspaces clean every Sunday.  No exceptions.
> - We get a 10TB disk array and have no problems for a couple of... months?
> years?
> 
> 
> Potential solutions for 2)
> 
> - I automatically delete jobs that have not run in 60 days
> 
> 
> Potential solutions for 3)
> 
> - We limit the number of jobs one project can have
> - We get a 10TB disk array and have no problems for a couple of... months?
> years?
Comment 14 Dennis Huebner CLA 2011-02-01 09:02:00 EST
> - We limit the number of jobs one project can have
I think this is not a solution; one can occupy enough space with only one job running.

> - I automatically delete jobs that have not run in 60 days
Delete the job? ...or just older builds and maybe the workspace? Just imagine one has a maintenance job for Helios; it could then happen that this job gets deleted because there were no commits between SR1 and the first SR2 milestone.

> - We get a 10TB disk array and have no problems

+1
Comment 15 Bouchet Stéphane CLA 2011-02-01 09:30:35 EST
> Potential solutions for 1)
> 
> - I wipe _all_ workspaces clean every Sunday.  No exceptions.

Projects can configure their jobs to only keep artifacts for X days and/or the last X builds. Only manually locked builds are kept forever (but I am not sure it is a good idea to keep artifacts on Hudson rather than somewhere else, like the download or shared space).


> - We get a 10TB disk array and have no problems for a couple of... months?
> years?

+0: giving more space will push people to use it and not clean their workspaces.

> 
> 
> Potential solutions for 2)
> 
> - I automatically delete jobs that have not run in 60 days

Same remark as above: Hudson will delete old builds automatically according to the configuration.

> 
> 
> Potential solutions for 3)
> 
> - We limit the number of jobs one project can have
> - We get a 10TB disk array and have no problems for a couple of... months?
> years?
Comment 16 Nicolas Bros CLA 2011-02-01 10:00:09 EST
(In reply to comment #12)
> - I wipe _all_ workspaces clean every Sunday.  No exceptions.
+1 this shouldn't cause any problems since each job should be able to work from an empty workspace. But we should avoid wiping workspaces while a job is running.

> - We get a 10TB disk array
+1 for bigger disk (though 10TB is probably overkill)
50GB doesn't seem much, given the number of jobs.

> - I automatically delete jobs that have not run in 60 days
"Automatically" seems dangerous. But maybe send a mail to the person or team that requested the job to ask them whether it can be deleted.
Also, be aware that Hudson seems to lose the "last run" information when deleting all builds from a project.

> - We limit the number of jobs one project can have
The problem is not so much the number of jobs as the space they occupy.
Maybe add a disk quota per project.
Comment 17 Kim Moir CLA 2011-02-01 10:06:43 EST
> - I wipe _all_ workspaces clean every Sunday.  No exceptions.

No thank you.  I run test builds on Sunday because traffic is light :-).  When
we have resolved all the issues with the Windows and Mac slaves and can run our
builds on eclipse.org hardware, we will run builds on Sunday too.

> - We get a 10TB disk array and have no problems for a couple of... months?
> years?

I can't estimate how much space you need, but it's not realistic to expect that
everyone at Eclipse builds on Hudson while the amount of disk space doesn't
increase.

> - I automatically delete jobs that have not run in 60 days

No thank you.  As someone has already mentioned, there are old jobs that are
kept around for maintenance purposes.  If they are configured to delete old
builds, they won't consume much space.  If I'm not using a build for a while, I
flush the workspace too so it doesn't consume disk space.

> - We limit the number of jobs one project can have

I don't expect that most people have extra jobs lying around just for fun :-) I
only have jobs that exist for a specific purpose.

I think the key is to encourage projects to only keep a few builds on Hudson to
reduce the disk space footprint.
Comment 18 Dennis Huebner CLA 2011-02-01 10:19:41 EST
(In reply to comment #16)
> (In reply to comment #12)
> > - I wipe _all_ workspaces clean every Sunday.  No exceptions.
> +1 this shouldn't cause any problems since each job should be able to work from
> an empty workspace. But we should avoid wiping workspaces while a job is
> running.
After the next job run the workspace will be full again, so you will only free
space for a couple of hours.
> 
> > - We get a 10TB disk array
> +1 for bigger disk (though 10TB is probably overkill)
> 50GB doesn't seem much, given the number of jobs.
> 
> > - I automatically delete jobs that have not run in 60 days
> "Automatically" seems dangerous. But maybe send a mail to the person or 
team
> that requested the job to ask them whether it can be deleted.
> Also, be aware that Hudson seems to lose the "last run" information when
> deleting all builds from a project.
There are jobs that are scheduled by Hudson itself using cron, to run once a day or week without checking SCM. Such jobs will not be recognized as unused.

I think we should have a kind of script that scans all the job configs for important settings (see the sketch at the end of this comment) like:
- kept builds (max count | date)
- scheduling by SCM check or changed URL (cron is not allowed)
- and maybe "Abort the build if it's stuck" (does not save space but should be set)

> 
> > - We limit the number of jobs one project can have
> The problem is not so much the number of jobs as the space they occupy.
> Maybe add a disk quota per project.
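
A rough sketch of what such a config-checking script could look like; the paths and XML element names are assumptions based on typical Hudson job config.xml files, so treat it as an illustration rather than a finished tool:

#!/bin/sh
# Sketch: flag jobs whose config.xml is missing the settings listed above.
# JOBS_ROOT and the element names below are assumptions.
JOBS_ROOT=/shared/hudson/jobs
for cfg in "$JOBS_ROOT"/*/config.xml; do
    job=$(basename "$(dirname "$cfg")")
    # 1. no "discard old builds" policy configured
    grep -q "<logRotator" "$cfg" || echo "$job: keeps all builds (no logRotator)"
    # 2. scheduled with a cron timer instead of SCM polling
    grep -q "TimerTrigger" "$cfg" && echo "$job: uses a cron timer trigger"
    # 3. no "abort the build if it's stuck" timeout
    grep -q "BuildTimeoutWrapper" "$cfg" || echo "$job: no build timeout configured"
done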
Comment 19 Denis Roy CLA 2011-02-01 10:33:18 EST
> Wiping all workspaces on sunday I believe is not so good

Why not?  Jobs are supposed to run cleanly with an empty workspace.  This happens when we add slaves.



(In reply to comment #15)
> The projects can configure their jobs to only keep artifact for X days

Yes, but they don't.  Then we run out of disk space.

 
> +0 giving more space will push people to use it and not clean their workspaces.

Exactly.



(In reply to comment #16)
> > - I wipe _all_ workspaces clean every Sunday.  No exceptions.
> +1 this shouldn't cause any problems since each job should be able to work from
> an empty workspace. But we should avoid wiping workspaces while a job is
> running.

Exactly.  And agreed.


> > - We get a 10TB disk array
> +1 for bigger disk (though 10TB is probably overkill)
> 50GB doesn't seem much, given the number of jobs.

At this point, there's not much price difference between 5T and 10T.


> > - We limit the number of jobs one project can have
> The problem is not so much the number of jobs as the space they occupy.
> Maybe add a disk quota per project.

Unless Hudson has a quota mechanism, that's hard to do: at the file system level, all files belong to 'hudsonBuild', so it's difficult to map a job to a project.


(In reply to comment #17)
> I can't estimate how much space you need but it's not realistic to expect that
> everyone builds at Eclipse on Hudson while the amount of disk space doesn't
> increase.

In the last year to year-and-a-half, we've added 3TB for downloads/archives, 1TB for /shared and 500G for Hudson.  The 4TB downloads area is almost 30% full, /shared is over 70% full and the 500G for Hudson ... well, there's not a ton of that left either.

In short, I'm not adding another Megabyte of disk space until we can come up with a concrete way to ensure we're cleaning up.  Wasting disk space is just too easy, and too costly for the Foundation.


> I don't expect that most people have extra jobs lying around just for fun :-)

I wouldn't expect project to maintain Gigabytes of files just for fun, but /shared is 700GB strong.


> I think the key is to encourage projects

For years I've been encouraging projects to clean up and to avoid wasting disk space.  And yet here we are.


(In reply to comment #18)
> After next job run ws will be full again, so you will free 
> space for a couple of hours.

Not as full -- some (many?) projects have a workspace several GB in size, containing build artifacts from months ago.

Those 'maintenance' jobs -- the ones that only run a few times a year -- are those workspaces cleaned up, or do they occupy several GB for several months?
Comment 20 Eike Stepper CLA 2011-02-02 01:19:18 EST
(In reply to comment #12)

> wipe _all_ workspaces clean

I only need my workspaces during and for "some time" after actual build runs. The time after a build is needed to investigate possible build problems. If a build is triggered by an SCM change, our world-wide team distribution may cause delays of some hours until I have the chance to investigate problems. But I could also just kick off a fresh build if the gap is larger. I could live with my workspaces being wiped some hours after the last build. My jobs wipe them initially anyway.

> delete jobs that have not run in 60 days

I don't see how this adds value if the workspaces are being wiped periodically.

> limit the number of jobs one project can have

I don't see how this adds value if the workspaces are being wiped periodically.

> limit the number of kept builds per job

Depends on the number. My jobs keep less than 30MB per build and I prefer to keep a maximum of let's say 10 builds (some stable builds that have been promoted plus "volatile" CI builds). That makes a total of 600MB for my two jobs (given the workspaces are wiped out frequently). Would that be too much?
Comment 21 David Williams CLA 2011-02-02 22:47:59 EST
This is bad. And there are too many comments for me to follow it all ... but I can confirm this is a blocking problem. I just waited a few hours to (finally) get a green build and got a failed build instead, with

java.io.IOException: No space left on device

near the end of the log. At that point, I looked explicitly from a shell, and 

df /shared -h 

reports about 30% free ... so I assume someone (or a cron job?) cleaned up something, and will try the build again. (My jobs run on a 'slave', but mostly use disk space on 'shared'.)

The solution seems so simple: get more disk space. If it were me, I'd see how much is in use now, and plan on 5 times that much for the next 6 months or so.

Then, if you are worried about too much "waste" over time, I think the measurement of waste needs to be based on time. If a job/project grows larger by 1000 Megs on 'shared' week after week after week after week, then that is a fairly obvious waste of resources (as far as I know). If, on the other hand, a project uses a whole lot more, say 30G, but pretty much always uses that much (or less), then there is probably a real need there. (I picked 30G, since that's about the limit that WTP uses on 'shared' ... and has for years and years.)

So, my suggestion is to get more disk space immediately, since this is blocking ... and then measure each project over time and spot the problem-projects that way. I'm sure I'm oversimplifying things and we all have a lot to learn (over time) ... but, something needs to be done and some plan put in place. I don't think "let things fail until someone cleans something up" is working. 

How can we help? Need a credit card number? :) 

Long term, I'd also suggest checking with some other OS projects (apache?) and see if they'd tell you how much disk they have per hudson job, etc. ... get some idea if we are in the same neighborhood? I tried to look myself, and didn't see any mention of disk space, at 

http://wiki.apache.org/general/Hudson

but sounds like they do have some scripts that _enforce_ setting time outs, etc. (And, yes, yes, they also have 5 hudson admins :) ... maybe a good measure is "disk space per admin" :)
Comment 22 Eclipse Webmaster CLA 2011-02-03 10:08:01 EST
I don't think the outage warning has anything to do with /shared; it is an error because slave1 has run out of space on the 'system' drive.  I say that because I've started getting errors out of postfix to that effect.

If you bind the job to 'fastlane' (since nobody should be using it, except for train-type builds), does it still fail?

-M.
Comment 23 Denis Roy CLA 2011-02-03 10:23:41 EST
> df /shared -h 
> 
> reports about 30% free ... so, assume someone (or a cron job?) cleaned up
> something, so will try the build again. 

Just to be clear: /shared has never been full.  Ever.










But it is getting there.  That's 1 Terabyte for 'temporary storage'.
Comment 24 David Williams CLA 2011-02-03 10:33:05 EST
Since I last posted, the job ran fine, but now it has failed again ...

Guess the error msg is kind of deceiving then (no fault of yours). When you say "system", do you mean "/tmp" by any chance? I could imagine the "unpack" operation is trying to use /tmp? I'll try fastlane (but the problem isn't exactly reproducible, so I'm not sure a success there proves anything ... unless there's more that you see). Greatest thanks for your help.

= = = = 

The following errors occurred when building Helios:

org.eclipse.core.runtime.CoreException: Unable to unpack artifact osgi.bundle,org.eclipse.gmf.bridge.ui,1.3.0.v20101217-1532 in repository file:/shared/helios/aggregation/final/aggregate: Unable to read repository at file:/shared/helios/aggregation/final/aggregate/plugins/org.eclipse.gmf.bridge.ui_1.3.0.v20101217-1532.jar.pack.gz.
Caused by: java.io.IOException: No space left on device
Comment 25 Denis Roy CLA 2011-02-03 10:48:17 EST
> Guess the error msg is kind of deceiving then (no fault of yours). When you say
> "system" do you mean "/tmp" by any chance? I could imagine the "unpack"
> operation is trying to use /tmp?

Yes, it's having a hard time unpacking because the Hudson slave's disk is full... be it /tmp or its local workspace.  I've cleaned up old files in /tmp, but that hasn't really liberated lots of space.

Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1             55G   54G  1.9G  97% /
Comment 26 Denis Roy CLA 2011-02-03 10:51:06 EST
One solution that we could easily implement is to host each slave workspace area on /shared.  The Master's workspace is already there.

/shared/hudson/ws-slave1/
/shared/hudson/ws-slave2/
/shared/hudson/ws-fastlane/

The second benefit is that everyone could browse the Hudson workspaces directly from build.eclipse.org.  Currently, your only way of accessing your workspace is from the Hudson UI.

The problem is that /shared only has ~300G left, so it will be full in no time.
Comment 27 Denis Roy CLA 2011-02-03 11:20:55 EST
As a temporary measure, I'm in the process of creating two 750G virtual hard drives inside the 3T of space we have for download.eclipse.org.  I'll mount these drives inside hudson-slave1 and hudson-slave2 and move the workspaces to the new "drives".

This should give us some breathing room until we can figure this out.
Comment 28 David Williams CLA 2011-02-03 22:39:37 EST
I think both the previous ideas are pretty good, as temporary emergency "breathing room". Recent aggregation builds took twice as long as normal (4 hours instead of 2), but 1) even that's better than "out of space" errors once or twice a day, and 2) Denis assures me that's just one data point in time, and it won't always be that slow :)

I'm quite willing to help somehow if I can, but still not sure exactly what the problem is. I suspect there will be several prongs to the solution and will take some time to figure out a manageable process. 

Thanks so much for your help and temporary solution, especially during these last few weeks of Helios SR2, when the impact of failed builds is so disruptive.
Comment 29 David Williams CLA 2011-02-03 22:42:27 EST
I'm assigning this to "webmaster" primarily to avoid so much bugzilla mail going out, as is our convention, but don't mean to imply the problem is entirely the responsibility of our webmaster. But, we will obviously need his guidance as we improve our practices.
Comment 30 Denis Roy CLA 2011-02-04 09:01:19 EST
> As a temporary measure, I'm in the process of creating two 750G virtual hard
> drives inside the 3T of space we have for download.eclipse.org.

Both drives are created, and I'm rsync'ing the local Hudson workspaces from slave1 and slave2 to them.  When it's done, I'll take the slaves offline to resync, then mount the new workspace areas in their place.

That will take care of the disk issue in the immediate term and, although it will be slower than local disk access, it will provide us with the opportunity to observe the growth patterns of local workspaces.
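
For the record, the migration amounts to roughly the following sequence (a sketch; device names and paths are placeholders, not the exact commands used):

#!/bin/sh
# Sketch of moving a slave's workspace onto a new virtual drive.
# Device and mount points below are placeholders.
NEWDEV=/dev/xvdb1
WS=/opt/hudson/workspace
mkfs.ext3 "$NEWDEV"                      # prepare the new 750G drive
mkdir -p /mnt/newws && mount "$NEWDEV" /mnt/newws
rsync -a "$WS"/ /mnt/newws/              # first pass while the slave is still online
# ... take the slave offline in Hudson, then resync to catch up and swap in:
rsync -a --delete "$WS"/ /mnt/newws/
umount /mnt/newws
mount "$NEWDEV" "$WS"                    # new drive now serves the workspace path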
Comment 31 Denis Roy CLA 2011-02-07 11:45:30 EST
slave1 and slave2 now have 750G (each, 1.5T total) of usable workspace.  As mentioned, this disk space is 'borrowed' from the downloads/archives area and is on a remote server, which will be slower than the local disk array.
Comment 32 Chris Aniszczyk CLA 2011-02-07 11:46:08 EST
What can we do to get you more space Denis?
Comment 33 Denis Roy CLA 2011-02-07 13:28:25 EST
I need money to purchase hardware, so I'm not sure what you can do in that area.

But before doing that, I'm interested in knowing how we can ensure the storage we have is used efficiently.
Comment 34 Nicolas Bros CLA 2011-02-10 08:37:41 EST
(In reply to comment #31)
> which will be slower than the local disk array

Indeed! The MoDisco nightly build that started on Feb 8 on hudson-slave1 finished in 3h30. It used to complete in about an hour on hudson-slave1 using the local hard disk.

And the build started on Feb 9 on hudson-slave1 is still not finished today, *23 hours later* (it looks almost finished, although that could still be a few hours at this rate...).

Each operation is extremely slow, taking 1 or 2 orders of magnitude longer:

Operation                        Now (remote disk)   Previous "normal"   Slowdown
BUCKMINSTER SETPREF                    175s                 17s            ×10
BUCKMINSTER IMPORT                   16315s               1563s            ×10
BUCKMINSTER RESOLVE                  23295s                149s           ×156
BUCKMINSTER BUILD                     5241s                 64s            ×81
BUCKMINSTER JUNIT                     2006s                567s           ×3.5
BUCKMINSTER PERFORM 'site.p2'        26135s                551s            ×47

And the other builds currently running on hudson-slave1 don't seem to be faring better:
cbi-wtp-inc-xquery : Started 18 hr ago
cdt-nightly : Started 17 hr ago
jetty : Started 22 hr ago
Comment 35 Denis Roy CLA 2011-02-10 14:43:41 EST
Please see bug 336864
Comment 36 David Williams CLA 2011-02-14 13:49:00 EST
I fully agree we can and should do a better job of "cleaning up", but I've learned it is not as easy as it sounds, for many reasons. But I thought I would document how the current "shortage" of space, besides the outright failures it causes, has caused other problems or delays. As people have mentioned, one of the reasons for using lots of space is to "cache" others' builds/repos, etc. While it seems to some that this should not be required, I've run across a few issues lately that show that it is, and that illustrate how it costs us "down time" when cached pre-reqs are cleaned up too aggressively.

One was documented in bug 336897#c5. In that case, sounds like the "old pre-req" really did "disappear". tsk tsk.  

Another I hit when deleting an old GEF 3.4.2 pre-req from our WTP cache. Shortly after, someone needed a patch build with that old code, and that build failed. Sure, GEF had been moved to "archive" ... easy enough to know that, but then I discovered GEF apparently doesn't use the same URL conventions in 'archive' as it does in 'downloads' ... it is flatter in 'archive'. So we had to change our build scripts to use the archive URL for that old version. No big deal ... but ... it cost us about a day with failing builds, debugging, and rebuilding ... all to save 5 Megabytes in "extra" storage.

The other complicated case is we sometime purposely use an "intermittent" build from one of our pre-reqs ... say an M build from January ... so, by February, those M builds have been deleted (not even moved to archives) but yet we may not be ready to move up to a new M build yet. Sure, maybe we could, but some times are better than others, so best if we can pick the time, instead of just discovering suddenly that month-old M build was deleted by another project. 

I'm just saying cleaning up is complicated. Not that we should not make efforts to stay cleaner.

Long term, one thing that might help in the Hudson workspace case is to have an "opt-out" (or opt-in?) plugin that "cleans workspace" ... with some variable field, such as "after n days". So, if someone knows they cannot/should not clean their workspace, they can "opt out", whereas others will have their workspaces cleaned after n days of no modifications ... so it would only affect those not very active. Just an idea ... and I know, I know ... someone would have to write it :) (if it doesn't already exist).
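
As a sketch of the idea (the workspace root, the retention period and the opt-out marker file are all made-up names):

#!/bin/sh
# Sketch: wipe workspaces untouched for DAYS days, unless the job opted out
# by dropping a marker file in its workspace.  All names here are assumptions.
WS_ROOT=/opt/hudson/workspace
DAYS=14
for ws in "$WS_ROOT"/*/; do
    [ -e "$ws/.keep-workspace" ] && continue     # job opted out of cleaning
    if [ -z "$(find "$ws" -mtime -"$DAYS" -print -quit)" ]; then
        echo "wiping $(basename "$ws")"
        rm -rf "$ws"
    fi
done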

Given comments elsewhere, where not everyone has their own project space on /shared, maybe it'd be easier to have a special space on "shared" with similar permissions to "/tmp", so any Hudson job could have its own writable "cachedprereq" space on /shared without webmasters having to explicitly define a /shared directory for each and every project or job that needs it. Say /shared/cachedprereqs, and encourage people to use a distinct subdirectory, such as their job name? Then, unlike /tmp, it would not be cleaned up every 14 days. And ... the cache subdirectories could be monitored or reviewed from time to time to see if anyone was getting carried away or using it for things other than caching build requirements?

Just some thoughts. Just trying to help.
Comment 37 Martin Oberhuber CLA 2011-03-01 11:43:04 EST
What's the current status here?

- Do people feel that Denis' "nag E-Mails" helped or are we still at risk?
- Do we need a "how to keep Hudson clean" HOWTO?
- Anything else?

Having Hudson run out of disk space causes lots of disruption and waste, so we should really get this fixed. From a high level, it looks like there are 3 possible measures: (a) educate folks, (b) create software/tools to help stay clean, (c) just add disks.

Have we done all the easy steps?
Adding AC for consideration.
Comment 38 Nicolas Bros CLA 2011-05-20 10:06:08 EDT
Hi,

I've done my best to reduce the disk space used by the Hudson jobs I'm responsible for (MoDisco & EMF Facet).

I'd like to remove the built zip once it's archived by Hudson ("Archive the artifacts"). The problem is that it is a "Post build action", so I can't execute anything after that.

I've found that there is a plug-in to allow this:
http://wiki.hudson-ci.org/display/HUDSON/Post+build+task

Would it be possible to install it?
Or is there another way?
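
For what it's worth, the post-build task itself would only need to be a one-liner along these lines (the zip path is just an example from my job layout, not a general rule):

# Sketch of the post-build shell task: remove the built zip after Hudson has
# archived it.  The path pattern is an example and depends on the job.
rm -f "$WORKSPACE"/site/target/*.zip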
Comment 39 Denis Roy CLA 2014-02-11 10:22:10 EST
Almost three years later, I think we're in a happy place.  Of course, additional terabytes of storage and HIPP make this much easier.