
Bug 209563

Summary: suggest robots.txt on build.eclipse.org
Product: Community
Reporter: David Williams <david_williams>
Component: Servers
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:

Description David Williams CLA 2007-11-12 17:10:11 EST
As far as I know, it could be:

User-agent: *
Disallow: /

That is, I don't know of anything that needs robots or crawlers on "build.eclipse.org". 

The potential problem I am seeing is that we have CruiseControl logs, etc., stored there, for up to 10 days (from time of the build). And, it's relatively normal for users/committers to 'request' these (via http). 

But, I see a lot of traffic of requests being made for old logs that have already been deleted ... which would not happen from an actual user at their browser. 

I'm sure there could be several explanations, but one is that robots at one time snagged the file names and are going back later to do some sort of indexing (or to check whether the files are still there). 

This traffic causes "not found" exceptions in our CruiseControl log, and it can become pretty cluttered with this noise. 

So, not critical, but I suspect it'd be a good practice ... unless someone has something there that should be indexed by some gophers, in which case feel free to restrict the restriction, and disallow only /shared/webtools/
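
To make the two options concrete, here is a minimal sketch that checks both rule sets with Python's standard `urllib.robotparser`. The two URLs and the `ExampleBot` user agent are made up for illustration; they are not real paths or crawlers seen on build.eclipse.org. Note that robots.txt is only advisory, so this shows what a well-behaved crawler would do, not a guarantee that the requests stop.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Option 1: keep all robots off the whole host.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Option 2: the narrower rule, blocking only /shared/webtools/.
BLOCK_WEBTOOLS_ONLY = """\
User-agent: *
Disallow: /shared/webtools/
"""

# Hypothetical URLs, used only to exercise the rules.
LOG_URL = "http://build.eclipse.org/shared/webtools/logs/build.log"
OTHER_URL = "http://build.eclipse.org/other/index.html"

block_all = build_parser(BLOCK_ALL)
webtools_only = build_parser(BLOCK_WEBTOOLS_ONLY)

for name, parser in (("block all", block_all), ("webtools only", webtools_only)):
    for url in (LOG_URL, OTHER_URL):
        allowed = parser.can_fetch("ExampleBot", url)
        print(f"{name}: {url} -> {'allowed' if allowed else 'disallowed'}")
```

With the first rule set both URLs are disallowed; with the second, only the one under /shared/webtools/ is.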

Thanks,
Comment 1 Denis Roy CLA 2008-05-28 14:40:35 EDT
Agreed. Done.