| Summary: | can robots exclude list allow "link checker"? | | |
|---|---|---|---|
| Product: | Community | Reporter: | David Williams <david_williams> |
| Component: | Wiki | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | enhancement | | |
| Priority: | P3 | CC: | chris.guindon |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Attachments: | sample log from running "check links" (attachment 208098) | | |
Description
David Williams

Comment 1
Eclipse Webmaster
We use robots.txt to try to prevent our servers from getting hosed by search engines/crawlers (at least those that respect robots.txt).

What are you trying to achieve?

-M.

Comment 2
David Williams
(In reply to comment #1)
> We use robots.txt to try and prevent our servers from getting hosed by search
> engines/crawlers (at least those that respect robots.txt).
>
> What are you trying to achieve?
>
> -M.

Oh, sorry, I thought everyone knew what the "W3C Link Checker" was :) It is a tool that scans the links in a page (usually tags like <a href="someURL">link</a>) and produces a summary of the links that are "broken", that is, where "someURL" no longer exists or otherwise returns an error response. The broken links can then be fixed. It is a handy tool when a page has dozens (or hundreds) of links, especially when the page lives for years, so some links that once worked no longer do.

Actually, the "excluded list" seems much shorter today when I tried again just now (did you fix it already? :) Or maybe I was reading it wrong yesterday?). The only messages saying "link was not checked due to exclusion rule" came from URLs on http://build.eclipse.org, such as http://build.eclipse.org/juno/simrel/reports/nonUniqueVersions.txt, and on https://bugs.eclipse.org/, such as https://bugs.eclipse.org/bugs/show_bug.cgi?id=217339. (The latter, for "bugs", would never break over time, since bugs aren't removed from the database, but it might still be broken due to a typo.) This affected only about 7 of the 100 links on the page, so it isn't too bad to check manually.

So, at this point, consider this a very low priority request (I still think it would be nice to add it for build.eclipse.org and bugs.eclipse.org; I'm not sure why it doesn't work there but does on other wiki and main pages). I'll attach a log of the checking it does; it appears to always use HEAD requests. (I think there is a way to tell it to check recursively, which, I suppose, could end up causing tens of thousands of HEAD requests if misused.) Feel free to close as WONTFIX if you fear opening the flood gates or anything. Like I said, it seems better today; I would have sworn that yesterday it was listing all 100 links as "excluded", but I could have been looking at it wrong.

David Williams
Created attachment 208098 [details]
sample log from running "check links"
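
The attached log reflects the checker issuing a HEAD request per link. Below is a minimal sketch, in Python, of that kind of HEAD-based link check. It is not the W3C Link Checker itself, just an illustration of the technique: collect the <a href> targets on a page, HEAD each one, and report the ones that return an error. The page URL at the bottom is only a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_links(page_url):
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.links:
        target = urljoin(page_url, href)
        if not target.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, etc.
        try:
            # HEAD avoids downloading the body; only the status matters here.
            urlopen(Request(target, method="HEAD"), timeout=10)
            print("ok      ", target)
        except HTTPError as err:
            print(f"broken ({err.code})", target)
        except URLError as err:
            print("error   ", target, err.reason)


if __name__ == "__main__":
    check_links("https://wiki.eclipse.org/SomePage")  # placeholder URL
```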
The check "w3c link checker" is currently working for me. Closing this bug! |