| Summary: | p2 repository links on project download pages confuse some users | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Community | Reporter: | Wayne Beaton <wayne.beaton> | ||||||||
| Component: | Architecture Council | Assignee: | eclipse.org-architecture-council | ||||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||||
| Severity: | normal | ||||||||||
| Priority: | P3 | CC: | anthony.dahanne, caniszczyk, contact, david_williams, denis.roy, dennis.huebner, heiko.boettger, jan.sievers, lerch, matthew, mknauer, mober.at+eclipse, pascal, remy.suen, sbouchet, slewis | ||||||||
| Version: | unspecified | ||||||||||
| Target Milestone: | --- | ||||||||||
| Hardware: | PC | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | stalebug | ||||||||||
| Attachments: | | ||||||||||

Description
Wayne Beaton

This is currently what we do now: http://download.eclipse.org/egit/updates/

A default html page seems like a reasonable thing to do.

There is an XSLT available that creates a static web page displaying the repository contents. This can be quite useful: http://wiki.eclipse.org/Equinox_p2_Browsable_Repository_Index

(In reply to comment #1)
> This is currently what we do now:
>
> http://download.eclipse.org/egit/updates/
>
> A default html page seems like a reasonable thing to do.

I just get a standard 404. Am I missing something?

(In reply to comment #2)
> There is an XSLT available that creates a static web page displaying the
> repository contents. This can be quite useful:
>
> http://wiki.eclipse.org/Equinox_p2_Browsable_Repository_Index

That looks very promising to me. When I read the bug report I thought about 'someone' who could create a PHP page that parses the content of the p2 repository and presents it to the users in a nice way, but this (XSLT + static HTML) is even better.

I wonder if this is something that can be done on the server. Can we intercept the 404 and generate a reasonable page if the directory looks like an update site? We could just look for a plugin/features directory; or the existence of a site.xml file would be a dead giveaway. There will be (reasonable) push-back from the webmaster as I believe we try to keep the 404 experience lightweight due to the rather large volume of them we get in a day. Just putting the idea out there.

+1 for making p2 repos browsable by transforming with the XSLT.

(In reply to comment #5)
> I wonder if this is something that can be done on the server. Can we intercept
> the 404 and generate a reasonable page if the directory looks like an update
> site? We could just look for a plugin/features directory; or the existence of a
> site.xml file would be a dead giveaway. There will be (reasonable) push-back
> from the webmaster as I believe we try to keep the 404 experience lightweight due
> to the rather large volume of them we get in a day. Just putting the idea out
> there.

Web servers typically have an order of files to search when the request path is a directory (index.html, index.php, etc.). I don't know if there is a way to hook an XSLT into that, but if it's possible then surely Denis knows how. The problem is that since it involves unzipping the content.jar it is relatively expensive to do on demand. This is much more efficient when done at build time. Maybe an alternative would be for the 404 page to link to instructions for projects to do this themselves.

(In reply to comment #7)
> Web servers typically have an order of files to search when the request path is
> a directory (index.html, index.php, etc.). I don't know if there is a way to
> hook an XSLT into that, but if it's possible then surely Denis knows how. The
> problem is that since it involves unzipping the content.jar it is relatively
> expensive to do on demand. This is much more efficient when done at build time.
> Maybe an alternative would be for the 404 page to link to instructions for
> projects to do this themselves.

Any discussion of generating these on the fly is probably academic for reasons I stated in comment 5. FWIW, I carefully avoided suggesting that we unzip the content.jar.

The current 404 page states that the link may be an update site (it's the first bullet). Project committers don't tend to find these sorts of errors because they don't tend to make them (they know what the link means).

In the end I believe that we need to either create a best practice to include a reasonable index.(html|php) with the repository, modify the build technology to generate something automatically, or create a post-processor (Maven task) that breaks open the content.jar and generates the file. I'm sort of liking the last option.

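A rough sketch of the post-processor option, assuming PHP with the SimpleXML extension (plus the zip extension for reading content.jar). It pulls content.xml out of content.jar, lists every installable unit with its version, and renders a flat static page. The function names are hypothetical and not part of Dash, Tycho or p2; only the content.jar/content.xml layout (`<repository><units><unit id=... version=...>`) comes from p2 itself.

```php
<?php
// Hypothetical build-time post-processor: content.jar -> content.xml -> index.html.
// Assumes the SimpleXML extension; p2_read_content_jar() also needs ext-zip.

/** Read content.xml out of content.jar (an ordinary zip archive). */
function p2_read_content_jar(string $jarPath): string {
    $zip = new ZipArchive();
    if ($zip->open($jarPath) !== true) {
        throw new RuntimeException("cannot open $jarPath");
    }
    $xml = $zip->getFromName('content.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("no content.xml in $jarPath");
    }
    return $xml;
}

/** Return [id, version] pairs for every installable unit, sorted by id, then version string. */
function p2_units_from_xml(string $contentXml): array {
    $repo = new SimpleXMLElement($contentXml);
    $units = [];
    foreach ($repo->xpath('/repository/units/unit') ?: [] as $unit) {
        $units[] = [(string) $unit['id'], (string) $unit['version']];
    }
    // Plain string ordering keeps the page stable between builds;
    // it is not semantic version ordering.
    usort($units, function (array $a, array $b): int {
        return strcmp($a[0], $b[0]) ?: strcmp($a[1], $b[1]);
    });
    return $units;
}

/** Render a flat, Ctrl-F friendly HTML page listing the units. */
function p2_render_index(string $title, array $units): string {
    $h = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES);
    };
    $html = "<!DOCTYPE html>\n<html><head><title>" . $h($title) . "</title></head><body>\n"
        . "<h1>" . $h($title) . "</h1>\n"
        . "<p>This is a p2 repository. To install from it, paste this page's URL into "
        . "Help &gt; Install New Software... in Eclipse.</p>\n"
        . "<table>\n<tr><th>Installable unit</th><th>Version</th></tr>\n";
    foreach ($units as [$id, $version]) {
        $html .= "<tr><td>" . $h($id) . "</td><td>" . $h($version) . "</td></tr>\n";
    }
    return $html . "</table>\n</body></html>\n";
}
```

A last build step could then write `p2_render_index('My project updates', p2_units_from_xml(p2_read_content_jar('repo/content.jar')))` to `repo/index.html` (the paths and title are placeholders). A composite repository lists child repositories rather than units, so it would need its children handled separately.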
Maybe we can do something in Dash that generates a consistent page with eclipse.org branding, links back to the project, etc.

And we should not forget that many (most?) projects are organising their repositories with composite repositories; this case should be handled as well.

Or it could be implemented in a 'static' way, similar to the file system corrections on the download area that are done by a script from time to time. Whenever the script recognises a p2 repository it generates the required customised directory listing.

(In reply to comment #9)
> Or it could be implemented in a 'static' way, similar to the file system
> corrections on the download area that are done by a script from time to time.
> Whenever the script recognises a p2 repository it generates the required
> customised directory listing.

+1. I like options that don't require that I stalk 250+ projects. We can just look for content.jar files without a corresponding index.(html|php) file.

In case this info can be helpful...

download.eclipse.org serves between 4 and 6 million 404s per day; most come from bots or carry a `Java/` user-agent signature.

Our DirectoryIndex is this:

```apache
DirectoryIndex index.html index.php site.xml
```

If no suitable index is found, the request becomes a 404, which is handled by:

```apache
ErrorDocument 404 /errors/404.php
```

The 404.php page examines the browser signature. If it's a bot or a `Java/` client, a minimalist 404 message is sent to reduce bandwidth. If it's from a browser, you get a full page with explanatory text. The handler looks like this:

```php
<?php
// Minimal 404 for bots and Java/HTTP clients (saves bandwidth);
// full HTML error page for everyone else.
$browser = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "";
$minimal_agents = array(
    "Jakarta", "Java/", "Slurp/", "msnbot/", "Googlebot/",
    "apacheHttpClient", "Baiduspider", "Apache-Maven/",
);
$minimal = FALSE;
foreach ($minimal_agents as $agent) {
    if (strpos($browser, $agent) !== FALSE) {
        $minimal = TRUE;
        break;
    }
}
if ($minimal) {
    echo "404 Not Found";
}
else {
    include($_SERVER['DOCUMENT_ROOT'] . "/errors/404.html");
}
?>
```

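The periodic-script idea from this thread (look for content.jar files without a corresponding index.(html|php)) can be sketched in the same PHP as the 404 handler. This is a hypothetical helper, not an existing eclipse.org script. It also treats compositeContent.jar/.xml as a repository marker, since many projects publish composite repositories, and it reports directories that have no index.html or index.php of their own.

```php
<?php
// Hypothetical periodic job: list directories under the download root that
// look like p2 repositories but carry no index.html/index.php of their own.

const P2_MARKERS = ['content.jar', 'content.xml', 'compositeContent.jar', 'compositeContent.xml'];
const INDEX_FILES = ['index.html', 'index.php'];

/** True if any of $names exists as a regular file directly inside $dir. */
function has_any_file(string $dir, array $names): bool {
    foreach ($names as $name) {
        if (is_file($dir . DIRECTORY_SEPARATOR . $name)) {
            return true;
        }
    }
    return false;
}

/** Sorted list of $root and its subdirectories that are unindexed p2 repositories. */
function find_unindexed_repos(string $root): array {
    $dirs = [$root];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST,      // yield directories, not just leaves
        RecursiveIteratorIterator::CATCH_GET_CHILD  // skip unreadable subdirectories instead of throwing
    );
    foreach ($it as $path => $info) {
        if ($info->isDir()) {
            $dirs[] = $path;
        }
    }
    $found = array_values(array_filter($dirs, function (string $dir): bool {
        return has_any_file($dir, P2_MARKERS) && !has_any_file($dir, INDEX_FILES);
    }));
    sort($found);
    return $found;
}
```

A cron job could pass the download root (a placeholder here) and then generate or report an index page for each directory returned, which keeps the cost out of the 404 path entirely.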
We discussed this on the AC call today: http://wiki.eclipse.org/Architecture_Council/Meetings/September_15_2011

Essentially, we felt that the current 404 page is a big improvement already and probably good enough for most end users. The extra XSLT would mostly provide value for committers to validate repos and their content.

One idea was to add a little section to the current 404 page like this: "If you are a committer and want to improve this page, here is what you can do: (link into wiki)". A wiki page there would then explain how to add an index.html to the repo for providing project-specific instructions for end users, and/or how to add XSLT for providing more info for committers. Like the TM repo:

- http://download.eclipse.org/tm/updates/3.3
- http://dev.eclipse.org/viewcvs/viewvc.cgi/org.eclipse.tm.rse/releng/org.eclipse.rse.updatesite/?root=Tools_Project

Who could draft such instructions on the wiki, such that the link can be added to the 404 page?

(In reply to comment #12)
> Who could draft such instructions on the wiki, such that the link can be added
> to the 404 page?

http://wiki.eclipse.org/Equinox_p2_Browsable_Repository_Index

Created attachment 203544: webpage for a p2 repo

Matt and I have been working on a tool that generates a webpage from a p2 repo. Find attached an example of the generated webpage.
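The generator mentioned here uses StringTemplate and Dojo; what follows is not that code, only a minimal PHP sketch, assuming an already-extracted content.xml, of the grouping step a category view needs. It relies on the usual p2 encoding: a category is a unit with the property `org.eclipse.equinox.p2.type.category=true`, its label is the `org.eclipse.equinox.p2.name` property, and its members are its requirements in the `org.eclipse.equinox.p2.iu` namespace. The function name is hypothetical.

```php
<?php
// Hypothetical helper: group a repository's categorized IUs by category label,
// reading an already-extracted content.xml. Needs only SimpleXML.

const P2_CATEGORY = 'org.eclipse.equinox.p2.type.category';
const P2_NAME = 'org.eclipse.equinox.p2.name';
const P2_IU_NAMESPACE = 'org.eclipse.equinox.p2.iu';

/** Map of category label => sorted member IU ids. */
function p2_categories(string $contentXml): array {
    $repo = new SimpleXMLElement($contentXml);
    $categories = [];
    foreach ($repo->xpath('/repository/units/unit') ?: [] as $unit) {
        // Collect the unit's properties into a name => value map.
        $props = [];
        foreach ($unit->xpath('properties/property') ?: [] as $p) {
            $props[(string) $p['name']] = (string) $p['value'];
        }
        if (($props[P2_CATEGORY] ?? '') !== 'true') {
            continue; // not a category IU
        }
        $label = $props[P2_NAME] ?? (string) $unit['id'];
        // A category's members are its requirements on other IUs.
        $members = [];
        foreach ($unit->xpath('requires/required') ?: [] as $req) {
            if ((string) $req['namespace'] === P2_IU_NAMESPACE) {
                $members[] = (string) $req['name'];
            }
        }
        sort($members);
        $categories[$label] = $members;
    }
    ksort($categories);
    return $categories;
}
```

Features that no category references do not appear at all in this view, so a flat listing of every unit and version is the natural complement when hunting for one specific bundle.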

(In reply to comment #14)
> Created attachment 203544
> webpage for a p2 repo
>
> Matt and I have been working on a tool that generates a webpage from a p2 repo.
> Find attached an example of the generated webpage.

Very nice. Can you add some words that describe what the user is looking at and how they might go about actually getting the software?

Is your implementation something that can be easily branded?

FWIW, use of Dojo in this context probably falls into a gray area and will *probably* not require a CQ.

(In reply to comment #15)
> Very nice. Can you add some words that describe what the user is looking at and
> how they might go about actually getting the software?
>
> Is your implementation something that can be easily branded?
>
> FWIW, use of Dojo in this context probably falls into a gray area and will
> *probably* not require a CQ.

Looks very nice! The Help button provides a direct link to the Eclipse help describing how to install from a p2 repo; users can copy and paste the address bar link into Eclipse. I also suggest a title like "Hello, you are trying to browse a p2 repository; below is the content of the repository. Please use Eclipse to install these projects." or similar.

Branding should be possible: the code uses StringTemplate to build the site from a template, so it ought to just be a matter of determining where people might want to customize it.

(In reply to comment #14)
I agree the webpage looks very nice. Compared to the not-so-fancy XSL from http://wiki.eclipse.org/Equinox_p2_Browsable_Repository_Index, however, it doesn't really answer the original question:

"What exactly is inside a particular p2 repository?"

Information on feature versions is missing, and there is no information about contained bundle IUs (maybe also contained features). IMHO this is an important scenario if I get a resolution error from p2: I'm not looking for categories in this case, but rather for a very specific version of a bundle or feature. Can we add this information to the generated webpage?

Also, in the original HTML it was easy to just use Ctrl-F to search on the page. This doesn't work with the collapsed category tree. Not sure if a search engine bot would index all invisible (unexpanded) content of the dynamic page.

If the webpage generator could be made available in some form (e.g. as a p2 publisher or similar), I would like to integrate this into Tycho. We could add a flag "generateIndexHtml" to the Maven plugin which publishes the p2 repo.

At the moment I have the code on GitHub for simplicity: https://github.com/mpiggott/ca.piggott.p2.webview/

(In reply to comment #18)
> Information on feature versions is missing, and there is no information about
> contained bundle IUs (maybe also contained features).

The versions are present but not currently shown; originally I had written it with the idea of building an XML document for the new p2 import feature. I've intentionally limited the amount of information encoded into the page: we need to be reasonably careful so that larger repos don't end up as large as the content.xml, both for downloading and for rendering. I believe at the moment I've only included categorized features. For smaller repositories this may be a little too aggressive, particularly if people are really looking for transitive dependencies.

> IMHO this is an important scenario if I get a resolution error from p2. I'm not
> looking for categories in this case, but rather for a very specific version of
> a bundle or feature.
>
> Can we add this information to the generated webpage?
>
> Also, in the original HTML it was easy to just use Ctrl-F to search on the
> page.
> This doesn't work with the collapsed category tree.

This is true. I'm not sure if Dojo has a solution for filtered trees, though it might also be possible to add something to toggle between the tree and a flat list.

> Not sure if a search engine bot would index all invisible (unexpanded) content
> of the dynamic page.

I'm not an SEO expert, but I believe that, as I'm outputting the IUs into standard HTML elements in the body of the page, Google shouldn't have any issues indexing it. (Some simple custom JavaScript builds a Dojo model and programmatically creates the tree.) This does remind me that I've meant to include something like 'p2 webview' in the page to help find indexed pages.

(In reply to comment #20)
> I've intentionally limited the amount of information encoded into the page: we
> need to be reasonably careful so that larger repos don't end up as large as the
> content.xml, both for downloading and for rendering. I believe at the moment I've
> only included categorized features. For smaller repositories this may be a
> little too aggressive, particularly if people are really looking for transitive
> dependencies.

The detail level we have right now seems appropriate for repos the size of the Eclipse release train repos. As you stated in comment #17, if the template file were configurable, this would allow for customization, e.g. in case I want more details. The template right now only has variables for categories and repo name. Rather than exposing all other possible variables in the existing antlr StringTemplate, I wonder if using an XSLT template instead would be easier, as it gives you full access to the metadata.
(That would also remove the dependency on antlr.)

I'm currently working on an enhancement to this tool that will output two additional HTML pages that are only aimed at search engines. One page will contain the complete list of capabilities provided in a repository (aimed at helping people looking for a missing requirement); the other will contain the complete list of artifacts found in the repository. As for allowing for extensibility of the template itself, I think it is simpler to let the user of the tool provide their own StringTemplate file. That said, there are certainly things that we can do to make the out-of-the-box template nicer. And I'm all for the addition of the Tycho task :)

Created attachment 204484: p2 update site for the generator

I just committed the support for generating a page with all artifacts and one with all provided capabilities. For the rest, we welcome pull requests :) Again, the repo is available at: https://github.com/mpiggott/ca.piggott.p2.webview

To use the attached application:

```
eclipse -application ca.piggott.p2.site.generateWebsite -r http://download.eclipse.org/releases/indigo -output /Users/Pascal/tmp/gengen11
```

Created attachment 209136: index.php listing directory contents

I wrote an index.php file that lists the contents of the current directory. It's possible to expand subdirectories to get a quick overview of what's in there. A help text tells the user that they are currently viewing an update site and may want to access it with Eclipse. A preview can be seen here: http://download.eclipse.org/recommenders/builds

Note that this link is not intended to be stable and that it's actually not pointing to an update site. Our update site gets cleared on each successful job, so I need to deploy the index.php file from within the job. I'm currently working on that.

It's a funny thing, what's going on here. Today I tried to find the latest director download.
Last time I found this link on the update site URL, which is also described in the wiki (http://wiki.eclipse.org/Installing_Headless_Buckminster). A 404 means that the website is not available in the browser. That is okay if there is no webpage; however, removing information from existing update site URLs seems to be a bad idea. At the very least, when doing such a thing you should consider updating your own wiki and website.

This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

-- The automated Eclipse Genie.

Seems to be a duplicate of https://bugs.eclipse.org/bugs/show_bug.cgi?id=385523 ??

(In reply to Stephane Bouchet from comment #28)
> seems duplicate of https://bugs.eclipse.org/bugs/show_bug.cgi?id=385523 ??

I think so. We've added much functionality to the 404 pages on download/archive.eclipse.org.

(In reply to Wayne Beaton from comment #0)
> 1) encourage projects to include an index.html/index.php in their repos to
> describe what happens/what the user should do.

That has never worked.

> 2) encourage projects to not make links to p2 repositories clickable.

That has never worked.

In bug 385523 we've addressed these use cases:

- Moved to archives: http://download.eclipse.org/technology/babel/update-site/R0.12.0/kepler/features/
- No index page, offering instructions and a Directory Browser: http://archive.eclipse.org/technology/babel/update-site/R0.12.0/kepler/features/
- No index for a p2 repo, with instructions on how to use the repo: http://archive.eclipse.org/technology/babel/update-site/R0.12.0/kepler/

Suggest closing this bug as FIXED.

This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

-- The automated Eclipse Genie.

I agree with comment 29. I think the "custom 404" work Denis has done here works very well, and I suspect users are no longer confused by "p2 links". The issue mentioned about "drag and drop" should probably have its own bug, if anyone thinks it is worth a large-scale effort.
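
The 404 behaviour described for bug 385523 (recognising a p2 repository that has no index page and explaining how to use it) could be approximated as below. This is a hypothetical sketch, not the actual eclipse.org 404.php. It maps the request path onto the document root, refuses paths that resolve outside it, and checks for the standard p2 metadata files. In a real handler it would sit after the bot/`Java/` user-agent check, so the cheap path for automated clients stays cheap.

```php
<?php
// Hypothetical 404 helper: does the missing index belong to a p2 repository?
// Returns the repository directory, or null for anything else.
function p2_repo_dir(string $docRoot, string $requestUri): ?string {
    $path = rawurldecode((string) parse_url($requestUri, PHP_URL_PATH));
    $root = realpath($docRoot);
    $dir = realpath($docRoot . '/' . ltrim($path, '/'));
    if ($root === false || $dir === false || !is_dir($dir)) {
        return null;
    }
    // Refuse paths such as "/../" that resolve outside the document root.
    if (strpos($dir . '/', rtrim($root, '/') . '/') !== 0) {
        return null;
    }
    foreach (['content.jar', 'content.xml', 'compositeContent.jar', 'compositeContent.xml'] as $marker) {
        if (is_file($dir . '/' . $marker)) {
            return $dir;
        }
    }
    return null;
}

/** Instructions shown instead of the generic 404 text (URL is HTML-escaped). */
function p2_404_body(string $repoUrl): string {
    $url = htmlspecialchars($repoUrl, ENT_QUOTES);
    return "<h1>This is a p2 software repository</h1>\n"
        . "<p>It is meant to be used from Eclipse rather than browsed. In Eclipse, choose "
        . "Help &gt; Install New Software..., click Add, and enter this URL:</p>\n"
        . "<pre>" . $url . "</pre>\n";
}

// Possible use in 404.php, after the bot/Java user-agent check:
//   $uri = $_SERVER['REQUEST_URI'] ?? '/';
//   if (p2_repo_dir($_SERVER['DOCUMENT_ROOT'], $uri) !== null) {
//       echo p2_404_body('https://' . $_SERVER['HTTP_HOST'] . $uri);
//   }
```

Checking only a handful of `is_file()` calls per request keeps this close to the "lightweight 404" constraint raised earlier in the thread; anything heavier (such as opening content.jar) is better left to build time.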