Currently, Eclipse Help ranks search results for each topic/document separately, by the number of hits divided by the topic's total word count. Unfortunately, in some cases irrelevant hits are listed before relevant hits. For example, in the search results for "console", Clear Console and Pin Console are displayed above Console View (see <http://help.eclipse.org/indigo/index.jsp?tab=search&searchWord=console&showSearchCategories=true&quickSearch=true&quickSearchType=QuickSearchToc&toc=/org.eclipse.jdt.doc.user/toc.xml>).

TOC Hierarchy: If hit A is, according to the TOC, a subtopic of hit B, then hit B should be boosted. Assumption: if several topics are found in the same chapter, the main topic should be listed first.

Linkage: Currently, link text is attributed (like normal text) to the topic that contains the link instead of, as in Google, to the target topic. In particular, topics that contain the search expression only in a See also or Related topics section should not be found.
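To illustrate why short subtopics can outrank the main topic, here is a minimal sketch of the "hits per total word count" scoring as described above. It is a simplified model, not the actual Eclipse Help scoring code; the function name `density_score` and the topic texts are made up for the example.

```python
import re

def density_score(text, term):
    """Score = occurrences of the term / total word count of the topic."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

# Hypothetical topic texts, not the real help content.
topics = {
    "Console View": "The console view shows program output and accepts input from the keyboard",
    "Clear Console": "Clear console removes all console output",
}

# Rank topics by descending score for the query "console".
ranked = sorted(topics, key=lambda t: density_score(topics[t], "console"), reverse=True)
print(ranked)
```

In this toy data the short Clear Console text has a much higher hit density than the longer Console View text, so it is listed first.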
Search ranking is a difficult bug to tackle. The new search processor extension point in 3.7 does allow the search results (including the ranked order) to be modified before they are displayed to the user, which might help with some of the known cases. However, for general term ranking, we would need some sort of criteria to affect the ranking. What in the help topic would suggest that Console View should be ranked higher than Clear Console?
(In reply to comment #1)
> ... What in the help topic would suggest that Console View
> should be ranked higher than Clear Console?

In the table of contents, Clear Console is a subtopic of Console View. If every found topic boosted its parent topic, provided the parent is also contained in the search result, then the main topic of a chapter (in this example: Console View) would be ranked higher. See my blog post http://eclipsehowl.wordpress.com/2011/09/02/improving-the-eclipse-help-system-3-issues-and-my-2-cents/
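The parent-boost idea could be sketched roughly as follows. This is only an illustration under assumptions: the scores, the boost factor of 0.5, the helper name `boost_parents`, and the placement of Pin Console under Console View are all hypothetical, and nothing here uses the real Eclipse Help API.

```python
def boost_parents(scores, parent_of, boost=0.5):
    """Add a share of each hit's score to its TOC parent, if the parent is also a hit."""
    boosted = dict(scores)
    for topic, score in scores.items():
        parent = parent_of.get(topic)
        if parent in scores:  # only boost parents that are in the search result
            boosted[parent] += boost * score
    return boosted

# Hypothetical raw scores and TOC structure.
scores = {"Console View": 0.08, "Clear Console": 0.33, "Pin Console": 0.25}
parent_of = {"Clear Console": "Console View", "Pin Console": "Console View"}

boosted = boost_parents(scores, parent_of)
ranked = sorted(boosted, key=boosted.get, reverse=True)
print(ranked)
```

With these numbers Console View collects boosts from both subtopics and moves to the top, while the relative order of the subtopics is unchanged. The boost factor would need tuning against real queries.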
This is an interesting idea. I think that there is a general impression that the search result ranking is not all that good, and anything that could improve the ranking would be worth a try. The other problem is that it is not easy to determine whether the modified algorithm would give better results in the eyes of the users. I suppose that you could run both the old and new algorithms on a number of different searches, look for cases where the top hit changes, and out of those cases see what percentage were improved and what percentage were worse when using the new algorithm.
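The comparison suggested here could look something like the sketch below. The two ranking functions are stand-ins (plain dictionary lookups over made-up results), and the "better"/"worse" verdicts are assumed to come from a person reviewing each changed top hit.

```python
def top_hit_changes(queries, rank_old, rank_new):
    """Return (query, old_top, new_top) for every query whose top hit differs."""
    changes = []
    for query in queries:
        old, new = rank_old(query), rank_new(query)
        if old and new and old[0] != new[0]:
            changes.append((query, old[0], new[0]))
    return changes

def verdict_shares(verdicts):
    """Percentage of 'better' and 'worse' verdicts among the changed top hits."""
    if not verdicts:
        return 0.0, 0.0
    better = 100.0 * verdicts.count("better") / len(verdicts)
    worse = 100.0 * verdicts.count("worse") / len(verdicts)
    return better, worse

# Hypothetical results of the old and new algorithm for two queries.
old_results = {"console": ["Clear Console", "Console View"], "debug": ["Debug View"]}
new_results = {"console": ["Console View", "Clear Console"], "debug": ["Debug View"]}

changes = top_hit_changes(["console", "debug"], old_results.get, new_results.get)
shares = verdict_shares(["better"])  # one manual verdict per changed query
print(changes, shares)
```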
(In reply to comment #3)
> The other problem is that it is not easy to
> determine whether the modified algorithm would give better results in the eyes
> of the users. I suppose that you could run both the old and new algorithms on a
> number of different searches and look for cases where the top hit changes and
> out of those cases see what percentage were improved and what percentage were
> worse when using the new algorithm.

Is there a simple way to track how users work with the search results, i.e., given a query, which elements they actually visited? If we have such data (query, documents, ranking, and the information about which element was clicked), it is fairly straightforward to evaluate which search algorithm works best. When run on the client side, this requires some kind of usage data collector. On the server side it is much easier: you just need to write a log file that gets evaluated offline...
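As one possible way to evaluate such a log offline, the sketch below computes the mean reciprocal rank (MRR) of the clicked element. MRR is a common metric chosen here for illustration, not something proposed in this thread, and the log format (a list of ranking/clicked-topic pairs) is an assumption.

```python
def mean_reciprocal_rank(log):
    """log: iterable of (ranking, clicked) pairs; a higher result is better."""
    total, count = 0.0, 0
    for ranking, clicked in log:
        count += 1
        if clicked in ranking:
            # Position 1 contributes 1.0, position 2 contributes 0.5, and so on.
            total += 1.0 / (ranking.index(clicked) + 1)
    return total / count if count else 0.0

# Hypothetical log entries: the user searched "console" and clicked Console View.
old_log = [(["Clear Console", "Pin Console", "Console View"], "Console View")]
new_log = [(["Console View", "Clear Console", "Pin Console"], "Console View")]

mrr_old = mean_reciprocal_rank(old_log)
mrr_new = mean_reciprocal_rank(new_log)
print(mrr_old, mrr_new)
```

An algorithm that tends to place the clicked topic nearer the top gets a higher MRR, so the two algorithms can be compared on the same logged queries.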
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. If the bug is still relevant, please remove the "stalebug" whiteboard tag.