Bug 366212

Summary: Searching orion client for the keyword _selection shows false hits
Product: [ECD] Orion
Reporter: Ken Walker <ken_walker>
Component: Server
Assignee: John Arthorne <john.arthorne>
Status: CLOSED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: john.arthorne, johnjbarton, libingw
Version: 0.3
Target Milestone: 0.4 M2
Hardware: PC
OS: Mac OS X - Carbon (unsup.)
Whiteboard:
Attachments:
Description: Ignore, wrong bug
Flags: none

Description Ken Walker CLA 2011-12-09 10:50:34 EST
If I navigate to my client folder "org.eclipse.orion.client", type "_selection" in the search field, and search, I get back many results:

(Files 1-35 of 35 found by keyword _selection in: root / org.eclipse.orion.client)

However, if I try to expand some of these results, the item name changes to the item name plus "(0 matches)", so it's really a false match?
Comment 1 Ken Walker CLA 2011-12-09 10:53:42 EST
If I search for "selection" without the underscore, the results are all legitimate.

I noticed that in the "_selection" query that selection.js does in fact have search results that I expected.
Comment 2 Ken Walker CLA 2011-12-09 11:03:49 EST
Created attachment 208173 [details]
Ignore, wrong bug

ScreenShot of bug
Comment 3 Ken Walker CLA 2011-12-09 11:05:53 EST
Sorry wrong bug for the attachment
Comment 4 libing wang CLA 2011-12-09 11:14:33 EST
I tried both orion.eclipse and localhost with _selection and with other keywords starting with "_".

The search engine always returns some files that do not contain the keyword.
I am wondering if a word starting with "_" means something other than a literal. I even tried to escape it in the URL but still got the same result.

Possible solutions:
1. Instead of doing the in-file search on demand, do a full round of in-file search over all the results and kick out the mismatches. But this will reduce the number of files shown on the page.

2. We already do a stale-file check when a file's time stamp differs from the one returned by the indexer. We could do that check for all files regardless of time stamp and grey out the mismatches instead of removing them. But this would wrongly imply to the user that the greyed-out files will disappear later.
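
A minimal sketch of the second-level check both options rely on, assuming the index returns a list of file paths and the file contents can be read back. The function name, the `read_file` callback, and the sample files are hypothetical and only illustrate the idea; this is not Orion code.

```python
def second_level_hit_test(hits, keyword, read_file):
    """Split index hits into those that really contain the keyword and those that do not."""
    confirmed, unconfirmed = [], []
    for path in hits:
        text = read_file(path)
        if keyword in text:
            confirmed.append(path)
        else:
            unconfirmed.append(path)
    return confirmed, unconfirmed


# Hypothetical in-memory files standing in for the workspace.
files = {
    "selection.js": "this._selection = null;",
    "navigator.js": "var selection = getSelection();",
}

confirmed, unconfirmed = second_level_hit_test(list(files), "_selection", files.__getitem__)
```

Option 1 would show only `confirmed`; option 2 would show both lists and grey out the entries in `unconfirmed`.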
Comment 5 John Arthorne CLA 2011-12-09 12:05:15 EST
Lucene processing on the query might also be munging this. I can take a look in M2.
Comment 6 libing wang CLA 2011-12-09 13:16:23 EST
(In reply to comment #5)
> Lucene processing on the query might also be munging this. I can take a look in
> M2.

Even so, we may still want a second-level hit test in case there is a hole in Lucene. "0 matches" gives a bad first impression.
You can roughly simulate this second-level hit test and see its performance by clicking the "expand all" action on the right of the toolbar right after the page loads.
Comment 7 libing wang CLA 2011-12-19 10:52:50 EST
John, I made this depend on bug 366847.
But if you think the fix will hit the right thing and remove the irrelevant hits, please mark it as a duplicate.
Comment 8 John Arthorne CLA 2011-12-19 11:37:46 EST
The tokenizer on the server is discarding the leading underscore from the query, so it is just searching for "selection".
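
To illustrate the effect, here is a rough sketch of a tokenizer that keeps only alphanumeric runs. It is an assumption made for illustration, not the actual analyzer configured on the server; `tokenize` is a hypothetical name.

```python
import re


def tokenize(text):
    # Assumption: a token is a maximal run of ASCII letters and digits, lowercased.
    # Underscores, dots and other punctuation act as separators and are dropped.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


query_tokens = tokenize("_selection")
document_tokens = tokenize("this._selection = null;")
```

With such a tokenizer the query "_selection" is reduced to the single token "selection", so every file that contains the word "selection" counts as a hit.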
Comment 9 John Arthorne CLA 2011-12-19 11:52:22 EST
If we avoid the tokenizer that removes non-alphanumeric characters, it doesn't find a match on "_selection". This is because the document actually contains the word, "this._selection". If you search for "*_selection" you will get the smaller set of results that you expect.

I am thinking we should automatically add leading and trailing * to the query so that we match partial word segments.
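
A small sketch of why the wildcard matters, assuming the index stores whole terms such as "this._selection" and a query has to match an entire term. It uses Python's `fnmatch` as a stand-in for wildcard term matching; the term list is made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_terms(terms, query):
    """Return the indexed terms that the query matches as a whole."""
    return [term for term in terms if fnmatchcase(term, query)]


# Hypothetical terms as they might be stored in the index.
terms = ["this._selection", "selection", "getSelection"]

exact = matching_terms(terms, "_selection")
leading = matching_terms(terms, "*_selection")
both = matching_terms(terms, "*_selection*")
```

Here the bare query matches nothing, while the wildcard forms match only the term that really contains "_selection".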
Comment 10 libing wang CLA 2011-12-19 12:09:30 EST
(In reply to comment #9)
> If we avoid the tokenizer that removes non-alphanumeric characters, it doesn't
> find a match on "_selection". This is because the document actually contains
> the word, "this._selection". If you search for "*_selection" you will get the
> smaller set of results that you expect.
> 
> I am thinking we should automatically add leading and trailing * to the query
> so that we match partial word segments.

If we automatically add a leading and trailing *, we will always have to use a regular expression for the "in file search" unless the URL tells us what the user originally typed in.
This will significantly affect the match highlighting, as the highlighted range gets wider.
E.g. "fullLineRedraw = ((getStyle() & SWT.FULL_SELECTION" will be highlighted instead of "_SELECTION".

Maybe we really need the original keyword as a special parameter in the search URL.
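
The highlighting concern can be sketched as follows, assuming the client translated each "*" into the regular expression ".*". The helper names are hypothetical and this is not the Orion client's code.

```python
import re


def wildcard_to_regex(query):
    # Assumption: "*" means "any run of characters"; everything else is literal.
    return "".join(".*" if ch == "*" else re.escape(ch) for ch in query)


def highlighted_text(line, pattern):
    """Return the part of the line that a highlighter would mark, or None."""
    match = re.search(pattern, line)
    return match.group(0) if match else None


line = "fullLineRedraw = ((getStyle() & SWT.FULL_SELECTION"

# Highlighting with the keyword the user typed.
narrow = highlighted_text(line, re.escape("_SELECTION"))
# Highlighting with the wildcard-wrapped query.
wide = highlighted_text(line, wildcard_to_regex("*_SELECTION*"))
```

Under that assumption `narrow` is just "_SELECTION", while `wide` covers the whole line, which is why the original keyword would need to be available to the client.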
Comment 11 John Arthorne CLA 2011-12-19 13:53:31 EST
(In reply to comment #10)
> If we automatically add leading and trailing *, we will have to always use a
> regular expression for the "in file search" unless the URL tells us what the user
> originally typed in. 

I meant adding the wildcard on the server. I have tried this and it seems to work quite well. The client isn't aware of the wildcard but it doesn't need to be, since it is already doing a raw search of the file contents. I think leading/trailing wildcards shouldn't have any effect in the client's search.
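
Roughly, the split of responsibilities looks like the sketch below. Both functions are hypothetical illustrations of the idea, not the code in the actual fix, and they assume a single-term query.

```python
def add_wildcards(query):
    """Server side: wrap the query in a leading and trailing '*' if not already present."""
    if not query.startswith("*"):
        query = "*" + query
    if not query.endswith("*"):
        query = query + "*"
    return query


def client_find(text, keyword):
    """Client side: plain substring search using the keyword exactly as typed."""
    start = text.find(keyword)
    return None if start < 0 else (start, start + len(keyword))


index_query = add_wildcards("_selection")
highlight_range = client_find("this._selection = null;", "_selection")
```

The index sees "*_selection*", while the client still locates and highlights only the characters the user typed.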
Comment 12 John Arthorne CLA 2011-12-19 14:00:48 EST
I have pushed a fix for this that is looking good to me... please give it a spin in the next build:

http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?id=a881e8b3c0899a91ca245a6f8d1a6d91ad7a738f
Comment 13 John Arthorne CLA 2011-12-19 14:01:46 EST
Marking fixed.
Comment 14 libing wang CLA 2011-12-19 16:10:13 EST
(In reply to comment #11)
> (In reply to comment #10)
> > If we automatically add leading and trailing *, we will have to always use a
> > regular expression for the "in file search" unless the URL tells us what the user
> > originally typed in. 
> 
> I meant adding the wildcard on the server. I have tried this and it seems to
> work quite well. The client isn't aware of the wildcard but it doesn't need to
> be, since it is already doing a raw search of the file contents. I think
> leading/trailing wildcards shouldn't have any effect in the client's search.

Nice. I just quickly tried it out.
What I like best is that it treats the keyword as "occurs anywhere" instead of "occurs as a whole word". Realistically, "matching partial word segments" is very useful for me. Before this fix, I was blocked in cases where I couldn't remember the whole keyword and had to try really hard.
Comment 15 John Arthorne CLA 2012-01-12 09:56:34 EST
*** Bug 366755 has been marked as a duplicate of this bug. ***
Comment 16 Ken Walker CLA 2012-01-12 11:30:34 EST
I just tried this again and it seems much better.  I only get back 5 hits now.