
Bug 371038

Summary: Search for phrases
Product: [ECD] Orion Reporter: Susan McCourt <susan>
Component: Server Assignee: John Arthorne <john.arthorne>
Status: RESOLVED FIXED QA Contact:
Severity: enhancement    
Priority: P3 CC: john.arthorne, libingw
Version: 0.4   
Target Milestone: 1.0 M2   
Hardware: PC   
OS: Windows 7   
Whiteboard:

Description Susan McCourt CLA 2012-02-08 23:37:28 EST
I was searching for who defines the command "Show in Navigator" but when I typed it in the search box, it searched for "showinnavigator" and therefore got no hits.
Comment 1 libing wang CLA 2012-02-09 10:06:31 EST
As far as I know white space search is not yet supported.
I even tried to change it to "show in navigator" in the URL.
The server does return hits, but it seems to be searching for the first word instead of the whole phrase. So if you open one of the files and do a find, you get no match for "show in navigator".

But it is not the end of the world. A trick we may want to play:
If the query contains white space (or other special chars), we can ask the server to return results for the first word only. Because we know we are getting more than expected, we are then forced to do an "in-file search" on every file.
One side effect is that the reported total will not match the real number of hits, and you may see fewer than 40 results per page while the results are still spread over multiple pages.
I know it is expensive on the client side, but it is better than nothing.
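
A minimal sketch of the coarse-then-refine idea described above, written in Python for brevity rather than in the Orion client's own code. The function names and the sample file contents are hypothetical; the server call is left out and a page of coarse hits is simply passed in as a dictionary of path to content.

```python
from typing import Dict, List


def coarse_term(query: str) -> str:
    """Return the first whitespace-separated word, to be sent to the server."""
    words = query.split()
    return words[0] if words else ""


def refine(coarse_hits: Dict[str, str], phrase: str) -> List[str]:
    """Keep only the files whose content contains the full phrase.

    The comparison is case-insensitive here; that is an assumption of
    this sketch, not something the thread specifies.
    """
    needle = phrase.lower()
    return [path for path, text in coarse_hits.items() if needle in text.lower()]


# Hypothetical page of coarse results for the server-side term "show".
page = {
    "a.js": 'name: "Show in Navigator"',
    "b.js": "function show() {}",
    "c.js": "// show in navigator command",
}

print(coarse_term("show in navigator"))
print(refine(page, "show in navigator"))
```

Because the filtering happens per page, a page of 40 coarse hits can shrink to far fewer real hits, which is the paging side effect mentioned in the comment.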
Comment 2 libing wang CLA 2012-02-09 10:20:26 EST
I think this "refining coarse result on client" can also be applied to case sensitive search, where server has to give a looser result.
The cost is that client has to ask for file content for each file.
But we are already doing a file metadata request anyway, which may be less expensive, though I think the cost is of the same order. We can definitely skip the metadata request if we do this "refining".
Comment 3 John Arthorne CLA 2012-02-09 17:27:01 EST
I haven't had a chance to try, but according to Solr documentation using double-quotes around a set of words should perform a phrase query. The issue might be that we are escaping quote characters on the client?
Comment 4 libing wang CLA 2012-02-10 10:15:56 EST
(In reply to comment #3)
> I haven't had a chance to try, but according to Solr documentation using
> double-quotes around a set of words should perform a phrase query. The issue
> might be that we are escaping quote characters on the client?

We can change that, for instance:
If the user types "foo" in the search box, we still respect that by passing the query as \"foo\". But if {"foo" bar} is typed, we can use "\"foo\" bar" as the query.
I tried this in the Orion project by changing the URL to q="return this", and the server gave back more than expected.
Then I googled and found an article :
http://stackoverflow.com/questions/7887820/solr-dismax-handler-whitespace-and-special-character-behaviour
It seems that if we use double quotes and replace white space with -, it returns the right thing.
I changed the query to q="return-this", and this time it seemed to return the right result.
Out of curiosity I also tried "return this.open"; this time it returned only one file, which supports my guess.
I believe there must be other configuration options we can look at on the server side, but I think for now it should be OK.
Comment 5 Susan McCourt CLA 2012-03-01 11:28:59 EST
It would help me tremendously if we implemented this pretty soon.  I'm doing a lot of searches where I'm trying to find out who declared a particular command or tooltip, and the space matters.  Such as "Open with" "Show in" etc....
Comment 6 libing wang CLA 2012-03-02 10:18:40 EST
I tried the "foo-bar" theory again today, but it seems it does not always hold.
E.g. q="open-with" gives back results but q="shown-in" does not (Eclipse finds "shown in").
q="open with" is even worse: it gives back nothing at all.

Two thoughts here:
1. When the user types {foo bar}, we can ask the server to search on "foo", which will give back more results. Then on the client side, in the in-file search, we walk through all the result files and search for "foo bar". The files that do not contain "foo bar" will be marked as stale.
Pros and cons:
When the search string contains white space in the middle, in-file search will be forced, but the performance should be equivalent to expand all.
A lot of stale files will be introduced, but the result will be accurate.

2.How about search again in search result?
When you type foo , it gives you 100 files. Then in the search result page you can search on "foo bar" within the 40 files. 
Actually I sometimes used this by the browser search in the result page.
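
The two thoughts above could be sketched roughly as follows. This is an illustration only, in Python rather than the client's own code; `Hit`, `mark_stale` and `search_within` are hypothetical names, and file contents are assumed to be already fetched.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    path: str
    stale: bool  # True when the file matched the coarse term but not the phrase


def mark_stale(contents: Dict[str, str], phrase: str) -> List[Hit]:
    """Thought 1: keep every coarse hit, flagging files that lack the phrase."""
    return [Hit(path, phrase not in text) for path, text in contents.items()]


def search_within(contents: Dict[str, str], term: str) -> Dict[str, str]:
    """Thought 2: search again inside an existing result set, giving a subset."""
    return {path: text for path, text in contents.items() if term in text}


# Hypothetical coarse results for the term "foo".
results = {"x.js": "foo bar baz", "y.js": "foo only"}

print(mark_stale(results, "foo bar"))
print(search_within(results, "foo bar"))
```

The first function keeps the result count stable and lets the UI grey out stale entries; the second drops non-matching files and returns a smaller result set.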
Comment 7 libing wang CLA 2012-03-02 10:27:26 EST
(In reply to comment #6)
> 
> 2.How about search again in search result?
> When you type foo , it gives you 100 files. Then in the search result page you
> can search on "foo bar" within the 40 files. 
> Actually I sometimes used this by the browser search in the result page.

To clarify, I meant to say:
We can think about adding "search within the results" command in the tool bar.
It will not behave like the browser's CTRL+F but will generate a subset of the original results.
Comment 8 Susan McCourt CLA 2012-03-02 11:06:44 EST
(In reply to comment #7)
> (In reply to comment #6)
> > 
> > 2.How about search again in search result?
> > When you type foo , it gives you 100 files. Then in the search result page you
> > can search on "foo bar" within the 40 files. 
> > Actually I sometimes used this by the browser search in the result page.
> 
> To clarify, I meant to say:
> We can think about adding "search within the results" command in the tool bar.
> It will not behave like the browser's CTRL+F but will generate a subset of the
> original results.

There might be some cases where this would be useful as a regular use case.  But it seems kind of like a hack to make the user do this to solve this problem.

I think it would be good to get to the bottom of the server side...why the cases that aren't returning results are failing.  Then we can strategize a client side workaround once we understand what's going on.
Comment 9 libing wang CLA 2012-03-02 11:26:36 EST
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > 
> > > 2.How about search again in search result?
> > > When you type foo , it gives you 100 files. Then in the search result page you
> > > can search on "foo bar" within the 40 files. 
> > > Actually I sometimes used this by the browser search in the result page.
> > 
> > To clarify, I meant to say:
> > We can think about adding "search within the results" command in the tool bar.
> > It will not behave like the browser's CTRL+F but will generate a subset of the
> > original results.
> 
> There might be some cases where this would be useful as a regular use case. 
Right. It is kind of generic way to narrow down the results.

> But it seems kind of like a hack to make the user do this to solve this
> problem.
The original issue should be resolved in the first place. But my use case is:
When I start a search I only know there is something related to "foo". Then in the result page I realize "foo bar" is the exact term I want to search on. But I agree this is another story.

> 
> I think it would be good to get to the bottom of the server side...why the
> cases that aren't returning results are failing.  Then we can strategize a
> client side workaround once we understand what's going on.
I am holding off on the client workaround until we completely understand what happens on the server side.
Comment 10 libing wang CLA 2012-05-22 10:10:39 EDT
We talked about different options before, but I have not yet settled on a solution. I will keep this as a reminder for the RC2 tasks, but if there is no quick fix I will move it to post 0.5.
Comment 11 libing wang CLA 2012-09-04 16:52:03 EDT
Talked to John briefly. Lucene possibly has support for phrases. If not, we will detect white space and switch to the crawler.
Comment 12 John Arthorne CLA 2012-09-04 16:54:11 EDT
Some pre-processing we are doing on the server is preventing this from working.
Comment 13 John Arthorne CLA 2012-09-06 16:11:03 EDT
Released a fix:

http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?id=606aaf04161aa800cd4584e47d3e19b3670fed81

Regression tests:

http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?id=24143f5b2b43b2223304b580eb4ec64cb08e216e

However the client is doing a number of things before sending to the server that prevent this from working. In particular it is encoding the quotes, and removing whitespace between words. Moving back to Libing for that part.
Comment 14 libing wang CLA 2012-09-07 15:59:49 EDT
Fixed client side with http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=96331b9881498b59c632d9b60a54e5ed73224b0a.

I did some tests, but it seems the server is still giving back redundant results.
Test case 1:
1. In the navigator, drill into the Orion client code.
2. Type 'string type' in the search box.
3. It gives back 7 files, but only one file contains 'string type'.

This is the request sent to the server, with the response headers:

Request URL:http://libingw.orion.eclipse.org:8080/filesearch?sort=Path%20asc&rows=40&start=0&q=%22string%20type%22+Location:/file/R/OrionClient/*
Request Method:GET
Status Code:200 OK
Request Headers
Accept:application/json
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:JSESSIONID=15hw0wqveur9i1vfs48ki167s7
Host:libingw.orion.eclipse.org:8080
Orion-Version:1
Referer:http://libingw.orion.eclipse.org:8080/plugins/fileClientPlugin.html
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.81 Safari/537.1
X-Requested-With:XMLHttpRequest
Query String Parameters
sort:Path asc
rows:40
start:0
q:"string type" Location:/file/R/OrionClient/*
Response Headers
Content-Encoding:gzip
Content-Length:644
Server:Jetty(8.1.3.v20120522)
Via:1.1 (jetty)
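
For reference, the request URL above can be rebuilt from its parts as in the sketch below. It is a Python illustration, not the client code from the commit; `build_search_url` is a hypothetical helper. The point is that the double quotes and the space inside the phrase are kept and percent-encoded, rather than stripped before sending.

```python
from urllib.parse import quote


def build_search_url(host: str, phrase: str, location: str,
                     rows: int = 40, start: int = 0) -> str:
    """Build a /filesearch URL shaped like the one captured above."""
    # Wrap the phrase in double quotes so the server sees a phrase query,
    # then percent-encode it: '"' becomes %22 and ' ' becomes %20.
    term = quote(f'"{phrase}"')
    # The Location scope is left unencoded, as in the captured request.
    scope = f"Location:{location}*"
    sort = quote("Path asc")
    return (f"http://{host}/filesearch?sort={sort}"
            f"&rows={rows}&start={start}&q={term}+{scope}")


print(build_search_url("libingw.orion.eclipse.org:8080",
                       "string type", "/file/R/OrionClient/"))
```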
Comment 15 libing wang CLA 2012-09-07 16:03:04 EDT
If you do the same search with regEx on, you will get 5 files that all contain the search term.
Comment 16 libing wang CLA 2012-09-07 16:11:07 EDT
Good example :
If you search on 'items instanceof Array', both the indexer and the crawler (regEx) give the same result.
Comment 17 John Arthorne CLA 2012-09-18 16:46:24 EDT
I think the "string type" example is a tokenizer issue. All of the false matches contain "string" followed by "type", with only special characters in between. For example:

commands.js: {String} type

messages.js: string):": "Type

csslint.js: {String} type
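
The effect can be reproduced with a deliberately simplified tokenizer. The sketch below is in Python and is not the Lucene/Solr analyzer the server actually uses; it only assumes, as the comment suggests, that special characters are dropped between tokens and that matching ignores case.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]


def phrase_matches(text: str, phrase: str) -> bool:
    """True when the phrase's tokens appear as consecutive tokens in the text."""
    tokens, wanted = tokenize(text), tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(tokens[i:i + n] == wanted
                         for i in range(len(tokens) - n + 1))


# The snippets from the false matches listed above.
for snippet in ["{String} type", 'string):": "Type']:
    literal = "string type" in snippet.lower()
    print(snippet, "| token phrase match:", phrase_matches(snippet, "string type"),
          "| literal match:", literal)
```

Under these assumptions both snippets match the phrase at the token level even though neither contains the literal text "string type", which would explain why the indexer and the regEx crawler disagree.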
Comment 18 John Arthorne CLA 2012-09-25 17:46:18 EDT
Since basic phrase searching is working in 1.0 M2, I am going to mark this fixed. I have opened bug 390393 for the case in comment #17.