Bug 334699

Summary: [search] index holds stale references to external locations that have been deleted
Product: [ECD] Orion
Reporter: John Arthorne <john.arthorne>
Component: Client
Assignee: Jay Arthanareeswaran <jarthana>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: jarthana, malgorzata.tomczyk, susan
Version: 0.2
Keywords: helpwanted
Target Milestone: 0.2
Flags: john.arthorne: review+
Hardware: PC
OS: Windows 7
Whiteboard:
Attachments:
Description Flags
Proposed Patch jarthana: review?

Description John Arthorne CLA 2011-01-18 15:44:12 EST
The search server indexes content in the external locations.  Good.
But if I remove an external location, the search server does not seem to notice.  Bad.
Comment 1 Susan McCourt CLA 2011-03-10 12:29:50 EST
*** Bug 339559 has been marked as a duplicate of this bug. ***
Comment 2 Susan McCourt CLA 2011-03-10 12:31:47 EST
The duplicate bug also mentions duplicate entries in the dialog.

>----- The file displays twice, the file links differ: one has root project
>directory uppercase and other lowercase
Comment 3 Jay Arthanareeswaran CLA 2011-03-28 07:48:36 EDT
I will take a look. We either have to keep the search framework in the loop during delete operations or, during indexing, look for modified folders and re-index them if that's allowed.
Comment 4 Jay Arthanareeswaran CLA 2011-04-05 02:07:08 EDT
(In reply to comment #3)
> I will take a look. We either have to keep the search framework in loop during 
> delete operations or during indexing, look for modified folders and re-index
> them if that's allowed.

It's impossible for the search framework to know about deleted resources all the time. So that leaves us with only the second option and I can think of two approaches.

1. Before returning the search results, check whether each resource exists and, if it doesn't, remove it from the results and from the search index. The problem here is that this will make search operations slower.

2. In the background job, look for all the modified folders and clear the index for them. The problem with this is that we may never be able to find out the exact delta, so this will result in redundant work.

Anyone have any thoughts on the above?

The best way forward would be to have a resource change notification. Or do we already have one?
Comment 5 John Arthorne CLA 2011-04-05 09:02:41 EDT
There are no resource change events. The deletion could easily happen in another process so an event within a single server process won't necessarily help.

Currently the indexer crawls the file system and compares to the index. I think we would also need to do the inverse: crawl the index and see if each entry exists in the file system. If it is very expensive this could be done less frequently than the file->index synchronization. However I think if we could do this comparison it would be much more efficient than flushing the index and recomputing on any folder change.
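
The inverse crawl described above can be sketched roughly as follows. This is only an illustration, not the attached patch: the class name `StaleIndexPurge`, the in-memory `Map` standing in for the search index, and the `exists` predicate are all assumptions made for the example. A real implementation would iterate the actual search index and ask the file system.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: the Map is a stand-in for the real search index,
// keyed by the location of each indexed file.
public class StaleIndexPurge {

    /**
     * Walks every entry in the index and removes those whose location
     * no longer exists. Returns the locations that were purged.
     */
    static List<String> purge(Map<String, String> index, Predicate<String> exists) {
        List<String> removed = new ArrayList<>();
        Iterator<String> it = index.keySet().iterator();
        while (it.hasNext()) {
            String location = it.next();
            if (!exists.test(location)) {
                it.remove(); // drops the entry from the backing map
                removed.add(location);
            }
        }
        return removed;
    }

    /** Existence check against the local file system (assumes locations are local paths). */
    static boolean existsOnDisk(String location) {
        return Files.exists(Paths.get(location));
    }
}
```

Calling `StaleIndexPurge.purge(index, StaleIndexPurge::existsOnDisk)` only touches entries that are actually stale, which is why it should be cheaper than flushing the index and recomputing it on any folder change.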
Comment 6 Jay Arthanareeswaran CLA 2011-04-08 10:18:36 EDT
Created attachment 192837 [details]
Proposed Patch

The patch contains a new job, as John suggested. I hope the patch is alright, as this is my first time using Git.
Comment 7 John Arthorne CLA 2011-04-08 15:47:57 EDT
Thanks Jay, the patch looks great. I tried it out on a server with about 20MB of files, and the purge job took 200ms vs 1.7s for indexing, so it is actually quite efficient. However I still like the idea of them being separate jobs so we can control their frequency independently if desired. I have set the default purge period to be 30s rather than the 3m that you wrote. Otherwise the fix works well and the patch is good!
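
For illustration, keeping the two jobs separate so that their frequencies can be tuned independently might look like the sketch below. It uses a plain `ScheduledExecutorService` as a stand-in scheduler, and the class name `IndexMaintenance` and its parameters are hypothetical. Only the 30s default purge period comes from this comment; the indexing period is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: two independent periodic jobs on a stand-in scheduler.
public class IndexMaintenance {

    // Default purge period mentioned in this comment (30 seconds).
    static final long DEFAULT_PURGE_PERIOD_MS = 30_000;

    /**
     * Starts the indexing job and the purge job with separate periods.
     * Each job is rescheduled after its previous run finishes, so a slow
     * run never overlaps with the next run of the same job.
     */
    static ScheduledExecutorService start(Runnable indexJob, long indexPeriodMs,
                                          Runnable purgeJob, long purgePeriodMs) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        scheduler.scheduleWithFixedDelay(indexJob, 0, indexPeriodMs, TimeUnit.MILLISECONDS);
        scheduler.scheduleWithFixedDelay(purgeJob, 0, purgePeriodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Because the periods are separate arguments, the purge can run more or less often than indexing without changing either job.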