The search indexer currently has a hard-coded list of binary file types to avoid. It will eventually need code to detect binary vs. text content so it can index all text content regardless of file extension.
Note to myself. See: org.eclipse.search.internal.core.text.TextSearchVisitor#processFile for the platform code that determines if a file is binary for the purpose of searching.
Note that in bug 348040 we switched from a blacklist of types to ignore to a whitelist of types to search. This was to address a problem where the search indexer was consuming an unreasonable amount of CPU. Either way, the problem of adding proper binary vs. text detection remains.
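For illustration only, here is a minimal sketch of one common content-based heuristic: sample the first few kilobytes of a file and treat it as binary if a NUL byte appears. This is not taken from the Orion or Eclipse platform code; the class name `BinarySniffer`, the method names, and the 8 KB sample size are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Orion: guesses binary vs. text from content.
final class BinarySniffer {
    // Assumed sample size; only the leading bytes are inspected to keep the check cheap.
    static final int SAMPLE_SIZE = 8192;

    // Returns true if any of the first 'length' bytes is NUL (0x00).
    static boolean looksBinary(byte[] buf, int length) {
        for (int i = 0; i < length; i++) {
            if (buf[i] == 0) {
                return true;
            }
        }
        return false;
    }

    // Reads up to SAMPLE_SIZE bytes from the file and applies the NUL-byte check.
    static boolean looksBinary(Path file) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(buf, 0, buf.length); // requires Java 9+
            return looksBinary(buf, n);
        }
    }
}
```

The heuristic is cheap but imperfect: UTF-16 encoded text usually contains NUL bytes and would be skipped, and a binary file with no NUL byte in the sampled prefix would still be indexed.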
*** Bug 421692 has been marked as a duplicate of this bug. ***
Folks, this is a biggie. Pardon if I tweak the severity. The only workaround requires forking the Orion code base. I wouldn't expect anything fancy, but this list of searchable extensions needs to be externalized somehow so language tools built on top of Orion can provide search support. Making this configurable in orion.conf would be sufficient.
There are a few options here:
1) Invert the default assumption so that we index all files that are not known to be binary, rather than only indexing files that are known to be text. This means we will waste some resources attempting to index files that are binary but that we failed to detect as such.
2) A server-side extension like Rafael mentions. It is a quick solution, but it does not help with client-side extensibility: say someone plugs language tooling into orionhub but doesn't have control over the server.
3) Consume the Eclipse content type infrastructure and do intelligent analysis of whether a file is binary or text. Potentially more expensive, but more likely to be right, and it is also pluggable on the server side at least.
I am currently thinking I will do 1) for 5.0 M2, mainly because that's all I have time to do. Rafael, if you have interest in contributing something more sophisticated, it would be welcome.
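As a rough sketch of what option 1) amounts to, the filter below indexes every file unless its extension is on a list of known binary types. The class name `IndexFilter`, its methods, and the extension list are hypothetical and chosen only for the example; they do not reflect the actual list or code in the Orion server.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical filter illustrating "index everything except known binary types".
final class IndexFilter {
    // Assumed example list; the real set of binary extensions would differ.
    private static final Set<String> KNOWN_BINARY =
            Set.of("class", "jar", "zip", "png", "gif", "jpg", "jpeg", "pdf", "exe");

    // Returns the lower-cased extension, or "" if the name has none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Default is to index; only known binary extensions are skipped.
    static boolean shouldIndex(String fileName) {
        return !KNOWN_BINARY.contains(extensionOf(fileName));
    }
}
```

With this default, a file type the server has never heard of (for example one introduced by third-party language tooling) is indexed without any code change, at the cost of occasionally indexing an unrecognized binary file.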
I would really like us to spend whatever time it takes to fix this. The current situation burns me basically *every* time I start looking at a new file type.
(In reply to Mike Wilson from comment #6)
> I would really like us to spend whatever time it takes to fix this. The
> current situation burns me basically *every* time I start looking at a new
> file type.
Fixed in bug 438727. The crawler now searches all file types except known binary and image files.
(In reply to John Arthorne from comment #1)
> Note to myself. See:
>
> org.eclipse.search.internal.core.text.TextSearchVisitor#processFile for the
> platform code that determines if a file is binary for the purpose of
> searching.
We need to update the new search to skip binary files.
Added a very small update with the following commit that looks to do the trick: http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?id=d7ed9685582288db9ec888d7652deec09712d34e