Bug 407065 - [DBCS4.3] case-insensitive search does not work with Turkish dot less small i and dot capital I
Status: VERIFIED FIXED
Alias: None
Product: Orion (Archived)
Classification: ECD
Component: Editor
Version: unspecified
Hardware: PC Windows 7
Importance: P3 major
Target Milestone: 3.0 M2
Assignee: Bogdan Gheorghe CLA
QA Contact:
URL:
Whiteboard: GVT
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-02 07:57 EDT by Akihiko Takajo CLA
Modified: 2013-06-07 07:56 EDT
CC List: 7 users

See Also:


Attachments
Turkish test data (129 bytes, text/plain)
2013-05-02 07:57 EDT, Akihiko Takajo CLA

Description Akihiko Takajo CLA 2013-05-02 07:57:50 EDT
Created attachment 230398
Turkish test data

In the Orion Editor's case-insensitive search (Ctrl+F):
ı (dotless small i) does not match I (dotless capital I).
İ (dotted capital I) does not match i (dotted small i).

Steps:
1. Create a file in the Orion Editor and paste in the contents of the attached text.
2. Search for the Turkish dotless small i "ı".
=> It does not match "I".

For comparison, in the Eclipse editor "ı" matches both the dotted and dotless forms: "ı I i İ".
Comment 1 Mark Macdonald CLA 2013-05-02 12:07:10 EDT
IIRC we rely on the browser's native RegExp for case-insensitive matching, and RegExp doesn't perform locale-specific (i18n) case mapping. For example:

> new RegExp("İ", "i" /*case insensitive*/).exec("i")  // null

We'd have to move to something like [1] to get the correct behavior for non-regex searches.

[1] http://ecma-international.org/ecma-402/1.0/
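To make the RegExp limitation concrete, here is a small sketch that runs the native `i` flag over the four letters from the report. `nativeMatches` and `letters` are names introduced for illustration only, not Orion code. Under the ECMAScript case-insensitive rules used without the `u` flag, I and i match each other, but the Turkish pairs ı/I and İ/i do not.

```javascript
// Hypothetical helper (not Orion code): case-insensitive matching that
// relies only on the native RegExp "i" flag. The single letters used
// here contain no RegExp metacharacters, so no escaping is needed.
function nativeMatches(pattern, text) {
  return new RegExp(pattern, "i").test(text);
}

// Print a match table in the same layout as the tables in this bug:
// rows are the search word, columns are the text, "o" marks a match.
const letters = ["ı", "I", "i", "İ"];
for (const pattern of letters) {
  const cells = letters.map((text) => (nativeMatches(pattern, text) ? "o" : " "));
  console.log(pattern + " | " + cells.join(" | "));
}
```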
Comment 2 Mark Macdonald CLA 2013-05-06 11:24:10 EDT
Actually, I guess JavaScript does have what we need already, independent of ECMA-402. We'll have to fix the editor Find implementation to use locale-aware methods of String.
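The locale-aware methods in question are `toLocaleLowerCase` and `toLocaleUpperCase`. The sketch below compares the default mapping with the Turkish one; `turkishLower` is a hypothetical name. One assumption to flag: passing an explicit locale such as `"tr"` relies on the `locales` argument added in later editions of ECMA-402. In 2013 these methods took no argument and used the host environment's current locale, which leads back to the question of knowing the browser's language.

```javascript
// Hypothetical helper: lowercase using Turkish casing rules, where
// I -> ı (dotless) and İ -> i (dotted).
function turkishLower(s) {
  return s.toLocaleLowerCase("tr");
}

const letters = ["ı", "I", "i", "İ"];
for (const ch of letters) {
  // The default mapping is locale-independent: I -> i, and İ -> "i"
  // followed by U+0307 COMBINING DOT ABOVE (two UTF-16 code units).
  const defaultLower = ch.toLowerCase();
  console.log(
    ch,
    "default:", JSON.stringify(defaultLower),
    "turkish:", JSON.stringify(turkishLower(ch))
  );
}
```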
Comment 3 Bogdan Gheorghe CLA 2013-05-06 17:35:40 EDT
One approach is to convert everything to lowercase, but this has performance implications for large files. There doesn't seem to be anything else the browser's RegExp engine can do to help.
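A minimal sketch of the lowercase-everything approach, assuming a plain `indexOf` search; `findIgnoreCase` is a hypothetical name. It shows where the cost comes from (a full lowercase copy of the document per search) and one further pitfall: lowercasing can change string length, so an offset found in the copy does not always map back to the same position in the original text.

```javascript
// Hypothetical sketch: case-insensitive find by lowercasing both strings.
// Returns an offset into the LOWERCASED text, which is the pitfall shown below.
function findIgnoreCase(text, query, locale) {
  // Allocates a lowercase copy of the entire document on every search,
  // which is where the cost for large files comes from.
  const haystack = locale ? text.toLocaleLowerCase(locale) : text.toLowerCase();
  const needle = locale ? query.toLocaleLowerCase(locale) : query.toLowerCase();
  return haystack.indexOf(needle);
}

// "İ".toLowerCase() is two code units ("i" + U+0307), so every offset after
// it in the lowercase copy is shifted by one relative to the original text.
console.log(findIgnoreCase("İstanbul", "stanbul")); // 2, but "stanbul" starts at 1 in the original
console.log(findIgnoreCase("İstanbul", "stanbul", "tr"));
```

Passing `"tr"` avoids the shift for İ in this example, but then the search depends on knowing the language of the text.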
Comment 4 Bogdan Gheorghe CLA 2013-05-29 11:56:53 EDT
We released a workaround for this problem, please give it a try.
Comment 5 Akihiko Takajo CLA 2013-05-31 05:20:54 EDT
Hi, I tested on I20130530-2250.
The behavior has changed, but case-insensitive search with "i" and "I" is still not correct. In the tables below, each row is the search word, each column is a character in the text, and o marks a match.

search word (case-insensitive) | ı | I | i | İ
==============================================
ı (dotless small i)            | o | o | o |
I (dotless capital I)          |   | o | o |
i (dotted small i)             |   | o | o |
İ (dotted capital I)           |   | o | o | o

Expected (ideal) result:
search word (case-insensitive) | ı | I | i | İ
==============================================
ı (dotless small i)            | o | o |   |
I (dotless capital I)          | o | o |   |
i (dotted small i)             |   |   | o | o
İ (dotted capital I)           |   |   | o | o

Result on IES4.3:
search word (case-insensitive) | ı | I | i | İ
==============================================
ı (dotless small i)            | o | o | o | o
I (dotless capital I)          | o | o | o | o
i (dotted small i)             | o | o | o | o
İ (dotted capital I)           | o | o | o | o

If the ideal result is difficult to implement, I think it is OK to follow the IES result.
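For reference, the "ideal" table corresponds to comparing letters after Turkish case folding. A minimal sketch, assuming Turkish casing via `toLocaleLowerCase("tr")`; `turkishMatches` is a hypothetical name, and it compares single letters only, not whole documents.

```javascript
// Hypothetical helper: two letters match if they lowercase to the same
// string under Turkish rules (I -> ı, İ -> i).
function turkishMatches(query, text) {
  return query.toLocaleLowerCase("tr") === text.toLocaleLowerCase("tr");
}

// Rows are the search word, columns are the text, "o" marks a match.
const letters = ["ı", "I", "i", "İ"];
for (const query of letters) {
  const cells = letters.map((text) => (turkishMatches(query, text) ? "o" : " "));
  console.log(query + " | " + cells.join(" | "));
}
```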
Comment 6 Bogdan Gheorghe CLA 2013-05-31 15:00:37 EDT
OK, we released a patch to match the IES behavior.
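The patch itself is not shown in this bug. As a purely hypothetical sketch of how IES-style matching could be layered on native RegExp, each i-variant in the query can be expanded into a character class containing all four letters. `escapeRegExp` and `buildIesStyleRegExp` are illustrative names, not Orion APIs.

```javascript
// Any of the four i-variants in the query.
const I_VARIANTS = /[ıIiİ]/g;

// Escape RegExp metacharacters so the query is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch (not the actual Orion patch): every i-variant in the
// query becomes the class [ıIiİ], so all four letters match one another,
// and the "i" flag handles ordinary case-insensitivity for other letters.
function buildIesStyleRegExp(query) {
  const source = escapeRegExp(query).replace(I_VARIANTS, "[ıIiİ]");
  return new RegExp(source, "gi");
}

const re = buildIesStyleRegExp("istanbul");
console.log("ISTANBUL İstanbul ıstanbul".match(re)); // ["ISTANBUL", "İstanbul", "ıstanbul"]
```

Because the expansion does not depend on the locale, it never breaks English searches; the price is over-matching in Turkish text.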

As you know, the IES behavior is not strictly correct: dotless ı should match I (not i), and dotted i should match dotted capital İ (not I).

Unfortunately, we can't implement the correct behavior without first knowing the browser's language (strictly speaking we can, but doing so breaks search for English text). We looked at various ways of determining the browser language and couldn't find one that works in all browsers on all platforms. For future reference, one approach others use that might work is to add backend support: all browsers send the language in the request headers, so perhaps the file client could forward that language information along with the file contents.
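A minimal sketch of the backend idea, assuming the server has the raw `Accept-Language` request header (for example `"tr-TR,tr;q=0.9,en;q=0.8"`) and can pass the chosen tag to the client alongside the file contents; how Orion's file client would carry it is not specified here. `preferredLanguage` and `usesTurkishCasing` are hypothetical helpers, and the q-value parsing is simplified. Unicode's SpecialCasing rules for dotted and dotless i apply to both Turkish (`tr`) and Azerbaijani (`az`).

```javascript
// Hypothetical helper: return the highest-priority language tag from an
// Accept-Language header value, or null if none is usable.
function preferredLanguage(acceptLanguage) {
  const entries = acceptLanguage
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1;
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q };
    })
    // Drop empty tags, the "*" wildcard, and explicitly refused (q=0) entries.
    .filter((e) => e.tag && e.tag !== "*" && e.q > 0);
  // Stable sort: entries with equal q keep their header order.
  entries.sort((a, b) => b.q - a.q);
  return entries.length ? entries[0].tag : null;
}

// Hypothetical helper: whether dotted/dotless i casing rules should apply.
function usesTurkishCasing(languageTag) {
  if (!languageTag) return false;
  const primary = languageTag.toLowerCase().split("-")[0];
  return primary === "tr" || primary === "az";
}

const tag = preferredLanguage("tr-TR,tr;q=0.9,en;q=0.8");
console.log(tag, usesTurkishCasing(tag)); // tr-TR true
```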
Comment 7 Akihiko Takajo CLA 2013-06-07 07:56:43 EDT
verified with I20130606-2230