| Summary: | TUR4.2: Fails to create both dotless i and dotted i in the same resource name e.g. file name, project name | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Kentaroh Noji <kennoji> | ||||||
| Component: | Resources | Assignee: | Szymon Brandys <Szymon.Brandys> | ||||||
| Status: | VERIFIED FIXED | QA Contact: | |||||||
| Severity: | major | ||||||||
| Priority: | P3 | CC: | camle, daniel_megert, harendra, john.arthorne, kennoji, maedera, pwebster, Szymon.Brandys | ||||||
| Version: | 4.2 | ||||||||
| Target Milestone: | 3.8.1 | ||||||||
| Hardware: | PC | ||||||||
| OS: | Windows 7 | ||||||||
| Whiteboard: | |||||||||
| Bug Depends on: | |||||||||
| Bug Blocks: | 386507 | ||||||||
| Attachments: | |||||||||
|
Description
Kentaroh Noji
What error do you see in the Eclipse error log?

Created attachment 214372 [details]
screen shot of turkish I

Changed the version from 4.1 to 4.2.

(In reply to comment #2)
> Created attachment 214372 [details]
> screen shot of turkish I

Could you also check if there is any error logged in Error Log and copy/paste it here? To open Error Log go to Window > Show View > Error Log.

Created attachment 214435 [details]
Error log
I can not get any error log when I reproduce this error in General project, but I got an error log when I reproduce this error in Java project. I attached the error log of Java project.

(In reply to comment #5)
When I run Eclipse with -nl "tr" I noticed the following:
- "i".toUpperCase() returns "İ"
- "ı".toUpperCase() returns "I"
- but "i".equalsIgnoreCase("ı") returns true

Digging deeper I noticed that #equalsIgnoreCase uses the Character#toUpperCase methods. And this method does the following:
- Character.toUpperCase('i') returns 'I'
- Character.toUpperCase('ı') also returns 'I'

That's why #equalsIgnoreCase returns true for "i.txt" and "ı.txt". I tested IBM vm 6 and 7 and Oracle vm 6.

    private String findVariant(String target, String[] list) {
        for (int i = 0; i < list.length; i++) {
            if (target.toUpperCase().equals(list[i].toUpperCase()))
                return list[i];
        }
        return null;
    }

(In reply to comment #6)
[I pressed submit too early]

The simplest workaround I see is to change Resource#findVariant as follows:

    private String findVariant(String target, String[] list) {
        for (int i = 0; i < list.length; i++) {
            if (target.toUpperCase().equals(list[i].toUpperCase()))
                return list[i];
        }
        return null;
    }

(In reply to comment #7)
> (In reply to comment #6)
> [I pressed submit too early]
>
> The simplest workaround I see is to change Resource#findVariant as follows:
>
> private String findVariant(String target, String[] list) {
>     for (int i = 0; i < list.length; i++) {
>         if (target.toUpperCase().equals(list[i].toUpperCase()))
>             return list[i];
>     }
>     return null;
> }

I think I will add the workaround during RC1, but on the other hand there may be other places where #equalsIgnoreCase is called and we can't stop using it.

(In reply to comment #8)
> I think I will add the workaround during RC1, but on the other hand there may
> be other places where #equalsIgnoreCase is called and we can't stop using it.

Unfortunately there are indeed other places in the Eclipse SDK with the same problem, see Bug 380116. Closing it as NOT_ECLIPSE; we need a fix in the IBM and Oracle JVMs.

Thank you. I understand that equalsIgnoreCase() is locale insensitive.
It is meant for locale-insensitive uses, such as system-facing operations. According to the Javadoc at http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#toUpperCase%28char%29, Character.toUpperCase()/toLowerCase() is locale insensitive. As equalsIgnoreCase() uses Character.toUpperCase()/toLowerCase(), equalsIgnoreCase() is locale insensitive as a result. The Javadoc says that String.toUpperCase()/toLowerCase() should be used for locale-sensitive functions.

However, I found that the Windows file system handles Turkish ı and İ uniquely. "i.txt" and "I.txt" are case insensitive, but the Turkish-specific "İ" (dotted uppercase I) and "ı" (dotless lowercase i) seem to be case sensitive in Windows file names. Therefore, Windows supports the following sets of file names: (i.txt, İ.txt, ı.txt), (I.txt, İ.txt, ı.txt)

For example, the following sample code can create the files (i.txt, İ.txt, ı.txt) in Turkish Windows.

    import java.io.File;
    import java.io.IOException;

    class CreateF {
        public static void main(String args[]) {
            File newfile = new File("ı.txt");     // Dotless lowercase i
            File newfile2 = new File("i.txt");    // Dotted lowercase i
            File newfile3 = new File("İ.txt");    // Dotted uppercase I
            // File newfile4 = new File("I.txt"); // Dotless uppercase I
            try {
                newfile.createNewFile();
                newfile2.createNewFile();
                newfile3.createNewFile();
                // newfile4.createNewFile();
            } catch (IOException e) {
                System.out.println(e);
            }
        }
    }

Eclipse should be consistent with the Windows file system, not with Java's equalsIgnoreCase() method, because this is a file system issue.

In addition, 380116 might not be related to equalsIgnoreCase() because the same problem happens with English i and I. For example, "int i" and "int I" generate the same getter/setter methods, getI() and setI().

> Thank you. I understand that equalsIgnoreCase() is locale insensitive.
How do you understand that? The Javadoc says nothing about ignoring the Locale.
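
The behaviour debated in the comments above can be checked on a given JVM with a small program. This is only a sketch: the class name TurkishCaseDemo is hypothetical, and the Turkish locale is passed explicitly instead of relying on -nl "tr", because the no-argument String.toUpperCase() uses the JVM's default locale.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    public static void main(String[] args) {
        // Assumption: "tr" stands in for the locale selected with -nl "tr".
        Locale turkish = Locale.forLanguageTag("tr");

        String dotted = "i";       // U+0069 LATIN SMALL LETTER I
        String dotless = "\u0131"; // U+0131 LATIN SMALL LETTER DOTLESS I

        // Character-level mapping takes no locale: both map to plain 'I'.
        System.out.println(Character.toUpperCase('i') == 'I');       // true
        System.out.println(Character.toUpperCase('\u0131') == 'I');  // true

        // equalsIgnoreCase is defined in terms of the Character methods,
        // so it treats the two letters as the same.
        System.out.println(dotted.equalsIgnoreCase(dotless));        // true

        // String-level mapping with the Turkish locale keeps them apart:
        // "i" becomes U+0130 (dotted capital I), dotless i becomes "I".
        String upperDotted = dotted.toUpperCase(turkish);
        String upperDotless = dotless.toUpperCase(turkish);
        System.out.println(upperDotted.equals("\u0130"));            // true
        System.out.println(upperDotless.equals("I"));                // true
        System.out.println(upperDotted.equals(upperDotless));        // false

        // With a non-Turkish locale the String mapping conflates them again.
        System.out.println(dotted.toUpperCase(Locale.ROOT)
                .equals(dotless.toUpperCase(Locale.ROOT)));          // true
    }
}
```

The last line suggests that a toUpperCase().equals(...) comparison only separates the two letters when the effective locale is Turkish (or another locale with the same casing rule).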

(In reply to comment #11)
> > Thank you. I understand that equalsIgnoreCase() is locale insensitive.
>
> How do you understand that? The Javadoc says nothing about ignoring the Locale.

I mean that I guess that equalsIgnoreCase() is locale insensitive because it uses the locale-insensitive functions Character.toUpperCase()/toLowerCase().

The Javadoc for equalsIgnoreCase at http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#equalsIgnoreCase%28java.lang.String%29 describes:

Two characters c1 and c2 are considered the same ignoring case if at least one of the following is true:
- The two characters are the same (as compared by the == operator)
- Applying the method Character.toUpperCase(char) to each character produces the same result
- Applying the method Character.toLowerCase(char) to each character produces the same result

and the Javadoc for Character.toUpperCase() at http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#toUpperCase%28char%29 describes:

In general, String.toUpperCase() should be used to map characters to uppercase. String case mapping methods have several benefits over Character case mapping methods. String case mapping methods can perform locale-sensitive mappings, context-sensitive mappings, and 1:M character mappings, whereas the Character case mapping methods cannot.

Therefore, I think equalsIgnoreCase() will be locale insensitive.

Just in case, please note that I can create i.txt, İ.txt and ı.txt files in the same folder in a Windows Turkish environment. This is a very interesting implementation because it follows neither the Turkish casing rule nor the locale-insensitive rule.

Kentaroh mentioned that they would like to have it fixed by 3.8.1/4.2.1, so I think we can move the discussion post-3.8/4.2. Leaving the bug open for now.

(In reply to comment #12)
> (In reply to comment #11)
> > > Thank you. I understand that equalsIgnoreCase() is locale insensitive.
> >
> > How do you understand that? The Javadoc says nothing about ignoring the Locale.
>
> I mean that I guess that equalsIgnoreCase() is locale insensitive because it
> uses locale insensitive functions Character.toUpperCase()/toLowerCase().

Point taken.

Replacing String.equalsIgnoreCase(String) might have an impact on performance. This needs to be considered/measured.

We can easily apply the fix I suggested in comment 8 and see how the performance is affected. Should we apply the same fix across the Eclipse SDK though?

I think that many devs may use equalsIgnoreCase without being aware of how it really works. As was already mentioned, the method's documentation does not directly say that it is not locale-aware. I think we should have a util method to use instead of String#equalsIgnoreCase.

(In reply to comment #15)
> We can easily apply the fix I suggested in comment 8 and see how the
> performance is affected. Should we apply the same fix across Eclipse SDK
> though?

I'd say we have thousands of them. Guess we have to fix this case by case.

(In reply to comment #17)
> Fixed with
> http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/
> ?id=e7ee8e0432872d2c67ce80e81564aae885305a1f.

Thank you, I will verify it when the build containing the fix is available.

Verified via code inspection. |
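
For reference, the util method proposed in the discussion could look roughly like the following. This is a sketch under stated assumptions, not the code from the commit referenced above: the class LocaleAwareNames, both method names, and the explicit Locale parameter are hypothetical. It applies the toUpperCase().equals(...) idea from comment 7, adds a cheap exact-match fast path in view of the performance concern, and upper-cases the target once instead of on every loop iteration.

```java
import java.util.Locale;

public final class LocaleAwareNames {
    private LocaleAwareNames() {
    }

    /**
     * Hypothetical replacement for String#equalsIgnoreCase that uses the
     * locale-sensitive String case mapping of the given locale.
     */
    public static boolean equalsIgnoreCase(String a, String b, Locale locale) {
        if (a == null || b == null)
            return a == b;
        if (a.equals(b))
            return true; // fast path: identical strings need no case mapping
        // No length check here: String case mapping can change the length.
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    /**
     * Variant of the findVariant workaround with an explicit locale.
     * Returns the first entry of list that equals target ignoring case,
     * or null if there is none.
     */
    public static String findVariant(String target, String[] list, Locale locale) {
        String upperTarget = target.toUpperCase(locale); // computed once
        for (String candidate : list) {
            if (upperTarget.equals(candidate.toUpperCase(locale)))
                return candidate;
        }
        return null;
    }
}
```

With Locale.forLanguageTag("tr"), this treats "i.txt" and "ı.txt" as different names. Note that it follows the Turkish casing rule, so it also treats "i.txt" and "I.txt" as different, which is not what the reporter observed on the Windows file system; matching Windows exactly would need a different comparison.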