
Bug 377379

Summary: TUR4.2: Fails to create both dotless i and dotted i in the same resource name, e.g. file name, project name
Product: [Eclipse Project] Platform
Reporter: Kentaroh Noji <kennoji>
Component: Resources
Assignee: Szymon Brandys <Szymon.Brandys>
Status: VERIFIED FIXED
QA Contact:
Severity: major
Priority: P3
CC: camle, daniel_megert, harendra, john.arthorne, kennoji, maedera, pwebster, Szymon.Brandys
Version: 4.2
Target Milestone: 3.8.1
Hardware: PC
OS: Windows 7
Whiteboard:
Bug Depends on:
Bug Blocks: 386507
Attachments:
Description                 Flags
screen shot of turkish I    none
Error log                   none

Description Kentaroh Noji CLA 2012-04-23 04:41:29 EDT
Build Identifier: I20120315-1300

OS: Windows 7 SP1 Professional Turkish Edition
JDK: java full version JRE 1.7.0 IBM Windows 32 build pwi3270-20110906_01
Locale: Turkish

The Turkish alphabet has dotless i (U+0131) and dotted i (U+0069) as lowercase letters, and dotless I (U+0049) and dotted I (U+0130) as uppercase letters. After creating a file named dotted i.txt in a project (e.g. a general project), I cannot create dotless i.txt in the same project. The same symptom occurs with the uppercase dotless/dotted I characters, and project names have the same problem.

Additional information about Windows 7 file system:
On the Windows 7 file system (Turkish Windows), I can create dotless lowercase i.txt (U+0131) and dotted lowercase i.txt (U+0069) in the same folder, e.g. abc.
Likewise, I can create dotless uppercase I.txt (U+0049) and dotted uppercase I.txt (U+0130) together in another folder, e.g. abc2.
However, dotted i (U+0069) and dotless I (U+0049) cannot coexist in the same folder, just as in an English environment.

 

Reproducible: Always

Steps to Reproduce:
1. Unzip Eclipse into any folder on Turkish Windows 7.
2. Start Eclipse and create a general project.
3. Create a file named i.txt in the project.
4. Try to create a file named ı.txt (dotless i, U+0131). It fails.
Comment 1 Szymon Brandys CLA 2012-04-23 04:47:35 EDT
What error do you see in the Eclipse error log?
Comment 2 Kentaroh Noji CLA 2012-04-23 04:47:41 EDT
Created attachment 214372 [details]
screen shot of turkish I
Comment 3 Kentaroh Noji CLA 2012-04-23 04:56:18 EDT
Changed the version from 4.1 to 4.2.
Comment 4 Szymon Brandys CLA 2012-04-23 05:19:01 EDT
(In reply to comment #2)
> Created attachment 214372 [details]
> screen shot of turkish I

Could you also check if there is any error logged in Error Log and copy/paste it here? To open Error Log go to Window > Show View > Error Log.
Comment 5 Kentaroh Noji CLA 2012-04-24 03:50:35 EDT
Created attachment 214435 [details]
Error log

I do not get any error log entry when I reproduce this error in a General project, but I do get one when I reproduce it in a Java project. I attached the error log from the Java project.
Comment 6 Szymon Brandys CLA 2012-05-02 08:49:11 EDT
(In reply to comment #5)
When I run Eclipse with -nl "tr" I noticed the following:
- "i".toUpperCase() returns "İ"
- "ı".toUpperCase() returns "I"
- but "i".equalsIgnoreCase("ı") returns true

Digging deeper I noticed that #equalsIgnoreCase uses Character#toUpperCase methods. And this method does the following:
- Character.toUpperCase('i') returns 'I'
- Character.toUpperCase('ı') returns also 'I'

That's why #equalsIgnoreCase returns true for "i.txt" and "ı.txt".

I tested IBM vm 6 and 7 and Oracle vm 6.

private String findVariant(String target, String[] list) {
  for (int i = 0; i < list.length; i++) {
    if (target.toUpperCase().equals(list[i].toUpperCase()))
      return list[i];
  }
  return null;
}
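The observations in this comment can be reproduced with a small standalone program. This is only a sketch: the class and helper names are made up for illustration, and it passes an explicit Turkish locale instead of relying on -nl "tr" so that the result does not depend on how the JVM was started.

```java
import java.util.Locale;

class TurkishCaseDemo {
    // Explicit Turkish locale (assumption: stands in for running with -nl "tr").
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Locale-sensitive mapping via String#toUpperCase(Locale).
    static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    // Locale-insensitive mapping via Character#toUpperCase(char).
    static char upperChar(char c) {
        return Character.toUpperCase(c);
    }

    public static void main(String[] args) {
        String dotted = "i";        // U+0069
        String dotless = "\u0131";  // dotless lowercase i

        // String mapping with the Turkish locale keeps the two letters apart.
        System.out.println(upperTurkish(dotted).equals("\u0130")); // true (dotted uppercase I)
        System.out.println(upperTurkish(dotless).equals("I"));     // true (U+0049)

        // Character mapping ignores the locale and folds both to 'I'.
        System.out.println(upperChar('i') == 'I');                 // true
        System.out.println(upperChar('\u0131') == 'I');            // true

        // Hence equalsIgnoreCase treats the two file names as equal.
        System.out.println("i.txt".equalsIgnoreCase("\u0131.txt")); // true
    }
}
```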
Comment 7 Szymon Brandys CLA 2012-05-02 08:50:52 EDT
(In reply to comment #6)
[I pressed submit too early]

The simplest workaround I see is to change Resource#findVariant as follows:

private String findVariant(String target, String[] list) {
  for (int i = 0; i < list.length; i++) {
    if (target.toUpperCase().equals(list[i].toUpperCase()))
      return list[i];
  }
  return null;
}
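To see what this workaround changes, here is a standalone sketch of the same comparison. It is not the actual Resource code: the class name and the extra Locale parameter are assumptions added so the effect of the default locale can be shown explicitly (String#toUpperCase() without arguments uses the default locale).

```java
import java.util.Locale;

class FindVariantSketch {
    // Same loop as the proposed findVariant, but with the locale passed in
    // explicitly (hypothetical parameter, for illustration only).
    static String findVariant(String target, String[] list, Locale locale) {
        String upperTarget = target.toUpperCase(locale);
        for (String candidate : list) {
            if (upperTarget.equals(candidate.toUpperCase(locale)))
                return candidate;
        }
        return null;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String[] existing = { "i.txt" };
        String dotless = "\u0131.txt"; // dotless lowercase i

        // Turkish locale: no variant is found, so the new file would be allowed.
        System.out.println(findVariant(dotless, existing, turkish));     // null

        // Locale-neutral mapping: both names upper-case to "I.TXT" and collide.
        System.out.println(findVariant(dotless, existing, Locale.ROOT)); // i.txt
    }
}
```

Note that, in this sketch, the names are only kept apart when the locale in effect is Turkish; with other locales the behaviour matches equalsIgnoreCase for these two letters.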
Comment 8 Szymon Brandys CLA 2012-05-02 09:55:37 EDT
(In reply to comment #7)
> (In reply to comment #6)
> [I pressed submit to early]
> 
> The simplest workaround I see is to change Resource#findVariant as follows:
> 
> private String findVariant(String target, String[] list) {
>   for (int i = 0; i < list.length; i++) {
>     if (target.toUpperCase().equals(list[i].toUpperCase()))
>       return list[i];
>     }
>   return null;
> }

I think I will add the workaround during RC1, but on the other hand there may be other places where #equalsIgnoreCase is called and we can't stop using it.
Comment 9 Szymon Brandys CLA 2012-05-21 08:01:10 EDT
(In reply to comment #8)
> I think I will add the workaround during RC1, but on the other hand there may
> be other places where #equalsIgnoreCase is called and we can't stop using it.

Unfortunately there are indeed other places in the Eclipse SDK with the same problem, see Bug 380116. Closing this as NOT_ECLIPSE; we need a fix in the IBM and Oracle JVMs.
Comment 10 Kentaroh Noji CLA 2012-05-22 03:17:45 EDT
Thank you. I understand that equalsIgnoreCase() is locale insensitive. It is intended for locale-insensitive uses such as system-facing comparisons.

According to the Javadoc at http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#toUpperCase%28char%29, Character.toUpperCase()/toLowerCase() is locale insensitive. Since equalsIgnoreCase() uses Character.toUpperCase()/toLowerCase(), equalsIgnoreCase() is locale insensitive as a result. The Javadoc says that String.toUpperCase()/toLowerCase() should be used for locale-sensitive mappings.

However, I found that the Windows file system handles Turkish ı and İ distinctly. "i.txt" and "I.txt" are treated case-insensitively, but the Turkish-specific "İ" (dotted uppercase I) and "ı" (dotless lowercase i) seem to be case sensitive in Windows file names. Therefore, Windows supports the following sets of file names: (i.txt, İ.txt, ı.txt), (I.txt, İ.txt, ı.txt)

For example, the following sample code can create the files (i.txt, İ.txt, ı.txt) on Turkish Windows.

import java.io.File;
import java.io.IOException;

class CreateF {
  public static void main(String[] args) {
    File newfile = new File("ı.txt");  // Dotless lowercase i (U+0131)
    File newfile2 = new File("i.txt"); // Dotted lowercase i (U+0069)
    File newfile3 = new File("İ.txt"); // Dotted uppercase I (U+0130)
//    File newfile4 = new File("I.txt"); // Dotless uppercase I (U+0049)
    try {
      newfile.createNewFile();
      newfile2.createNewFile();
      newfile3.createNewFile();
//      newfile4.createNewFile();
    } catch (IOException e) {
      System.out.println(e);
    }
  }
}

Eclipse should be consistent with the Windows file system, not with Java's equalsIgnoreCase() method, because this is a file system issue. In addition, bug 380116 might not be related to equalsIgnoreCase(), because the same problem happens with English i and I. For example, "int i" and "int I" generate the same getter/setter methods, getI() and setI().
Comment 11 Dani Megert CLA 2012-05-22 03:25:36 EDT
> Thank you. I understand that equalsIgnoreCase() is locale insensitive. 

How do you understand that? The Javadoc says nothing about ignoring the Locale.
Comment 12 Kentaroh Noji CLA 2012-05-22 04:39:54 EDT
(In reply to comment #11)
> > Thank you. I understand that equalsIgnoreCase() is locale insensitive. 
> 
> How do you understand that? The Javadoc says nothing about ignoring the Locale.

I mean that I guess equalsIgnoreCase() is locale insensitive because it uses the locale-insensitive functions Character.toUpperCase()/toLowerCase().

The java doc for equalsIgnoreCase at 
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#equalsIgnoreCase%28java.lang.String%29 describes: 

Two characters c1 and c2 are considered the same ignoring case if at least one of the following is true:

    The two characters are the same (as compared by the == operator)
    Applying the method Character.toUpperCase(char) to each character produces the same result
    Applying the method Character.toLowerCase(char) to each character produces the same result 

and 

The java doc for Character.toUpperCase() at
http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#toUpperCase%28char%29 describes: 

In general, String.toUpperCase() should be used to map characters to uppercase. String case mapping methods have several benefits over Character case mapping methods. String case mapping methods can perform locale-sensitive mappings, context-sensitive mappings, and 1:M character mappings, whereas the Character case mapping methods cannot. 

Therefore, I think equalsIgnoreCase() is locale insensitive.
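The per-character rule quoted from the Javadoc can be written out directly. The following is a sketch of that rule as quoted, not the JDK's actual implementation, and the class and method names are hypothetical. It shows that no Locale appears anywhere in the comparison.

```java
class JavadocRuleSketch {
    // The three conditions quoted from the String#equalsIgnoreCase Javadoc.
    static boolean sameIgnoringCase(char c1, char c2) {
        return c1 == c2
            || Character.toUpperCase(c1) == Character.toUpperCase(c2)
            || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    public static void main(String[] args) {
        char dottedLower = 'i';       // U+0069
        char dotlessLower = '\u0131'; // dotless lowercase i
        char dottedUpper = '\u0130';  // dotted uppercase I

        // Both lowercase letters upper-case to 'I', so they are "the same".
        System.out.println(sameIgnoringCase(dottedLower, dotlessLower)); // true

        // Dotted uppercase I lower-cases to 'i', so it also matches 'i'.
        System.out.println(sameIgnoringCase(dottedLower, dottedUpper));  // true

        // Dotless lowercase and dotted uppercase share neither mapping.
        System.out.println(sameIgnoringCase(dotlessLower, dottedUpper)); // false
    }
}
```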

Just in case, please note that I can create the files i.txt, İ.txt and ı.txt in the same folder in a Turkish Windows environment. This is a very interesting implementation because it follows neither the Turkish casing rules nor the locale-insensitive rule.
Comment 13 Szymon Brandys CLA 2012-05-22 05:08:07 EDT
Kentaroh mentioned that they would like to have it fixed by 3.8.1/4.2.1, so I think we can move the discussion post-3.8/4.2.
Leaving the bug open for now.
Comment 14 Dani Megert CLA 2012-05-22 05:30:24 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > > Thank you. I understand that equalsIgnoreCase() is locale insensitive. 
> > 
> > How do you understand that? The Javadoc says nothing about ignoring the Locale.
> 
> I mean that I guess that equalsIgnoreCase() is locale insensitive because it
> uses locale insensitive functions Character.toUpper()/toLower(). 

Point taken.


Replacing String.equalsIgnoreCase(String) might have an impact on performance. This needs to be considered/measured.
Comment 15 Szymon Brandys CLA 2012-05-22 05:46:41 EDT
We can easily apply the fix I suggested in comment 8 and see how the performance is affected. Should we apply the same fix across the Eclipse SDK though? I think that many devs may use equalsIgnoreCase without being aware of how it really works. As was already mentioned, the method's Javadoc does not state directly that it is not locale-aware. I think we should have a util method to use instead of String#equalsIgnoreCase.
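As a rough idea of what such a util method could look like, here is a minimal sketch. The class name, method name and Locale parameter are all hypothetical and nothing like this is confirmed to exist in the Eclipse SDK. It checks plain equality first, which is one possible way to limit the performance cost raised in comment 14.

```java
import java.util.Locale;

class LocaleAwareStrings {
    // Hypothetical helper: case-insensitive comparison using the
    // locale-sensitive String case mapping instead of Character mapping.
    static boolean equalsIgnoreCase(String a, String b, Locale locale) {
        if (a.equals(b))
            return true; // cheap fast path, avoids allocating upper-cased copies
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Dotted and dotless i are kept apart under the Turkish locale.
        System.out.println(equalsIgnoreCase("i.txt", "\u0131.txt", turkish)); // false

        // Ordinary case differences still match under a neutral locale.
        System.out.println(equalsIgnoreCase("readme", "README", Locale.ROOT)); // true

        // Caveat: under the Turkish locale "i" and "I" no longer match either,
        // which differs from the Windows behaviour described in comment 10.
        System.out.println(equalsIgnoreCase("i.txt", "I.TXT", turkish)); // false
    }
}
```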
Comment 16 Dani Megert CLA 2012-05-22 06:35:21 EDT
(In reply to comment #15)
> We can easily apply the fix I suggested in comment 8 and see how the
> performance is affected. Should we apply the same fix across Eclipse SDK
> though?

I'd say we have thousands of them. Guess we have to fix this case by case.
Comment 18 Kentaroh Noji CLA 2012-08-03 00:51:14 EDT
(In reply to comment #17)
> Fixed with
> http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/
> ?id=e7ee8e0432872d2c67ce80e81564aae885305a1f.

Thank you, I will verify it when the build containing the fix is available.
Comment 19 Szymon Brandys CLA 2012-09-11 11:32:46 EDT
Verified via code inspection.