Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 328233

Summary: [nls tooling] Externalize string broken with \\uHHHH in string
Product: [Eclipse Project] JDT Reporter: Markus Keller <markus.kell.r>
Component: TextAssignee: Deepak Azad <deepakazad>
Status: VERIFIED FIXED QA Contact:
Severity: major    
Priority: P3 CC: daniel_megert, raksha.vasisht
Version: 3.7   
Target Milestone: 3.7 M3   
Hardware: PC   
OS: Windows 7   
Whiteboard:
Attachments:
Description Flags
fix
none
additional fix none

Description Markus Keller CLA 2010-10-20 07:10:52 EDT
HEAD, was OK in I20101005-0800

package source;
public class Snippet {
	public static void main(String[] args) {
		System.out.println("Hello \u00e4 \u2063, \\u00e4 \\u2063");
	}
}

- The Externalize Strings wizard wrongly says "Some values are invalid".
- The NLS hover in the .properties file wrongly says "Malformed \uxxxx encoding".


The encoding/decoding code needs some more cleanup:

- NativeToAscii is a bad name. The native2ascii tool is already misnamed, since it converts to "Latin1 + escaped Unicode", and not to ASCII. Our class does not even deal with a "native" encoding, so both parts of the name are wrong. It should rater be called something like PropertiesFileEscapes.

- The 2 conversion methods in NativeToAscii should both do the special handling for \t etc. . Callers of getEscapedAsciiString(char) shouldn't have to do that.
Comment 1 Deepak Azad CLA 2010-10-20 08:11:42 EDT
(In reply to comment #0)
> HEAD, was OK in I20101005-0800
> 
> package source;
> public class Snippet {
>     public static void main(String[] args) {
>         System.out.println("Hello \u00e4 \u2063, \\u00e4 \\u2063");
>     }
> }
> 
> - The Externalize Strings wizard wrongly says "Some values are invalid".
> - The NLS hover in the .properties file wrongly says "Malformed \uxxxx
> encoding".
This is a bug in NativeToAscii.isValidNativeString(String), I will fix it.
 
> The encoding/decoding code needs some more cleanup:
> 
> - NativeToAscii is a bad name. The native2ascii tool is already misnamed, since
> it converts to "Latin1 + escaped Unicode", and not to ASCII. Our class does not
> even deal with a "native" encoding, so both parts of the name are wrong. It
> should rater be called something like PropertiesFileEscapes.
...and then what do you want to call the two methods? escape(..) and unescape(...)?
Dani, did not like the names getNativeString() and getEscapedNativeString(). He had suggested - native2Ascii() and ascii2Native(). I agree NativeToAscii is technically incorrect, but it is a well known name and immediately suggests what the code will do.
 
> - The 2 conversion methods in NativeToAscii should both do the special handling
> for \t etc. . Callers of getEscapedAsciiString(char) shouldn't have to do that.
I can introduce a boolean parameter to do this.
Comment 2 Markus Keller CLA 2010-10-20 09:46:32 EDT
> ...and then what do you want to call the two methods? escape(..) and
> unescape(...)?

Yes. The Javadoc of the class should tell that this class converts between Java chars and the escaped form that must be used in .properties files.

Again, "native" is not involved here at all. In the native2ascii tool, "native" refers to the system's default encoding (which can be configured with the -encoding switch). In our code, "native" is not involved at all.

What we do is simple escaping and unescaping. So using the term native2ascii does not suggest what we do - it's just wrong.


> > - The 2 conversion methods in NativeToAscii should both do the special handling
> > for \t etc. . Callers of getEscapedAsciiString(char) shouldn't have to do that.
> I can introduce a boolean parameter to do this.

No, don't add a boolean. The methods should be symmetrical, and they currently are not. All callers of getEscapedAsciiString(char) already escape \r, \n, \t, and \f, and that should simply be moved into the method. The other escaping can stay where it is if it is legitimate in that context.
Comment 3 Dani Megert CLA 2010-10-20 09:51:02 EDT
>Dani, did not like the names getNativeString() and getEscapedNativeString(). He
>had suggested - native2Ascii() and ascii2Native(). 
I also suggested escape***(...). So that would be fine by me as well.
Comment 4 Deepak Azad CLA 2010-10-21 10:24:38 EDT
Created attachment 181396 [details]
fix

(In reply to comment #1)
> This is a bug in NativeToAscii.isValidNativeString(String), I will fix it.
phew.. the \es again. This method is just wrong, got rid of it. Things should work perfectly now..

- Renamed to PropertiesFileEscape, escape(..) and unescape(...)
- Made escape(..) and unescape(..) symmetrical
- the patch also contains the 2 line fix for bug 328223
Comment 5 Deepak Azad CLA 2010-10-21 10:28:22 EDT
(In reply to comment #4)
> Created an attachment (id=181396) [details] [diff]
> fix

Fixed in HEAD.
Comment 6 Markus Keller CLA 2010-10-22 06:22:44 EDT
Now we have only one remaining bad transformation: "\b" is externalized as \b, but it should be \u0008.

And BTW: PropertiesFileAutoEditStratergy contains a typo.

Example with nasty cases:

package pack;
public class Snippet {
    public static void main(String[] args) {
        p("Hello \\b\\t\\f\\r\\n\\\\:\\!\\# !# \b\t\f\r\n\\"); //$NON-NLS-1$
        p("Hello \\b\\t\\f\\r\\n\\\\:\\!\\# !# \b\t\f\r\n\\");

        p("\t\tHello \u00e4 \u2603, \\u00e4 \\u2603, \\u"); //$NON-NLS-1$
        p("\t\tHello \u00e4 \u2603, \\u00e4 \\u2603, \\u");
    }
    private static void p(String s) {
        System.out.println(s);
    }
}
Comment 7 Deepak Azad CLA 2010-10-22 07:25:49 EDT
Created attachment 181487 [details]
additional fix

Fixed in HEAD.
Comment 8 Deepak Azad CLA 2010-10-22 07:28:49 EDT
.
Comment 9 Raksha Vasisht CLA 2010-10-26 10:01:00 EDT
Verified for 3.7 M3 with  I20101025-1800.