| Summary: | [nls tooling] TUR4.2: Unicode escape of Latin 1 specific characters is incorrect by Externalize String | | |
|---|---|---|---|
| Product: | [Eclipse Project] JDT | Reporter: | Kentaroh Noji <kennoji> |
| Component: | Text | Assignee: | JDT-Text-Inbox <jdt-text-inbox> |
| Status: | CLOSED INVALID | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | camle, daniel_megert, harendra, kennoji, maedera |
| Version: | 3.8 | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows 7 | | |
| Whiteboard: | | | |
| Attachments: | | | |

Created attachment 214510
Sample test case

This sample contains Latin 1 specific characters.

Created attachment 214511
a message.properties file generated by Externalize String

Created attachment 214512
Screen capture of result

(In reply to comment #1)
> Created attachment 214510
> Sample test case
>
> This sample contains Latin 1 specific characters.

This file does not compile and hence nothing can be externalized. When I fix the error and then externalize the string I get this entry:

Uni.0=¡¢£¤¥¦§¨©ª«¬SHY®¯Bx°±²³´µ¶·¸¹º»¼½¾¿CxÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏDxÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßExàáâãäåæçèéêëìíîïFxðñòóôõö÷øùúûüýþÿ

And all works fine.

(In reply to comment #4)
> And all works fine.

Let me explain the problem again. Here are the steps to recreate it:

1. Create a Java class file that contains literals with Latin 1 characters:

    public class Uniescape {

        /**
         * @param args
         */
        public static void main(String[] args) {
            System.out.println("AÀÁÂÃÄÅ");
        }

    }

2. Externalize the string in Eclipse: Source > Externalize String.

3. A messages.properties file is then created, and it contains a key=value entry like:

    Uniescape.0=AÀÁÂÃÄÅ

4. When I run the JDK's native2ascii command on messages.properties, I get the following result:

    Uniescape.0=A\u00c0\u00c1\u00c2\u00c3\u00c4\u00c5

5. Why doesn't Eclipse's Externalize String function transform these Latin 1 characters into Unicode escapes as defined by the Java spec?

Note that Eclipse's Externalize String function does transform non-ASCII characters other than Latin 1 into Unicode escapes.

(In reply to comment #5)
> (In reply to comment #4)
>
> > And all works fine.
>
> Let me explain the problem again. Here are the steps to recreate it:
>
> 1. Create a Java class file that contains literals with Latin 1 characters:
>
> public class Uniescape {
>
>     /**
>      * @param args
>      */
>     public static void main(String[] args) {
>         System.out.println("AÀÁÂÃÄÅ");
>     }
>
> }
>
> 2. Externalize the string in Eclipse: Source > Externalize String.
>
> 3. A messages.properties file is then created, and it contains a key=value entry like:
>
> Uniescape.0=AÀÁÂÃÄÅ

And this is correct.

> 5. Why doesn't Eclipse's Externalize String function transform these
> Latin 1 characters into Unicode escapes as defined by the Java spec?

That's not in the spec. Please point me to the part of the JLS7 you are referring to if you disagree. The Javadoc says that only non-Latin1 characters need to be escaped.

The Java™ Language Specification, Java SE 7 Edition, describes Unicode escapes in section 3.3:

"The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each."

So it looks like the Unicode escape is for transforming Unicode characters into ASCII characters only. It looks like ASCII does not include U+00A0 - U+00FF.

Yes, there *is a way* to transform to ASCII. It does not say that one *must* transform it. One has to do this only when a properties file entry contains non-Latin1 characters.

Thank you. I found the following statement in the Javadoc for Properties:

"When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings."

I understand this is not a bug, so I am closing this report.

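The Properties behavior quoted above can be checked with a small sketch. It is not part of the original report or of JDT; the class name `Latin1PropertiesDemo` and the helper `loadValue` are hypothetical names chosen for illustration. It assumes only the documented behavior of `java.util.Properties.load(InputStream)`, which reads the stream as ISO 8859-1 and decodes Unicode escapes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class Latin1PropertiesDemo {

    /**
     * Encodes the given properties text as ISO 8859-1 bytes, loads it with
     * java.util.Properties, and returns the value stored under the key.
     * (Hypothetical helper, for illustration only.)
     */
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        byte[] bytes = fileContent.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Entry with raw Latin 1 characters, as Externalize String writes it.
        // The characters are spelled with Java source escapes here so that the
        // example does not depend on the encoding of this source file.
        String raw = "Uniescape.0=A\u00c0\u00c1\u00c2\u00c3\u00c4\u00c5\n";

        // The same entry with the escapes written out in the file itself,
        // as in the native2ascii output from the report.
        String escaped = "Uniescape.0=A\\u00c0\\u00c1\\u00c2\\u00c3\\u00c4\\u00c5\n";

        String fromRaw = loadValue(raw, "Uniescape.0");
        String fromEscaped = loadValue(escaped, "Uniescape.0");

        // Both forms should load to the same seven-character string.
        System.out.println(fromRaw.equals(fromEscaped));
    }
}
```

Under these assumptions both files load to the same value, which is consistent with the conclusion in the thread that escaping Latin 1 characters is permitted but not required.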
Build Identifier: I20120315-1300

Latin 1 specific characters such as "¡¢£¤¥¦§¨©ª«¬SHY®¯Bx°±²³´µ¶·¸¹º»¼½¾¿CxÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏDxÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßExàáâãäåæçèéêëìíîïFxðñòóôõö÷øùúûüýþÿ!" should be encoded as Unicode escapes (\uxxxx) in the properties file. However, the Externalize String function writes these Latin 1 characters as is. JDK's native2ascii translates these Latin 1 characters into Unicode escapes such as:

\u00a1\u00a2\u00a3\u00a4\u00a5\u00a6\u00a7\u00a8\u00a9\u00aa\u00ab\u00acSHY\u00ae\u00afBx\u00b0\u00b1\u00b2\u00b3\u00b4\u00b5\u00b6\u00b7\u00b8\u00b9\u00ba\u00bb\u00bc\u00bd\u00be\u00bfCx\u00c0\u00c1\u00c2\u00c3\u00c4\u00c5\u00c6\u00c7\u00c8\u00c9\u00ca\u00cb\u00cc\u00cd\u00ce\u00cfDx\u00d0\u00d1\u00d2\u00d3\u00d4\u00d5\u00d6\u00d7\u00d8\u00d9\u00da\u00db\u00dc\u00dd\u00de\u00dfEx\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5\u00e6\u00e7\u00e8\u00e9\u00ea\u00eb\u00ec\u00ed\u00ee\u00efFx\u00f0\u00f1\u00f2\u00f3\u00f4\u00f5\u00f6\u00f7\u00f8\u00f9\u00fa\u00fb\u00fc\u00fd\u00fe\u00ff

Reproducible: Always

Steps to Reproduce:

OS: e.g. Windows 7 SP1 Professional Turkish Edition
JDK: java full version JRE 1.7.0 IBM Windows AMD 64 build pwa6470-20110906_01
Locale: Turkish

I found this symptom in a Turkish environment with some Turkish characters. After some investigation, I found that it happens with Latin 1 specific characters.

1. Create a Java project, and create a Java class.
2. Add "System.out.println("¡¢£¤¥¦§¨©ª«¬SHY®¯Bx°±²³´µ¶·¸¹º»¼½¾¿CxÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏDxÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßExàáâãäåæçèéêëìíîïFxðñòóôõö÷øùúûüýþÿ");" to the Java class you created.
3. Source > Externalize String. Click the Next button, then Finish.
4. Browse the messages.properties file.
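
The difference between the two outputs described in this report comes down to where the escaping threshold sits. The sketch below is an illustration only: it is neither JDT's nor native2ascii's actual implementation, and the class name `UnicodeEscapeSketch` and the method `escapeAbove` are hypothetical. It handles only characters in the Basic Multilingual Plane and ignores the other escaping rules of the properties format (backslashes, separators, leading whitespace).

```java
final class UnicodeEscapeSketch {

    /**
     * Returns the text with every char whose code is above maxPlain replaced
     * by a backslash, the letter u, and four lowercase hex digits.
     * All other chars are copied unchanged. (Hypothetical helper.)
     */
    static String escapeAbove(String text, int maxPlain) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > maxPlain) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Three sample chars: ASCII capital A, capital A with grave (Latin 1),
        // and capital I with dot above (used in Turkish, outside Latin 1).
        String value = "A\u00c0\u0130";

        // ASCII threshold: everything above 0x7F is escaped.
        System.out.println(escapeAbove(value, 0x7F));

        // Latin 1 threshold: only chars above 0xFF are escaped.
        System.out.println(escapeAbove(value, 0xFF));
    }
}
```

With the ASCII threshold the sample becomes `A\u00c0\u0130`, matching the style of the native2ascii output above. With the Latin 1 threshold only the Turkish character is escaped and À is written as is, which matches what the reporter observed from Externalize String.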