
Bug 320133

Summary: [nls tooling] externalize strings escapes ISO 8859-1
Product: [Eclipse Project] JDT
Component: Text
Status: VERIFIED FIXED
Severity: normal
Priority: P3
Version: 3.7
Target Milestone: 3.7 M1
Hardware: All
OS: All
Whiteboard:
Reporter: Sebastian Dietrich <Sebastian.Dietrich>
Assignee: Deepak Azad <deepakazad>
QA Contact:
CC: daniel_megert, dasvipin5585, remy.suen
Attachments: fix + tests (flags: none)

Description Sebastian Dietrich CLA 2010-07-16 13:24:47 EDT
Build Identifier: 20100218-1602

The externalize strings capability of Eclipse escapes ISO 8859-1 characters even when that is not necessary (properties files are ISO 8859-1 encoded). E.g. "ü" becomes "\u00FC".
This makes the properties files less readable and might (in special cases) produce different behavior depending on the encoding settings of your JVM.
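
To illustrate why the escaping is not needed for loading, here is a minimal sketch (not taken from JDT) showing that `java.util.Properties.load(InputStream)`, which decodes its input as ISO 8859-1, yields the same value for the raw and the escaped form. The class name `PropertiesEncodingDemo`, the helper `loadValue`, and the key `greeting` are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; none of these names come from the bug report.
final class PropertiesEncodingDemo {

    /** Parses the given text as if it were the bytes of a .properties file on disk. */
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        // Encode as ISO 8859-1, the encoding Properties.load(InputStream) expects.
        byte[] bytes = fileContent.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Raw form: u-umlaut and sharp s are stored as single ISO 8859-1 bytes.
        String raw = loadValue("greeting=Gr\u00FC\u00DFe\n", "greeting");
        // Escaped form: the file literally contains a backslash followed by u00FC / u00DF.
        String escaped = loadValue("greeting=Gr\\u00FC\\u00DFe\n", "greeting");
        System.out.println(raw.equals(escaped)); // prints "true"
    }
}
```

Both variants load to the same string, so for characters inside ISO 8859-1 the escaped form costs readability without changing what `Properties` returns.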

Reproducible: Always

Steps to Reproduce:
1. write code using a string that uses some special (but still ISO 8859-1) characters like "äöüß"
2. select "source/externalize strings"
3. see the messages.properties files and notice that the characters have been escaped
Comment 1 Dani Megert CLA 2010-07-20 03:17:20 EDT
> might (in special cases)
>produce some different behavior depending on the encoding settings of your JVM
Can you give an example?

Deepak, please investigate.
Comment 2 Sebastian Dietrich CLA 2010-07-20 03:27:20 EDT
(In reply to comment #1)
> > might (in special cases)
> >produce some different behavior depending on the encoding settings of your JVM
> Can you give an example?

We had changed the encoding of our Java files to UTF-8 and wrote some unit tests that compared strings containing umlauts with those in message.properties (with escaped umlauts). Everything ran fine in Eclipse, but not with Ant (until we changed the encoding for Ant to UTF-8 as well). This is probably a different problem, but I'm sure it could have been avoided if the umlauts had not been escaped when generating the message.properties file.
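
One plausible reading of the failure described above, sketched under assumptions: the comment does not say which encoding Ant used before the change, so ISO 8859-1 is assumed here purely for illustration. The class `SourceEncodingMismatchDemo` and the helper `readAs` are hypothetical; the helper simulates a compiler decoding source bytes with a different charset than the one the file was saved in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the assumed "wrong" encoding is ISO 8859-1.
final class SourceEncodingMismatchDemo {

    /** Encodes the text with one charset and decodes the bytes with another. */
    static String readAs(String literal, Charset savedAs, Charset assumed) {
        return new String(literal.getBytes(savedAs), assumed);
    }

    public static void main(String[] args) {
        String umlaut = "\u00FC"; // u-umlaut, written as an escape to keep this file ASCII-only

        // Source saved as UTF-8 and compiled as UTF-8: the literal survives.
        String matching = readAs(umlaut, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Source saved as UTF-8 but decoded as ISO 8859-1: one character becomes two.
        String mismatched = readAs(umlaut, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(matching.equals(umlaut));   // prints "true"
        System.out.println(mismatched.equals(umlaut)); // prints "false"
        System.out.println(mismatched.length());       // prints "2"
    }
}
```

Under this reading, the string literal in the test source is what gets corrupted, while the value loaded from the properties file stays the same, which would make the comparison fail under Ant only.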
Comment 3 Dani Megert CLA 2010-07-20 03:31:14 EDT
>This is probably a different problem
Exactly ;-)
Comment 4 Deepak Azad CLA 2010-07-26 05:54:36 EDT
Created attachment 175208
fix + tests

Fixed in HEAD.

Now, within the ISO-8859-1 character set, only the control characters and NBSP (non-breaking space) will be escaped.

NBSP is escaped (to \u00A0) so that it can be told apart from the normal space character. I can change the behavior if someone thinks otherwise.
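
The rule described above can be sketched roughly as follows. This is not the code from the attached patch: the class `PropertiesEscapeSketch` and the method `escapeValue` are hypothetical, and the sketch assumes that characters outside ISO-8859-1 are still escaped. It also ignores details a real writer must handle, such as leading whitespace in values.

```java
// Hypothetical sketch of the escaping rule; not the actual JDT implementation.
final class PropertiesEscapeSketch {

    private static final char NBSP = 0x00A0;

    /** Escapes a property value, leaving printable ISO-8859-1 characters untouched. */
    static String escapeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\') {
                // A literal backslash must be doubled in a properties file.
                sb.append("\\\\");
            } else if (Character.isISOControl(c) || c == NBSP || c > 0xFF) {
                // Control characters, NBSP, and (by assumption) anything
                // outside ISO-8859-1 are written as a Unicode escape.
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                // Printable ISO-8859-1 characters, umlauts included, stay readable.
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeValue("Gr\u00FC\u00DFe")); // umlaut and sharp s are kept as-is
        System.out.println(escapeValue("a\u00A0b"));        // NBSP is written as an escape
    }
}
```

Writing control characters as Unicode escapes rather than as `\n` or `\t` is a simplification; both forms load to the same character.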
Comment 5 Deepak Azad CLA 2010-07-26 05:55:15 EDT
.
Comment 6 Dani Megert CLA 2010-08-03 05:19:37 EDT
Verified in I20100802-1800.
Comment 7 Dani Megert CLA 2010-09-03 08:03:06 EDT
*** Bug 324378 has been marked as a duplicate of this bug. ***