Bug 320133 - [nls tooling] externalize strings escapes ISO 8859-1
Summary: [nls tooling] externalize strings escapes ISO 8859-1
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Text
Version: 3.7
Hardware: All All
Importance: P3 normal
Target Milestone: 3.7 M1
Assignee: Deepak Azad
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 324378
Depends on:
Blocks:
 
Reported: 2010-07-16 13:24 EDT by Sebastian Dietrich
Modified: 2010-09-03 08:04 EDT
CC List: 3 users

See Also:


Attachments
fix + tests (2.81 KB, patch)
2010-07-26 05:54 EDT, Deepak Azad
no flags

Description Sebastian Dietrich 2010-07-16 13:24:47 EDT
Build Identifier: 20100218-1602

The Externalize Strings capability of Eclipse escapes ISO 8859-1 characters even when that is unnecessary (properties files are ISO 8859-1). E.g. "ü" becomes "\u00FC".
This both makes the properties files less readable and might (in special cases) produce some different behavior depending on the encoding settings of your JVM.

Reproducible: Always

Steps to Reproduce:
1. Write code with a string that contains some special (but still ISO 8859-1) characters like "äöüß".
2. Select "Source/Externalize Strings".
3. Open the messages.properties file and notice that the characters have been escaped.
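
To illustrate why the escaping is not required for ISO 8859-1 characters, here is a minimal sketch. It relies on `java.util.Properties.load(InputStream)`, which reads its input as ISO 8859-1. The class name, helper name, and the `greeting` key are made up for the demonstration and do not come from JDT.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {
    // Hypothetical helper: loads one properties line from ISO 8859-1 bytes,
    // the encoding that Properties.load(InputStream) assumes.
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(line.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Raw umlaut and sharp s, written as Java escapes so this source
        // file compiles the same way under any source encoding.
        String raw = loadValue("greeting=Gr\u00FC\u00DFe", "greeting");
        // The same value stored as escape sequences inside the properties text.
        String escaped = loadValue("greeting=Gr\\u00FC\\u00DFe", "greeting");
        System.out.println(raw.equals(escaped)); // prints "true"
    }
}
```

Both forms load to the same string, so the unescaped form only changes how readable the file is.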
Comment 1 Dani Megert 2010-07-20 03:17:20 EDT
> might (in special cases)
>produce some different behavior depending on the encoding settings of your JVM
Can you give an example?

Deepak, please investigate.
Comment 2 Sebastian Dietrich 2010-07-20 03:27:20 EDT
(In reply to comment #1)
> > might (in special cases)
> >produce some different behavior depending on the encoding settings of your JVM
> Can you give an example?

We had changed the encoding of our Java files to UTF-8 and wrote some unit tests that compared strings containing umlauts with those in messages.properties (with escaped umlauts). Everything ran fine in Eclipse, but not with Ant (until we changed the encoding for Ant to UTF-8 as well). This is probably a different problem, but I'm sure it could have been avoided if the umlauts had not been escaped when generating the messages.properties file.
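
One plausible reading of that failure, sketched below, assumes the Ant build compiled the UTF-8 source files with an ISO 8859-1 default encoding. The comment does not say which encoding Ant actually used, so that part is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {
    // Encodes text as UTF-8 and decodes the bytes as ISO 8859-1, mimicking a
    // compiler that reads a UTF-8 source file with the wrong encoding.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String fromProperties = "\u00FC";             // u-umlaut as loaded from the properties file
        String fromSource = misread(fromProperties);  // the same literal after a misread source file
        System.out.println(fromSource.length());               // prints "2"
        System.out.println(fromSource.equals(fromProperties)); // prints "false"
    }
}
```

Under that assumption the string literal in the test, not the properties value, is what changes between the two builds.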
Comment 3 Dani Megert 2010-07-20 03:31:14 EDT
>This is probably a different problem
Exactly ;-)
Comment 4 Deepak Azad 2010-07-26 05:54:36 EDT
Created attachment 175208
fix + tests

Fixed in HEAD.

Now, within the ISO-8859-1 character set, only the control characters and NBSP (non-breaking space) will be escaped.

NBSP is escaped (to \u00A0) so that it can be told apart from the normal space character. I can change the behavior if someone thinks otherwise.
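
The rule described above could look roughly like the following. This is a minimal sketch and not the attached patch: the class and method names are hypothetical, characters outside ISO-8859-1 are assumed to stay escaped, and the other escapes a properties writer needs (backslash, tab, newline, and so on) are left out.

```java
public class NlsEscapeSketch {
    // Hypothetical helper: keeps printable ISO-8859-1 characters as they are
    // and escapes control characters, NBSP (0xA0), and anything above 0xFF.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean printableLatin1 = (c >= 0x20 && c <= 0x7E) || (c > 0xA0 && c <= 0xFF);
            if (printableLatin1) {
                out.append(c);
            } else {
                // Emit a backslash, the letter u, and four uppercase hex digits.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input is written with Java escapes so the source encoding does not matter.
        System.out.println(escape("Gr\u00FC\u00DFe"));
        System.out.println(escape("a\u00A0b"));
    }
}
```

With this sketch a "ü" passes through unchanged, while NBSP still comes out as \u00A0.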
Comment 5 Deepak Azad 2010-07-26 05:55:15 EDT
.
Comment 6 Dani Megert 2010-08-03 05:19:37 EDT
Verified in I20100802-1800.
Comment 7 Dani Megert 2010-09-03 08:03:06 EDT
*** Bug 324378 has been marked as a duplicate of this bug. ***