Bug 269314 - [encoding] Changing encoding to UTF-8 and back breaks compilation
Summary: [encoding] Changing encoding to UTF-8 and back breaks compilation
Status: RESOLVED INVALID
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Text
Version: 3.5
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Platform-Text-Inbox
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-19 03:33 EDT by Blazej Kroll
Modified: 2009-03-25 07:40 EDT
CC List: 3 users

See Also:


Attachments
The screenshot showing the described bug (231.59 KB, application/pdf)
2009-03-19 03:33 EDT, Blazej Kroll

Description Blazej Kroll 2009-03-19 03:33:01 EDT
Created attachment 129322
The screenshot showing the described bug

Build ID: 3.5M5

Steps To Reproduce:
1. Create a Java/text file in a project (by default its encoding is cp1250 on Windows).

2. Put a copyright character in it, for example:
   System.out.println('\u00a9');
   System.out.println('©');

3. Change the encoding of this file to UTF-8.
    The character in the second line changes.

4. Make some changes and save the file.

5. Try to change the encoding back to cp1250.
   The character in the second line changes again, this time to something that breaks compilation of the code.
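
For illustration, the steps above can be traced in a few lines of Java. This is only a sketch under stated assumptions, not Eclipse's actual code: ISO-8859-1 stands in for cp1250 (both store the copyright sign as the single byte 0xA9), the editor is assumed to decode leniently the way the String(byte[], Charset) constructor does, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    /**
     * Traces steps 3 to 5 for the raw bytes of a file that was
     * originally written in a single-byte legacy encoding.
     * ISO-8859-1 is an assumed stand-in for cp1250.
     */
    static String roundTrip(byte[] legacyBytes) {
        // Step 3: the bytes are only reinterpreted as UTF-8. The String
        // constructor replaces malformed input with U+FFFD.
        String shownAsUtf8 = new String(legacyBytes, StandardCharsets.UTF_8);

        // Step 4: saving writes what the editor shows, encoded as UTF-8.
        byte[] savedBytes = shownAsUtf8.getBytes(StandardCharsets.UTF_8);

        // Step 5: the saved bytes are reinterpreted in the legacy encoding.
        return new String(savedBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] copyrightSign = {(byte) 0xA9}; // the copyright sign as one legacy byte
        String result = roundTrip(copyrightSign);

        // Print code points rather than glyphs to stay independent of the console encoding.
        System.out.println("characters after round trip: " + result.length());
        for (char c : result.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Under these assumptions, the single character inside the quotes becomes several characters after the round trip, and a Java char literal holding more than one character does not compile.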

See attached screenshots


More information:
Comment 1 Dani Megert 2009-03-19 04:41:25 EDT
There's nothing we can do here because the encoded copyright value in the file is valid in both UTF-8 and cp1250 encoding but results in a different character when decoded. In cases when we can detect that a character is not valid we block you from further saving the file, e.g. if you switched to US-ASCII instead of UTF-8.
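
A minimal Java sketch of both cases mentioned here: a byte sequence that decodes without error in two encodings but yields different text, and a target encoding in which the character is detectably unrepresentable. ISO-8859-1 is used as an assumed stand-in for cp1250, the two-byte sequence is chosen for illustration, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameBytesTwoEncodings {

    /** Interprets the raw file bytes in the given encoding without changing them. */
    static String interpret(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    /** Returns true if every character of the text can be written in the target encoding. */
    static boolean isRepresentable(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // 0xC2 0xA9 is the UTF-8 form of the copyright sign. It is also two
        // perfectly legal bytes in a single-byte encoding.
        byte[] fileBytes = {(byte) 0xC2, (byte) 0xA9};

        String asUtf8 = interpret(fileBytes, StandardCharsets.UTF_8);
        String asSingleByte = interpret(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 view, characters: " + asUtf8.length());
        System.out.println("single-byte view, characters: " + asSingleByte.length());

        // The detectable case: US-ASCII has no copyright sign at all.
        System.out.println("fits US-ASCII: "
                + isRepresentable(asUtf8, StandardCharsets.US_ASCII));
    }
}
```

Neither decoding step fails, so there is nothing to flag in the first case. The US-ASCII check, by contrast, gives a definite answer that a tool can act on.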
Comment 2 Jawahar 2009-03-20 05:33:54 EDT
Megert,

>There's nothing we can do here because the encoded copyright value in the file
>is valid in both UTF-8 and cp1250 encoding but results in a different character
>when decoded. In cases when we can detect that a character is not valid we
>block you from further saving the file, e.g. if you switched to US-ASCII
>instead of UTF-8.

When you say this, do you mean that the encoding is correct in UTF-8 but decoding results in a different character? In that case, isn't it a bug?
Comment 3 Markus Keller 2009-03-20 06:26:38 EDT
Note that setting the encoding in Eclipse only changes the interpretation of the bits in the file -- there's no direct way to re-encode an existing file (i.e. change the file contents from one encoding to another). You can do this manually with the steps from bug 144422 comment 8.

Shall we take this bug as an enhancement request for an explicit "Change Encoding" action, which would read the file in its current encoding and transcribe it into a different encoding? Depending on the encodings involved, this action could lose characters that are not representable in the new encoding.
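
As a rough sketch of what such an action would have to do (not an existing Eclipse API), the following Java code decodes the bytes with the current encoding and encodes them again with the new one, refusing to continue when a character cannot be represented instead of silently dropping it. The class and method names are hypothetical, and ISO-8859-1 is an assumed stand-in for cp1250.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ExplicitTranscode {

    /**
     * Rewrites the content from one encoding to another. Throws
     * CharacterCodingException if the input is malformed in the source
     * encoding or contains a character the target encoding cannot represent.
     */
    static byte[] transcode(byte[] input, Charset from, Charset to)
            throws CharacterCodingException {
        // Strict decoding: report bad input instead of substituting U+FFFD.
        CharBuffer chars = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input));

        // Strict encoding: report characters missing from the target encoding.
        ByteBuffer encoded = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);

        byte[] output = new byte[encoded.remaining()];
        encoded.get(output);
        return output;
    }

    public static void main(String[] args) {
        byte[] legacy = {(byte) 0xA9}; // the copyright sign as one legacy byte
        try {
            byte[] utf8 = transcode(legacy, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
            System.out.println("bytes after transcoding to UTF-8: " + utf8.length);

            // US-ASCII cannot hold the character, so this call throws.
            transcode(legacy, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            System.out.println("conversion refused: " + e);
        }
    }
}
```

Using CodingErrorAction.REPORT turns the possible data loss into an error the caller has to handle. A real action would still need to decide what to offer the user at that point.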
Comment 4 Dani Megert 2009-03-20 06:37:08 EDT
>Shall we take this bug as an enhancement request for an explicit
They already exist: bug 47346 and bug 179187.
Comment 5 Dani Megert 2009-03-20 06:42:16 EDT
>When you said this, does it mean that the encoding is correct in UTF-8 but
>decoding results in different character? in such a case isn't it a bug?
The data in the file is valid (i.e. can be read) with both UTF-8 and cp1250, so there is nothing we can detect. The only alternative would be the dangerous conversion suggested by Markus.
Comment 6 Jawahar 2009-03-25 01:57:42 EDT
When can we expect a fix for the bug mentioned, bug 47346?
Comment 7 Dani Megert 2009-03-25 07:40:42 EDT
>When can we expect a fix for the bug mentioned: bug 47346
This is a complex item (we need to find a way to prevent data corruption or loss, see comment 3), and the current approach is good because the user obviously sees the problem in the UI:

>3. Change the encoding of this file to UTF-8
>    You would see the second line's char changed

Also, the action/code that changes the encoding is owned by Platform UI, and they need to decide whether they want to rewrite the file on an encoding change or not. I am moving the bug accordingly.