Created attachment 129322
The screenshot showing the described bug

Build ID: 3.5M5

Steps To Reproduce:
1. Create a java/text file in a project (by default its encoding is cp1250 on Win)
2. Put a copyright char in it, like:
   System.out.println('\u00a9');
   System.out.println('©');
3. Change the encoding of this file to UTF-8
   You will see the second line's char change
4. Make some changes and save the file
5. Try to change the encoding back to cp1250
   This changes the second line's char again and introduces a char that breaks the compilation of the code. See attached screenshots.
There's nothing we can do here because the encoded copyright value in the file is valid in both UTF-8 and cp1250 encoding but results in a different character when decoded. In cases when we can detect that a character is not valid we block you from further saving the file, e.g. if you switched to US-ASCII instead of UTF-8.
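To illustrate the point above, here is a minimal sketch in plain Java (not Eclipse code; the class and method names are made up for this example, and it assumes the runtime provides the windows-1250 charset). It shows that the UTF-8 bytes of the copyright sign also decode without error as cp1250, just to different text, while a strict US-ASCII decoder rejects them:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Eclipse.
class EncodingAmbiguityDemo {
    // Assumption: the JVM ships the windows-1250 (cp1250) charset.
    static final Charset CP1250 = Charset.forName("windows-1250");

    // Returns true if the bytes decode under the charset without any
    // malformed or unmappable input (strict decoding, no substitution).
    static boolean isDecodable(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The copyright sign saved as UTF-8 is the two bytes 0xC2 0xA9.
        byte[] utf8Bytes = "\u00a9".getBytes(StandardCharsets.UTF_8);

        // Both interpretations are valid, so no error can be detected...
        System.out.println(isDecodable(utf8Bytes, StandardCharsets.UTF_8)); // true
        System.out.println(isDecodable(utf8Bytes, CP1250));                 // true

        // ...but cp1250 yields two characters (U+00C2 followed by U+00A9),
        // which no longer fits into a Java char literal.
        System.out.println(new String(utf8Bytes, CP1250).length());         // 2

        // US-ASCII has no bytes above 0x7F, so here the problem is detectable.
        System.out.println(isDecodable(utf8Bytes, StandardCharsets.US_ASCII)); // false
    }
}
```

The strict decoder is used only to make the "valid or not" check explicit; how Eclipse performs its own check is not shown here.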
>There's nothing we can do here because the encoded copyright value in the file
>is valid in both UTF-8 and cp1250 encoding but results in a different character
>when decoded. In cases when we can detect that a character is not valid we
>block you from further saving the file, e.g. if you switched to US-ASCII
>instead of UTF-8.
When you said this, does it mean that the encoding is correct in UTF-8 but decoding results in different character? In such a case isn't it a bug?
Note that setting the encoding in Eclipse only changes the interpretation of the bits in the file -- there's no direct way to re-encode an existing file (i.e. change the file contents from one encoding to another). You can do this manually with the steps from bug 144422 comment 8. Shall we take this bug as an enhancement request for an explicit "Change Encoding" action, which would read the file in its current encoding and transcribe it into a different encoding? Depending on the encodings used, this action could lose some characters that are not representable in the new encoding.
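For reference, a rough sketch of what such a transcribing step could look like in plain Java. This is not an existing Eclipse action; the class and method names are hypothetical, and it again assumes windows-1250 is available in the runtime. It decodes with the current encoding, re-encodes with the target one, and fails instead of silently dropping characters that the target cannot represent:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Eclipse.
class TranscodeSketch {
    // Re-encodes the given bytes from one charset to another.
    // Throws CharacterCodingException if the input is invalid in 'from'
    // or contains characters that 'to' cannot represent.
    static byte[] transcode(byte[] input, Charset from, Charset to)
            throws CharacterCodingException {
        CharBuffer chars = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input));
        ByteBuffer encoded = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Assumption: the JVM ships the windows-1250 (cp1250) charset.
        Charset cp1250 = Charset.forName("windows-1250");

        // In cp1250 the copyright sign is the single byte 0xA9.
        byte[] cp1250Bytes = { (byte) 0xA9 };
        byte[] utf8Bytes = transcode(cp1250Bytes, cp1250, StandardCharsets.UTF_8);
        System.out.println(utf8Bytes.length); // 2 (0xC2 0xA9)

        // Going to US-ASCII would lose the character, so this call throws.
        try {
            transcode(utf8Bytes, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            System.out.println("not representable in US-ASCII");
        }
    }
}
```

Reporting the error rather than substituting a replacement character is one possible choice; a real action would still have to decide what to offer the user in that case.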
>Shall we take this bug as an enhancement request for an explicit
They already exist: bug 47346 and bug 179187.
>When you said this, does it mean that the encoding is correct in UTF-8 but
>decoding results in different character? in such a case isn't it a bug?
The data in the file is valid (i.e. can be read) with both UTF-8 and cp1250, hence there is nothing we can detect, short of doing the dangerous conversion suggested by Markus.
When can we expect a fix for the bug mentioned, bug 47346?
>When can we expect a fix for the bug mentioned: bug 47346
This is a complex item (we need to find a way to prevent data corruption/loss, see comment 3), and the current approach is good because the user clearly sees the problem in the UI:
>3. Change the encoding of this file to uft-8
> You would see the second line's char changed
Also, the action/code that changes the encoding is owned by Platform UI, and they need to decide whether they want to rewrite the file on an encoding change or not. I'm moving the bug accordingly.