| Summary: | [implementation] save changes BOM of UTF-16 with LE BOM | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Warren Paul <warren.paul> | ||||
| Component: | Text | Assignee: | Platform-Text-Inbox <platform-text-inbox> | ||||
| Status: | RESOLVED FIXED | QA Contact: | |||||
| Severity: | major | ||||||
| Priority: | P3 | CC: | daniel_megert, Szymon.Brandys | ||||
| Version: | 3.3.1 | ||||||
| Target Milestone: | 3.4 M7 | ||||||
| Hardware: | PC | ||||||
| OS: | Windows XP | ||||||
| Whiteboard: | |||||||
Description
Warren Paul
Created attachment 94857 [details]
test case
Here's a simple UTF-16LE file. Before the change, if you opened this file in Eclipse, changed it and saved, the encoding would be changed to UTF-16BE. An easy way to tell is opening the file in Notepad on Win32 and doing Save As. That shows you the type, either Unicode, or Unicode big endian.
After the change, the proper encoding is preserved. I tested opening/changing both UTF-16BE and UTF-16LE files to make sure the correct BOM is preserved.
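For anyone who wants to check the result without going through Notepad, here is a minimal sketch that reports which UTF-16 BOM a file starts with. The class and method names (`BomCheck`, `detectUtf16Bom`) are made up for illustration and are not part of Eclipse; the only facts relied on are the standard BOM byte sequences (FF FE for little endian, FE FF for big endian).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Eclipse: reports which UTF-16 BOM a file starts with.
class BomCheck {

    /** Returns "UTF-16LE", "UTF-16BE", or null when the data has no UTF-16 BOM. */
    static String detectUtf16Bom(byte[] data) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            // Note: FF FE 00 00 is also the UTF-32LE BOM; this sketch ignores that case.
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // Notepad: "Unicode"
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // Notepad: "Unicode big endian"
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(detectUtf16Bom(data));
    }
}
```

Running it on the file before and after saving in Eclipse should show whether the BOM changed.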
I've added a workaround to file buffers which will fix it for most textual editors until this bug gets fixed. Szymon, the proposed fix in comment 0 looks reasonable to me.

> Szymon, the proposed fix in comment 0 looks reasonable to me.

Actually, after discussing with Szymon and reading http://en.wikipedia.org/wiki/Utf-16 we conclude that the current behavior of ContentDescription.getCharset() is correct. The writer needs to take care of preserving the BOM.

Fixed in HEAD for file buffers and FileDocumentProvider. Available in builds > N20080410-2000.

Starting verification...

Verified in I20080427-2000.

The test case still fails for me. I open a Unicode file in RC2, change something and save, then open in Notepad, Save As. It's Unicode big endian.

The original bug report is still an issue. The save as was done in notepad as an easy way to see the encoding after saving the file in Eclipse.

Sorry, I cannot reproduce. Please provide exact/detailed steps.

(In reply to comment #11)
> Sorry, I cannot reproduce. Please provide exact/detailed steps.

I take a UTF16-LE file and open it in Eclipse RC3 using File->Open File. I make a minor change and save the file. The encoding is changed to UTF16-BE. I'm using Notepad to verify the encoding of the file before and after changing it in Eclipse.

Please file a separate bug. I fixed the scenario where files within a workspace aren't handled correctly. If you use File > Open File it's a different scenario, especially since we have no clue what encoding that file has.

(In reply to comment #13)
> Please file a separate bug. I fixed the scenario where files within a workspace
> aren't handled correctly. If you use File > Open File it's a different
> scenario, especially since we have no clue what encoding that file has.

OK, I cloned this to 236266. But we do know the encoding as it's in the BOM. The file may not have a BOM in which case we won't know, but when it does, it should be honored.
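To illustrate the point that the writer has to take care of preserving the BOM, here is a rough sketch in plain Java. It is not the code that went into file buffers or FileDocumentProvider; the class and method names are hypothetical. It relies on two facts about the JDK: encoding with the generic `UTF-16` charset emits a big-endian BOM followed by big-endian data, while `UTF-16LE` and `UTF-16BE` emit no BOM at all. A writer that wants to keep a little-endian file little-endian therefore has to pick the explicit charset and write the BOM bytes itself. The fallback for input without a BOM is an assumption made for this sketch only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual Eclipse fix: re-encode text so that the
// UTF-16 BOM found in the original bytes is written back unchanged.
class BomPreservingSave {
    static final byte[] BOM_LE = {(byte) 0xFF, (byte) 0xFE};
    static final byte[] BOM_BE = {(byte) 0xFE, (byte) 0xFF};

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Encodes newText using the byte order indicated by the BOM of the original content. */
    static byte[] encodePreservingBom(byte[] original, String newText) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (startsWith(original, BOM_LE)) {
            write(out, BOM_LE, newText, StandardCharsets.UTF_16LE);
        } else if (startsWith(original, BOM_BE)) {
            write(out, BOM_BE, newText, StandardCharsets.UTF_16BE);
        } else {
            // Assumption for this sketch: without a BOM, fall back to the generic
            // "UTF-16" encoder, which writes FE FF followed by big-endian data.
            write(out, new byte[0], newText, StandardCharsets.UTF_16);
        }
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, byte[] bom, String text, Charset cs) {
        out.write(bom, 0, bom.length);          // BOM first, written by hand
        byte[] body = text.getBytes(cs);        // UTF-16LE/BE encoders add no BOM themselves
        out.write(body, 0, body.length);
    }
}
```

For comparison, `"ab".getBytes(StandardCharsets.UTF_16)` starts with FE FF regardless of what the original file looked like, which matches the symptom in this report: a file opened as little endian comes back as big endian after saving.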