Bug 333851 - [encoding] EncodingHelper.getEncoding :: BOM detection not quite right
Status: CLOSED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Smila
Version: unspecified
Hardware: PC Windows Vista
Importance: P3 minor
Target Milestone: ---
Assignee: Andreas Weber CLA
 
Reported: 2011-01-10 05:59 EST by thomas menzel CLA
Modified: 2022-07-07 11:31 EDT

Attachments
Fix for bug 333851 and encoding detection added to CopyPipelet (46.34 KB, patch)
2013-04-11 04:22 EDT, Peter Palmar CLA
TestEncodingHelper's input data files contained in the patch (1.44 KB, application/x-zip-compressed)
2013-04-11 06:40 EDT, Peter Palmar CLA

Description thomas menzel CLA 2011-01-10 05:59:58 EST
The method EncodingHelper.getEncoding does not get BOM detection quite right:

- UTF-32LE is never detected; it is reported as UTF-16LE (the UTF-32LE BOM, FF FE 00 00, begins with the UTF-16LE BOM, FF FE).

- UTF-8 and UTF-16 files that are empty or contain only 1-2 characters are not detected either.
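
To illustrate both points, here is a minimal sketch of BOM detection that tests the 4-byte BOMs before the 2-byte ones and accepts input consisting of nothing but a BOM. The class BomSketch and its detect method are hypothetical names for this example; this is not the SMILA EncodingHelper code or the attached patch.

```java
/** Hypothetical helper for illustration; not the SMILA EncodingHelper. */
final class BomSketch {

    // Ordered so that longer BOMs are tested before the shorter BOMs they start with.
    private static final String[] NAMES = {
        "UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"
    };
    private static final byte[][] BOMS = {
        {0x00, 0x00, (byte) 0xFE, (byte) 0xFF},
        {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00},
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},
        {(byte) 0xFE, (byte) 0xFF},
        {(byte) 0xFF, (byte) 0xFE},
    };

    private BomSketch() {
    }

    /** Returns the charset name implied by a leading BOM, or null if none is found. */
    static String detect(byte[] data) {
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(data, BOMS[i])) {
                return NAMES[i];
            }
        }
        return null;
    }

    // True if data is at least as long as prefix and begins with it.
    // Only the BOM length is required, so a file holding just a BOM still matches.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

One caveat of this ordering: a UTF-16LE file whose first character after the BOM is U+0000 would be classified as UTF-32LE, since the byte sequences are identical. The sketch accepts that ambiguity.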
Comment 1 Andreas Weber CLA 2013-01-03 03:53:11 EST
It looks like EncodingHelper is not used/referenced anymore. If there are no objections we should remove it for the next release - and close this issue then.
Comment 2 Peter Palmar CLA 2013-04-11 04:22:31 EDT
Created attachment 229593
Fix for bug 333851 and encoding detection added to CopyPipelet
Comment 3 Peter Palmar CLA 2013-04-11 04:27:42 EDT
Patch:
- fixing the bug in EncodingHelper.getEncoding,
- with encoding detection added to CopyPipelet.

The encoding detection is based on the byte order mark for any document, or on the declared encoding information for HTML and XML documents.
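
As a rough illustration of the second mechanism, the sketch below reads a declared encoding from an XML declaration or an HTML meta tag. The class DeclaredEncodingSketch, the method declaredEncoding and the 1024-byte look-ahead are assumptions made for this example and are not taken from the patch. It only works when the markup is in an ASCII-compatible encoding; UTF-16 and UTF-32 documents would have to be recognized by their BOM first.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the code from the attached patch. */
final class DeclaredEncodingSketch {

    // <?xml version="1.0" encoding="..."?>
    private static final Pattern XML_DECL = Pattern.compile(
        "<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    // <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
    private static final Pattern HTML_META = Pattern.compile(
        "<meta[^>]*\\bcharset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    // Assumed look-ahead size; declarations normally sit near the start of the file.
    private static final int HEAD_BYTES = 1024;

    private DeclaredEncodingSketch() {
    }

    /** Returns the encoding name declared in the markup, or null if none is found. */
    static String declaredEncoding(byte[] data) {
        int len = Math.min(data.length, HEAD_BYTES);
        // ISO-8859-1 maps every byte to one char, so decoding cannot fail here.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        Matcher xml = XML_DECL.matcher(head);
        if (xml.find()) {
            return xml.group(1);
        }
        Matcher meta = HTML_META.matcher(head);
        if (meta.find()) {
            return meta.group(1);
        }
        return null;
    }
}
```

The returned name is whatever the document claims, so a caller would still need to check that the charset is supported before decoding with it.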
Comment 4 Andreas Weber CLA 2013-04-11 05:54:34 EDT
Hi Peter, good idea to use the EncodingHelper there, thanks.

But I got some errors when applying your patch and running the TestEncodingHelper test. They all have to do with the UTF32 html/xml test files you provided, e.g. testIsMarkup():  junit.framework.AssertionFailedError ...  at org.eclipse.smila.utils.test.TestEncodingHelper.testIsMarkup(TestEncodingHelper.java:224)

Could you have a look?
Comment 5 Andreas Weber CLA 2013-04-11 06:25:30 EDT
Hi Peter, I think the test files are getting corrupted by the patch mechanism. Could you provide these test files as a zip attachment instead?
Comment 6 Peter Palmar CLA 2013-04-11 06:40:26 EDT
Created attachment 229601
TestEncodingHelper's input data files contained in the patch

Hi Andreas, done.
Comment 7 Andreas Weber CLA 2013-04-11 08:06:14 EDT
Ok, thanks. Now it works fine. I checked it in.
Comment 8 Andreas Weber CLA 2014-05-23 09:34:18 EDT
Closing this; it was fixed for 1.2.