
Bug 333851

Summary: [encoding] EncodingHelper.getEncoding :: BOM detection not quite right
Product: z_Archived
Reporter: thomas menzel <tmenzel>
Component: Smila
Assignee: Andreas Weber <Andreas.Weber>
Status: CLOSED FIXED
QA Contact:
Severity: minor
Priority: P3
CC: peter.palmar
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Windows Vista
Whiteboard:
Attachments:
- Fix for bug 333851 and encoding detection added to CopyPipelet (flags: none)
- TestEncodingHelper's input data files contained in the patch (flags: none)

Description thomas menzel CLA 2011-01-10 05:59:58 EST
The method EncodingHelper.getEncoding does not get the BOM detection quite right.

- UTF-32LE will never be detected; it will be detected as UTF-16LE instead

- it also fails to detect UTF-8 and UTF-16 files that are empty or contain only 1-2 symbols
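
For illustration, here is a minimal sketch of BOM detection that avoids both problems. It is not the actual EncodingHelper code or the attached patch; the class name BomSniffer and its methods are hypothetical. The sketch assumes the misdetection comes from the UTF-32LE BOM (FF FE 00 00) beginning with the same two bytes as the UTF-16LE BOM (FF FE), so the 4-byte marks are tested first, and each comparison only requires as many bytes as the BOM itself, so very short inputs still match.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not the SMILA EncodingHelper. */
final class BomSniffer {

  private BomSniffer() {}

  /**
   * Returns the encoding indicated by a leading byte order mark,
   * or an empty Optional if the data does not start with a known BOM.
   */
  static Optional<String> detect(byte[] data) {
    // Test the 4-byte BOMs before the 2-byte ones: FF FE 00 00 (UTF-32LE)
    // would otherwise be matched by the FF FE check for UTF-16LE.
    if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return Optional.of("UTF-32BE");
    if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return Optional.of("UTF-32LE");
    if (startsWith(data, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
    if (startsWith(data, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
    if (startsWith(data, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
    return Optional.empty();
  }

  /** True if data has at least prefix.length bytes and they equal the prefix. */
  private static boolean startsWith(byte[] data, int... prefix) {
    if (data == null || data.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
      // Mask to compare the signed byte as an unsigned value.
      if ((data[i] & 0xFF) != prefix[i]) return false;
    }
    return true;
  }
}
```

One inherent ambiguity remains with this ordering: a UTF-16LE file whose first character after the BOM is U+0000 also starts with FF FE 00 00 and would be reported as UTF-32LE.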
Comment 1 Andreas Weber CLA 2013-01-03 03:53:11 EST
It looks like EncodingHelper is not used/referenced anymore. If there are no objections we should remove it for the next release - and close this issue then.
Comment 2 Peter Palmar CLA 2013-04-11 04:22:31 EDT
Created attachment 229593 [details]
Fix for bug 333851 and encoding detection added to CopyPipelet
Comment 3 Peter Palmar CLA 2013-04-11 04:27:42 EDT
Patch:
- fixing the bug in EncodingHelper.getEncoding,
- with encoding detection added to CopyPipelet.

The encoding detection is based on the byte order mark for any document, or on the declared encoding information for HTML and XML documents.
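
As a rough sketch of the second part (reading the declared encoding from markup), the following looks for an XML declaration or an HTML meta charset in the first bytes of a document. It is not the code from the patch; the class MarkupEncodingSniffer, the 1024-byte limit and the regular expressions are assumptions made for illustration. It also only works when the document's encoding is ASCII-compatible, so UTF-16 or UTF-32 documents without a BOM would need separate handling.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the SMILA patch. */
final class MarkupEncodingSniffer {

  /** Assumed limit: only the start of the document is inspected. */
  private static final int PREFIX_LIMIT = 1024;

  /** Matches e.g. <?xml version="1.0" encoding="UTF-8"?> */
  private static final Pattern XML_DECL = Pattern.compile(
      "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

  /** Matches e.g. <meta charset="utf-8"> or content="text/html; charset=utf-8" */
  private static final Pattern HTML_META = Pattern.compile(
      "<meta[^>]*?\\bcharset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
      Pattern.CASE_INSENSITIVE);

  private MarkupEncodingSniffer() {}

  /** Returns the encoding name declared in the markup, if one is found. */
  static Optional<String> declaredEncoding(byte[] data) {
    int length = Math.min(data.length, PREFIX_LIMIT);
    // ISO-8859-1 maps every byte to one char, so decoding cannot fail and
    // ASCII text such as the declaration itself is preserved unchanged.
    String head = new String(data, 0, length, StandardCharsets.ISO_8859_1);
    Matcher xml = XML_DECL.matcher(head);
    if (xml.find()) return Optional.of(xml.group(1));
    Matcher meta = HTML_META.matcher(head);
    if (meta.find()) return Optional.of(meta.group(1));
    return Optional.empty();
  }
}
```

A caller would typically try the byte order mark first and fall back to the declared encoding only when no BOM is present.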
Comment 4 Andreas Weber CLA 2013-04-11 05:54:34 EDT
Hi Peter, good idea to use the EncodingHelper there, thanks.

But I got some errors when applying your patch and running the TestEncodingHelper test. They all have to do with the UTF32 html/xml test files you provided, e.g. testIsMarkup():  junit.framework.AssertionFailedError ...  at org.eclipse.smila.utils.test.TestEncodingHelper.testIsMarkup(TestEncodingHelper.java:224)

Could you have a look?
Comment 5 Andreas Weber CLA 2013-04-11 06:25:30 EDT
Hi Peter, I think the test files got corrupted by the patch mechanism. Could you provide these test files as a zip attachment instead?
Comment 6 Peter Palmar CLA 2013-04-11 06:40:26 EDT
Created attachment 229601 [details]
TestEncodingHelper's input data files contained in the patch

Hi Andreas, done.
Comment 7 Andreas Weber CLA 2013-04-11 08:06:14 EDT
OK, thanks. Now it works fine. I checked it in.
Comment 8 Andreas Weber CLA 2014-05-23 09:34:18 EDT
Closing this - was fixed for 1.2