The method EncodingHelper.getEncoding does not detect the BOM correctly: - UTF-32LE is never detected; it is reported as UTF-16LE instead - UTF-8 and UTF-16 files that are empty or contain only 1-2 symbols are not detected either
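The UTF-32LE symptom is what one would expect when the UTF-16LE check runs first, because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). The following is only a minimal sketch of BOM detection that avoids both problems. The class name BomSniffer and its methods are hypothetical and are not the actual EncodingHelper code.

```java
// Hypothetical helper for illustration; not the SMILA EncodingHelper implementation.
final class BomSniffer {

    private BomSniffer() {
    }

    /** Returns the charset name indicated by a leading BOM, or null if no BOM is found. */
    static String detect(byte[] data) {
        // Check 4-byte BOMs before 2-byte BOMs: FF FE 00 00 (UTF-32LE)
        // starts with FF FE (UTF-16LE) and would otherwise be misreported.
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) {
            return "UTF-32LE";
        }
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) {
            return "UTF-32BE";
        }
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        return null;
    }

    /** Length-checked prefix test, so input that holds only the BOM (or nothing) is safe. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

One ambiguity remains with any BOM-only approach: a UTF-16LE file whose first character after the BOM is U+0000 has the same first four bytes as a UTF-32LE BOM, so this sketch would report it as UTF-32LE.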
It looks like EncodingHelper is not used/referenced anymore. If there are no objections we should remove it for the next release - and close this issue then.
Created attachment 229593 [details] Fix for bug 333851 and encoding detection added to CopyPipelet
Patch: - fixes the bug in EncodingHelper.getEncoding - adds encoding detection to CopyPipelet. The encoding detection is based on the byte order mark for any document, or on the declared encoding information for HTML and XML documents.
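As a rough illustration of the second detection path (declared encoding in markup), the sketch below reads the encoding from an XML declaration or an HTML meta tag. It is an assumption-laden example, not the code from the patch: the class name DeclaredEncodingSniffer, the 1024-byte look-ahead and the regular expressions are all made up here, and it only works for ASCII-compatible encodings, so a BOM check would have to run first for UTF-16 and UTF-32 input.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the code contained in the attached patch.
final class DeclaredEncodingSniffer {

    // Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern XML_DECL = Pattern.compile(
        "<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    // Matches e.g. <meta charset="UTF-8"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    private static final Pattern HTML_META = Pattern.compile(
        "<meta[^>]*\\bcharset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    private DeclaredEncodingSniffer() {
    }

    /** Returns the encoding name declared in the markup, or null if none is found. */
    static String detect(byte[] data) {
        // Look only at the start of the document (assumed limit of 1024 bytes).
        int length = Math.min(data.length, 1024);
        // ISO-8859-1 maps every byte to one char, so decoding cannot fail.
        String head = new String(data, 0, length, StandardCharsets.ISO_8859_1);
        Matcher xml = XML_DECL.matcher(head);
        if (xml.find()) {
            return xml.group(1);
        }
        Matcher html = HTML_META.matcher(head);
        if (html.find()) {
            return html.group(1);
        }
        return null;
    }
}
```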
Hi Peter, good idea to use the EncodingHelper there, thanks. But I got some errors when applying your patch and running the TestEncodingHelper test. They all have to do with the UTF32 html/xml test files you provided, e.g. testIsMarkup(): junit.framework.AssertionFailedError ... at org.eclipse.smila.utils.test.TestEncodingHelper.testIsMarkup(TestEncodingHelper.java:224) Could you have a look?
Hi Peter, I think the test files were corrupted by the patch mechanism. Could you provide them as a zip attachment instead?
Created attachment 229601 [details] TestEncodingHelper's input data files contained in the patch
Hi Andreas, done.
OK, thanks. Now it works fine. I checked it in.
Closing this - was fixed for 1.2