Bug 62443

Summary: [content type] UTF-16 causes exception in XMLRootHandler with IBM's JRE
Product: [Eclipse Project] Platform
Component: Resources
Status: RESOLVED FIXED
Reporter: David Williams <david_williams>
Assignee: Rafael Chaves <eclipse>
QA Contact:
Severity: normal    
Priority: P3    
Version: 3.0   
Target Milestone: 3.0 RC1   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
  junit test to demonstrate above error (flags: none)
  hex values of test file (flags: none)

Description David Williams CLA 2004-05-17 03:12:54 EDT
Using 0514 build. 
JRE is 
java version "1.4.2"
J9 - VM for the Java(TM) platform (build 2.1)
IBM J9SE VM (build 2.1, J2RE 1.4.2 IBM J9 build 20040422 (JIT enabled))

Works OK on Sun's JRE (1.4.2_03), but with IBM's it's easy to get an exception
from iFile.getContentDescription(). I'll attach a test case.


= = = = =
org.eclipse.core.internal.resources.ResourceException(/com.ibm.encoding.resource.newtests/testfiles/xml/testUTF16.xml)[381]: sun.io.MalformedInputException
	at sun.io.ByteToCharUnicode.flush(ByteToCharUnicode.java:227)
	at sun.nio.cs.StreamDecoder$ConverterSD.flushInto(StreamDecoder.java:305)
	at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:329)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:222)
	at java.io.InputStreamReader.read(InputStreamReader.java:207)
	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
	at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at org.eclipse.core.internal.content.XMLRootHandler.parseContents(XMLRootHandler.java:163)
	at org.eclipse.core.runtime.content.XMLRootElementContentDescriber.checkCriteria(XMLRootElementContentDescriber.java:62)
	at org.eclipse.core.runtime.content.XMLRootElementContentDescriber.describe(XMLRootElementContentDescriber.java:87)
	at org.eclipse.core.internal.content.ContentType.describe(ContentType.java:164)
	at org.eclipse.core.internal.content.ContentTypeManager.internalFindContentTypesFor(ContentTypeManager.java:295)
	at org.eclipse.core.internal.content.ContentTypeManager.getDescriptionFor(ContentTypeManager.java:262)
	at org.eclipse.core.internal.resources.ContentDescriptionManager.readDescription(ContentDescriptionManager.java:57)
	at org.eclipse.core.internal.resources.ContentDescriptionManager.getDescriptionFor(ContentDescriptionManager.java:42)
	at org.eclipse.core.internal.resources.File.getContentDescription(File.java:239)
	at com.ibm.encoding.resource.tests.example.TestCodedReader.doContentDescriptionTest(TestCodedReader.java:213)
	at com.ibm.encoding.resource.tests.example.TestCodedReader.testFile123(TestCodedReader.java:206)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:59)
	at java.lang.reflect.Method.invoke(Method.java:390)
	at junit.framework.TestCase.runTest(TestCase.java:154)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at junit.framework.TestSuite.runTest(TestSuite.java:208)
	at junit.framework.TestSuite.run(TestSuite.java:203)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:422)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:306)
	at org.eclipse.pde.internal.junit.runtime.RemotePluginTestRunner.main(RemotePluginTestRunner.java:30)
	at org.eclipse.pde.internal.junit.runtime.UITestApplication$1.run(UITestApplication.java:90)
	at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
	at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:106)
	at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:2702)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2394)
	at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:1353)
	at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:1324)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:243)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:141)
	at org.eclipse.ui.internal.ide.IDEApplication.run(IDEApplication.java:90)
	at org.eclipse.pde.internal.junit.runtime.UITestApplication.run(UITestApplication.java:33)
	at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:298)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:249)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:126)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:59)
	at java.lang.reflect.Method.invoke(Method.java:390)
	at org.eclipse.core.launcher.Main.basicRun(Main.java:269)
	at org.eclipse.core.launcher.Main.run(Main.java:722)
	at org.eclipse.core.launcher.Main.main(Main.java:706)
Comment 1 David Williams CLA 2004-05-17 03:16:26 EDT
Created attachment 10718 [details]
junit test to demonstrate above error
Comment 2 Rafael Chaves CLA 2004-05-17 12:54:48 EDT
David, it seems the mentioned file does not have a UTF-16 BOM. According to:

http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding

it seems it should have one.

Of course, this does not invalidate the PR. I just thought I should mention it.
Comment 3 David Williams CLA 2004-05-17 13:16:01 EDT
Created attachment 10738 [details]
hex values of test file

I think it does -- testUTF16.xml in the zip file? -- unless something's getting
"lost in translation". Of course, someone's code somewhere may leave the input
stream positioned after the BOM :) [That works pretty well, and is needed, for
UTF-8 BOMs. I've always found UTF-16 BOMs a little more problematic: sometimes
expected, sometimes not, as the different results for the two VMs would seem to
indicate.]

The attached image shows what the hex values look like for the file I'm
looking at ... FFFE, right?
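
For anyone wanting to check the leading bytes programmatically rather than from a hex dump, here is a minimal sketch. It is not part of the attached test case; the class and method names (BomSniffer, detect) are made up for illustration, and it deliberately ignores UTF-32 BOMs (FF FE 00 00 would be reported as UTF-16LE here).

```java
final class BomSniffer {
    /**
     * Returns the encoding implied by a leading byte order mark,
     * or null if the bytes do not start with a UTF-8 or UTF-16 BOM.
     * Hypothetical helper, for illustration only.
     */
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // bytes FF FE, as in the hex dump discussed above
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detect(sample));
    }
}
```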
Comment 4 Rafael Chaves CLA 2004-05-17 14:29:47 EDT
My fault... yes, the right one has the right BOM... I was trying with
"test-UTF-16.xml" in testfiles\genedFiles...\xml. 

The reason we were failing is that we want to let IOExceptions flow to the
caller, but sun.io.MalformedInputException is itself an I/O exception
(a CharConversionException), so it flowed to the caller too. We will have to
handle those (and let non-encoding-related ones flow).
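
A rough sketch of that distinction, assuming a JRE with java.nio (so it is not what core.runtime itself can do). The class EncodingErrorFilter, its Outcome enum, and scan() are hypothetical names. Encoding-related exceptions are caught and turned into a result, while any other IOException still propagates to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.CharConversionException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingErrorFilter {
    /** Hypothetical result type for this sketch. */
    enum Outcome { VALID, MALFORMED_ENCODING }

    /** Reads the whole stream as UTF-16, reporting (not replacing) malformed input. */
    static Outcome scan(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in,
                StandardCharsets.UTF_16.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
        try {
            while (reader.read() != -1) {
                // just consume the characters
            }
            return Outcome.VALID;
        } catch (CharConversionException | CharacterCodingException e) {
            // Encoding problem: handled here instead of flowing to the caller.
            return Outcome.MALFORMED_ENCODING;
        }
        // Any other IOException (a "real" I/O failure) is not caught and propagates.
    }

    public static void main(String[] args) throws IOException {
        // BOM (FF FE), '<' as 3C 00, then one dangling byte: an odd-length UTF-16 stream.
        byte[] odd = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x61};
        System.out.println(scan(new ByteArrayInputStream(odd)));
    }
}
```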
Comment 5 Rafael Chaves CLA 2004-05-17 15:24:55 EDT
Since we read the contents right at the beginning, before handing them to
describers, any "real" I/O exceptions will happen right away. When calling
describers, I/O exceptions are not severe, so they are just logged (not thrown).

Fixed and released to HEAD as described above.
Comment 6 Rafael Chaves CLA 2004-05-17 17:50:08 EDT
Actually the problem itself still occurs...
Comment 7 Rafael Chaves CLA 2004-05-17 18:55:56 EDT
David, that file has an odd number of bytes. The IOException happens when trying
to decode the last char. Is this intentional? The following example would cause
a CharConversionException to occur with IBM's JRE:

import java.io.*;
public class Simple {
	public static void main(String[] args) throws IOException {
		Reader reader = new InputStreamReader(new FileInputStream(args[0]), args[1]);
		int c;
		while ((c = reader.read()) != -1)
			System.out.println((char) c);
	}
}
Comment 8 David Williams CLA 2004-05-17 22:53:38 EDT
No, it wasn't intentional. Well, at least I don't think so. I'll try to
recover its "history", but it's just part of the whole set of files I've
routinely tested for the past few years! (I should document my unit tests
better.) I assume some previous version of Java wrote it that way. It does
make obvious, though, that the CharsetDecoder error defaults are different
between the IBM and Sun VMs (and we've had trouble in the past where the
defaults changed from one version to another).

If you change your example to use "Replace" on error then you get the same 
behavior on both VMs. 

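import java.io.*;
import java.nio.charset.*;
public class SimpleReplace {
	public static void main(String[] args) throws IOException {
		// args[0] is the path of the file to read
		Charset charset = Charset.forName("UTF-16");
		CharsetDecoder charsetDecoder = charset.newDecoder();
		// replace malformed input instead of throwing
		charsetDecoder.onMalformedInput(CodingErrorAction.REPLACE);
		Reader reader = new InputStreamReader(new FileInputStream(args[0]), charsetDecoder);
		int c;
		while ((c = reader.read()) != -1)
			System.out.println((char) c);
	}
}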

Can things be arranged so each "content type handler" sets its own values for
this type of error handling? It seems the XMLRootHandler would be best off
ignoring (replacing) them (since it's looking for a "positive match"). But, in
the past, we've enjoyed giving our editor users a choice when an error
occurs ... e.g. "malformed input detected, do you want to continue or cancel?".
I'm not sure how to do that with this new system.
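
To compare the two error actions without needing the test file, here is a small in-memory sketch. The class name Utf16OddLength and its decode() helper are made up for illustration. It decodes an odd-length UTF-16 byte sequence (BOM, one complete character, one dangling byte) once with REPORT and once with REPLACE. With an explicitly configured decoder I would expect REPORT to throw and REPLACE to substitute U+FFFD for the dangling byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf16OddLength {
    /** Decodes bytes as UTF-16 using the given action for malformed input. */
    static String decode(byte[] bytes, CodingErrorAction onMalformed)
            throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(onMalformed);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // FF FE = little-endian BOM, 3C 00 = '<', 61 = dangling half of a code unit
        byte[] odd = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x61};
        for (CodingErrorAction action
                : new CodingErrorAction[] {CodingErrorAction.REPORT, CodingErrorAction.REPLACE}) {
            try {
                String text = decode(odd, action);
                System.out.println(action + ": decoded " + text.length() + " chars");
            } catch (CharacterCodingException e) {
                System.out.println(action + ": " + e);
            }
        }
    }
}
```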
Comment 9 Rafael Chaves CLA 2004-05-25 13:56:07 EDT
Partial fix was to handle faulty describers so other describers still have a chance.

As for the change you suggested, David: in core.runtime we have a requirement
of running on J2SE subsets that do not include java.nio. Doing what you
suggested would require some exercise with reflection.
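
A sketch of what such a reflection exercise might look like. This is an assumption about one possible approach, not the code in core.runtime. ReflectiveLenientReader, open() and readAll() are hypothetical names. The java.nio.charset classes are looked up by name only, and if they cannot be loaded the code falls back to the plain InputStreamReader(InputStream, String) constructor. For brevity the sketch uses generics and multi-catch, which a class library without java.nio would not support.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

final class ReflectiveLenientReader {
    /** Opens a reader that replaces malformed input when java.nio is available. */
    static Reader open(InputStream in, String charsetName) throws IOException {
        try {
            // No compile-time reference to java.nio.charset; everything is resolved by name.
            Class<?> charsetClass = Class.forName("java.nio.charset.Charset");
            Class<?> decoderClass = Class.forName("java.nio.charset.CharsetDecoder");
            Class<?> actionClass = Class.forName("java.nio.charset.CodingErrorAction");

            Object charset = charsetClass.getMethod("forName", String.class)
                    .invoke(null, charsetName);
            Object decoder = charsetClass.getMethod("newDecoder").invoke(charset);
            Object replace = actionClass.getField("REPLACE").get(null);
            decoderClass.getMethod("onMalformedInput", actionClass).invoke(decoder, replace);

            return InputStreamReader.class
                    .getConstructor(InputStream.class, decoderClass)
                    .newInstance(in, decoder);
        } catch (ReflectiveOperationException | LinkageError e) {
            // java.nio missing or unusable: use the VM's default error handling.
            return new InputStreamReader(in, charsetName);
        }
    }

    /** Reads everything from the reader into a string. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] odd = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x61};
        String text = readAll(open(new ByteArrayInputStream(odd), "UTF-16"));
        System.out.println("decoded " + text.length() + " chars");
    }
}
```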
Comment 10 David Williams CLA 2004-05-25 22:28:36 EDT
No java.nio!? How do those systems handle encoding/decoding? I thought java.nio
was a "standard" part of Java 1.4. So, if core.runtime has to run on a subset
of standard Java, then some of this encoding/decoding function doesn't belong
at that level; that'd be my opinion, I mean. Even more concretely, in this
case, if your "fix" is just to disable that provider as 'faulty', then there
will be a bug opened saying that object contributions depending on
XMLRootElementContentDescriber do not work. (Or do you mean it would just be
disabled for that one pass, for that one file, in which case you'd always need
that sort of fallback behaviour for that one time?)

BTW, I think this "invalid file" was formed by checking a UTF-16 file with
a single EOL into CVS, and then, when it was checked back out, a 2-coded EOL
was added. Or some similar "play" with end-of-lines. I suspect this will be
moderately common.

More severely, for me to maintain our product's current level of encoding
support/behavior, I will have to use java.nio (e.g. to check the difference
between "detected encoding" and "used encoding", to know when an alias is being
used) and have control over how it's set/initialized. I was going to propose
some of these as fixes for core.runtime, but it sounds like that would be a hard
case to sell. I don't mind leaving them in my own XML version of
ContentDescriber, as long as I can depend on it always being called. I assume I
would put its priority as "high" and make it a child of runtime.xml. Do you
foresee any problems with this approach?

Thanks in advance for any help or advice. 
Comment 11 Rafael Chaves CLA 2004-05-26 12:20:11 EDT
Agreed that the content type support is more restricted than it should be,
but right now we don't have many choices.

Regardless of the circumstances in which your file got into that state, you
agree it is invalid, right?

Re: providing a personalized version of the XML content describer: you cannot
replace the default XML content provider. But the XML describer will hardly
classify any contents as invalid (currently it never does that), so you don't
need a new content type for XML. You need a more appropriate XML content
describer to be used by your XML-based content types.
Comment 12 Rafael Chaves CLA 2004-05-26 16:38:37 EDT
No further action planned. We will log such exceptions only if in debug mode
(added a debug option for content type), and faulty describers will just be
skipped during that lookup.
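
A minimal sketch of that behaviour, under stated assumptions: the Describer interface below is a simplified stand-in, not the Eclipse IContentDescriber API, and DescriberLookup, findMatches() and the debug flag are hypothetical names. Contents are assumed to be already buffered, so an exception from a describer is treated as a faulty describer rather than a real I/O failure.

```java
import java.io.CharConversionException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class DescriberLookup {
    /** Simplified, hypothetical stand-in for a content describer. */
    interface Describer {
        String name();
        boolean matches(byte[] contents) throws IOException;
    }

    /** Hypothetical debug switch; failures are logged only when it is on. */
    static boolean debug = false;

    /** Returns the names of all describers that accept the contents, skipping faulty ones. */
    static List<String> findMatches(List<Describer> describers, byte[] contents) {
        List<String> matches = new ArrayList<>();
        for (Describer describer : describers) {
            try {
                if (describer.matches(contents)) {
                    matches.add(describer.name());
                }
            } catch (IOException | RuntimeException e) {
                // Faulty describer: skip it for this lookup so the others still get a chance.
                if (debug) {
                    System.err.println("Skipping describer " + describer.name() + ": " + e);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Describer> describers = new ArrayList<>();
        describers.add(new Describer() {
            public String name() { return "faulty"; }
            public boolean matches(byte[] contents) throws IOException {
                throw new CharConversionException("malformed input");
            }
        });
        describers.add(new Describer() {
            public String name() { return "text"; }
            public boolean matches(byte[] contents) { return true; }
        });
        System.out.println(findMatches(describers, new byte[0]));
    }
}
```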