Bug 67048 - [doc] XML files with BOM fail to have content types set
Status: RESOLVED WONTFIX
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 3.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.0 RC4
Assignee: Platform-Resources-Inbox CLA Friend
QA Contact:
URL:
Whiteboard:
Keywords: readme
Duplicates: 70177
Depends on:
Blocks:
 
Reported: 2004-06-14 13:41 EDT by Darin Swanson CLA Friend
Modified: 2005-11-04 12:06 EST
CC List: 2 users

See Also:


Description Darin Swanson CLA Friend 2004-06-14 13:41:29 EDT
This will likely have to just be a readme item as it would appear to be a 
problem with the xml parser.

The XMLRootHandler fails to parse an XML file if that file has a byte order
mark (BOM) (fatal SAXParseException: the document root element is missing).

This only occurs with the Crimson parser. With the Xerces parser the file is
parsed successfully.
For a test file with a BOM, see bug 61564.

As a result of this problem, valid buildfiles do not have the Run Ant menu 
entries in the Run context menu.
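For reference, a BOM is a fixed byte signature at the very start of the file: EF BB BF for UTF-8, FE FF for big-endian UTF-16 and FF FE for little-endian UTF-16. The following is a minimal sketch that classifies the leading bytes of a file; the class and method names are hypothetical and not part of Eclipse, and UTF-32 (whose little-endian BOM also starts with FF FE) is deliberately ignored.

```java
import java.nio.charset.StandardCharsets;

public final class BomSniffer {

    /** The byte order marks this sketch recognizes (UTF-32 is not handled). */
    public enum Bom { UTF_8, UTF_16BE, UTF_16LE, NONE }

    /** Classifies the BOM, if any, at the start of the given bytes. */
    public static Bom detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return Bom.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return Bom.UTF_16BE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return Bom.UTF_16LE;
        }
        return Bom.NONE;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'p', '/', '>'};
        byte[] plain = "<p/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(withBom)); // UTF_8
        System.out.println(detect(plain));   // NONE
    }
}
```

Only the leading bytes are inspected, so the check is cheap enough to run before handing a file to any parser.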
Comment 1 Rafael Chaves CLA Friend 2004-06-14 17:11:10 EDT
That is right, some parsers seem not to be able to take BOMs in XML files.
Comment 2 Rafael Chaves CLA Friend 2004-06-21 12:45:30 EDT
Darin, actually I am seeing problems when using IBM's VM (which uses
Xerces). Using Sun's VM does not cause any problem. Can you confirm this?
Comment 3 Rafael Chaves CLA Friend 2004-06-21 12:50:18 EDT
This is what I am seeing with IBM 1.4.1 (if you enable tracing for
org.eclipse.core.runtime and check the contenttypes/debug option, you
should be able to see errors thrown by content describers in the log).

sun.io.MalformedInputException
	at sun.io.ByteToCharUnicode.flush(ByteToCharUnicode.java:214)
	at sun.nio.cs.StreamDecoder$ConverterSD.flushInto(StreamDecoder.java:305)
	at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:329)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:222)
	at java.io.InputStreamReader.read(InputStreamReader.java:207)
	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
	at org.apache.xerces.impl.XMLEntityScanner.skipString(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentScannerImpl$XMLDeclDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at org.eclipse.core.internal.content.XMLRootHandler.parseContents(XMLRootHandler.java:176)
	at org.eclipse.core.runtime.content.XMLRootElementContentDescriber.checkCriteria(XMLRootElementContentDescriber.java:75)
	at org.eclipse.core.runtime.content.XMLRootElementContentDescriber.describe(XMLRootElementContentDescriber.java:105)
	at org.eclipse.core.internal.content.ContentType.describe(ContentType.java:172)
	at org.eclipse.core.internal.content.ContentTypeManager.internalFindContentTypesFor(ContentTypeManager.java:278)
	at org.eclipse.core.internal.content.ContentTypeManager.getDescriptionFor(ContentTypeManager.java:244)
	at org.eclipse.core.internal.resources.ContentDescriptionManager.readDescription(ContentDescriptionManager.java:93)
	at org.eclipse.core.internal.resources.ContentDescriptionManager.getDescriptionFor(ContentDescriptionManager.java:58)
	at org.eclipse.core.internal.resources.File.getCharset(File.java:220)
Comment 4 Darin Swanson CLA Friend 2004-06-21 13:53:59 EDT
The content type fails to be set for me using a Sun 1.4.2 VM (crimson parser) 
using the test case of bug 61564.

While debugging, the following fatal exception occurs:

Thread [main] (Suspended (exception SAXParseException))
	XMLRootHandler(DefaultHandler).fatalError(SAXParseException) line: 447
	Parser2.fatal(String, Object[], Exception) line: 3342
	Parser2.fatal(String) line: 3327
	Parser2.parseInternal(InputSource) line: 635
	Parser2.parse(InputSource) line: 333
	XMLReaderImpl.parse(InputSource) line: 448
	SAXParserImpl(SAXParser).parse(InputSource, DefaultHandler) line: 345
	XMLRootHandler.parseContents(InputSource) line: 176
	XMLRootElementContentDescriber.checkCriteria(InputSource) line: 75
	XMLRootElementContentDescriber.describe(InputStream, IContentDescription) line: 105
	ContentType.describe(IContentDescriber, InputStream, ContentDescription) line: 172
	ContentTypeManager.internalFindContentTypesFor(InputStream, IContentType[]) line: 278
	ContentTypeManager.getDescriptionFor(InputStream, String, QualifiedName[]) line: 244
	ContentDescriptionManager.readDescription(File) line: 92
	ContentDescriptionManager.getDescriptionFor(File, ResourceInfo) line: 57
	File.getContentDescription() line: 239
	EncodingActionGroup.getEncodingFromContent(IFile) line: 209
	EncodingActionGroup.getDefaultEncodingText(ITextEditor, String) line: 196
	EncodingActionGroup.access$0(ITextEditor, String) line: 186
	EncodingActionGroup$PredefinedEncodingAction.update() line: 166
	EncodingActionGroup.update() line: 437
	DefaultEncodingSupport.reset() line: 93
	AntEditor(TextEditor).updatePropertyDependentActions() line: 312
	AntEditor(AbstractTextEditor).firePropertyChange(int) line: 4525
	AbstractTextEditor$3.run() line: 301
	AbstractTextEditor$ElementStateListener.execute(Runnable) line: 424
	AbstractTextEditor$ElementStateListener.elementDirtyStateChanged(Object, boolean) line: 304
	TextFileDocumentProvider$FileBufferListener.dirtyStateChanged(IFileBuffer, boolean) line: 249
	TextFileBufferManager.fireDirtyStateChanged(IFileBuffer, boolean) line: 240
	ResourceTextFileBuffer(ResourceFileBuffer).commit(IProgressMonitor, boolean) line: 304
	AntEditorDocumentProvider(TextFileDocumentProvider).commitFileBuffer(IProgressMonitor, TextFileDocumentProvider$FileInfo, boolean) line: 680
	TextFileDocumentProvider$2.execute(IProgressMonitor) line: 642
	TextFileDocumentProvider$2(TextFileDocumentProvider$DocumentProviderOperation).run(IProgressMonitor) line: 105
	WorkspaceModifyDelegatingOperation.execute(IProgressMonitor) line: 67
	WorkspaceModifyOperation$1.run(IProgressMonitor) line: 91
	Workspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor) line: 1673
	WorkspaceModifyDelegatingOperation(WorkspaceModifyOperation).run(IProgressMonitor) line: 105
	WorkspaceOperationRunner.run(boolean, boolean, IRunnableWithProgress, ISchedulingRule) line: 73
	WorkspaceOperationRunner.run(boolean, boolean, IRunnableWithProgress) line: 63
	AntEditorDocumentProvider(TextFileDocumentProvider).executeOperation(TextFileDocumentProvider$DocumentProviderOperation, IProgressMonitor) line: 403
	AntEditorDocumentProvider(TextFileDocumentProvider).saveDocument(IProgressMonitor, Object, IDocument, boolean) line: 623
	AntEditor(AbstractTextEditor).performSave(boolean, IProgressMonitor) line: 3444
	AntEditor(AbstractTextEditor).doSave(IProgressMonitor) line: 3233
	AntEditor.doSave(IProgressMonitor) line: 683
	EditorManager$12.run(IProgressMonitor) line: 1160
	EditorManager$10.run(IProgressMonitor) line: 1015
	ModalContext.runInCurrentThread(IRunnableWithProgress, IProgressMonitor) line: 303
	ModalContext.run(IRunnableWithProgress, boolean, IProgressMonitor, Display) line: 253
	ApplicationWindow$1.run() line: 588
	BusyIndicator.showWhile(Display, Runnable) line: 69
	WorkbenchWindow(ApplicationWindow).run(boolean, boolean, IRunnableWithProgress) line: 585
	WorkbenchWindow.run(boolean, boolean, IRunnableWithProgress) line: 1653
	EditorManager.runProgressMonitorOperation(String, IRunnableWithProgress, IWorkbenchWindow) line: 1021
	EditorManager.savePart(ISaveablePart, IWorkbenchPart, boolean) line: 1165
	WorkbenchPage.savePart(ISaveablePart, IWorkbenchPart, boolean) line: 2528
	WorkbenchPage.saveEditor(IEditorPart, boolean) line: 2540
	SaveAction.run() line: 69
	SaveAction(Action).runWithEvent(Event) line: 881
	ActionHandler.execute(Map) line: 141
	Command.execute(Map) line: 132
	WorkbenchKeyboard.executeCommand(String) line: 469
	WorkbenchKeyboard.press(List, Event) line: 887
	WorkbenchKeyboard.processKeyEvent(List, Event) line: 928
	WorkbenchKeyboard.filterKeySequenceBindings(Event) line: 546
	WorkbenchKeyboard.access$2(WorkbenchKeyboard, Event) line: 494
	WorkbenchKeyboard$1.handleEvent(Event) line: 259
	EventTable.sendEvent(Event) line: 82
	Display.filterEvent(Event) line: 714
	Tree(Widget).sendEvent(Event) line: 795
	Tree(Widget).sendEvent(int, Event, boolean) line: 820
	Tree(Widget).sendEvent(int, Event) line: 805
	Tree(Control).sendKeyEvent(int, int, int, int, Event) line: 1734
	Tree(Control).sendKeyEvent(int, int, int, int) line: 1730
	Tree(Control).WM_CHAR(int, int) line: 3067
	Tree.WM_CHAR(int, int) line: 1372
	Tree(Control).windowProc(int, int, int, int) line: 2970
	Display.windowProc(int, int, int, int) line: 3298
	OS.DispatchMessageW(MSG) line: not available [native method]
	OS.DispatchMessage(MSG) line: 1467
	Display.readAndDispatch() line: 2396
	Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 1375
	Workbench.runUI() line: 1346
	Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 252
	PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 141
	IDEApplication.run(Object) line: 96
	PlatformActivator$1.run(Object) line: 335
	EclipseStarter.run(Object) line: 272
	EclipseStarter.run(String[], Runnable) line: 128
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25
	Method.invoke(Object, Object[]) line: 324
	Main.basicRun(String[]) line: 186
	Main.run(String[]) line: 647
	Main.main(String[]) line: 631
Comment 5 Rafael Chaves CLA Friend 2004-06-21 17:43:16 EDT
Thanks, Darin, the problem I was seeing was actually bug 67975 (UTF-16 BOM on
Windows IBM VM). 

This bug is caused by:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

And affects not only content type determination, but also any other Crimson
clients such as Ant. When running manually or as an external builder, I get this
error:

Buildfile: d:\temp\tests\runtime-workbench\AntTest\build.xml
BUILD FAILED: D:\temp\tests\runtime-workbench\AntTest\build.xml:1: Document root element is missing.
Total time: 125 milliseconds
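The Sun bug referenced above concerns the JDK's UTF-8 decoder, which does not strip a leading BOM. A minimal sketch of the effect, assuming a current JDK whose UTF-8 decoder still passes the BOM through: reading a BOM-prefixed build file through an InputStreamReader yields U+FEFF as the first character, so a parser that decodes input this way sees a stray character where it expects '<'. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public final class Utf8BomDecodeDemo {

    /** Returns the first char a UTF-8 InputStreamReader yields for the given bytes. */
    public static int firstChar(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 BOM, followed by "<project/>"
        byte[] xml = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                '<', 'p', 'r', 'o', 'j', 'e', 'c', 't', '/', '>'};
        int c = firstChar(xml);
        // The BOM is decoded as U+FEFF instead of being dropped.
        System.out.printf("first char: U+%04X%n", c); // first char: U+FEFF
    }
}
```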

There is one thing I do not understand: when running with IBM 1.4.1, content
type detection works fine, but running Ant manually (using the "Run Ant" action)
fails. Even stranger, running the same script as an external tool builder works
fine. Darin, is there any difference between the two modes of running Ant w.r.t.
XML parsing?
Comment 6 Darin Swanson CLA Friend 2004-06-21 17:48:15 EDT
All kinds of differences :-)
All depends on the VM you are running the Ant build within and what is on the 
Ant runtime classpath. You can specify Xerces to be on the Ant runtime 
classpath and then Xerces is used as the parser (just like Ant at the 
commandline).

External tool builders by default run in the same VM (IBM 1.4.1). Your Run As 
test case: is that running in IBM 1.4.1 or in a Sun VM? What does its runtime 
classpath look like?
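Which SAX parser a given VM and classpath actually hand out through JAXP can be checked directly. A minimal sketch, assuming only the standard javax.xml.parsers API; the class name WhichSaxParser is hypothetical. Because JAXP consults the javax.xml.parsers.SAXParserFactory system property first, launching with -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl (with Xerces on the classpath) is one way to force Xerces for code that goes through JAXP.

```java
import javax.xml.parsers.SAXParserFactory;

public final class WhichSaxParser {

    /** Returns the class name of the SAXParserFactory that JAXP selects in this VM. */
    public static String factoryClassName() {
        // JAXP checks, in order: the javax.xml.parsers.SAXParserFactory system
        // property, the JRE's jaxp.properties file, service-provider entries on
        // the classpath, and finally the VM's built-in default parser.
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAXParserFactory in use: " + factoryClassName());
    }
}
```

Running this once under each VM (and with the Ant runtime classpath) shows whether Crimson or Xerces is being picked up.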
Comment 7 Rafael Chaves CLA Friend 2004-06-21 17:55:44 EDT
Ok, got it. Since I am running (alternately) with Sun and IBM VMs on the same
workspace, I guess I ran Ant for the first time using Sun's VM, and then when
running with IBM's VM the original settings (Sun's) were remembered. Using the
"Run->Ant build..." action, it seems I caused the settings to be re-computed for
the current default JRE, because then Ant worked.
Comment 8 Darin Swanson CLA Friend 2004-06-21 21:54:12 EDT
So you are going to add a readme section about this. 
I should probably add one in Ant land to specify how to run Ant builds for
buildfiles that do contain a BOM. Logged bug 68132.
Comment 9 Rafael Chaves CLA Friend 2004-06-22 13:49:32 EDT
So is there a workaround (other than running with Xerces)?

Also, the problem is only with UTF-8 BOMs (UTF-16 BOMs are fine).
Comment 10 Darin Swanson CLA Friend 2004-06-22 13:51:23 EDT
Not that I know of.
Comment 11 DJ Houghton CLA Friend 2004-06-22 15:45:11 EDT
Added to README for 3.0.
Comment 12 Rafael Chaves CLA Friend 2004-07-19 10:56:47 EDT
*** Bug 70177 has been marked as a duplicate of this bug. ***
Comment 13 Damien Mascord CLA Friend 2004-07-19 11:07:05 EDT
Would it be possible to read in the entire file, strip out the BOM characters,
and process it as normal?

I know that this is probably a bit of overkill for every parse of every file, but
if we encounter this particular exception while parsing, couldn't we rejig the
file a bit to strip the BOM in memory before parsing?
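A minimal sketch of this suggestion, assuming the input stream can be wrapped before it reaches the parser. The class and method names are hypothetical, and this is not how Eclipse's XMLRootHandler is implemented. Rather than reading the entire file, it peeks at only the first three bytes through a PushbackInputStream, so files without a BOM pay almost nothing; it then stops the SAX parse at the first start tag, which is all a root-element check needs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class BomStrippingRootSniffer {

    /** Thrown from the handler to stop parsing once the root element is seen. */
    private static final class StopParsing extends SAXException {
        private static final long serialVersionUID = 1L;
        StopParsing() { super("root element found"); }
    }

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: give the bytes back to the parser
        }
        return pb;
    }

    /** Returns the qualified name of the root element, or null if none was found. */
    public static String rootElement(InputStream xml) throws Exception {
        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                root[0] = qName;
                throw new StopParsing(); // no need to read the rest of the file
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(skipUtf8Bom(xml), handler);
        } catch (StopParsing expected) {
            // normal early exit after the root element
        }
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // "\uFEFF" encodes to the UTF-8 BOM bytes EF BB BF
        byte[] buildFile = "\uFEFF<project default=\"all\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootElement(new ByteArrayInputStream(buildFile))); // project
    }
}
```

The early exit assumes the parser rethrows the handler's own SAXException unchanged, which Xerces-based parsers do. On a current JDK the built-in parser is expected to tolerate a UTF-8 BOM anyway, so the stripping step mainly matters for parsers such as Crimson.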