Bug 251748

Summary: XMLContentDescriber doesn't set charset for XML declarations with line feeds
Product: [Eclipse Project] Platform
Reporter: Nick Sandonato <nsand.dev>
Component: Resources
Assignee: Szymon Brandys <Szymon.Brandys>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: david_williams, Szymon.Brandys, thatnitind
Version: 3.5   
Target Milestone: 3.5 M3   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 251659    
Attachments:
Fix v01 (flags: none)

Description Nick Sandonato CLA 2008-10-22 14:14:12 EDT
It looks like the patch for bug 249214 changed how the XML declaration is read for XML content.  If a line feed is detected in the XML declaration before the encoding is read in, the describe method will return INVALID.

Before the patch, the reader based the end of the XML declaration on coming across a '?'. Now the reader checks for either '\r' or '\n'. But according to the XML spec for the Text Declaration (http://www.w3.org/TR/REC-xml/#sec-TextDecl), the EncodingDecl can contain whitespace, defined as (#x20 | #x9 | #xD | #xA)+, so carriage returns and line feeds are legal before the encoding attribute.

An example of a failing XML declaration is:
<?xml version="1.0"  

encoding="ISO-8859-1" 

 ?>
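
For illustration only, here is a minimal sketch of reading the encoding while treating '\r' and '\n' as ordinary whitespace, as the spec allows. This is not the released patch and does not use the XMLContentDescriber code; the class name XmlDeclEncoding and the method getCharset are hypothetical, and the sketch assumes the declaration has already been decoded into a String.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.eclipse.core.runtime.
public class XmlDeclEncoding {
    // XML whitespace: S ::= (#x20 | #x9 | #xD | #xA)+
    private static final String S = "[ \\t\\r\\n]";
    // EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
    private static final String ENC_NAME = "[A-Za-z][A-Za-z0-9._-]*";
    // EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
    private static final Pattern ENCODING = Pattern.compile(
            S + "encoding" + S + "*=" + S + "*"
            + "(?:\"(" + ENC_NAME + ")\"|'(" + ENC_NAME + ")')");

    /**
     * Returns the encoding named in the XML declaration at the start of
     * text, or null if there is no declaration or no encoding attribute.
     */
    public static String getCharset(String text) {
        // Require "<?xml" followed by whitespace, so "<?xml-stylesheet" is rejected.
        if (!text.startsWith("<?xml") || text.length() < 6
                || " \t\r\n".indexOf(text.charAt(5)) < 0) {
            return null;
        }
        // The declaration ends at "?>", not at the first line break.
        int end = text.indexOf("?>");
        if (end < 0) {
            return null;
        }
        Matcher m = ENCODING.matcher(text.substring(5, end));
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        // The failing declaration from this report.
        String decl = "<?xml version=\"1.0\"  \n\nencoding=\"ISO-8859-1\" \n\n ?>";
        System.out.println(getCharset(decl)); // prints ISO-8859-1
    }
}
```

The key point is that the scan is bounded by "?>" rather than by the first '\r' or '\n', so the encoding is still found when the attributes are spread over several lines.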
Comment 1 Nick Sandonato CLA 2008-10-22 14:24:17 EDT
Sorry, I meant that when line feeds are present before the encoding attribute, the property IContentDescription.CHARSET will not be set on the description; the describe method does not return INVALID.
Comment 2 Szymon Brandys CLA 2008-10-23 08:20:03 EDT
I'm on it.
Comment 3 Szymon Brandys CLA 2008-10-29 07:40:44 EDT
Created attachment 116405
Fix v01
Comment 4 Szymon Brandys CLA 2008-10-29 08:14:23 EDT
Released.
Comment 5 Szymon Brandys CLA 2008-10-29 08:14:48 EDT
The patch was slightly modified before the release.