Bug 346793

Summary: [misc] XHTML: content-type meta tag is ignored during charset detection
Product: [WebTools] WTP Source Editing
Reporter: Sven Köhler <sven.koehler>
Component: wst.html
Assignee: wst.html <wst.html-inbox>
Status: RESOLVED WORKSFORME
QA Contact: Nick Sandonato <nsand.dev>
Severity: normal
Priority: P3
CC: thatnitind
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Windows 7
Whiteboard:

Description Sven Köhler CLA 2011-05-21 13:16:10 EDT
Build Identifier: M20110210-1200

Hi,

In Bug 318768, I reported that the document charset was not properly detected. In fact, I believe the W3C suggests that the default charset of an XHTML document is UTF-8, and that you may specify an alternative character set via an XML declaration.

However, for backwards compatibility, the W3C suggests that people who serve XHTML as text/html may also use a meta tag to specify the content type.

http://www.w3.org/International/O-charset.en.php?changelang=en

I believe the order of detection for XHTML documents should be the following:
- look for a BOM
- look for an XML declaration
- look for a meta tag


I believe the last step is currently missing.
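
To illustrate the proposed order, here is a minimal sketch in Python. It is not WTP's actual detector: the function name `detect_xhtml_charset`, the 1024-byte scan window, and the regular expressions are all assumptions made for this example. It also assumes the document is ASCII-compatible when no BOM is present, and it only checks the UTF-8 and UTF-16 BOMs.

```python
import codecs
import re

# Hypothetical sketch of the proposed order: BOM, XML declaration, meta tag.
# Only UTF-8 and UTF-16 BOMs are checked here; UTF-32 is left out for brevity.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

# encoding="..." inside an XML declaration at the very start of the file.
_XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']', re.I
)

# charset=... inside a <meta> tag, e.g.
# <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
_META = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._\-]+)', re.I)


def detect_xhtml_charset(data: bytes, default: str = "UTF-8") -> str:
    """Return the charset name found by the first step that succeeds."""
    # Step 1: look for a BOM.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # Assumption: the relevant markup sits in the first 1024 bytes.
    head = data[:1024]

    # Step 2: look for an XML declaration (must be at the start).
    match = _XML_DECL.match(head)
    if match:
        return match.group(1).decode("ascii")

    # Step 3: look for a meta tag.
    match = _META.search(head)
    if match:
        return match.group(1).decode("ascii")

    # Nothing found: fall back to the XHTML default.
    return default
```

With this sketch, a file that has no BOM and no XML declaration but contains a content-type meta tag is reported with the charset from the meta tag, which is the case this bug is about.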


Reproducible: Always

Comment 1 Nick Sandonato CLA 2012-11-08 10:12:29 EST
Thanks for the bug report. I'm seeing the proper encoding being picked up from the meta tag in the absence of the XML declaration for XHTML files. This may have been resolved in the meantime. If you have a different scenario that we're not covering, please reopen the defect with a scenario we should try.