| Summary: | issue with encoding in news items displayed via Phoenix | ||
|---|---|---|---|
| Product: | Community | Reporter: | David Williams <david_williams> |
| Component: | Website | Assignee: | phoenix.ui <phoenix.ui-inbox> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | major | ||
| Priority: | P3 | CC: | bob.fraser, david_williams, jesper, thatnitind, webmaster |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Windows XP | ||
| Whiteboard: | |||
|
Description
David Williams
Bob ... do you know how to fix this? As you say, somebody is interpreting the UTF-8 as ISO-8859-1.

There is a mismatch between what the HTTP headers say:

    Content-Type: text/html; charset=ISO-8859-1

and what the document itself says:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

The document carries no XML declaration.

The workaround: if it is XML, you should be able to use the character entity notation for ø - it's like the slash - o.

Phoenix is not the only Eclipse system which has i18n problems - see bug 211139: My name is cursed, I guess.

This is a tough one. The good news is that the RSS feed looks good. The o-slash shows up fine. And using the actual character itself is legal ISO-8859-1. I believe there is a problem in the PHP part of the code. I saw this problem before with a non-breaking space. Furthermore, the problem is with eclipse.org. The character renders fine on my box running MAMP. It may be an HTTP server setting or a PHP setting.

I've tried the ø fix, but same problem. In fact, changing the encoding in the browser from ISO-8859-1 to UTF-8 allows the character to be displayed correctly, so ... it does seem that the HTTP header is wrong. I suppose another workaround would be to use ISO-8859-1 in our XML file?

Changing the title to emphasize that the news item, as a news item, is fine ... it's just the Phoenix version on our webpage, at http://www.eclipse.org/webtools, that is incorrect (see the "news" section of that page).

Is Phoenix the right component for this? I think it may be a problem with server setup, PHP specifically.

To clarify Jesper's remarks, if you do a wget -S, you can see the HTTP header is set to ISO-8859-1, even though the HTML output itself says UTF-8. Here's what I see:

    $ wget -S http://www.eclipse.org/webtools/index.php
    --01:23:47-- http://www.eclipse.org/webtools/index.php
    => `index.php.2'
    Resolving www.eclipse.org... 206.191.52.50
    Connecting to www.eclipse.org|206.191.52.50|:80... connected.
    HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Date: Tue, 22 Jan 2008 06:23:47 GMT
    Server: Apache
    Cache-Control: max-age=86400
    Expires: Wed, 23 Jan 2008 06:23:47 GMT
    Connection: close
    Content-Type: text/html; charset=ISO-8859-1
    Length: unspecified [text/html]

Now, my first thought was that this was being done in the Phoenix scripts (and other themes), but the header.php files look correct, and it looks like the output of the UTF-8 charset comes very early, so I don't think it's a matter of the output buffer filling up and having to provide some header.

Googling around in the PHP docs, and looking at my PHP ini file, I see this:

= = = = = = = =

    ; As of 4.0b4, PHP always outputs a character encoding by default in
    ; the Content-type: header. To disable sending of the charset, simply
    ; set it to be empty.
    ;
    ; PHP's built-in default is text/html
    default_mimetype = "text/html"
    ;default_charset = "iso-8859-1"

= = = = = = = =

On mine, it's commented out, like this, and indeed, I get no HTTP header. Though it sounds like, when they say "set it to be empty", perhaps some people have to set it explicitly, such as default_charset= or something.

So ... any of you server guys care to experiment? :)

I think this should be in the "Eclipse Foundation" product, in the Server component (or, website?) but ... darn if I can tell how to move it there?

(In reply to comment #7)
> darn if I can tell how to move it there?

Set product to Community, choose "Reassign bug to default assignee and QA contact, and add Default CC of selected component" and commit.

I can confirm David's findings as well. I have had problems with non-breaking space, either with the actual ASCII code or with the entity &nbsp;. Worked on my machine but not on eclipse.org. My recommendation would be to either remove the offending PHP config that is sending the character encoding header altogether or set it to utf-8.

I'm changing this to "major" (missing function) partially since I see I've run into this before, and opened bug 210887.
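The symptom described above (UTF-8 bytes being read as ISO-8859-1) can be reproduced offline. The following is a minimal Python sketch, not anything from the eclipse.org setup; the helper name `misdecode` is hypothetical and only illustrates what a client does when it trusts the `charset=ISO-8859-1` header over the document's own UTF-8 declaration.

```python
def misdecode(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1.

    This mimics a client that receives UTF-8 content but follows an
    HTTP header claiming charset=ISO-8859-1 (hypothetical helper).
    """
    return text.encode("utf-8").decode("iso-8859-1")


o_slash = "\u00f8"  # the o-slash character from the news item
nbsp = "\u00a0"     # non-breaking space, the other character mentioned

# Each character becomes two bytes in UTF-8, so it shows up as two
# ISO-8859-1 characters instead of one.
print(repr(misdecode(o_slash)))  # 'Ã¸'
print(repr(misdecode(nbsp)))     # 'Â\xa0'

# Written as a single ISO-8859-1 byte, the o-slash survives the same
# header unchanged, which matches the "legal ISO-8859-1" remark.
print(o_slash.encode("iso-8859-1").decode("iso-8859-1") == o_slash)  # True
```

This is consistent with the observation that switching the browser encoding to UTF-8 by hand makes the character display correctly: the bytes on the wire are fine, only the declared charset is wrong.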
Also, I think the main problem is not PHP. I tried to get a plain HTML file that says utf-8, and it too has the incorrect, problematic HTTP header. If interested, try http://www.eclipse.org/webtools/wst/components/server/index.html

I suspect the problem is that the Apache server specifies AddDefaultCharset ISO-8859-1, and it should not set any default (allowing content providers to set their own).

As an aside, if the php.ini file in /etc/php.ini is the one that's actually used by eclipse servers, I see it had outputbuffering "off". This can sometimes prohibit content providers from setting their own charset (header) in their own php script ... but, one problem at a time, I guess.

Indeed, the default character set configuration comes from Apache via the AddDefaultCharset directive, not PHP. If you use PHP, you can override the default character set with the header() function before sending content. This is a design "feature" with Phoenix, as the headers are not mangled by our page generation code.

    <?php header( "Content-type: text/html; charset=utf-8" ); ?>

This header() call can be added to your _projectCommon.php file to be applied to all your project's web pages.

For static html files, one could argue that your 'index.html' page isn't a Phoenix page, and needs to be updated. Beyond that, nothing is stopping you from overriding the default Apache setting by creating a .htaccess file in your directory with this line in it:

    AddDefaultCharset UTF-8

Try this (using a .htaccess):

    wget -S http://www.eclipse.org/webmaster/main.html
    HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Date: Wed, 23 Jan 2008 18:42:19 GMT
    [snip]
    Content-Type: text/html; charset=UTF-8

I'm closing as WORKSFORME because, although we have specified a default system-wide character set according to recommendations from the Apache docs [1], the system is flexible in that you are free to override it.
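After adding the header() call or the .htaccess line, the check done by hand with wget -S (compare the header charset with the one in the page) could be scripted. The sketch below is one possible way to do that in Python, working on strings so it needs no network access; the function names and regular expressions are assumptions made for illustration and are not part of Phoenix or Apache.

```python
import re
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, lowercased."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def meta_charset(html: str) -> Optional[str]:
    """Return the charset declared in a <meta> tag of the page, lowercased."""
    match = re.search(r"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charsets_agree(content_type: str, html: str) -> bool:
    """True unless both the header and the page name a charset and they differ.

    A header without a charset is treated as agreeing, since the page's
    own declaration can then take effect.
    """
    from_header = header_charset(content_type)
    from_meta = meta_charset(html)
    return from_header is None or from_meta is None or from_header == from_meta


page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'

# The situation reported in this bug: header and document disagree.
print(charsets_agree("text/html; charset=ISO-8859-1", page))  # False

# The situation after overriding the default: both say UTF-8.
print(charsets_agree("text/html; charset=UTF-8", page))       # True
```

The header value passed in would be whatever `wget -S` prints on the Content-Type line for the page being checked.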
> As an aside, if the php.ini file in /etc/php.ini is the one that's actually
> used by eclipse servers, I see it had outputbuffering "off". This can sometimes
> prohibit content providers from setting their own charset (header) in their own
> php script ... but, one problem at a time, I guess.

Although we're not using /etc/php.ini, we have intentionally set OutputBuffering=Off because, as stated above, the way Phoenix is set up allows you to send headers before the content (no loss of functionality for you), and setting the buffer off yields better performance (happy webmaster).

[1] From mod_mime-defaults.conf

    #
    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    # (see http://httpd.apache.org/info/css-security/)
    #

I will try out the header( "Content-type: text/html; charset=utf-8" ); fix on the web tools pages.

Bob

Worked fine. I will update our site.

Hot dogs ... .htaccess files! :) I'm glad the common project approach works. Much thanks Denis. (and Bob, and Mr. Møller. :) And, Denis, thanks for fixing bug 210887 too. |