Hi,

We are using Jetty 6 (6.1.22) and Jetty 7 (7.1.6) on Windows XP with JDK 1.5_16 in embedded mode to implement server-to-server components that communicate over HTTP (mostly REST/XML). Some request information is stored in request headers. We support Western European languages as well as Eastern European and Asian languages, so we encode header values in UTF-8.

Unfortunately, we found that Jetty automatically decodes header values as ISO-8859-1 before we can decode them as UTF-8. This is done in BufferUtil:

    public static String to8859_1_String(Buffer buffer)
    {
        if (buffer instanceof CachedBuffer)
            return buffer.toString();
        return buffer.toString(StringUtil.__ISO_8859_1);
    }

which is used in the Field.getName and Field.getValue methods. This prevents us from using UTF-8 to support a wide range of languages.

Thanks
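A possible application-side workaround, sketched here as an illustration rather than anything proposed in the thread: ISO-8859-1 maps each byte to exactly one character, so a value that was decoded as ISO-8859-1 can be turned back into its original bytes and decoded again as UTF-8. The class and method names below are hypothetical, and the sketch assumes the sender really did write UTF-8 bytes into the header. It uses `StandardCharsets`, which requires Java 7 or later; on JDK 1.5 the `String`-name overloads (`getBytes("ISO-8859-1")`, `new String(bytes, "UTF-8")`) would be needed instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jetty.
final class Utf8HeaderRecovery {
    private Utf8HeaderRecovery() {}

    // Assumes latin1Decoded was produced by decoding raw header bytes as
    // ISO-8859-1, so every char is in the range U+0000..U+00FF.
    static String recover(String latin1Decoded) {
        if (latin1Decoded == null) {
            return null;
        }
        // Recover the bytes that were on the wire (lossless for ISO-8859-1).
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset the sender actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

It would be applied to whatever the servlet API returns, for example `Utf8HeaderRecovery.recover(request.getHeader("X-Subject"))`, where `X-Subject` is a made-up header name. Pure ASCII values pass through unchanged, but a value that genuinely was ISO-8859-1 would be corrupted, so this only fits when both ends are under your control.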
Pascal, I believe HTTP headers MUST be iso-8859-1. I think there has been a recent relaxation of the requirement for URLs to be iso-8859-1, but they are still encoded and I have seen nothing about UTF-8 headers. Can you point me to an RFC that describes how UTF-8 headers should be handled?
Pascal,

I did some more research on this, specifically by reading the httpbis draft at http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-11, which shows the current intention of the IETF standards process as:

    Historically, HTTP has allowed field content with text in the
    ISO-8859-1 [ISO-8859-1] character encoding and supported other
    character sets only through use of [RFC2047] encoding. In practice,
    most HTTP header field values use only a subset of the US-ASCII
    character encoding [USASCII]. Newly defined header fields SHOULD
    limit their field values to US-ASCII characters. Recipients SHOULD
    treat other (obs-text) octets in field content as opaque data.

Thus UTF-8 should not be used for header values.

regards
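To illustrate the RFC 2047 route that the draft mentions, here is a minimal sketch of wrapping a UTF-8 value in a single Base64 ("B") encoded-word so that only US-ASCII characters travel in the header. The class name is hypothetical, `java.util.Base64` requires Java 8 or later (so it would not run on JDK 1.5 as is), and the sketch makes simplifying assumptions: it ignores the 75-character limit on encoded-words, handles only one encoded-word per value, and only recognises the exact `=?UTF-8?B?` prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Jetty or the servlet API.
final class Rfc2047HeaderValue {
    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    private Rfc2047HeaderValue() {}

    // Wraps the UTF-8 bytes of text in one Base64 encoded-word.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Unwraps a value produced by encode(); anything else is returned as is.
    static String decode(String headerValue) {
        boolean wrapped = headerValue.length() >= PREFIX.length() + SUFFIX.length()
                && headerValue.startsWith(PREFIX)
                && headerValue.endsWith(SUFFIX);
        if (!wrapped) {
            return headerValue;
        }
        String payload = headerValue.substring(
                PREFIX.length(), headerValue.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }
}
```

For example, `Rfc2047HeaderValue.encode("\u00e9")` yields `=?UTF-8?B?w6k=?=`, which survives ISO-8859-1 decoding on the receiving side and can then be passed to `decode`. Both sender and receiver have to agree on this convention, since a generic HTTP recipient will not unwrap it automatically.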