Bug 324988

Summary: UTF-8 header value not supported because BufferUtil.to8859_1_String(_value)
Product: [RT] Jetty
Reporter: pascal gehl <pascal.gehl>
Component: server
Assignee: Greg Wilkins <gregw>
Status: RESOLVED INVALID
QA Contact:
Severity: major
Priority: P3
CC: jetty-inbox, mgorovoy
Version: unspecified
Target Milestone: 7.1.x
Hardware: PC
OS: Windows XP
Whiteboard:

Description pascal gehl CLA 2010-09-10 12:15:46 EDT
Hi,

We are using Jetty 6 (6.1.22) and Jetty 7 (7.1.6) on Windows XP with JDK 1.5_16 in embedded mode to implement server-to-server components that communicate over HTTP (mostly REST/XML).
Some request information is stored in request headers. We support western European languages as well as Eastern European and Asiatic languages. So we encode header values in UTF-8.
Unfortunately, we found out that Jetty automatically decodes header values using ISO-8859-1 before we can decode them using UTF-8.
It's done in BufferUtil:
public static String to8859_1_String(Buffer buffer)
{
    if (buffer instanceof CachedBuffer)
        return buffer.toString();
    return buffer.toString(StringUtil.__ISO_8859_1);
}
which is used in Field.getName and Field.getValue methods.
This prevents us from using UTF-8 to support a wide range of languages.
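To make the effect concrete, here is a minimal sketch of what happens when UTF-8 bytes are decoded as ISO-8859-1, and how the original text can still be recovered afterwards. It assumes the value reaches the application decoded byte-for-byte as ISO-8859-1; the class and method names are hypothetical and not part of Jetty.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is Jetty API.
class HeaderCharsetDemo {

    // Simulates the container's behaviour: raw header bytes decoded as ISO-8859-1.
    static String decodeAsLatin1(byte[] rawHeaderBytes) {
        return new String(rawHeaderBytes, StandardCharsets.ISO_8859_1);
    }

    // Recovers the text, assuming the sender really wrote UTF-8 bytes.
    // ISO-8859-1 maps every byte to exactly one char, so re-encoding
    // gives back the original bytes unchanged.
    static String recoverUtf8(String latin1Decoded) {
        byte[] originalBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Polish city name, written with escapes so the source file stays ASCII.
        String sent = "\u0141\u00f3d\u017a";
        byte[] onTheWire = sent.getBytes(StandardCharsets.UTF_8);

        String garbled = decodeAsLatin1(onTheWire);
        String recovered = recoverUtf8(garbled);

        System.out.println("chars sent:      " + sent.length());
        System.out.println("chars garbled:   " + garbled.length());
        System.out.println("round trip ok:   " + recovered.equals(sent));
    }
}
```

The round trip only works because the ISO-8859-1 decoding loses no information; it would not work if the value had been decoded with a charset that replaces unknown bytes.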

Thanks
Comment 1 Greg Wilkins CLA 2010-09-13 19:49:09 EDT
Pascal,

I believe HTTP headers MUST be iso-8859-1.  I think there has been a recent relaxation of the requirement for URLs to be iso-8859-1, but they are still encoded and I have seen nothing about UTF-8 headers.

Can you point me to an RFC that describes how UTF-8 headers should be handled?
Comment 2 Greg Wilkins CLA 2010-09-14 02:40:26 EDT
Pascal,

I did some more research on this, specifically by reading the httpbis draft at http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-11, which states the current intention of the IETF standards process as:

   Historically, HTTP has allowed field content with text in the ISO-
   8859-1 [ISO-8859-1] character encoding and supported other character
   sets only through use of [RFC2047] encoding.  In practice, most HTTP
   header field values use only a subset of the US-ASCII character
   encoding [USASCII].  Newly defined header fields SHOULD limit their
   field values to US-ASCII characters.  Recipients SHOULD treat other
   (obs-text) octets in field content as opaque data.

Thus UTF-8 should not be used for header values.
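One way to stay within US-ASCII while still carrying non-Latin text is to encode the value before putting it in the header and decode it on the receiving side. The sketch below uses percent-encoding of the UTF-8 bytes rather than the RFC 2047 encoding mentioned in the draft; that choice, and the class and method names, are assumptions made for illustration only, and both ends must agree on the scheme.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper; not part of Jetty or of any HTTP specification.
class AsciiHeaderValue {

    // Turns arbitrary text into a US-ASCII string by percent-encoding its UTF-8 bytes.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode() on the receiving side.
    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u0141\u00f3d\u017a";
        String headerValue = encode(original);

        System.out.println("header value:  " + headerValue);
        System.out.println("round trip ok: " + decode(headerValue).equals(original));
    }
}
```

Because the encoded value contains only ASCII characters, it reads the same whether the server decodes the header as ISO-8859-1 or as US-ASCII.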

regards