Bug 348607

Summary: [Text] Fails to transmit special characters (Umlauts) correctly
Product: [RT] RAP
Reporter: Serge Démoulin <serge_demoulin68>
Component: RWT
Assignee: Project Inbox <rap-inbox>
Status: RESOLVED FIXED
QA Contact:
Severity: critical
Priority: P3
CC: beyhan.veliev, tbuschto
Version: 1.4   
Target Milestone: 1.5 M1   
Hardware: All   
OS: All   
Whiteboard: sr141
Attachments:
Description     Flags
Sample View     none
screenshot      none
Proposed patch  none

Description Serge Démoulin CLA 2011-06-07 12:16:24 EDT
Build Identifier: 1.4.0 RC2

When I put the string "äüö" into a Text with setText() and then type " x" on the keyboard, the method getText() returns "äüöx" instead of "äüö x".
In the sample it doesn't work in IE and works correctly in Firefox, but in my real software I have similar problems with Firefox.
My configuration is
* RAP 1.4.0 RC2
* J2RE 1.5.0 IBM J9 2.3 
* WebSphere
* Linux


Reproducible: Always

Steps to Reproduce:
1. Create a RAP application that includes the attached view
2. Start the application under Linux
3. Append the following string to the end of the upper Text: " x"
4. Click on copy
Comment 1 Serge Démoulin CLA 2011-06-07 12:17:40 EDT
Created attachment 197515 [details]
Sample View
Comment 2 Serge Démoulin CLA 2011-06-07 12:21:10 EDT
Created attachment 197516 [details]
screenshot
Comment 3 Rüdiger Herrmann CLA 2011-06-07 15:14:53 EDT
I cannot reproduce this bug. Tried with Firefox 4.0, IE 9 and Chrome 11. Everything works as expected: the lower text widget shows "äüöx".
With which browser(s) can you provoke this behavior?
Comment 4 Serge Démoulin CLA 2011-06-08 03:17:44 EDT
* The example works correctly (RAP 1.4.0-RC2 + 1.3.0) on my computer (Windows,german)
* The example works correctly with RAP 1.3.0 on the server (Linux, english)
* The example doesn't work correctly with RAP 1.4.0-RC2 on the server (Linux, english) see screenshot
Comment 5 Ivan Furnadjiev CLA 2011-06-08 03:51:44 EDT
This issue is reproducible with online Controls Demo -> Text Tab:
http://rap.eclipsesource.com/rapdemo/rap?startup=controls
Paste the text "äüö" in the text field and click on the getText button. Works fine in Firefox 4 and Safari 5.0.5, but fails in Chrome 12.0.742.91, Opera 11.11 and IE9.
Working fine in my local workspace in all browsers (Windows, English).
Comment 6 Serge Démoulin CLA 2011-06-08 04:59:52 EDT
WebSphere on Linux
language : english
Java version = 1.5.0, Java Compiler = j9jit23, Java VM name = IBM J9 VM
JVM parameters : 
-Xdump:java 
-Dorg.eclipse.rwt.compression=true 
-Djava.compiler=NONE 
-Xdebug 
-Xnoagent 
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777


Test results
------------
RAP 1.3.0                  IE8:OK      Firefox 4:OK     Chrome 11:OK
RAP 1.4.0-M6               IE8:OK      Firefox 4:OK     Chrome 11:OK
RAP 1.4.0-M7               IE8:OK      Firefox 4:OK     Chrome 11:OK
RAP nightlybuild-20110516  IE8:NOT OK  Firefox 4:OK     Chrome 11:NOT OK
RAP 1.4.0-RC2              IE8:NOT OK  Firefox 4:OK     Chrome 11:NOT OK
RAP 1.4.0-RC3              IE8:NOT OK  Firefox 4:OK     Chrome 11:NOT OK
Comment 7 Rüdiger Herrmann CLA 2011-06-08 05:50:04 EDT
Can you tell what default encoding the VM uses, i.e. the result of System.getProperty( "file.encoding" ) ?
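
For anyone who wants to check this on their own server, a minimal sketch that prints the value asked about above, together with the charset the VM actually resolves from it. The class name DefaultEncodingProbe is made up for this example.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of RAP.
final class DefaultEncodingProbe {

    /** Returns the name of the charset the JVM uses when none is given explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The raw system property, which may be null on some VMs.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset the VM resolved, e.g. for String.getBytes() without arguments.
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```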

Just a side note, the org.eclipse.rwt.compression system property was removed in M7 (see the 
"N&N":http://eclipse.org/rap/noteworthy/1.4/news_M7.php)
Comment 8 Serge Démoulin CLA 2011-06-08 06:57:29 EDT
(In reply to comment #7)
> Can you tell what default encoding the VM uses, i.e. the result of
> System.getProperty( "file.encoding" ) ?
> 
> Just a side note, the org.eclipse.rwt.compression system property was removed
> in M7 (see the 
> "N&N":http://eclipse.org/rap/noteworthy/1.4/news_M7.php)

Here the list of all the properties that may concern the encoding :

file.encoding=UTF-8
os.encoding=UTF-8
org.osgi.framework.language=de
user.language=de
java.version=1.5.0
org.osgi.framework.os.name=Linux
user.timezone=Europe/Berlin
sun.jnu.encoding=UTF-8
user.country=DE
osgi.os=linux
sun.io.unicode.encoding=UnicodeLittle
com.ibm.cpu.endian=little
osgi.nl=de_DE
ibm.system.encoding=UTF-8
java.vm.name=IBM J9 VM
user.region=DE

Thank you for the side note.
Comment 9 Ivan Furnadjiev CLA 2011-06-09 09:09:11 EDT
Created attachment 197693 [details]
Proposed patch

This patch appends "charset=UTF-8" to the request header "Content-Type". Tested with Tomcat and Chrome, IE9 and Opera. The problem is gone with this patch.
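
The patch itself changes Request.js and is only available as the attachment. As a rough illustration of the idea in Java, the following sketch appends the charset parameter to a Content-Type value unless one is already declared. The class and method names, and the sample content type, are assumptions made for this example and are not taken from the patch.

```java
import java.util.Locale;

// Hypothetical helper, illustrating the idea only; the real fix lives in Request.js.
final class ContentTypeCharset {

    /** Appends "; charset=UTF-8" unless the value already carries a charset parameter. */
    static String withUtf8Charset(String contentType) {
        if (contentType.toLowerCase(Locale.ROOT).contains("charset=")) {
            return contentType; // the sender already declared a charset, keep it
        }
        return contentType + "; charset=UTF-8";
    }

    public static void main(String[] args) {
        // The content type used here is an assumed example value.
        System.out.println(withUtf8Charset("application/x-www-form-urlencoded"));
    }
}
```

With the charset declared in the header, a servlet container no longer has to fall back to its default when decoding the request body.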
Comment 10 Serge Démoulin CLA 2011-06-10 04:40:22 EDT
Thank you for the correction.
Can you please tell me how I can apply the patch myself to test it in my environment?
I made 2 attempts:
- First I made a fragment hosted by org.eclipse.rap.rwt containing the patched file Request.js from org.eclipse.rap.rwt.q07. This fragment had a lower plugin-id than org.eclipse.rap.rwt.q07. The problem was not corrected.
- Second I replaced the file Request.js in the fragment org.eclipse.rap.q07 itself. The problem was not corrected.
Comment 11 Rüdiger Herrmann CLA 2011-06-10 04:58:30 EDT
Did you replace the Request.js in the "binary" fragment or did you check out the sources? I vaguely remember that PDE had problems with Host-Bundles and Fragments if they come from mixed locations (workspace vs. target). The safe way would be to check out o.e.rap.rwt and o.e.rap.rwt.q07.

From what I gathered in a discussion with Ralf, you can also work around the problem if you re-configure Tomcat. You would have to change the "URIEncoding" to UTF-8 (see the "Tomcat docs":http://tomcat.apache.org/tomcat-7.0-doc/config/http.html)

And yet another workaround is to install a servlet filter that calls request.setCharacterEncoding( "utf-8" ) before the RAP servlet (RWTDelegate) processes the request.
Comment 12 Ivan Furnadjiev CLA 2011-06-10 05:05:58 EDT
Hi Serge, when a RAP application is deployed in Tomcat it uses the "standard" client library variant (client.js). In order to run your application with the patched bundle you need either to rebuild the client.js (see http://github.com/ralfstx/rap-clientbuilder ) or to force the "debug" client library variant with the org.eclipse.rwt.clientLibraryVariant=DEBUG VM parameter. Make sure that you clear the browser cache too.
Comment 13 Serge Démoulin CLA 2011-06-10 05:18:28 EDT
(In reply to comment #11)
> Did you replace the Request.js in the "binary" fragment or did you check out
> the sources? I vaguely remember that PDE had problems with Host-Bundles and
> Fragments if they come from mixed locations (workspace vs. target). The safe
> way would be to check out o.e.rap.rwt and o.e.rap.rwt.q07.
> 
> From what I got in a discussion with Ralf is that you can also work around the
> problem if you re-configure Tomcat. You would have to change the "URIEncoding"
> to UTF-8 (see the "Tomcat
> docs":http://tomcat.apache.org/tomcat-7.0-doc/config/http.html)
> 
> And yet another workaround is to install a servlet filter that calls
> request.setCharacterEncoding( "utf-8" ) before the RAP servlet (RWTDelegate)
> processes the request.

Hi Rüdiger. I don't use Tomcat but WebSphere. I replaced the file Request.js in the compiled plug-in: org.eclipse.rap.rwt.q07.jar.
Comment 14 Serge Démoulin CLA 2011-06-10 05:21:52 EDT
(In reply to comment #11)
> Did you replace the Request.js in the "binary" fragment or did you check out
> the sources? I vaguely remember that PDE had problems with Host-Bundles and
> Fragments if they come from mixed locations (workspace vs. target). The safe
> way would be to check out o.e.rap.rwt and o.e.rap.rwt.q07.
> 
> From what I got in a discussion with Ralf is that you can also work around the
> problem if you re-configure Tomcat. You would have to change the "URIEncoding"
> to UTF-8 (see the "Tomcat
> docs":http://tomcat.apache.org/tomcat-7.0-doc/config/http.html)
> 
> And yet another workaround is to install a servlet filter that calls
> request.setCharacterEncoding( "utf-8" ) before the RAP servlet (RWTDelegate)
> processes the request.

Thank you for the workaround. Does it also work if the client browser uses another encoding?
Comment 15 Ivan Furnadjiev CLA 2011-06-10 06:27:47 EDT
In the Tomcat filter example [1] the character encoding is only set if it is not specified in the request. The example also provides an "ignore" parameter to override the client encoding if needed.
[1] http://www.jdocs.com/tomcat/5.5.17/org/apache/webapp/admin/filters/SetCharacterEncodingFilter.html
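
The decision described above can be sketched as a small pure function. This is only a model of the behavior as described in this comment, not the Tomcat source; the class, method and parameter names are made up for illustration.

```java
// Hypothetical model of the filter's decision, independent of the servlet API.
final class EncodingChoice {

    /**
     * Returns the encoding the filter should set on the request,
     * or null if the request's own declaration should be left untouched.
     *
     * @param declared   charset declared by the client, or null if none
     * @param configured charset configured for the filter, e.g. "UTF-8"
     * @param ignore     true to override whatever the client declared
     */
    static String encodingToApply(String declared, String configured, boolean ignore) {
        if (ignore || declared == null) {
            return configured;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(encodingToApply(null, "UTF-8", false));         // UTF-8
        System.out.println(encodingToApply("ISO-8859-1", "UTF-8", false)); // null
        System.out.println(encodingToApply("ISO-8859-1", "UTF-8", true));  // UTF-8
    }
}
```

So a client that declares its own charset is respected unless "ignore" is switched on.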
Comment 16 Serge Démoulin CLA 2011-06-10 08:15:01 EDT
I installed the following filter (see below) and registered it as an org.eclipse.equinox.http.registry.filters extension, and now it works correctly.
Thank you.


plugin.xml:
----------
<plugin>
   <extension
         point="org.eclipse.equinox.http.registry.filters">
      <filter
            alias="/"
            class="WorkaroundHttpFilter"
            load-on-startup="true">
      </filter>
   </extension>
</plugin>

WorkaroundHttpFilter.java:
-------------------------
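import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class WorkaroundHttpFilter implements Filter {
  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    // Only default to UTF-8 when the client did not declare a charset itself.
    final String characterEncoding = request.getCharacterEncoding();
    if (characterEncoding == null) {
      request.setCharacterEncoding("utf-8");
    }
    chain.doFilter(request, response);
  }
  ...

}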
Comment 17 Rüdiger Herrmann CLA 2011-06-30 11:25:14 EDT
Applied patch to CVS HEAD and v14_Maintenance branch
Comment 18 Ralf Sternberg CLA 2011-07-04 19:46:22 EDT
For the record: after some more reading I'm now convinced that this patch is the correct solution:

The HTTP 1.1 spec defines ISO-8859-1 as the default charset. See RFC 2616, 3.7.1:
> When no explicit charset parameter is provided by the sender,
> media subtypes of the "text" type are defined to have a default
> charset value of "ISO-8859-1" when received via HTTP.

So it's correct that the servlet implementation has this default and we should not override it.
Clients have to declare their charset if it differs from ISO-8859-1.
Hence, we must handle the encoding on the client side.

The XHR candidate spec (http://www.w3.org/TR/XMLHttpRequest/) suggests that strings are always sent in UTF-8 encoding, so adding the charset parameter seems to be sufficient. We can have a closer look at this when resolving bug 351126.
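
To illustrate the mismatch described above: if the client sends UTF-8 bytes and the server decodes them with the ISO-8859-1 default, every umlaut turns into two wrong characters, while plain ASCII survives. A minimal sketch, with a made-up class name, that uses unicode escapes so it does not depend on the source file encoding (StandardCharsets needs Java 7 or later):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of RAP.
final class CharsetMismatchDemo {

    /** Encodes the text as UTF-8 (the client side) and decodes it with the server's charset. */
    static String decodeUtf8BytesAs(String text, Charset serverCharset) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, serverCharset);
    }

    public static void main(String[] args) {
        String sent = "\u00e4\u00fc\u00f6 x"; // the umlauts a, u, o followed by " x"
        String garbled = decodeUtf8BytesAs(sent, StandardCharsets.ISO_8859_1);
        String intact = decodeUtf8BytesAs(sent, StandardCharsets.UTF_8);
        // Each umlaut is two bytes in UTF-8, so 5 characters become 8 when misread.
        System.out.println(sent.length() + " -> " + garbled.length());
        System.out.println(intact.equals(sent));
    }
}
```

Once the request declares charset=UTF-8, the server side decodes with the matching charset and the round trip is lossless.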
Comment 19 Tim Buschtoens CLA 2011-08-25 08:47:53 EDT
Applied patch to v14_Tree_Table_Merge branch.