Bug 456082 - Gerrit Connector meet the CJK problem
Status: CLOSED MOVED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn
Version: 2.4
Hardware: PC Windows NT
Importance: P3 critical with 1 vote
Target Milestone: ---
Assignee: Mylyn Inbox CLA
Reported: 2014-12-23 09:52 EST by Fei Li CLA
Modified: 2015-02-02 14:10 EST
CC List: 2 users

Attachments
CJK char can not be displayed (54.44 KB, image/png)
2014-12-23 09:52 EST, Fei Li CLA
compare view (19.67 KB, image/png)
2014-12-23 09:54 EST, Fei Li CLA

Description Fei Li CLA 2014-12-23 09:52:06 EST
Created attachment 249612
CJK char can not be displayed

I installed the nightly build (version 2.5.0.N20141217-1956) of the Gerrit connector and found that it has a CJK problem: CJK characters are not displayed correctly. See the attached screenshots for details.
Comment 1 Fei Li CLA 2014-12-23 09:54:11 EST
Created attachment 249613
compare view

The code compare view has the same problem.
Comment 2 Miles Parker CLA 2014-12-23 19:25:05 EST
Since you say "the CJK problem", hopefully you have some more background on this? A test case (including public server changes, maybe just a bogus Eclipse review) would be really helpful.
Comment 3 Fei Li CLA 2014-12-25 06:37:48 EST
I used the word "测试" ("test") as the test string.

In a Python 2 shell:

>>> aaaa = '测试'
>>> print aaaa
测试
>>> aaaa
'\xe6\xb5\x8b\xe8\xaf\x95'
>>>
>>> aaaa = u'测试'
>>> aaaa
u'\u6d4b\u8bd5'

I published a comment containing the word 测试 and captured the POST request:

In text format:

POST /a/changes/28/revisions/4/review HTTP/1.1
Accept: application/json
X-Gerrit-Auth: aSceprrIwDs0TIrY5jNefjXWZj-7e8hQoW
User-Agent: Jakarta Commons-HttpClient/3.1
Host: gerrit.xxxxxxx.cn
Cookie: $Version=0; GerritAccount=aSceprqEp7Sf1WT-NkwMrp-Fi6Hd0-Hv7a; $Path=/
Content-Length: 62
Content-Type: application/json

{"message":"娴�璇�","labels":{"Verified":-1,"Code-Review":-2}}

In hex format:

00000000  7b 22 6d 65 73 73 61 67 65 22 3a 22 e6 b5 8b e8   {"message":"    
00000010  af 95 22 2c 22 6c 61 62 65 6c 73 22 3a 7b 22 56     ","labels":{"V
00000020  65 72 69 66 69 65 64 22 3a 2d 31 2c 22 43 6f 64   erified":-1,"Cod
00000030  65 2d 52 65 76 69 65 77 22 3a 2d 32 7d 7d         e-Review":-2}}  

The resulting comment reads "²âÊÔ".

I then hand-crafted a request like this:
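
Garbled output of this kind is typical of bytes being read with the wrong character set. As a hedged illustration in Python 3: the message bytes in the hex dump are valid UTF-8 for 测试, and the string "²âÊÔ" is exactly what results when 测试 is encoded as GBK and those bytes are then read as Latin-1. Whether GBK and Latin-1 are the encodings actually involved here is an assumption; the capture alone does not prove it.

```python
# Python 3 sketch. The choice of GBK and Latin-1 below is an assumption.
captured = bytes.fromhex("e6 b5 8b e8 af 95")  # message bytes from the hex dump

# The bytes on the wire are valid UTF-8 for the intended word.
decoded = captured.decode("utf-8")
print(decoded)

# Assumed failure mode: text encoded as GBK, then read back as Latin-1.
gbk_bytes = "测试".encode("gbk")
garbled = gbk_bytes.decode("latin-1")
print(garbled)

# Reversing the two steps recovers the original word.
recovered = garbled.encode("latin-1").decode("gbk")
print(recovered)
```

If this reading is right, the text passes through a platform-default charset at some point instead of staying in UTF-8 end to end.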

POST /a/changes/28/revisions/4/review HTTP/1.1
Accept: application/json
X-Gerrit-Auth: aSceprrIwDs0TIrY5jNefjXWZj-7e8hQoW
User-Agent: Jakarta Commons-HttpClient/3.1
Host: gerrit.xxxxxxx.cn
Cookie: $Version=0; GerritAccount=aSceprqEp7Sf1WT-NkwMrp-Fi6Hd0-Hv7a; $Path=/
Content-Type: application/json
Content-Length: 68

{"message":"\u6d4b\u8bd5","labels":{"Verified":-1,"Code-Review":-2}}

and I get a correct comment with "测试".

So the problem is that the Gerrit server does not decode the UTF-8 encoded JSON string.
There are two ways to fix this:

1. Use an ASCII-escaped JSON string when posting.
2. Make Gerrit decode the UTF-8 encoded JSON string correctly.

Link:
http://stackoverflow.com/questions/583562/json-character-encoding-is-utf-8-well-supported-by-browsers-or-should-i-use-nu
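
The difference between the two request bodies can be reproduced outside the connector. The following is a minimal Python 3 sketch, not the connector's code (which is Java). It assumes only the payload shown in the captures above and uses the standard `json` module to produce both the raw UTF-8 form and the ASCII-escaped form of option 1.

```python
import json

# Payload taken from the captured requests above.
payload = {"message": "测试", "labels": {"Verified": -1, "Code-Review": -2}}

# Raw form: non-ASCII characters are kept and sent as UTF-8 bytes.
raw_body = json.dumps(
    payload, ensure_ascii=False, separators=(",", ":")
).encode("utf-8")

# Option 1: every non-ASCII character becomes a \uXXXX escape,
# so the body is pure ASCII.
escaped_body = json.dumps(
    payload, ensure_ascii=True, separators=(",", ":")
).encode("ascii")

print(len(raw_body), raw_body)
print(len(escaped_body), escaped_body)

# A conforming JSON parser reads both bodies as the same value.
same_value = json.loads(raw_body.decode("utf-8")) == json.loads(
    escaped_body.decode("ascii")
)
print(same_value)
```

The two lengths come out as 62 and 68 bytes, matching the Content-Length headers of the two requests above.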
Comment 4 Fei Li CLA 2014-12-25 07:20:22 EST
Bad luck, I found this link

https://code.google.com/p/google-gson/issues/detail?id=388
Comment 6 Miles Parker CLA 2014-12-26 14:48:23 EST
It seems that we'd want to use the ASCII escaping approach (1), because with (2) you'd get garbage in the web UI when you tried to read something posted from Gerrit Reviews, right? We need to act just like the web client unless there is simply no way to do that. However, the bad thing is that we would need to encode every comment by walking through it char by char. Considering this is all in memory, it might not be a terrible thing to do, but I wish we could think of a way to avoid it.
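
A char-by-char pass of the kind described here could look roughly like the following Python 3 sketch. It is illustrative only: `escape_non_ascii` is a hypothetical helper, not part of the connector or of Gson, and it is applied to the already serialized JSON text rather than to the comment itself.

```python
import json


def escape_non_ascii(json_text):
    """Replace every non-ASCII character in serialized JSON with \\uXXXX escapes."""
    out = []
    for ch in json_text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)  # BMP character: a single escape
        else:
            # Characters beyond the BMP need a UTF-16 surrogate pair in JSON.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


serialized = json.dumps({"message": "测试"}, ensure_ascii=False, separators=(",", ":"))
body = escape_non_ascii(serialized)
print(body)
```

In valid JSON, non-ASCII characters can only occur inside string literals, so escaping them after serialization leaves the structure of the document untouched.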
Comment 7 Miles Parker CLA 2014-12-26 15:29:35 EST
(Huh, this is revealing an issue with the Bugzilla comment handling itself! It seems that JSON encoding is breaking the message.)
Comment 8 Miles Parker CLA 2014-12-26 16:33:36 EST
Li, the entry point is pretty straightforward, but I'm having a bit of trouble getting the Unicode encoding working as needed. Everything I've tried ends up giving me "\\u.." in the JSON message body, which of course won't work. The Writer approach in the link isn't well suited to this usage either. I'll take another look next week, but in the meantime, if you can find a clean way to do the Escaped Unicode String -> JSON String conversion, that would be really helpful.
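
The "\\u.." symptom is what one would expect when text is escaped before it is handed to a JSON serializer: the serializer then escapes the backslash itself. A small Python 3 sketch of the effect follows. It uses the standard `json` module as a stand-in for the Java serializer, so the actual connector code path may differ.

```python
import json

comment = "测试"

# Escaping first yields a 12-character ASCII string with literal backslashes.
pre_escaped = comment.encode("unicode_escape").decode("ascii")

# The serializer escapes each backslash, so the body contains \\u6d4b\\u8bd5.
double_escaped = json.dumps({"message": pre_escaped})
print(double_escaped)

# A receiver decodes that to the literal text \u6d4b\u8bd5, not the original word.
received = json.loads(double_escaped)["message"]
print(received == comment)
```

The escapes only survive as intended if they are produced by the serializer itself or applied to its output afterwards.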
Comment 9 Fei Li CLA 2014-12-26 19:40:38 EST
Thanks for the quick reply.

I'm trying to do some tests too, but I'm not a Java programmer, and because of the complex design patterns I don't know which file should be modified. Could you tell me which interface/method is the best place to do such a conversion, so I can submit a patch?
Comment 10 Miles Parker CLA 2014-12-30 16:19:02 EST
See this review for the correct entry point. Unfortunately, it doesn't actually work because of tricky Java escape sequence handling.

https://git.eclipse.org/r/38856
Comment 11 Sam Davis CLA 2015-01-05 15:00:22 EST
Is this related to bug 438139?
Comment 12 Miles Parker CLA 2015-01-06 15:32:03 EST
Yes, it appears so.
Comment 13 Moonki Cho CLA 2015-02-02 04:19:42 EST
Korean comment messages have the same problem.
Should I fix this problem myself?
Do you plan to fix the bug?
Comment 15 Sam Davis CLA 2015-02-02 14:10:14 EST
It would be great if you could take a look at the work in progress at https://git.eclipse.org/r/#/c/38856/ (which doesn't work) and either push a better fix to Gerrit or suggest improvements.
Comment 16 Eclipse Webmaster CLA 2022-11-15 11:45:08 EST
Mylyn has been restructured, and our issue tracking has moved to GitHub [1].

We are closing ~14K Bugzilla issues to give the new team a fresh start. If you feel that this issue is still relevant, please create a new one on GitHub.

[1] https://github.com/orgs/eclipse-mylyn