| Summary: | JSP/HTML editor doesn't recognize lowercase tags in Turkish locale | ||
|---|---|---|---|
| Product: | [WebTools] WTP Source Editing | Reporter: | Tuncay Baskan <tbaskan> |
| Component: | jst.jsp | Assignee: | David Williams <david_williams> |
| Status: | CLOSED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P2 | CC: | kitlo, lmcliste, luol, turkoglu.deniz |
| Version: | unspecified | ||
| Target Milestone: | 1.5.3 M153 | ||
| Hardware: | PC | ||
| OS: | Windows XP | ||
| Whiteboard: | |||
| Attachments: | |||
|
Description
Tuncay Baskan
Can you attach samples or JUnit tests that show the problem? That'd make it easier for us.

(In reply to comment #1)
> Can you attach samples or JUnit tests that show the problem?
> That'd make it easier for us.

Sorry for the late reply, I was on vacation. I'll download 0.7 RC3 and send you samples. I don't know how to generate a JUnit test for this problem.

Created attachment 25411 [details]
Shows incorrect behavior of JSP/HTML editor to different title tags
Comment on attachment 25411 [details]
Shows incorrect behavior of JSP/HTML editor to different title tags
My Windows is configured to use Turkish as its locale. The attached image shows three
differently typed <title> tags.
The JSP/HTML editor incorrectly marks <title> as "unknown tag". The other two tags,
<tıtle> and <TITLE>, are accepted as normal.
I think that after loading the DTDs the editor uses String.toLowerCase or
String.toUpperCase without specifying the US locale, so letters are
converted to the wrong lowercase/uppercase forms.
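The suspected cause can be illustrated outside Eclipse. The following is a minimal sketch, not WTP code: the class name `TurkishCaseDemo` and its helper methods are made up for illustration. It compares `String.toUpperCase`/`String.toLowerCase` under a Turkish locale with the same calls pinned to `Locale.US`. Non-ASCII results are printed as `[U+XXXX]` so the output does not depend on the console encoding.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    // Turkish (Turkey) locale, built from a BCP 47 language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Case conversion under Turkish rules: 'i' uppercases to U+0130, 'I' lowercases to U+0131.
    static String upperTurkish(String s) { return s.toUpperCase(TURKISH); }
    static String lowerTurkish(String s) { return s.toLowerCase(TURKISH); }

    // Case conversion pinned to a fixed locale, independent of the user's settings.
    static String upperFixed(String s) { return s.toUpperCase(Locale.US); }
    static String lowerFixed(String s) { return s.toLowerCase(Locale.US); }

    // Render non-ASCII characters as [U+XXXX] so the result is readable on any console.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("[U+%04X]", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(upperTurkish("title"))); // T[U+0130]TLE
        System.out.println(escape(lowerTurkish("TITLE"))); // t[U+0131]tle
        System.out.println(escape(upperFixed("title")));   // TITLE
        System.out.println(escape(lowerFixed("TITLE")));   // title
    }
}
```

If tag names are stored as plain ASCII and the typed name is converted with the default locale, a comparison under a Turkish default would fail in exactly this way.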
Ok, thanks for the test case. I will investigate.
But ... the HTML file says its encoding is UTF-8 (in the META tag) ... we detect that and take it at its word. Could that be related? How do you generate the source for this file?
Can you attach the source?
Thanks.

(In reply to comment #5)
> Ok, thanks for the test case. I will investigate.
> But ... the HTML file says its encoding is UTF-8 (in the META tag) ... we detect
> that and take it at its word. Could that be related? How do you generate the
> source for this file?
> Can you attach the source?
> Thanks.

I don't think there is a relation between the encoding of the file and tag handling. I tried changing the charset attribute to "windows-1254" (Windows Turkish), windows-1252, ISO8859-9, and ISO8859-1, but nothing changed about the tags.
I didn't understand the "how do you generate" question. I simply select File->New JSP in the J2EE perspective. Below is the complete source of the file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>content of lowercase title tags</title>
<tıtle>content of lowercase title tags (no diaresis over i)</tıtle>
<TITLE>content of uppercase tags</TITLE>
</head>
<body>
</body>
</html>

More information about the Turkish 'I' problem can be found at:
http://blogs.msdn.com/deeptanshuv/archive/2004/09/04/225720.aspx

From using the small sample in comment #6, in a JSP file, I see a warning on the <tıtle> tag. It says "unknown tag (t)". Is that what you mean? That appears a little different from the attached JPEG, so I'm not exactly sure what's causing that difference, and I want to be sure I'm looking at it correctly. The other two title tags seem fine.
Also, I tried the small sample in an HTML file and it did not generate warnings on any of the three versions of the title tag. Does that match your observations as well? I'm using some Windows cp1252 setting, so I wanted to be sure I wasn't fooling myself.
We do have code in places to handle the famous Turkish I, and if it's only a problem for the HTML-in-JSP case, I suspect there's something going wrong in the "translation" from JSP to HTML.

Hi,
The problem, afaik, resides in org.eclipse.wst.html.core_1.0.0.v200602062135.jar, which seems to handle the tags. The only thing I've noticed is that the tags are declared in uppercase only, so the bug seems to be caused by an obsolete toLowerCase() call that uses the current/default locale. I say obsolete because tags are US/EN only, so this causes tags containing our famous I to become invalid tags (ones with ı). I understand the difficulty of tracing such a bug without a Turkish locale and keyboard, but it is obvious to anyone who has both. To make things clearer, see the code below:

String test="INPUT";
System.out.println(test.toLowerCase());

which prints "ınput" on my computer, whereas it should be "input" when it comes to HTML.

I am well familiar with the general problem; it is well documented and, I think, much more commonly known than you realize -- at least among those of us who work with text for a living. And I think that makes you miss my question ... I do not see the problem described here with a pure and plain HTML file. Do you? I am thinking it's isolated to the JSP case, and was hoping to get some help narrowing it down between the HTML file case and the JSP file case.

It's indeed isolated to the JSP case, and yes, it's a plain HTML tag autocomplete problem. Eclipse WTP does support HTML tag autocomplete, but when it's used with a tag containing the "I" character in a Turkish locale, it lowers it to "i" without a dot. In my opinion it's not in any way related to JSP.

Ok, now I'm getting more confused :) Is it a "validation" problem, or a "content assist" problem? Or both? If it's a validation problem, please attach a zipped-up JSP file that shows the problem and, if I am reading your comment correctly, a zipped-up HTML file that shows it's not a problem there.
If it's a content assist problem, can you help by describing how I might reproduce the problem with a non-Turkish keyboard? If there is no way ... I'll have to dig around for one, I guess, but that would take longer. If it is a content assist problem, is the problem with what the proposals are? Or with what happens after the text is inserted, from selecting one of the proposals?

It's a content assist problem leading to a validation problem (which occurs when the assisted content is modified by the user). If I press the "i" key it proposes input, iframe, etc., but with the letter I (the small one, i without a dot; bugzilla encodes the small I...), please check the screenshot. To reproduce the bug you should not need a Turkish keyboard: just change all your locale settings to Turkish, press i, and see what it proposes (names starting with the small dotless I). When you select one of those, Eclipse will say all is fine, but when you correct the small "I" to "i", it will report a validation error. To make things crystal clear (hopefully) I am attaching some files/screenshots I created; I hope they help. (See the content assist and the title tag being marked invalid although it is totally valid.)

Created attachment 37604 [details]
Content assist and invalidation at title tag
See the content assist and validation bug at title tag
Created attachment 37605 [details]
File that is validated by Eclipse
Created attachment 37606 [details]
File with invalid tags (according to WTP)
Created attachment 37663 [details]
a zip file containing one tiny tet.jsp with Turkish i's in UTF-8 encoding
Thanks for all the info. But, I am having difficulty reproducing any problem of the types you mention. Can you please attach to this bugzilla your "configuration" file .. that lists plugins installed, system encoding, version of Java used, etc.
The JSP file I attached is a version produced on my system. I'd be interested in what you see if you unzip it and import it into an existing web project.
BTW, I attached as (binary)zip, to make sure that attaching it to bugzilla did not change any of the characters. For future reference, I think that's the best (only?) way to exchange files with encoding issues.
Gee ... Naci should have shown me the problem at EclipseCon :)
Created attachment 37726 [details]
Pack including screenshot of tet.jsp from former attachment and configuration file
I am attaching the files that you asked for. I checked out the latest CVS last night and saw a comment that is related to the very bug we are discussing; can you please check ElementNodeCleanupHandler.java, line 175, for the comment I'm talking about? Btw, I really wonder what Eclipse says about the former attachment you made (tet.jsp).
As for Mr. Naci not showing you the bug, I showed it to him right after he came back from EclipseCon and he made me submit it (well, it was already submitted).
The screen shot in comment #18 is very interesting. On my system, there are no warnings. On my system, I would have expected a warning on the one case that had <tıtle>, but that is the one case that does not show an error on your system?! So, that makes me wonder ... you did import that into Eclipse with the Eclipse file import ... right? (And not some copy/paste function, using some other viewer/editor?)
Let's get down to the bits and bytes ... if you look at that tet.jsp file on your system with a binary editor/viewer, does it show hex 69 for the lowercase 'i's? (It does on mine; that should be what is in the zip file version.) I guess I'm wondering if there is some step in the process, on your system, that is converting those lowercase (English) i's to something else.
As another suggestion to help rule out "odd" cases ... have you tried this with a completely fresh version of Eclipse and WTP, downloaded from Eclipse.org, and with a completely new and fresh workspace?

Ok, FYI, I have finally been able to reproduce this now. For me, the way I could reproduce it was to set -Dosgi.nl=tr_TR when I launch Eclipse. Here are the ways that didn't work (which I list here just as future debugging reminders):
1. using -nl=tr_TR as an Eclipse application arg (normally that picks up "Turkish" translations).
2. using -Dfile.encoding=ISO-8859-9
3. and even changing my "Windows regional settings" to "Turkey" did not work. Not sure why this last one would not have worked, but there might have been something cached in Eclipse so that it remembered some previous locale setting.
But, now that I see it myself, I believe you :) and will be able to debug.

You've been busy while I was sleeping :) Anyway, I'm glad you've managed to reproduce the bug; please let me know if you need further assistance.

Fix released for 1.5.3 and 2.0.
The fix was in TolerantStringDualMap and several other classes (4 places altogether: for HTML and CHTML, for the content model itself, and for some content model "lookup" methods).
I changed

    private String makeCanonicalForm(String raw) {
        return raw.toUpperCase();
    }

to the following

    private String makeCanonicalForm(String raw) {
        // see https://bugs.eclipse.org/bugs/show_bug.cgi?id=100152
        // we are able to "cheat" here a little and use US Locale
        // to get a good canonical form, since we are using this only
        // for HTML and JSP standard tags.
        // Long term, for similar needs with XML 1.1 (for example)
        // we should use a class such as com.ibm.icu.text.Normalizer
        return raw.toUpperCase(Locale.US);
    }
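To show the effect of that change in isolation, here is a small sketch of a case-insensitive tag lookup built around the same `makeCanonicalForm` idea. The class `TagNameMap` and its `put`/`get` methods are hypothetical stand-ins for illustration and are not the actual TolerantStringDualMap code. The `main` method temporarily sets the JVM default locale to Turkish, which is assumed here to approximate the environment in which the bug was reproduced.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a tag lookup table; not the WTP implementation.
public class TagNameMap {
    private final Map<String, String> byCanonicalName = new HashMap<>();

    // Locale-independent canonical form, mirroring the fix quoted above.
    private static String makeCanonicalForm(String raw) {
        return raw.toUpperCase(Locale.US);
    }

    public void put(String tagName, String description) {
        byCanonicalName.put(makeCanonicalForm(tagName), description);
    }

    // Returns null when the tag is unknown.
    public String get(String tagName) {
        return byCanonicalName.get(makeCanonicalForm(tagName));
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a Turkish environment for the duration of the check.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            TagNameMap tags = new TagNameMap();
            tags.put("TITLE", "document title");
            System.out.println(tags.get("title"));     // document title
            System.out.println(tags.get("nosuchtag")); // null
        } finally {
            // Restore the original default so other code is unaffected.
            Locale.setDefault(saved);
        }
    }
}
```

With the argument-free `toUpperCase()` in `makeCanonicalForm`, the first lookup would return null under the Turkish default, because "title" would be uppercased with a dotted capital I. `Locale.ROOT` is another locale-neutral choice that would likely behave the same for ASCII tag names.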
*** Bug 167673 has been marked as a duplicate of this bug. ***
*** Bug 166905 has been marked as a duplicate of this bug. ***
*** Bug 166902 has been marked as a duplicate of this bug. ***

Verified using WTP 1.5.3 200702060621 that the invalid warnings regarding the dotted 'i' no longer show up.
Remaining similar bugs:
bug 170267
bug 166902 |