Bug 100152

Summary: JSP/HTML editor doesn't recognize lowercase tags in Turkish locale
Product: [WebTools] WTP Source Editing
Reporter: Tuncay Baskan <tbaskan>
Component: jst.jsp
Assignee: David Williams <david_williams>
Status: CLOSED FIXED
QA Contact:
Severity: normal
Priority: P2
CC: kitlo, lmcliste, luol, turkoglu.deniz
Version: unspecified
Target Milestone: 1.5.3 M153
Hardware: PC
OS: Windows XP
Whiteboard:
Attachments:
Description | Flags
Shows incorrect behavior of JSP/HTML editor to different title tags | none
Content assist and invalidation at title tag | none
File that is validated by Eclipse | none
File with invalid tags (according to WTP) | none
a zip file containng one tiny tet.jsp with turkish i's in UTF-8 encoding | none
Pack including screenshot of tet.jsp from former attachment and configuration file | none

Description Tuncay Baskan CLA 2005-06-15 07:27:16 EDT
This is a typical Turkish locale problem. 

When a tag whose name contains the letter 'i' is used, the JSP/HTML editor can't
recognize the tag. I guess the editor converts the tag name to uppercase
with String.toUpperCase() without specifying the US locale. Since the uppercase of 'i'
is not 'I' in the Turkish locale, the editor fails to recognize the tag name and
reports, for example:
Unknown tag(title).
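
The suspected mechanism can be sketched in a few lines of plain Java. This is only an illustration of locale-sensitive case mapping, not the editor's actual code; the class and method names (TurkishCaseDemo, upper) are made up for the example.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    // Hypothetical helper: upper-cases a tag name with an explicit locale.
    public static String upper(String tagName, Locale locale) {
        return tagName.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules 'i' maps to the dotted capital I (U+0130),
        // so the result no longer equals the ASCII name "TITLE".
        String viaTurkish = upper("title", turkish);
        // Under US rules 'i' maps to the plain ASCII 'I'.
        String viaUs = upper("title", Locale.US);

        System.out.println("Turkish result equals TITLE: " + viaTurkish.equals("TITLE"));
        System.out.println("US result equals TITLE: " + viaUs.equals("TITLE"));
    }
}
```

If a tag table is keyed by ASCII upper-case names, a lookup with the Turkish result would miss, which is consistent with the "Unknown tag" report.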
Comment 1 David Williams CLA 2005-07-10 23:54:02 EDT
Can you attach samples or JUnit tests that show the problem?
That'd make it easier for us. 
Comment 2 Tuncay Baskan CLA 2005-07-27 09:36:01 EDT
(In reply to comment #1)
> Can you attach samples or JUnit tests that show the problem?
> That'd make it easier for us. 
> 

Sorry for the late reply, I was on vacation. I'll download 0.7 RC3 and send you
samples. I don't know how to write a JUnit test for this problem.
Comment 3 Tuncay Baskan CLA 2005-07-28 10:50:40 EDT
Created attachment 25411 [details]
Shows incorrect behavior of JSP/HTML editor to different title tags
Comment 4 Tuncay Baskan CLA 2005-07-28 10:56:04 EDT
Comment on attachment 25411 [details]
Shows incorrect behavior of JSP/HTML editor to different title tags

My Windows is configured to use Turkish as its locale. The attached image has three
differently typed <title> tags.

The JSP/HTML editor incorrectly marks <title> as "unknown tag". The other two tags,
<tıtle> and <TITLE>, are accepted as normal.

I think that after loading the DTDs the editor uses String.toLowerCase or
String.toUpperCase without specifying the US locale. Therefore letters are
incorrectly lowercased/uppercased.
Comment 5 David Williams CLA 2005-07-28 11:03:35 EDT
Ok, thanks for the test case. I will investigate. 
But ... the HTML file says its encoding is UTF-8 (in the META tag) ... we detect
that and take it at its word. Could that be related? How do you generate the
source for this file? 
Can you attach the source? 
Thanks. 
Comment 6 Tuncay Baskan CLA 2005-07-28 11:35:41 EDT
(In reply to comment #5)
> Ok, thanks for the test case. I will investigate. 
> But ... the HTML file says its encoding is UTF-8 (in the META tag) ... we detect
> that and take it at its word. Could that be related? How do you generate the
> source for this file? 
> Can you attach the source? 
> Thanks. 

I don't think there is a relation between the encoding of the file and tag
handling. I tried to change the charset attribute to "windows-1254" (Windows
Turkish), windows-1252, ISO8859-9, ISO8859-1 but nothing changed about the tags.

I didn't understand the "how do you generate" question. I simply select File->New
JSP in the J2EE perspective. 

Below is the complete source of the file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>content of lowercase title tags</title>


<tıtle>content of lowercase title tags (no diaresis over i)</tıtle>
<TITLE>content of uppercase tags</TITLE>
</head>
<body>

</body>
</html>
Comment 7 Tuncay Baskan CLA 2005-07-28 11:57:29 EDT
More information about Turkish 'I' problem can be found at:
http://blogs.msdn.com/deeptanshuv/archive/2004/09/04/225720.aspx
Comment 8 David Williams CLA 2006-04-03 12:06:30 EDT
Using the small sample from comment #6 in a JSP file, I see a warning on the 
<tıtle> 
tag. It says "unknown tag (t)".  Is that what you mean? That appears a little different from the attached JPEG, 
so I'm not exactly sure what's causing that difference, 
and I want to be sure I'm looking at it correctly. 
The other two title tags seem fine. 

Also, I tried the small sample in an HTML file and it did not generate warnings on any of the three versions of the title tag. Does that match your observations as well?  I'm using some Windows cp1252 setting, so I wanted to be sure I wasn't fooling myself. 

We do have code in places to handle the famous Turkish I, and if it's only a problem for the HTML-in-JSP case, I suspect there's something going wrong in the "translation" from JSP to HTML. 

Comment 9 Deniz Turkoglu CLA 2006-04-04 01:22:43 EDT
Hi,

The problem, afaik, resides in org.eclipse.wst.html.core_1.0.0.v200602062135.jar, which seems to handle the tags. The only thing I've noticed is that the tags are declared in uppercase only, so the bug seems to be caused by an obsolete toLowerCase() call that uses the current/default locale. I say obsolete because tag names are US/EN only, so a locale-sensitive conversion turns tags containing our famous I into invalid tags (ones with ı).

I understand the difficulty of tracing such a bug without a Turkish locale and keyboard, but it is obvious to anyone who has both.

To make things cleaner, see the code below

String test="INPUT";
System.out.println(test.toLowerCase());

which prints "ınput" on my computer, whereas it should be "input" when it comes to HTML.
Comment 10 David Williams CLA 2006-04-04 04:10:23 EDT
I am well familiar with the general problem; it is well documented and I think much more commonly known than you realize -- at least among those of us who work with text for a living. 

And I think that makes you miss my question ... I do not see the problem described here with a pure and plain HTML file. Do you?  

I am thinking it's isolated to the JSP case, and was hoping to get some help narrowing it down between the HTML file case and the JSP file case. 





Comment 11 Deniz Turkoglu CLA 2006-04-04 04:47:06 EDT
It's indeed isolated to the JSP case, and yes, it's a plain HTML autocomplete problem. Eclipse WTP does support autocomplete for HTML tags, but when it's used with a tag containing the "I" character in the Turkish locale, it lowercases it to an "i" without a dot. In my opinion it's not in any way related to JSP.
Comment 12 David Williams CLA 2006-04-04 04:59:46 EDT
Ok, now I'm getting more confused :) 
Is it a "validation" problem, or a "content assist" problem? Or both? 

If it's a validation problem, please attach a zipped-up JSP file that shows the problem, and, if I am reading your comment correctly, please attach a zipped-up HTML file that shows it's not a problem there. 

If it's a content assist problem, can you help by describing how I might reproduce the problem with a non-Turkish keyboard? If there is no way ... I'll have to dig around for one, I guess, but that would take longer. 

If it is a content assist problem, is the problem with what the proposals are? Or is it a problem with what happens after the text is inserted, from selecting one of the proposals? 


Comment 13 Deniz Turkoglu CLA 2006-04-04 06:10:58 EDT
It's a content assist problem leading to a validation problem (which occurs when the assisted content is modified by the user). If I press the "i" key, it proposes input, iframe, etc., but spelled with the letter I (the small one, i without a dot; Bugzilla encodes the small I...). Please check the screenshot. To reproduce the bug you should not need a Turkish keyboard: just change all your locale settings to Turkish, press i, and see what it proposes (tags starting with the small I). When you select one of those, Eclipse says all is fine, but when you correct the small "I" to "i", it reports a validation error. To make things crystal clear (hopefully), I am attaching some files/screenshots I created; I hope they help. (See the content assist and the title tag being marked invalid although it is totally valid.)
Comment 14 Deniz Turkoglu CLA 2006-04-04 06:15:33 EDT
Created attachment 37604 [details]
Content assist and invalidation at title tag

See the content assist and validation bug at title tag
Comment 15 Deniz Turkoglu CLA 2006-04-04 06:19:32 EDT
Created attachment 37605 [details]
File that is validated by Eclipse
Comment 16 Deniz Turkoglu CLA 2006-04-04 06:20:45 EDT
Created attachment 37606 [details]
File with invalid tags (according to WTP)
Comment 17 David Williams CLA 2006-04-04 16:16:26 EDT
Created attachment 37663 [details]
a zip file containng one tiny tet.jsp with turkish i's in UTF-8 encoding

Thanks for all the info. But, I am having difficulty reproducing any problem of the types you mention. Can you please attach to this bugzilla your "configuration" file .. that lists plugins installed, system encoding, version of Java used, etc. 

The JSP file I attached is a version produced on my system. I'd be interested in what you see if you unzip it and import it into an existing web project. 

BTW, I attached as (binary)zip, to make sure that attaching it to bugzilla did not change any of the characters. For future reference, I think that's the best (only?) way to exchange files with encoding issues. 

Gee ... Naci should have shown me the problem at EclipseCon :)
Comment 18 Deniz Turkoglu CLA 2006-04-05 08:41:47 EDT
Created attachment 37726 [details]
Pack including screenshot of tet.jsp from former attachment and configuration file

I am attaching the files that you asked for. I checked out the latest CVS last night and saw a comment related to the very bug we are on; can you please check ElementNodeCleanupHandler.java, line 175, for the comment I'm talking about? Btw, I really wonder what Eclipse says about the former attachment you made (tet.jsp).

As for Mr. Naci not showing you the bug, I showed it to him right after he came back from EclipseCon and he made me submit it (well, it was already submitted).
Comment 19 David Williams CLA 2006-04-05 21:26:15 EDT
The screenshot in comment #18 is very interesting. 
On my system, there are no warnings. 
On my system, I would have expected a warning on the one case that had 
<tıtle>
but that is the one case that does not show an error on your system?! 

So, that makes me wonder ... you did import that into eclipse with eclipse file import ... right? (And not some copy/paste function, using some other viewer/editor?) 

Let's get down to the bits and bytes ... 
if you look at that tet.jsp file on your system with a binary editor/viewer, does it show hex 69 for the lower case 'i's?  (it does on mine, that should be what is in the zip file version). 
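
For readers without a binary viewer at hand, the byte values in question can be checked with a short Java sketch. It assumes the file is UTF-8 encoded, as the attachment title states; the class and method names (DottedIBytes, utf8Hex) are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class DottedIBytes {
    // Hypothetical helper: renders the UTF-8 bytes of a string as
    // space-separated upper-case hex pairs.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Locale.ROOT keeps the formatting independent of the default locale.
            sb.append(String.format(Locale.ROOT, "%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("ASCII i        : " + utf8Hex("i"));      // 69
        System.out.println("dotless i      : " + utf8Hex("\u0131")); // C4 B1
        System.out.println("dotted capital : " + utf8Hex("\u0130")); // C4 B0
    }
}
```

A plain lowercase 'i' is the single byte 69, while the Turkish dotless i and dotted capital I each take two bytes in UTF-8, so a hex dump distinguishes them at a glance.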

I guess I'm wondering if there is some step in the process, on your system, that is converting those lowercase (english) i's to something else. 
Comment 20 David Williams CLA 2006-04-05 21:27:45 EDT
As another suggestion to help rule out "odd" cases ... have you tried this with a completely fresh version of Eclipse and WTP, downloaded from Eclipse.org, and with a completely new and fresh workspace? 
Comment 21 David Williams CLA 2006-04-06 00:12:23 EDT
Ok, FYI, I have been able to finally reproduce this now. For me, the way I could reproduce was to set -Dosgi.nl=tr_TR when I launch eclipse. 

Here are the ways that didn't work 
(which I list here just for future debugging reminders): 

 1. using -nl=tr_TR  as an eclipse application arg (normally that picks up "turkish" translations). 

 2. using -Dfile.encoding=ISO-8859-9

 3. and even changing my "windows regional settings" to "Turkey" did not work. 

Not sure why this last one would not have worked, but there might have been something cached in Eclipse that it remembered some previous locale setting. 

But, now that I see it myself, I believe you :) and will be able to debug. 
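
Outside of Eclipse, the same effect can be reproduced in plain Java by switching the JVM default locale, which is one way a JUnit-style check could be written. This is a sketch under that assumption; it does not launch Eclipse or use -Dosgi.nl, and the names (DefaultLocaleRepro, matchesDefaultLocale, runUnderLocale) are hypothetical.

```java
import java.util.Locale;

public class DefaultLocaleRepro {
    // Hypothetical stand-in for a lookup that canonicalizes the typed tag name
    // with the default locale and compares it to the declared upper-case name.
    public static boolean matchesDefaultLocale(String typed, String declared) {
        return typed.toUpperCase().equals(declared);
    }

    // Runs the lookup with the given default locale, then restores the old one.
    public static boolean runUnderLocale(Locale locale, String typed, String declared) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(locale);
            return matchesDefaultLocale(typed, declared);
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println("title   under tr-TR: " + runUnderLocale(turkish, "title", "TITLE"));
        System.out.println("TITLE   under tr-TR: " + runUnderLocale(turkish, "TITLE", "TITLE"));
        System.out.println("t\u0131tle under tr-TR: " + runUnderLocale(turkish, "t\u0131tle", "TITLE"));
        System.out.println("title   under en-US: " + runUnderLocale(Locale.US, "title", "TITLE"));
    }
}
```

Under the Turkish default only the lowercase ASCII spelling fails to match, which lines up with the screenshot in comment #4 where <tıtle> and <TITLE> were accepted and <title> was not.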

Comment 22 Deniz Turkoglu CLA 2006-04-06 01:35:34 EDT
You've been busy while I was sleeping :) Anyway, I'm glad you've managed to reproduce the bug, please let me know if you need further assistance.
Comment 23 David Williams CLA 2006-12-20 03:27:42 EST
Fix released for 1.5.3 and 2.0. 

The fix was in TolerantStringDualMap and several other classes (4 places altogether: for HTML and CHTML, for the content model itself, and some content model "lookup" methods). 

I changed 

private String makeCanonicalForm(String raw) {
    return raw.toUpperCase();
}

to the following


private String makeCanonicalForm(String raw) {
    // see https://bugs.eclipse.org/bugs/show_bug.cgi?id=100152
    // we are able to "cheat" here a little and use US Locale 
    // to get a good canonical form, since we are using this only 
    // for HTML and JSP standard tags. 
    // Long term, for similar needs with XML 1.1 (for example)
    // we should use a class such as com.ibm.icu.text.Normalizer
    return raw.toUpperCase(Locale.US);
}
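
To show the effect of that change in isolation, here is a minimal, self-contained sketch of a case-insensitive tag lookup built around the same makeCanonicalForm idea. It is not the real TolerantStringDualMap; TagRegistry and its methods are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TagRegistry {
    // Maps the canonical (upper-case, US locale) name to the declared name.
    private final Map<String, String> byCanonicalName = new HashMap<>();

    // Same idea as the fix: canonicalize with a fixed locale so the result
    // does not depend on the JVM default locale.
    private static String makeCanonicalForm(String raw) {
        return raw.toUpperCase(Locale.US);
    }

    public void declare(String tagName) {
        byCanonicalName.put(makeCanonicalForm(tagName), tagName);
    }

    public boolean isKnown(String tagName) {
        return byCanonicalName.containsKey(makeCanonicalForm(tagName));
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a Turkish environment for the duration of the check.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            TagRegistry registry = new TagRegistry();
            registry.declare("TITLE");
            System.out.println("title known: " + registry.isKnown("title"));
            System.out.println("body known: " + registry.isKnown("body"));
        } finally {
            Locale.setDefault(saved);
        }
    }
}
```

Because both the declaration and the lookup go through the same fixed-locale canonical form, the lowercase ASCII spelling resolves even when the default locale is Turkish.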
Comment 24 David Williams CLA 2006-12-20 03:29:44 EST
*** Bug 167673 has been marked as a duplicate of this bug. ***
Comment 25 Amy Wu CLA 2007-01-05 17:13:20 EST
*** Bug 166905 has been marked as a duplicate of this bug. ***
Comment 26 Amy Wu CLA 2007-01-05 17:14:12 EST
*** Bug 166902 has been marked as a duplicate of this bug. ***
Comment 27 Amy Wu CLA 2007-02-07 16:17:30 EST
verified using WTP1.5.3 200702060621 that the invalid warnings regarding the dotted 'i' no longer show up

remaining similar bugs:
bug 170267
bug 166902