Community
Participate
Working Groups
Bugzilla – Bug 150794
JSP quote escaping as per JSP rules in tag attributes are reported as ERRORS.
Last modified: 2009-12-14 14:45:46 EST
The JSP 2.0 standard states that quotes in JSP espressions used in tag attributes must be escaped with a backslash. This is silly, unnecessary and counter-intuitive but it is the standard... Eclipse reports errors for constructs like: <my:tag errorimage="<%= crm_cities.error.getErrorPicture(\"cit_local_area_code\") %>" /> The error reported is: Syntax error on token "Invalid Character", ) expected "Fixing" the code by removing the backslashes before the quotes removes the error message.
An extra comment: this may not look like a big deal but it renders the validators useless for my project: I have more than 4000 errors reported by this bug; my only option is to disable all validation..
I think I have found a way to fix this. The class JSPTranslator takes the regions discovered by the JSP parser, and the region is nothing more than the embedded string. Currently the region gets passed to translateEmbeddedJSPInAttribute() regardless of the "location" of the JSP expression. This translate then adds the expression to the generated source for the JSP servlet verbatim in translateExpressionString(), adding only the write to out around it. What I can do is to make translateEmbeddedJSPInAttribute determine the source of the expression, and if it is a double or single quoted attribute I could "dequote" the string before adding it to the source segment. The only thing I need to find out is how to handle the correlation from "source positions" to "destination positions" but it looks like that is fixable.. I'll check to see if this works and post a patch if it does.
Created attachment 46689 [details] Fixes quote escaping in JSP Translator This fixes JSPTranslator to handle JSP quoting in tag library attributes that contain JSP expressions. This at least fixes (most) validation errors (gone from 5000 to 800 errors). I need to check for other places where this unquoting is to be done too; I'll do that later on. Although this fixes the validation the syntax highlighting is still in trouble. In addition since I could not find a way to generate an error from the translator (the entire code never encounters any!?) I cannot report quoting errors properly; the places are indicated in the code though.
Much appreciated, your analysis sounds correct to me. I havne't had a chance to step through the patch, only to read it, but why the changes in the "-1763,7 +1818,7" area?
(In reply to comment #4) > why the changes in the "-1763,7 +1818,7" area? This is in the "appendToBuffer()" call which adds a translated piece to the output buffer AND updates the source <-> destination line:column number mapping. The patch there prevents it from adding a newline to every segment it receives (I added a parameter and an extra function overload to effect that without effects in the rest of the code). The new unquote code must split a string into parts for every escape sequence encountered (it must remove the escape) and must call appendBuffer() for every part; this is because characters are removed from the string output so so output length != input length, and the source position of characters must be correct or error messages report an incorrect location.
The current patch fixes the JSP attribute case but goes wrong for non-JSP attributes (i.e. attributes of normal html tags that contain a JSP expression). I'll look into that but I could use some info (the code is depressingly large ;-): i need to fix some other code too. Notably: - Syntax highlighting is off too around quoted expressions. Where can I start to find the highlighting code? - there seems to be another thing "validating" JSP. Although the validator reports 0 errors i have a "squiggle" in something like <navi:out val="hello \"world\" "> at the position of the last escaped quote. Can anyone tell me where that validator is?
(In reply to comment #6) > - Syntax highlighting is off too around quoted expressions. Where can I start > to find the highlighting code? First we need to be sure that the correct line style provider is being used. The contents of the expression should have a partition type of org.eclipse.jst.jsp.core.internal.provisional.text.IJSPPartitionTypes.JSP_CONTENT_JAVA. You can verify this by turning on the org.eclipse.wst.sse.ui/actioncontributor/debugstatusfields debug option. The org.eclipse.jst.jsp.ui.StructuredTextViewerConfigurationJSP.getLineStyleProviders(ISourceViewer, String) method is where the various providers get mapped against the partition types, and it's entirely possible that that provider's not working off of the translated source (we've added the sources verbatim until now so it hasn't been necessary). > - there seems to be another thing "validating" JSP. Although the validator > reports 0 errors i have a "squiggle" in something like <navi:out val="hello > \"world\" "> at the position of the last escaped quote. Can anyone tell me > where that validator is? What's the error message? There are actually several validators that may be running in addition to org.eclipse.jst.jsp.core.internal.validation.JSPJavaValidator (the one that takes the translated source and runs it through ecj). I appreciate your continued efforts, Frits.
Sorry, I forgot to add that after turning on the debug option, swipe the interesting area of the file you have open and double-click on the status bar's selection range ([nn-nnn]) field.
I found another problem in the JSP lexer: it does not differentiate between JSP tags/attributes and "common" attributes (HTML ones). The HTML ones do not have quoting rules and having 2 quotes there is an error, but JSP attributes (even without embedded JSP expressions) have. To try to fix this I need to regenerate the JSPTokenizer but it appears you have used a patched, specific version of JFlex to generate it. I checked out the entire webtools repository but I cannot find the patched version (which should be present in something called JFlex/lib/sed-jflex.jar)... I would need this file to allow me to continue. In the meantime I'll go looking to the syntax highlighting thing.
I checked the partition type using the method you spec'd and got org.eclipse.jst.jsp.SCRIPT.JAVA which is different from what you stated? I'll go looking into the line style providers thingy. Thanks for helping, Nitin! It's fun working this code but without some help it's hard!
(In reply to comment #9) We don't use a patched version of JFlex, ever. The XML and JSP scanners are generated using JFlex 1.2.2 using a different *skeleton* file found at org.eclipse.wst.sse.core/DevTimeSupport/SedModel/HTMLTokenizer/devel/skeleton.sse. CSS and DTD can be generated using the latest JFlex versions as they don't manipulate the internal structures at runtime; XML and JSP do so to skip over sections of text that they're not supposed to parse. JFlex's author mentioned that 1.2.2 would remain available from http://jflex.de/old/jflex-1.2.2.tar.gz on the jflex-users mailing list. What did you mean by having two quotes is an error for HTML tags? Oh, and org.eclipse.jst.jsp.core.internal.provisional.text.IJSPPartitionTypes.JSP_CONTENT_JAVA is the constant holding the partition value for the Java code sections, not the value itself.
(In reply to comment #11) > (In reply to comment #9) > We don't use a patched version of JFlex, ever. The XML and JSP scanners are > generated using JFlex 1.2.2 using a different *skeleton* file found at > org.eclipse.wst.sse.core/DevTimeSupport/SedModel/HTMLTokenizer/devel/skeleton.sse. I know about that; but if you look in that directory at the jflex.sh script and the accompanying README it states that you use a patched jflex and it refers to the patched version as 'sed-jflex.jar' which is not the .jar that's present in the binary distribution. I have no problem using the "stock" jflex but regarding the comments in the file I don't know what will break... If you are sure that a stock version will work I'll use that one. > What did you mean by having two quotes is an error for HTML tags? In pure HTML there is (afaik) no way to escape quotes *except* by using entities or embedding the "other" kind of quote. So something like: <a ... title="The \"real\" world"> is an error in html (an attribute with the name [title] and the value [The \] followed by an attribute [world] and an error on the ["] after it. A proper way to handle this in html would either be: <a ... title='The "real" world'> (using the "other" quote) or <a ... title="The "real" world"> (using entities) However if the attribute is part of a JSP-recognised construct (anything that is not defined as "template text" in the standard) then the escaping rules apply. That is: for all tags in the page that are *known* to be parts of a declared tag library (which are not all tags that have a name space!), all JSP directives, EL expressions etc etc. This means we have to distinguish between 'JSP' attribute values and 'non-JSP' attribute values in the lexer. Which needs context/state information about what kind of tag the attribute occurs in. > > Oh, and > org.eclipse.jst.jsp.core.internal.provisional.text.IJSPPartitionTypes.JSP_CONTENT_JAVA > is the constant holding the partition value for the Java code sections, not the > value itself. I don't really know what you mean by this ;-) but I'll look into that further.
Auch. I just checked the syntax highlighting stuff and it all comes back to the same thing. The segment is indeed of the type you indicate (but from IJSPPartitions now), but _that type is wrong for escaped content_! It uses the classes JavaCodeScanner and LineStyleProviderForJava to tokenize and colorize the fragment (region? What's the difference between a partition and a region? Is there any internal docs on the design of the structured editors?). Problem 1 is that there seems to be an initialization sequence error in LineStyleProviderForJava: it calls the tokenizer (JavaCodeScanner) before it is initialized (it's ruleset is null). This is because a call "loadColors()" is called lazily and too late. This might cause odd errors in coloring until the widget is redrawn. The bigger problem is that we'd probably need a new "partition type" for JSP code in attributes because the Java code scanner will never work: it sees a string like: Test.check(\"hello\\\"world\\\"\") which is a properly escaped JSP expression scriptlet value (<%= %> within a JSP tag) but in no way Java (the Java version would of course be Test.check("Hello \"world\"") Is this what you would expect, and is this a kind of change you'd accept from an "outsider"?
(In reply to comment #13) > Auch. I just checked the syntax highlighting stuff and it all comes back to the > same thing. The segment is indeed of the type you indicate (but from > IJSPPartitions now), but _that type is wrong for escaped content_! It uses the > classes JavaCodeScanner and LineStyleProviderForJava to tokenize and colorize > the fragment (region? What's the difference between a partition and a region? > Is there any internal docs on the design of the structured editors?). Any docs we have are already out there (http://eclipse.org/webtools/wst/components/sse/overview.html), although I'll admit they're pretty brief when it comes to the non-UI parts. Partitions are the standard way that the text framework breaks up a document into different kinds of content. You'll see that the IJavaPartitions interface mentions six partition types, each potentially mapping a different implementation of a particular function to that area (what will provide syntax coloring, for one). The regions you're seeing in the JSP UI code are part of our model for text and tend to divide things more granularly, almost to the syntax token level. I think that the correct solution would start with the JSP translator. It would need to "unescape" the JSP attribute value properly in order to generate the correct Java source, and that converted Java source would ideally drive the syntax coloring for Java source. The comments in the org.eclipse.jst.jsp.ui.internal.style.java.LineStyleProviderForJava.prepareRegions() method indicates that this was at least considered of before. As for recognizing which tags are custom tags or not, the JSPTranslator could keep a tally of the custom tag prefixes as it iterates through the document regions. This would let it treat tags with names beginning with that prefix differently than other tags, including those escaped attribute values. But then the LineStyleProvider could probably just ask the TaglibController for the TLDCMDocumentManager for the current document and ask it for the prefixes at that offset directly, using that to help prepare the Java source correctly for the Java code scanner (or use a different scanner). The sed-jflex.jar file that was mentioned was just a straight recompile of the original JFlex 1.2.2 sources with JDK 1.3. Using the 1.2.2 jar as-is with the SSE skeleton will generate identical output, in fact the goal is that the skeleton plus the specification file generate the .java exactly. Modifications to the .java source after generation are a no-no. Anyway, I appreciate your continued perseverance in solving this problem. As long as your contributions are legally clean, it's being within the project's scope and their technical merits as to whether they're accepted or not. The depth of your investigation has already distanced you from any labelling as an "outsider", even if the term had any meaning in an Open Source project. Thanks again.
Thanks for the input. I have debugged my way thru the code to get a grip on things so at least I know some of the terms now; and i found http://www.realsolve.co.uk/site/tech/jface-text.php which was helpful. Fixing the translator would not be too hard but using the translated source as a source for coloring would be hard, especially since the untranslated source can have errors that have no representation in the generated source. Besides the current Java syntax scanner within WTP is so basic that just adding another one for quotes should be easy. I surmise that the JDT's syntax highlighter is based on some kind of (pre)compiled model of the Java code since it can highlight semantics. That would be great to have but I would not like to do that using a translator; I'd prefer building an AST/(pre)compile of the JSP itself, that is way cleaner and better maintainable. That is a bit out of the scope of the current problem though and perhaps something I'd like to do once I got to know more of the current code (and the code for the JDT compiler/highlighter). Thanks for your time!
Created attachment 47808 [details] Fixes quote escaping for simple taglib attributes [1] This fixes quote escaping in simple taglib attributes, i.e. attributes that are pure value without embedded JSP expressions and the like. These get recognised in the lexer in a single rule. The patch adds a naive way to find out if a tag is a JSP tag (by checking if the name contains a ':' - I'll replace that later). If so then it uses a new Lexer state (ST_JSP_ATTRIBUTE_VALUE) instead of ST_XML_ATTRIBUTE_VALUE when it starts to parse for attribute values. In this state the lexer now recognises a new Lexer rrule which allows for escaped quotes (QuotedAttValue instead of AttValue). The rest of the lexer is only patched to allow for the new lexer state everywhere where the other state is accepted.
Is it planned to integrate this patch into WTP 2.0? This bug is really nasty and makes validation in big projects almost unusable. This patch exists for quite a long time and can't wait for it to be integrated into the official WTP release.
I too found this problem. Tomcat is ok with the "wrong" syntax accepted by the JSP editor, but WebSphere is not. If validation worked correctly, it could outline the problems we encouter with WebSphere when we do not quote tag attribute values correctly. Please note that, as of now, the JSP validation is not compliant with the JSP specification: in 1.2 specification, for instance, in chapter 2.6 it is clearly said which should be the correct quoting and escaping conventions. Mauro.
Created attachment 75399 [details] Part 1 of patchset 2
Created attachment 75400 [details] Part 2 of patchset 2
Patchset 2 separates the two main kinds of attributes: - attributes in non-JSP tags (literal text). These attributes must obey Java escape rules only, i.e. they are quoted-once. - Attributes in JSP tags. These need double quote escaping. To handle this I added two new content regions to separate between the types JSP and XML tag. The XML variants are XML_TAG_ATTRIBUTE_VALUE_DQUOTE and XML_TAG_ATTRIBUTE_VALUE_SQUOTE as they were; the new ones are JSP_TAG_ATTRIBUTE_VALUE_DQUOTE and JSP_TAG_ATTRIBUTE_VALUE_SQUOTE which are new. The JSPTranslator uses the quote type to determine how and whether to unquote. This gets rid of the errors in both bases but it has a number of problems left: - I dont know how to generate error messages for badly quoted content.. If you misquote content it may appear in the generated java code as correct even though the JSP code is incorrect. The JSPTranslator should be able to report errors. - Syntax highlighting is not yet fixed.
Created attachment 75403 [details] Part 3 of patchset 2: syntax highlighting of double-quoted strings Ok; this should fix the syntax highlighting. I added an IPredicateRule instance which scans for double escaped strings. The rule is then added to the JavaCodeScanner so that it gets recognised; it returns a string token then. I also moved the initialization for LineStyleProviderForJava, the call to loadColors(), to its constructor because the rules were not initialized: prepareRegions was called before anything caused loadColors to be called. This meant that the ruleset was empty when prepareRegions tried to lex using those rules, causing only unknown tokens.
Comment on attachment 75403 [details] Part 3 of patchset 2: syntax highlighting of double-quoted strings Please reattach with a license on DoubleQuotedStringRule.java
Now that I'm reviewing the patch, I have a few questions. Hopefully you're willing to answer them. As mentioned in comment 23, with DoubleQuotedStringRule being a new class, I need you to include the license and proper copyright statement (attributing yourself) at the top of the file. -In patch 1, you don't have to include the updated .java file as I can regenerate that. -I also found the addition of a new variant of JSPTranlator.appendToBuffer with an extra boolean curious--why add the new line skipping? -Is the comment on DOMJSPRegionContexts.JSP_TAG_ATTRIBUTE_VALUE_DQUOTE correct?
Oops. Sorry for the late response. The comment on JSP_TAG_ATTRIBUTE_VALUE_DQUOTE seems correct. Both JSP_TAG_ATTRIBUTE_VALUE_DQUOTE and JSP_TAG_ATTRIBUTE_VALUE_SQUOTE are used for double escaping, but the first one handles the case where " (double-quote) is the one which needs double escaping and the latter does the same for ' (single-quote). The extra parameter to appendBuffer() is explained in comment #5; if that explanation is insufficient let me know and I'll try to explain it differently. I'll go to work on the license thing and removing the generated file (need to get a workspace first).
Frits, any updates?
Sorry- I've just been buried in work lately 8-( I'll try to add the new patches sometime next week.
Ok; I created a new patch copyrighting the new file, and without the generated JSPTokenizer.java. I had to rebuild the tokenizer patch because my original patch failed to apply to it at the current version 8-(. I have passed it through the JFlex generator and the result compiles but I do not have the time to check the result currently; that'll have to wait till I have some free time again. The code does compile however and I have not seen changes that would break the new code. Hope this one works for you!
Created attachment 92255 [details] New version of the patch with proper copyright (I hope ;-)
I'm adding my preemptive PMC approval, assuming your note to PMC list was an implicit request. I think this long standing limitation is important to include. I've assumed you have, and will continue to test well.
IP Clarification (as requested): The patches added above by me (Frits Jalvingh) above are completely written by me. The only materials used were the original sources of the modules that were patched, as they existed near the time that the patch was posted in the public Eclipse WTP CVS repository. No other code was used to construe the patch, nor was any other code not present in the Eclipse WTP CVS repository used as reference material. I hereby release all rights to the patches to the Eclipse Foundation. The patches can be used by the Eclipse foundation and all WTP commiters as they see fit; the patches are released under the Eclipse EPL and can be used in any version of Eclipse, future or past. I am not prohibited from releasing this code to the Eclipse Foundation in any way; not by my employer or by other legal obligations; they are fully mine to give away. I hope this is enough to clear the patches.. If more is needed or if you need this in writing please let me know; I would love to have an indication of a proper format though - I'm not a lawyer thank $DEITY ;-)
Please include this fix in WTP 3.0!!! By experience, I fear that if it is not included in 3.0, it won't in any 3.x subrelease... so we would have to wait WTP 4.0 (another whole year!) to have this important fix in a stable release... Thank you! Mauro.
Pushing to RC2 since there is a known failure case in the new code that needs to be worked out.
? Can you let me know what's the problem? (In reply to comment #33) > Pushing to RC2 since there is a known failure case in the new code that needs > to be worked out. >
Frits, when I integrated the patch I ended up seeing the org.eclipse.jst.jsp.ui.tests.other.ScannerUnitTests.testEmbeddedTagInAttr() unit test fail while parsing an embedded container with a tag inside of it. The container ended up consuming the rest of the input instead of exiting to the expected next state after the closing quote. It's easier to tell if the test text is changed to: <a href="<jsp:getProperty/>"> <a/> lorem ips ...in which the href attribute's value as parsed runs to the end if the input, consuming the <a/> tag and following text content. The test class lives in the org.eclipse.jst.jsp.ui.tests plug-in in the sourceediting/tests folder of the webtools CVS repository (:pserver:anonymous@dev.eclipse.org/cvsroot/webtools), if you want to look into it. I was planning to do so for early 3.0.1, but any additional help would be appreciated.
*** Bug 247615 has been marked as a duplicate of this bug. ***
Another target milestone shift for this... very sad :-( Mauro.
Created attachment 123815 [details] Patch for 304 with fixes, junits, etc. Here's an updated patch for 304. I've added fixes to this to handle the case mentioned in comment 35 along with some junits. This patch is rather large because I've included the generated JSPTokenizer.
Thank you very much for your quality patches, Frits. I've released the latest patch to the Maintenance stream. I still have to merge the changes with the Head stream, however.
Released to Head.
Verified in M-3.0.4-20090212063209.
*** Bug 151065 has been marked as a duplicate of this bug. ***
*** Bug 104767 has been marked as a duplicate of this bug. ***
*** Bug 93879 has been marked as a duplicate of this bug. ***