Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 200207

Summary: [javadoc] Incorrect flagging of @see as malformed javadoc
Product: [Eclipse Project] JDT Reporter: John Arthorne <john.arthorne>
Component: CoreAssignee: Jay Arthanareeswaran <jarthana>
Status: CLOSED WONTFIX QA Contact:
Severity: normal    
Priority: P3 CC: frederic_fusier, jarthana, markus.kell.r, srikanth_sankaran, tjwatson
Version: 3.3   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard: stalebug
Attachments:
Description Flags
[proposed patch + test case] on top v_868
none
Proposed patch
none
Proposed patch none

Description John Arthorne CLA 2007-08-16 11:56:45 EDT
Build: 3.3 final

Set the "Malformed Javadoc" compiler warning to "Error". Create the following CU:

/**
 * @see "Alice
 *      in Wonderland"
 */
public class A {
}

The @see gets flagged as malformed javadoc, but it is properly formed (see reference in bug 200202).
Comment 1 Frederic Fusier CLA 2007-08-16 12:18:05 EDT
Scanner throws an InvalidInpuException for non-terminated strings, so we have to handle this specific test case in AbstractCommentParser.parseReference() method...
Comment 2 Eric Jodet CLA 2008-05-23 09:13:13 EDT
Created attachment 101724 [details]
[proposed patch + test case] on top v_868
Comment 3 Frederic Fusier CLA 2008-09-12 05:24:48 EDT
Comment on attachment 101724 [details]
[proposed patch + test case] on top v_868

The patch is not 100% complete.

Initial test case is not tested and unfortunately warns about a missing reference!

The following test case:
/**
 * @see "Unterminated
 * @see string"
 */
public class X {
}
shows two different warnings although I would expect only one...
Comment 4 Jay Arthanareeswaran CLA 2009-03-18 07:12:59 EDT
Created attachment 129193 [details]
Proposed patch

Also made one change to the test case JavadocBugsTest.testBug207765(). The test expects two different warning messages for the following cases.
/**
 * {@link "http://www.eclipse.org/}
 * 
 * @see "http://www.eclipse.org/
 */ 

While the first one gets "Invalid Reference", the second one gets "Javadoc: Invalid URL reference. Double quote the reference or use the href syntax". The patch has made the warning message consistent on both scenarios and hence the test case modified accordingly.
Comment 5 Frederic Fusier CLA 2009-03-19 14:24:48 EDT
(In reply to comment #4)
> Created an attachment (id=129193) [details]
> Proposed patch
> 
> Also made one change to the test case JavadocBugsTest.testBug207765(). The test
> expects two different warning messages for the following cases.
> /**
>  * {@link "http://www.eclipse.org/}
>  * 
>  * @see "http://www.eclipse.org/
>  */ 
> 
> While the first one gets "Invalid Reference", the second one gets "Javadoc:
> Invalid URL reference. Double quote the reference or use the href syntax". The
> patch has made the warning message consistent on both scenarios and hence the
> test case modified accordingly.
> 
Hmm, unfortunately while talking about this patch, I missed the fact that using a specific scanner for the JavadocParser would introduce a different behavior while scanning comment through the comment parsers hierarchy.
1) CompletionJavadocParser has its own CompletionScanner and this hierarchy cannot be changed, hence you would also need to subclasse the CompletionScanner and duplicating code.
2) ASTConverter uses a "pure" Scanner and uses it as the scanner of the DocCommentParser, hence the parse of DOM/AST comment will be slightly different than the compiler...

There's also another annoying point in this patch, I do not think the parser field in the comment scanner was a good idea. This is a design breakage as none of existing scanner knows their parser.
As the parser already points on its scanner it would make a circular reference which is never a good thing to have...

So, I'm afraid that for these two reasons, this patch won't be valid and that the solution should be applied locally in the AbsractCommentParser using a specific readStringLiteral() method...
Comment 6 Jay Arthanareeswaran CLA 2009-03-26 07:54:10 EDT
After another look at the problem, there can be more than one approach to solve this problem and I am listing them here and to some extent their benefits and shortcomings.

1. The approach taken in the second patch by me. As Fred mentioned, the fact that CompletionJavadocParser also needs to use the same code is one of the issue. My suggestion to over come that is to add code for parsing the string literal for code and comments in two different methods as follows.

public int scanStringLiteral() {
  // This would be the extraction from the '"' case of Scanner.getNextToken()
}

public int scanJavadocStringLiteral() {
  // The new code to parse string literals in javadoc comments to go here.
  // This will look similar to the code in CommentScanner in the previous patch
}

All the Scanner except the CompletionJavadocParser and AbstractCommentParser will continue to use the scanStringLiteral() method while these two can use scanJavadocStringLiteral with the help of something like this. This 

//this.scanner = new CompletionScanner(ClassFileConstants.JDK1_3);
  this.scanner = new CompletionScanner(ClassFileConstants.JDK1_3) {
	protected int scanString() throws InvalidInputException {
		return scanJavadocStringLiteral();
	}
};

However, the only question is with the DocCommentParser, which gets another instance of Scanner from AstConverter. Now the question is this done for any particular purpose. But what it does is, it makes it impossible for us to use a specific Scanner for DocCommentParser. Unless there is a particular reason, this looks like a design flaw to me.

2. Let the Scanner throw the exception on new line, as it is doing right now and catch the exception in the AbstractCommentParser and handle the specific non exceptional cases. My concern with this is, we will end up allowing an exception and then recovering, while the whole scenario is a valid one. Instead we should be looking at avoiding the exception itself.

3. The AbstractCommentParser can directly implement the functionality(scanJavadocStringLiteral) to completely handle the string literals in itself, of course with some help from the Scanner. This would mean that the the parse itself is taking up some responsibility of scanning from the Scanner. Just like the second approach the parser would be doing something that in first place should be done by the scanner.

Both the last two approaches pose a problem of having to keep the parser and scanner in sync, if we want to use the methods in scanner (such as reading Unicode, handling escape characters etc)

If we have to take the first approach, then it's more than a simple change. The other two appoached will also have some complex code, esp. to keep the scanner in sync with the parser.
Comment 7 Frederic Fusier CLA 2009-03-26 12:36:09 EDT
(In reply to comment #6)
> 1...

IMO, even you'll find a pretty solution to implement this new design, the changes look now too risky at this late stage of the 3.5 development. We can keep your thoughts about this in our mind for the next release. Perhaps open a new bug to track this work?

> 2. Let the Scanner throw the exception on new line, as it is doing right now
> and catch the exception in the AbstractCommentParser and handle the specific
> non exceptional cases. My concern with this is, we will end up allowing an
> exception and then recovering, while the whole scenario is a valid one. 
> Instead we should be looking at avoiding the exception itself.
> 
I agree on this point. Get the exception although this is a correct Javadoc syntax is somehow inefficient and should be fixed while fixing the original issue

> 3. The AbstractCommentParser can directly implement the
> functionality(scanJavadocStringLiteral) to completely handle the string
> literals in itself, of course with some help from the Scanner. This would mean
> that the the parse itself is taking up some responsibility of scanning from 
> the Scanner. Just like the second approach the parser would be doing 
> something that in first place should be done by the scanner.
> 
I disagree on this point. The original design intends to parse the comment character per character as the comment contents is never supposed to be valid Java code, hence should not need to use the scanner which is dedicated to read Java tokens. The use of the scanner was introduced to save time while parsing references as they are supposed to be valid Java single or qualified references. So, I do not think that changing this design is really a good thing to do and even if it would, as I said for 1), it's surely not the time to do this kind of change (a M1 milestone would definitely be better...).

> Both the last two approaches pose a problem of having to keep the parser and
> scanner in sync, if we want to use the methods in scanner (such as reading
> Unicode, handling escape characters etc)
> 
Unicode character is already handled by the parser in the readChar() method. The only additional point to do by the parser is to handle the escape characters...

> If we have to take the first approach, then it's more than a simple change. 
> The other two apporaches will also have some complex code, esp. to keep the 
> scanner in sync with the parser.
>
I do not think it's true. Keeping the scanner in synch with the parser (and vice-versa) is extremely easy and is already done on several places through the parser hierarchy. It's often enough to make the scanner current position equals to the parser index position (and vice-versa)... I agree that in certain circumstances, there's a little bit more to do (e.g. reset the parser current token type, initialize the scanner start position, etc.) but this is only fields initialization and never exceed two or three lines...
Comment 8 Jay Arthanareeswaran CLA 2009-03-30 12:44:44 EDT
Created attachment 130273 [details]
Proposed patch

This patch contains the fix with approach 3, with all the fix code in AbstractCommentParser.
Comment 9 Frederic Fusier CLA 2009-03-31 06:00:49 EDT
(In reply to comment #8)
> Created an attachment (id=130273) [details]
> Proposed patch
> 
> This patch contains the fix with approach 3, with all the fix code in
> AbstractCommentParser.
> 
Several points on this patch:

1) I don't think this is a good thing to change the readToken() method in ACP, it would be better to avoid to slow down all callers of this method although this peculiar behavior is only requested while parsing a reference. So, the search for the first character should be done inside parseReference() method, hence, the test for the @see or @link tag would not be necessary. Do not forget that additional work, hence slow down the compiler, should always be done only when necessary...

2) Method scanJavadocString():
a) I don't like the name, this is a parser and all methods start with 'parse' or 'read'. So, readTokenString() seems to be more appropriate
b) I disagree on InvalidInputException usage. This method neither need to throws it (callers do not expect an exception to occur) nor need to throw any (it's inefficient to throw an exception and catch it to make specific things). Catch the exception of the Scanner.scanEscapeCharacters(), continue until the end of the line and return an TerminalTokenERROR should be enough...
c) The algorithm is missing some test cases (see 4-b) below)

3) You forget that this issue also must be fixed while parsing href reference (e.g. @see <a href="http://www.eclipse.org">Eclipse</a>). So, the readTokenString() method should also be called from parseHref() method (do not forget that you also need to consume whitespaces before).

4) There are some points still not addressed with your patch:
a) The string in the DOM/AST node is not correct (you need to install the JDT/UI ASTView from http://www.eclipse.org/jdt/ui/update-site to show it...)
b) There are still some test cases where the compiler reports warnings although the Javadoc tool does not:

public interface X06 {

	/**
	 * @see "Example with "quotes" inside quotes"
	 * @see "Example with ""double-quotes"" inside quotes"
	 */
	void foo();
}

public interface X07 {

	/**
	 * @see "Example with "quotes" inside quotes" and text after
	 */
	void foo();
}

public interface X08 {

	/**
	 * @see "Example with "quotes" inside quotes"
	 * and text on line after"
	 */
	void foo();
}

Note that the Javadoc tool also reports an error on X07 but here, this is the compiler message which is not correct: it underline text'quotes" inside quotes" and text after' as unexpected text althouth the Javadoc tool only complain about an unterminated string...

It looks like the javadoc tool behavior is quite simple, when the first character after the @see tag is a quote, then read until the next tag or the end of the javadoc to see if the last character is a quote. Otherwise report an unterminated string error.

For inline tag, it looks for the closing brace instead of next tag or javadoc end...

public interface X09 {

	/**
	 * {@link "Example with "quotes" inside quotes"
	 * and text on line after"}
	 */
	void foo();
}

public interface X09b {

	/**
	 * {@link "Example with "quotes" inside quotes"
	 * and text on line after}
	 */
	void foo();
}

X09 is valid but not X09b which complains for:
warning - Tag @link: no final close quote: ""Example with "quotes" inside quotes"
 and text on line after"

Comment 10 Jay Arthanareeswaran CLA 2009-04-02 03:51:14 EDT
Fred,

On Point 3:
Javadoc behaves in a completely different way when it comes to href. Javadoc doesn't really care what goes inside the anchor tag. It only validates the '<'s and '>'s. For e.g. 

	/**
	 * (@link <a href=Hyperlink""@#{}Invalid
	 * >Invalid</a>} 
	 *
	 * @see <a href=www.eclipse.org>Eclipse</a>
	 */ 
public class InlineTags {}

Both the cases(one with multiple quotes and one without any quotes) are accepted by Javadoc. If we are fine with this behavior, then we should change parseHref in such a way that it doesn't look for a StringLiteral token. I was wondering if you are aware of any javadoc bugs reported on this?

On point 4:
There is least one test case (JavadocBugsTest.testBug76324string) that failed inline tag {@link "invalid string""} and there were also test cases that would allow quote as escape sequence. Putting these together, I guess, our understanding of this particular scenario has been wrong all along. And hence this behavior and the involved test cases will have to be corrected as well.
Comment 11 Jay Arthanareeswaran CLA 2009-04-02 03:58:02 EDT
(In reply to comment #10)
> Fred,
> 
> On Point 3:
> 
By the way, following is the W3C specification on anchor tags and attributes. Though it's recommended to have attribute values (such as href) with single/double quotes, one can also use attributes without quotes as well. So, even if we want to warn the users about missing quotes, we should still allow single quotes. And this means that the current parseHref that looks for TokenNameStringLiteral, which we know will allow only double quotes, will have to be corrected one way or the other.
Comment 12 Jay Arthanareeswaran CLA 2009-04-02 03:59:41 EDT
And here is the link on HTML spec.

http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2
Comment 13 Frederic Fusier CLA 2009-04-02 04:29:37 EDT
(In reply to comment #10)
> Fred,
> 
> On Point 3:
> Javadoc behaves in a completely different way when it comes to href. Javadoc
> doesn't really care what goes inside the anchor tag. It only validates the '<'s
> and '>'s. For e.g. 
> 
>         /**
>          * (@link <a href=Hyperlink""@#{}Invalid
>          * >Invalid</a>} 
>          *
>          * @see <a href=www.eclipse.org>Eclipse</a>
>          */ 
> public class InlineTags {}
> 
> Both the cases(one with multiple quotes and one without any quotes) are
> accepted by Javadoc. If we are fine with this behavior, then we should change
> parseHref in such a way that it doesn't look for a StringLiteral token. I was
> wondering if you are aware of any javadoc bugs reported on this?
> 
No, I'm not aware of bugs around this. I guess this is because this kind of reference is seldom used...

When I implemented this method, I initially thought that it was a good thing to be smarter than the Javadoc tool, but after time I realized that the specification was not enough precise to be sure of what is the best behavior and also that users rarely wants the compiler to report warnings that Javadoc tool does not...

So, the goal now is to behaves the closest as the Javadoc tool does (and not to write an HTML verifier). If its verification is minimal for href references, then I think we need to do the same, otherwise we would need to introduce a new preference to relax the extra verification done for this kind of reference.

However, be careful: it's not because the Javadoc tool does not raise any warning that it generates a valid href link. I agree that the string does not seem to be necessary, hence can be removed from the parseHref() method, but the 'a' tag name, 'href' and '=' sequence is necessary to build a valid link in the generated html page. So, this verification still needs to be done as it could be helpful for users to be warned that the written reference is not correct (confirmed by the fact that nobody complains about the current behavior)...

> On point 4:
> There is least one test case (JavadocBugsTest.testBug76324string) that failed
> inline tag {@link "invalid string""} and there were also test cases that would
> allow quote as escape sequence. Putting these together, I guess, our
> understanding of this particular scenario has been wrong all along. And hence
> this behavior and the involved test cases will have to be corrected as well.
> 
Yes, I missed this peculiar behavior since day 1. So, all existing test cases using string references surely need to be modified accordingly...
Comment 14 Jay Arthanareeswaran CLA 2009-04-02 06:25:00 EDT
(In reply to comment #7)
> (In reply to comment #6)
> > 1...
> 
> IMO, even you'll find a pretty solution to implement this new design, the
> changes look now too risky at this late stage of the 3.5 development. We can
> keep your thoughts about this in our mind for the next release.

As the ongoing investigation is still uncovering more scenarios and corner cases that need to addressed, I am inclined to agree that it is beginning to look to risky at this stage. Deferring to post 3.5.
Comment 15 Eclipse Genie CLA 2020-02-14 20:31:32 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug.

If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.