Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 189695

Summary: move stripping of RSS induced tags into core
Product: z_Archived Reporter: Steffen Pingel <steffen.pingel>
Component: MylynAssignee: Steffen Pingel <steffen.pingel>
Status: RESOLVED FIXED QA Contact:
Severity: minor    
Priority: P3 CC: mik.kersten
Version: unspecified   
Target Milestone: 2.3   
Hardware: PC   
OS: Linux   
Whiteboard:
Bug Depends on:    
Bug Blocks: 190852    
Attachments:
Description Flags
mylyn/context/zip
none
test case showing problem with last changes none

Description Steffen Pingel CLA 2007-05-29 11:13:02 EDT
JiraTaskDataHandler uses HTML2TextReader to convert HTML returned by the RSS reader to text. This conversion is specific to the RSS feed and should be done in RssContentHandler. 

See also bug 186897 comment 1.
Comment 1 Eugene Kuleshov CLA 2007-05-29 11:51:07 EDT
I am not sure about this because we may need an original html (i.e. for the rich editor)
Comment 2 Steffen Pingel CLA 2007-05-29 12:03:38 EDT
I assumed that the conversion is only needed to reverse the escaping required for the RSS feed. It should not strip any intentionally added HTML. We need test cases for that.
Comment 3 Eugene Kuleshov CLA 2007-05-29 12:09:12 EDT
Hmm. Maybe you are right
Comment 4 Steffen Pingel CLA 2007-05-29 12:19:35 EDT
I haven't checked but I guess that descriptions returned by the SOAP service would not need any conversion.
Comment 5 Mik Kersten CLA 2007-05-30 07:55:38 EDT
Btw, when I added that HTML2TextReader I was wondering if it should go somewhere more generic than JIRA.  It shold be coming from Platform but they have several copies of it as well and none in a good spot.

This class can't be moved into any core plug-in though, because it unfortunately depends on SWT and JFace.
Comment 6 Steffen Pingel CLA 2007-05-30 11:08:44 EDT
I am wondering if it really the right class for us to use since the goal is not to strip HTML but to do entity conversion.
Comment 7 Mik Kersten CLA 2007-05-30 14:18:06 EDT
Yes, I had this same thought when adding this functionality and knew it was going into the wrong place (I may have noted that in the code).  This is a presentation level thing, e.g. at some point we might want to do things like presenting the HTML markup as StyledText.

So what if we make the task editor strip the HTML when presenting?  That way we could also have a mode for the task editor where we get it to use the browser widget for rendering comments, and maybe later StyledText (so that we can continue to support proper Java Stack Trace links).
Comment 8 Steffen Pingel CLA 2008-01-02 17:40:18 EST
I'll switch from HTML2TextReader to using the commons-lang library (bug 208073). This should also address bug 208600.
Comment 9 Steffen Pingel CLA 2008-01-11 20:57:19 EST
Done.
Comment 10 Steffen Pingel CLA 2008-01-11 20:57:23 EST
Created attachment 86744 [details]
mylyn/context/zip
Comment 11 Eugene Kuleshov CLA 2008-01-12 14:37:01 EST
Steffen, I don't think that changes made for this issue are really good idea, unless you have something else in mind. For instance, the following method in the RssContentHandler:

	private String getCurrentElementTextEscapeHtml() {
		String unescaped = currentElementText.toString();
		unescaped = unescaped.replaceAll("\n", "");
		unescaped = unescaped.replaceAll("<br/>", "\n");
		unescaped = unescaped.replaceAll("&nbsp;", " ");
		unescaped = StringEscapeUtils.unescapeXml(unescaped);
		return unescaped;
	}

is changing text retrieved from the rss, so now we don't anymore have rendered html text from the jira, not original text. So it doesn't seem like change you made is playing in a favor for using jira headless api for working with jira tasks.
Comment 12 Steffen Pingel CLA 2008-01-12 15:19:41 EST
Please review my test cases.
Comment 13 Eugene Kuleshov CLA 2008-01-12 15:54:30 EST
The only test case I see in the task context is testUpdateIssueNonAscii() and without additional explanation I don't see how it is relevant to html rendering
Comment 14 Steffen Pingel CLA 2008-01-12 16:02:18 EST
Please provide a test case that demonstrates how my changes broke your API expectations.
Comment 15 Eugene Kuleshov CLA 2008-01-12 18:32:24 EST
Steffen, please use your common sense. The changes you've made don't allow to have html representation from the server and neither original representation. Before your change JIRA client allowed to retrieve at least html representation, so this is practically a regression (loss of the feature).
Comment 16 Steffen Pingel CLA 2008-01-12 18:47:20 EST
I am not sure you understand the escaping used by the RSS feed that is returned by the server. That is why I asked you to review my test cases. The code you quoted in comment 11 simply unescapes the RSS feed restoring the HTML returned by the server. Once again, please provide a test case that demonstrates that the HTML is not restored correctly. Before my changes the HTML was converted to some garbled representation that was intermingled with StyledText markup (see bug   190852). 
Comment 17 Eugene Kuleshov CLA 2008-01-12 20:35:10 EST
Created attachment 86767 [details]
test case showing problem with last changes

Steffen, I don't know if you are trying to give me the hard time or really don't see the obvious issue. 

Here is the failed test case as you requested. It shows rss reader does not produce exactly formatted html and certain things are lost during parsing, for example line breaks inside <pre> tags.
Comment 18 Steffen Pingel CLA 2008-01-12 21:51:30 EST
The escaping used by the RSS feed is different depending on whether Wiki markup is enabled or not. Here is the XML for http://mylyn.eclipse.org/jira-enterprise-3.9/browse/SCRATCH-194:

<description>line 1

&lt;p&gt;line 3&lt;/p&gt;</description>
                <environment>line 1
&lt;br/&gt;

&lt;br/&gt;
line 3</environment>

Markup is enabled for description but not for environment. The current JIRA connector implementation only works correctly if markup is not enabled. That is known and tracked on bug 190852.
Comment 19 Eugene Kuleshov CLA 2008-01-13 00:15:50 EST
Steffen, the test case I created per your request is showing parsing of the rss returned by jira with markup enabled. So I am still interested to hear an explanation why you think it is a good idea to modify data from the server in a way that can't be reversed (that is breaking the data).

It doesn't really matter if markup is on or off, and I would say that the data should be stored in the way as returned from server. Then it is up to the JIRA UI how to visualize data and that is what bug 190852 should be taking care of.

If you think about it, JIRA connector can be retrieving hundreds issues, then those issues should be available at least for viewing offline. It seems a logical choice to keep the ready-to-use html at least for viewing. Am I missing something?
Comment 20 Steffen Pingel CLA 2008-01-13 01:09:05 EST
I agree with the points you made in comment 19 but I am under the impression that you have either not read my comments or I have not expressed myself clearly. Quoting bug 190852 comment 10:

If rich-text rendering is enabled for the repository that [XML] representation only provides rendered HTML but not the original markup. That means it will be necessary to use either a SOAP or XML-RPC call to get the source markup.

The most important use case that needs to be supported is to retrieve, edit and submit an issue. If markup is not enabled JIRA escapes line breaks with <br/>\n and spaces with &nbsp; for certain fields. In order to submit these fields back to the repository the text needs to be unescaped. That is what the code quoted in comment  11 does and other code did before my refactoring. This is validated by test cases and necessary to make the connector work at all.

The problems are:

- If markup is enabled for a field (e.g. description or environment) the RSS feed does not return the source but a rendered version of the source. We currently have no way to submit such an issue without causing changes to those fields (see bug 190852). This is a bug.
 
- Unfortunately the escaping is differently when markup is enabled causing the HTML to be mangled but I am currently not concerned about that for 2 reasons:

 - Mylyn does not support rendering of HTML for JIRA making the HTML unreadable with line breaks or without line breaks. Supporting that is not a priority for Mylyn 2.3.

 - The RSS feed does not indicate which type of escaping is used. If you can suggest a reliable method of detecting enabled markup we can skip unescaping of line breaks for those fields. Supporting that is not a priority for Mylyn 2.3 either. 

It is priority to support markup in a way that it does not "break" when submitting issues. I have suggested accordant changes on bug 190852 comment 10 . You have already voiced disagreement with these priorities on bug 190852 so I suggest that you add this as an item to next weeks agenda.
Comment 21 Steffen Pingel CLA 2008-01-13 01:19:32 EST
To sum up comment 20:

Eugene, please suggest how to support the use-case pointed out in comment 19 without breaking existing test cases. I simply don't know how to do it but would happily accept a patch. 
Comment 22 Eugene Kuleshov CLA 2008-01-13 19:20:02 EST
I don't know about existing test cases, but I believe it is possible to detect if html have markup or not. The latter probably won't have html tags other then <br/>.

BTW, what was your intention with unescaped.replaceAll("<br/>\n", "\n"); in RssContentHandler.getCurrentElementTextEscapeHtml() method? I probably wouldn't have to guess, if there was a low level unit test covering that method.
Comment 23 Mik Kersten CLA 2008-01-14 16:10:43 EST
 (In reply to comment #15)
> Steffen, please use your common sense. The changes you've made don't allow to
> have html representation from the server and neither original representation.
> Before your change JIRA client allowed to retrieve at least html representation,
> so this is practically a regression (loss of the feature).

Eugene: this comment is unfair.  Steffen did not break any feature or create any regression, he is clearly using his common sense and his best understanding of how to prioritize this work.  I realize that you have a different point of view on it, but it's far from clear to me whether your point of view and arguments will make for a better prioritization of the current problems with the connector, and your arguments have not taken his points into account and are not entirely constructive.  Yes, we are in a challenging state with where markup and information loss, and I agree with Steffen's approach to preserving as much information as possible and leaving the rendering/editing/transformation to the UI layer.
Comment 24 Steffen Pingel CLA 2008-01-14 16:25:54 EST
I have updated the subject of the bug to reflect the description. Summary of related bugs:

 - do RSS feed specific conversions in JIRA core (this bug)
 - avoid data conversion when editing issues that containt wiki markup (bug 190852) 
 - improve presentation of markup (bug 215262)
Comment 25 Eugene Kuleshov CLA 2008-01-14 16:54:18 EST
(In reply to comment #23)
> Eugene: this comment is unfair.  Steffen did not break any feature or create any
> regression, he is clearly using his common sense and his best understanding of...

My interpretation is that existing feature been indeed broken, and my attempts to get better understanding what Sfeffen's line of thoughts was when doing that wasn't too successful. See comments #11, #12 and #13.

> how to prioritize this work.  I realize that you have a different point of view
> on it, but it's far from clear to me whether your point of view and arguments
> will make for a better prioritization of the current problems with the
> connector, and your arguments have not taken his points into account and are not
> entirely constructive.  Yes, we are in a challenging state with where markup and
> information loss, and I agree with Steffen's approach to preserving as much
> information as possible and leaving the rendering/editing/transformation to the
> UI layer.

That is the problem. Steffen's approach is *NOT* preserving "as much as possible information", it is stripping significant information and that is what I've been arguing about in last 10 comments or so. I even provided a unit test to demonstrate that, but for some reason my unit test been disregarded.

Maybe I am still misunderstanding something, but Steffen's responses to my questions wasn't really helping to remove that misunderstanding.
Comment 26 Steffen Pingel CLA 2008-01-14 17:51:06 EST
Need to reopen. It turns out that the current conversion method is too simple to properly reverse the HTML added by the RSS feed:

 - links are converted to <a... tags
 
Unfortunately the way line breaks are formatted by the server can differ depending on whether an issue is submitted by Mylyn or by the web-interface. RSS feed for line-ending:

 - web submission: \r<br/>\n\r
 - task editor submission: <br/>\n\r

That will need additional test coverage.
Comment 27 Eugene Kuleshov CLA 2008-01-14 18:01:15 EST
(In reply to comment #23)
> ...I agree with Steffen's approach to preserving as much
> information as possible and leaving the rendering/editing/transformation to the
> UI layer.

Steffen, even Mik misinterpreted your change, I do agree with his comment that transformations related to the rendering should be done in the UI layer. So it would be good to restore that. Sorry if my previous comments where I tried to raise the same point weren't clear.
Comment 28 Steffen Pingel CLA 2008-01-14 18:27:25 EST
Could you clarify what UI layer means in this context (i.e. JiraTaskEditor or JiraTaskDataHandler which is in the UI plug-in but I consider it part of the core)? 

My goal is to move all lossy data conversion to the task editor and store the source representation in the task data. Source representation means the data represented in such a way that in can be transmitted back to the repository without causing a change (unless edited obviously).
Comment 29 Steffen Pingel CLA 2008-01-15 00:50:40 EST
Added detection for markup by scanning for any tag that is not <a> or <br/>. If markup is detected no conversion of the RSS feed is attempted otherwise HTML2TextReader is used to strip <a> and <br/> tags. I would prefer to limit the conversion to those tags and use a simpler conversion class but there is no point in re-implementing the class since it is known to work.