| Summary: | move stripping of RSS induced tags into core | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | z_Archived | Reporter: | Steffen Pingel <steffen.pingel> | ||||||
| Component: | Mylyn | Assignee: | Steffen Pingel <steffen.pingel> | ||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||
| Severity: | minor | ||||||||
| Priority: | P3 | CC: | mik.kersten | ||||||
| Version: | unspecified | ||||||||
| Target Milestone: | 2.3 | ||||||||
| Hardware: | PC | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Bug Depends on: | |||||||||
| Bug Blocks: | 190852 | ||||||||
| Attachments: |
|
||||||||
|
Description
Steffen Pingel
I am not sure about this because we may need an original html (i.e. for the rich editor) I assumed that the conversion is only needed to reverse the escaping required for the RSS feed. It should not strip any intentionally added HTML. We need test cases for that. Hmm. Maybe you are right I haven't checked but I guess that descriptions returned by the SOAP service would not need any conversion. Btw, when I added that HTML2TextReader I was wondering if it should go somewhere more generic than JIRA. It shold be coming from Platform but they have several copies of it as well and none in a good spot. This class can't be moved into any core plug-in though, because it unfortunately depends on SWT and JFace. I am wondering if it really the right class for us to use since the goal is not to strip HTML but to do entity conversion. Yes, I had this same thought when adding this functionality and knew it was going into the wrong place (I may have noted that in the code). This is a presentation level thing, e.g. at some point we might want to do things like presenting the HTML markup as StyledText. So what if we make the task editor strip the HTML when presenting? That way we could also have a mode for the task editor where we get it to use the browser widget for rendering comments, and maybe later StyledText (so that we can continue to support proper Java Stack Trace links). I'll switch from HTML2TextReader to using the commons-lang library (bug 208073). This should also address bug 208600. Done. Created attachment 86744 [details]
mylyn/context/zip
Steffen, I don't think that changes made for this issue are really good idea, unless you have something else in mind. For instance, the following method in the RssContentHandler:
private String getCurrentElementTextEscapeHtml() {
String unescaped = currentElementText.toString();
unescaped = unescaped.replaceAll("\n", "");
unescaped = unescaped.replaceAll("<br/>", "\n");
unescaped = unescaped.replaceAll(" ", " ");
unescaped = StringEscapeUtils.unescapeXml(unescaped);
return unescaped;
}
is changing text retrieved from the rss, so now we don't anymore have rendered html text from the jira, not original text. So it doesn't seem like change you made is playing in a favor for using jira headless api for working with jira tasks.
Please review my test cases. The only test case I see in the task context is testUpdateIssueNonAscii() and without additional explanation I don't see how it is relevant to html rendering Please provide a test case that demonstrates how my changes broke your API expectations. Steffen, please use your common sense. The changes you've made don't allow to have html representation from the server and neither original representation. Before your change JIRA client allowed to retrieve at least html representation, so this is practically a regression (loss of the feature). I am not sure you understand the escaping used by the RSS feed that is returned by the server. That is why I asked you to review my test cases. The code you quoted in comment 11 simply unescapes the RSS feed restoring the HTML returned by the server. Once again, please provide a test case that demonstrates that the HTML is not restored correctly. Before my changes the HTML was converted to some garbled representation that was intermingled with StyledText markup (see bug 190852). Created attachment 86767 [details]
test case showing problem with last changes
Steffen, I don't know if you are trying to give me the hard time or really don't see the obvious issue.
Here is the failed test case as you requested. It shows rss reader does not produce exactly formatted html and certain things are lost during parsing, for example line breaks inside <pre> tags.
The escaping used by the RSS feed is different depending on whether Wiki markup is enabled or not. Here is the XML for http://mylyn.eclipse.org/jira-enterprise-3.9/browse/SCRATCH-194: <description>line 1 <p>line 3</p></description> <environment>line 1 <br/> <br/> line 3</environment> Markup is enabled for description but not for environment. The current JIRA connector implementation only works correctly if markup is not enabled. That is known and tracked on bug 190852. Steffen, the test case I created per your request is showing parsing of the rss returned by jira with markup enabled. So I am still interested to hear an explanation why you think it is a good idea to modify data from the server in a way that can't be reversed (that is breaking the data). It doesn't really matter if markup is on or off, and I would say that the data should be stored in the way as returned from server. Then it is up to the JIRA UI how to visualize data and that is what bug 190852 should be taking care of. If you think about it, JIRA connector can be retrieving hundreds issues, then those issues should be available at least for viewing offline. It seems a logical choice to keep the ready-to-use html at least for viewing. Am I missing something? I agree with the points you made in comment 19 but I am under the impression that you have either not read my comments or I have not expressed myself clearly. Quoting bug 190852 comment 10: If rich-text rendering is enabled for the repository that [XML] representation only provides rendered HTML but not the original markup. That means it will be necessary to use either a SOAP or XML-RPC call to get the source markup. The most important use case that needs to be supported is to retrieve, edit and submit an issue. If markup is not enabled JIRA escapes line breaks with <br/>\n and spaces with for certain fields. In order to submit these fields back to the repository the text needs to be unescaped. That is what the code quoted in comment 11 does and other code did before my refactoring. This is validated by test cases and necessary to make the connector work at all. The problems are: - If markup is enabled for a field (e.g. description or environment) the RSS feed does not return the source but a rendered version of the source. We currently have no way to submit such an issue without causing changes to those fields (see bug 190852). This is a bug. - Unfortunately the escaping is differently when markup is enabled causing the HTML to be mangled but I am currently not concerned about that for 2 reasons: - Mylyn does not support rendering of HTML for JIRA making the HTML unreadable with line breaks or without line breaks. Supporting that is not a priority for Mylyn 2.3. - The RSS feed does not indicate which type of escaping is used. If you can suggest a reliable method of detecting enabled markup we can skip unescaping of line breaks for those fields. Supporting that is not a priority for Mylyn 2.3 either. It is priority to support markup in a way that it does not "break" when submitting issues. I have suggested accordant changes on bug 190852 comment 10 . You have already voiced disagreement with these priorities on bug 190852 so I suggest that you add this as an item to next weeks agenda. To sum up comment 20: Eugene, please suggest how to support the use-case pointed out in comment 19 without breaking existing test cases. I simply don't know how to do it but would happily accept a patch. I don't know about existing test cases, but I believe it is possible to detect if html have markup or not. The latter probably won't have html tags other then <br/>.
BTW, what was your intention with unescaped.replaceAll("<br/>\n", "\n"); in RssContentHandler.getCurrentElementTextEscapeHtml() method? I probably wouldn't have to guess, if there was a low level unit test covering that method.
(In reply to comment #15) > Steffen, please use your common sense. The changes you've made don't allow to > have html representation from the server and neither original representation. > Before your change JIRA client allowed to retrieve at least html representation, > so this is practically a regression (loss of the feature). Eugene: this comment is unfair. Steffen did not break any feature or create any regression, he is clearly using his common sense and his best understanding of how to prioritize this work. I realize that you have a different point of view on it, but it's far from clear to me whether your point of view and arguments will make for a better prioritization of the current problems with the connector, and your arguments have not taken his points into account and are not entirely constructive. Yes, we are in a challenging state with where markup and information loss, and I agree with Steffen's approach to preserving as much information as possible and leaving the rendering/editing/transformation to the UI layer. I have updated the subject of the bug to reflect the description. Summary of related bugs: - do RSS feed specific conversions in JIRA core (this bug) - avoid data conversion when editing issues that containt wiki markup (bug 190852) - improve presentation of markup (bug 215262) (In reply to comment #23) > Eugene: this comment is unfair. Steffen did not break any feature or create any > regression, he is clearly using his common sense and his best understanding of... My interpretation is that existing feature been indeed broken, and my attempts to get better understanding what Sfeffen's line of thoughts was when doing that wasn't too successful. See comments #11, #12 and #13. > how to prioritize this work. I realize that you have a different point of view > on it, but it's far from clear to me whether your point of view and arguments > will make for a better prioritization of the current problems with the > connector, and your arguments have not taken his points into account and are not > entirely constructive. Yes, we are in a challenging state with where markup and > information loss, and I agree with Steffen's approach to preserving as much > information as possible and leaving the rendering/editing/transformation to the > UI layer. That is the problem. Steffen's approach is *NOT* preserving "as much as possible information", it is stripping significant information and that is what I've been arguing about in last 10 comments or so. I even provided a unit test to demonstrate that, but for some reason my unit test been disregarded. Maybe I am still misunderstanding something, but Steffen's responses to my questions wasn't really helping to remove that misunderstanding. Need to reopen. It turns out that the current conversion method is too simple to properly reverse the HTML added by the RSS feed: - links are converted to <a... tags Unfortunately the way line breaks are formatted by the server can differ depending on whether an issue is submitted by Mylyn or by the web-interface. RSS feed for line-ending: - web submission: \r<br/>\n\r - task editor submission: <br/>\n\r That will need additional test coverage. (In reply to comment #23) > ...I agree with Steffen's approach to preserving as much > information as possible and leaving the rendering/editing/transformation to the > UI layer. Steffen, even Mik misinterpreted your change, I do agree with his comment that transformations related to the rendering should be done in the UI layer. So it would be good to restore that. Sorry if my previous comments where I tried to raise the same point weren't clear. Could you clarify what UI layer means in this context (i.e. JiraTaskEditor or JiraTaskDataHandler which is in the UI plug-in but I consider it part of the core)? My goal is to move all lossy data conversion to the task editor and store the source representation in the task data. Source representation means the data represented in such a way that in can be transmitted back to the repository without causing a change (unless edited obviously). Added detection for markup by scanning for any tag that is not <a> or <br/>. If markup is detected no conversion of the RSS feed is attempted otherwise HTML2TextReader is used to strip <a> and <br/> tags. I would prefer to limit the conversion to those tags and use a simpler conversion class but there is no point in re-implementing the class since it is known to work. |