Bug 208600

Summary: HtmlStreamTokenizer.unescape..() don't properly handle entities
Product: z_Archived
Component: Mylyn
Status: RESOLVED FIXED
Severity: normal
Priority: P3
Version: unspecified
Target Milestone: 2.3
Hardware: PC
OS: All
Reporter: Eugene Kuleshov <ekuleshov>
Assignee: Steffen Pingel <steffen.pingel>
QA Contact:
CC: robert.elves, shawn.minto, steffen.pingel
Whiteboard:
Attachments:
Description                    Flags
test case showing the issue    none
mylyn/context/zip              none
mylyn/context/zip              none

Description Eugene Kuleshov CLA 2007-11-02 14:02:49 EDT
HtmlStreamTokenizer.unescape doesn't handle entities properly. For example, it converts the literal "&quot" (not terminated with ';') into ".
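
To make the expected behavior concrete, here is a minimal sketch of an unescaper that only resolves an entity when it is terminated with ';'. The class name StrictEntityUnescaper and its four-entry table are hypothetical and exist only for illustration; this is not the HtmlStreamTokenizer code and not the commons-lang implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Mylyn or commons-lang code.
final class StrictEntityUnescaper {

    // Deliberately tiny table (an assumption); a real one covers far more names.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("quot", "\"");
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
    }

    private StrictEntityUnescaper() {
    }

    // Replaces "&name;" with its character only when the name is known
    // AND the entity is closed with ';'. Everything else is copied as-is.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                int end = input.indexOf(';', i + 1);
                if (end > i + 1) {
                    String replacement = ENTITIES.get(input.substring(i + 1, end));
                    if (replacement != null) {
                        out.append(replacement);
                        i = end + 1; // skip past the closing ';'
                        continue;
                    }
                }
            }
            // Not a complete, known entity: keep the character literally.
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

With this sketch, unescape("&quot;") yields a double quote, while unescape("&quot") returns the input unchanged, which is the behavior the test case expects.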
Comment 1 Eugene Kuleshov CLA 2007-11-02 14:06:57 EDT
Created attachment 81991 [details]
test case showing the issue

Here is a simple test case showing the issue.
Comment 2 Eugene Kuleshov CLA 2007-11-02 14:07:00 EDT
Created attachment 81992 [details]
mylyn/context/zip
Comment 3 Eugene Kuleshov CLA 2007-11-02 14:24:03 EDT
The easiest way to fix this is probably to use org.apache.commons.lang.StringEscapeUtils.unescapeHtml() from commons-lang.
Comment 4 George Lindholm CLA 2007-11-03 03:09:02 EDT
Also, HtmlStreamTokenizer has not been updated in 5 years and recognizes ~114 entity names.

StringEscapeUtils (Entities) was changed 3 months ago and currently recognizes ~250 entity names.
Comment 5 Mik Kersten CLA 2007-11-06 01:54:26 EST
Eugene, George: is Commons Lang really the best library for unescaping HTML?  That functionality seems a bit misplaced in Lang, so I wonder if we can get it from another library that we're already approved for.

Steffen: let me know if you're familiar with anything.

Shawn: it's interesting that this class of yours has not been changed for 5 years!  Probably time for us to move on ;)
Comment 6 Mik Kersten CLA 2007-11-06 01:55:04 EST
Eugene: thanks for the test case, that's helpful.
Comment 7 Eugene Kuleshov CLA 2007-11-06 02:02:13 EST
(In reply to comment #6)
> Eugene: thanks for the test case, that's helpful.

Thank George. That is his test case from bug 208073.
Comment 8 Mik Kersten CLA 2007-12-12 23:53:46 EST
Steffen: if this isn't already supported by our addition of commons-lang, consider it for 2.3.
Comment 9 Steffen Pingel CLA 2008-01-11 19:46:51 EST
Thanks Eugene. I have deprecated the escaping methods in HtmlStreamTokenizer. From now on StringEscapeUtils from the commons lang library should be used instead.

Rob, I'll leave the cleanup of the Bugzilla deprecation warnings to you.
Comment 10 Steffen Pingel CLA 2008-01-11 19:46:56 EST
Created attachment 86742 [details]
mylyn/context/zip