
Bug 368068

Summary: [epub] HTML parsers should use jsoup instead of SAX
Product: z_Archived
Reporter: Torkild Resheim <torkildr>
Component: Mylyn
Assignee: Torkild Resheim <torkildr>
Status: CLOSED MOVED
QA Contact:
Severity: enhancement    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Bug Depends on: 398103    
Bug Blocks:    

Description Torkild Resheim CLA 2012-01-06 18:00:24 EST
Although EPUB requires the HTML to be well formed, this is not always the case in practice. The parts of the EPUB tooling that generate the table of contents and detect referenced resources will fail if the HTML is not well formed. **jsoup** (http://jsoup.org/) could be used instead, as it is much better at handling bad HTML and has the additional benefit of being able to clean up the HTML. Options for enabling these features could be added to the EPUB generator to ensure that the final EPUB is correct.

See also bug 357294.
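For illustration, here is a minimal sketch of the failure mode described above. It uses only the JDK's strict SAX parser to collect heading text the way a table of contents builder might. The class `SaxHeadingSketch` and its `headings` method are hypothetical and are not taken from the Mylyn Docs code. A single unclosed `<br>` makes the parser abort, so no headings are returned at all.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not the Mylyn Docs implementation:
// collects h1-h6 text with the JDK's strict SAX parser, as a TOC builder might.
class SaxHeadingSketch {

    static List<String> headings(String xhtml) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inHeading = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.matches("h[1-6]")) {
                    inHeading = true;
                    current.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inHeading) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.matches("h[1-6]")) {
                    inHeading = false;
                    result.add(current.toString().trim());
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (SAXException e) {
            // A single well-formedness error (e.g. an unclosed <br>) aborts the
            // whole parse, so the caller gets no headings at all.
            throw new IllegalArgumentException("Not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

With jsoup, the same malformed input would instead be parsed into a document tree by `Jsoup.parse(html)`, after which the headings could be picked out with a selector such as `doc.select("h1, h2, h3")`.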

Comment 1 David Green CLA 2012-02-02 13:56:36 EST
Mylyn Docs is now free to use jsoup, based on the following CQ:

5978: jsoup Version: 1.6.1 (ATO CQ5559)
https://dev.eclipse.org/ipzilla/show_bug.cgi?id=5978

jsoup has also just been added to Orbit and is available in the latest stable build (http://download.eclipse.org/tools/orbit/downloads/drops/S20120123151124/).

Comment 2 Torkild Resheim CLA 2012-05-11 03:12:26 EDT
Moving to new EPUB component.

Comment 3 Eclipse Webmaster CLA 2022-11-15 11:45:08 EST
Mylyn has been restructured, and our issue tracking has moved to GitHub [1].

We are closing ~14K Bugzilla issues to give the new team a fresh start. If you feel that this issue is still relevant, please create a new one on GitHub.

[1] https://github.com/orgs/eclipse-mylyn