Community
Participate
Working Groups
Documents with a Schema instead of a DTD are not correctly read by the SAX-Parser. 1. Create the following document <?xml version='1.0' encoding='UTF-8'?> <article xmlns="http://docbook.org/ns/docbook"> <title>Title</title> <para>Para1</para> <!-- Comment 1 --> <!-- Comment 2 --> </article> 2. Open the document with VEX Result: VES inserts TEXT elements to the DOM This is caused by the SAX-Parser that reports all whitespace. When used with a DTD, the parser reports ignorable whitespace instead. The attached patch uses a very basic validation already in the document builder.
Created attachment 230819 [details] A unit test to catch the described behaviour
Created attachment 230820 [details] A possible patch with basic validation in DocumentBuilder
Carsten, thank you for this contribution. As far as I can see in the test, the problem are the text nodes containing only whitespace, right? I refactored the test to use the new IAxis-API in order to make it more expressive: @Test public void shouldNotAddWhitespaceTextNodesInDocumentWithoutDTD() throws Exception { final DocumentReader reader = new DocumentReader(); final IDocument document = reader.read(TestResources.get("documentWithoutDTD.xml")); final IElement rootElement = document.getRootElement(); final IFilter<INode> recursiveWhitespaceTextNodeFilter = new IFilter<INode>() { public boolean matches(final INode node) { return node.accept(new BaseNodeVisitorWithResult<Boolean>(false) { @Override public Boolean visit(final IElement element) { return containsEmptyTextNodes(element); } @Override public Boolean visit(final IText text) { return text.getText().trim().length() == 0; } }); } private boolean containsEmptyTextNodes(final IElement element) { return !element.children().matching(this).isEmpty(); } }; final IAxis<? extends INode> nodesWithEmptyTextNodes = rootElement.children().matching(recursiveWhitespaceTextNodeFilter); assertEquals("whitespace text nodes", 0, nodesWithEmptyTextNodes.count()); } What do you think about this?
This looks good to me. The result is a bit clearer. I just realized the provided patch has a little side effect, as it removes all invalid text nodes from the source. Originally, for elements with DTD an exception is thrown, docs without dtd are displayed but uneditable. I think it would be ok to remove those texts silently, or would it be better to show an error to the user (Something like 'Invalid document: Content not allowed in xxx')?
(In reply to comment #4) Currently we don't have any concept implemented for handling invalid content. There is the idea of having some kind of markers (e.g. for missing elements, wrong attribute values etc.) and providing quickfixes, but nothing realized yet in this direction. At the moment we can react in two ways: 1. superflous whitespace can be removed silently (this is conform with the XML spec) 2. documents containing invalid content should maybe opened read-only and a hint should be shown I think this bug involves only (1). With (2) I'm not really shure, but this should be only a temporary solution anyway because a user should always be able to make an invalid document valid in Vex (hopefully in the future :-) ) The solutions for this bug should concentrate on (1) (tiny little steps) and we should have a discussion about a short term concept for invalid content in a separate context.
Created attachment 230830 [details] Revised patch
(In reply to comment #5) Ok, i agree. I changed the patch to ignore only whitespace. It still ignores some characters it should not ignore according to XML standard. Thats caused by the use of Character#isWhitespace, that returns true for unicode chars that should not be ignored.
Created attachment 230831 [details] Revised patch
The new patch looks good to me. I did some clean-up and removed the dependency from DocumentReader to WTPVexValidator. Your contribution was commited as f726a9d02dc8a1422402e8a681153456d5139c75.
I assert that I: 1. authored 100% the content they are contributing 2. have the rights to donate the content to Eclipse 3. contribute the content under the EPL
M2 released