Bug 357677

Summary: [parser] Incorrect parsing of XML/HTML escape symbols after editing
Product: [WebTools] WTP Source Editing
Reporter: Yahor Radtsevich <yahorr>
Component: wst.xml
Assignee: Salvador Zalapa <zalapa>
Status: RESOLVED FIXED
QA Contact: Nitin Dahyabhai <thatnitind>
Severity: normal
Priority: P3
CC: nsand.dev
Version: unspecified
Flags: nsand.dev: review+
Target Milestone: 3.4.2
Hardware: PC
OS: Windows 7
Whiteboard:
Attachments:
Several screenshots to reproduce (flags: none)
Intial Patch (flags: none)
Second proposed patch (flags: none)
Second proposed patch (fixed) (flags: none)
Patch (head version) (flags: none)

Description Yahor Radtsevich CLA 2011-09-14 15:38:21 EDT
Build Identifier: 

When XML/HTML escape symbols are edited in an XML/HTML editor, the XML/HTML parser may stop recognizing them.

Reproducible: Always

Steps to Reproduce:
1. Create new XML file with the following content:
&amp;
ASSERT: these symbols are highlighted in blue.
2. Open this file with XML editor.
3. Type space after '&' and remove it.
ACTUAL RESULT:
These symbols become black.
EXPECTED RESULT:
Symbols are highlighted in blue.
Comment 1 Yahor Radtsevich CLA 2011-09-14 15:52:57 EDT
Created attachment 203364 [details]
Several screenshots to reproduce

Screenshots demonstrating this bug are attached. I used HTML Page Designer for illustration purposes, but the result is the same for any XML/HTML editor.
Comment 2 Salvador Zalapa CLA 2012-06-14 15:00:25 EDT
Once the EntityName region is broken up, it becomes an XML_Content region. When the heuristics are triggered (in order to avoid reparsing the region), the XMLContentRegion.UpdateRegion() method assumes that if the change's length == 0 (whitespace deleted), it can handle the change by itself, just by updating the region indexes, without any parse action.

This initial patch adds a new heuristic to the XML_Content.UpdateRegion() method: if the content's length is between 4 and 10, it could be a potential EntityName region, so it should be parsed. The reported scenario is covered by this proposed patch. However, I am still facing an issue when I try to join the following back together (because I am getting 2 regions here):

&Aac ute;

Is this patch sufficient and adequate to cover this bug? I am still trying to figure out a solution for the other scenario.
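
The decision described in this comment can be sketched roughly as follows. This is only an illustration, not the attached patch: the class name, the method name `canUpdateWithoutReparse`, and the inclusive treatment of the bounds 4 and 10 are assumptions made for the example.

```java
public class LengthHeuristicSketch {
    // Bounds taken from the comment; treating them as inclusive is an assumption.
    static final int MIN_ENTITY_LENGTH = 4;
    static final int MAX_ENTITY_LENGTH = 10;

    /**
     * Hypothetical stand-in for the shortcut decision inside the region update:
     * returns true when the region may just shift its indexes and skip parsing.
     */
    static boolean canUpdateWithoutReparse(int regionLength, int changeLength) {
        // Existing behaviour per the comment: a zero-length change is handled in place.
        boolean zeroLengthChange = changeLength == 0;
        // Proposed heuristic: short content might be a broken entity, so force a parse.
        boolean possibleEntity = regionLength >= MIN_ENTITY_LENGTH
                && regionLength <= MAX_ENTITY_LENGTH;
        return zeroLengthChange && !possibleEntity;
    }
}
```

With these assumptions, the five characters of `&amp;` fall inside the range, so the shortcut is skipped and the region would be reparsed.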
Comment 3 Salvador Zalapa CLA 2012-06-14 15:01:39 EDT
Created attachment 217380 [details]
Intial Patch
Comment 4 Salvador Zalapa CLA 2012-06-28 12:46:23 EDT
Attaching a second version of the patch; this one covers all the scenarios. As I said in comment#2, the proposed heuristic detects whether the length of the region is between 4 and 10; if so, it is a potential EntityName region, and a reparse is triggered. This new patch also handles the scenario:

&Aac ute;

For this I added an extra rule to the XMLTokenizer so that a decomposed entity region is handled as a single XML_Content region instead of two or more (so that just one region is cached and the reparse can be performed properly).
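
As a rough illustration of the idea of keeping a broken entity together as one region, here is a minimal sketch using a plain Java regular expression. It is not the rule added to the XMLTokenizer; the class name, the pattern, and the 12-character limit are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrokenEntitySketch {
    // Hypothetical pattern: '&', then name characters possibly interrupted by
    // spaces or tabs, then ';'. The upper bound of 12 is an arbitrary assumption.
    private static final Pattern BROKEN_ENTITY =
            Pattern.compile("&[#A-Za-z0-9 \\t]{1,12};");

    /** Splits text so that each broken-entity candidate stays in a single piece. */
    static List<String> split(String text) {
        List<String> regions = new ArrayList<>();
        Matcher m = BROKEN_ENTITY.matcher(text);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                regions.add(text.substring(last, m.start())); // plain content before the candidate
            }
            regions.add(m.group()); // the whole candidate as one region
            last = m.end();
        }
        if (last < text.length()) {
            regions.add(text.substring(last)); // trailing plain content
        }
        return regions;
    }
}
```

In this sketch, `split("&Aac ute;")` yields a single piece, which mirrors the goal of caching just one region so that a later reparse sees the whole candidate.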
Comment 5 Salvador Zalapa CLA 2012-06-28 12:47:05 EDT
Created attachment 218036 [details]
Second proposed patch
Comment 6 Salvador Zalapa CLA 2012-06-28 15:24:04 EDT
Created attachment 218050 [details]
Second proposed patch (fixed)

In the last patch I forgot to delete some System.out.println statements.
Comment 7 Salvador Zalapa CLA 2012-07-04 12:38:03 EDT
Created attachment 218284 [details]
Patch (head version)

My mistake again: the last patch was made against an old WTP version. This one is against the head version.
Comment 8 Nick Sandonato CLA 2012-11-07 16:37:49 EST
Hi Chava, is there any way to accomplish this without being dependent on the length of the region being between 4 and 10 characters? Maybe something based on the text contents or region type.
Comment 9 Salvador Zalapa CLA 2013-01-07 16:37:29 EST
https://github.com/zalapa/webtools.sourceediting/commit/425850e9d8f296d0c0ea156eba27e251f3c49386

Adding the new version. It filters on text that starts with "#" and ends with ";".
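
A text-based check of this kind might look like the sketch below. It takes the description in this comment literally; the linked commit is the authority on what is actually checked, and the class and method names here are hypothetical.

```java
public class EntityTextFilterSketch {
    /**
     * Hypothetical predicate: true when the text starts with "#" and ends with ";",
     * as described in the comment. The real commit may test something different.
     */
    static boolean matchesFilter(String text) {
        return text != null && text.startsWith("#") && text.endsWith(";");
    }
}
```

Unlike the earlier length-based heuristic, a check like this depends on the text contents rather than on how long the region is.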
Comment 10 Nick Sandonato CLA 2013-01-08 10:38:48 EST
Patch from the remote repository looks good. Thanks, Chava.