Bug 357677 - [parser] Incorrect parsing of XML/HTML escape symbols after editing
Summary: [parser] Incorrect parsing of XML/HTML escape symbols after editing
Status: RESOLVED FIXED
Alias: None
Product: WTP Source Editing
Classification: WebTools
Component: wst.xml
Version: unspecified
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: 3.4.2
Assignee: Salvador Zalapa CLA
QA Contact: Nitin Dahyabhai CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-14 15:38 EDT by Yahor Radtsevich CLA
Modified: 2013-01-08 10:39 EST
CC List: 1 user

See Also:
nsand.dev: review+


Attachments
Several screenshots to reproduce (53.57 KB, image/png)
2011-09-14 15:52 EDT, Yahor Radtsevich CLA
no flags
Intial Patch (1.58 KB, patch)
2012-06-14 15:01 EDT, Salvador Zalapa CLA
no flags
Second proposed patch (30.66 KB, application/octet-stream)
2012-06-28 12:47 EDT, Salvador Zalapa CLA
no flags
Second proposed patch (fixed) (32.88 KB, patch)
2012-06-28 15:24 EDT, Salvador Zalapa CLA
no flags
Patch (head version) (28.07 KB, patch)
2012-07-04 12:38 EDT, Salvador Zalapa CLA
no flags

Description Yahor Radtsevich CLA 2011-09-14 15:38:21 EDT
Build Identifier: 

When XML/HTML escape symbols are edited in an XML/HTML editor, the XML/HTML parser may stop recognizing them.

Reproducible: Always

Steps to Reproduce:
1. Create a new XML file with the following content:
&
ASSERT: these symbols are highlighted in blue.
2. Open this file with the XML editor.
3. Type a space after '&', then delete it.
ACTUAL RESULT:
These symbols become black.
EXPECTED RESULT:
Symbols are highlighted in blue.
Comment 1 Yahor Radtsevich CLA 2011-09-14 15:52:57 EDT
Created attachment 203364
Several screenshots to reproduce

Screenshots demonstrating this bug are attached. I used the HTML Page Designer for illustration purposes, but the result is the same for any XML/HTML editor.
Comment 2 Salvador Zalapa CLA 2012-06-14 15:00:25 EDT
Once the EntityName region is broken up, it becomes an XML_Content region. When the heuristics are then triggered (to avoid reparsing the region), the XMLContentRegion.UpdateRegion() method decides that a change of length 0 (the whitespace was deleted) is one it can handle by itself, simply by updating the region indexes without any parse action. This initial patch adds a new heuristic to the XML_Content.UpdateRegion() method: if the content's length is between 4 and 10, it could be a potential EntityName region, so it should be reparsed. The reported scenario is covered by this proposed patch; however, I am still facing an issue when I try to join the following back together (because I am getting 2 regions here):

&Aac ute;

Is this patch sufficient and adequate to cover this bug? I am still trying to figure out a solution for the other scenario.
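The length heuristic described in this comment can be sketched roughly as follows. This is only an illustration, not the attached patch: the class name, method name, and constants are hypothetical, and treating "between 4 and 10" as an inclusive range is an assumption.

```java
// Hypothetical sketch of the length-based heuristic; none of these names
// come from the WTP code base.
final class LengthHeuristicSketch {
    // Assumed inclusive bounds for "between 4 and 10".
    static final int MIN_ENTITY_LENGTH = 4;
    static final int MAX_ENTITY_LENGTH = 10;

    /**
     * Returns true when the region text is short enough that it might be a
     * broken-up entity reference, so the region should be reparsed instead
     * of only having its indexes updated.
     */
    static boolean needsReparse(String regionText) {
        int length = regionText.length();
        return length >= MIN_ENTITY_LENGTH && length <= MAX_ENTITY_LENGTH;
    }
}
```

A check like this looks only at the length, so ordinary four-letter content such as "text" would also trigger a reparse, while a longer run of plain content would not.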
Comment 3 Salvador Zalapa CLA 2012-06-14 15:01:39 EDT
Created attachment 217380
Intial Patch
Comment 4 Salvador Zalapa CLA 2012-06-28 12:46:23 EDT
Attaching a second version of the patch, which covers all the scenarios. As I said in comment#2, the proposed heuristic detects whether the length of the region is between 4 and 10; if so, it is a potential EntityName region, so a reparse is triggered. This new patch also handles the scenario:

&Aac ute;

For this I added an extra rule to the XMLTokenizer so that a broken-up entity region is handled as a single XML_Content region instead of two or more (so that just one region is cached and the reparse can be performed properly).
Comment 5 Salvador Zalapa CLA 2012-06-28 12:47:05 EDT
Created attachment 218036
Second proposed patch
Comment 6 Salvador Zalapa CLA 2012-06-28 15:24:04 EDT
Created attachment 218050
Second proposed patch (fixed)

In the last patch I forgot to delete some System.out.println statements.
Comment 7 Salvador Zalapa CLA 2012-07-04 12:38:03 EDT
Created attachment 218284
Patch (head version)

My mistake again: the last patch was based on an old WTP version; this one is against the head version.
Comment 8 Nick Sandonato CLA 2012-11-07 16:37:49 EST
Hi Chava, is there any way to accomplish this without depending on the length of the region being between 4 and 10 characters? Maybe something based on the text contents or the region type.
Comment 9 Salvador Zalapa CLA 2013-01-07 16:37:29 EST
https://github.com/zalapa/webtools.sourceediting/commit/425850e9d8f296d0c0ea156eba27e251f3c49386

Adding the new version, which filters on text that starts with "#" and ends with ";".
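A text-based check of the kind described here might look like the sketch below. It follows the wording of this comment literally and is not taken from the linked commit: the class and method names are hypothetical, and both the whitespace trimming and the requirement of at least one character between "#" and ";" are assumptions.

```java
// Hypothetical sketch of a content-based filter; the real commit may differ.
final class EntityTextFilterSketch {
    /**
     * Returns true when the region text starts with "#" and ends with ";",
     * with at least one character in between (an assumption of this sketch).
     */
    static boolean looksLikeReference(String regionText) {
        String text = regionText.trim(); // assumed: ignore surrounding whitespace
        return text.length() > 2 && text.startsWith("#") && text.endsWith(";");
    }
}
```

Unlike the length-based heuristic, this test depends on the text contents, so short ordinary content such as "text" no longer triggers a reparse.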
Comment 10 Nick Sandonato CLA 2013-01-08 10:38:48 EST
Patch from the remote repository looks good. Thanks, Chava.