
Bug 491366

Summary: DOT grammar does not accept unicode characters in string ID
Product: [Tools] GEF
Reporter: Alexander Nyßen <nyssen>
Component: GEF DOT
Assignee: Alexander Nyßen <nyssen>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: P3    
Version: 0.2.0   
Target Milestone: 4.0.0 (Neon) M7   
Hardware: All   
OS: All   
Whiteboard:

Description Alexander Nyßen 2016-04-09 01:33:58 EDT
The DOT language definition specifies that a string ID is "Any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores ('_') or digits ([0-9]), not beginning with a digit;".

Until now, our DOT grammar has accepted only the following:

terminal STRING:
	('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;

We need to adjust the grammar to accept everything that DOT allows.
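As an illustration only (this is not the Xtext rule itself), the spec's ID definition can be sketched as a Python regular expression; the octal range \200-\377 from the quote above corresponds to Unicode \u0080-\u00FF:

```python
import re

# Sketch of the DOT unquoted-ID definition as a Python regex.
# Octal \200-\377 equals Unicode \u0080-\u00FF.
DOT_ID = re.compile(r'[A-Za-z\u0080-\u00ff_][A-Za-z\u0080-\u00ff_0-9]*')

def is_dot_id(s):
    """Return True if s is a valid unquoted DOT string ID."""
    return bool(DOT_ID.fullmatch(s))
```

For example, is_dot_id('Nyßen') is True (ß is U+00DF, inside the range), while is_dot_id('1node') is False because an ID must not begin with a digit.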
Comment 1 Alexander Nyßen 2016-04-09 01:48:21 EDT
I pushed the following changes to origin/master:

- Changed the grammar rule to accept characters in the Unicode range \u0080 to \u00FF within IDs. This corresponds to octal 200-377, the range specified in the DOT language definition.
- Adjusted sample_input.dot to contain a special character in an unquoted ID.
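The octal-to-Unicode correspondence in the first item can be verified with a quick sanity check:

```python
# Octal 200-377 from the DOT spec equals the Unicode range
# \u0080-\u00FF used in the adjusted grammar rule.
low = int('200', 8)   # 0o200 == 128 == 0x80
high = int('377', 8)  # 0o377 == 255 == 0xFF
assert (low, high) == (0x80, 0xFF)
```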

Resolving as fixed in 4.0.0 M7.