Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 329125 - (var+=RuleA|var+=RuleB)+ is not the same as var+=(RuleA|RuleB)+
Summary: (var+=RuleA|var+=RuleB)+ is not the same as var+=(RuleA|RuleB)+
Status: REOPENED
Alias: None
Product: TMF
Classification: Modeling
Component: Xtext Backlog (show other bugs)
Version: unspecified   Edit
Hardware: PC Mac OS X - Carbon (unsup.)
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-29 20:19 EDT by Mirko Raner CLA
Modified: 2016-08-24 12:52 EDT (History)
4 users (show)

See Also:


Attachments
A zip file containing the projects that exhibit the described behavior. (1.08 MB, application/zip)
2010-10-29 20:23 EDT, Mirko Raner CLA
no flags Details
Reduced example (3.19 MB, application/zip)
2016-08-24 03:25 EDT, Bernhard Stadler CLA
no flags Details
Experimental modifications (8.27 KB, application/zip)
2016-08-24 03:26 EDT, Bernhard Stadler CLA
no flags Details
EXPERIMENTAL patch for avoiding unnecessary parentheses (108.70 KB, application/octet-stream)
2016-08-24 05:15 EDT, Bernhard Stadler CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mirko Raner CLA 2010-10-29 20:19:08 EDT
Build Identifier: 3.6.1 (Helios SR1)

See http://www.eclipse.org/forums/index.php?t=msg&th=199125#msg_num_6 for details.


Reproducible: Always

Steps to Reproduce:
To reproduce, run the unit tests in the attached projects.
Comment 1 Mirko Raner CLA 2010-10-29 20:23:21 EDT
Created attachment 182090 [details]
A zip file containing the projects that exhibit the described behavior.

In the attached project GrammarATest will pass and GrammarBTest will fail.
The two grammars only differ in the "Text" production.
Comment 2 Moritz Eysholdt CLA 2011-01-11 06:12:56 EST
This is a parser related issue. To summarize your tests:

the string
------
This & That
------

can be matched by
------
Text: (elements+=EntityReference|elements+=SpecialCharacters|elements+=PlainText)+;EntityReference: AMPERSAND entity=IDENTIFIER SEMICOLON;
PlainText: {PlainText} (PCDATA|INDEX|COLON|DASH|DOT|SEMICOLON|AMPERSAND|NameOrKeyword|'='|'['|']'|'\\');
terminal AMPERSAND: '&';
------

but when changing "Text" to
------
Text: elements+=(EntityReference|SpecialCharacters|PlainText)+;
------

the following error is raised: "required (...)+ loop did not match anything at input '&'"

Mirko, can you turn off backtracking in your grammar? It's "evil" because it hides syntactic ambiguities.

In your case, the string "& id ;" can be parsed as PlainText as well as as EntityReference.

However, this doesn't explain why the one grammar "works" and the other one doesn't.
Comment 3 Mirko Raner CLA 2011-01-14 03:51:48 EST
Thanks for looking into this, Moritz!

I was aware that parsing of "& id ;" is ambiguous, and I actually enabled backtracking on purpose, as a quick and easy way to resolve this particular ambiguity (and a couple other similar ones). The goal is that "& id ;" gets parsed as an EntityReference, but "& id <something other than ';'>" gets parsed as PlainText.
I'll check my original grammar to see whether I can resolve the ambiguities without backtracking, but luckily the work-around of using multiple elements+=... assignments works for now.
Comment 4 Sven Efftinge CLA 2011-03-26 15:54:11 EDT
closing this due to the last comments.
Comment 5 Mirko Raner CLA 2011-03-26 18:48:13 EDT
There might be a misunderstanding here. This problem still persists and is not fixed yet. The syntax (var+=RuleA|var+=RuleB)+ should be the same as var+=(RuleA|RuleB)+, but it isn't.
As Moritz summarized:

> However, this doesn't explain why the one grammar "works" and the other one
> doesn't.

So, whereas there is an easy work-around, Xtext doesn't really behave the way it should, and I think the underlying problem should be investigated and fixed. I suggest reopening this bug.
Comment 6 Sven Efftinge CLA 2011-03-26 19:16:10 EDT
thanks for clarifying.
Comment 7 Bernhard Stadler CLA 2016-08-09 10:17:39 EDT
With current Xtext master, the grammars from the example projects produce ambiguity errors when running the MWE2 workflow. However, reordering the alternatives such that the call to EntityReference always comes first helps (there are only warnings left, no errors).

With or without the change, the generated Antlr files differ only in some parentheses and variable names in the Text rule, but semantically they are identical. 

The generated method names in the GrammarAccess differ correspondingly (e.g. getElementsPlainTextParserRuleCall_2_0() vs. getElementsPlainTextParserRuleCall_0_2()) but I don't think it makes sense to try making these identical.

So I suggest closing this as fixed.
Comment 8 Moritz Eysholdt CLA 2016-08-09 10:29:12 EDT
Hi Bernhard,

thank you for looking into this, but I think you're missing the point:

(In reply to Mirko Raner from comment #5)
> The syntax (var+=RuleA|var+=RuleB)+ should be the same as
> var+=(RuleA|RuleB)+, but it isn't.

For all documents the parser should behave the same for (var+=RuleA|var+=RuleB)+ as it does for var+=(RuleA|RuleB)+.

This is not about the order of choices inside the grammatical alternative and also not about the GrammarAccess.
Comment 9 Bernhard Stadler CLA 2016-08-24 03:25:24 EDT
Created attachment 263739 [details]
Reduced example
Comment 10 Bernhard Stadler CLA 2016-08-24 03:26:54 EDT
Created attachment 263740 [details]
Experimental modifications

Hi Moritz,

(In reply to Moritz Eysholdt from comment #8)
> thank you for looking into this, but I think you're missing the point:
> 
> (In reply to Mirko Raner from comment #5)
> > The syntax (var+=RuleA|var+=RuleB)+ should be the same as
> > var+=(RuleA|RuleB)+, but it isn't.
> 
> For all documents the parser should behave the same for
> (var+=RuleA|var+=RuleB)+ as it does for var+=(RuleA|RuleB)+.
> 
> This is not about the order of choices inside the grammatical alternative
> and also not about the GrammarAccess.

No, that's exactly how I understood this - but maybe I wasn't very precise. 
Anyway, what I did miss is the part about enabling backtracking, so only the Antlr grammars could be generated but not the parser. That's why I compared the Antlr grammars and all I saw are differences the IMO are irrelevant (except possibly the GrammarAccess stuff, which is why I mentioned it).

These differences in the Antlr grammar are the same with or without backtracking. 
However, after enabling backtracking, parser code can be generated and the example actually produces no errors in the first case but a grey error marker in the second case as reported, which puzzled me a bit. The message in the problem view is the following: "required (...)+ loop did not match anything at input '&'".

To me the Antlr grammars seem to be equivalent. 
There are only two differences:
1. The local variable names in ruleText (lv_elements_0_0 vs. lv_elements_0_1, lv_elements_1_0 vs. lv_elements_0_2 and lv_elements_2_0 vs. lv_elements_0_3)
2. The parentheses are chosen to be set differenly, namely:

(
  (( {...} lv_elements_0_0=ruleEntityReference {...} ))
  |
  (( {...} lv_elements_1_0=ruleSpecialCharacters {...} ))
  |
  (( {...} lv_elements_2_0=rulePlainText {...} ))
)+

vs.

(
  ((
    {...} lv_elements_0_1=ruleEntityReference {...}
    |
    {...} lv_elements_0_2=ruleSpecialCharacters {...}
    |
    {...} lv_elements_0_3=rulePlainText {...}
  ))
)+

As an experiment, I created a reduced version of the example grammars (attached).
Additionally, I changed the Xtext Antlr Grammar generator to insert one additional pair of parentheses in String ebnf2(Alternatives it, AntlrOptions options, boolean supportActions), and then I got the same error for the first grammar as well, this time with a red error marker, not a grey one as before (also attached). 

The differences in the generated Antlr grammar are minimal - in the first reduced example grammar, except for whitespace, it is only two additional pairs of parentheses, both directly inside another pair of parentheses.
A few superfluous parentheses shouldn't make a difference in meaning, but apparently they do, so to me it seems like this is actually a bug in the used Antlr version's way of handling parentheses in + repetitions.
Comment 11 Bernhard Stadler CLA 2016-08-24 03:30:38 EDT
> A few superfluous parentheses shouldn't make a difference in meaning, but apparently they do, so to me it seems like this is actually a bug in the used Antlr version's way of handling parentheses in + repetitions.

Correction: It seems like this is actually a bug in the used Antlr version's way of handling duplicate parentheses inside alternatives in + repetitions.
Comment 12 Bernhard Stadler CLA 2016-08-24 03:47:36 EDT
I suppose there are good reasons why Xtext is bound to Antlr 3.2.0 and not any newer release of Antlr 3 (breaking stuff etc.). 

So the question that remains for this issue whether we want to work around this bug or not, and if so, whether we want this case of ambiguous grammar to work with backtracking. 
If it is ok for this kind of grammar to fail, it would be easiest to insert that additional pair of parentheses - then parsing would fail in both cases.
 
However, I guess the desired behaviour would be for both grammars to work (which I suppose would bis also safer regarding breaking existing languages), so it would actually be necessary to be more careful about inserting duplicate parentheses.
Comment 13 Bernhard Stadler CLA 2016-08-24 05:10:34 EDT
I made an experimental patch for avoiding double parentheses, and it seems to help.
Comment 14 Bernhard Stadler CLA 2016-08-24 05:15:01 EDT
Created attachment 263746 [details]
EXPERIMENTAL patch for avoiding unnecessary parentheses

(This patch is untested and incomplete)
Comment 15 Eclipse Genie CLA 2016-08-24 10:18:12 EDT
GitHub Pull Request 90 created by [stadlerb]
https://github.com/eclipse/xtext-core/pull/90
Comment 16 Bernhard Stadler CLA 2016-08-24 12:52:44 EDT
Created pull request https://github.com/eclipse/xtext-core/pull/90

I tried carefully not to break anything but before I wrote additional tests, the existing tests in org.eclipse.xtext.tests didn't uncover any bugs although some were present at intermediate stages. So somebody who knows the code well should have a good look before merging.