Community
Participate
Working Groups
Build Identifier: 3.6.1 (Helios SR1) See http://www.eclipse.org/forums/index.php?t=msg&th=199125#msg_num_6 for details. Reproducible: Always Steps to Reproduce: To reproduce, run the unit tests in the attached projects.
Created attachment 182090 [details] A zip file containing the projects that exhibit the described behavior. In the attached project GrammarATest will pass and GrammarBTest will fail. The two grammars only differ in the "Text" production.
This is a parser related issue. To summarize your tests: the string ------ This & That ------ can be matched by ------ Text: (elements+=EntityReference|elements+=SpecialCharacters|elements+=PlainText)+;EntityReference: AMPERSAND entity=IDENTIFIER SEMICOLON; PlainText: {PlainText} (PCDATA|INDEX|COLON|DASH|DOT|SEMICOLON|AMPERSAND|NameOrKeyword|'='|'['|']'|'\\'); terminal AMPERSAND: '&'; ------ but when changing "Text" to ------ Text: elements+=(EntityReference|SpecialCharacters|PlainText)+; ------ the following error is raised: "required (...)+ loop did not match anything at input '&'" Mirko, can you turn off backtracking in your grammar? It's "evil" because it hides syntactic ambiguities. In your case, the string "& id ;" can be parsed as PlainText as well as as EntityReference. However, this doesn't explain why the one grammar "works" and the other one doesn't.
Thanks for looking into this, Moritz! I was aware that parsing of "& id ;" is ambiguous, and I actually enabled backtracking on purpose, as a quick and easy way to resolve this particular ambiguity (and a couple other similar ones). The goal is that "& id ;" gets parsed as an EntityReference, but "& id <something other than ';'>" gets parsed as PlainText. I'll check my original grammar to see whether I can resolve the ambiguities without backtracking, but luckily the work-around of using multiple elements+=... assignments works for now.
closing this due to the last comments.
There might be a misunderstanding here. This problem still persists and is not fixed yet. The syntax (var+=RuleA|var+=RuleB)+ should be the same as var+=(RuleA|RuleB)+, but it isn't. As Moritz summarized: > However, this doesn't explain why the one grammar "works" and the other one > doesn't. So, whereas there is an easy work-around, Xtext doesn't really behave the way it should, and I think the underlying problem should be investigated and fixed. I suggest reopening this bug.
thanks for clarifying.
With current Xtext master, the grammars from the example projects produce ambiguity errors when running the MWE2 workflow. However, reordering the alternatives such that the call to EntityReference always comes first helps (there are only warnings left, no errors). With or without the change, the generated Antlr files differ only in some parentheses and variable names in the Text rule, but semantically they are identical. The generated method names in the GrammarAccess differ correspondingly (e.g. getElementsPlainTextParserRuleCall_2_0() vs. getElementsPlainTextParserRuleCall_0_2()) but I don't think it makes sense to try making these identical. So I suggest closing this as fixed.
Hi Bernhard, thank you for looking into this, but I think you're missing the point: (In reply to Mirko Raner from comment #5) > The syntax (var+=RuleA|var+=RuleB)+ should be the same as > var+=(RuleA|RuleB)+, but it isn't. For all documents the parser should behave the same for (var+=RuleA|var+=RuleB)+ as it does for var+=(RuleA|RuleB)+. This is not about the order of choices inside the grammatical alternative and also not about the GrammarAccess.
Created attachment 263739 [details] Reduced example
Created attachment 263740 [details] Experimental modifications Hi Moritz, (In reply to Moritz Eysholdt from comment #8) > thank you for looking into this, but I think you're missing the point: > > (In reply to Mirko Raner from comment #5) > > The syntax (var+=RuleA|var+=RuleB)+ should be the same as > > var+=(RuleA|RuleB)+, but it isn't. > > For all documents the parser should behave the same for > (var+=RuleA|var+=RuleB)+ as it does for var+=(RuleA|RuleB)+. > > This is not about the order of choices inside the grammatical alternative > and also not about the GrammarAccess. No, that's exactly how I understood this - but maybe I wasn't very precise. Anyway, what I did miss is the part about enabling backtracking, so only the Antlr grammars could be generated but not the parser. That's why I compared the Antlr grammars and all I saw are differences the IMO are irrelevant (except possibly the GrammarAccess stuff, which is why I mentioned it). These differences in the Antlr grammar are the same with or without backtracking. However, after enabling backtracking, parser code can be generated and the example actually produces no errors in the first case but a grey error marker in the second case as reported, which puzzled me a bit. The message in the problem view is the following: "required (...)+ loop did not match anything at input '&'". To me the Antlr grammars seem to be equivalent. There are only two differences: 1. The local variable names in ruleText (lv_elements_0_0 vs. lv_elements_0_1, lv_elements_1_0 vs. lv_elements_0_2 and lv_elements_2_0 vs. lv_elements_0_3) 2. The parentheses are chosen to be set differenly, namely: ( (( {...} lv_elements_0_0=ruleEntityReference {...} )) | (( {...} lv_elements_1_0=ruleSpecialCharacters {...} )) | (( {...} lv_elements_2_0=rulePlainText {...} )) )+ vs. ( (( {...} lv_elements_0_1=ruleEntityReference {...} | {...} lv_elements_0_2=ruleSpecialCharacters {...} | {...} lv_elements_0_3=rulePlainText {...} )) )+ As an experiment, I created a reduced version of the example grammars (attached). Additionally, I changed the Xtext Antlr Grammar generator to insert one additional pair of parentheses in String ebnf2(Alternatives it, AntlrOptions options, boolean supportActions), and then I got the same error for the first grammar as well, this time with a red error marker, not a grey one as before (also attached). The differences in the generated Antlr grammar are minimal - in the first reduced example grammar, except for whitespace, it is only two additional pairs of parentheses, both directly inside another pair of parentheses. A few superfluous parentheses shouldn't make a difference in meaning, but apparently they do, so to me it seems like this is actually a bug in the used Antlr version's way of handling parentheses in + repetitions.
> A few superfluous parentheses shouldn't make a difference in meaning, but apparently they do, so to me it seems like this is actually a bug in the used Antlr version's way of handling parentheses in + repetitions. Correction: It seems like this is actually a bug in the used Antlr version's way of handling duplicate parentheses inside alternatives in + repetitions.
I suppose there are good reasons why Xtext is bound to Antlr 3.2.0 and not any newer release of Antlr 3 (breaking stuff etc.). So the question that remains for this issue whether we want to work around this bug or not, and if so, whether we want this case of ambiguous grammar to work with backtracking. If it is ok for this kind of grammar to fail, it would be easiest to insert that additional pair of parentheses - then parsing would fail in both cases. However, I guess the desired behaviour would be for both grammars to work (which I suppose would bis also safer regarding breaking existing languages), so it would actually be necessary to be more careful about inserting duplicate parentheses.
I made an experimental patch for avoiding double parentheses, and it seems to help.
Created attachment 263746 [details] EXPERIMENTAL patch for avoiding unnecessary parentheses (This patch is untested and incomplete)
GitHub Pull Request 90 created by [stadlerb] https://github.com/eclipse/xtext-core/pull/90
Created pull request https://github.com/eclipse/xtext-core/pull/90 I tried carefully not to break anything but before I wrote additional tests, the existing tests in org.eclipse.xtext.tests didn't uncover any bugs although some were present at intermediate stages. So somebody who knows the code well should have a good look before merging.