Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 318482

Summary: [GrammarLang] Support alternatives, groups and keywords in cross references
Product: [Modeling] TMF Reporter: Sven Efftinge <sven.efftinge>
Component: XtextAssignee: Project Inbox <tmf.xtext-inbox>
Status: RESOLVED FIXED QA Contact:
Severity: enhancement    
Priority: P5 CC: moritz.eysholdt, sebastian.zarnekow
Version: 1.0.0   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X - Carbon (unsup.)   
See Also: https://github.com/eclipse/xtext-core/pull/1413
Whiteboard: v2.22

Description Sven Efftinge CLA 2010-06-30 11:07:58 EDT
I'ld like to write something like this:

feature=[JvmOperation| '+'|'-']

which is syntactically not supported.

Also single keywords are supported by the parser but not by the generator.
It prints : crossrefEbnf is not supported for AbstractElement. 
Which comes from  AntlrGrammar.xpt:

«DEFINE crossrefEbnf(CrossReference ref) FOR AbstractElement-»
«ERROR "crossrefEbnf is not supported for AbstractElement"»
«ENDDEFINE»
Comment 1 Moritz Eysholdt CLA 2010-06-30 11:20:01 EDT
The bad thing about this notation is that a value converter can't be assigned to '+'|'-'. Eg, no validation, no escaping etc.

People will start writing things like feature=[JvmOperation| '+'|ID]. 

It would have to to be stored in the node model whether "+" or "ID" has been parsed. Currently, only the cross reference is stored.

for feature=[JvmOperation| '+'|ID|STRING] on serialization: How to decide which value converter to use?

I'd much rather like to see:

feature=[JvmOperation|MyComp]
MyComp:  '+'|ID|STRING;

Then, people have to implement a value converter for MyComp and can decide on their own whether it should delegate to ID or STRING.
Comment 2 Sebastian Zarnekow CLA 2010-06-30 17:37:31 EDT
I agree with Moritz. This will lead to problems that were present in the past and fixed by http://git.softeys.de/tmf.xtext.git/commit/a6e8d6eefdc0f9e9eb63aeb366d33275f53bd18c

However, keywords should be supported as a cross reference terminal.
Comment 3 Sebastian Zarnekow CLA 2010-06-30 17:37:55 EDT
see also bug 298831
Comment 4 Sven Efftinge CLA 2010-07-01 00:12:11 EDT
In the case of Xbase I had to introduce eight additional rules just to work around this issue. And none of those rules need a value converter.
Just allowing groups and alternatives containing keywords only would have helped in that scenario.

How does the serializer decide which one to take in the following case:

MyRule :
    foo=(ID|STRING);

if a node model is not present?
Comment 5 Sebastian Zarnekow CLA 2010-07-01 01:27:33 EDT
(In reply to comment #4)
> How does the serializer decide which one to take in the following case:
> 
> MyRule :
>     foo=(ID|STRING);
> 
> if a node model is not present?

The serializer may use backtracking to make this decision and the node model contains enough information about the grammar so the parser can choose which value converter to use. This is not the case for alternatives in cross references as the assigned grammar element is the cross reference instead of the terminal while it is the terminal in case of a simple assignment. This is not optimal but the current structure of the node model.

Just out of curiosity: Why did you need to use so many additional rules? 

TypeRef: type =[Type|ID] | type = [Type|STRING];

should work without introducing additional rules (if we fixed the keyword as terminal in cross refs issue). Why did you need to use Groups in Cross references. They imply to allow to use multiple tokens which may have hidden tokens etc between them which in turn need to be filtered but if you simple remove them, the new sequence may lead to another keyword.
Comment 6 Moritz Eysholdt CLA 2010-07-01 04:56:01 EDT
(In reply to comment #4)
> In the case of Xbase I had to introduce eight additional rules just to work
> around this issue. And none of those rules need a value converter.
> Just allowing groups and alternatives containing keywords only would have
> helped in that scenario.
> 
> How does the serializer decide which one to take in the following case:
> 
> MyRule :
>     foo=(ID|STRING);
> 
> if a node model is not present?

It takes the first rule for which the value converter doesn't throw an exception. For this, ITokenSerializer.IValueSerializer.isValid(EObject, RuleCall, Object, IErrorAcceptor) is called.

Ideally, this process should be part of the backtracking algorithm. With the current implementation, however, it isn't. I.e. if there is an alternative within an assignment, the first option that can be entered is entered and succeeds - or not.

What I don't like about scenarios like foo=(ID|STRING) is the huge intersection of ID's and STRING's value space. Actually, ID's value space is a subset of STRING's value space. Consequently, if the serializer tries STRING first, it will always succeed and ID will never be used. The grammar's author will have to care about the order of options in the alternative.
Comment 7 Sven Efftinge CLA 2010-07-01 12:41:29 EDT
(In reply to comment #5)
> (In reply to comment #4)
> > How does the serializer decide which one to take in the following case:
> > 
> > MyRule :
> >     foo=(ID|STRING);
> > 
> > if a node model is not present?
> 
> The serializer may use backtracking to make this decision and the node model
> contains enough information about the grammar so the parser can choose which
> value converter to use.

The serializer is supposed to be used for models without any node model. And in fact it doesn't use the node model but uses what Moritz describes. At least that's what my test also show.
That approach could be used for cross references as well and would be equally wrong :-)

I think in case of a String it would be correct to use the lexer to tokenize the value and depending on the outcome check which way the parser went. But since arbitrary value converters could have made arbitrary changes this is also not bullet proof. But a bit closer.

I think one could use the same 'wrong' algorithm for cross references since all the serializer relies on is a piece of text (no matter it's a cross reference or a string value) and a corresponding grammar element. 

> Just out of curiosity: Why did you need to use so many additional rules? 
> 
> TypeRef: type =[Type|ID] | type = [Type|STRING];

Because that is equally noisy, resp. even worst in case you have more than two alternatives, like we do have for some infix operators.

> Why did you need to use Groups in Cross
> references. 

I don't need them.
Comment 8 Sven Efftinge CLA 2010-07-01 12:47:11 EDT
(In reply to comment #6)
> (In reply to comment #4)
> > In the case of Xbase I had to introduce eight additional rules just to work
> > around this issue. And none of those rules need a value converter.
> > Just allowing groups and alternatives containing keywords only would have
> > helped in that scenario.
> > 
> > How does the serializer decide which one to take in the following case:
> > 
> > MyRule :
> >     foo=(ID|STRING);
> > 
> > if a node model is not present?
> 
> It takes the first rule for which the value converter doesn't throw an
> exception. For this, ITokenSerializer.IValueSerializer.isValid(EObject,
> RuleCall, Object, IErrorAcceptor) is called.
> 
> Ideally, this process should be part of the backtracking algorithm. With the
> current implementation, however, it isn't. I.e. if there is an alternative
> within an assignment, the first option that can be entered is entered and
> succeeds - or not.

As I mentioned in the other comment. It should respect the context less scanning. I.e.
it should check the terminal rules in the order they are defined in the lexer not the order they are defined in the alternative.

> 
> What I don't like about scenarios like foo=(ID|STRING) is the huge intersection
> of ID's and STRING's value space. Actually, ID's value space is a subset of
> STRING's value space. Consequently, if the serializer tries STRING first, it
> will always succeed and ID will never be used. The grammar's author will have
> to care about the order of options in the alternative.

That would be solved, because ID is defined first. But that is just a coincidence...

I think we have a more general problem here which is not just about cross reference.
The current strategy is a bit asymmetric, since we have a not fully working solution for common assignments and disallow use of complex alternatives a.t.l. in cross references.

If we had the possibility to suppress warnings I'ld say we should allow more stuff but give a warning as a hint. That if you need to control value conversion you'll have to introduce a new rule.

Maybe this needs some additional offline discussion :-)
Comment 9 Moritz Eysholdt CLA 2010-07-02 05:18:52 EDT
(In reply to comment #7)

> The serializer is supposed to be used for models without any node model. And in
> fact it doesn't use the node model but uses what Moritz describes. At least
> that's what my test also show.
> That approach could be used for cross references as well and would be equally
> wrong :-)

I totally agree that it would be better if the serializer would use backtracking for this decision.

(In reply to comment #8)
> As I mentioned in the other comment. It should respect the context less
> scanning. I.e.
> it should check the terminal rules in the order they are defined in the lexer
> not the order they are defined in the alternative.

If the serializer checks the terminal rules in the order in which they are defined it will come to different conclusions as the lexer. This is because the lexer matches tokens based on their "string" representation and the serializer matches tokens based on their "value" representation.

For example, the lexer will only match strings starting with a quotation mark as STRING but the serializer will match all values as STRING since all values become a valid STRING-token when surrounded by quotations marks.

Furthermore, this strategy leaves open how to prioritize when data type rules are involved.
Comment 10 Sebastian Zarnekow CLA 2010-10-05 09:06:57 EDT
The node model does not contain the information about the actual grammar element that leads to the leaf node in a cross reference. This piece of information is missing. That's why we cannot infer the right value converter for alternatives in cross refs. However leaf nodes in assignments contain this information.

Assignment -> RuleCall (LeafNode) vs Assignment -> CrossReference (LeafNode)

It is currenlty not possible to implement this symetrically for foo=(ID|STRING) and foo=[Type|(ID|STRING)].
Comment 11 Moritz Eysholdt CLA 2010-10-05 15:26:33 EDT
(In reply to comment #10)
> The node model does not contain the information about the actual grammar
> element that leads to the leaf node in a cross reference. This piece of
> information is missing. That's why we cannot infer the right value converter
> for alternatives in cross refs. However leaf nodes in assignments contain this
> information.
> 
> Assignment -> RuleCall (LeafNode) vs Assignment -> CrossReference (LeafNode)
> 
> It is currenlty not possible to implement this symetrically for foo=(ID|STRING)
> and foo=[Type|(ID|STRING)].

I'd consider this a bug in the node model (or the node model constructor)
Comment 12 Sebastian Zarnekow CLA 2020-03-13 04:34:19 EDT
Keywords can now be used in cross references.

Marking this as fixed. Remaining work includes potentially improved user feedback when alternatives are tried as cross ref terminals and respective quickfixes. These are out-of-scope here.