| Summary: | [GrammarLang] Support alternatives, groups and keywords in cross references | ||
|---|---|---|---|
| Product: | [Modeling] TMF | Reporter: | Sven Efftinge <sven.efftinge> |
| Component: | Xtext | Assignee: | Project Inbox <tmf.xtext-inbox> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | enhancement | ||
| Priority: | P5 | CC: | moritz.eysholdt, sebastian.zarnekow |
| Version: | 1.0.0 | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Mac OS X - Carbon (unsup.) | ||
| See Also: | https://github.com/eclipse/xtext-core/pull/1413 | ||
| Whiteboard: | v2.22 | ||
|
Description
Sven Efftinge
The bad thing about this notation is that a value converter can't be assigned to '+'|'-'. Eg, no validation, no escaping etc. People will start writing things like feature=[JvmOperation| '+'|ID]. It would have to to be stored in the node model whether "+" or "ID" has been parsed. Currently, only the cross reference is stored. for feature=[JvmOperation| '+'|ID|STRING] on serialization: How to decide which value converter to use? I'd much rather like to see: feature=[JvmOperation|MyComp] MyComp: '+'|ID|STRING; Then, people have to implement a value converter for MyComp and can decide on their own whether it should delegate to ID or STRING. I agree with Moritz. This will lead to problems that were present in the past and fixed by http://git.softeys.de/tmf.xtext.git/commit/a6e8d6eefdc0f9e9eb63aeb366d33275f53bd18c However, keywords should be supported as a cross reference terminal. see also bug 298831 In the case of Xbase I had to introduce eight additional rules just to work around this issue. And none of those rules need a value converter.
Just allowing groups and alternatives containing keywords only would have helped in that scenario.
How does the serializer decide which one to take in the following case:
MyRule :
foo=(ID|STRING);
if a node model is not present?
(In reply to comment #4) > How does the serializer decide which one to take in the following case: > > MyRule : > foo=(ID|STRING); > > if a node model is not present? The serializer may use backtracking to make this decision and the node model contains enough information about the grammar so the parser can choose which value converter to use. This is not the case for alternatives in cross references as the assigned grammar element is the cross reference instead of the terminal while it is the terminal in case of a simple assignment. This is not optimal but the current structure of the node model. Just out of curiosity: Why did you need to use so many additional rules? TypeRef: type =[Type|ID] | type = [Type|STRING]; should work without introducing additional rules (if we fixed the keyword as terminal in cross refs issue). Why did you need to use Groups in Cross references. They imply to allow to use multiple tokens which may have hidden tokens etc between them which in turn need to be filtered but if you simple remove them, the new sequence may lead to another keyword. (In reply to comment #4) > In the case of Xbase I had to introduce eight additional rules just to work > around this issue. And none of those rules need a value converter. > Just allowing groups and alternatives containing keywords only would have > helped in that scenario. > > How does the serializer decide which one to take in the following case: > > MyRule : > foo=(ID|STRING); > > if a node model is not present? It takes the first rule for which the value converter doesn't throw an exception. For this, ITokenSerializer.IValueSerializer.isValid(EObject, RuleCall, Object, IErrorAcceptor) is called. Ideally, this process should be part of the backtracking algorithm. With the current implementation, however, it isn't. I.e. if there is an alternative within an assignment, the first option that can be entered is entered and succeeds - or not. What I don't like about scenarios like foo=(ID|STRING) is the huge intersection of ID's and STRING's value space. Actually, ID's value space is a subset of STRING's value space. Consequently, if the serializer tries STRING first, it will always succeed and ID will never be used. The grammar's author will have to care about the order of options in the alternative. (In reply to comment #5) > (In reply to comment #4) > > How does the serializer decide which one to take in the following case: > > > > MyRule : > > foo=(ID|STRING); > > > > if a node model is not present? > > The serializer may use backtracking to make this decision and the node model > contains enough information about the grammar so the parser can choose which > value converter to use. The serializer is supposed to be used for models without any node model. And in fact it doesn't use the node model but uses what Moritz describes. At least that's what my test also show. That approach could be used for cross references as well and would be equally wrong :-) I think in case of a String it would be correct to use the lexer to tokenize the value and depending on the outcome check which way the parser went. But since arbitrary value converters could have made arbitrary changes this is also not bullet proof. But a bit closer. I think one could use the same 'wrong' algorithm for cross references since all the serializer relies on is a piece of text (no matter it's a cross reference or a string value) and a corresponding grammar element. > Just out of curiosity: Why did you need to use so many additional rules? > > TypeRef: type =[Type|ID] | type = [Type|STRING]; Because that is equally noisy, resp. even worst in case you have more than two alternatives, like we do have for some infix operators. > Why did you need to use Groups in Cross > references. I don't need them. (In reply to comment #6) > (In reply to comment #4) > > In the case of Xbase I had to introduce eight additional rules just to work > > around this issue. And none of those rules need a value converter. > > Just allowing groups and alternatives containing keywords only would have > > helped in that scenario. > > > > How does the serializer decide which one to take in the following case: > > > > MyRule : > > foo=(ID|STRING); > > > > if a node model is not present? > > It takes the first rule for which the value converter doesn't throw an > exception. For this, ITokenSerializer.IValueSerializer.isValid(EObject, > RuleCall, Object, IErrorAcceptor) is called. > > Ideally, this process should be part of the backtracking algorithm. With the > current implementation, however, it isn't. I.e. if there is an alternative > within an assignment, the first option that can be entered is entered and > succeeds - or not. As I mentioned in the other comment. It should respect the context less scanning. I.e. it should check the terminal rules in the order they are defined in the lexer not the order they are defined in the alternative. > > What I don't like about scenarios like foo=(ID|STRING) is the huge intersection > of ID's and STRING's value space. Actually, ID's value space is a subset of > STRING's value space. Consequently, if the serializer tries STRING first, it > will always succeed and ID will never be used. The grammar's author will have > to care about the order of options in the alternative. That would be solved, because ID is defined first. But that is just a coincidence... I think we have a more general problem here which is not just about cross reference. The current strategy is a bit asymmetric, since we have a not fully working solution for common assignments and disallow use of complex alternatives a.t.l. in cross references. If we had the possibility to suppress warnings I'ld say we should allow more stuff but give a warning as a hint. That if you need to control value conversion you'll have to introduce a new rule. Maybe this needs some additional offline discussion :-) (In reply to comment #7) > The serializer is supposed to be used for models without any node model. And in > fact it doesn't use the node model but uses what Moritz describes. At least > that's what my test also show. > That approach could be used for cross references as well and would be equally > wrong :-) I totally agree that it would be better if the serializer would use backtracking for this decision. (In reply to comment #8) > As I mentioned in the other comment. It should respect the context less > scanning. I.e. > it should check the terminal rules in the order they are defined in the lexer > not the order they are defined in the alternative. If the serializer checks the terminal rules in the order in which they are defined it will come to different conclusions as the lexer. This is because the lexer matches tokens based on their "string" representation and the serializer matches tokens based on their "value" representation. For example, the lexer will only match strings starting with a quotation mark as STRING but the serializer will match all values as STRING since all values become a valid STRING-token when surrounded by quotations marks. Furthermore, this strategy leaves open how to prioritize when data type rules are involved. The node model does not contain the information about the actual grammar element that leads to the leaf node in a cross reference. This piece of information is missing. That's why we cannot infer the right value converter for alternatives in cross refs. However leaf nodes in assignments contain this information. Assignment -> RuleCall (LeafNode) vs Assignment -> CrossReference (LeafNode) It is currenlty not possible to implement this symetrically for foo=(ID|STRING) and foo=[Type|(ID|STRING)]. (In reply to comment #10) > The node model does not contain the information about the actual grammar > element that leads to the leaf node in a cross reference. This piece of > information is missing. That's why we cannot infer the right value converter > for alternatives in cross refs. However leaf nodes in assignments contain this > information. > > Assignment -> RuleCall (LeafNode) vs Assignment -> CrossReference (LeafNode) > > It is currenlty not possible to implement this symetrically for foo=(ID|STRING) > and foo=[Type|(ID|STRING)]. I'd consider this a bug in the node model (or the node model constructor) Keywords can now be used in cross references. Marking this as fixed. Remaining work includes potentially improved user feedback when alternatives are tried as cross ref terminals and respective quickfixes. These are out-of-scope here. |