Build Identifier:

Neither RuleBasedPartitionScanner nor RuleBasedScanner documents an important contract regarding rule usage: that each rule should return a different success token. Not knowing this made me write a buggy extension of RuleBasedPartitionScanner, where the bug only became clear once I looked at the code of RuleBasedPartitionScanner.

Let me explain in more detail. Suppose you have this extension:

public class DeePartitionScanner extends RuleBasedPartitionScanner {

    private static final char NO_ESCAPE_CHAR = (char) -1;

    public DeePartitionScanner() {
        // Both rules below share the same success token (same content type).
        IToken tkRawString = new Token(DeePartitions.DEE_RAW_STRING);
        List<IPredicateRule> rules = new ArrayList<IPredicateRule>();
        rules.add(new MultiLineRule("`", "`", tkRawString, NO_ESCAPE_CHAR, true));
        rules.add(new MultiLineRule("r\"", "\"", tkRawString, NO_ESCAPE_CHAR, true));
        setPredicateRules(rules.toArray(new IPredicateRule[rules.size()]));
    }
}

When this is used by a FastPartitioner (for example), the partitioning breaks if something is typed on any line (except the first) of a DEE_RAW_STRING partition that was created by the "r\"", "\"" rule. This is because the FastPartitioner requests a partial parse (resume) from the RuleBasedPartitionScanner, which in turn looks at the contentType to determine which rule to use to resume the scan. Since there are actually two rules for the same contentType, it will simply use the first one (the "`", "`" one), which breaks the rest of the partitioning because it looks for the "`" terminator instead of "\"".

Reproducible: Always
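The resume behaviour described above can be sketched with a small stand-alone model. This is not the JFace implementation: `ResumeModel`, `Rule` and `resume` are hypothetical names invented for illustration, and the model assumes only what the report states, namely that a resumed scan picks the first rule whose token matches the current content type and then looks for that rule's end delimiter.

```java
import java.util.List;

final class ResumeModel {

    /** Hypothetical stand-in for a multi-line rule: delimiters plus the content type of its success token. */
    static final class Rule {
        final String start;
        final String end;
        final String contentType;

        Rule(String start, String end, String contentType) {
            this.start = start;
            this.end = end;
            this.contentType = contentType;
        }
    }

    /**
     * Models a resumed (partial) scan inside a partition of the given content type.
     * The FIRST rule with a matching content type is used, regardless of which rule
     * originally opened the partition. Returns the offset just past that rule's end
     * delimiter, or text.length() if the delimiter is never found.
     */
    static int resume(List<Rule> rules, String text, int offset, String contentType) {
        for (Rule rule : rules) {
            if (rule.contentType.equals(contentType)) {
                int end = text.indexOf(rule.end, offset);
                return end < 0 ? text.length() : end + rule.end.length();
            }
        }
        return offset; // no rule for this content type: nothing consumed
    }

    public static void main(String[] args) {
        Rule backtick = new Rule("`", "`", "DEE_RAW_STRING");
        Rule rQuote = new Rule("r\"", "\"", "DEE_RAW_STRING");

        // A raw string opened with r" that spans two lines, followed by other code.
        String text = "r\"first\nsecond\" int x; `raw`";
        int offset = text.indexOf("second"); // resume on the second line of the partition

        // Backtick rule registered first, as in the report.
        System.out.println(resume(List.of(backtick, rQuote), text, offset, "DEE_RAW_STRING"));
        // r" rule registered first.
        System.out.println(resume(List.of(rQuote, backtick), text, offset, "DEE_RAW_STRING"));
    }
}
```

With the backtick rule registered first, the modelled resume runs past the closing quote and stops only after the next backtick, so the partition swallows `int x;`. With the order reversed, it stops right after the closing quote.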
> Since there are actually two
> rules for the same contentType, it will simply use the first one

The Javadoc of org.eclipse.jface.text.rules.RuleBasedScanner already mentions this:

"The scanner is used to get the next token by evaluating its rule in sequence until one is successful."

Note that you can have more than one rule for the same token, but of course they must not conflict.
(In reply to comment #1)
> > Since there are actually two
> > rules for the same contentType, it will simply use the first one
>
> The Javadoc of org.eclipse.jface.text.rules.RuleBasedScanner already mentions
> this:
> "The scanner is used to get the next token by evaluating its rule in sequence
> until one is successful."
>
> Note that you can have more than one rule for the same token but of course they
> must not be conflicting.

The documentation of RuleBasedScanner looks okay; the issue seems to be only in RuleBasedPartitionScanner. (I therefore want to correct my first statement, where I said "Neither RuleBasedPartitionScanner or RuleBasedScanner document an important contract" [...].)

RuleBasedScanner does say that rules are evaluated in sequence, and that is fine because that is what it does, and also because RuleBasedScanner always attempts to scan with each rule from the "beginning of the rule", so to speak (the rule does a full scan).

The problem is in RuleBasedPartitionScanner, because this is the class that becomes aware of the concept of partitions (content types) and that introduces the possibility of starting a scan from the middle of a partition (IPartitionTokenScanner.setPartialRange()), and consequently of using a rule from the "middle of the rule", so to speak. In other words, the rule does a partial scan (resume) instead of a full one, something which does not happen with RuleBasedScanner (see org.eclipse.jface.text.rules.RuleBasedPartitionScanner.nextToken()).

The problem is that it is not documented which rule the RuleBasedPartitionScanner first attempts to do the partial scan with. It is not each rule in succession as in RuleBasedScanner, but rather the first rule that parses a token of the same partition (content type) currently under scan.
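For contrast, here is a minimal model of the full-scan case, where sharing a token between rules is harmless. It is a sketch, not the RuleBasedScanner code: `FullScanModel`, `Rule` and `fullScan` are hypothetical names, and the model assumes only that rules are tried in sequence from the start of the candidate text until one succeeds.

```java
import java.util.List;

final class FullScanModel {

    /** Hypothetical stand-in for a multi-line rule: delimiters plus the content type of its success token. */
    static final class Rule {
        final String start;
        final String end;
        final String contentType;

        Rule(String start, String end, String contentType) {
            this.start = start;
            this.end = end;
            this.contentType = contentType;
        }
    }

    /**
     * Models a full scan: every rule is tried in sequence at the given offset, and the
     * first rule whose START delimiter matches there wins. Returns the offset just past
     * the matching rule's end delimiter (or text.length() if it is never closed), or -1
     * if no rule matches at this offset.
     */
    static int fullScan(List<Rule> rules, String text, int offset) {
        for (Rule rule : rules) {
            if (text.startsWith(rule.start, offset)) {
                int end = text.indexOf(rule.end, offset + rule.start.length());
                return end < 0 ? text.length() : end + rule.end.length();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("`", "`", "DEE_RAW_STRING"),
                new Rule("r\"", "\"", "DEE_RAW_STRING"));
        String text = "r\"first\nsecond\" int x; `raw`";

        // Scanning from the opening r": the backtick rule fails on its start
        // delimiter, so the r" rule is reached and finds its own terminator.
        System.out.println(fullScan(rules, text, 0));
        // Scanning from the opening backtick: the backtick rule matches directly.
        System.out.println(fullScan(rules, text, text.indexOf('`')));
    }
}
```

In a full scan the start delimiter disambiguates the two rules, so the shared content type never matters. A resumed scan has no start delimiter to check, which is why the choice of rule falls back to the content type alone.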
> The problem is that it is not documented which rule the
> RuleBasedPartitionScanner first attempts to do the partial scan with. It is not
> each rule in succession as in RuleBasedScanner, but rather the first rule that
> parses a token of the same partition (content type) currently under scan.

Agreed. I've added this:

 * <p>
 * If a partial range is set (see {@link #setPartialRange(IDocument, int, int, String, int)} with
 * content type that is not <code>null</code> then this scanner will first try the rules that match
 * the given content type.
 * </p>