Bug 373991

Summary: Problem with markdown grammar
Product: [ECD] Orion
Reporter: Szymon Brandys <Szymon.Brandys>
Component: Client
Assignee: Mark Macdonald <mamacdon>
Status: RESOLVED WONTFIX
QA Contact:
Severity: normal
Priority: P3
CC: mamacdon
Version: 0.5
Target Milestone: ---
Hardware: PC
OS: Windows 7
Whiteboard:
Attachments: screenshot (Flags: none)

Description Szymon Brandys CLA 2012-03-12 13:18:09 EDT
Steps:
1) Install http://szbra.github.com/example2/markdownPlugin.html and http://szbra.github.com/example3/markdownPlugin.html
2) Clone git@github.com:szbra/szbra.github.com.git
3) Try to open /szbra.github.com/markdown-sample/markdown-js.md

You will see "Unresponsive script" dialog.

If you change MarkdownGrammar.js and remove:

	{
		"begin": "^$",
		"end": "^$",
		"patterns": [
			{ "match": "> .+", "name": "entity.name.tag.doctype.html" },
			{ "match": "( {3,}|\t).+", "name": "entity.name.tag.doctype.html" }
		]
	},

you can open markdown files again.
Comment 1 Mark Macdonald CLA 2012-03-13 02:10:08 EDT
I have a small fix for this -- will test more tomorrow.
Comment 2 Mark Macdonald CLA 2012-03-13 23:18:23 EDT
(In reply to comment #1)
> I have a small fix for this -- will test more tomorrow.

So the "fix" didn't fix the problem. I spent some time reviewing the TextMate manual, and I think the code is working as designed. The problem is the rule given in Comment 0, which produces an infinite loop when applied to empty lines.

Consider this input:
-----------------------------------------
foo

bar
-----------------------------------------

The procedure is:
1. First line: no matches. Continue to second line.
2. The "begin" rule matches the empty line. The parser position does not advance, because the match ("^$") consumes no characters.
   At this point, we enter the begin..end rule's context, so the rules in "patterns" become active, along with the "end" rule.
3. The "end" rule matches the empty line. Again, the parser position does not advance.
   Now we exit the begin..end rule context.
4. Since we're at the top-level context, the "begin" rule matches again. Go to step 2, repeat ad infinitum.
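The cycle in steps 2-4 can be reproduced with a small toy model. This is a minimal sketch and not Orion's tokenizer. `traceEmptyLine` and its `maxSteps` cap are hypothetical, and the model covers only the single begin/end pair from Comment 0. The sticky (`y`) flag makes each regex match only at the current position, which mimics a tokenizer trying rules at its cursor.

```javascript
// Toy model (hypothetical, not Orion's tokenizer) of steps 2-4 on one line.
// Both rules are "^$"; the sticky flag makes exec() match only at `pos`.
const beginRule = /^$/y;
const endRule = /^$/y;

function traceEmptyLine(line, maxSteps) {
  const trace = [];
  let pos = 0;
  let inside = false; // false: top-level context; true: inside begin..end
  for (let step = 0; step < maxSteps; step++) {
    const rule = inside ? endRule : beginRule;
    rule.lastIndex = pos;
    const m = rule.exec(line);
    if (m === null) break;           // no match here: the tokenizer would move on
    pos += m[0].length;              // m[0] is "", so pos never advances
    trace.push((inside ? "end" : "begin") + "@" + pos);
    inside = !inside;                // "begin" enters the context, "end" leaves it
  }
  return trace;
}

console.log(traceEmptyLine("", 4).join(" "));  // begin@0 end@0 begin@0 end@0
console.log(traceEmptyLine("foo", 4).length);  // 0
```

Without the `maxSteps` cap, the loop on an empty line never terminates. Every match is zero-width, so the position stays at 0 while the context flips back and forth.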

Perhaps these cases can be detected by analyzing the parser state, but recovery would be limited to terminating with an exception; there's no way to correctly apply a grammar like this. (FWIW, TextMate also hangs on this input when given a similar grammar.)
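Here is one possible shape for such a parser-state check. It is a sketch under toy assumptions: sticky regexes, a single begin/end pair, and one character of plain text consumed when nothing matches. `scanLine` is a hypothetical name. The idea is to record each (position, context) state reached since input was last consumed, and to throw if a state repeats, because that means the tokenizer is cycling without progress.

```javascript
// Hypothetical loop guard for a toy begin/end tokenizer (not Orion's API).
// beginRe and endRe must use the sticky ("y") flag so they only match at `pos`.
function scanLine(line, beginRe, endRe) {
  const events = [];
  const seen = new Set(); // states reached since the last consumed character
  let pos = 0;
  let inside = false;
  while (pos <= line.length) {
    const state = pos + ":" + inside;
    if (seen.has(state)) {
      throw new Error("Zero-width begin/end cycle at column " + pos);
    }
    seen.add(state);
    const rule = inside ? endRe : beginRe;
    rule.lastIndex = pos;
    const m = rule.exec(line);
    if (m !== null) {
      events.push(inside ? "end" : "begin");
      inside = !inside;
      if (m[0].length > 0) {
        pos += m[0].length;
        seen.clear(); // progress was made, so earlier states cannot recur
      }
    } else if (pos < line.length) {
      pos += 1;       // nothing matched: treat one character as plain text
      seen.clear();
    } else {
      break;          // end of line reached
    }
  }
  return events;
}

try {
  scanLine("", /^$/y, /^$/y);
} catch (e) {
  console.log(e.message); // Zero-width begin/end cycle at column 0
}
console.log(scanLine('say "hi"', /"/y, /"/y).join(",")); // begin,end
```

The guard turns the hang into an error, but it cannot produce a correct tokenization. That is consistent with the point above that terminating is the only available recovery.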

Szymon, we're going to have to find a different technique for recognizing markdown blocks.
Comment 3 Szymon Brandys CLA 2012-03-14 05:38:55 EDT
This is not an INVALID bug. As I understand it, we are not going to fix it due to TextMate limitations.
Comment 4 Szymon Brandys CLA 2012-03-14 06:11:26 EDT
A Markdown list may look like this:

<empty line>
* item1

* item2
item2 second line
 item2 third line
 
* item3
item3 second line
 
 item3 third line
 
  item3 fourth line
<empty line>
some text

So the list block starts with an <empty line> followed by a line starting with '*', and ends with an <empty line> followed by a line starting with a non-whitespace character.
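When the lines are available as an array, the two-line condition can be checked directly. The sketch below uses a hypothetical helper that is not part of the grammar system. It treats whitespace-only lines as empty. It also assumes that a blank line followed by another '*' line continues the list (as with item2 and item3 above) instead of ending it. The decision at each blank line depends on the line after it, which a single-line begin/end regex cannot see.

```javascript
// Hypothetical helper (not part of Orion or TextMate): returns [firstLine, lastLine]
// index pairs for list blocks, where lastLine is the blank line that closes the block.
function findListBlocks(lines) {
  const isBlank = (s) => s.trim() === ""; // whitespace-only lines count as empty
  const blocks = [];
  let start = -1; // index of the first bullet line of the open block, or -1
  for (let i = 0; i + 1 < lines.length; i++) {
    if (!isBlank(lines[i])) continue;
    const next = lines[i + 1];
    if (start === -1 && next.startsWith("*")) {
      start = i + 1;            // blank line + "*" line: a list starts
    } else if (start !== -1 && /^[^\s*]/.test(next)) {
      blocks.push([start, i]);  // blank line + non-whitespace (not "*"): the list ends
      start = -1;
    }
  }
  if (start !== -1) blocks.push([start, lines.length - 1]); // list runs to end of input
  return blocks;
}

const example = [
  "", "* item1", "", "* item2", "item2 second line", " item2 third line", " ",
  "* item3", "item3 second line", " ", " item3 third line", " ",
  "  item3 fourth line", "", "some text",
];
console.log(JSON.stringify(findListBlocks(example))); // [[1,13]]
```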

I was not able to use two-line rules, so Mark advised using:

"begin": "^$",
"end": "^$",

and then patterns inside. "^$" does not work, but that is not the main problem. The real problem is how to describe the Markdown list block. As I wrote above, the begin and end rules I would like to use would have to consider two lines instead of just one.
Comment 5 Mark Macdonald CLA 2012-03-15 01:43:03 EDT
(In reply to comment #4)
> I was not able to use two-line rules, so Mark advised to use:

Yeah, I made that suggestion without working through the parsing implications, sorry :(

I think we can do this by breaking it into 2 rules: an outer one that identifies the start of the list, and an inner one that finds the paragraph breaks. We can use two nested begin..end rules and lookaheads.

For a bulleted list, it would look like this:

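// outer rule: starts at a list bullet ("*", "+" or "-") and ends at a line
// that starts with a non-whitespace character
{ begin: '^ {0,3}([*+-])(?=\\s)',
  end:   '^(?=\\S)',
  name:  'markdown.list',
  patterns: [
    // inner rule: a list paragraph, from whitespace before text up to a blank line
    { begin: '\\s+(?=\\S)',
      end:   '^\\s*$',
      name:  'markdown.list.paragraph'
    }
  ]
}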

- The outer rule matches the list bullet. It ends when it sees a line starting with a non-whitespace character, which terminates the list (like "some text" in your example).

- The inner rule begins at whitespace, which can be either on the same line as the bullet (like " item2") or at the beginning of a new paragraph (like " item3 third line"). It ends when it matches a blank line.

Whenever the inner rule ends, it's because a blank line was encountered. At this point either the inner rule's begin will match (indicating the list continues), or the outer rule's end will match (indicating the list terminates due to a blank line followed by non-whitespace characters).
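A quick way to sanity-check the rule regexes is to run them through JavaScript's `RegExp` against lines from Szymon's example. This sketch relies on a few assumptions:

- It tests each regex on single lines in isolation rather than running a TextMate-style tokenizer.
- The bullet class is written `[*+-]`, so `*`, `+` and `-` all match.
- `innerBeginsAt` and `compiles` are hypothetical helpers.

The last line checks that the pattern as originally posted, `'^ {0,3}([*-+)(?=\\s)'`, is rejected by the regex parser because its character class is never closed.

```javascript
// Regex sources from the outer/inner list rules, as they would be passed to RegExp.
// Assumption: the bullet class is written [*+-], so "*", "+" and "-" all match.
const outerBegin = new RegExp("^ {0,3}([*+-])(?=\\s)");
const outerEnd = new RegExp("^(?=\\S)");
const innerEnd = new RegExp("^\\s*$");

// The inner "begin" is tried at a given column (right after the bullet, or at
// the start of a continuation line), so the sticky flag anchors it there.
function innerBeginsAt(line, col) {
  const re = new RegExp("\\s+(?=\\S)", "y");
  re.lastIndex = col;
  return re.test(line);
}

// Returns false if the source is not a valid JavaScript regular expression.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(outerBegin.exec("* item1")[1]);         // *
console.log(outerBegin.test("- item"));             // true
console.log(innerBeginsAt("* item1", 1));           // true  (paragraph after the bullet)
console.log(innerBeginsAt(" item3 third line", 0)); // true  (new paragraph in the list)
console.log(innerEnd.test(" "));                    // true  (blank line ends the paragraph)
console.log(outerEnd.test("some text"));            // true  (list terminates)
console.log(outerEnd.test(" item3 third line"));    // false (list continues)
console.log(compiles("^ {0,3}([*-+)(?=\\s)"));      // false (unterminated character class)
```

Note that `[*-+]` would be a range from `*` to `+` even if the class were closed, so it would not match a `-` bullet. Placing `-` last in the class makes it a literal.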

I was able to do an OK job of highlighting with a grammar based on this approach. I got the idea from a Markdown grammar that I found on GitHub.
Comment 6 Mark Macdonald CLA 2012-03-15 01:45:19 EDT
Created attachment 212696
screenshot

Here is a screenshot showing how far I got. (The colors are for demonstration -- I edited the theme to show the different markdown structures that are recognized. You'll want to change the rule names, so that it works with the default theme.)

The grammar I used is here: https://github.com/mamacdon/szbra.github.com/commit/6ad43d749d431a4663d57e0cd379f7d3175bbc20#diff-0
It's only a starting point -- I didn't handle HTML tags and a bunch of other MD features.

Hope this helps
Comment 7 John Arthorne CLA 2015-05-05 15:46:19 EDT
Closing as part of a mass clean up of inactive bugs. Please reopen if this problem still occurs or is relevant to you. For more details see:


https://dev.eclipse.org/mhonarc/lists/orion-dev/msg03444.html
Comment 8 John Arthorne CLA 2015-05-05 16:00:10 EDT
Closing as part of a mass clean up of inactive bugs. Please reopen if this problem still occurs or is relevant to you. For more details see:


https://dev.eclipse.org/mhonarc/lists/orion-dev/msg03444.html