Bug 207567

Summary: HTML2TextReader gets stuck in infinite loop
Product: z_Archived Reporter: Dave Syer <david_syer>
Component: Mylyn Assignee: George Lindholm <javadev>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: P2 CC: david_syer, javadev
Version: unspecified   
Target Milestone: 2.2   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments (Description: Flags):
Test case: none
Stop loop: none
mylyn/context/zip: none
Unified patch: none
mylyn/context/zip: none

Description Dave Syer CLA 2007-10-26 07:20:29 EDT
Eclipse keeps going into overdrive - grabbing all available CPU, making my PC fan spin.  It happens intermittently and always seems to be in the middle of a "Synchronizing queries for Codehaus" (the codehaus JIRA).  It's in a background thread, so I can continue to work on the other processor.  But it looks like a bug.

There are no errors in the Error Log.  But the progress view shows 

Synchronizing 1/2: Doxia All: 27/170 DOXIA-150 Spurious <?xml version="1.0" encodeing"UTF-8"?> in generated <head> section

I have to kill Eclipse to get back my CPU.
Comment 1 Steffen Pingel CLA 2007-10-26 13:01:34 EDT
Dave, I would like to verify if this is the same as bug 207384. Please enable the heap status from Window -> Preferences -> General -> Show heap status and report back if the heap keeps growing and shrinking when the background synchronization runs (you can also trigger the synchronization from the task list).
Comment 2 Dave Syer CLA 2007-10-26 14:21:03 EDT
I don't think that's the same problem.  This one never goes away - the synchronization never finishes and never makes any progress, just ticks away eating CPU.
Comment 3 Mik Kersten CLA 2007-11-01 18:06:04 EDT
David: if you could get a thread dump to see what's hogging the CPU it would help Steffen debug.  See: http://wiki.eclipse.org/Mylyn_Contributor_Reference#Debugging
Comment 4 Dave Syer CLA 2007-11-02 03:19:56 EDT
I tried using JConsole, but couldn't tell which thread was the important one.  None looked all that interesting to me.  How can I tell which thread dump you need?

If you set up a query to jira.codehaus.org and filter for all issues in project "Maven Doxia" you should get the same behaviour - it's 100% reliable for me.
Comment 5 Steffen Pingel CLA 2007-11-02 03:43:20 EDT
I can reproduce the error. It seems that HTML2TextReader enters an endless loop while trying to parse the string "Spurious <?xml version="1.0" encoding="UTF-8"?> in generated <head> section" from issue DOXIA-150.
Comment 6 George Lindholm CLA 2007-11-19 15:55:24 EST
It's actually a bug in SubstitutionTextReader.read(). It's not detecting EOF properly, so it never stops trying to read past the last character in this string.
Comment 7 George Lindholm CLA 2007-11-19 15:59:33 EST
Created attachment 83280 [details]
Test case

Simple test case that triggers the loop
Comment 8 Dave Syer CLA 2007-11-19 16:44:03 EST
I'm getting more CPU spinning with the original query including DOXIA-150 deleted, and I'd like to know if it's the same bug.  Is there a way to diagnose this from the jconsole?  All the threads look like they are RUNNABLE, WAITING or TIMED-WAITING (is that the correct nomenclature, I can't remember?), and I can't tell which one is spinning.
Comment 9 Eugene Kuleshov CLA 2007-11-19 17:19:17 EST
Dave, you can run Eclipse with console (i.e. specify java.exe in -vm param in eclipse.ini) and then hit Ctrl-Break to see the thread dump. If you are using Java 1.6, there is also jstack.exe tool that shows thread dumps for the java processes.
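
As a complement to Ctrl-Break and jstack, here is a rough sketch of how per-thread CPU time can be sampled with the standard java.lang.management API to spot a spinning thread. The class and method names (BusyThreadFinder, busiestThreads) are made up for illustration, the lambda syntax needs a newer Java than the 1.6 mentioned above, and the code only sees threads of the JVM it runs in, so it would have to be run inside the Eclipse process (or adapted to a remote JMX connection) to be useful here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mylyn or Eclipse.
class BusyThreadFinder {

    /** Returns up to 'limit' report lines, ordered by accumulated CPU time, busiest first. */
    static List<String> busiestThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // CPU time measurement is optional; return an empty report if it is unavailable.
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return report;
        }
        List<long[]> samples = new ArrayList<>(); // pairs of {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpuNanos >= 0) {
                samples.add(new long[] {id, cpuNanos});
            }
        }
        samples.sort((a, b) -> Long.compare(b[1], a[1]));
        for (long[] sample : samples) {
            if (report.size() >= limit) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(sample[0], 5); // keep a few stack frames
            if (info == null) {
                continue; // thread ended between the two calls
            }
            StringBuilder line = new StringBuilder(String.format("%d ms  %s  %s",
                    sample[1] / 1_000_000, info.getThreadName(), info.getThreadState()));
            for (StackTraceElement frame : info.getStackTrace()) {
                line.append(System.lineSeparator()).append("    at ").append(frame);
            }
            report.add(line.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : busiestThreads(5)) {
            System.out.println(line);
        }
    }
}
```

Calling busiestThreads twice a few seconds apart and comparing the numbers should show which thread's CPU time keeps climbing; a thread stuck in a loop will usually be RUNNABLE with the same frames at the top of its stack.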
Comment 10 George Lindholm CLA 2007-11-19 19:34:05 EST
Created attachment 83290 [details]
Stop loop

Need to test for EOF before any other special tests
Comment 11 George Lindholm CLA 2007-11-19 19:34:08 EST
Created attachment 83291 [details]
mylyn/context/zip
Comment 12 George Lindholm CLA 2007-11-19 20:36:25 EST
Dave, I would say it is the same issue. Without my fix, mylyn goes into an infinite loop. With the fix, it works fine against DOXIA.

The problem is that HTML2TextReader.computeSubstitution() wasn't dealing properly with EOF in two cases.
If the tags <head> or <pre> were present without a closing tag, the loop was created.
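
To illustrate the kind of loop described above, here is a minimal sketch of skipping input up to a closing tag with the EOF check done first. This is not the actual Mylyn code or the attached patch; SkipToClosingTag and skipPast are hypothetical names, and the only assumption is that the reader signals end of input with -1, as java.io.Reader.read() does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration, not the HTML2TextReader implementation.
class SkipToClosingTag {

    private static final int EOF = -1;

    /**
     * Consumes characters up to and including closingTag.
     * Returns false if the input ends before the tag is seen.
     */
    static boolean skipPast(Reader in, String closingTag) throws IOException {
        StringBuilder window = new StringBuilder(); // the last closingTag.length() characters
        while (true) {
            int ch = in.read();
            // Test for EOF before anything else. Without this check, an unclosed
            // tag would keep the loop asking for characters that never arrive.
            if (ch == EOF) {
                return false;
            }
            window.append((char) ch);
            if (window.length() > closingTag.length()) {
                window.deleteCharAt(0);
            }
            if (window.toString().equalsIgnoreCase(closingTag)) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The text after "<head>" in DOXIA-150 has no closing tag, so this ends at EOF.
        System.out.println(skipPast(new StringReader(" section"), "</head>"));
    }
}
```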
Comment 13 George Lindholm CLA 2007-11-19 23:01:56 EST
Created attachment 83298 [details]
Unified patch

Found the real test case. Reworked to use FutureTask
Comment 14 George Lindholm CLA 2007-11-19 23:01:58 EST
Created attachment 83299 [details]
mylyn/context/zip
Comment 15 Mik Kersten CLA 2007-11-20 12:43:19 EST
Steffen: please view.
Comment 16 George Lindholm CLA 2007-11-20 14:14:50 EST
Actually, the <pre> tag turned out to be safe, probably by accident :--)
Comment 17 Steffen Pingel CLA 2007-11-20 14:35:54 EST
Thanks for the great work on this, George! I have committed your fix and the test cases with slight modifications: I have removed the Future to make the test case easier to understand. As far as I know, JUnit will time out the test if it hangs. I have also added assertions that check the converted text. It would be great if you could take a quick look at these changes.
Comment 18 George Lindholm CLA 2007-11-20 14:58:57 EST
Steffen, it looks good.

I added the Future code after running the test from within Eclipse and finding that the test did not time out, so you may want to double check that JUnit will indeed time out.
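
For reference, a test can guard itself against a hanging call with FutureTask roughly as sketched below. This is not the code from the attached patch; TimeLimitedCall and callWithTimeout are hypothetical names, and the lambda in main needs a newer Java than was current for this bug.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not taken from the Mylyn test suite.
class TimeLimitedCall {

    /** Runs 'work' on a separate thread and gives up after timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> work, long timeoutMillis) throws Exception {
        FutureTask<T> task = new FutureTask<>(work);
        Thread worker = new Thread(task, "time-limited-call");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a loop that never checks
            // for interruption keeps running in the background.
            task.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(() -> "converted text", 1000));
    }
}
```

In a test, the parsing call would be passed as the Callable and a TimeoutException treated as a failure, so a regression shows up as a failed test rather than a stalled run.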
Comment 19 Steffen Pingel CLA 2007-11-20 19:05:28 EST
Thanks for the follow-up. In case we run into stalling tests I'll put it back in.