| Summary: | HTML2TextReader reader get stuck in infinite loop | | |
|---|---|---|---|
| Product: | z_Archived | Reporter: | Dave Syer <david_syer> |
| Component: | Mylyn | Assignee: | George Lindholm <javadev> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | critical | | |
| Priority: | P2 | CC: | david_syer, javadev |
| Version: | unspecified | | |
| Target Milestone: | 2.2 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Attachments: | | | |
|
Description
Dave Syer
Dave, I would like to verify if this is the same as bug 207384. Please enable the heap status from Window -> Preferences -> General -> Show heap status and report back if the heap keeps growing and shrinking when the background synchronization runs (you can also trigger the synchronization from the task list).

I don't think that's the same problem. This one never goes away - the synchronization never finishes and never makes any progress, just ticks away eating CPU.

David: if you could get a thread dump to see what's hogging the CPU it would help Steffen debug. See: http://wiki.eclipse.org/Mylyn_Contributor_Reference#Debugging

I tried using the JConsole, but couldn't tell which thread was the important one. None looked all that interesting to me. How could I tell which thread dump you need? If you set up a query to jira.codehaus.org and filter for all issues in project "Maven Doxia" you should get the same behaviour - it's 100% reliable for me.

I can reproduce the error. It seems that HTML2TextReader enters an endless loop while trying to parse the string "Spurious <?xml version="1.0" encoding="UTF-8"?> in generated <head> section" from issue DOXIA-150.

It's actually a bug in SubstitutionTextReader.read(). It's not detecting EOF properly, so it never stops trying to read past the last character in this string.

Created attachment 83280 [details]
Test case
Simple test case that triggers the loop
I'm still getting CPU spinning even with the original query (the one that included DOXIA-150) deleted, and I'd like to know if it's the same bug. Is there a way to diagnose this from the jconsole? All the threads look like they are RUNNABLE, WAITING or TIMED-WAITING (is that the correct nomenclature, I can't remember?), and I can't tell which one is spinning.

Dave, you can run Eclipse with a console (i.e. specify java.exe in the -vm param in eclipse.ini) and then hit Ctrl-Break to see the thread dump. If you are using Java 1.6, there is also the jstack.exe tool, which shows thread dumps for running Java processes.

Created attachment 83290 [details]
Stop loop
Need to test for EOF before any other special tests
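To illustrate the kind of loop described above, here is a minimal, self-contained sketch. It is not the Mylyn code or the attached patch: the class EofFirstSketch and its methods stripHead and skipPast are hypothetical names, and the parsing is far simpler than what HTML2TextReader does. It only shows why the EOF test has to come before any other special-case test when skipping to a closing tag that may never appear.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch (not Mylyn's code): drops a <head>...</head> block from text. */
class EofFirstSketch {

    /** Skips input up to and including closingTag, or up to EOF if it never appears. */
    static void skipPast(Reader in, String closingTag) throws IOException {
        int matched = 0;
        while (matched < closingTag.length()) {
            int c = in.read();
            if (c == -1) {
                // EOF is tested first. Without this check, read() keeps returning -1,
                // nothing ever matches, and the loop spins forever on a missing closing tag.
                return;
            }
            if (Character.toLowerCase((char) c) == closingTag.charAt(matched)) {
                matched++;
            } else {
                // Restart the match; '<' may itself begin a new closing tag.
                matched = (c == closingTag.charAt(0)) ? 1 : 0;
            }
        }
    }

    /** Copies the input, omitting everything from <head> to </head> (or to EOF). */
    static String stripHead(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c != '<') {
                out.append((char) c);
                continue;
            }
            // Collect the tag name up to '>' or EOF.
            StringBuilder tag = new StringBuilder();
            int t;
            while ((t = in.read()) != -1 && t != '>') {
                tag.append((char) t);
            }
            if (t == -1) {
                out.append('<').append(tag); // unterminated tag: emit as plain text
                break;
            }
            if (tag.toString().equalsIgnoreCase("head")) {
                skipPast(in, "</head>");
            } else {
                out.append('<').append(tag).append('>');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // The string from DOXIA-150: <head> is opened but never closed.
        String input = "Spurious <?xml version=\"1.0\" encoding=\"UTF-8\"?> in generated <head> section";
        System.out.println(stripHead(new StringReader(input)));
    }
}
```

With the EOF check in place, the unclosed <head> simply swallows the rest of the input and the method returns.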
Created attachment 83291 [details]
mylyn/context/zip
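On the earlier question of how to tell which thread is spinning when several show as RUNNABLE: one possible approach, sketched below, is to sample per-thread CPU time twice through the standard java.lang.management API and pick the thread whose CPU time grew the most. This is only an illustration under some assumptions: the class name BusiestThreadSketch and the method busiestThread are made up, the code inspects its own JVM rather than a running Eclipse instance (for that, a ThreadMXBean would have to be obtained over a JMX connection), and per-thread CPU timing must be supported and enabled by the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical sketch: find the thread that used the most CPU during a sampling window. */
class BusiestThreadSketch {

    /** Returns the id of the thread with the largest CPU-time increase, or -1 if unknown. */
    static long busiestThread(long sampleMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1;
        }
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // nanoseconds, or -1 if unavailable
        }
        Thread.sleep(sampleMillis);

        long bestId = -1;
        long bestDelta = -1;
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            if (before[i] < 0 || after < 0) {
                continue; // thread ended or CPU timing is disabled
            }
            long delta = after - before[i];
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = ids[i];
            }
        }
        return bestId;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the runaway synchronization job: a thread that just burns CPU.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy loop
            }
        }, "spinner");
        spinner.setDaemon(true);
        spinner.start();

        long id = busiestThread(500);
        if (id > 0) {
            ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(id, 10);
            if (info != null) {
                System.out.println("Busiest thread: " + info.getThreadName());
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        }
        spinner.interrupt();
    }
}
```

The stack trace of the thread reported this way is the part of a thread dump that matters for a CPU-spin report.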
Dave, I would say it is the same issue. Without my fix, mylyn goes into an infinite loop. With the fix, it works fine against DOXIA.

The problem is that HTML2TextReader.computeSubstitution() wasn't dealing properly with EOF in two cases. If the tags <head> or <pre> were present without a closing tag, the loop was created.

Created attachment 83298 [details]
Unified patch
Found the real test case. Reworked to use FutureTask.
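A rough sketch of how a FutureTask can guard a test against a hanging conversion follows. It is not the code from the attached patch: the class TimeoutGuardSketch and the method callWithTimeout are hypothetical, and the "hanging" work is simulated with a busy loop that honours interruption. The idea is to run the suspect call on a separate daemon thread and wait for the result with a bounded get, so the caller fails with a TimeoutException instead of stalling.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: run work on another thread and give up after a timeout. */
class TimeoutGuardSketch {

    /** Returns the result of work, or throws TimeoutException if it takes too long. */
    static <T> T callWithTimeout(Callable<T> work, long timeout, TimeUnit unit) throws Exception {
        FutureTask<T> task = new FutureTask<>(work);
        Thread worker = new Thread(task, "timeout-guard-worker");
        worker.setDaemon(true); // a runaway loop must not keep the JVM alive
        worker.start();
        try {
            return task.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; a loop that ignores interruption keeps spinning.
            task.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Well-behaved work finishes and returns its value.
        System.out.println(callWithTimeout(() -> "converted text", 1, TimeUnit.SECONDS));

        // Simulated hang: spins until interrupted, so the guard trips.
        try {
            callWithTimeout(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // busy loop standing in for a reader that never sees EOF
                }
                return "never returned in time";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("Timed out as expected");
        }
    }
}
```

In a test, the TimeoutException would be turned into a test failure, which keeps a regression from blocking the whole suite.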
Created attachment 83299 [details]
mylyn/context/zip
Steffen: please view.

Actually, the <pre> tag turned out to be safe, probably by accident :--)

Thanks for the great work on this, George! I have committed your fix and the test cases with slight modifications: I have removed the Future to make the test case easier to understand. As far as I know, JUnit will time out the test if it hangs. I have also added assertions that check the converted text. It would be great if you could take a quick look at these changes.

Steffen, it looks good. I added the Future code after running the test from within Eclipse and finding that the test did not time out, so you may want to double-check that JUnit will indeed time out.

Thanks for the follow-up. In case we run into stalling tests I'll put it back in. |