| Summary: | Indexer runs forever |
|---|---|
| Product: | [Tools] CDT |
| Component: | cdt-indexer |
| Reporter: | Igor Kuralenok <solar> |
| Assignee: | Markus Schorn <mschorn.eclipse> |
| QA Contact: | Markus Schorn <mschorn.eclipse> |
| Status: | RESOLVED FIXED |
| Severity: | critical |
| Priority: | P3 |
| CC: | yevshif |
| Version: | 7.0 |
| Target Milestone: | 7.0.1 |
| Hardware: | PC |
| OS: | All |
Description

Igor Kuralenok

I don't see an indication of an infinite loop; the parser is trying to read more characters from the input stream. It is clearly not a simple infinite loop, but it looks like infinite preprocessor interpretation or something similar. I let it run overnight, but the stack trace is still the same.

I don't think the preprocessor is causing the loop (it requests a new token, which indicates some sort of progress). Maybe there is a loop within org.eclipse.cdt.internal.core.parser.scanner.LazyCharArray.isValidOffset(...), though I cannot spot a bug by reviewing the code. Can you run the application in a debugger to see whether this method ever returns? It would also be good to know whether the stack trace on the next day is exactly the same. If you open the Progress View, does it report the same file on the next day?

(In reply to comment #3)
> Can you run the application
> in a debugger to see whether this method ever returns?

I'm sorry, but I would need to get into CDT development first, and right now I have no time for this (a few days to vacation :)).

> Also it'd be good to know whether the stack-trace on the next day is exactly
> the same. If you open the Progress View, does it report the same file on the
> next day?

Yes, the file is the same. The trace is as follows (~36 hours after start). My blind/wild guess (as a former IDEA team member :)) is that during the preprocessor stage you run into a situation where the result of the preprocessor stage needs to be processed again (like abc -> bca -> abc -> etc., with two #defines for abc and bca). This is based on two observations:
-- no memory is consumed during this process;
-- most of the process time is system time (I/O operations).
java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.FileDispatcher.read0(Native Method)
	at sun.nio.ch.FileDispatcher.read(FileDispatcher.java:26)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
	at sun.nio.ch.IOUtil.read(IOUtil.java:206)
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:144)
	- locked <000000001ef07938> (a java.lang.Object)
	at org.eclipse.cdt.internal.core.parser.scanner.FileCharArray.readChunkData(FileCharArray.java:126)
	at org.eclipse.cdt.internal.core.parser.scanner.LazyCharArray.createChunk(LazyCharArray.java:145)
	at org.eclipse.cdt.internal.core.parser.scanner.FileCharArray.createChunk(FileCharArray.java:101)
	at org.eclipse.cdt.internal.core.parser.scanner.LazyCharArray.getChunk(LazyCharArray.java:132)
	at org.eclipse.cdt.internal.core.parser.scanner.LazyCharArray.getChunkData(LazyCharArray.java:113)
	at org.eclipse.cdt.internal.core.parser.scanner.LazyCharArray.readUpTo(LazyCharArray.java:88)
	at org.eclipse.cdt.internal.core.parser.scanner.LazyCharArray.isValidOffset(LazyCharArray.java:65)
	at org.eclipse.cdt.internal.core.parser.scanner.Lexer.isValidOffset(Lexer.java:114)
	at org.eclipse.cdt.internal.core.parser.scanner.Lexer.nextCharPhase3(Lexer.java:1018)
	at org.eclipse.cdt.internal.core.parser.scanner.Lexer.fetchToken(Lexer.java:343)
	at org.eclipse.cdt.internal.core.parser.scanner.Lexer.nextToken(Lexer.java:171)
	at org.eclipse.cdt.internal.core.parser.scanner.ScannerContext.nextPPToken(ScannerContext.java:246)
	at org.eclipse.cdt.internal.core.parser.scanner.CPreprocessor.internalFetchToken(CPreprocessor.java:734)
	at org.eclipse.cdt.internal.core.parser.scanner.CPreprocessor.fetchToken(CPreprocessor.java:469)
	at org.eclipse.cdt.internal.core.parser.scanner.CPreprocessor.nextToken(CPreprocessor.java:563)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.fetchToken(AbstractGNUSourceCodeParser.java:260)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.nextToken(AbstractGNUSourceCodeParser.java:284)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.lookaheadToken(AbstractGNUSourceCodeParser.java:294)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.LA(AbstractGNUSourceCodeParser.java:317)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.LT(AbstractGNUSourceCodeParser.java:444)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.skipToSemiOrClosingBrace(AbstractGNUSourceCodeParser.java:734)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.skipProblemDeclaration(AbstractGNUSourceCodeParser.java:707)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.problemDeclaration(AbstractGNUSourceCodeParser.java:1697)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.declarationList(AbstractGNUSourceCodeParser.java:1321)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.parseTranslationUnit(AbstractGNUSourceCodeParser.java:1253)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.translationUnit(AbstractGNUSourceCodeParser.java:1248)
	at org.eclipse.cdt.internal.core.dom.parser.AbstractGNUSourceCodeParser.parse(AbstractGNUSourceCodeParser.java:645)
	at org.eclipse.cdt.core.dom.parser.AbstractCLikeLanguage.getASTTranslationUnit(AbstractCLikeLanguage.java:143)
	at org.eclipse.cdt.internal.core.pdom.AbstractIndexerTask.createAST(AbstractIndexerTask.java:286)
	at org.eclipse.cdt.internal.core.pdom.AbstractIndexerTask.createAST(AbstractIndexerTask.java:259)
	at org.eclipse.cdt.internal.core.pdom.AbstractIndexerTask.parseFile(AbstractIndexerTask.java:753)
	at org.eclipse.cdt.internal.core.pdom.AbstractIndexerTask.parseLinkage(AbstractIndexerTask.java:636)
	at org.eclipse.cdt.internal.core.pdom.AbstractIndexerTask.runTask(AbstractIndexerTask.java:345)
	at org.eclipse.cdt.internal.core.pdom.indexer.PDOMIndexerTask.run(PDOMIndexerTask.java:127)
	at org.eclipse.cdt.internal.core.pdom.indexer.PDOMRebuildTask.run(PDOMRebuildTask.java:84)
	at org.eclipse.cdt.internal.core.pdom.PDOMIndexerJob.run(PDOMIndexerJob.java:137)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

It looks like I've found the problem.
org.eclipse.cdt.internal.core.parser.scanner.FileCharArray.readChunkData
does not take the result of the decoder operation into account. It must break the loop when the decoder result is "overflow" rather than wait until the dest buffer is full.
Here is my variant of the loop (not optimal, but with a minimum of changes):
while (dest.position() < CHUNK_SIZE && !endOfInput) {
	fChannel.position(fileOffset);
	in.clear();
	int count= fChannel.read(in);
	if (count == -1) {
		break;
	}
	endOfInput= count < in.capacity();
	in.flip();
	if (fileOffset == 0) {
		skipUTF8ByteOrderMark(in, fCharSet);
	}
	final CoderResult decodeRC = decoder.decode(in, dest, endOfInput);
	fileOffset += in.position();
	if (decodeRC.isOverflow())
		break;
}
This might be an OS X-specific problem.
IK
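As background for the contract debate that follows, here is a minimal, self-contained demonstration (not part of the original report) of how `decode(...)` can legitimately report an overflow while `dest.position()` is still below the limit: when the next code point decodes to a surrogate pair, the UTF-8 decoder refuses to emit half a pair, so it returns `CoderResult.OVERFLOW` even though the destination buffer still has one char of room.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

public class OverflowDemo {
    /**
     * Decodes U+10400 (a 4-byte UTF-8 sequence that becomes a
     * surrogate *pair*, i.e. two Java chars) into the given buffer
     * and returns the decoder's result.
     */
    static CoderResult decodeSupplementaryChar(CharBuffer dest) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] {
                (byte) 0xF0, (byte) 0x90, (byte) 0x90, (byte) 0x80 });
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        return decoder.decode(in, dest, false);
    }

    public static void main(String[] args) {
        // Destination with room for exactly one char: the decoder
        // cannot emit half a surrogate pair, so it reports OVERFLOW
        // while the buffer is still completely empty.
        CharBuffer dest = CharBuffer.allocate(1);
        CoderResult rc = decodeSupplementaryChar(dest);
        System.out.println(rc.isOverflow());                 // true
        System.out.println(dest.position() < dest.limit());  // true
    }
}
```

This behavior matches the Java 6 wording of the contract quoted later in the thread ("insufficient space in the output buffer to decode any more bytes") rather than the Java 5 wording ("the output buffer is full"), which is exactly the discrepancy the following comments argue about.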
(In reply to comment #5)
> It looks I've found the problem.
> org.eclipse.cdt.internal.core.parser.scanner.FileCharArray.readChunkData
> does not take into account result of decoder operation. It must break the loop
> in case of decoder result is "overflow" but not to wait until dest buffer is
> full.
> here is my variant of loop (not optimal but with minimum of changes):
> while (dest.position() < CHUNK_SIZE && !endOfInput) {
>     fChannel.position(fileOffset);
>     in.clear();
>     int count= fChannel.read(in);
>     if (count == -1) {
>         break;
>     }
>     endOfInput= count < in.capacity();
>     in.flip();
>     if (fileOffset == 0) {
>         skipUTF8ByteOrderMark(in, fCharSet);
>     }
>     final CoderResult decodeRC = decoder.decode(in, dest, endOfInput);
>     fileOffset += in.position();
>     if (decodeRC.isOverflow())
>         break;
> }
> This might be OSX specific trouble.
> IK

Thanks for diving into this. To me this looks like a bug in the JVM: when the overflow occurs, I would expect 'dest.position()' to equal 'CHUNK_SIZE', which is the limit of the output buffer. There are two possibilities for how the JVM is wrong: (a) it does not decode as many characters as possible, and therefore the destination buffer's position is not set to the end, or (b) it does fill the destination buffer and simply fails to adjust its position. Because it is unclear what happens, there is no clean way to recover from this situation; just breaking the loop will create an array of chars that is not long enough.

Well, I cannot find your variant of the contract in the docs (http://developer.apple.com/mac/library/documentation/Java/Reference/JavaSE6_API/api/java/nio/charset/CharsetDecoder.html#decode(java.nio.ByteBuffer,%20java.nio.CharBuffer,%20boolean)). What I found is that it returns a CoderResult, which is the only indicator of what is going on. Not a word on buffer remaining and other stuff. So the JRE is correct; the problem is a wrong interpretation of the contract. Please fix this.
(In reply to comment #7)
> Well, I can not find your variant of contract in docs
> (http://developer.apple.com/mac/library/documentation/Java/Reference/JavaSE6_API/api/java/nio/charset/CharsetDecoder.html#decode(java.nio.ByteBuffer,%20java.nio.CharBuffer,%20boolean)).
> What I've found is that it returns CoderResult which is the only indicator of
> what is going on. Not a word on buffer remaining and other stuff. So JRE is
> correct, the problem is in interpretation of contract which is wrong. Please
> fix this.

Hmm, the contract says (among other things): "The buffers' positions will be advanced to reflect the bytes read and the characters written, but their marks and limits will not be modified." So when an overflow situation occurs, I should be safe to assume that the output buffer's position is set to its limit. In any case, I don't know how I should interpret a different buffer position: your JVM somehow says that there is no more room in the output buffer, and at the same time it indicates that the buffer is only partially filled. Which of the two is true, i.e. is the position not set correctly, or is the buffer not filled? If the former is the case, I could fix it by adjusting the position; in the latter case I would need to fill the rest of the buffer, and it is unclear to me how that could work (by providing less input??).

Looks like Bugzilla doesn't attach reply letters to comments.

(In reply to comment #8)
> Hmm, the contract says (among other things):
> ... The buffers' positions will be advanced to reflect the bytes read and the
> characters written, but their marks and limits will not be modified.

Not a word on the overflow situation. The limits are modified correctly; the last char is not filled.

> So when an overflow situation occurs I should be safe to assume that the output
> buffers postition is set to its limit. In any way I don't know how I should
> interpret a different buffer position? Your JVM somehow says that there is no
> more room in the output buffer and at the same time it indicates that it is
> only partially filled. Which of the two is true, i.e. is the position not set
> correctly or is the buffer not filled?

Well, you set the endOfStream argument to false. I believe the 32-bit or 64-bit optimized version of the decoder won't convert the last char in this situation, expecting the next chunk to come. I don't see any bugs here. It works according to the spec, but not according to expectations :).

> If the former is the case I could fix it by adjusting the position, in the
> latter case I'd need to fill the rest of the buffer, which is unclear to me how
> that could work (by providing less input??).

Any way you work around this will be much better than an infinite loop with a CPU/disk load of 100%.

(In reply to comment #9)
> Not a word on overflow situation. Limits are modified correctly. Last char is
> not filled.

I did not make that up; the Java 1.5 documentation says: "CoderResult.OVERFLOW indicates that the output buffer is full. This method should be invoked again with a non-full output buffer." Looking at the 1.6 documentation, I can see that they have changed the contract :-(: "CoderResult.OVERFLOW indicates that there is insufficient space in the output buffer to decode any more bytes. This method should be invoked again with an output buffer that has more remaining characters."

> Well. You set endOfStream argument to false. I believe 32-bit or 64-bit
> optimized version of decoder won't
> convert last char in this situation, expecting next chunk to come.

Setting the endOfStream argument will not help; it has an effect when there are no more input bytes, but in our situation there is sufficient input.

> ...
> Anyway you work this around will be way better then infinite loop with
> CPU/disk load of 100%.

I am looking for a more complete solution to the problem. Because I don't see a way to fill the buffer (while knowing the file position afterwards), I will need to change the way FileCharArray works. Thanks for your help in figuring out why the infinite loop occurs.

(In reply to comment #10)
> Setting the endOfStream argument will not help, it has an effect when there are
> no more input bytes, however in our situation there is sufficient input.

I was hoping that it works like a flush. But you are right, this does not help.

> I am looking for a more complete solution for the problem. Because I don't see
> a way to fill the buffer (with knowing the file-position afterwards) I will
> need to change the way FileCharArray works.

I'd suggest increasing the dest buffer instead as a quick fix. I've just checked that this fixes my trouble (final CharBuffer dest= CharBuffer.allocate(CHUNK_SIZE + 10);).

(In reply to comment #11)
> I'd suggest increasing dest buffer instead as quick fix. I've just checked that
> this fixes my trouble (final CharBuffer dest= CharBuffer.allocate(CHUNK_SIZE +
> 10);).

... and that will lead to other problems. I will work on a complete fix.

Created attachment 177398 [details]
fix
Attached is my proposal for the fix. Because I was not able to reproduce the issue on my Windows machine, it would be helpful if you could test the patch on your Mac.
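As an illustration only (this is not the attached patch, whose contents are not shown in the report), a read loop in the spirit of the fix can treat OVERFLOW as the end of the chunk and trim the chunk to the characters actually decoded, so a decoder that stops short of the buffer limit can never make the caller spin. The names `ChunkReader`, `readChunk`, and `CHUNK_SIZE` are assumptions for this sketch, not CDT's actual code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.util.Arrays;

public class ChunkReader {
    static final int CHUNK_SIZE = 4096;

    // Sketch only: reads up to CHUNK_SIZE chars from the channel.
    // OVERFLOW ends the chunk even if dest is not completely full.
    // A real lazy char array would additionally keep the decoder and
    // any undecoded trailing bytes across chunks, and handle malformed
    // input; both are omitted here for brevity.
    static char[] readChunk(ReadableByteChannel channel, Charset cs)
            throws IOException {
        CharsetDecoder decoder = cs.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(CHUNK_SIZE);
        CharBuffer dest = CharBuffer.allocate(CHUNK_SIZE);
        boolean endOfInput = false;
        while (!endOfInput) {
            endOfInput = channel.read(in) == -1;
            in.flip();
            CoderResult rc = decoder.decode(in, dest, endOfInput);
            in.compact(); // keep the bytes of a split multi-byte sequence
            if (rc.isOverflow()) {
                break; // dest can take no more characters: chunk is done
            }
        }
        // Trim to what was actually decoded instead of assuming a full
        // buffer -- the assumption the looping readChunkData relied on.
        return Arrays.copyOf(dest.array(), dest.position());
    }
}
```

The key difference from the original loop is that the chunk length is taken from `dest.position()` after the loop, so a partially filled buffer yields a correctly sized (shorter) chunk instead of another iteration.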
Fixed in 7.0.1 and 8.0 > 20100825.

*** cdt cvs genie on behalf of mschorn ***
Bug 320157: Endless loop decoding large file.

[*] LazyCharArray.java 1.4.2.1
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/parser/scanner/LazyCharArray.java?root=Tools_Project&r1=1.4&r2=1.4.2.1
[*] FileCharArray.java 1.4.2.1
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/parser/scanner/FileCharArray.java?root=Tools_Project&r1=1.4&r2=1.4.2.1
[*] ScannerTestSuite.java 1.9.2.1
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core.tests/parser/org/eclipse/cdt/core/parser/tests/scanner/ScannerTestSuite.java?root=Tools_Project&r1=1.9&r2=1.9.2.1
[+] FileCharArrayTests.java
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core.tests/parser/org/eclipse/cdt/core/parser/tests/scanner/FileCharArrayTests.java?root=Tools_Project&revision=1.1&view=markup
[*] ScannerTestSuite.java 1.10
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core.tests/parser/org/eclipse/cdt/core/parser/tests/scanner/ScannerTestSuite.java?root=Tools_Project&r1=1.9&r2=1.10
[+] FileCharArrayTests.java
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core.tests/parser/org/eclipse/cdt/core/parser/tests/scanner/FileCharArrayTests.java?root=Tools_Project&revision=1.1&view=markup
[*] LazyCharArray.java 1.6
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/parser/scanner/LazyCharArray.java?root=Tools_Project&r1=1.5&r2=1.6
[*] FileCharArray.java 1.5
    http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt-core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/parser/scanner/FileCharArray.java?root=Tools_Project&r1=1.4&r2=1.5