| Summary: | Loading a 25GB heap fails with a NegativeArraySizeException | | |
|---|---|---|---|
| Product: | [Tools] MAT | Reporter: | Christopher Ebert <chris.ebert> |
| Component: | Core | Assignee: | Andrew Johnson <andrew_johnson> |
| Status: | RESOLVED NOT_ECLIPSE | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | kevin.grigorenko |
| Version: | 1.1 | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| See Also: | https://bugs.eclipse.org/bugs/show_bug.cgi?id=404679 | | |
| Whiteboard: | | | |
The failing line is
byte[] chars = new byte[(int) (length - idSize)];
idSize is either 4 or 8, and is probably 8.
For the exception to occur, length must either be too small or be giant (i.e. > 2GB), and a constant pool string shouldn't be that big.
The question is whether the dump is corrupt, or whether the HPROF parser is not reading it properly.
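For illustration (a minimal sketch, not MAT code; the length value is just an example over Integer.MAX_VALUE), this is how such a length turns into a negative array size when narrowed to int:

```java
public class NarrowingDemo {
    public static void main(String[] args) {
        long length = 2685160960L; // example record length over Integer.MAX_VALUE
        long idSize = 8;
        // The cast keeps only the low 32 bits; with the high bit set,
        // the resulting int is negative.
        int narrowed = (int) (length - idSize);
        System.out.println(narrowed); // -1609806344
        try {
            byte[] chars = new byte[narrowed];
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException, as in this bug");
        }
    }
}
```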
I was wondering if it was > 2GB. Is there any additional logging I can turn on to check? I can't see any more logging in the code.

You could run MAT under the Eclipse debugger by following these instructions: http://wiki.eclipse.org/index.php?title=MemoryAnalyzer/Contributor_Reference#Workspace and set a breakpoint on the throwing of NegativeArraySizeException to see what the value of length is.

It doesn't affect the bug, but this also looks wrong:
String value = new String(chars);
as the conversion depends on the file encoding that MAT is running with.
This should be
String value = new String(chars, "UTF8");
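A small illustration of the difference (a sketch, not MAT code): the one-argument String constructor decodes with the platform default charset, so the same bytes can produce different strings on different machines.

```java
import java.io.UnsupportedEncodingException;

public class EncodingDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] chars = {(byte) 0xC3, (byte) 0xA9}; // the UTF-8 encoding of 'é'
        // Decodes with the JVM default charset (-Dfile.encoding), so results vary:
        System.out.println(new String(chars));
        // Always decodes as UTF-8, which is what HPROF string records contain:
        System.out.println(new String(chars, "UTF-8"));
    }
}
```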
Bug 368816 addresses the conversion problem. I've added some more error checking for the length - the SnapshotException should indicate more about why the problem occurs.

OK -- I'll work on building MAT and running the debugger on it.

Drat. So I followed the instructions given, I believe, and am having trouble. I installed the DTFJ plugin: the Eclipse info dialog shows Diagnostic Tool Framework for Java 1.5.0.201103111151 (com.ibm.dtfj.feature.feature.group), but org.eclipse.mat.ibmdumps isn't finding it. I tried doing the Maven build, had some trouble, but got at least the 'prepare' step working. Specifically, on trunk I'm missing:

import com.ibm.tools.attach.AgentInitializationException;
import com.ibm.tools.attach.AgentLoadException;
import com.ibm.tools.attach.AttachNotSupportedException;
import com.ibm.tools.attach.VirtualMachine;
import com.ibm.tools.attach.VirtualMachineDescriptor;

Any help appreciated: it seems like MAT won't start without it?

The com.ibm.tools.attach classes and the com.ibm.jvm.Dump class are provided by IBM VMs, not by DTFJ. They are used by MAT to generate dumps of running IBM VM processes. The intention of the build process is that the files in org.eclipse.mat.ibmdumps/src2 can only be compiled with an IBM VM (or possibly with access to jars from an IBM VM), so I checked the class files built by Eclipse on my machine into SVN in the classes directory, and the build process should not need to compile them. Bug 320082 and http://wiki.eclipse.org/MemoryAnalyzer/Building_MAT_With_Tycho explain the Tycho/Maven build process. The ibmdumps project is optional - MAT should start without it, so you could exclude the whole project. You could also take a snapshot nightly build from http://www.eclipse.org/mat/snapshotBuilds.php, which should have a slightly better error message.

I got MAT running under debug. I also got the updates from SVN, and when I run with my nice ~29G file, the log has:

Task: Parsing jmap_dumplive-20111214.hprof [Task: Parsing /opt/heapdumps/algol/jmap_dumplive-20111214.hprof [
Subtask: Scanning /opt/heapdumps/algol/jmap_dumplive-20111214.hprof [........]

!SESSION 2012-01-30 16:42:22.548 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.6.0_27
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Framework arguments: -product org.eclipse.mat.ui.MemoryAnalyserPlugin -application org.eclipse.mat.api.parse /opt/heapdumps/algol/jmap_dumplive-20111214.hprof
Command-line arguments: -product org.eclipse.mat.ui.MemoryAnalyserPlugin -data /home/cebert/workspace-misc/../runtime-New_configuration -dev file:/home/cebert/workspace-misc/.metadata/.plugins/org.eclipse.pde.core/MAT/dev.properties -os linux -ws gtk -arch x86_64 -consoleLog -application org.eclipse.mat.api.parse /opt/heapdumps/algol/jmap_dumplive-20111214.hprof

!ENTRY org.eclipse.osgi 4 0 2012-01-30 16:43:37.172
!MESSAGE Application error
!STACK 1
org.eclipse.mat.SnapshotException: Illegal record length 2,685,160,960 at byte 12,413,682,384 for record type 1
    at org.eclipse.mat.hprof.Pass1Parser.read(Pass1Parser.java:108)
    at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:65)
    at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.parse(SnapshotFactoryImpl.java:209)
    at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.openSnapshot(SnapshotFactoryImpl.java:120)
    at org.eclipse.mat.snapshot.SnapshotFactory.openSnapshot(SnapshotFactory.java:145)
    at org.eclipse.mat.internal.apps.ParseSnapshotApp.parse(ParseSnapshotApp.java:114)
    at org.eclipse.mat.internal.apps.ParseSnapshotApp.start(ParseSnapshotApp.java:100)
    at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
    at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
    at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
    at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
    at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:620)
    at org.eclipse.equinox.launcher.Main.basicRun(Main.java:575)
    at org.eclipse.equinox.launcher.Main.run(Main.java:1408)
    at org.eclipse.equinox.launcher.Main.main(Main.java:1384)
An error has occurred. See the log file /home/cebert/workspace-misc/.metadata/.plugins/org.eclipse.pde.core/MAT/1327970542814.log.

It seems I do have a record that's larger than 2G, although I don't know exactly how. Turning this into a string seems problematic.

I don't know why a dump would have a constant pool string that big. The javac compiler won't allow field names, string constants etc. that long. Perhaps something earlier in the dump is corrupt and it is misparsing subsequent data. 2,685,160,960 is 0xA00C4E00, so the bytes in the dump are those, in big-endian format. Does jhat read the dump?

Yes, it seems to. It isn't responsive, but I have to allocate ~16G to get it to read the file, and that's as much physical RAM as I have (well, OK, I've got 18), so I imagine it's getting swapped. I did turn on debug logging and it seems to get through the parse. Where should I instrument MAT to get a similar log? My jHat log looks something like this:
Read heap sub-record type 35 at position 0xeb79aa
Read heap sub-record type 34 at position 0xeb79f4
Read heap sub-record type 35 at position 0xeb7a65
Read heap sub-record type 35 at position 0xeb7a87
Read heap sub-record type 35 at position 0xeb7aa9
for many, many lines. Eventually it starts printing messages about not finding objects in the heap, so I assume it's done parsing. I could leave it running to see if it actually starts its web server, but that would probably take all night...
Oh, no. Lucky me -- 'expect 12312 dots'. It looks like it will load OK. Any advice on next steps?
Do you think there are any arrays > 2GB in size? There's some code I see which might go wrong in that case, e.g.

in.skipBytes(elementSize * size);

where elementSize * size might overflow an int (a sketch of this overflow appears below). Also, the PositionInputStream.skipBytes(long) method looks suspicious. It takes a long input but only returns an int for the number of bytes skipped. If the underlying stream skip skips all the bytes requested, then this method would return 0 for a skip of 0x80000000L bytes. For a skip of 0x100000001L bytes it would skip 0x100000001 bytes initially, then cast the 0x100000001 to an int (0x1), then do another skip of the difference 0x100000000, then stop as the return of the superclass skip would be cast to 0, and finally return 1 to the caller. It might not explain this bug, as it should skip correctly for inputs 0x0 to 0xFFFFFFFFL if the skip is done in one go by the underlying stream, even though the returned value could be wrong. I'm not sure how to fix this one, as changing the return type to a long is an incompatible change. The class is mentioned by http://wiki.eclipse.org/MemoryAnalyzer/Adding_a_new_heap_dump_format_to_the_Memory_Analyzer but the package documentation for org.eclipse.mat.parser.io says "Utilities for the parser component (not API)", so perhaps changing the return type is okay.

I'll try putting some logging around those calls and see what happens. It does seem the signature of the skipBytes method should be either skipBytes(long):long (which is like a file) or skipBytes(int):int, but if you're stuck with what's published, well, c'est la vie. It is possible that I have ~250M of something, particularly character arrays, so maybe a structure that keeps track of them all would be large enough? It's a large heap, so there's room in there for a big array or two. Also, I'll leave jHat loading overnight and see if I can check for large arrays that way.

Ah. I just went through the jHat logs and found:

Read record type 12, length 3320392219 at position 0x601c0f

I think that's probably what's killing MAT. So yes, I have a big array, I'll bet (I'm not sure what 'record type 12' is).

Hm. I've put in my logging and found something mysterious. In the Pass1Parser at about line 100 I added a line to log the record type, length and position, figuring it would look very much like the output of jHat.
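Here is a minimal sketch (illustrative only, not MAT code) of the int overflow in the array-skip arithmetic flagged above:

```java
public class SkipOverflowDemo {
    public static void main(String[] args) {
        int elementSize = 8;        // e.g. a long[] array, 8 bytes per element
        int size = 300000000;       // 300M elements: a plausible huge array
        // int * int wraps in 32-bit arithmetic before any widening happens:
        System.out.println(elementSize * size);        // -1894967296
        // Widening one operand first gives the intended byte count:
        System.out.println((long) elementSize * size); // 2400000000
    }
}
```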
A diff of the logs shows the following (< is MAT output, > is jHat output, ignoring whitespace):
124203,124212c124205,61690020
< Read record type 33, length 1053164360 at position 0xc649463b
< Read record type 0, length 157555 at position 0x1050f498c
< Read record type 0, length 553648128 at position 0x10511b108
< Read record type 99, length 553648128 at position 0x12611b111
< Read record type 0, length 3989043200 at position 0x14711b11a
< Read record type 160, length 3 at position 0x234d5a923
< Read record type 200, length 885541336 at position 0x234d5a92f
< Read record type 3, length 0 at position 0x2699df310
< Read record type 0, length 35 at position 0x2699df319
< Read record type 1, length 2685160864 at position 0x2699df345
---
> Read heap sub-record type 32 at position 0x601c18
> Read heap sub-record type 32 at position 0x601d5a
> Read heap sub-record type 32 at position 0x601dd4
> Read heap sub-record type 32 at position 0x601e62
> Read heap sub-record type 32 at position 0x601ecc
> Read heap sub-record type 32 at position 0x601f36
> Read heap sub-record type 32 at position 0x601f86
> Read heap sub-record type 32 at position 0x601fe7
> Read heap sub-record type 32 at position 0x602052
> Read heap sub-record type 32 at position 0x602123
Ah. And the line before the problem starts is
Read record type 12, length 3320392219 at position 0x601c0f
So yes, it's the big #12 for certain.
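As context (a sketch, not MAT or jHat code): HPROF record lengths are unsigned 4-byte (u4) fields, and 3320392219 = 0xC5E92A1B is above Integer.MAX_VALUE, so read naively into an int it comes back negative and has to be masked up to a long:

```java
public class UnsignedLengthDemo {
    public static void main(String[] args) {
        int raw = 0xC5E92A1B;            // what a plain readInt() would return
        System.out.println(raw);         // -974575077, negative as a signed int
        long length = raw & 0xFFFFFFFFL; // mask to recover the unsigned u4 value
        System.out.println(length);      // 3320392219, matching the jHat log
    }
}
```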
Thank you for your analysis. I've made some changes to ensure skipBytes(long) skips enough bytes. We can consider the return type later, as that isn't being used. I've also changed the skipping for arrays to avoid integer overflow. You may need to add some logging for sub-records at around line 230 in Pass1Parser. Perhaps we are skipping some of the dump - I see some code which will skip multiple heap dumps in a single file, but it looks like it would skip heap dump segments after the first too.

I added some debug logging to the method that handles type 12, and found it reads everything just like jHat does. Apparently, though, it doesn't stop reading new items from the heap correctly: jHat switches to processing the heap for objects at that point, and MAT keeps reading objects. At least that's what the log looks like.

(jHat and MAT now both output these two lines)
Read heap sub-record type 33 at position 0xc64945a9
Read heap sub-record type 33 at position 0xc64945f2
(only MAT tries to keep reading records from the heap)
Read record type 33, length 1053164360 at position 0xc649463b
Read record type 0, length 157555 at position 0x1050f498c
Read record type 0, length 553648128 at position 0x10511b108
Read record type 99, length 553648128 at position 0x12611b111
Read record type 0, length 3989043200 at position 0x14711b11a

Further, the method for handling heaps and fragments seems to handle long lengths just fine. All the math is done with 'long' variables. I'll keep looking, but right now it doesn't seem to be a size-of-int sort of thing.

I'm trying to get the source for jHat to compare to MAT. Any advice? I found the svn link on java.net, and I added myself to the project, but I can't download the source :(

I don't want to see any source for jHat, as the license (GPL or Sun/Oracle license) won't be compatible with the Eclipse Public License, so jHat code must be kept separate from the Eclipse Memory Analyzer code. The "Read record type 33, length 1053164360 at position 0xc649463b" looks suspicious, as a plain Java object (not an array) shouldn't be that big, and the instance record probably should only appear inside another heap dump segment.

Oh -- OK. No jHat: that makes sense. The log from jHat switches at that point to processing the heap; the log for MAT seems to keep trying to process objects. I'll look at it some more and see what I can figure out.

(In reply to comment #15)
> Ah. I just went through the jHat logs and found:
>
> Read record type 12, length 3320392219 at position 0x601c0f
>
> I think that's probably what's killing MAT. So yes, I have a big array, I'll
> bet (I'm not sure what 'record type 12' is)

That is a Constants.Record.HEAP_DUMP record (rather than a Constants.Record.HEAP_DUMP_SEGMENT record). The sub-records are inside this record, and when they finish parsing it should resume at 0x601c0f + 1 + 4 + 4 + 3320392219 = 0xc6494633, rather than at 0xc649463b as shown by comment #18:

> Read record type 33, length 1053164360 at position 0xc649463b

That suggests that this sub-record might be too long?
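The resume-position arithmetic above, as a runnable check (assuming the standard HPROF record header layout of a u1 tag, u4 timestamp and u4 length):

```java
public class ResumePositionDemo {
    public static void main(String[] args) {
        long recordStart = 0x601c0fL;  // where the HEAP_DUMP tag byte sits
        long bodyLength = 3320392219L; // the record's u4 length field
        // Skip the 9-byte header (tag + timestamp + length) plus the body:
        long resumeAt = recordStart + 1 + 4 + 4 + bodyLength;
        System.out.printf("resume at 0x%x%n", resumeAt); // resume at 0xc6494633
    }
}
```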
> Read heap sub-record type 33 at position 0xc64945f2

From reading the code for readInstanceDump(), the format of the sub-record, in byte sizes, is:

1    - type
4/8  - address
4    - time stamp
4/8  - class id
4    - size
size - field values

So if that size field were 0x28, the ending position would be 0xc64945f2 + 1 + 8 + 4 + 8 + 4 + 0x28 = 0xc6494633, and if it were 0x30 the ending position would be 0xc649463b.

I've put a bit more tracing into the Hprof parser - it's just like the tracing in comment #18, so it might not add much to what we already know, but it will be useful for other problems. To enable it, run with -debug and a .options file containing:

org.eclipse.mat.hprof/debug=true
org.eclipse.mat.hprof/debug/parser=true

You may want to use eclipsec.exe from the overnight builds, as then the log is visible on the console. Perhaps we should ship a MemoryAnalyzerC.exe etc. for this purpose.

I have an excerpt of the heap at the point of failure (nothing sensitive in it). Here's the hexdump starting at the previous object:

c64945f0 00 00 21 00 00 00 02 3e c6 03 20 00 00 00 01 00 |..!....>.. .....|
c6494600 00 00 01 a0 04 26 a0 00 00 00 30 00 00 00 02 3e |.....&....0....>|
c6494610 c6 03 48 00 00 00 02 3e c6 02 f8 00 00 00 02 3e |..H....>.......>|
c6494620 c6 56 38 00 00 00 01 c0 25 38 c8 00 00 00 00 00 |.V8.....%8......|
c6494630 00 00 00 00 00 00 00 00 00 00 00 21 00 00 00 02 |...........!....|
c6494640 3e c6 03 48 00 00 00 01 00 00 00 01 a0 04 26 a0 |>..H..........&.|
c6494650 00 00 00 30 00 00 00 02 3e c6 03 70 00 00 00 02 |...0....>..p....|
c6494660 3e c6 03 20 00 00 00 02 3e c6 56 28 00 00 00 01 |>.. ....>.V(....|
c6494670 c0 25 38 c8 00 00 00 00 00 00 00 00 00 00 00 00 |.%8.............|
c6494680 00 00 00 00 21 00 00 00 02 3e c6 03 70 00 00 00 |....!....>..p...|
c6494690 01 00 00 00 01 a0 04 26 a0 00 00 00 30 00 00 00 |.......&....0...|
c64946a0 02 3e c6 03 98 00 00 00 02 3e c6 03 48 00 00 00 |.>.......>..H...|
c64946b0 02 3e c6 53 38 00 00 00 01 c0 25 38 c8 00 00 00 |.>.S8.....%8....|
c64946c0 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 00 |.............!..|
c64946d0 00 02 3e c6 03 98 00 00 00 01 00 00 00 01 a0 04 |..>.............|
c64946e0 26 a0 00 00 00 30 00 00 00 02 3e c6 03 c0 00 00 |&....0....>.....|
c64946f0 00 02 3e c6 03 70 00 00 00 02 3e c6 53 28 00 00 |..>..p....>.S(..|
c6494700 00 01 c0 25 38 c8 00 00 00 00 00 00 00 00 00 00 |...%8...........|
c6494710 00 00 00 00 00 00 21 00 00 00 02 3e c6 03 c0 00 |......!....>....|
c6494720 00 00 01 00 00 00 01 a0 04 26 a0 00 00 00 30 00 |.........&....0.|
c6494730 00 00 02 3e c6 03 e8 00 00 00 02 3e c6 03 98 00 |...>.......>....|
c6494740 00 00 02 3e c6 50 38 00 00 00 01 c0 25 38 c8 00 |...>.P8.....%8..|
c6494750 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 |...............!|
c6494760 00 00 00 02 3e c6 03 e8 00 00 00 01 00 00 00 01 |....>...........|
c6494770 a0 04 26 a0 00 00 00 30 00 00 00 02 3e c6 04 10 |..&....0....>...|
c6494780 00 00 00 02 3e c6 03 c0 00 00 00 02 3e c6 50 28 |....>.......>.P(|
c6494790 00 00 00 01 c0 25 38 c8 00 00 00 00 00 00 00 00 |.....%8.........|
c64947a0 00 00 00 00 00 00 00 00 21 00 00 00 02 3e c6 04 |........!....>..|
c64947b0 10 00 00 00 01 00 00 00 01 a0 04 26 a0 00 00 00 |...........&....|
c64947c0 30 00 00 00 02 3e c6 04 38 00 00 00 02 3e c6 03 |0....>..8....>..|
c64947d0 e8 00 00 00 02 3e c6 4d 38 00 00 00 01 c0 25 38 |.....>.M8.....%8|
c64947e0 c8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
c64947f0 00 21 00 00 00 02 3e c6 04 38 00 00 00 01 00 00 |.!....>..8......|
c6494800 00 01 a0 04 26 a0 00 00 00 30 00 00 00 02 3e c6 |....&....0....>.|
c6494810 04 60 00 00 00 02 3e c6 04 10 00 00 00 02 3e c6 |.`....>.......>.|
c6494820 4d 28 00 00 00 01 c0 25 38 c8 00 00 00 00 00 00 |M(.....%8.......|
c6494830 00 00 00 00 00 00 00 00 00 00 21 00 00 00 02 3e |..........!....>|
c6494840 c6 04 60 00 00 00 01 00 00 00 01 a0 04 26 a0 00 |..`..........&..|
c6494850 00 00 30 00 00 00 02 3e c6 04 88 00 00 00 02 3e |..0....>.......>|
c6494860 c6 04 38 00 00 00 02 3e c6 4a 38 00 00 00 01 c0 |..8....>.J8.....|
c6494870 25 38 c8 00 00 00 00 00 00 00 00 00 00 00 00 00 |%8..............|
c6494880 00 00 00 21 00 00 00 02 3e c6 04 88 00 00 00 01 |...!....>.......|
c6494890 00 00 00 01 a0 04 26 a0 00 00 00 30 00 00 00 02 |......&....0....|
c64948a0 3e c6 04 b0 00 00 00 02 3e c6 04 60 00 00 00 02 |>.......>..`....|
c64948b0 3e c6 4a 28 00 00 00 01 c0 25 38 c8 00 00 00 00 |>.J(.....%8.....|
c64948c0 00 00 00 00 00 00 00 00 00 00 00 00 21 00 00 00 |............!...|
c64948d0 02 3e c6 04 b0 00 00 00 01 00 00 00 01 a0 04 26 |.>.............&|
c64948e0 a0 00 00 00 30 00 00 00 02 3e c6 04 d8 00 00 00 |....0....>......|
c64948f0 02 3e c6 04 88 00 00 00 02 3e c6 47 38 00 00 00 |.>.......>.G8...|
c6494900 01 c0 25 38 c8 00 00 00 00 00 00 00 00 00 00 00 |..%8............|
c6494910 00 00 00 00 00 21 00 00 00 02 3e c6 04 d8 00 00 |.....!....>.....|
c6494920 00 01 00 00 00 01 a0 04 26 a0 00 00 00 30 00 00 |........&....0..|
c6494930 00 02 3e c6 05 00 00 00 00 02 3e c6 04 b0 00 00 |..>.......>.....|
c6494940 00 02 3e c6 47 28 00 00 00 01 c0 25 38 c8 00 00 |..>.G(.....%8...|
c6494950 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 |..............!.|
c6494960 00 00 02 3e c6 05 00 00 00 00 01 00 00 00 01 a0 |...>............|
c6494970 04 26 a0 00 00 00 30 00 00 00 02 3e c6 05 28 00 |.&....0....>..(.|
c6494980 00 00 02 3e c6 04 d8 00 00 00 02 3e c6 44 38 00 |...>.......>.D8.|
c6494990 00 00 01 c0 25 38 c8 00 00 00 00 00 00 00 00 00 |....%8..........|
c64949a0 00 00 00 00 00 00 00 21 00 00 00 02 3e c6 05 28 |.......!....>..(|

It looks like something gets off track. The record at c649483a looks valid enough, but seems like it should end at c6494883. This would make sense -- there should be a *lot* more to read (the length of the segment seems maybe wrong as well?). The rough pattern of '21' every so many bytes continues for a loooong time. It's a 29G dump file and we're less than 4G through it.

So, having looked at the code and the output, I've decided that the original length must somehow be wrong - it's only a ~3G heap dump. The dump is much larger than that, so it ought to be a segment, or be much longer. From the data below, it seems the latter. The relevant bytes:

00601c00 02 1d 08 00 00 00 01 00 00 00 01 a0 00 0b b8 0c |................|
00601c10 00 00 00 00 c5 e9 2a 1b 20 00 00 00 01 a5 fb 47 |......*. ......G|
00601c20 e8 00 00 00 01 00 00 00 01 a0 02 21 e8 00 00 00 |...........!....|
00601c30 03 5a d9 88 18 00 00 00 00 00 00 00 00 00 00 00 |.Z..............|
00601c40 03 5a d9 8a d0 00 00 00 00 00 00 00 00 00 00 00 |.Z..............|
00601c50 00 00 00 00 00 00 00 00 51 00 00 00 06 00 00 00 |........Q.......|
00601c60 01 a0 18 a4 58 0a 00 00 20 00 00 00 00 01 a2 99 |....X... .......|
00601c70 9f 90 02 00 00 00 03 5a da 2f 80 00 00 00 01 a2 |.......Z./......|
00601c80 99 a0 70 02 00 00 00 03 5a da 2f 00 00 00 00 01 |..p.....Z./.....|
00601c90 a2 99 a0 98 02 00 00 00 03 5a da 2e 80 00 00 00 |.........Z......|
00601ca0 01 a2 99 a0 c0 02 00 00 00 03 5a da 2e 50 00 00 |..........Z..P..|
00601cb0 00 01 a2 99 a0 e8 02 00 00 00 01 c0 88 e2 78 00 |..............x.|
00601cc0 11 00 00 00 01 a2 99 9f b0 02 00 00 00 01 a1 ad |................|
00601cd0 93 70 02 00 00 00 01 a0 58 63 e0 0a 00 00 00 01 |.p......Xc......|
00601ce0 a2 99 73 10 04 00 00 00 01 a0 33 71 c8 0a 00 00 |..s.......3q....|
00601cf0 00 01 a2 99 a0 10 0a 00 00 00 01 a2 99 a0 30 0a |..............0.|

The record starts at 601c0f, which is indeed 0c (12). I'm a little stumped, and beginning to think that jHat doesn't actually parse the file correctly; it just doesn't fail dramatically. Here's my Java version:

cebert@cebert:~/workspace-misc$ java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)

OK. Looks like this is actually a VM problem. The length of the heap is > 4G. When I hack the code to tack a leading 6 onto the length of the heap segment, it seems to parse (the arithmetic is sketched below). I don't think there's anything MAT could do to fix this, really. The question is why the JVM is putting out an invalid dump. It seems that it's because it's under memory stress, as the dumps I get when everything is fine work. Unfortunately, memory stress is precisely when I want the dump. Maybe a feature to add some (long) constant to the size of the dump, estimated from the size of the file, would allow someone to process a heap such as I have. And I guess I'll send a bug to Oracle, too. Or maybe they'll explain how it actually works. Thanks for all your help.

I'm resolving this defect as not an Eclipse problem: having MAT attempt to guess how to parse an invalid dump sounds tricky, and those dumps don't seem to occur very frequently. We do now have some debug trace to help diagnose similar problems in the future.

Hey Chris, we may have found a workaround for your bug report in bug #404679. Do you still have this dump to test the fix on?
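A sketch of the arithmetic behind the 'leading 6' hack above (my reconstruction, assuming the JVM wrote the true heap dump length modulo 2^32 into the u4 length field):

```java
public class WrappedLengthDemo {
    public static void main(String[] args) {
        long recorded = 3320392219L; // 0xC5E92A1B, the u4 length field on disk
        long patched = 0x6C5E92A1BL; // the same value with a leading hex 6
        // A u4 field can only hold the true length modulo 2^32:
        System.out.println(patched % (1L << 32) == recorded); // true
        System.out.printf("%,d bytes%n", patched); // 29,090,195,995 (~29GB dump)
    }
}
```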
Fails after a minute or two during initial parsing. I tried turning on debug (-debug, -consoleLog) but I don't see much extra output. Below is a transcript of the log. I've worked with smaller (~8G) heap files without trouble, but when I try my large heap files they all break. I have three, and all fail apparently identically. Unfortunately, I don't think I can send one of my dump files, as they have client data in them, and I don't know how I could 'clean' one without compromising the data. If you know of a tool to do that, I'd be happy to send the heap.

Start VM: /usr/local/java/bin/java -Xmx10024m -jar /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar -os linux -ws gtk -arch x86_64 -showsplash -launcher /home/cebert/Applications/MemoryAnalyzer/mat/MemoryAnalyzer -name MemoryAnalyzer --launcher.library /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.2.R36x_v20101019_1345/eclipse_1310.so -startup /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar -exitdata 5ea8010 -debug -consoleLog -data /home/cebert/workspace-mat -vm /usr/local/java/bin/java -vmargs -Xmx10024m -jar /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar
Install location: file:/home/cebert/Applications/MemoryAnalyzer/mat/
Configuration file: file:/home/cebert/Applications/MemoryAnalyzer/mat/configuration/config.ini loaded
Configuration location: file:/home/cebert/Applications/MemoryAnalyzer/mat/configuration/
Framework located: file:/home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.osgi_3.6.2.R36x_v20110210.jar
Framework classpath: file:/home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.osgi_3.6.2.R36x_v20110210.jar
Splash location: null
Debug options: file:/home/cebert/workspace-mat/.options loaded
Time to load bundles: 5
Starting application: 734
Application Started: 3368

!SESSION 2012-01-10 14:07:05.039 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.6.0_27
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Command-line arguments: -os linux -ws gtk -arch x86_64 -debug -consoleLog -data /home/cebert/workspace-mat

!ENTRY org.eclipse.core.jobs 4 2 2012-01-10 14:07:29.741
!MESSAGE An internal error occurred during: "Parsing heap dump from '/opt/heapdumps/algol/jmap_dumplive-20111229.hprof'".
!STACK 0
java.lang.NegativeArraySizeException
    at org.eclipse.mat.hprof.Pass1Parser.readString(Pass1Parser.java:167)
    at org.eclipse.mat.hprof.Pass1Parser.read(Pass1Parser.java:107)
    at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:65)
    at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.parse(SnapshotFactoryImpl.java:203)
    at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.openSnapshot(SnapshotFactoryImpl.java:114)
    at org.eclipse.mat.snapshot.SnapshotFactory.openSnapshot(SnapshotFactory.java:145)
    at org.eclipse.mat.ui.snapshot.ParseHeapDumpJob.run(ParseHeapDumpJob.java:83)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)