
Bug 368300

Summary: Loading a 25GB heap fails with a NegativeArraySizeException
Product: [Tools] MAT
Reporter: Christopher Ebert <chris.ebert>
Component: Core
Assignee: Andrew Johnson <andrew_johnson>
Status: RESOLVED NOT_ECLIPSE
QA Contact:
Severity: normal
Priority: P3
CC: kevin.grigorenko
Version: 1.1
Target Milestone: ---
Hardware: PC
OS: Linux
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=404679
Whiteboard:

Description Christopher Ebert CLA 2012-01-10 17:13:04 EST
Fails after a minute or two during initial parsing.

I tried turning on debug (-debug, -consoleLog) but I don't see much extra output.

Below is a transcript of the log. I've worked with smaller (~8G) heap files without trouble, but when I try my large heap files they all break. I have three. All fail apparently identically. Unfortunately, I don't think I can send one of my dump files as they have client data in them, and I don't know how I could 'clean' it without compromising the data. If you know of a tool to do that, I'd be happy to send the heap. 


Start VM: /usr/local/java/bin/java
-Xmx10024m
-jar /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar
-os linux
-ws gtk
-arch x86_64
-showsplash
-launcher /home/cebert/Applications/MemoryAnalyzer/mat/MemoryAnalyzer
-name MemoryAnalyzer
--launcher.library /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.2.R36x_v20101019_1345/eclipse_1310.so
-startup /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar
-exitdata 5ea8010
-debug
-consoleLog
-data /home/cebert/workspace-mat
-vm /usr/local/java/bin/java
-vmargs
-Xmx10024m
-jar /home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar 
Install location:
    file:/home/cebert/Applications/MemoryAnalyzer/mat/
Configuration file:
    file:/home/cebert/Applications/MemoryAnalyzer/mat/configuration/config.ini loaded
Configuration location:
    file:/home/cebert/Applications/MemoryAnalyzer/mat/configuration/
Framework located:
    file:/home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.osgi_3.6.2.R36x_v20110210.jar
Framework classpath:
    file:/home/cebert/Applications/MemoryAnalyzer/mat/plugins/org.eclipse.osgi_3.6.2.R36x_v20110210.jar
Splash location:
    null
Debug options:
    file:/home/cebert/workspace-mat/.options loaded
Time to load bundles: 5
Starting application: 734
Application Started: 3368
!SESSION 2012-01-10 14:07:05.039 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.6.0_27
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Command-line arguments:  -os linux -ws gtk -arch x86_64 -debug -consoleLog -data /home/cebert/workspace-mat

!ENTRY org.eclipse.core.jobs 4 2 2012-01-10 14:07:29.741
!MESSAGE An internal error occurred during: "Parsing heap dump from '/opt/heapdumps/algol/jmap_dumplive-20111229.hprof'".
!STACK 0
java.lang.NegativeArraySizeException
	at org.eclipse.mat.hprof.Pass1Parser.readString(Pass1Parser.java:167)
	at org.eclipse.mat.hprof.Pass1Parser.read(Pass1Parser.java:107)
	at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:65)
	at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.parse(SnapshotFactoryImpl.java:203)
	at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.openSnapshot(SnapshotFactoryImpl.java:114)
	at org.eclipse.mat.snapshot.SnapshotFactory.openSnapshot(SnapshotFactory.java:145)
	at org.eclipse.mat.ui.snapshot.ParseHeapDumpJob.run(ParseHeapDumpJob.java:83)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Comment 1 Andrew Johnson CLA 2012-01-11 07:42:57 EST
The failing line is
        byte[] chars = new byte[(int) (length - idSize)];
idSize is either 4 or 8, and is probably 8.
Length must be either too small (less than idSize) or giant, i.e. > 2G, for the exception to occur, and a constant pool string shouldn't be that big.
The question is whether the dump is corrupt, or whether the HPROF parser is not reading it properly.
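The overflow itself is easy to reproduce in isolation: casting a long length greater than 2^31-1 to an int flips the sign, and the array allocation then throws. (A standalone sketch of the failure mode; the length value used here is the one later reported by the added error checking in comment 9.)

```java
public class NegativeLengthDemo {
    public static void main(String[] args) {
        long length = 2_685_160_960L; // record length later seen in the SnapshotException
        long idSize = 8;              // 64-bit VM identifiers
        // (int) keeps only the low 32 bits; values above 2^31-1 come out negative
        int truncated = (int) (length - idSize);
        System.out.println(truncated); // -1609806344
        try {
            byte[] chars = new byte[truncated]; // same pattern as Pass1Parser.readString
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```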
Comment 2 Christopher Ebert CLA 2012-01-12 03:00:31 EST
I was wondering if it was > 2GB. Is there any additional logging I can turn on to check?
Comment 3 Andrew Johnson CLA 2012-01-16 11:02:26 EST
I can't see any more logging in the code. You could run MAT under the Eclipse debugger by following these instructions:
http://wiki.eclipse.org/index.php?title=MemoryAnalyzer/Contributor_Reference#Workspace
and set a breakpoint on the throwing of NegativeArraySizeException and see what the value of length is.
Comment 4 Andrew Johnson CLA 2012-01-16 11:27:52 EST
It doesn't affect the bug, but this looks wrong
        String value = new String(chars);
as the conversion depends on the file encoding that MAT is running with.
This should be
        String value = new String(chars, "UTF8");
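The difference is straightforward to demonstrate: decoding UTF-8 bytes with the platform default charset can mangle non-ASCII identifiers, while an explicit charset is deterministic. (A generic illustration of the pitfall, not MAT's code.)

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // HPROF string records hold UTF-8 encoded bytes
        byte[] bytes = "Größe".getBytes(StandardCharsets.UTF_8);
        // Explicit charset: same result on every platform
        String explicit = new String(bytes, StandardCharsets.UTF_8);
        // No charset argument: result depends on the JVM's default encoding
        String platform = new String(bytes);
        System.out.println(explicit.equals("Größe")); // true
        // explicit.equals(platform) holds only when the default charset is UTF-8
    }
}
```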
Comment 5 Andrew Johnson CLA 2012-01-17 08:55:11 EST
Bug 368816 addresses the conversion problem.

I've added some more error checking for the length - the SnapshotException should indicate more about why the problem occurs.
Comment 6 Christopher Ebert CLA 2012-01-17 16:49:43 EST
OK -- I'll work on building MAT and running the debugger on it.
Comment 7 Christopher Ebert CLA 2012-01-17 20:48:11 EST
Drat. So I followed the instructions given, I believe, and am having trouble.

I installed the DTFJ plugin: the Eclipse info dialog shows "Diagnostic Tool Framework for Java 1.5.0.201103111151 com.ibm.dtfj.feature.feature.group"

but org.eclipse.mat.ibmdumps isn't finding it.

I tried doing the maven build, had some trouble, but got at least the 'prepare' working.
Specifically, I'm missing:

import com.ibm.tools.attach.AgentInitializationException;
import com.ibm.tools.attach.AgentLoadException;
import com.ibm.tools.attach.AttachNotSupportedException;
import com.ibm.tools.attach.VirtualMachine;
import com.ibm.tools.attach.VirtualMachineDescriptor;
 on trunk.

Any help appreciated: seems like MAT won't start without it?
Comment 8 Andrew Johnson CLA 2012-01-20 06:19:55 EST
The com.ibm.tools.attach classes and com.ibm.jvm.Dump class are provided by IBM VMs, not by the DTFJ. They are used for MAT to generate dumps on running IBM VM processes.
The intention of the build process is that the files in org.eclipse.mat.ibmdumps/src2 can only be compiled with an IBM VM (or possibly with access to jars from an IBM VM), so I checked the class files built by Eclipse on my machine into SVN in the classes directory, and the build process should not need to compile them.
Bug 320082 and http://wiki.eclipse.org/MemoryAnalyzer/Building_MAT_With_Tycho explain about the Tycho/Maven build process.

The ibmdumps project is optional - MAT should start without it, so you could exclude the whole project.

You could also take a snapshot nightly build from http://www.eclipse.org/mat/snapshotBuilds.php which should have a slightly better error message.
Comment 9 Christopher Ebert CLA 2012-01-30 19:56:55 EST
I got MAT running under debug. I also got the updates from SVN, and when I run with my nice ~29G file, the log has:

Task: Parsing jmap_dumplive-20111214.hprof
[Task: Parsing /opt/heapdumps/algol/jmap_dumplive-20111214.hprof
[
Subtask: Scanning /opt/heapdumps/algol/jmap_dumplive-20111214.hprof
[........]
!SESSION 2012-01-30 16:42:22.548 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.6.0_27
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Framework arguments:  -product org.eclipse.mat.ui.MemoryAnalyserPlugin -application org.eclipse.mat.api.parse /opt/heapdumps/algol/jmap_dumplive-20111214.hprof
Command-line arguments:  -product org.eclipse.mat.ui.MemoryAnalyserPlugin -data /home/cebert/workspace-misc/../runtime-New_configuration -dev file:/home/cebert/workspace-misc/.metadata/.plugins/org.eclipse.pde.core/MAT/dev.properties -os linux -ws gtk -arch x86_64 -consoleLog -application org.eclipse.mat.api.parse /opt/heapdumps/algol/jmap_dumplive-20111214.hprof

!ENTRY org.eclipse.osgi 4 0 2012-01-30 16:43:37.172
!MESSAGE Application error
!STACK 1
org.eclipse.mat.SnapshotException: Illegal record length 2,685,160,960 at byte 12,413,682,384 for record type 1
	at org.eclipse.mat.hprof.Pass1Parser.read(Pass1Parser.java:108)
	at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:65)
	at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.parse(SnapshotFactoryImpl.java:209)
	at org.eclipse.mat.parser.internal.SnapshotFactoryImpl.openSnapshot(SnapshotFactoryImpl.java:120)
	at org.eclipse.mat.snapshot.SnapshotFactory.openSnapshot(SnapshotFactory.java:145)
	at org.eclipse.mat.internal.apps.ParseSnapshotApp.parse(ParseSnapshotApp.java:114)
	at org.eclipse.mat.internal.apps.ParseSnapshotApp.start(ParseSnapshotApp.java:100)
	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:620)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:575)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1408)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1384)
An error has occurred. See the log file
/home/cebert/workspace-misc/.metadata/.plugins/org.eclipse.pde.core/MAT/1327970542814.log.

It seems I do have a record that's larger than 2G, although I don't know exactly how. Turning this into a string seems problematic.
Comment 10 Andrew Johnson CLA 2012-01-31 05:06:57 EST
I don't know why a dump would have a constant pool string that big. The javac compiler won't allow field names, string constants etc. that long.
Perhaps something earlier in the dump is corrupt and it is misparsing subsequent data. 2,685,160,960 is 0xA00C4E00, so those are the bytes at that point in the dump, read in big-endian format.
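For reference, reassembling those four big-endian bytes reproduces the reported length (a standalone sketch of how a 4-byte HPROF length field is decoded):

```java
public class BigEndianDemo {
    public static void main(String[] args) {
        // The four length bytes as they would appear in the dump, big-endian
        byte[] b = {(byte) 0xA0, (byte) 0x0C, (byte) 0x4E, (byte) 0x00};
        long length = ((b[0] & 0xFFL) << 24)
                    | ((b[1] & 0xFFL) << 16)
                    | ((b[2] & 0xFFL) << 8)
                    |  (b[3] & 0xFFL);
        System.out.println(length); // 2685160960
    }
}
```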

Does jhat read the dump?
Comment 11 Christopher Ebert CLA 2012-01-31 16:10:54 EST
Yes, it seems to. It isn't responsive, but I have to allocate ~16G to get it to read the file, and that's as much physical RAM as I have (well, OK, I've got 18), so I imagine it's getting swapped. I did turn on debug logging and it seems to get through the parse. Where should I instrument MAT to get a similar log? My jHat log looks something like this:

    Read heap sub-record type 35 at position 0xeb79aa
    Read heap sub-record type 34 at position 0xeb79f4
    Read heap sub-record type 35 at position 0xeb7a65
    Read heap sub-record type 35 at position 0xeb7a87
    Read heap sub-record type 35 at position 0xeb7aa9

for many, many lines. Eventually it starts printing messages about not finding objects in the heap, so I assume it's done parsing. I could leave it to see if it actually starts the web service, but that would probably take all night...

Oh, no. Lucky me -- 'expect 12312 dots'. It looks like it will load OK. Any advice on next steps?
Comment 12 Andrew Johnson CLA 2012-01-31 16:30:35 EST
Do you think there are any arrays >2Gbytes in size? There's some code I see which might go wrong in that case. E.g.
in.skipBytes(elementSize * size);
where elementSize*size might overflow an int.
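That overflow is simple to show: multiplying two ints wraps before any widening happens, whereas casting one operand to long first gives the correct product. (A sketch of the hazard, not the actual MAT fix.)

```java
public class SkipOverflowDemo {
    public static void main(String[] args) {
        int elementSize = 8;          // e.g. a long[] element on a 64-bit VM
        int size = 300_000_000;       // a large but legal array length
        int bad = elementSize * size; // int multiply: wraps to a negative value
        long good = (long) elementSize * size; // widen first: correct result
        System.out.println(bad);  // -1894967296
        System.out.println(good); // 2400000000
    }
}
```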
Comment 13 Andrew Johnson CLA 2012-02-01 09:53:06 EST
Also, the PositionInputStream.skipBytes(long) method looks suspicious. It takes a long input but returns only an int for the number of bytes skipped. If the underlying stream's skip skips all the bytes requested, then this method would return 0 for a skip of 0x80000000L bytes. For a skip of 0x100000001L bytes it would skip 0x100000001 bytes initially, then cast the 0x100000001 to an int (0x1), then do another skip of the difference 0x100000000, then stop as the return of the superclass skip would be cast to 0, and then it would return 1 to the caller.

It might not explain this bug, as it should skip correctly with inputs 0x0 to 0xFFFFFFFFL if the skip is done in one go by the underlying stream even though the returned value could be wrong.

I'm not sure how to fix this one as changing the return type to a long is an incompatible change.
The class is mentioned by http://wiki.eclipse.org/MemoryAnalyzer/Adding_a_new_heap_dump_format_to_the_Memory_Analyzer
The package documentation for org.eclipse.mat.parser.io says:
Utilities for the parser component (not API). 
so perhaps changing the return type is okay.
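One conventional way to make a long-valued skip reliable is to loop until the requested count is consumed, falling back to read() to detect end of stream. (A generic sketch of the technique, not MAT's actual change; the class and method names are made up for illustration.)

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullyDemo {
    // Skip exactly n bytes (or until EOF), returning the count actually skipped as a long
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() made no progress; probe with read() to distinguish EOF
                if (in.read() < 0) break; // EOF reached
                remaining--;              // consumed one byte via read()
                continue;
            }
            remaining -= skipped;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(skipFully(in, 7));  // 7
        System.out.println(skipFully(in, 10)); // 3 (only three bytes remained)
    }
}
```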
Comment 14 Christopher Ebert CLA 2012-02-02 19:18:08 EST
I'll try putting some logging around those calls and see what happens. It does seem the signature of skipBytes method should either be skipBytes(long):long (which is like a file) or skipBytes(int):int but if you're stuck by what's published, well, c'est la vie.

It is possible that I have ~250M of something, particularly character arrays, so maybe a structure that keeps track of them all would be large enough? It's a large heap, so there's room in there for a big array or two.

Also, I'll leave jHat loading overnight and see if I can check for large arrays that way.
Comment 15 Christopher Ebert CLA 2012-02-02 19:27:19 EST
Ah. I just went through the jHat logs and found:

Read record type 12, length 3320392219 at position 0x601c0f

I think that's probably what's killing MAT. So yes, I'll bet I have a big array (I'm not sure what 'record type 12' is).
Comment 16 Christopher Ebert CLA 2012-02-02 20:56:06 EST
Hm. I've put in my logging and find something mysterious. In the Pass1Parser at about line 100 I added a line to log the record type, length and position, figuring it would look very much like the output of jHat.

a diff of the logs shows: (< is MAT output, > is jHat, ignoring whitespace)
124203,124212c124205,61690020
< Read record type 33, length 1053164360 at position 0xc649463b
< Read record type 0, length 157555 at position 0x1050f498c
< Read record type 0, length 553648128 at position 0x10511b108
< Read record type 99, length 553648128 at position 0x12611b111
< Read record type 0, length 3989043200 at position 0x14711b11a
< Read record type 160, length 3 at position 0x234d5a923
< Read record type 200, length 885541336 at position 0x234d5a92f
< Read record type 3, length 0 at position 0x2699df310
< Read record type 0, length 35 at position 0x2699df319
< Read record type 1, length 2685160864 at position 0x2699df345
---
>     Read heap sub-record type 32 at position 0x601c18
>     Read heap sub-record type 32 at position 0x601d5a
>     Read heap sub-record type 32 at position 0x601dd4
>     Read heap sub-record type 32 at position 0x601e62
>     Read heap sub-record type 32 at position 0x601ecc
>     Read heap sub-record type 32 at position 0x601f36
>     Read heap sub-record type 32 at position 0x601f86
>     Read heap sub-record type 32 at position 0x601fe7
>     Read heap sub-record type 32 at position 0x602052
>     Read heap sub-record type 32 at position 0x602123

Ah. And the line before the problem starts is 

Read record type 12, length 3320392219 at position 0x601c0f

So yes, it's the big #12 for certain.
Comment 17 Andrew Johnson CLA 2012-02-03 03:47:38 EST
Thank you for your analysis. I've made some changes to ensure skipBytes(long) skips enough bytes. We can consider the return type later as that isn't being used. I've also changed the skipping for arrays to avoid integer overflow.

You may need to add some logging for sub-records at around line 230 in Pass1Parser. 

Perhaps we are skipping some of the dump - I see some code which will skip multiple heap dumps in a single file, but it looks like it would skip Heap dump segments after the first too.
Comment 18 Christopher Ebert CLA 2012-02-03 14:44:49 EST
I added some debug logging to the method that handles type 12, and found it reads everything just like jHat does. Apparently, it doesn't stop reading new items from the heap correctly, though: jHat switches to processing the heap for objects at that point, and MAT keeps reading objects. At least that's what the log looks like:

(jHat and MAT now both output these two lines)
Read heap sub-record type 33 at position 0xc64945a9
Read heap sub-record type 33 at position 0xc64945f2

(only MAT tries to keep reading records from the heap)
Read record type 33, length 1053164360 at position 0xc649463b
Read record type 0, length 157555 at position 0x1050f498c
Read record type 0, length 553648128 at position 0x10511b108
Read record type 99, length 553648128 at position 0x12611b111
Read record type 0, length 3989043200 at position 0x14711b11a

Further, the method for handling heaps and fragments seems to handle long lengths just fine. All the math is done with 'long' variables.

I'll keep looking, but right now it doesn't seem to be a size-of-int sort of thing.
Comment 19 Christopher Ebert CLA 2012-02-03 17:16:23 EST
I'm trying to get the source for jHat to compare to MAT. Any advice? I found the svn link on java.net, and I added myself to the project, but I can't download the source :(
Comment 20 Andrew Johnson CLA 2012-02-03 18:31:01 EST
I don't want to see any source for jHat as the license (GPL or Sun/Oracle license) won't be compatible with the Eclipse Public License, so jHat code must be kept separate from the Eclipse Memory Analyzer code.

The
Read record type 33, length 1053164360 at position 0xc649463b
looks suspicious, as a plain Java object (not an array) shouldn't be that big, and an instance record should probably only appear inside another heap dump segment.
Comment 21 Christopher Ebert CLA 2012-02-06 14:14:47 EST
Oh -- OK. No jHat: that makes sense.
The log from jHat switches at that point to processing the heap; the log for MAT seems to keep trying to process objects. I'll look at it some more and see what I can figure out.
Comment 22 Andrew Johnson CLA 2012-02-07 06:30:18 EST
(In reply to comment #15)
> Ah. I just went through the jHat logs and found:
> 
> Read record type 12, length 3320392219 at position 0x601c0f
> 
> I think that's probably what's killing MAT. So yes, I have a big array, I'll
> bet (I'm not sure what 'record type 12' is)
That is a Constants.Record.HEAP_DUMP record, (rather than a Constants.Record.HEAP_DUMP_SEGMENT record).

The sub-records are inside this record, and when they finish, parsing should resume at 0x601c0f + 1 + 4 + 4 + 3320392219 = 0xc6494633 rather than 0xc649463b as shown by comment #18
> Read record type 33, length 1053164360 at position 0xc649463b

That suggests that this sub-record might be too long?
> Read heap sub-record type 33 at position 0xc64945f2
From reading the code for readInstanceDump(), the format of the sub-record in byte sizes is:
1 - type
4/8 - object address
4 - stack trace serial number
4/8 - class id
4 - size
size - field values

so if that size field were 0x28 then the ending position would be
0xc64945f2 + 1 + 8 + 4 + 8 + 4 + 0x28 = 0xc6494633
and if it were 0x30 then the ending position would be 0xc649463b
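Both computations can be checked with plain long arithmetic, using the header sizes above with 8-byte identifiers:

```java
public class OffsetCheck {
    public static void main(String[] args) {
        // HEAP_DUMP record: 1 tag byte + 4 timestamp + 4 length, then the body
        long resume = 0x601c0fL + 1 + 4 + 4 + 3_320_392_219L;
        System.out.println(Long.toHexString(resume)); // c6494633

        // Instance sub-record at 0xc64945f2: 1 + 8 + 4 + 8 + 4 header, then `size` bytes
        long endIf28 = 0xc64945f2L + 1 + 8 + 4 + 8 + 4 + 0x28;
        long endIf30 = 0xc64945f2L + 1 + 8 + 4 + 8 + 4 + 0x30;
        System.out.println(Long.toHexString(endIf28)); // c6494633
        System.out.println(Long.toHexString(endIf30)); // c649463b
    }
}
```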
Comment 23 Andrew Johnson CLA 2012-02-10 08:12:34 EST
I've put a bit more tracing into the Hprof parser - it's just like the tracing in Comment #18, so might not add much to what we already know, but it will be useful for other problems.
To enable, run with -debug with a .options file of

org.eclipse.mat.hprof/debug=true
org.eclipse.mat.hprof/debug/parser=true

You may want to use eclipsec.exe from the overnight builds as then the log is visible on the console. Perhaps we should ship a MemoryAnalyzerC.exe etc. for this purpose.
Comment 24 Christopher Ebert CLA 2012-02-15 16:53:56 EST
I have an excerpt of the heap at the point of failure (nothing sensitive in it). Here's the hexdump starting at the previous object:

c64945f0  00 00 21 00 00 00 02 3e  c6 03 20 00 00 00 01 00  |..!....>.. .....|
c6494600  00 00 01 a0 04 26 a0 00  00 00 30 00 00 00 02 3e  |.....&....0....>|
c6494610  c6 03 48 00 00 00 02 3e  c6 02 f8 00 00 00 02 3e  |..H....>.......>|
c6494620  c6 56 38 00 00 00 01 c0  25 38 c8 00 00 00 00 00  |.V8.....%8......|
c6494630  00 00 00 00 00 00 00 00  00 00 00 21 00 00 00 02  |...........!....|
c6494640  3e c6 03 48 00 00 00 01  00 00 00 01 a0 04 26 a0  |>..H..........&.|
c6494650  00 00 00 30 00 00 00 02  3e c6 03 70 00 00 00 02  |...0....>..p....|
c6494660  3e c6 03 20 00 00 00 02  3e c6 56 28 00 00 00 01  |>.. ....>.V(....|
c6494670  c0 25 38 c8 00 00 00 00  00 00 00 00 00 00 00 00  |.%8.............|
c6494680  00 00 00 00 21 00 00 00  02 3e c6 03 70 00 00 00  |....!....>..p...|
c6494690  01 00 00 00 01 a0 04 26  a0 00 00 00 30 00 00 00  |.......&....0...|
c64946a0  02 3e c6 03 98 00 00 00  02 3e c6 03 48 00 00 00  |.>.......>..H...|
c64946b0  02 3e c6 53 38 00 00 00  01 c0 25 38 c8 00 00 00  |.>.S8.....%8....|
c64946c0  00 00 00 00 00 00 00 00  00 00 00 00 00 21 00 00  |.............!..|
c64946d0  00 02 3e c6 03 98 00 00  00 01 00 00 00 01 a0 04  |..>.............|
c64946e0  26 a0 00 00 00 30 00 00  00 02 3e c6 03 c0 00 00  |&....0....>.....|
c64946f0  00 02 3e c6 03 70 00 00  00 02 3e c6 53 28 00 00  |..>..p....>.S(..|
c6494700  00 01 c0 25 38 c8 00 00  00 00 00 00 00 00 00 00  |...%8...........|
c6494710  00 00 00 00 00 00 21 00  00 00 02 3e c6 03 c0 00  |......!....>....|
c6494720  00 00 01 00 00 00 01 a0  04 26 a0 00 00 00 30 00  |.........&....0.|
c6494730  00 00 02 3e c6 03 e8 00  00 00 02 3e c6 03 98 00  |...>.......>....|
c6494740  00 00 02 3e c6 50 38 00  00 00 01 c0 25 38 c8 00  |...>.P8.....%8..|
c6494750  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 21  |...............!|
c6494760  00 00 00 02 3e c6 03 e8  00 00 00 01 00 00 00 01  |....>...........|
c6494770  a0 04 26 a0 00 00 00 30  00 00 00 02 3e c6 04 10  |..&....0....>...|
c6494780  00 00 00 02 3e c6 03 c0  00 00 00 02 3e c6 50 28  |....>.......>.P(|
c6494790  00 00 00 01 c0 25 38 c8  00 00 00 00 00 00 00 00  |.....%8.........|
c64947a0  00 00 00 00 00 00 00 00  21 00 00 00 02 3e c6 04  |........!....>..|
c64947b0  10 00 00 00 01 00 00 00  01 a0 04 26 a0 00 00 00  |...........&....|
c64947c0  30 00 00 00 02 3e c6 04  38 00 00 00 02 3e c6 03  |0....>..8....>..|
c64947d0  e8 00 00 00 02 3e c6 4d  38 00 00 00 01 c0 25 38  |.....>.M8.....%8|
c64947e0  c8 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
c64947f0  00 21 00 00 00 02 3e c6  04 38 00 00 00 01 00 00  |.!....>..8......|
c6494800  00 01 a0 04 26 a0 00 00  00 30 00 00 00 02 3e c6  |....&....0....>.|
c6494810  04 60 00 00 00 02 3e c6  04 10 00 00 00 02 3e c6  |.`....>.......>.|
c6494820  4d 28 00 00 00 01 c0 25  38 c8 00 00 00 00 00 00  |M(.....%8.......|
c6494830  00 00 00 00 00 00 00 00  00 00 21 00 00 00 02 3e  |..........!....>|
c6494840  c6 04 60 00 00 00 01 00  00 00 01 a0 04 26 a0 00  |..`..........&..|
c6494850  00 00 30 00 00 00 02 3e  c6 04 88 00 00 00 02 3e  |..0....>.......>|
c6494860  c6 04 38 00 00 00 02 3e  c6 4a 38 00 00 00 01 c0  |..8....>.J8.....|
c6494870  25 38 c8 00 00 00 00 00  00 00 00 00 00 00 00 00  |%8..............|
c6494880  00 00 00 21 00 00 00 02  3e c6 04 88 00 00 00 01  |...!....>.......|
c6494890  00 00 00 01 a0 04 26 a0  00 00 00 30 00 00 00 02  |......&....0....|
c64948a0  3e c6 04 b0 00 00 00 02  3e c6 04 60 00 00 00 02  |>.......>..`....|
c64948b0  3e c6 4a 28 00 00 00 01  c0 25 38 c8 00 00 00 00  |>.J(.....%8.....|
c64948c0  00 00 00 00 00 00 00 00  00 00 00 00 21 00 00 00  |............!...|
c64948d0  02 3e c6 04 b0 00 00 00  01 00 00 00 01 a0 04 26  |.>.............&|
c64948e0  a0 00 00 00 30 00 00 00  02 3e c6 04 d8 00 00 00  |....0....>......|
c64948f0  02 3e c6 04 88 00 00 00  02 3e c6 47 38 00 00 00  |.>.......>.G8...|
c6494900  01 c0 25 38 c8 00 00 00  00 00 00 00 00 00 00 00  |..%8............|
c6494910  00 00 00 00 00 21 00 00  00 02 3e c6 04 d8 00 00  |.....!....>.....|
c6494920  00 01 00 00 00 01 a0 04  26 a0 00 00 00 30 00 00  |........&....0..|
c6494930  00 02 3e c6 05 00 00 00  00 02 3e c6 04 b0 00 00  |..>.......>.....|
c6494940  00 02 3e c6 47 28 00 00  00 01 c0 25 38 c8 00 00  |..>.G(.....%8...|
c6494950  00 00 00 00 00 00 00 00  00 00 00 00 00 00 21 00  |..............!.|
c6494960  00 00 02 3e c6 05 00 00  00 00 01 00 00 00 01 a0  |...>............|
c6494970  04 26 a0 00 00 00 30 00  00 00 02 3e c6 05 28 00  |.&....0....>..(.|
c6494980  00 00 02 3e c6 04 d8 00  00 00 02 3e c6 44 38 00  |...>.......>.D8.|
c6494990  00 00 01 c0 25 38 c8 00  00 00 00 00 00 00 00 00  |....%8..........|
c64949a0  00 00 00 00 00 00 00 21  00 00 00 02 3e c6 05 28  |.......!....>..(|
It looks like something gets off track. The record at c649483b looks valid enough, but it seems like it should end at c6494883. This would make sense -- there should be a *lot* more to read (the length of the segment seems wrong as well?). The rough pattern of '21' every so many bytes continues for a long time. It's a 29G dump file and we're less than 4G through it.
Comment 25 Christopher Ebert CLA 2012-02-15 17:27:10 EST
So, having looked at the code and the output, I've decided that the original length must somehow be wrong - it claims only a ~3G heap dump. The dump file is much larger than that, so this record ought to be a segment, or be much longer. From the data later in the file, it seems to be the latter.

The relevant bytes:
00601c00  02 1d 08 00 00 00 01 00  00 00 01 a0 00 0b b8 0c  |................|
00601c10  00 00 00 00 c5 e9 2a 1b  20 00 00 00 01 a5 fb 47  |......*. ......G|
00601c20  e8 00 00 00 01 00 00 00  01 a0 02 21 e8 00 00 00  |...........!....|
00601c30  03 5a d9 88 18 00 00 00  00 00 00 00 00 00 00 00  |.Z..............|
00601c40  03 5a d9 8a d0 00 00 00  00 00 00 00 00 00 00 00  |.Z..............|
00601c50  00 00 00 00 00 00 00 00  51 00 00 00 06 00 00 00  |........Q.......|
00601c60  01 a0 18 a4 58 0a 00 00  20 00 00 00 00 01 a2 99  |....X... .......|
00601c70  9f 90 02 00 00 00 03 5a  da 2f 80 00 00 00 01 a2  |.......Z./......|
00601c80  99 a0 70 02 00 00 00 03  5a da 2f 00 00 00 00 01  |..p.....Z./.....|
00601c90  a2 99 a0 98 02 00 00 00  03 5a da 2e 80 00 00 00  |.........Z......|
00601ca0  01 a2 99 a0 c0 02 00 00  00 03 5a da 2e 50 00 00  |..........Z..P..|
00601cb0  00 01 a2 99 a0 e8 02 00  00 00 01 c0 88 e2 78 00  |..............x.|
00601cc0  11 00 00 00 01 a2 99 9f  b0 02 00 00 00 01 a1 ad  |................|
00601cd0  93 70 02 00 00 00 01 a0  58 63 e0 0a 00 00 00 01  |.p......Xc......|
00601ce0  a2 99 73 10 04 00 00 00  01 a0 33 71 c8 0a 00 00  |..s.......3q....|
00601cf0  00 01 a2 99 a0 10 0a 00  00 00 01 a2 99 a0 30 0a  |..............0.|

The record starts at 601c0f, which is indeed 0c (12). I'm a little stumped, and I'm beginning to think that jHat doesn't actually parse the file correctly; it just doesn't fail dramatically.


Here's my Java version:

cebert@cebert:~/workspace-misc$ java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)
Comment 26 Christopher Ebert CLA 2012-02-16 18:55:28 EST
OK. Looks like this is actually a VM problem. The length of the heap is > 4G. When I hack the code to tack a leading 6 on the length of the heap segment, it seems to parse.

I don't think there's anything MAT could do to fix this, really. The question is why is the JVM putting out an invalid dump? It seems that it's because it's under memory stress, as the dumps I get when everything is fine work. Unfortunately, memory stress is precisely when I want the dump. Maybe a feature to add some (long) constant to the size of the dump, estimated by the size of the file, would allow someone to process a heap such as I have.
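The "leading 6" hack amounts to restoring bits lost above the 4-byte length field. A sketch of the arithmetic, under the assumption that exactly 0x6_00000000 was truncated (consistent with the ~29G file size):

```java
public class TruncatedLengthDemo {
    public static void main(String[] args) {
        // Hypothetical true segment length: the reporter's "leading 6" prepended
        long actual = 0x6C5E92A1BL;
        // The HPROF record length field is only 4 bytes, so only the low 32 bits are stored
        long stored = actual & 0xFFFFFFFFL;
        System.out.println(stored); // 3320392219, the length jHat and MAT both read
        System.out.println(actual); // 29090195995, consistent with a ~29G file
    }
}
```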

And I guess I'll send a bug to Oracle, too. Or maybe they'll explain how it actually works. Thanks for all your help.
Comment 27 Andrew Johnson CLA 2012-08-27 05:36:51 EDT
I'm resolving this defect as NOT_ECLIPSE, as having MAT attempt to guess how to parse an invalid dump sounds tricky, and those dumps don't seem to occur very frequently. We do now have some debug trace to help diagnose similar problems in the future.
Comment 28 Kevin Grigorenko CLA 2013-04-02 20:39:14 EDT
Hey Chris, we may have found a workaround for your bug report in bug #404679. Do you still have this dump to test the fix on?