
Bug 348028

Summary: OOM Error during pack200 of jpt.doc.isv jar
Product: [WebTools] WTP Releng
Component: releng
Version: 3.10
Status: RESOLVED FIXED
Severity: normal
Priority: P3
Target Milestone: 3.10.0
Hardware: PC
OS: Linux
Whiteboard: PMC_approved
Reporter: David Williams <david_williams>
Assignee: David Williams <david_williams>
QA Contact: David Williams <david_williams>
CC: dgolovin, neil.hauge, tranle1
Flags: david_williams: pmc_approved+
       david_williams: pmc_approved? (raghunathan.srinivasan)
       david_williams: pmc_approved? (naci.dai)
       david_williams: pmc_approved? (deboer)
       david_williams: pmc_approved? (neil.hauge)
       david_williams: pmc_approved? (kaloyan)
       david_williams: pmc_approved? (cbridgha)

Description David Williams CLA 2011-06-01 17:58:52 EDT
By chance, I happened to notice a VM dump on the build machine ... and it appears to have happened during our "promotion" of RC3. That's when we "pack200" the appropriate files. The dump contained this as the command line: 


 200 2CIENVVAR      IBM_JAVA_COMMAND_LINE=/opt/public/webtools/apps/ibm-java2-sdk-5.0-12.1-linux-i386/jre/bin/pack200 -E4 /opt/public/webtools/committers/wtp-R3.3.0-S/20110527214303/S-3.3.0RC3-20110527214303/repository/plugins/org.eclipse.jpt.doc.isv_2.0.0.v201105240000.jar.pack.gz /opt/public/webtools/committers/wtp-R3.3.0-S/20110527214303/S-3.3.0RC3-20110527214303/repository/plugins/temp.org.eclipse.jpt.doc.isv_2.0.0.v201105240000.jar/org.eclipse.jpt.doc.isv_2.0.0.v201105240000.jar

Just wanted to capture it, in case it happens again. We don't even need to pack200 doc jars ... we may want to explicitly not pack that one, if this turns out to be a common problem.
Comment 1 David Williams CLA 2011-06-02 17:51:46 EDT
To cross reference, see also bug 348147.
Comment 2 David Williams CLA 2011-06-02 21:25:49 EDT
I do see that doc jar is the largest we have, nearly twice as big as the next largest ... so, maybe it really does need more memory allocated. I'll experiment. 

Here are the 5 largest jars we have ... I do wonder why that dali doc is so big ... too many bitmaps? Not enough jpegs? :) 

 13116K 2011-05-31 15:45 org.eclipse.jpt.doc.isv_2.0.0.v201105240000.jar
  7929K 2011-05-31 15:46 org.eclipse.wst.jsdt.ui_1.1.100.v201105041953.jar
  5256K 2011-05-31 15:46 org.eclipse.wst.jsdt.doc_1.0.400.v201011052052.jar
  4613K 2011-05-31 15:46 org.eclipse.jst.jsf.doc.dev_1.2.0.v20100604.jar
  4299K 2011-05-31 15:46 org.eclipse.wst.jsdt.ui.source_1.1.100.v201105041953.jar
Comment 3 David Williams CLA 2011-06-02 21:26:31 EDT
It seems we currently do specify -Xms128M -Xmx256M when we run the pack200 to create the pack.gz files. 

At some level, pack200 is also run when we sign (so it normalizes jars), but I don't see right off that we set anything particular for -Xmx in that case (though we could, and maybe I just don't see it right now) ... but, for now, I'm assuming that if that repack can run ok, then our pack run should be able to run easily enough (that is, no need to exclude it from packing, or anything).
Comment 4 David Williams CLA 2011-06-03 02:56:51 EDT
Doing some "local" testing, it seems that jar can be pack200'd if we specify 1G for max heap. (512M was not enough). 

Therefore, I think the "quick fix" is to just use more memory when we do a promotion, which is when we process the artifacts. 

I'm going to see if I can recreate the repo for RC3 ... just to test it to make sure the quick fix works before we do the final build/promotion. 

Long term, we can consider other options, such as not packing that file at all ... or, better, making it smaller! :) 


[Note: one complication in debugging was that specifying -Xmx on the command line didn't seem to "take" so I had to use 'export IBM_JAVA_OPTIONS' in our promote script. The command line should take precedence, so not sure what's going on there ... some oddness in the scripts, I'd guess? ... but just wanted to make a note, to leave some trail of why that's there.]
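
A minimal sketch of the workaround described in the note above, assuming a promote-style shell script running on an IBM JVM (IBM_JAVA_OPTIONS is only read by IBM JVMs). The function name run_with_heap and the PACK200_MAX_HEAP variable are hypothetical and are not taken from the actual promote script.

```shell
#!/bin/sh
# Hypothetical setting: max heap handed to the JVM behind pack200.
# 1G is the value that worked in local testing (512M was not enough).
PACK200_MAX_HEAP="${PACK200_MAX_HEAP:-1G}"

# Hypothetical helper: run one command with IBM_JAVA_OPTIONS set.
# Using env limits the setting to that single command, so the larger
# heap does not leak into other Java processes started by the script.
run_with_heap() {
    env IBM_JAVA_OPTIONS="-Xmx${PACK200_MAX_HEAP}" "$@"
}

# Example call (file names are placeholders):
#   run_with_heap pack200 -E4 some.jar.pack.gz some.jar
```

Scoping the variable to one command is a design choice; a plain 'export IBM_JAVA_OPTIONS' at the top of the script, as the note describes, also works but affects every JVM started afterwards. Separately, the standard pack200 launcher documents a -J prefix for passing options to the JVM (for example -J-Xmx1G), which may be worth trying where a bare -Xmx is not picked up.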

Also, to cross reference, I opened bug 348166 since ideally the ant task would throw a build-failed, or similar. 

Also, to cross reference, I opened bug 348167 on ourselves, since there are ways we can detect bad pack.gz files using b2 aggregator. There might be other advantages to using the aggregator, so just wanted to capture this reason too.
Comment 5 David Williams CLA 2011-06-03 04:01:17 EDT
*** Bug 348147 has been marked as a duplicate of this bug. ***
Comment 6 David Williams CLA 2011-06-03 04:05:22 EDT
The fix, using 1G max heap, seemed to work ok. A new, larger jpt.doc bundle was created that I could unzip with gunzip. 
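
The gunzip check mentioned above could be automated roughly as follows. This is only a sketch: the function name check_pack_gz is hypothetical, and 'gzip -t' verifies the gzip layer only, not that the pack200 stream inside is valid.

```shell
#!/bin/sh
# Hypothetical check: list every *.pack.gz under a directory that fails
# gzip's built-in integrity test. Returns non-zero if any file is bad.
check_pack_gz() {
    # The inner shell prints the name of each file that gzip rejects.
    bad_files=$(find "$1" -type f -name '*.pack.gz' -exec sh -c '
        for f in "$@"; do
            gzip -t "$f" 2>/dev/null || printf "%s\n" "$f"
        done' sh {} +)
    if [ -n "$bad_files" ]; then
        printf 'Corrupt pack.gz files:\n%s\n' "$bad_files" >&2
        return 1
    fi
    return 0
}

# Example call (directory is a placeholder):
#   check_pack_gz ./repository/plugins || exit 1
```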

It was tricky actually "patching" the repo, since, apparently, pack200 creates the new pack.gz file with the same date/time stamp as the jar, so nothing would get replaced, with the way I have rsync set. So, rather than figuring that out, I think it's safer to just remove (rename) the original 'repository' and replace the whole thing with the newly generated repository. So, there would have been a few minutes this evening when the repo would have been even more broken ... hopefully mirrors, etc., will get the fix ... but, at any rate, we are ready for our final build and final promotion. 

Thanks again,
Comment 7 David Williams CLA 2011-06-03 04:09:18 EDT
Marking 'PMC' for awareness. We'd basically have to do something (can't leave bad packed files in the repo forever), so it's a question of fixing the releng scripts now, or "hand fixing" all the repos generated in the future (until fixed some other way). 

This won't affect code, or anything, but since it is a change, I wanted to be sure the rest of the PMC was aware.
Comment 8 Neil Hauge CLA 2011-06-03 10:33:47 EDT
(In reply to comment #2)
> I do see that doc jar is the largest we have, nearly twice as big as the next
> largest ... so, maybe it really does need more memory allocated. I'll
> experiment. 
> 
> Here is the 5 largest jars we have ... I do wonder why that dali doc is so big
> ... to many bitmaps? Not enough jpegs? :) 
> 
> 13116K 2011-05-31 15:45 org.eclipse.jpt.doc.isv_2.0.0.v201105240000.jar
>   7929K 2011-05-31 15:46 org.eclipse.wst.jsdt.ui_1.1.100.v201105041953.jar
>   5256K 2011-05-31 15:46 org.eclipse.wst.jsdt.doc_1.0.400.v201011052052.jar
>   4613K 2011-05-31 15:46 org.eclipse.jst.jsf.doc.dev_1.2.0.v20100604.jar
>   4299K 2011-05-31 15:46 org.eclipse.wst.jsdt.ui.source_1.1.100.v201105041953.jar


The Dali doc.isv jar is simply the Javadoc for the Dali provisional API.  The Dali provisional API is huge.  I have been generating the Javadoc with the standard Eclipse options, which produces a Hierarchy Tree page, and this seems to be the bulk of the problem.  This file alone is 173MB, about 2/3 of the contents of that jar (uncompressed).  I think foregoing the Hierarchy Tree would make sense, as it doesn't really add any value to someone who is using the Eclipse IDE.  And it should shave about 8MB off of the WTP install zip.

I'll open another bug for this and perhaps we can respin RC4 to yank that file?
Comment 9 David Williams CLA 2011-06-03 12:31:48 EDT
> 
> I'll open another bug for this and perhaps we can respin RC4 to yank that file?

It'd be ok by me, since we want to respin anyway and a "wasted" 8 M does seem worth attending to. (But, wouldn't respin just for that :)
Comment 10 David Williams CLA 2011-06-04 12:53:06 EDT
The new doc bundle was much better behaved. So much better, the pack200 step ran fine with default max heap settings. I did, though, leave commented-out notes and options in the promote.sh file, in case it happens again in the future and we need to increase the heap again. 

I suppose, in theory, we could add some checks in our promote script, to check the file system before and after the run, to see if any "javacore.*" files were created during the run, and flag an error if so, to make it more obvious an error occurred, but ... that seems a bit hacky, so I would only suggest that if the problem happened again in the future. 
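
A rough sketch of that before-and-after check, for reference only. The function name run_and_check_javacore and the idea of passing the watched directory as the first argument are assumptions; nothing like this exists in the promote script.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, then fail if any new javacore.*
# file appeared under the watched directory while the command ran.
# Usage: run_and_check_javacore WATCH_DIR COMMAND [ARGS...]
run_and_check_javacore() {
    watch_dir="$1"
    shift
    before=$(mktemp) && after=$(mktemp) || return 2

    # Snapshot the dumps that already exist, so old ones are not flagged.
    find "$watch_dir" -type f -name 'javacore.*' | sort > "$before"

    status=0
    "$@" || status=$?

    find "$watch_dir" -type f -name 'javacore.*' | sort > "$after"

    # comm -13 keeps only lines unique to the second (after) list.
    new_dumps=$(comm -13 "$before" "$after")
    rm -f "$before" "$after"

    if [ -n "$new_dumps" ]; then
        printf 'VM dump(s) created during run:\n%s\n' "$new_dumps" >&2
        return 1
    fi
    # No new dumps: pass through the command's own exit status.
    return "$status"
}
```

Comparing two listings avoids relying on file timestamps, at the cost of two extra directory scans per run.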

For some final (approximate) numbers on the doc jar: 

                   old       new

size of jar:       13M       10M
size unzipped:    256M       93M
size of pack.gz:    7M        4M

Since the jar itself didn't change size all that much, my guess is that the OOM was caused during processing of that one huge "tree" file. That is, for example, pack200 could probably handle a 400M jar file if all the files in it were relatively small and, similarly, might fail on a 5M jar file if one of its uncompressed files were huge.
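
One way to test that guess on a given jar is to rank its entries by uncompressed size. The sketch below assumes the usual 'unzip -l' listing layout (length, date, time, name); the function name rank_entries is hypothetical, and entry names containing spaces would be cut at the first space.

```shell
#!/bin/sh
# Hypothetical filter: read an 'unzip -l' listing on stdin and print the
# N largest entries as "<uncompressed bytes> <name>", largest first.
rank_entries() {
    n="${1:-5}"
    # Keep only entry rows: numeric length followed by a date field.
    # Header, separator, and total lines do not match and are dropped.
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { print $1, $4 }' |
        sort -rn |
        head -n "$n"
}

# Example call (jar name is a placeholder):
#   unzip -l some.doc.isv.jar | rank_entries 5
```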