Build Identifier: 20110916-0149

I tried cloning a remote repository which is >800MB. I got this exception:

java.lang.OutOfMemoryError: Java heap space
  at org.eclipse.jgit.transport.PackParser.inflateAndReturn(PackParser.java:1436)
  at org.eclipse.jgit.transport.PackParser.resolveDeltas(PackParser.java:587)
  at org.eclipse.jgit.transport.PackParser.resolveDeltas(PackParser.java:568)
  at org.eclipse.jgit.transport.PackParser.resolveDeltas(PackParser.java:532)
  at org.eclipse.jgit.transport.PackParser.parse(PackParser.java:489)
  at org.eclipse.jgit.storage.file.ObjectDirectoryPackParser.parse(ObjectDirectoryPackParser.java:178)
  at org.eclipse.jgit.transport.PackParser.parse(PackParser.java:431)
  at org.eclipse.jgit.transport.BasePackFetchConnection.receivePack(BasePackFetchConnection.java:672)
  at org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:284)
  at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:229)
  at org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:225)
  at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:151)
  at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:113)
  at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1062)
  at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:136)
  at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:174)
  at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:118)
  at org.eclipse.egit.core.op.CloneOperation.run(CloneOperation.java:142)
  at org.eclipse.egit.ui.internal.clone.GitCloneWizard.executeCloneOperation(GitCloneWizard.java:306)
  at org.eclipse.egit.ui.internal.clone.GitCloneWizard.access$3(GitCloneWizard.java:299)
  at org.eclipse.egit.ui.internal.clone.GitCloneWizard$5.run(GitCloneWizard.java:278)
  at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

I first tried with the standard configuration. Then I increased the heap space to 512MB; that did not work either. In the end it worked with 2GB of heap space, but this is no solution.

Reproducible: Always
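For reference, the 2GB heap that finally worked was set via the -Xmx vmarg in eclipse.ini, roughly like below (only the -Xmx line was changed; the rest of the file is whatever your installation ships with):

  -vmargs
  -Xmx2048m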
What's the size of the largest blob?
By blob, do you mean a binary file? The largest is about 20MB. But in total the branch is about 600MB, with lots of small files (~16000 files).
Shawn, any clue as to what may cause this?
Looks like this is a pretty old build of JGit, from about 1 year ago?

PackParser is in the middle of resolving a delta chain by applying each child delta on top of the base. As it does this it retains the base object in memory so that a sibling delta can be applied after this delta is processed. If the object being processed is 20 MiB and the delta chain is 10 objects deep, this is a minimum of 200 MiB of RAM required to process the chain. In practice a delta chain can easily be as deep as 50 objects. If we assume a chain depth of about 50 here, a 512 MiB heap would easily overflow if each object was just 10 MiB in size.

JGit needs a bunch of additional working memory to handle the set of objects being processed from the pack file, etc. We try hard to avoid unnecessary memory usage, but we also don't spill the working set to disk; we assume we can get sufficient RAM allocated from the JVM heap to do what we need in this operation, without writing our own swap management for transient working set data. It is entirely possible you need nearly 2 GiB in the JVM heap to process an 800 MiB repository.

One of the bigger impacts on memory usage in Git and JGit isn't the size, but the number of objects. The Linux kernel is some 2.2M objects and yet only 400 MiB of data. 2.2M objects each needing more than 100 bytes of RAM to track in the transient working set is over 220 MiB of RAM. Factor in additional space needed by the Java GC to be free for upcoming allocations, and all of the other stuff running around in the process, and yes... you may need 2 GiB in Eclipse to process a big repository.

Native Git gets around this by forking a new UNIX process for each operation and just letting the OS assign 2G or more of RAM to that process, then returning it all to the OS when the command finishes and the process dies. JGit assumes the JVM can provide the RAM. We may have to change PackParser to start spilling to temporary files on disk beyond a certain amount of RAM used, and implement our own swap system in Java if we can't get the memory we need from the JVM heap.
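To make the memory pattern concrete, here is a rough sketch of that resolution shape. This is illustrative only, not JGit's actual code; all class and method names are invented. The point is that the base bytes stay referenced on the stack while every sibling delta below it is processed, so a chain of depth N can pin up to N inflated objects at once:

  import java.util.ArrayList;
  import java.util.List;

  // Invented names, for illustration only.
  class DeltaNode {
      byte[] delta;                               // delta instructions read from the pack
      List<DeltaNode> children = new ArrayList<>();
  }

  class DeltaResolverSketch {
      // Apply each child delta on top of 'base'. 'base' must stay live until
      // ALL of its children (and their subtrees) have been processed.
      void resolve(byte[] base, DeltaNode node) {
          for (DeltaNode child : node.children) {
              byte[] result = applyDelta(base, child.delta); // another full object in RAM
              resolve(result, child);                        // recurse while 'base' is still referenced
          }
          // only after the loop can 'base' become garbage-collectable
      }

      byte[] applyDelta(byte[] base, byte[] delta) {
          // placeholder: real delta application reconstructs the full object
          return new byte[base.length];
      }
  }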
It's old, but I don't think we've had dramatic changes in this area.

So, using the aggressive packing option on the server hurts JGit, if that is the issue here.

Overflowing to disk, don't we have VM for that? Normally oversizing the -Xmx parameter isn't bad, if it's just temporary and does not overflow your available RAM, but from my observations of Eclipse it seems it does not release memory to the OS as Java apps seem to do. I therefore did an experiment and by chance (using a random Google search result) [1] I tried these options:

-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-XX:MinHeapFreeRatio=5
-XX:MaxHeapFreeRatio=25

Maximum memory usage is about the same, but the heap is kept smaller. The major drawback is probably increased CPU usage, since the GC has to do more work to keep the heap small. On the positive side, the drawbacks of using a large -Xmx aren't as severe, since Eclipse then behaves more like a normal application.

[1] http://stackoverflow.com/questions/3776041/eclipse-release-heap-back-to-system
(In reply to comment #5)
> It's old, but I don't think we've had dramatic changes in this area.

You are right, we have not had many changes in this section. I thought I had implemented a feature to discard the objects at the top of the delta tree when memory ran low, but I didn't. This is actually hard to do in the JVM because every "new" call must be wrapped with try/catch. Within a dedicated command line JGit tool, we might be able to do this if the only allocation activity is in this section of code. Within EGit that is impossible; the workbench could be doing other work at the same time. So I didn't even try to release memory from the higher levels of the delta tree.

> So, using the aggressive packing option on the server hurts JGit, if that
> is the issue here.

Possible. But a depth of 50 is common, and is still a problem. --aggressive can use an even deeper delta chain, and yes, that hurts even more if the object being delta compressed is large.

> Overflowing to disk, don't we have VM for that?

Yes, but we only get virtual memory if the JVM heap was sized big enough to begin with. And usually the JVM heap going to swap causes some really bad performance for the overall system.

We "know" we don't need the object data until later, and we know how we will use it. An explicit movement of this data to disk once we get over some small threshold (e.g. 50,000 objects) may out-perform any attempt made by the OS to manage our memory. But it will slow us down if the system has sufficient RAM. And it makes the code a lot more complex.

What is ugly about doing our own explicit VM here is that we have to sort the data. Right now we rely on Arrays.sort() to do that for us in RAM, using the JVM's native sorting routine. If we spill to disk we have to implement our own disk-based merge-sort algorithm. Doable but annoying.

> Normally oversizing the -Xmx parameter isn't bad, if it's just temporary and
> does not overflow your available RAM, but from my observations of Eclipse it
> seems it does not release memory to the OS as Java apps seem to do. [...]

Thing is, this isn't the normal way to run Eclipse. Making users restart their workspace with special JVM flags that require experimental VM options, just to use EGit against a large repository, is pretty annoying for the end-user.
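For what the "spill past a threshold, then disk-based merge sort" idea could look like, here is a minimal sketch. It is not JGit code and all names are invented; it reduces the problem to sorting plain long keys, where real code would handle full object records and stream the merge instead of collecting it:

  import java.io.BufferedInputStream;
  import java.io.BufferedOutputStream;
  import java.io.DataInputStream;
  import java.io.DataOutputStream;
  import java.io.EOFException;
  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.PriorityQueue;

  // Invented names, for illustration only.
  class SpillingSorterSketch {
      private static final int THRESHOLD = 50_000;   // e.g. 50,000 objects
      private final List<Long> buffer = new ArrayList<>();
      private final List<Path> runs = new ArrayList<>();

      void add(long key) throws IOException {
          buffer.add(key);
          if (buffer.size() >= THRESHOLD)
              spill();                                // write one sorted run to disk
      }

      private void spill() throws IOException {
          if (buffer.isEmpty())
              return;
          Collections.sort(buffer);                   // in-RAM sort of a single run
          Path run = Files.createTempFile("sort-run", ".tmp");
          try (DataOutputStream out = new DataOutputStream(
                  new BufferedOutputStream(Files.newOutputStream(run)))) {
              for (long k : buffer)
                  out.writeLong(k);
          }
          runs.add(run);
          buffer.clear();
      }

      // k-way merge of all sorted runs; one cursor per run sits in a heap.
      List<Long> sorted() throws IOException {
          spill();                                    // flush whatever is left in RAM
          PriorityQueue<RunCursor> heap = new PriorityQueue<>();
          for (Path run : runs) {
              RunCursor c = new RunCursor(run);
              if (c.advance())
                  heap.add(c);
          }
          List<Long> result = new ArrayList<>();
          while (!heap.isEmpty()) {
              RunCursor c = heap.poll();
              result.add(c.current);
              if (c.advance())
                  heap.add(c);
              else
                  c.close();
          }
          return result;
      }

      private static final class RunCursor implements Comparable<RunCursor> {
          final DataInputStream in;
          long current;

          RunCursor(Path run) throws IOException {
              in = new DataInputStream(new BufferedInputStream(Files.newInputStream(run)));
          }

          boolean advance() throws IOException {
              try {
                  current = in.readLong();
                  return true;
              } catch (EOFException eof) {
                  return false;                       // run exhausted
              }
          }

          void close() throws IOException {
              in.close();
          }

          public int compareTo(RunCursor o) {
              return Long.compare(current, o.current);
          }
      }
  }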
*** Bug 388106 has been marked as a duplicate of this bug. ***
We see exactly the same OOME traceback when cloning. Our environment:
- EGit 4.1.1.201511131810-r on Eclipse 4.5.2 (Mars SR2)
- Windows 7 64-bit, Oracle JRE 1.8.0_45, -vmargs -client -Xms40m -Xmx768m
- jgit java7 fragment is not installed

The repo is around 1GB in size. Increasing to -Xmx2048m resolves the issue. But we are reluctant to increase the -Xmx in the product for all users, since the product may also run on smaller machines, and very few users have large repositories.

My questions:
Can we expect better heap efficiency with newer EGit-3.6 snapshot improvements like https://bugs.eclipse.org/bugs/show_bug.cgi?id=440722 ?
Could a fix for https://bugs.eclipse.org/bugs/show_bug.cgi?id=388582 help ? CQ:WIND00-WB4-6286
Or will we have to live with "large git repos take much heap on EGit"?
Could Eclipse maybe assist users in increasing their -Xmx when the default config is low?
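As a purely hypothetical sketch of that last suggestion (this is not an existing EGit/Eclipse feature, and the 1 GiB threshold is chosen arbitrarily for illustration), such a check could read the configured max heap and warn before a large clone:

  // Hypothetical sketch -- not an existing EGit/Eclipse feature.
  public class HeapCheckSketch {
      public static void main(String[] args) {
          long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
          if (maxHeapMiB < 1024) {   // 1 GiB threshold is arbitrary, for illustration only
              System.err.println("Max heap is only " + maxHeapMiB
                  + " MiB; cloning a large repository may fail with OutOfMemoryError."
                  + " Consider raising -Xmx in eclipse.ini.");
          }
      }
  }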