| Summary: | Status update and commits slow on CDT repo | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Technology] EGit | Reporter: | Doug Schaefer <cdtdoug> | ||||
| Component: | Core | Assignee: | Project Inbox <egit.core-inbox> | ||||
| Status: | CLOSED FIXED | QA Contact: | |||||
| Severity: | major | ||||||
| Priority: | P3 | CC: | aleherb+eclipse, alex.blewitt, angvoz.dev, bugs.eclipse.org, caniszczyk, d_a_carver, jamesblackburn+eclipse, jesper.eskilson, malaperle, marc.khouzam, markus.duft, matthias.sohn, pwebster, remy.suen, robin.rosenberg, sop | ||||
| Version: | 1.0 | ||||||
| Target Milestone: | --- | ||||||
| Hardware: | PC | ||||||
| OS: | Windows 7 | ||||||
| Whiteboard: | |||||||
| Attachments: |
|
||||||
|
Description
Doug Schaefer
I can reproduce this on the Mac too. 80% of the time is spent writing/deleting .git/objects/nozXXXX.tmp during commit 10% is scanning full directory for changed files in repository prior to commit dialog Seems to traverse ignorable directories (eg target,bin) in full sweep Does the repository have a lot of directories? I suspect all of the writing/deleting activity in .git/objects is the creation of trees that already exist, but are thought to be potentially modified by the DirCache because we still don't fully support the notion of the TREE extension inside of the DirCache file. I should probably modify the inserter code to try and favor computing the SHA-1 of a tree and testing if it exists before writing the tree out to disk; trees in general are small and unfortunately currently very likely to already exist, so avoiding spooling them to disk is worthwhile. Jgit also seems to be trying to open a .git folder per folder as well (whereas cgit doesn't). Also fwiw cgit is doing an lstat on files & skipping if no change whereas jgit is doing a stat followed by an open Git repo has 12k files in it (at least in the +edc one) probably 200-500 directories? sudo fs_usage | grep clipse Shows what is going on under the covers (In reply to comment #3) > Jgit also seems to be trying to open a .git folder per folder as well (whereas > cgit doesn't). Ouch! This is caused by FileTreeIterator.FileEntry constructor running a stat() call for entry/.git whenever entry is a directory. This is necessary to set the file mode right, so the damn thing sorts in the right place in the DirCache. Linus Torvalds made a huge mistake here, and we get to pay for it. I honestly don't know how to work around the stat(). The only solution might be to always use NameConflictTreeWalk when you care about submodules, remove this check from FileEntry's constructor, and push it instead into the code that manages the index. That is, when doing an add to the DirCache, stat for ".git" to see if this should be a submodule addition. When comparing modes, if the index says FileMode.GITLINK and the working tree says FileMode.TREE, re-check the working tree to see if its actually FileMode.GITLINK with the additional entry/.git status check. I talked briefly to Junio Hamano about this, he says C Git only scans the index and does lstat()s on the paths exactly as they are listed in the index, and thus avoids the need to merge the working tree against the DirCache. However, this only works if you don't care about untracked files. We almost always care about untracked files in EGit, because the status decorators want to show you what you can add to the repository if it has not been added yet, and is not ignored. > Also fwiw cgit is doing an lstat on files & skipping if no > change whereas jgit is doing a stat followed by an open This sounds like we are deciding the entry is racily clean, or just assume it is dirty. We should certainly be skipping the open() on almost all files. In egit at least, we probably only care about untracked files that are reachable from the workspace (ie open projects). In my test I had one project of the many that CDT has open; I'd expect it not to pick up untracked files in other directories outside of that project, even if they are in the same repo (or that it was an option which could be enabled/disabled) We are also looking at files we know won't be in the tree (eg those specified by gitignore) when we shouldn't need to. Can egit pass a filter to jgit to say "only consider these directories"? (In reply to comment #7) > In egit at least, we probably only care about untracked files that are > reachable from the workspace (ie open projects). This is what people usually expect from a team provider. At the very least the behaviour of whether or not to stat non-Eclipse controlled resources should be configurable. The reason CVS/SVN etc. can be much faster at commit is because they use the core.resources resource deltas. Egit should could be notified about resource deltas from the workspace as it happens rather than stating the entire tree every time. Here's some data I pulled from both git and egit whilst doing a commit: Native Git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ia64/.project 0.000004 git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ia64/META-INF/MANIFEST.MF 0.000004 git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ia64/about.html 0.000004 git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ia64/build.properties 0.000004 git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ia64/os/linux/ia64/libpty.so 0.000006 git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ia64/os/linux/ia64/libspawner.so 0.000004 git 20:13:14 lstat all/org.eclipse.cdt.core.linux.ppc/.project 0.000004 git JGit (0.12.1) 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF/.git 0.000004 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/about.html 0.000005 eclipse 20:14:46 access /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/about.html 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/build.properties 0.000006 eclipse 20:14:46 access /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/build.properties 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/os 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/os/.git 0.000004 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/.project 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/.project 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF 0.000005 eclipse 20:14:46 stat64 /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF 0.000005 eclipse 20:14:46 getattrlist /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64 0.000004 eclipse 20:14:46 getattrlist /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF 0.000004 eclipse 20:14:46 open /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF 0.000007 eclipse followed by (main time) 20:16:17 close 0.000040 eclipse 20:16:17 lstat64 /private/var/tmp/cdt/.git/objects/noz5040580375106656155.tmp 0.000007 eclipse 20:16:17 WrData[async] 0.001492 W eclipse 20:16:17 unlink /private/var/tmp/cdt/.git/objects/noz5040580375106656155.tmp 0.001600 W eclipse 20:16:17 open /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp 0.000092 eclipse 20:16:17 close 0.000007 eclipse 20:16:17 stat64 /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp 0.000009 eclipse 20:16:17 open /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp 0.000022 eclipse 20:16:17 write 0.000025 eclipse 20:16:17 write 0.000006 eclipse 20:16:17 close 0.000031 eclipse 20:16:17 lstat64 /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp 0.000007 eclipse 20:16:17 WrData[async] 0.000527 W eclipse 20:16:17 unlink /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp 0.000632 W eclipse 20:16:17 open /private/var/tmp/cdt/.git/objects/noz7446803944790488252.tmp 0.000057 eclipse 20:16:17 close 0.000005 eclipse 20:16:17 stat64 /private/var/tmp/cdt/.git/objects/noz7446803944790488252.tmp 0.000008 eclipse 20:16:17 open /private/var/tmp/cdt/.git/objects/noz7446803944790488252.tmp 0.000021 eclipse Created attachment 197982 [details]
Filesystem access during commit message
21:59:37 - start of commit dialog
21:59:37 - stat64 of .git in all directories
21:59:37-44 no access (maybe I opened the commit dialog here...)
21:59:44-54 conclusion of stat64 of .git in all directories
So, best part of 10s+ to show commit dialog in first place
22:00:01 - commit dialog 'commit' pressed
22:00:01 - stat64 of .git in all directories
22:00:01-32 - generation of .git/objects/noz*.tmp files
22:00:32-34 - end of commit dialog
Also: CDT has 2400 directories, not 250. Oops ... Mounted the CDT folder in a RAM disk and same commit takes ~1s. Seems that the noz*.tmp object creation/deletion is the latency problem here. After doing a fresh clone, things weren't to bad, but after doing a number of git operations, creating branches, merging, etc. I started to see the 30 second performance again. Any possibility of addressing this early in the upcoming EGit development cycle. It's starting to make the CDT folks really edgy about moving over to git. Yes there is a work around (go to the command line), but that solution won't go over very well. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=316208 for more information about the CDT conversion to git. I would be happy just having an EGit committer clone our repo and test with it for a while and give us their blessing that EGit is ready for us. (In reply to comment #14) > I would be happy just having an EGit committer clone our repo and test with it > for a while and give us their blessing that EGit is ready for us. I am going to update my CDT installation to Indigo and will do some testing on your repository (I am working on Mac) BTW, setup instructions are here:
http://wiki.eclipse.org/CDT/git
This covers the build dependencies, which aren't too much.
my system: - MacBook Pro 2.5 GHz i5 - 4GB 1067 MHz DDR3 RAM - Mac OS X 10.6.7 preparation: - updated my CDT to Indigo RC5, EGit/JGit to 1.0.0.201106090707-r - cloned org.eclipse.cdt.git and imported all its projects - added .gitignore in repository root containing "bin/" to exclude all binary folders created by the build (any reason why this isn't in the repository ?) - switched off automatic build in Eclipse in order not to get disturbed by incremental builds - opened the following git views : - History View (configured to show all branches and follow selection) - Repositories View - Staging View tests: - modify e.g. org.eclipse.cdt.internal.core.macosx.ProcessInfo and save the modification -> observations (repeated multiple times) - decorators react with delay of a couple of seconds (5-10) this delay is probably caused by the background repository change scanner job which is scheduled to run every 10 seconds. - staging view reacts almost instantaneously - drag & drop the modified file from unstaged to staged pane - commit from staging view finishes in around 1 second - commit dialog opens in < 1 sec - commit from commit dialog finishes in around 1 second So on my box this looks usable for cdt repository ... When trying the same with the linux kernel repository performance looks a lot worse. Especially history view seems to eat a lot of memory. Also decoration takes a lot of time, so it seems the decorator job has to learn from the staging view which updates quickly also on linux kernel repository. So as a workaround until we have improvements you could try to switch off the git decorator (Preferences > General > Appearance > Label Decorations > Git) and the background scanning job (Preferences > Team > Git > Refresh resources when index changes") and use the staging view instead. I'll try to look into that in some more detail next week. I'm on a T610 running F15, I noticed similar performance characteristics to Matthias, nothing really taking longer than a couple seconds to do. I wonder if we should revisit our native lstat() patch to further help with performance. If any CDT folks could help with that effort that would be nice. http://egit.eclipse.org/r/#change,1815 (In reply to comment #18) > I'm on a T610 running F15, I noticed similar performance characteristics to > Matthias, nothing really taking longer than a couple seconds to do. I wonder if > we should revisit our native lstat() patch to further help with performance. If > any CDT folks could help with that effort that would be nice. > > http://egit.eclipse.org/r/#change,1815 Keep playing with it, you'll eventually run into the 30 second update/commit. Try a few branches and rebases. We'd be happy to help test anything you can put together. (In reply to comment #0) > A commit also takes about 30 seconds to determine changed files. Sorry, just to be sure since I haven't used EGit in like a year, we're talking about the progress dialog that says 'Calculating changes in selected resources' after going 'Team > Commit...', correct? If it is that one I'm getting sub-par speeds. It was really slow on the first time (possibly a minute), now it's taking maybe 3-5 seconds every go. I haven't even tried branching and rebasing yet. This is on Windows XP, I'm running 4.1. (In reply to comment #18) > I'm on a T610 running F15, I noticed similar performance characteristics to > Matthias, nothing really taking longer than a couple seconds to do. I wonder if > we should revisit our native lstat() patch to further help with performance. If > any CDT folks could help with that effort that would be nice. > > http://egit.eclipse.org/r/#change,1815 I think this patch is ready to accept, except build integration. The open question for the native lstat patch is how to run the native maven build on the required platforms and how to integrate it into the java build on hudson, if anyone can provide some hints or even better an example I could take that up. (In reply to comment #21) > (In reply to comment #18) > > I'm on a T610 running F15, I noticed similar performance characteristics to > > Matthias, nothing really taking longer than a couple seconds to do. I wonder if > > we should revisit our native lstat() patch to further help with performance. If > > any CDT folks could help with that effort that would be nice. > > > > http://egit.eclipse.org/r/#change,1815 > > I think this patch is ready to accept, except build integration. The open > question for the native lstat patch is how to run the native maven build on the > required platforms and how to integrate it into the java build on hudson, if > anyone can provide some hints or even better an example I could take that up. You may actually want to look at: http://www.jfrog.com/sites/jade-plugins/1.3/jade-plugin-parent/jade-build-plugin-parent/jade-native-plugin/index.html I'd only worry about a couple common platforms, and if something doesn't support a native compiled version, fall back to the current method, at least until something faster comes along. The above plugin is a combination of the Maven Native plugin, and the Maven NAR plugin. It does have several examples of how to use it. We can possibly build for 3 platforms at least, Mac, Windows, and Linux as we have slaves for Hudson that run on all three. Or you can do what CDT does. Assuming the binary doesn't change very often, you could check in the native library. If you still have work to do on it, then that's not a very good option. Dave mentions an option. I'm not sure there a maven plug-in that lets you just run a shell command, which you could use to call make on a Makefile. I'm removing this bug from the blocker list on CDT's move to git. While this is still an annoyance, I am glad you guys are looking at it. As I mentioned, we'll be happy to test any changes as soon as you have them available. (In reply to comment #23) > Or you can do what CDT does. Assuming the binary doesn't change very often, you > could check in the native library. > > If you still have work to do on it, then that's not a very good option. Dave > mentions an option. I'm not sure there a maven plug-in that lets you just run a > shell command, which you could use to call make on a Makefile. Yes there is a plugin for that. http://mojo.codehaus.org/exec-maven-plugin/ Allows you to run any command/script from within maven. (In reply to comment #22) > (In reply to comment #21) > > (In reply to comment #18) > > > I'm on a T610 running F15, I noticed similar performance characteristics to > > > Matthias, nothing really taking longer than a couple seconds to do. I wonder if > > > we should revisit our native lstat() patch to further help with performance. If > > > any CDT folks could help with that effort that would be nice. > > > > > > http://egit.eclipse.org/r/#change,1815 > > > > I think this patch is ready to accept, except build integration. The open > > question for the native lstat patch is how to run the native maven build on the > > required platforms and how to integrate it into the java build on hudson, if > > anyone can provide some hints or even better an example I could take that up. > > > You may actually want to look at: > > http://www.jfrog.com/sites/jade-plugins/1.3/jade-plugin-parent/jade-build-plugin-parent/jade-native-plugin/index.html > > I'd only worry about a couple common platforms, and if something doesn't > support a native compiled version, fall back to the current method, at least > until something faster comes along. > > The above plugin is a combination of the Maven Native plugin, and the Maven NAR > plugin. It does have several examples of how to use it. > > We can possibly build for 3 platforms at least, Mac, Windows, and Linux as we > have slaves for Hudson that run on all three. We already have a working maven build which can do the native compilation using the native-maven-plugin, look at [1]. The open question is how to share the build results between the different build jobs we need to run on different platforms to do the native compilation (Linux, Mac, Windows). Ideally we could just deploy the individual results to a central Nexus maven repository. This would also help local builds which could simply download the native build results from Nexus. Is there central Maven repository available at Eclipse ? Is there any other project at Eclipse doing something similar so that I could learn from their build setup on hudson ? [1] http://egit.eclipse.org/r/#patch,unified,1815,12,libjgit_native/pom.xml http://egit.eclipse.org/r/#patch,unified,1815,12,pom.xml Yes there is an internal maven repository, the details are on the Maven Eclipse wiki page. Change http://egit.eclipse.org/r/3778 may help much of the temporary file IO that is occurring in Alex's trace: 11146 22:00:25 open /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp 0.000043 eclipse 11147 22:00:25 stat64 /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp 0.000006 eclipse 11148 22:00:25 open /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp 0.000019 eclipse 11149 22:00:25 lstat64 /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp 0.000006 eclipse 11150 22:00:25 unlink /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp 0.004007 W eclipse 11151 22:00:25 open /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp 0.000137 eclipse 11152 22:00:25 stat64 /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp 0.000018 eclipse 11153 22:00:25 open /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp 0.000044 eclipse 11154 22:00:25 lstat64 /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp 0.000012 eclipse 11155 22:00:25 unlink /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp 0.003602 W eclipse http://egit.eclipse.org/r/#change,3809 should reduce object allocations when switching branches http://egit.eclipse.org/r/3778 was merged as 1eecc82cec2409ed3b69cf96f3206a2fb6c7b9f1 (In reply to comment #29) > http://egit.eclipse.org/r/#change,3809 should reduce object allocations when > switching branches http://egit.eclipse.org/r/#change,3809 was merged as 1067f82f562231cca1a7d75c6f0c2d2ce04c7381 New decorator implementation http://egit.eclipse.org/r/#change,4200 was merged as f332331f4c3b5e69be2846213908acc12b3cd109 and improves performance of the git decorator considerably. Cool. I haven't seen any improvement in 1.1. Hopefully this will help. (In reply to comment #33) > Cool. I haven't seen any improvement in 1.1. Hopefully this will help. Could you retry with a recent nightly build ? Performance has been greatly enhanced in 2.2. Please reopen and add details if you disagree. |