Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 349361

Summary: Status update and commits slow on CDT repo
Product: [Technology] EGit Reporter: Doug Schaefer <cdtdoug>
Component: CoreAssignee: Project Inbox <egit.core-inbox>
Status: CLOSED FIXED QA Contact:
Severity: major    
Priority: P3 CC: aleherb+eclipse, alex.blewitt, angvoz.dev, bugs.eclipse.org, caniszczyk, d_a_carver, jamesblackburn+eclipse, jesper.eskilson, malaperle, marc.khouzam, markus.duft, matthias.sohn, pwebster, remy.suen, robin.rosenberg, sop
Version: 1.0   
Target Milestone: ---   
Hardware: PC   
OS: Windows 7   
Whiteboard:
Attachments:
Description Flags
Filesystem access during commit message none

Description Doug Schaefer CLA 2011-06-14 15:01:21 EDT
Repo is here:

http://git.eclipse.org/c/cdt/test3/org.eclipse.cdt.git

Build: 1.0.0.201106090707-r
OS: Windows 7 (I'll try Linux next).

Changing a file takes about 30 seconds for the '>' to show up in the Package Explorer. A commit also takes about 30 seconds to determine changed files.

With the command line git, these operations, 'git status' and 'git commit -a' are almost instantaneous.
Comment 1 Alex Blewitt CLA 2011-06-14 15:27:37 EDT
I can reproduce this on the Mac too. 

80% of the time is spent writing/deleting .git/objects/nozXXXX.tmp during commit

10% is scanning full directory for changed files in repository prior to commit dialog 

Seems to traverse ignorable directories (eg target,bin) in full sweep
Comment 2 Shawn Pearce CLA 2011-06-14 15:31:00 EDT
Does the repository have a lot of directories?

I suspect all of the writing/deleting activity in .git/objects is the creation of trees that already exist, but are thought to be potentially modified by the DirCache because we still don't fully support the notion of the TREE extension inside of the DirCache file.

I should probably modify the inserter code to try and favor computing the SHA-1 of a tree and testing if it exists before writing the tree out to disk; trees in general are small and unfortunately currently very likely to already exist, so avoiding spooling them to disk is worthwhile.
Comment 3 Alex Blewitt CLA 2011-06-14 15:33:12 EDT
Jgit also seems to be trying to open a .git folder per folder as well (whereas cgit doesn't). Also fwiw cgit is doing an lstat on files & skipping if no change whereas jgit is doing a stat followed by an open
Comment 4 Alex Blewitt CLA 2011-06-14 15:35:38 EDT
Git repo has 12k files in it (at least in the +edc one) probably 200-500 directories?
Comment 5 Alex Blewitt CLA 2011-06-14 15:40:26 EDT
sudo fs_usage | grep clipse

Shows what is going on under the covers
Comment 6 Shawn Pearce CLA 2011-06-14 15:47:28 EDT
(In reply to comment #3)
> Jgit also seems to be trying to open a .git folder per folder as well (whereas
> cgit doesn't).

Ouch! This is caused by FileTreeIterator.FileEntry constructor running a stat() call for entry/.git whenever entry is a directory. This is necessary to set the file mode right, so the damn thing sorts in the right place in the DirCache.

Linus Torvalds made a huge mistake here, and we get to pay for it.

I honestly don't know how to work around the stat(). The only solution might be to always use NameConflictTreeWalk when you care about submodules, remove this check from FileEntry's constructor, and push it instead into the code that manages the index. That is, when doing an add to the DirCache, stat for ".git" to see if this should be a submodule addition. When comparing modes, if the index says FileMode.GITLINK and the working tree says FileMode.TREE, re-check the working tree to see if its actually FileMode.GITLINK with the additional entry/.git status check.

I talked briefly to Junio Hamano about this, he says C Git only scans the index and does lstat()s on the paths exactly as they are listed in the index, and thus avoids the need to merge the working tree against the DirCache. However, this only works if you don't care about untracked files. We almost always care about untracked files in EGit, because the status decorators want to show you what you can add to the repository if it has not been added yet, and is not ignored.

> Also fwiw cgit is doing an lstat on files & skipping if no
> change whereas jgit is doing a stat followed by an open

This sounds like we are deciding the entry is racily clean, or just assume it is dirty. We should certainly be skipping the open() on almost all files.
Comment 7 Alex Blewitt CLA 2011-06-14 16:03:56 EDT
In egit at least, we probably only care about untracked files that are reachable from the workspace (ie open projects). In my test I had one project of the many that CDT has open; I'd expect it not to pick up untracked files in other directories outside of that project, even if they are in the same repo (or that it was an option which could be enabled/disabled)

We are also looking at files we know won't be in the tree (eg those specified by gitignore) when we shouldn't need to. 

Can egit pass a filter to jgit to say "only consider these directories"?
Comment 8 James Blackburn CLA 2011-06-14 16:13:08 EDT
(In reply to comment #7)
> In egit at least, we probably only care about untracked files that are
> reachable from the workspace (ie open projects). 

This is what people usually expect from a team provider. At the very least the behaviour of whether or not to stat non-Eclipse controlled resources should be configurable.  

The reason CVS/SVN etc. can be much faster at commit is because they use the core.resources resource deltas.  Egit should could be notified about resource deltas from the workspace as it happens rather than stating the entire tree every time.
Comment 9 Alex Blewitt CLA 2011-06-14 16:58:28 EDT
Here's some data I pulled from both git and egit whilst doing a commit:


Native Git

20:13:14  lstat             all/org.eclipse.cdt.core.linux.ia64/.project                                     0.000004   git         
20:13:14  lstat             all/org.eclipse.cdt.core.linux.ia64/META-INF/MANIFEST.MF                         0.000004   git         
20:13:14  lstat             all/org.eclipse.cdt.core.linux.ia64/about.html                                   0.000004   git         
20:13:14  lstat             all/org.eclipse.cdt.core.linux.ia64/build.properties                             0.000004   git         
20:13:14  lstat             all/org.eclipse.cdt.core.linux.ia64/os/linux/ia64/libpty.so                      0.000006   git         
20:13:14  lstat             all/org.eclipse.cdt.core.linux.ia64/os/linux/ia64/libspawner.so                  0.000004   git         
20:13:14  lstat             all/org.eclipse.cdt.core.linux.ppc/.project                                      0.000004   git         

JGit (0.12.1)

20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF                0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF/.git           0.000004   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/about.html              0.000005   eclipse     
20:14:46  access            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/about.html              0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/build.properties        0.000006   eclipse     
20:14:46  access            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/build.properties        0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/os                      0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/os/.git                 0.000004   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/.project                0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/.project                0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF                0.000005   eclipse     
20:14:46  stat64            /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF                0.000005   eclipse     
20:14:46  getattrlist       /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64                         0.000004   eclipse     
20:14:46  getattrlist       /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF                0.000004   eclipse     
20:14:46  open              /private/var/tmp/cdt/all/org.eclipse.cdt.core.linux.ia64/META-INF                0.000007   eclipse     

followed by (main time)

20:16:17  close                                                                                              0.000040   eclipse     
20:16:17  lstat64           /private/var/tmp/cdt/.git/objects/noz5040580375106656155.tmp                     0.000007   eclipse     
20:16:17    WrData[async]                                                                                    0.001492 W eclipse     
20:16:17  unlink            /private/var/tmp/cdt/.git/objects/noz5040580375106656155.tmp                     0.001600 W eclipse     
20:16:17  open              /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp                     0.000092   eclipse     
20:16:17  close                                                                                              0.000007   eclipse     
20:16:17  stat64            /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp                     0.000009   eclipse     
20:16:17  open              /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp                     0.000022   eclipse     
20:16:17  write                                                                                              0.000025   eclipse     
20:16:17  write                                                                                              0.000006   eclipse     
20:16:17  close                                                                                              0.000031   eclipse     
20:16:17  lstat64           /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp                     0.000007   eclipse     
20:16:17    WrData[async]                                                                                    0.000527 W eclipse     
20:16:17  unlink            /private/var/tmp/cdt/.git/objects/noz2160820080515860388.tmp                     0.000632 W eclipse     
20:16:17  open              /private/var/tmp/cdt/.git/objects/noz7446803944790488252.tmp                     0.000057   eclipse     
20:16:17  close                                                                                              0.000005   eclipse     
20:16:17  stat64            /private/var/tmp/cdt/.git/objects/noz7446803944790488252.tmp                     0.000008   eclipse     
20:16:17  open              /private/var/tmp/cdt/.git/objects/noz7446803944790488252.tmp                     0.000021   eclipse
Comment 10 Alex Blewitt CLA 2011-06-14 17:17:55 EDT
Created attachment 197982 [details]
Filesystem access during commit message

21:59:37 - start of commit dialog
21:59:37 - stat64 of .git in all directories
21:59:37-44 no access (maybe I opened the commit dialog here...)
21:59:44-54 conclusion of stat64 of .git in all directories

So, best part of 10s+ to show commit dialog in first place

22:00:01 - commit dialog 'commit' pressed
22:00:01 - stat64 of .git in all directories
22:00:01-32 - generation of .git/objects/noz*.tmp files
22:00:32-34 - end of commit dialog
Comment 11 Alex Blewitt CLA 2011-06-14 17:24:51 EDT
Also: CDT has 2400 directories, not 250. Oops ...

Mounted the CDT folder in a RAM disk and same commit takes ~1s. Seems that the noz*.tmp object creation/deletion is the latency problem here.
Comment 12 Doug Schaefer CLA 2011-06-16 12:38:48 EDT
After doing a fresh clone, things weren't to bad, but after doing a number of git operations, creating branches, merging, etc. I started to see the 30 second performance again.
Comment 13 David Carver CLA 2011-06-16 16:39:45 EDT
Any possibility of addressing this early in the upcoming EGit development cycle. It's starting to make the CDT folks really edgy about moving over to git.  Yes there is a work around (go to the command line), but that solution won't go over very well.

See https://bugs.eclipse.org/bugs/show_bug.cgi?id=316208 for more information about the CDT conversion to git.
Comment 14 Doug Schaefer CLA 2011-06-16 17:14:42 EDT
I would be happy just having an EGit committer clone our repo and test with it for a while and give us their blessing that EGit is ready for us.
Comment 15 Matthias Sohn CLA 2011-06-17 07:43:49 EDT
(In reply to comment #14)
> I would be happy just having an EGit committer clone our repo and test with it
> for a while and give us their blessing that EGit is ready for us.

I am going to update my CDT installation to Indigo and will do some testing on your repository (I am working on Mac)
Comment 16 Doug Schaefer CLA 2011-06-17 10:12:22 EDT
BTW, setup instructions are here:

    http://wiki.eclipse.org/CDT/git

This covers the build dependencies, which aren't too much.
Comment 17 Matthias Sohn CLA 2011-06-17 10:36:19 EDT
my system: 
- MacBook Pro 2.5 GHz i5 
- 4GB 1067 MHz DDR3 RAM 
- Mac OS X 10.6.7

preparation:
- updated my CDT to Indigo RC5, EGit/JGit to 1.0.0.201106090707-r
- cloned org.eclipse.cdt.git and imported all its projects
- added .gitignore in repository root containing "bin/" to exclude all binary folders
  created by the build (any reason why this isn't in the repository ?)
- switched off automatic build in Eclipse in order not to get disturbed by incremental builds
- opened the following git views : 
  - History View (configured to show all branches and follow selection)
  - Repositories View
  - Staging View

tests:
- modify e.g. org.eclipse.cdt.internal.core.macosx.ProcessInfo 
   and save the modification

-> observations (repeated multiple times)
- decorators react with delay of a couple of seconds (5-10) this delay is 
   probably caused by the background repository change scanner job
   which is scheduled to run every 10 seconds.

- staging view reacts almost instantaneously
- drag & drop the modified file from unstaged to staged pane
- commit from staging view finishes in around 1 second

- commit dialog opens in < 1 sec
- commit from commit dialog finishes in around 1 second

So on my box this looks usable for cdt repository ...

When trying the same with the linux kernel repository performance looks a lot worse.
Especially history view seems to eat a lot of memory.

Also decoration takes a lot of time, so it seems the decorator job has to learn from
the staging view which updates quickly also on linux kernel repository.

So as a workaround until we have improvements you could try to switch off
the git decorator (Preferences > General > Appearance > Label Decorations > Git)
and the background scanning job (Preferences > Team > Git > Refresh resources when 
index changes") and use the staging view instead.

I'll try to look into that in some more detail next week.
Comment 18 Chris Aniszczyk CLA 2011-06-17 11:11:59 EDT
I'm on a T610 running F15, I noticed similar performance characteristics to Matthias, nothing really taking longer than a couple seconds to do. I wonder if we should revisit our native lstat() patch to further help with performance. If any CDT folks could help with that effort that would be nice.

http://egit.eclipse.org/r/#change,1815
Comment 19 Doug Schaefer CLA 2011-06-17 11:25:53 EDT
(In reply to comment #18)
> I'm on a T610 running F15, I noticed similar performance characteristics to
> Matthias, nothing really taking longer than a couple seconds to do. I wonder if
> we should revisit our native lstat() patch to further help with performance. If
> any CDT folks could help with that effort that would be nice.
> 
> http://egit.eclipse.org/r/#change,1815

Keep playing with it, you'll eventually run into the 30 second update/commit. Try a few branches and rebases.

We'd be happy to help test anything you can put together.
Comment 20 Remy Suen CLA 2011-06-17 11:56:09 EDT
(In reply to comment #0)
> A commit also takes about 30 seconds to determine changed files.

Sorry, just to be sure since I haven't used EGit in like a year, we're talking about the progress dialog that says 'Calculating changes in selected resources' after going 'Team > Commit...', correct?

If it is that one I'm getting sub-par speeds. It was really slow on the first time (possibly a minute), now it's taking maybe 3-5 seconds every go. I haven't even tried branching and rebasing yet.

This is on Windows XP, I'm running 4.1.
Comment 21 Matthias Sohn CLA 2011-06-18 17:36:09 EDT
(In reply to comment #18)
> I'm on a T610 running F15, I noticed similar performance characteristics to
> Matthias, nothing really taking longer than a couple seconds to do. I wonder if
> we should revisit our native lstat() patch to further help with performance. If
> any CDT folks could help with that effort that would be nice.
> 
> http://egit.eclipse.org/r/#change,1815

I think this patch is ready to accept, except build integration. The open question for the native lstat patch is how to run the native maven build on the required platforms and how to integrate it into the java build on hudson, if anyone can provide some hints or even better an example I could take that up.
Comment 22 David Carver CLA 2011-06-19 10:55:14 EDT
(In reply to comment #21)
> (In reply to comment #18)
> > I'm on a T610 running F15, I noticed similar performance characteristics to
> > Matthias, nothing really taking longer than a couple seconds to do. I wonder if
> > we should revisit our native lstat() patch to further help with performance. If
> > any CDT folks could help with that effort that would be nice.
> > 
> > http://egit.eclipse.org/r/#change,1815
> 
> I think this patch is ready to accept, except build integration. The open
> question for the native lstat patch is how to run the native maven build on the
> required platforms and how to integrate it into the java build on hudson, if
> anyone can provide some hints or even better an example I could take that up.


You may actually want to look at:

http://www.jfrog.com/sites/jade-plugins/1.3/jade-plugin-parent/jade-build-plugin-parent/jade-native-plugin/index.html

I'd only worry about a couple common platforms, and if something doesn't support a native compiled version, fall back to the current method, at least until something faster comes along.

The above plugin is a combination of the Maven Native plugin, and the Maven NAR plugin.  It does have several examples of how to use it.

We can possibly build for 3 platforms at least, Mac, Windows, and Linux as we have slaves for Hudson that run on all three.
Comment 23 Doug Schaefer CLA 2011-06-20 10:00:38 EDT
Or you can do what CDT does. Assuming the binary doesn't change very often, you could check in the native library.

If you still have work to do on it, then that's not a very good option. Dave mentions an option. I'm not sure there a maven plug-in that lets you just run a shell command, which you could use to call make on a Makefile.
Comment 24 Doug Schaefer CLA 2011-06-20 11:36:11 EDT
I'm removing this bug from the blocker list on CDT's move to git. While this is still an annoyance, I am glad you guys are looking at it. As I mentioned, we'll be happy to test any changes as soon as you have them available.
Comment 25 David Carver CLA 2011-06-20 12:52:26 EDT
(In reply to comment #23)
> Or you can do what CDT does. Assuming the binary doesn't change very often, you
> could check in the native library.
> 
> If you still have work to do on it, then that's not a very good option. Dave
> mentions an option. I'm not sure there a maven plug-in that lets you just run a
> shell command, which you could use to call make on a Makefile.

Yes there is a plugin for that.

http://mojo.codehaus.org/exec-maven-plugin/

Allows you to run any command/script from within maven.
Comment 26 Matthias Sohn CLA 2011-06-22 12:02:01 EDT
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #18)
> > > I'm on a T610 running F15, I noticed similar performance characteristics to
> > > Matthias, nothing really taking longer than a couple seconds to do. I wonder if
> > > we should revisit our native lstat() patch to further help with performance. If
> > > any CDT folks could help with that effort that would be nice.
> > > 
> > > http://egit.eclipse.org/r/#change,1815
> > 
> > I think this patch is ready to accept, except build integration. The open
> > question for the native lstat patch is how to run the native maven build on the
> > required platforms and how to integrate it into the java build on hudson, if
> > anyone can provide some hints or even better an example I could take that up.
> 
> 
> You may actually want to look at:
> 
> http://www.jfrog.com/sites/jade-plugins/1.3/jade-plugin-parent/jade-build-plugin-parent/jade-native-plugin/index.html
> 
> I'd only worry about a couple common platforms, and if something doesn't
> support a native compiled version, fall back to the current method, at least
> until something faster comes along.
> 
> The above plugin is a combination of the Maven Native plugin, and the Maven NAR
> plugin.  It does have several examples of how to use it.
> 
> We can possibly build for 3 platforms at least, Mac, Windows, and Linux as we
> have slaves for Hudson that run on all three.

We already have a working maven build which can do the native compilation using the native-maven-plugin, look at [1]. The open question is how to share the build results between the different build jobs we need to run on different platforms to do the native compilation (Linux, Mac, Windows). Ideally we could just deploy the individual results to a central Nexus maven repository. This would also help local builds which could simply download the native build results from Nexus. Is there central Maven repository available at Eclipse ? Is there any other project at Eclipse doing something similar so that I could learn from their build setup on hudson ?

[1] http://egit.eclipse.org/r/#patch,unified,1815,12,libjgit_native/pom.xml
     http://egit.eclipse.org/r/#patch,unified,1815,12,pom.xml
Comment 27 Alex Blewitt CLA 2011-06-22 12:26:48 EDT
Yes there is an internal maven repository, the details are on the Maven Eclipse wiki page.
Comment 28 Shawn Pearce CLA 2011-06-24 17:19:05 EDT
Change http://egit.eclipse.org/r/3778 may help much of the temporary file IO that is occurring in Alex's trace:


  11146 22:00:25  open              /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp                      0.000043   eclipse
  11147 22:00:25  stat64            /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp                      0.000006   eclipse
  11148 22:00:25  open              /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp                      0.000019   eclipse  
  11149 22:00:25  lstat64           /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp                      0.000006   eclipse  
  11150 22:00:25  unlink            /private/var/tmp/cdt/.git/objects/noz177171467416318152.tmp                      0.004007 W eclipse  
  11151 22:00:25  open              /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp                     0.000137   eclipse
  11152 22:00:25  stat64            /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp                     0.000018   eclipse  
  11153 22:00:25  open              /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp                     0.000044   eclipse  
  11154 22:00:25  lstat64           /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp                     0.000012   eclipse
  11155 22:00:25  unlink            /private/var/tmp/cdt/.git/objects/noz1575388353581733414.tmp                     0.003602 W eclipse
Comment 29 Matthias Sohn CLA 2011-07-01 10:17:44 EDT
http://egit.eclipse.org/r/#change,3809 should reduce object allocations when switching branches
Comment 30 Matthias Sohn CLA 2011-07-02 18:55:13 EDT
http://egit.eclipse.org/r/3778 was merged as 1eecc82cec2409ed3b69cf96f3206a2fb6c7b9f1
Comment 31 Matthias Sohn CLA 2011-08-02 11:55:41 EDT
(In reply to comment #29)
> http://egit.eclipse.org/r/#change,3809 should reduce object allocations when
> switching branches

http://egit.eclipse.org/r/#change,3809 was merged as 1067f82f562231cca1a7d75c6f0c2d2ce04c7381
Comment 32 Matthias Sohn CLA 2011-09-23 17:48:56 EDT
New decorator implementation 
http://egit.eclipse.org/r/#change,4200
was merged as f332331f4c3b5e69be2846213908acc12b3cd109
and improves performance of the git decorator considerably.
Comment 33 Doug Schaefer CLA 2011-09-26 14:53:43 EDT
Cool. I haven't seen any improvement in 1.1. Hopefully this will help.
Comment 34 Matthias Sohn CLA 2011-11-24 17:20:13 EST
(In reply to comment #33)
> Cool. I haven't seen any improvement in 1.1. Hopefully this will help.

Could you retry with a recent nightly build ?
Comment 35 Robin Rosenberg CLA 2012-12-30 17:32:26 EST
Performance has been greatly enhanced in 2.2. Please reopen and add details if you disagree.