Bug 388582 - Major performance issues with TreeWalk (walks ignored directories)
Summary: Major performance issues with TreeWalk (walks ignored directories)
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit
Version: 2.0
Hardware: PC Windows 7
Importance: P3 blocker with 2 votes
Target Milestone: 5.0
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 448774
Depends on:
Blocks: 532300 425555
Reported: 2012-08-31 18:19 EDT by Jason CLA
Modified: 2018-05-21 21:04 EDT (History)
CC List: 10 users

See Also:


Attachments
Perf Analysis (59.09 KB, image/png)
2012-08-31 18:19 EDT, Jason CLA

Description Jason CLA 2012-08-31 18:19:16 EDT
Created attachment 220629
Perf Analysis

I'm using EGit with Eclipse Indigo.  Whenever I try to do anything git-related from the UI such as committing or deleting files, Eclipse hangs for a couple minutes before finally finishing the operation.  This happens for even the smallest of git-related operations.

I've attached the output of a performance sampling which shows that the bottleneck is when JGit is walking the filesystem.

I'm not sure you'll be able to reproduce this. I might write a little program that just walks my filesystem to see whether that runs slowly.
Comment 1 Jason CLA 2012-08-31 18:33:06 EDT
I just made a quick program which walks my git repo.

It took 46 seconds to walk 237,280 files

When I told it to exclude 'build' directories,
it took half a second to walk 14,646 files


Weird how the times aren't linear.


I'm guessing that the File Tree Iterator is not ignoring certain directories that should be excluded from the walk and this is probably why the performance is so slow.
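
A walk of this kind can be sketched in Java with `java.nio.file.Files.walkFileTree`. This is only an illustration, not the program that produced the timings above: the class `WalkCounter`, the method `countFiles`, and the set of excluded directory names are all hypothetical. The relevant part is that returning `SKIP_SUBTREE` from `preVisitDirectory` prunes an excluded directory before any of its entries are listed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Set;

// Hypothetical helper for illustration; not part of JGit or EGit.
class WalkCounter {

    /** Counts regular files under root, pruning directories whose name is excluded. */
    static long countFiles(Path root, Set<String> excludedDirNames) {
        long[] count = {0};
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    Path name = dir.getFileName();
                    if (name != null && excludedDirNames.contains(name.toString())) {
                        // Prune here: the entries of this directory are never listed.
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.isRegularFile()) {
                        count[0]++;
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }
}
```

Calling `WalkCounter.countFiles(repoRoot, Set.of("build"))` and `WalkCounter.countFiles(repoRoot, Set.of())` and timing both gives the kind of comparison described above; `repoRoot` stands for whatever working tree is being measured.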
Comment 2 Jason CLA 2012-08-31 18:38:47 EDT
Yeah, just confirmed my theory.  I did a gradle clean which wiped the build directories so that only source was left.  Then I did a commit from eclipse and it was super-fast.
Comment 3 Jason CLA 2012-08-31 18:45:55 EDT
My conclusion is that JGit is not respecting the .gitignore file when walking the file hierarchy.  Seems like if it only walked the parts of the tree that were not in the ignore file, it would be much faster.
Comment 4 Robin Rosenberg CLA 2012-09-11 15:10:20 EDT
I think the real problem is that IndexDiffFilter is not working. E.g. the contents of bin directories should be ignored by most operations unless we
already track content there. Try this:

#jgit init
#mkdir d
#touch d/f
#touch f
#jgit add f
#jgit diff
diff --git a/d/f b/d/f
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/d/f

jgit diff should not output anything here, but apparently it descends
into "d".

Ignoring directories completely due to patterns may be hard in general.

E.g., the following two rules would ignore everything in a folder named bin,
unless it contains class files at any depth.

bin/
!*.class
Comment 5 Jason CLA 2012-09-11 15:49:25 EDT
I am not an expert on file matching patterns, but I think if this logic was added to the filter, performance would be vastly increased (especially with large git repos).
Comment 6 Robin Rosenberg CLA 2012-09-11 16:00:51 EDT
Jens, got an idea of why the IndexDiffFilter does not appear to work as advertised? Is it only usable with IndexDiff?
Comment 7 Robin Rosenberg CLA 2012-09-11 18:10:38 EDT
See https://git.eclipse.org/r/7721 for a failing unit test.
Comment 8 Martin Oberhuber CLA 2013-03-10 01:15:44 EST
I think I'm seeing the same problem or a related one:

I have a really tiny git repo, but my worktree has some really deep subtrees (some physical and some symlinked, but all are ignored by .gitignore). Commandline git is lightning fast, but when I commit even a single file through egit, it takes in the range of minutes. My ignored subtrees are mostly at the root level, even outside the scope of any Eclipse project that I'm working on.
Comment 9 Robin Stocker CLA 2014-11-08 23:43:28 EST
*** Bug 448774 has been marked as a duplicate of this bug. ***
Comment 10 Thomas Wolf CLA 2018-03-15 19:03:38 EDT
As far as I can see, FileTreeIterator indeed iterates over everything. TreeWalk advances the iterator and only then applies the filter (NotIgnoredFilter or IndexDiffFilter) and skips over ignored files. But it still iterates through them: the FileTreeIterator does a directory.listFiles() for each ignored directory (and subdirectory) and then even an FS.getAttributes() on each file listed. FS.getAttributes() may be a relatively expensive operation, especially on Windows.

What one would need is a FileTreeIterator that would skip over ignored directories transparently, without doing the expensive directory listing and getting attributes, unless the directory contained tracked files, in which case the working tree directory still would need to be traversed, but only for the tracked paths.
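
The idea in the previous paragraph can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the change that was later merged and not JGit code: `PruningWalk` and all of its members are hypothetical names, a `Predicate<String>` stands in for .gitignore matching (directories only), and a sorted set of '/'-separated paths stands in for the index. An ignored directory is never listed; if the index has entries below it, only those known paths are looked up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in JGit.
class PruningWalk {
    private final Path root;
    private final Predicate<String> ignoredDir;  // stand-in for .gitignore matching
    private final NavigableSet<String> tracked;  // stand-in for the index
    final List<String> visited = new ArrayList<>();
    int directoriesListed = 0;                   // counts the expensive listings

    PruningWalk(Path root, Predicate<String> ignoredDir, NavigableSet<String> tracked) {
        this.root = root;
        this.ignoredDir = ignoredDir;
        this.tracked = tracked;
    }

    /** Walks the tree and returns the relative paths of the files visited. */
    List<String> run() {
        walk(root, "");
        return visited;
    }

    private void walk(Path dir, String prefix) {
        directoriesListed++;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String rel = prefix + entry.getFileName();
                if (!Files.isDirectory(entry)) {
                    visited.add(rel);
                } else if (!ignoredDir.test(rel)) {
                    walk(entry, rel + "/");
                } else {
                    // Ignored directory: do not list it at all.
                    visitTrackedOnly(rel + "/");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up only the paths the index already knows below an ignored directory.
    private void visitTrackedOnly(String dirPrefix) {
        for (String path : tracked.tailSet(dirPrefix, true)) {
            if (!path.startsWith(dirPrefix)) {
                break; // left the contiguous range of paths under dirPrefix
            }
            if (Files.isRegularFile(root.resolve(path))) {
                visited.add(path);
            }
        }
    }
}
```

With this shape, the cost of an ignored directory is proportional to the number of tracked paths below it rather than to the number of files on disk. The sketch leaves out file-level ignore rules, negated patterns, and symlink handling, all of which a real iterator would have to deal with.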
Comment 11 Eclipse Genie CLA 2018-03-28 05:34:28 EDT
New Gerrit change created: https://git.eclipse.org/r/120337
Comment 12 Eclipse Genie CLA 2018-05-21 21:04:30 EDT
Gerrit change https://git.eclipse.org/r/120337 was merged to [master].
Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=d7deda98d0a18ca1e3a1fbb70acf8e7cbcf25833