Community
Participate
Working Groups
Here is what I posted on the mailing list Re: [jgit-dev] Avoiding stat calls (again) 13 nov 2010 kl. 16:36 skrev Robin Rosenberg: This issue pops up again and again, I guess when one stops being tired of it. I got an idea that does not involve JNI. If the time stamp hasn't changed, couldn't we assume size didn't change either? I believe all OS:es update the timestamp when we write to a file, not just on open/close. Given some definition of old, we could avoid retrieving the file size if the (old) timestamp in the index equals the one on disk when checking for a possible modification. What would old be? 2 seconds (FAT) or much more? See the Gerrit http://egit.eclipse.org/r/#change,1905. The main thing I wanted, was to make the Commit dialog pop up faster. Now, what I found is that it actually opens and read every file, No wonder it takes forever. Git is supposed to be fast. Daemon Thread [ModalContext] (Suspended) ContainerTreeIterator(WorkingTreeIterator).idBuffer() line: 213 ContainerTreeIterator(AbstractTreeIterator).idEqual(AbstractTreeIterator) line: 368 TreeWalk.idEqual(int, int) line: 696 TreeFilter$2.include(TreeWalk) line: 133 AndTreeFilter$List.include(TreeWalk) line: 162 TreeWalk.next() line: 551 IndexDiff.diff() line: 184 CommitActionHandler.buildIndexHeadDiffList(IProject[], IProgressMonitor) line: 324 CommitActionHandler.access$0(CommitActionHandler, IProject[], IProgressMonitor) line: 292 CommitActionHandler$1.run(IProgressMonitor) line: 100 ModalContext$ModalContextThread.run() line: 121 The TreeWalk needs a better interface to the TreeIterators to support this cleanly and TreeWalk must be able to receive parameters regarding how to compare with regards to filemode and crlf etc. I think we need to break the TreeWalk API. Let's suprise Shawn when he gets back from vacation }:> -- robin
Shawns comments: On Thu, Nov 18, 2010 at 11:56 AM, Shawn Pearce <spearce@spearce.org> wrote: On Tue, Nov 16, 2010 at 2:57 PM, Robin Rosenberg <robin.rosenberg@dewire.com> wrote: See the Gerrit http://egit.eclipse.org/r/#change,1905. The main thing I wanted, was to make the Commit dialog pop up faster. Now, what I found is that it actually opens and read every file, No wonder it takes forever. Git is supposed to be fast. Daemon Thread [ModalContext] (Suspended) ContainerTreeIterator(WorkingTreeIterator).idBuffer() line: 213 ContainerTreeIterator(AbstractTreeIterator).idEqual(AbstractTreeIterator) line: 368 TreeWalk.idEqual(int, int) line: 696 TreeFilter$2.include(TreeWalk) line: 133 Ouch. Someone is using the default ANY_DIFF with an iterator that cannot provide fast access to the ObjectId of the current entry (aka a WorkingTreeIterator). No wonder its taking ages, we have to SHA-1 checksum the entire working directory. That's a bad use of the ANY_DIFF filter. ... Idea 2) Add knowledge of the TreeWalk and the position of the DirCacheIterator to the WorkingTreeIterator. That is, maybe we add a method like: public void setDirCacheIterator(TreeWalk tw, int pos); This is proposed in http://egit.eclipse.org/r/1927 But, I'm yet not convinced we should commit this. If we rely on only this change, ANY_DIFF will still call idBuffer() for working files that are modified, because they don't match the DirCacheIterator. If the files are sufficiently large, JGit will still crawl because idBuffer() is spending a lot of time to compute a SHA-1 checksum that doesn't matter. I never meant for TreeFilter.ANY_DIFF to be used with a WorkingTreeIterator. It can be used... but its performance results are disastrous because it relies solely on idBuffer(), and idBuffer() is an inherently slow operation for working tree files. I intended to use something like the WorkingTreeIterator's isModified() method, through a different TreeFilter implementation. The earlier rewrite of the resource decorator in EGit was the first such example (and predates the isModified method by a lot). It may simply be the case that the offending application code should stop using ANY_DIFF when there is a WorkingTreeIterator involved, and instead use a different TreeFilter implementation that can more effectively take advantage of the isModified() method. This filter probably should be defined as a stock filter in JGit, and take the various relevant indexes as constructor parameters. An alternative approach is to add some sort of "compare" method to AbstractTreeIterator that TreeWalk can use, so that instead of idEqual(int,int) we have entryEqual(int, int) that delegates down into that compare routine. Then WorkingTreeIterator and DirCacheIterator can override the compare routine to try and special case a compare of those two iterator types, relying on isModified() instead of idBuffer(). This does get ugly implementation wise, as both iterator types need to override the method with identical delegation to a common compare routine. If we ever grew additional iterator types, the permutations spread out through the various implementations quite quickly, making for a very bug-prone system. -- Shawn.
Christians Halstrick's comments Hi, An alternative approach is to add some sort of "compare" method to I wanted to suggest this approach of having an explicit comparison even before I read your mail. The fixes you suggested intend to make idBuffer() faster. But in the use case of ANY_DIFF we are not interested in the content IDs. We want to compare to TreeIterators whether they currently point to the same content or not (I guess we are only interested in content equality not in greater, lesser,...). I see also problems in implementating this because DirCacheIterator.compare(WorkTreeIterator) and WorkTreeIterator.compare(DirCacheIterator) should both use an implementation probably implemented in WorkTreeIterator (that implementations should be clever enough to call isModified() instead of blindly calling idBuffer()). We could solve this by additionally introducing a TreeIteratorComparator which implements a compare(TreeIterator a, TreeIterator b). That compare method would be public and the compare methods on the iterators itself would be protected, so we enforce that the clever TreeIteratorComparator. TeeIteratorComparator would delegate to the compare method of either "a" or "b" depending on the types of "a" and "b". The knowledge which comparison is better is completely hidding in the TeeIteratorComparator and the Iteratores themselfes don't have to care. E.g. He would take care to call the WorkingTreeIterators compare method (which would use isModified()) if he see's thath one of the iterators is a WorkingTreeIterator. AbstractTreeIterator that TreeWalk can use, so that instead of idEqual(int,int) we have entryEqual(int, int) that delegates down into that compare routine. Then WorkingTreeIterator and DirCacheIterator can override the compare routine to try and special case a compare of those two iterator types, relying on isModified() instead of idBuffer(). This does get ugly implementation wise, as both iterator types need to override the method with identical delegation to a common compare routine. If we ever grew additional iterator types, the permutations spread out through the various implementations quite quickly, making for a very bug-prone system. -- Chris
Hi guys! We have a strange situation with this issue. One of developers has very low performance in operations with WorkingTreeIterator. I have added some logging and found that isModified() method always computes SHA. Methods entry.getLength() and entry.getLastModified() return 0 in most cases. It results in checking content of every file. I have no idea how such wrong index entries can be created. When I copy his .git/index file into my repo I can reproduce the problem. But after reset index backs to normal and performance is ok. The question is how does the index become broken? In my environment I never had such problem. We both use Ubuntu with ext4 file system.
Hi Ilya, which version of JGit are you using? Are you working on a shared repo so I can have a look at the repo where it happens? If I can't have access to the real repo and index file then please attach the output "debug-show-dir-cache" command from jgit command line. (How to use the jgit command line is described here: http://wiki.eclipse.org/JGit/User_Guide).
I think it is safe to say that 45edb41a9ed1248964cb4bf0d635f2643406d123, 342de38e57e3052d3d2f12e93d629bdc63f123be and possibly other commits have fixed this issue.