Eclipse EGit (Incubation) 0.12.0.201103251413 org.eclipse.egit.feature.group
Eclipse JGit (Incubation) 0.12.0.201103250414 org.eclipse.jgit.feature.group

I notice constant CPU usage by Eclipse. When I look in the Progress view with "Show sleeping and system operations" checked, I see that every 5-20s or so a job is scheduled called:

"Repository Change Scanner: Scanning Git repositories for changes"

EGit really shouldn't be doing periodic polling like this (though I'm not sure what it's doing...). For resources under the workspace it should be responding to and handling resource change events.
Even worse, on cygwin I notice with Process Explorer that every few seconds the "cygpath" program is called. I cannot pin it on egit exactly but have a strong suspicion. Working on (java) code under git gets bumpy.
My Eclipse has been locking up for the last couple of days. It seems I/O related. When it's being slow, jstack shows egit doing I/O in a background thread:

"Worker-692" prio=10 tid=0x0000002add5e5000 nid=0x7060 runnable [0x0000000041c44000]
   java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.list(Native Method)
    at java.io.File.list(File.java:973)
    at java.io.File.list(File.java:1004)
    at org.eclipse.jgit.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:384)
    at org.eclipse.jgit.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:391)
    at org.eclipse.jgit.storage.file.RefDirectory$LooseScanner.scanTree(RefDirectory.java:391)
    at org.eclipse.jgit.storage.file.RefDirectory$LooseScanner.scan(RefDirectory.java:353)
    at org.eclipse.jgit.storage.file.RefDirectory.getRefs(RefDirectory.java:287)
    at org.eclipse.jgit.lib.Repository.getAllRefs(Repository.java:740)
    at org.eclipse.jgit.storage.file.FileRepository.scanForRepoChanges(FileRepository.java:380)
    at org.eclipse.egit.ui.Activator$RepositoryChangeScanner.run(Activator.java:438)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Created attachment 195886: full eclipse backtrace

In this backtrace you can see 14(!) GitDecoratorJob threads, all of which are doing I/O simultaneously. I/O on NFS is expensive here. And I guess this many concurrent jobs explains why the main thread stutters.
(In reply to comment #1)
> Even worse, on cygwin I notice with Process Explorer that every few seconds "cygpath" program is called. I cannot pinpoint to egit exactly but have a strong suspicion. Working on (java) code under git gets bumpy.

This is due to the assumption that if cygwin is in the PATH, cygwin paths may be used, and therefore EGit wants to translate them. We probably do this far more often than necessary. In addition, I believe most cygwin users do not actually have cygwin in their PATH anyway, so the strategy probably does not work, and when it is used it should try harder not to resolve paths all the time.

Workaround: remove cygwin from PATH.

You may want to open a separate issue, since the use of cygpath is not directly related to the original report.
(In reply to comment #5)
> Eclipse EGit (Incubation) 0.12.0.201103251413 org.eclipse.egit.feature.group
> Eclipse JGit (Incubation) 0.12.0.201103250414 org.eclipse.jgit.feature.group
>
> I notice constant CPU usage by Eclipse. When I look in the progress view, with "Show sleeping and system operations" checked, I see that every 5-20s or so a job is scheduled called:
>
> "Repository Change Scanner: Scanning Git repositories for changes"
>
> Egit really shouldn't be doing periodic polling like (though I'm not sure what it's doing...). For resources under the WS it should be responding and handling resource change events.

You can disable scanning of the repository in Settings / Team / Git / Automatic Refresh.

The reason it does this scanning is that automatic workspace refresh is very expensive, while scanning the .git refs is very cheap in comparison, and it allows Eclipse to react snappily to changes made using command line Git.
(In reply to comment #5)
> You can disable scanning of the repository in Settings / Team / Git / Automatic Refresh.
>
> The reason it does this scanning is that automatic workspace refresh is very expensive while scanning the .git refs is very cheap in comparison and it allows Eclipse to react snappily to changes made using command line Git.

Well, this raises a few points:
- If I disable this, is there a risk of bad things happening if I use command line git and then subsequently use egit? If it's just the UI then I may not care as much, but if it's possible that I corrupt my repo / branch heads when I next try to commit, then that's less good...
- 14 threads running and doing I/O concurrently is very bad. On our 8-core box the refresh is doing so much work that the UI thread stutters.
- On NFS such continuous fs operations can hammer a filer. Many users using egit may exacerbate the problem.
- On all platforms this will keep the CPU and disk busy, eating battery on laptops.

By all means workspace refresh is too slow, but egit refresh needs to be much more intelligent about how many resources it uses.
(In reply to comment #6)
> (In reply to comment #5)
> > You can disable scanning of the repository in Settings / Team / Git / Automatic Refresh.
> >
> > The reason it does this scanning is that automatic workspace refresh is very expensive while scanning the .git refs is very cheap in comparison and it allows Eclipse to react snappily to changes made using command line Git.
>
> Well, this raises a few points:
> - If I disable this, is there a risk of bad things happening if I use command line git and then subsequently use egit? If it's just the UI then I may not care as much, but if it's possible I can corrupt my repo / branch heads when I next try to commit then that's less good...

I don't see a way to corrupt the repo by disabling it, but without the refresh you will get the damn "resource not refreshed" errors, or Eclipse running half the code from one version and half from some other version, which is at best annoying. When that happened all the time I decided to introduce the refresh.

> - 14 threads running and doing I/O concurrently is very bad. On our 8-core box the refresh is doing so much work that the UI thread stutters.

I've never seen that, so I believe this is not typical. 14 threads doing a parallel refresh is certainly insane. The repo refresh should only trigger one refresh, so I don't know why it tries 14, but I think there have been changes to the decorators. Can you identify a version of EGit that does not exhibit this behavior?

> - On NFS such continuous fs operations can hammer a filer. With many users using egit may exacerbate the problem.
> - On all platforms this will keep CPU and disk going eating battery on laptops
>
> By all means workspace refresh is too slow, but egit refresh needs to be much more intelligent about how much resources it uses.

It sure could be more intelligent. Using a little more memory, it could decide to refresh only those resources it thinks have changed. Currently it refreshes all projects connected to the repo for which a change has been detected. Until now the suboptimal behavior has been a theoretical problem. My guess is that you do not have the standard workspace refresh enabled as well?

Can you describe the workspace you use in terms of size, # of projects, # of repos and directory depth?

Which Eclipse version are you using? I believe there are changes in 3.7 that affect how workspace refresh is done.
I recall that Linus and others have in the past strongly recommended against using Git on a network file system, because of different semantics as well as performance. Git is built on the assumption that users have fast file systems, fast CPUs and lots of memory, and that we may as well use those resources. A network file system violates the assumption about fast file systems.

Back to the issues: Anyone? Can we control the number of decorator threads? Is it Eclipse or us?
(In reply to comment #7)
> I don't see a way to corrupt the repo by disabling it, but without the refresh you will get the damn "resource not refreshed"

I fixed that for reads; see 'lightweight-refresh' in 3.7: bug 303517.

> I've never seen that, so I believe this is not typical. 14 threads to do parallel refresh is certainly insane. The repo refresh should only trigger one refresh, so I don't know why it tries 14, but there have been changes to the decorators I think. Can you identify a version of EGit that does not exhibit this behavior?

I've only been updating to the major releases, and I only noticed this issue in 0.12 when NFS was being slow. I'm not sure if it's better or worse since 0.11... AFAICS the fs accesses are coming from the decorator job.

> It sure could be more intelligent. Using a little more memory it could decide to only refresh those resources it thinks has changed. Currently it refreshes all projects connected to the repo for which a change has been detected. Until now the suboptimal behavior has been a theoretical problem. My guess is you do not have the standard workspace refresh enabled as well?

Yes, auto-refresh is on, and I'm on Linux.

> Can you describe the workspace you use in terms of size, # of project, # repos and directory depth?

I think this one is my main Eclipse PDE workspace. It has 125 Java plugin projects, each of which is its own git repo (with the .git at the project root). I also have another large workspace which has one .git with 100 projects; that .git isn't under the workspace...

> Which Eclipse version are you using? I believe there are changes in 3.7 that affects how workspace refresh is done.

Latest code from 3.7.

What's interesting from the backtraces is that it's not a #refreshLocal that's going on. egit is actually opening and reading the files:

...
at org.eclipse.core.internal.resources.File.getContents(File.java:293)
at org.eclipse.egit.core.ContainerTreeIterator$ResourceEntry.openInputStream(ContainerTreeIterator.java:246)
...

Reading files is orders of magnitude more expensive than stat (as stat structures are readily cached). Also, reading all the files in the project is going to churn quickly through any OS cache. If it could be serialized to a few threads (where a few is < number of processors/cores) then I think performance wouldn't be so bad, and at least the main thread would get a chance to run.

I guess the question should be: does egit really need to read the files? Can you not just trust the resource delta? Resource deltas are key to eclipse working well, as performance would quickly suck if all integrators went straight to the fs... I think it's a bad idea for different contributors to read files or even #refreshLocal at a large scope periodically, as it leads to IResources constantly being locked and a slow Eclipse.

There are ways to fix this, I think:
1) egit: Trust the resource delta
2) Users: use Refresh Automatically
(3) Add hooks for Linux + Mac OS)

With 1, the resource delta _should_ be sufficient on all platforms. The only issue might be that the .git directory isn't under the workspace. Perhaps egit can create a linked resource to it at the project root so that team resources are picked up in the refresh?

With 2, Windows already has such a hook. Linux and Mac OS use polling. The polling backs off to use no more than ~5% of clock-time. It does this deliberately to prevent over-consumption, backing off when filers are being slow.

We can fix 3 properly by plumbing in fsevents and inotify for mac and linux with jnotify in 3.8 (if we can get the license issues worked out). The tuning that exists in core.resources has happened so that eclipse remains responsive and users don't complain about unnecessary performance drops.

Perhaps you can expand a bit on why you're using File I/O and not the resource delta, and whether there are any core.resources bugs that follow from this?
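The ~5% back-off mentioned above for the polling refresh comes down to simple arithmetic. What follows is a minimal sketch, assuming the delay is derived from the duration of the last scan. The class and method names are hypothetical, and this is not the actual core.resources implementation.

```java
/** Hypothetical helper: computes how long a poller should sleep between scans. */
final class PollBackoff {
    /**
     * Returns a delay such that a scan lasting scanMillis takes up at most
     * budgetPercent of the total (scan + delay) time, and never less than
     * minDelayMillis.
     */
    static long delayFor(long scanMillis, int budgetPercent, long minDelayMillis) {
        if (budgetPercent <= 0 || budgetPercent > 100) {
            throw new IllegalArgumentException("budgetPercent must be in 1..100");
        }
        // scan / (scan + delay) <= p / 100   =>   delay >= scan * (100 - p) / p
        // The "+ budgetPercent - 1" rounds the integer division up.
        long delay = (scanMillis * (100 - budgetPercent) + budgetPercent - 1) / budgetPercent;
        return Math.max(delay, minDelayMillis);
    }
}
```

With a 5% budget, a scan that took 100 ms yields a delay of 1900 ms, and a scan that took 2 s yields 38 s, so a slow filer automatically stretches the polling interval.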
(In reply to comment #8)
> I recall that Linus and others have in the past strongly recommended against using Git on a network file system because of different semantics as well as performance.

That's because git status etc. stats everything to find out what's changed. Eclipse has _already_ done this, providing resource deltas so that every integrator doesn't have to. Note that every team provider has this issue: they need to know whether files have been modified since they were last invoked.

> Git is built on the assumption that users have fast file systems, fast CPUs and lots of memory and that we may as well use those resources. Network file system violates the assumption about fast file systems.

Many enterprises use NFS. That it sucks just means the software has to be more careful about doing stuff that's 'cheap' locally.
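The stat-versus-read distinction discussed here can be sketched in plain Java: remember each file's size and modification time, and report only the files whose metadata differs from the last look. This is an illustrative sketch with hypothetical names. It is not how EGit or JGit track changes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical change detector that only stats files and never opens them. */
final class StatSnapshot {
    // Last seen "size:mtime" stamp per file.
    private final Map<Path, String> seen = new HashMap<>();

    /** Returns the files whose size or mtime differs from the previous call. */
    List<Path> changedSince(List<Path> files) throws IOException {
        List<Path> changed = new ArrayList<>();
        for (Path file : files) {
            // One metadata lookup per file; the content is not read.
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            String stamp = attrs.size() + ":" + attrs.lastModifiedTime().toMillis();
            String previous = seen.put(file, stamp);
            if (!stamp.equals(previous)) {
                changed.add(file); // new file, or metadata differs
            }
        }
        return changed;
    }
}
```

Only the files returned by changedSince would then need their content read. A change that preserves both size and timestamp goes unnoticed by this scheme, so a real implementation would need a fallback for that case.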
(In reply to comment #9)
> What's interesting from the backtraces is that it's not a #refreshLocal that's going on. egit is actually opening and reading the files:
> ...
> at org.eclipse.core.internal.resources.File.getContents(File.java:293)
> at org.eclipse.egit.core.ContainerTreeIterator$ResourceEntry.openInputStream(ContainerTreeIterator.java:246)
> ...

That stack trace looks familiar. I think we may have an issue open on this, i.e. reading too many files, but I cannot find it right now. We certainly have fixed a number of such issues. Back to dtruss and strace again...

> Reading files is orders of magnitude more expensive than stat (as stat structures are readily cached). Also reading all the files in the project is going to churn quickly through any OS cache. If it could be serialized to a few threads (where a few is < number of processors/cores) then I think performance wouldn't be so bad, and at least the main thread would get a chance to run.
>
> I guess the question should be: does egit really need to read the files? Can you not just trust the resource delta? Resource deltas are key to eclipse working well, as performance would quickly suck if all integrators went straight to the fs...

This is partly about the division of work between JGit and EGit. JGit does most of the low-level work and it has no idea about Eclipse (and shouldn't). There has been some work on virtualizing the file system access, but it is not complete. As for the resource deltas, we have not been willing to trust them, since Eclipse has had such a poor idea of what has *really* changed and what hasn't. It probably works better for people who *only* work in Eclipse. But I admit we should learn more about the deltas and where we can use them. The lightweight refresh could perhaps fix most problems.

> I think it's a bad idea for different contributors to read files or even #refreshLocal at a large scope periodically, as it leads to IResources constantly being locked and a slow Eclipse.
>
> There are ways to fix this, I think:
> 1) egit: Trust the resource delta
> 2) Users: use Refresh Automatically
> (3) Add hooks for Linux + Mac OS)

Whatever keeps my resources automagically in sync and does not drain my batteries too quickly.

> With 1 the resource delta _should_ be sufficient on all platforms. The only issue might be that the .git directory isn't under the workspace. Perhaps egit can create a linked resource to it at the project root so that team resources are picked up in the refresh?

We only look at a small subset of the files in .git. The .git directory may contain tens of thousands of files that are completely uninteresting to look at in this context. We only care about the index and some refs as a source of changes. getAllRefs seems to be a problem. It is being called way too often.

> With 2 Windows already has such a hook. Linux and Mac OS use polling. The polling backs off to use no more than ~5% of clock-time. It does this deliberately to prevent over-consumption, backing off when filers are being slow.

How hard would it be to provide notifications via this hook?

> We can fix 3 properly by plumbing in fsevents and inotify for mac and linux with jnotify in 3.8 (if we can get the license issues worked out). The tuning that exists in core.resources has happened so that eclipse remains responsive and users don't complain about unnecessary performance drops.
>
> Perhaps you can expand a bit on why you're using File I/O and not the resource delta and whether there are any core.resources bugs that follow from this?

I wonder also if we should disable the git repo based refresh when automatic refresh is enabled. Since it is so slow I never use it myself.
(In reply to comment #11)
> And as for the resource deltas we have not been willing to trust them since Eclipse have had such a bad idea of what has *really* changed and what hasn't. It probably works better for people who *only* work in Eclipse.

Note that resource deltas are the only mechanism that the project explorer, builders, editors, etc. use. They all go through the IResource layer and rely on it to track and notify of changes. The IResource layer guarantees that you'll find out about changes in the workspace soon after they're made (if they were made in Eclipse), and whenever eclipse finds out about them for external changes. FS hooks make the latter happen more quickly.

I would think a good egit integration, especially for the decorators for the project explorer, should be driven exclusively by resource deltas. After all, this is what drives the tree changes in the project explorer.

The only issue I saw was bug 338667, which would be straightforward for us to fix. Though I do wonder how often a file changes with the timestamp not changing.

> We only look at a small subset of the files in .git. The .git directory may contain tens of thousands of files that are completely uninteresting to look at in this context. We only care about the index and some refs as a source of changes.

Ah, OK. This would seem to be a good trigger, but does it justify opening all resources, especially if they're already 'controlled' by the Eclipse workspace? A #refreshLocal of the project would trigger a resource delta for all files discovered to be modified outside of eclipse. This would likely be better than opening all the files irrespective of whether they've been modified (if this is what's happening).

> > With 2 Windows already has such a hook. Linux and Mac OS use polling. The polling backs off to use no more than ~5% of clock-time. It does this deliberately to prevent over-consumption, backing off when filers are being slow.
>
> How hard would it be to provide notifications via this hook?

It does already - ResourceDeltas ;). Obviously it only monitors stuff visible through the workspace.

> I wonder also if we should disable the git repo based refresh when automatic refresh is enabled. Since it is so slow I never use it myself.

Possibly. But it's still interesting that egit bypassed the IResource layer. It really is intended to maintain a strong model with guaranteed notification and change batching for all eclipse consumers. Refreshing of resources controlled by eclipse should be done by the eclipse resource layer.

If core.resources isn't working right for you then please do file issues against core.resources, as improvements made there benefit the rest of the IDE.
I see we now have one job per project (since d422d75ee407a7e2476c0e3358766d367e5d6878); having 127 of them might well lead to 14 running in parallel. It is EGit that creates the decorator jobs. Does the Job API have some simple way of limiting this? Perhaps just having a set of, say, five rules that we distribute among the jobs that want to run?
You could have just one job, and a queue of projects to be reconciled. When an entry is added to the queue just call job.schedule(). Alternatively a fixed small number of jobs similarly scheduled and taking from a shared work queue.
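The single-job suggestion can be sketched outside Eclipse with plain java.util.concurrent. In EGit the worker would presumably be an org.eclipse.core.runtime.jobs.Job that is re-armed with job.schedule(). Here a single-threaded executor stands in for it, and all class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for "one job plus a queue of projects to reconcile". */
final class ReconcileQueue {
    // Pending projects; a set, so repeated requests for one project collapse.
    private final Set<String> pending = new LinkedHashSet<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> reconciler;
    private boolean scheduled; // guarded by 'pending'

    ReconcileQueue(Consumer<String> reconciler) {
        this.reconciler = reconciler;
    }

    /** Queues a project and starts the drain task only if none is pending. */
    void request(String project) {
        synchronized (pending) {
            pending.add(project);
            if (scheduled) {
                return; // the already scheduled drain will pick it up
            }
            scheduled = true;
        }
        worker.execute(this::drain);
    }

    /** Processes queued projects one at a time until the queue is empty. */
    private void drain() {
        while (true) {
            String next;
            synchronized (pending) {
                if (pending.isEmpty()) {
                    scheduled = false;
                    return;
                }
                next = pending.iterator().next();
                pending.remove(next);
            }
            reconciler.accept(next); // runs outside the lock
        }
    }

    /** Stops accepting work and waits for queued projects to finish. */
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

However many projects are requested, at most one reconcile runs at a time. The "fixed small number of jobs" variant would replace the single-threaded executor with a small fixed pool whose workers take from the same queue.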
(In reply to comment #12)
> The only issue I saw was bug 338667 which would be straightforward for us to fix. Though I do wonder how often a file changes with the timestamp not changing.

With Git this happens quite often, since it is so darn fast, and when it happens it is hard to see what is wrong. Perhaps those i-cannot-get-eclipse-to-refresh-no-matter-what bugs are related to this (even without Git).

> Possibly. But it's still interesting that egit bypassed IResource layer. It really is intended to maintain a strong model with guaranteed notification and change batching for all eclipse consumers. Refreshing of resources controlled by eclipse should be done by the eclipse resource layer.
>
> If core.resources isn't working right for you then please do file issues against core.resources as improvements made here benefit the rest of the IDE.

It is partially related to learning the APIs, but also because we really wanted the basic JGit API to be stable before forking it.
Just my 2 cents... I think it is very important for EGit to be:
1. In sync with any git repos in your project explorer, no matter whether git is managed through Eclipse or through the command line. (Many users do both.)
2. Fast!

I've seen a somewhat related issue with EGit's handling of symlinks here: http://www.eclipse.org/forums/index.php/mv/tree/156013/#page_top

There's a common solution that I can see, involving platform-specific code for POSIX systems. Currently, it seems that the multi-platform issue is holding some developers back from the quickest solution (using native stat & lstat calls on Linux or Mac). Users on these systems should not have to suffer because Windows does not support symlinks. I think this is a case where it's OK to write some platform-specific code to solve both of these problems on POSIX systems. Other optimizations would also be great, and are most likely necessary to fix this specific issue on Windows.

- James Cuzella
Decorator became much faster with http://egit.eclipse.org/r/#change,4200 which introduces IndexDiffCache and updates this cache incrementally based on ResourceChangedEvents. Also removal of deprecated GitIndex from detection of external index changes used by the repository change scanner http://egit.eclipse.org/r/#change,4218 should improve performance and resource consumption. Besides a JNI solution we now can optionally use Java 7 NIO 2 to implement support for symlinks on platforms supporting them. This is tracked in bug 354367 hence I propose this aspect should be discussed there as it is not directly related to the title of this bug.
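For readers unfamiliar with the NIO.2 option mentioned above, the following sketch shows an lstat-style lookup using only java.nio.file from Java 7. The class name and the returned strings are hypothetical, and this is not the JGit implementation tracked in bug 354367.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper that inspects a path without following symlinks. */
final class SymlinkProbe {
    /** Describes the entry itself (like lstat), not the target it points to. */
    static String describe(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(
                path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        if (attrs.isSymbolicLink()) {
            // readSymbolicLink returns the stored target without resolving it.
            return "symlink -> " + Files.readSymbolicLink(path);
        }
        if (attrs.isDirectory()) {
            return "directory";
        }
        return attrs.isRegularFile() ? "file" : "other";
    }
}
```

On platforms or file systems without symlink support, the symlink branch is simply never taken, so the same code can run everywhere without JNI.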
Further speedups could be reached by using other Java 7 features, see bug 353771.
What about regular 5-sec freezing during saving files? Is it different to the current bug described here? I can't find the bug about freezing on save.
> What about regular 5-sec freezing during saving files? Is it different to the > current bug described here? I can't find the bug about freezing on save. See bug 358898 (I filed bug 361763, which got marked as dup of that one).
Are there any updates to the issue? This bug is very annoying and makes EGit+Eclipse rather unusable for normal daily work. It is a pity, because I liked EGit
(In reply to comment #21)
> Are there any updates to the issue? This bug is very annoying and makes EGit+Eclipse rather unusable for normal daily work. It is a pity, because I liked EGit

This issue seems to have expanded somewhat to accommodate all kinds of problems, making it hard to see whether "this issue" has been solved or not. Do you mean THIS issue or the one you mentioned, which Matthias opened as bug 358898?
Related to this, for some Windows users, is bug 353389, whose fix has been merged. JGit/EGit no longer uses cygwin by default to resolve paths, even when cygwin is in the PATH.
> Do you mean THIS issue or the one you mentioned, that Matthias opened as bug 358898?

Not 358898. Mine is on Ubuntu 11 (64bit) and happens on file saving.
(In reply to comment #24) > > Do you mean THIS issue or the one you mentioned, that Matthias opened as bug > > 358898? > > not 358898. My is on Ubuntu 11 (64bit) and happens on file saving. Is it related to the subject for this bug? If not, please open a separate issue. I don't see how it could be related, but I could be wrong.
> Is it related to the subject for this bug? Robin, your question sounds almost exactly like mine here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=346079#c19 doesn't it? :)
btw, I tried openjdk 7. I pressed CTRL-S to save a changed file. Eclipse did not respond for about FIVE minutes.
(In reply to comment #26)
> > Is it related to the subject for this bug?
>
> Robin, your question sounds almost exactly like mine here:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=346079#c19
> doesn't it? :)

Indeed, but I don't see that behaviour, so I have no idea what you are talking about, and even less how it could possibly have anything to do with the repository change scanner. You are the reporter. If you don't *know* it is related to the repository change scanner, you should report it as a separate bug.
I am *still* seeing this bug in: Eclipse Java EE IDE for Web Developers. Version: Indigo Service Release 2 Build id: 20120216-1857 org.eclipse.egit (1.3.0.201202151440-r) "Git Team Provider" org.eclipse.egit.mylyn (1.3.0.201202151440-r) "Git Team Provider" -------------------------------------------------------------------- The problem is most definitely the repository change scanner in EGit. Every time I click the "Add to Git Index" button, or the "Commit change" button, I see "Git Repository Refresh" running in Eclipse's progress window. This almost always takes a very long time (~40+ seconds). My workflow throughout the day is noticeably slowed down by this, so anything that can be done to speed up and optimize this process would be extraordinarily helpful!!
(In reply to comment #29)
> I am *still* seeing this bug in:
>
> Eclipse Java EE IDE for Web Developers.
>
> Version: Indigo Service Release 2
> Build id: 20120216-1857
>
> org.eclipse.egit (1.3.0.201202151440-r) "Git Team Provider"
> org.eclipse.egit.mylyn (1.3.0.201202151440-r) "Git Team Provider"

At this point in time bugs should be reported against 2.0. Please test and report new figures. Also include whether you have core.autocrlf set or not, # repos, # refs (especially loose refs, as I suspect that may be an issue), # files, # untracked files and the distribution of file sizes. A project set (.psf) would be very useful.

Also note that the repository change scanner can be turned off if it is a nuisance rather than a help. If you do not use command line Git, the scanner is of little help.
In EGit 2.2, the repository reindexing was made incremental, see bug 393642. The change results in much better performance. See also bug 381856 (available in nightly), where the number of jobs was reduced, further removing unnecessary work. Is this issue still present when using EGit >= 2.2 (preferably the nightly versions)?
Closing as fixed then.