| Summary: | Granularity of CDT git repository | ||
|---|---|---|---|
| Product: | [Tools] CDT | Reporter: | James Blackburn <jamesblackburn+eclipse> |
| Component: | cdt-core | Assignee: | Project Inbox <cdt-core-inbox> |
| Status: | RESOLVED FIXED | QA Contact: | Doug Schaefer <cdtdoug> |
| Severity: | normal | ||
| Priority: | P3 | CC: | denis.roy, elaskavaia.cdt, john.cortell, marc.khouzam, meisam.fathi, teodor.madan |
| Version: | 7.0 | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | All | ||
| Whiteboard: | |||
| Bug Depends on: | |||
| Bug Blocks: | 316208 | ||
Description
James Blackburn
Let me quote my own email to cdt-dev:

I don't like all-in-one granularity. Normally I am not interested at all in debug plugins (sadly, there is no support of dbx). I do not want to pull and recompile those (and occasionally hit problems) every time I update from main repository. My first preference is *[component level repos]*, and then [.project level repos] organized in sets on component level and global CDT level. I am sure there are some more users like me out there too.

I'd prefer to see quantitative arguments. For example, James thought the all-in-one was huge, but when you look at the numbers, it's actually manageable for him. So, how much traffic does debug really generate, for example?

(In reply to comment #2)
> I'd prefer to see quantitative arguments. For example, James thought the
> all-in-one was huge, but when you look at the numbers, it's actually manageable
> for him.
> So, how much traffic does debug really generate, for example?

Doug, I already said that the traffic won't be a ground for negative vote. It is meant to be an illustration that one-for-all repository is wasteful. Just take this argument FWIW. It is a matter of preference, not a show stopper.

(In reply to comment #2)
> So, how much traffic does debug really generate, for example?

As it happens I've got the answer to this in a shell :) I pulled a bunch of changes into an old org.eclipse.cdt I had lying around from March:

bash:jamesb:xl-cbga-20:33081> du -sh .git/
137M .git/
bash:jamesb:xl-cbga-20:33084> git remote -v
origin git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git (fetch)
origin git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git (push)
bash:jamesb:xl-cbga-20:33085> git pull origin
remote: Counting objects: 29103, done.
remote: Compressing objects: 100% (7214/7214), done.
remote: Total 23820 (delta 11532), reused 21802 (delta 9896)
Receiving objects: 100% (23820/23820), 17.67 MiB | 268 KiB/s, done.
Resolving deltas: 100% (11532/11532), completed with 858 local objects.
From git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt
 * [new tag] v201003191033 -> v201003191033
 * [new tag] v201003221139 -> v201003221139
...
....
4255 files changed, 133841 insertions(+), 74392 deletions(-)
bash:jamesb:xl-cbga-20:33104> du -sh .git
158M .git

So between March and today, there's been 17MB of change (as reported by fetch). ~1.5M a week which doesn't seem to be too bad to me. Coming up to release there was a lot of stuff being created: the output shows 1.5M of new documentation (mostly images), an android plugin coming in, loads of codan, loads of dsf-gdb tests, some edc and xlc creation too. I'm not sure if the last 3 months is indicative of the usual change pace, but a few hundred kB per day isn't too bad.

(In reply to comment #2)
> I'd prefer to see quantitative arguments. For example, James thought the

Hmm, trying to use git repository from http://dev.eclipse.org/git/org.eclipse.cdt/org.eclipse.cdt.git to rebase branch ScannerDiscovery61 and having some observations:

- Clone of the whole repository is slow. Takes for me ~20 min. No noticeable difference between http:// and git://. CVS checkout is much faster. Took 12 min to checkout everything and only 2.5 min to checkout the 12 plugins I am interested in. I realize that I am getting the whole history with git but that is a dubious advantage as I am connected all the time and would much prefer to see fresh commits in history right away rather than keep pulling from the git repo. That's a point toward smaller granularity. I am often working with different/new workspaces and frankly 20 min to clone is just too much. Well, I guess I can clone locally which took ~3 min. Sigh, smaller repositories would be so much faster.
- There is something wrong with the branch converted from CVS in the git repository.
After checkout I am getting some files from older revisions than on the branch in CVS. For example, CommandLauncher.java is 1.13 despite 1.18 being on the branch in CVS. ICommandLauncher is missing altogether. I am afraid rebasing in those conditions probably won't go too well and I am expecting a lot of conflicts on top of that.

Well, it takes me 5 minutes to check out org.eclipse.cdt for the first time. I guess I have a faster link. But either way, you only ever have to do that once. Cloning locally is almost instantaneous if you need multiple copies. git is very fast once you have that done.

I don't think I'd accept anything less than the feature and all the plug-ins that go into the C/C++ IDE. If your test environment doesn't include the latest source for those plug-ins, you have an invalid test environment. And checking out everything in cdt-main.psf takes quite a long time as well, and every update after that is painful. git updates are lightning fast.

That and since there is actually little outside cdt-main, chopping buys us little. But I would be interested if there are ways to chop out history. I would think we only need the live streams when we cut over. Or something like that.

(In reply to comment #6)
> Well, it takes me 5 minutes to check out org.eclipse.cdt for the first time. I
> guess I have a faster link. But either way, you only ever have to do that once.
> Cloning locally is almost instantaneous if you need multiple copies. git is very
> fast once you have that done.

Sounds like you already made up your mind. As I pointed out ~3min for local cloning is far from instantaneous and longer than CVS checkout of 2.5 min for the 12 plugins I was interested in. Obviously it depends on the size of the repository and the length of its history as git will replay every commit.

(In reply to comment #7)
> (In reply to comment #6)
> > Well, it takes me 5 minutes to check out org.eclipse.cdt for the first time. I
> > guess I have a faster link.
> > But either way, you only ever have to do that once.
> > Cloning locally is almost instantaneous if you need multiple copies. git is very
> > fast once you have that done.
> Sounds like you already made up your mind. As I pointed out ~3min for local
> cloning is far from instantaneous and longer than CVS checkout of 2.5 min for
> the 12 plugins I was interested in. Obviously it depends on the size of the
> repository and the length of its history as git will replay every commit.

Yes, I have made up my mind. But it's not all up to me.

I don't understand why you are having such poor performance with git. It took about 5 seconds to do a local clone on my machine. Does anyone else have numbers?

Do you get the same issues with a Linux kernel git repo?

(In reply to comment #8)
> I don't understand why you are having such poor performance with git. It took
> about 5 seconds to do a local clone on my machine. Does anyone else have
> numbers?

Are we talking about the same repo? I am using http://dev.eclipse.org/git/org.eclipse.cdt/org.eclipse.cdt.git

> Do you get the same issues with a Linux kernel git repo?

What's its url?

(In reply to comment #5)
> - Clone of the whole repository is slow. Takes for me ~20 min. No noticeable
> difference between http:// and git://. CVS checkout is much faster. Took 12 min
> to checkout everything ...

That sounds like a bandwidth prioritization issue. The .git repository is _smaller_ than all the checked out source, so it certainly shouldn't take longer to clone than to CVS checkout.

For my clone of git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
.git == 157M
Checked out source == 187M

There was a post somewhere by Denis on how bandwidth is prioritized at the eclipse gateway -- that's the only explanation I can see for this odd result...

> - There is something wrong with the converted from CVS branch in git

I think Doug broke it :) He pruned 'old' directories from the CVS repo which will have changed cvsps output.
At the moment pull isn't going anywhere for me, and the last incoming change I see is from June 7th...

(In reply to comment #7)
> ... As I pointed out ~3min for local
> cloning ...

What platform is this on? On Linux with really slow NFS, a clone (without source file checkout) is *fast*:

time git clone --no-checkout org.eclipse.cdt org.eclipse.cdt.clone/
real 0m0.213s

Actually checking out all the source from the repository is quite slow on NFS:

time git clone org.eclipse.cdt org.eclipse.cdt.clone/
real 0m49.586s

Are you on a slow filesystem of some sort?

(In reply to comment #10)
> I think Doug broke it :) He pruned 'old' directories from the CVS repo which
> will have changed cvsps output. At the moment pull isn't going anywhere for
> me, and the last incoming change I see from June 7th...

I apologize for that. Those directories should have been moved before the git mirror was created. The git clone I did this week has the latest tag so it is getting updated.

(In reply to comment #10)
> (In reply to comment #5)
> > - Clone of the whole repository is slow. Takes for me ~20 min. No noticeable
> > difference between http:// and git://. CVS checkout is much faster. Took 12 min
> > to checkout everything ...
> That sounds like a bandwidth prioritization issue. The .git repository is
> _smaller_ than all the checked out source, so it certainly shouldn't take longer
> to clone than to CVS checkout.
> For my clone of git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
> .git == 157M
> Checked out source == 187M
> There was a post somewhere by Denis on how bandwidth is prioritized at the
> eclipse gateway -- that's the only explanation I can see for this odd result...

Yes, I think you are right that bandwidth prioritization has to do with it. I also noticed to my surprise that :pserver:anonymous@proxy.eclipse.org:80/cvsroot/tools is twice as fast as :extssh:agvozdev@dev.eclipse.org:/cvsroot/tools.
I recall that Denis aimed for an opposite effect but I suppose there were many hardware/configuration changes on the servers since.

> (In reply to comment #7)
> > ... As I pointed out ~3min for local
> > cloning ...
> What platform is this on? On Linux with really slow NFS, a clone (without
> source file checkout) is *fast*:
> time git clone --no-checkout org.eclipse.cdt org.eclipse.cdt.clone/
> real 0m0.213s

That is on local SSD drive but I do just plain git clone (with checkout). You have to do checkout to be able to work anyway, don't you? The result in my latest post (3min) was from cloning inside EGit. I tried git from command line and it was faster, 1.5 min. For the record, that is Win XP, cygwin git 1.6.6.1.

All the Git repos are in our prioritized bandwidth. From my home cable, I just did this, right in the middle of the Helios release rush:

-bash-3.00$ time git clone git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
Initialized empty Git repository in /home/users/toofast/shiw/git/org.eclipse.cdt/.git/
remote: Counting objects: 385338, done.
remote: Compressing objects: 100% (71833/71833), done.
remote: Total 385338 (delta 205574), reused 385140 (delta 205494)
Receiving objects: 100% (385338/385338), 116.78 MiB | 1.12 MiB/s, done.
Resolving deltas: 100% (205574/205574), done.
Checking out files: 100% (12274/12274), done.

real 3m14.262s
user 1m0.848s
sys 0m26.332s

Keep in mind this is a 1GHz machine with a whopping 256M RAM.

I tried from an Amazon ec2 instance I have:

time git clone git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
Initialized empty Git repository in /root/org.eclipse.cdt/.git/
remote: Counting objects: 385338, done.
remote: Compressing objects: 100% (71833/71833), done.
remote: Total 385338 (delta 205574), reused 385140 (delta 205494)
Receiving objects: 100% (385338/385338), 116.78 MiB | 1.40 MiB/s, done.
Resolving deltas: 100% (205574/205574), done.
real 2m36.642s
user 0m15.565s
sys 0m2.044s

You are transferring 116 Megabytes, which is unfortunate, so you do need a fast link. If it's taking you 20m to clone, you either have a bad link to us, or you have a slow link. But as others have said, you only need to clone once.

(In reply to comment #14)
> You are transferring 116 Megabytes, which is unfortunate, so you do need a fast
> link. If it's taking you 20m to clone, you either have a bad link to us, or you
> have a slow link.

That is the whole point. I do have a slow link and so prefer smaller granularity of the repository. I am happy with eclipse provided bandwidth and BTW think you are doing fantastic job, Denis :). Just let me use the opportunity to ask you about something from comment#5, maybe you can comment on that:

> I also noticed to my surprise that :pserver:anonymous@proxy.eclipse.org:80/cvsroot/tools is twice as fast as :extssh:agvozdev@dev.eclipse.org:/cvsroot/tools.

Was it something of a fluke?

> That is the whole point. I do have a slow link and so prefer smaller
> granularity of the repository.

At EclipseCon I was speaking with Shawn Pearce (that guy always makes me feel dumb) and he said there was a way I could pack the repos down to much smaller size (at the expense of lots of disk and computational resources during the pack). His claim is that you do this massive packing once. I'll have to look into that.

> I am happy with eclipse provided bandwidth and
> BTW think you are doing fantastic job, Denis :).

Thanks!

> > I also noticed to my surprise that :pserver:anonymous@proxy.eclipse.org:80/cvsroot/tools is twice as fast as :extssh:agvozdev@dev.eclipse.org:/cvsroot/tools.
> Was it something of a fluke?

It could have been a fluke, but if you think about it, SSH will always be slower than any protocol that does not involve encryption. It's not intentional that pserver gets a better treatment than extssh, but I can't remember where proxy.eclipse.org fits in our QoS rules. I'll have to check.
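
The one-off "massive packing" mentioned above is not spelled out in the thread. One plausible reading, offered here only as a sketch, is a full repack with a large delta window and depth. The script below tries that on a throwaway repository; the file contents, the commit messages and the 250/250 values are illustrative assumptions, not settings anyone in this thread reported using.

```shell
#!/bin/sh
# Sketch only: build a small throwaway repository, then repack it aggressively.
set -e
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.email "demo@example.invalid"
git -C "$demo" config user.name "Demo"

# Several revisions of one file, so there is something to delta-compress.
for i in 1 2 3 4 5; do
    seq 1 2000 | sed "s/^/revision $i line /" > "$demo/data.txt"
    git -C "$demo" add data.txt
    git -C "$demo" commit -q -m "revision $i"
done

before=$(du -sk "$demo/.git" | cut -f1)

# -a: put all objects into a single pack
# -d: delete packs and loose objects made redundant by the new pack
# -f: recompute deltas instead of reusing existing ones
# --window/--depth: search harder for deltas (more CPU and memory, smaller pack)
git -C "$demo" repack -a -d -f -q --window=250 --depth=250

after=$(du -sk "$demo/.git" | cut -f1)
commits=$(git -C "$demo" rev-list --count HEAD)
echo "commits=$commits before=${before}K after=${after}K"
```

On a real repository the repack would be run once on the server side; clients then download the smaller pack. How much it saves for CDT is not something this sketch can predict.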
FWIW, at EclipseCon I had expressed the same concerns wrt. bandwidth. Transferring 10 years of history to the user who just wants to submit a patch against HEAD seems a bit inefficient to me.

Our CVS->Git migration path also gives committers the option of archiving CVS history and only committing CVS HEAD to Git HEAD. For some reason, though, I think this will be very unpopular.

(In reply to comment #17)
> FWIW, at EclipseCon I had expressed the same concerns wrt. bandwidth.
> Transferring 10 years of history to the user who just wants to submit a patch
> against HEAD seems a bit inefficient to me.
>
> Our CVS->Git migration path also gives committers the option of archiving CVS
> history and only committing CVS HEAD to Git HEAD. For some reason, though, I
> think this will be very unpopular.

Well in our case the vast majority of our history is on HEAD. We only branched for maintenance releases. The ideal scenario is to be able to take everything since a given tag, e.g. CDT_6_0 which would give us all active streams. But I'm not sure that's possible. Failing that, if we could compress the repo better, then we definitely should look at that.

And even if that fails, I'm sorry you have such a slow link Andrew, but if it's only a one time 20 minute hit to get you set up with a proper CDT source checkout, then I don't think that would sway me. I've spent many a 20 minutes checking out the CDT out of CVS with a fast link so I can properly test my changes against the whole thing. And I seem to get by.

(In reply to comment #18)
> And even if that fails, I'm sorry you have such a slow link Andrew, but if it's
> only a one time 20 minute hit to get you set up with a proper CDT source
> checkout, then I don't think that would sway me. I've spent many a 20 minutes
> checking out the CDT out of CVS with a fast link so I can properly test my
> changes against the whole thing. And I seem to get by.
Well it is just getting more and more over-hyped with every post, sorry for that. I'll be fine, it is just a matter of preference for me. The important thing is to move to git in principle. And I'd advocate taking all the history. There is not much documentation on the code and I've taken quite a few clues from CVS history before.

It occurs to me that one thing we should keep in mind when comparing cvs vs git performance is that cvs has compression support, too. I've seen it make a pretty big difference with other cvs repositories, though I have never tried turning it on for dev.eclipse.org. I know by default the Eclipse cvs client has compression turned off. Just something to consider when trying to compare checkout times.

(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> I don't understand why you are having such poor performance with git. It took
> about 5 seconds to do a local clone on my machine. Does anyone else have
> numbers?
>

Firstly, I'm not a committer. I use both CVS and git over a slow link (mainly less than 20Kibps), so both initial cloning in git, and first checkout in CVS are slow (30+ mins). But git is far faster than CVS when it comes to other operations. In CVS, it takes something like 2 mins whenever I compare two files to make a patch. In git it is instantaneous. It doesn't really matter how much time the initial cloning takes (I can leave my machine to clone the repository at midnight), but waiting 2 minutes or so for comparing two files that differ only in 5 lines [and being compelled to do it several times a day] is just a pain in the neck.

(In reply to comment #20)
> It occurs to me that one thing we should keep in mind when comparing cvs vs git
> performance is that cvs has compression support, too. I've seen it make a
> pretty big difference with other cvs repositories, though I have never tried
> turning it on for dev.eclipse.org. I know by default the Eclipse cvs client has
> compression turned off.
> Just something to consider when trying to compare
> checkout times.

That's a good point. But in the long run it won't matter. CVS is scheduled to get turned off at Eclipse in the next couple of years (I believe 2012 was the requested date from the Foundation). There apparently is a flaw that they have uncovered and CVS is no longer maintained. So we have to move anyway.

And it sounds from all other reports that the only issue is the initial clone time and we'll see if we can reduce that by compressing the git data (which is already smaller than a full checkout).

For my next step I'm going to try and create what I consider the minimal CDT core repository, essentially removing debug related things and the optional features, i.e. include mainly cdt.core, cdt.ui, the build plug-ins, and the documentation. My guess, though, is that will still be over half the size of the single repository given the long history of cdt.core and the major churn in the build plugins.

So, I will also try and create the repository with only the 6.0 and 7.0 branches and HEAD since the 6.0 was branched. This has a much better chance of making things small. And as Denis says, it doesn't make too much sense that you would need such long ago history for patches today. But I need to be able to run git cvsimport on the Eclipse server first and am working with webmaster on that.

You could also use git filter-branch to remove the bits of the repository you're not interested in.

A bit more experimentation to work out what the minimum repo-size is with all source (i.e. start git repository from current HEAD):

// Copy head source into a clean directory
> cp -r org.eclipse.cdt/* org.eclipse.cdt-initial-commit/
> cd org.eclipse.cdt-initial-commit/
// Add all the source to a new repository
> git init .
> git add -A .
> git commit
> du -sh .git
100M .git
// pack the repository
> git repack -a
> du -sh .git
142M .git
// garbage collect
> git gc --aggressive
> du -sh .git
45M .git

So the git repository for source, with no history, is 45M. If I gc --aggressive the org.eclipse.cdt checkout (with full history), the size reduces to 120M.

So if we want all source in one repository, but filter some history, initial size will be between 45M -> 120M and grow from there.

I think it's really neat having all the source and history in one place. However I wouldn't object to having a small number of repositories if this is what others wanted for bandwidth reasons.

I have used accurev, somewhat similar to git with regard to cloning and branching. One aspect when deciding granularity is that the branching/tagging is done for the whole repository (unlike CVS per module/file). Thus the sources from the repository should have the same life cycle in terms of maintenance branches, release tags etc. From this point of view it makes sense for all bundles that are part of one single feature to be in the same git repository as they share the same life-cycle. IMO, the question would be whether the features provided with the CDT package should be split into different repos in case the life-cycle could be different (e.g. an optional CDT feature to be released one month later than the annual eclipse release train)

(In reply to comment #25)
> So if we want all source in one repository, but filter some history, initial
> size will be between 45M -> 120M and grow from there.

That's great information. Thanks, James.

> I think it's really neat having all the source and history in one place.
> However I wouldn't object to having a small number of repositories if this is
> what others wanted for bandwidth reasons.

I agree with that. I thought it was wicked cool flipping between cdt 6 and head and having it only take seconds. You could also go way back and take a look at the CDT pre-new project model and see how things used to be :).
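
For the bandwidth concern raised earlier (downloading years of history just to patch HEAD), git itself offers a shallow clone via `--depth`. This is not something proposed in the thread; the sketch below only illustrates the mechanism on a throwaway repository with made-up file names and commit messages. Shallow clones have had restrictions in some git versions, so it is worth checking the documentation for the version in use.

```shell
#!/bin/sh
# Sketch only: compare a full history with a --depth 1 clone of it.
set -e
work=$(mktemp -d)
git init -q "$work/full"
git -C "$work/full" config user.email "demo@example.invalid"
git -C "$work/full" config user.name "Demo"

# Three commits stand in for "years of history".
for i in 1 2 3; do
    echo "change $i" > "$work/full/file.txt"
    git -C "$work/full" add file.txt
    git -C "$work/full" commit -q -m "change $i"
done

# A file:// URL is used on purpose: git ignores --depth for plain local paths.
git clone -q --depth 1 "file://$work/full" "$work/shallow"

full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full=$full_count shallow=$shallow_count"
```

Against the real server the equivalent would be `git clone --depth 1` with the git:// URL quoted earlier in this bug; the resulting download size for CDT has not been measured here.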
I think we should keep all history. What is the difference between full history and only 6.0 anyway?

OK, I did a git cvsimport for the following plugins:

org.eclipse.cdt.core
org.eclipse.cdt.core.aix
org.eclipse.cdt.core.linux
org.eclipse.cdt.core.linux.ia64
org.eclipse.cdt.core.linux.ppc
org.eclipse.cdt.core.linux.x86
org.eclipse.cdt.core.linux.x86_64
org.eclipse.cdt.core.macosx
org.eclipse.cdt.core.qnx
org.eclipse.cdt.core.solaris
org.eclipse.cdt.core.tests
org.eclipse.cdt.core.win32
org.eclipse.cdt.doc.isv
org.eclipse.cdt.doc.user
org.eclipse.cdt.make.core
org.eclipse.cdt.make.core.tests
org.eclipse.cdt.make.ui
org.eclipse.cdt.managedbuilder.core
org.eclipse.cdt.managedbuilder.core.tests
org.eclipse.cdt.managedbuilder.gnu.ui
org.eclipse.cdt.managedbuilder.ui
org.eclipse.cdt.managedbuilder.ui.tests
org.eclipse.cdt.ui
org.eclipse.cdt.ui.tests

This gives the core and build and the doc plug-ins along with the tests. After a gc --aggressive, I get a .git directory of 68M which is more than half of the 120 for everything. Which is what I expected. I assumed when we discussed breaking it into smaller repos we meant repos that were significantly less than half the size of the big one.

Do we have a wiki page on how to use EGit with CDT?
I'm trying to get it setup to be able to give input, but it's not that obvious to me. I don't have any git experience outside of the tutorial at EclipseCon.
Here is what I did (with points I wasn't sure of):
1- Install EGit :-)
2- Open EGit perspective
3- Right-click on Git Repositories view and select "Import Git Repository..."
4- Paste git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git into the URI box. This fills most other boxes which I didn't touch. Press Next. I assume the protocol we use should be 'git' for now, but when we officially move to git, we will use 'git+ssh' to do commits?
5- All branches appear. Should I leave them all checked? I did and pressed Next.
6- The next window shows the local dir to be <workspace>/org.eclipse.cdt with both other fields saying 'origin'. I left as is and pressed Finish.
The entire creation of the local repo took only 2m45sec (at 8:15am on a Canadian holiday).
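
For comparison, the EGit steps above roughly correspond to a plain clone on the command line. The helper below is a sketch: the function name `clone_repo` and the `CDT_URL` variable are invented for this example, and the default URL is the read-only one pasted in step 4.

```shell
# Assumption: the read-only git:// URL from step 4; override CDT_URL to use another.
CDT_URL=${CDT_URL:-git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git}

# clone_repo SOURCE TARGET
#   Clone SOURCE (a URL or a local path) into TARGET, then list the
#   remote-tracking branches that came with it (step 5 in the wizard).
clone_repo() {
    git clone "$1" "$2" &&
    git -C "$2" branch -r
}

# Usage (needs network access, so it is left commented out):
# time clone_repo "$CDT_URL" org.eclipse.cdt
```

A clone fetches all branches by default, which matches leaving every branch checked in step 5.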
In my workspace I then have
org.eclipse.cdt .metadata/
> \du -ks org.eclipse.cdt .metadata/
299712 org.eclipse.cdt
404 .metadata/
That is 300M. Isn't that bigger than what you guys get? Maybe I should not have taken all the branches?
What do I do then to check-out? If I try to select only the plugins I care about from under "Working directory" in the Git Repositories view, I notice I cannot do an Import for a multi-selection, which seems really bad. This makes me think I should be importing a top-level element instead. Should I really import the entire "Working directory" entry? That is the only one that contains all the plugins I want. Or?
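
The 300M reported by du above covers two things at once: the repository data under .git and the checked-out files next to it. A small sketch for splitting the figure follows; the function name `repo_sizes` is made up for this example, and sizes are printed in kilobytes as reported by `du -sk`.

```shell
# repo_sizes DIR
#   Print the size of DIR/.git and of everything else under DIR, in kilobytes.
#   DIR is expected to be the top level of a normal (non-bare) clone.
repo_sizes() {
    total=$(du -sk "$1" | cut -f1)
    gitdir=$(du -sk "$1/.git" | cut -f1)
    echo "repository (.git): ${gitdir}K"
    echo "working tree:      $((total - gitdir))K"
}

# Usage:
# repo_sizes org.eclipse.cdt
```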
(In reply to comment #30)
> What do I do then to check-out? If I try to select only the plugins I care
> about from under "Working directory" in the Git Repositories view, I notice I
> cannot do an Import for a multi-selection, which seems really bad. This makes
> me think I should be importing a top-level element instead. Should I really
> import the entire "Working directory" entry? That is the only one that
> contains all the plugins I want. Or?

I found after I try to Import the entire "Working directory" I can select which plugins I want. So now I have my plugins of interest and everything compiles well.

Should I have a single Git repo for my machine, or should I have one for each workspace? I guess I should have a single repo and access it from different workspaces.

There is a little section "Git" in http://wiki.eclipse.org/CDT/Developer/FAQ#Git (prominently referring to James' repository) and "Git for Committers" link got tons of useful tips.

(In reply to comment #31)
> (In reply to comment #30)
> > What do I do then to check-out? If I try to select only the plugins I care
> > about from under "Working directory" in the Git Repositories view, I notice I
> > cannot do an Import for a multi-selection, which seems really bad. This makes
> > me think I should be importing a top-level element instead. Should I really
> > import the entire "Working directory" entry? That is the only one that
> > contains all the plugins I want. Or?
> I found after I try to Import the entire "Working directory" I can select which
> plugins I want. So now I have my plugins of interest and everything compiles
> well.

That flow is OK when you know it but I had same problem as you checking out the first time.

> Should I have a single Git repo for my machine, or should I have one for each
> workspace? I guess I should have a single repo and access it from different
> workspaces.
It depends on your needs but if you have OK internet connection you can as well have a cloned repository from remote one for each workspace. I would prefer that regardless of what I have said about slow connection (it was exaggerated btw). And, you can push/pull between your clones at will.

(In reply to comment #31)
> Should I have a single Git repo for my machine, or should I have one for each
> workspace? I guess I should have a single repo and access it from different
> workspaces.

The repository information is entirely contained within the .git directory at the top-level. The source files corresponding to the HEAD of a particular branch are checked out next to this .git directory.

As Eclipse doesn't like you having a workspace open more than once (or a project open concurrently in more than one workspace) you would, in general, git clone the repository for all the different workspaces you want.

It's worth noting that git clone locally will use hard links to minimize the disk space of the repository (under .git). The checkout will obviously be copied.

(In reply to comment #30)
> That is 300M. Isn't that bigger than what you guys get? Maybe I should not
> have taken all the branches?

This is both the repository and the checkout. Compare du -sh org.eclipse.cdt/.git with du -sh org.eclipse.cdt/*. When you clone you're copying / linking to the contents of the .git directory. When you checkout HEAD of a particular branch source files are created alongside the repository. All git actions then modify the checked out files' local repository, and when you're happy you can push / pull changes from your repository to other 'remote' repositories.

(In reply to comment #32)
> There is a little section "Git" in
> http://wiki.eclipse.org/CDT/Developer/FAQ#Git (prominently referring to James'
> repository) and "Git for Committers" link got tons of useful tips.

Hmm that's ancient and we should probably remove that from the FAQ.
Perhaps better is the egit docs themselves: http://wiki.eclipse.org/EGit/User_Guide

(In reply to comment #33)
> (In reply to comment #31)
> > Should I have a single Git repo for my machine, or should I have one for each
> > workspace? I guess I should have a single repo and access it from different
> > workspaces.
>
> The repository information is entirely contained within the .git directory at
> the top-level. The source files corresponding to the HEAD of a particular
> branch is checked out next to this .git directory.

This is where I got confused. I didn't know HEAD was automatically checked out (from git but not into Eclipse). So, when I check-out (import) into Eclipse, it imports the files from the already checked-out HEAD.

> As Eclipse doesn't like you having a workspace open more than once (or a project
> open concurrently in more than one workspace) you would, in general, git clone
> the repository for all the different workspaces you want.
> It's worth noting that git clone locally will use hard links to minimize the
> disk space of the repository (under .git). The checkout will obviously be
> copied.

Ok, so instead of doing a new checkout in a new workspace like I do in CVS, I need to clone the repo first and then import the plugins in the new workspace. If not, then my import to the new workspace will contain all changes of the first workspace. It now makes sense why the EGit operation is to "import" and not to 'check-out'.

> (In reply to comment #30)
> > That is 300M. Isn't that bigger than what you guys get? Maybe I should not
> > have taken all the branches?
>
> This is both the repository and the checkout. Compare du -sh
> org.eclipse.cdt/.git with du -sh org.eclipse.cdt/*.

Ah yes. I get it now.

> When you clone you're
> copying / linking to the contents of the .git directory. When you checkout HEAD
> of a particular branch source files are created alongside the repository.
> All git actions then modify the checked out files' local repository, and when you're
> happy you can push / pull changes from your repository to other 'remote'
> repositories.

The subtlety is that EGit does the check-out automatically. But it is much more clear now. Thanks!

(In reply to comment #34)
> Perhaps better is the egit docs themselves:
> http://wiki.eclipse.org/EGit/User_Guide

Awesome, I'll have a look.

Yeah, it is quite a different paradigm using egit but it's not unusual. The ClearCase eclipse plugins do the same, and I believe Perforce does as well. But the idea is that the checkout is done into the file system, not the workspace. With CVS, you checkout Eclipse projects, but in git you check out a directory tree and then import Eclipse projects from there.

So, from there, the best way to think of EGit is as an automation for working with the external source tree. Unfortunately that also causes problems which I hope they resolve as they are also trying to fit into the Eclipse team system which isn't set up well for a non-resource managed source control system. But they are working on cleaning that up for Sept. as well.

For the moment we're going with 1 repo, with history of the branches and the tags @ ~115M. We can split out components later as required.

There is a problem with some slowness of Synchronize (takes about 3 min for me) and commits for that big a clone of CDT repository from eclipse.org but I suppose it's an EGit problem and it's fixable. Synchronize was down to an acceptable 30 sec. at one point of EGit development.

The synchronise flow is still pretty useless - I'm not sure they know what they want there... In happier news they've just got a staging view :).

I wouldn't say it's useless. I use it fairly often before committing changes in multiple files even if slowish. Will check out the staging view flow.
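
With a single repository chosen, the one-clone-per-workspace advice given earlier in this bug can be scripted. The helper below is a sketch under assumptions: the function name `clone_for_workspace` and the example paths are invented, and whether the local clone actually shares objects through hard links depends on the filesystem.

```shell
# clone_for_workspace EXISTING_CLONE TARGET
#   Make a second, independent clone from a clone already on disk, so that each
#   Eclipse workspace gets its own working tree. The new clone's "origin" is the
#   local clone it was made from, not the eclipse.org server.
clone_for_workspace() {
    git clone -q "$1" "$2"
}

# Usage (paths are examples only):
# clone_for_workspace ~/git/org.eclipse.cdt ~/workspace2/org.eclipse.cdt
# git -C ~/workspace2/org.eclipse.cdt pull    # later: fetch new commits from the first clone
```

In EGit the projects would then be imported from the new directory, as described in the comments above.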