Bug 316211 - Granularity of CDT git repository
Summary: Granularity of CDT git repository
Status: RESOLVED FIXED
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core
Version: 7.0
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact: Doug Schaefer CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 316208
 
Reported: 2010-06-08 17:00 EDT by James Blackburn CLA
Modified: 2011-05-12 18:36 EDT
Description James Blackburn CLA 2010-06-08 17:00:22 EDT
We should consider what granularity we want the CDT repository to be at.

We can choose anywhere between 
  - .project level repos 
  - component level repos
  - one git repo for all of org.eclipse.cdt

Doug says:
"org.eclipse.cdt currently checks out at 187MB. The .git directory is at 150MB."
At this size we may choose to have all of CDT in one git repository
Comment 1 Andrew Gvozdev CLA 2010-06-09 11:24:14 EDT
Let me quote my own email to cdt-dev:

I don't like all-in-one granularity. Normally I am not interested at all in debug plugins (sadly, there is no support of dbx). I do not want to pull and recompile those (and occasionally hit problems) every time I update from main repository. My first preference is *[component level repos]*, and then [.project level repos] organized in sets on component level and global CDT level. I am sure there are some more users like me out there too.
Comment 2 Doug Schaefer CLA 2010-06-09 13:02:55 EDT
I'd prefer to see quantitative arguments. For example, James thought the all-in-one was huge, but when you look at the numbers, it's actually manageable for him.

So, how much traffic does debug really generate, for example?
Comment 3 Andrew Gvozdev CLA 2010-06-09 13:22:26 EDT
(In reply to comment #2)
> I'd prefer to see quantitative arguments. For example, James thought the
> all-in-one was huge, but when you look at the numbers, it's actually manageable
> for him.
> So, how much traffic does debug really generate, for example?
Doug, I already said that the traffic won't be grounds for a negative vote. It is meant to be an illustration that a one-for-all repository is wasteful. Just take this argument FWIW. It is a matter of preference, not a show stopper.
Comment 4 James Blackburn CLA 2010-06-09 13:35:11 EDT
(In reply to comment #2)
> So, how much traffic does debug really generate, for example?

As it happens I've got the answer to this in a shell :)

I pulled a bunch of changes into an old org.eclipse.cdt I had lying around from March:

bash:jamesb:xl-cbga-20:33081> du -sh .git/
137M    .git/

bash:jamesb:xl-cbga-20:33084> git remote -v
origin  git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git (fetch)
origin  git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git (push)

bash:jamesb:xl-cbga-20:33085> git pull origin
remote: Counting objects: 29103, done.
remote: Compressing objects: 100% (7214/7214), done.
remote: Total 23820 (delta 11532), reused 21802 (delta 9896)
Receiving objects: 100% (23820/23820), 17.67 MiB | 268 KiB/s, done.
Resolving deltas: 100% (11532/11532), completed with 858 local objects.
From git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt
 * [new tag]         v201003191033 -> v201003191033
 * [new tag]         v201003221139 -> v201003221139
...
....
 4255 files changed, 133841 insertions(+), 74392 deletions(-)

bash:jamesb:xl-cbga-20:33104> du -sh .git
158M    .git


So between March and today, there's been 17MB of change (as reported by fetch). ~1.5M a week which doesn't seem to be too bad to me. 
Coming up to release there was a lot of stuff being created: the output shows 1.5M of new documentation(mostly images),  an android plugin coming in, loads of codan, loads of dsf-gdb tests, some edc and xlc creation too. 

I'm not sure if the last 3 months is indicative of the usual change pace, but a few hundred kB per day isn't too bad.
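
One way to put a number on the question about a single component is to sum the lines each directory gained over the history. The sketch below is only illustrative: it builds a throwaway repository whose `core` and `debug` directories are hypothetical stand-ins for CDT components, and it counts added lines, which is only a rough proxy for transferred bytes.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
# Hypothetical layout standing in for two CDT components.
mkdir core debug
printf 'a\nb\n' > core/a.txt
printf 'x\n' > debug/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial import"
# --numstat prints "added<TAB>deleted<TAB>path" per file; sum the first column.
core_added=$(git log --numstat --pretty=format: -- core | awk '{s += $1} END {print s + 0}')
debug_added=$(git log --numstat --pretty=format: -- debug | awk '{s += $1} END {print s + 0}')
echo "core: $core_added lines added, debug: $debug_added lines added"
```

Against a real clone, adding `--since=<date>` to `git log` would restrict the count to a given period.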
Comment 5 Andrew Gvozdev CLA 2010-06-21 23:50:15 EDT
(In reply to comment #2)
> I'd prefer to see quantitative arguments. For example, James thought the
Hmm, trying to use git repository from http://dev.eclipse.org/git/org.eclipse.cdt/org.eclipse.cdt.git to rebase branch ScannerDiscovery61 and having some observations:

- Clone of the whole repository is slow. Takes for me ~20 min. No noticeable difference between http:// and git://. CVS checkout is much faster. Took 12 min to checkout everything and only 2.5 min to checkout the 12 plugins I am interested in. I realize that I am getting the whole history with git but that is a dubious advantage as I am connected all the time and would much prefer to see fresh commits in history right away rather than keep pulling from the git repo. That's a point toward smaller granularity. I am often working with different/new workspaces and frankly 20 min to clone is just too much. Well, I guess I can clone locally which took ~3 min. Sigh, smaller repositories would be so much faster.

- There is something wrong with the branch converted from CVS in the git repository. After checkout I am getting some files from older revisions than on the branch in CVS. For example, CommandLauncher.java is 1.13 despite 1.18 being on the branch in CVS. ICommandLauncher is missing altogether. I am afraid rebasing in those conditions probably won't go too well and I am expecting a lot of conflicts on top of that.
Comment 6 Doug Schaefer CLA 2010-06-22 10:37:56 EDT
Well, it takes me 5 minutes to check out org.eclipse.cdt for the first time. I guess I have a faster link. But either way, you only ever have to do that once. Cloning locally is almost instantaneous if you need multiple copies. git is very fast once you have that done.

I don't think I'd accept anything less than the feature and all the plug-ins that go into the C/C++ IDE. If your test environment doesn't include the latest source for those plug-ins, you have an invalid test environment. And checking out everything in cdt-main.psf takes quite a long time as well, and every update after that is painful. git updates are lightning fast.

That and since there is actually little outside cdt-main, chopping buys us little. But I would be interested if there are ways to chop out history. I would think we only need the live streams when we cut over. Or something like that.
Comment 7 Andrew Gvozdev CLA 2010-06-22 11:02:31 EDT
(In reply to comment #6)
> Well, it takes me 5 minutes to check out org.eclipse.cdt for the first time. I
> guess I have a faster link. But either way, you only ever have to do that once.
> Cloning locally is almost instantaneous if you need multiple copies. git is very
> fast once you have that done.
Sounds like you already made up your mind. As I pointed out ~3min for local cloning is far from instantaneous and longer than CVS checkout of 2.5 min for the 12 plugins I was interested in. Obviously it depends on the size of the repository and the length of its history as git will replay every commit.
Comment 8 Doug Schaefer CLA 2010-06-22 13:39:09 EDT
(In reply to comment #7)
> (In reply to comment #6)
> > Well, it takes me 5 minutes to check out org.eclipse.cdt for the first time. I
> > guess I have a faster link. But either way, you only ever have to do that once.
> > Cloning locally is almost instantaneous if you need multiple copies. git is very
> > fast once you have that done.
> Sounds like you already made up your mind. As I pointed out ~3min for local
> cloning is far from instantaneous and longer than CVS checkout of 2.5 min for
> the 12 plugins I was interested in. Obviously it depends on the size of the
> repository and the length of its history as git will replay every commit.

Yes, I have made up my mind. But it's not all up to me.

I don't understand why you are having such poor performance with git. It took about 5 seconds to do a local clone on my machine. Does anyone else have numbers?

Do you get the same issues with a Linux kernel git repo?
Comment 9 Andrew Gvozdev CLA 2010-06-22 13:54:04 EDT
(In reply to comment #8)
> I don't understand why you are having such poor performance with git. It took
> about 5 seconds to do a local clone on my machine. Does anyone else have
> numbers?
Are we talking about the same repo? I am using http://dev.eclipse.org/git/org.eclipse.cdt/org.eclipse.cdt.git

> Do you get the same issues with a Linux kernel git repo?
What's its url?
Comment 10 James Blackburn CLA 2010-06-23 04:47:18 EDT
(In reply to comment #5)
> - Clone of the whole repository is slow. Takes for me ~20 min. No noticeable
> difference between http:// and git://. CVS checkout is much faster. Took 12 min
> to checkout everything ...

That sounds like a bandwidth prioritization issue. The .git repository is _smaller_ than all the checked out source, so it certainly shouldn't take longer to clone than to CVS checkout.

For my clone of git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
.git == 157M
Checked out source == 187M
There was a post somewhere by Denis on how bandwidth is prioritized at the eclipse gateway -- that's the only explanation I can see for this odd result...

> - There is something wrong with the converted from CVS branch in git

I think Doug broke it :)  He pruned 'old' directories from the CVS repo which will have changed cvsps output.  At the moment pull isn't going anywhere for me, and the last incoming change I see is from June 7th...

(In reply to comment #7)
> ... As I pointed out ~3min for local
> cloning ...

What platform is this on?  On Linux with really slow NFS, a clone (without source file checkout) is *fast*:
time git clone --no-checkout org.eclipse.cdt org.eclipse.cdt.clone/
real    0m0.213s

Actually checking out all the source from the repository is quite slow on NFS:
time git clone org.eclipse.cdt org.eclipse.cdt.clone/
real    0m49.586s

Are you on a slow filesystem of some sort?
Comment 11 Doug Schaefer CLA 2010-06-23 08:35:44 EDT
(In reply to comment #10)
> I think Doug broke it :)  He pruned 'old' directories from the CVS repo which
> will have changed cvsps output.  At the moment pull isn't going anywhere for
> me, and the last incoming change I see from June 7th...

I apologize for that. Those directories should have been moved before the git mirror was created.

The git clone I did this week has the latest tag so it is getting updated.
Comment 12 Andrew Gvozdev CLA 2010-06-23 09:17:27 EDT
(In reply to comment #10)
> (In reply to comment #5)
> > - Clone of the whole repository is slow. Takes for me ~20 min. No noticeable
> > difference between http:// and git://. CVS checkout is much faster. Took 12 min
> > to checkout everything ...
> That sounds like a bandwidth prioritization issue. The .git repository is
> _smaller_ than all the checked out source, so it certainly shouldn't take longer
> to clone than to CVS checkout.
> For my clone of git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
> .git == 157M
> Checked out source == 187M
> There was a post somewhere by Denis on how bandwidth is prioritized at the
> eclipse gateway -- that's the only explanation I can see for this odd result...
Yes, I think you are right that bandwidth prioritization has something to do with it. I also noticed to my surprise that :pserver:anonymous@proxy.eclipse.org:80/cvsroot/tools is twice as fast as :extssh:agvozdev@dev.eclipse.org:/cvsroot/tools. I recall that Denis aimed for the opposite effect but I suppose there have been many hardware/configuration changes on the servers since.

> (In reply to comment #7)
> > ... As I pointed out ~3min for local
> > cloning ...
> What platform is this on?  On Linux with really slow NFS, a clone (without
> source file checkout) is *fast*:
> time git clone --no-checkout org.eclipse.cdt org.eclipse.cdt.clone/
> real    0m0.213s
That is on a local SSD drive but I do just a plain git clone (with checkout). You have to do a checkout to be able to work anyway, don't you? The result in my latest post (3min) was from cloning inside EGit. I tried git from the command line and it was faster, 1.5 min. For the record, that is Win XP, cygwin git 1.6.6.1.
Comment 13 Denis Roy CLA 2010-06-23 11:39:31 EDT
All the Git repos are in our prioritized bandwidth.  From my home cable, I just did this, right in the middle of the Helios release rush:

-bash-3.00$ time git clone git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
Initialized empty Git repository in /home/users/toofast/shiw/git/org.eclipse.cdt/.git/
remote: Counting objects: 385338, done.
remote: Compressing objects: 100% (71833/71833), done.
remote: Total 385338 (delta 205574), reused 385140 (delta 205494)
Receiving objects: 100% (385338/385338), 116.78 MiB | 1.12 MiB/s, done.
Resolving deltas: 100% (205574/205574), done.
Checking out files: 100% (12274/12274), done.

real    3m14.262s
user    1m0.848s
sys     0m26.332s

Keep in mind this is a 1GHz machine with a whopping 256M RAM.
Comment 14 Denis Roy CLA 2010-06-23 11:59:29 EDT
I tried from an Amazon ec2 instance I have:

time git clone git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git
Initialized empty Git repository in /root/org.eclipse.cdt/.git/
remote: Counting objects: 385338, done.
remote: Compressing objects: 100% (71833/71833), done.
remote: Total 385338 (delta 205574), reused 385140 (delta 205494)
Receiving objects: 100% (385338/385338), 116.78 MiB | 1.40 MiB/s, done.
Resolving deltas: 100% (205574/205574), done.

real    2m36.642s
user    0m15.565s
sys     0m2.044s

You are transferring 116 Megabytes, which is unfortunate, so you do need a fast link.  If it's taking you 20m to clone, you either have a bad link to us, or you have a slow link.  But as others have said, you only need to clone once.
Comment 15 Andrew Gvozdev CLA 2010-06-23 12:26:43 EDT
(In reply to comment #14)
> You are transferring 116 Megabytes, which is unfortunate, so you do need a fast
> link.  If it's taking you 20m to clone, you either have a bad link to us, or you
> have a slow link.
That is the whole point. I do have a slow link and so prefer smaller granularity of the repository. I am happy with eclipse provided bandwidth and BTW think you are doing a fantastic job, Denis :).
Just let me use the opportunity to ask you about something from comment #12, maybe you can comment on that:
> I also noticed to my surprise that :pserver:anonymous@proxy.eclipse.org:80/cvsroot/tools is twice as fast as :extssh:agvozdev@dev.eclipse.org:/cvsroot/tools.
Was it something of a fluke?
Comment 16 Denis Roy CLA 2010-06-23 15:01:09 EDT
> That is the whole point. I do have a slow link and so prefer smaller
> granularity of the repository. 

At EclipseCon I was speaking with Shawn Pearce (that guy always makes me feel dumb) and he said there was a way I could pack the repos down to much smaller size (at the expense of lots of disk and computational resources during the pack).  His claim is that you do this massive packing once.  I'll have to look into that.

> I am happy with eclipse provided bandwidth and
> BTW think you are doing fantastic job, Denis :).

Thanks!

> > I also noticed to my surprise that :pserver:anonymous@proxy.eclipse.org:80/cvsroot/tools is twice as fast as :extssh:agvozdev@dev.eclipse.org:/cvsroot/tools.
> Was it something of a fluke?

It could have been a fluke, but if you think about it, SSH will always be slower than any protocol that does not involve encryption.  It's not intentional that pserver gets a better treatment than extssh, but I can't remember where proxy.eclipse.org fits in our QoS rules. I'll have to check.
Comment 17 Denis Roy CLA 2010-06-23 15:03:29 EDT
FWIW, at EclipseCon I had expressed the same concerns wrt. bandwidth.  Transferring 10 years of history to the user who just wants to submit a patch against HEAD seems a bit inefficient to me.

Our CVS->Git migration path also gives committers the option of archiving CVS history and only committing CVS HEAD to Git HEAD.  For some reason, though, I think this will be very unpopular.
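
For the patch-against-HEAD case, git's standard `--depth` option to `git clone` fetches only the most recent commits instead of the whole history. A minimal sketch, assuming a throwaway local repository as a stand-in for the server; the `file://` URL is used because git ignores `--depth` when cloning from a plain local path.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
# Build a throwaway repository with three commits (stand-in for a long history).
git init -q "$work/full"
cd "$work/full"
for i in 1 2 3; do
    echo "change $i" > file.txt
    git add file.txt
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "commit $i"
done
# A shallow clone of depth 1 fetches only the most recent commit.
git clone -q --depth 1 "file://$work/full" "$work/shallow"
full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full: $full_count commits, shallow: $shallow_count commit(s)"
```

Whether a shallow clone is adequate depends on the workflow; it trades history browsing for a smaller initial transfer.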
Comment 18 Doug Schaefer CLA 2010-06-23 16:49:03 EDT
(In reply to comment #17)
> FWIW, at EclipseCon I had expressed the same concerns wrt. bandwidth. 
> Transferring 10 years of history to the user who just wants to submit a patch
> against HEAD seems a bit inefficient to me.
> 
> Our CVS->Git migration path also gives committers the option of archiving CVS
> history and only committing CVS HEAD to Git HEAD.  For some reason, though, I
> think this will be very unpopular.

Well in our case the vast majority of our history is on HEAD. We only branched for maintenance releases.

The ideal scenario is to be able to take everything since a given tag, e.g. CDT_6_0 which would give us all active streams. But I'm not sure that's possible. Failing that, if we could compress the repo better, then we definitely should look at that.

And even if that fails, I'm sorry you have such a slow link Andrew, but if it's only a one time 20 minute hit to get you set up with a proper CDT source checkout, then I don't think that would sway me. I've spent many a 20 minutes checking the CDT out of CVS with a fast link so I can properly test my changes against the whole thing. And I seem to get by.
Comment 19 Andrew Gvozdev CLA 2010-06-23 17:16:34 EDT
(In reply to comment #18)
> And even if that fails, I'm sorry you have such a slow link Andrew, but if it's
> only a one time 20 minute hit to get you set up with a proper CDT source
> checkout, then I don't think that would sway me. I've spent many a 20 minutes
> checking out the CDT out of CVS with a fast link so I can properly test my
> changes against the whole thing. And I seem to get by.
Well it is just getting more and more over-hyped with every post, sorry for that. I'll be fine, it is just a matter of preference for me. The important thing is to move to git in principle.
And I'd advocate taking all the history. There is not much documentation on the code and I've taken quite a few clues from CVS history before.
Comment 20 John Cortell CLA 2010-06-24 14:24:40 EDT
It occurs to me that one thing we should keep in mind when comparing cvs vs git performance is that cvs has compression support, too. I've seen it make a pretty big difference with other cvs repositories, though I have never tried turning it on for dev.eclipse.org. I know by default the Eclipse cvs client has compression turned off. Just something to consider when trying to compare checkout times.
Comment 21 Meisam CLA 2010-06-28 07:08:37 EDT
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)

> I don't understand why you are having such poor performance with git. It took
> about 5 seconds to do a local clone on my machine. Does anyone else have
> numbers?
> 
Firstly, I'm not a committer.

I use both CVS and git over a slow link (mainly less than 20Kibps), so both initial cloning in git, and first checkout in CVS are slow (30+ mins). But git is far faster than CVS when it comes to other operations. In CVS, it takes something like 2 mins whenever I compare two files to make a patch. In git it is instantaneous. It doesn't really matter how much time the initial cloning takes (I can leave my machine to clone the repository at midnight), but waiting 2 minutes or so for comparing two files that differ only in 5 lines [and being compelled to do it several times a day] is just a pain in the neck.
Comment 22 Doug Schaefer CLA 2010-06-28 21:36:21 EDT
(In reply to comment #20)
> It occurs to me that one thing we should keep in mind when comparing cvs vs git
> performance is that cvs has compression support, too. I've seen it make a
> pretty big difference with other cvs repositories, though I have never tried
> turning it on for dev.eclipse.org. I know by default the Eclipse cvs client has
> compression turned off. Just something to consider when trying to compare
> checkout times.

That's a good point. But in the long run it won't matter. CVS is scheduled to get turned off at Eclipse in the next couple of years (I believe 2012 was the requested date from the Foundation). There apparently is a flaw that they have uncovered and CVS is no longer maintained.

So we have to move anyway. And it sounds from all other reports that the only issue is the initial clone time and we'll see if we can reduce that by compressing the git data (which is already smaller than a full checkout).
Comment 23 Doug Schaefer CLA 2010-06-29 11:30:44 EDT
For my next step I'm going to try and create what I consider the minimal CDT core repository, essentially removing debug related things and the optional features, i.e. include mainly cdt.core, cdt.ui, the build plug-ins, and the documentation.

My guess, though, is that will still be over half the size of the single repository given the long history of cdt.core and the major churn in the build plugins.

So, I will also try and create the repository with only the 6.0 and 7.0 branches and HEAD since the 6.0 was branched. This has a much better chance of making things small. And as Denis says, it doesn't make too much sense that you would need such long ago history for patches today.

But I need to be able to run git cvsimport on the Eclipse server first and am working with webmaster on that.
Comment 24 James Blackburn CLA 2010-06-29 11:33:42 EDT
You could also use git filter-branch to remove the bits of the repository you're not interested in.
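
A minimal sketch of that idea, assuming a throwaway repository in which the `core` and `debug` directories are hypothetical stand-ins for the parts to keep and to drop. It uses `git filter-branch` with an `--index-filter` to remove one directory from every commit; the environment variable only silences the notice that newer git versions print before running the command.

```shell
#!/bin/sh
set -eu
# Silence the notice newer git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
# Hypothetical layout: one directory to keep, one to drop.
mkdir core debug
echo core > core/a.txt
echo debug > debug/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial import"
# Rewrite every commit on every ref, removing debug/ from each commit's index.
git filter-branch -f --index-filter \
    'git rm -r -q --cached --ignore-unmatch debug' -- --all >/dev/null 2>&1
# List the files that remain in the rewritten HEAD.
remaining=$(git ls-tree -r --name-only HEAD)
echo "$remaining"
```

The rewrite changes commit IDs, so it is something to do before a repository is published rather than after.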
Comment 25 James Blackburn CLA 2010-06-29 12:32:27 EDT
A bit more experimentation to work out what the minimum repo-size is with all source (i.e. start git repository from current HEAD):

// Copy head source into a clean directory
> cp -r org.eclipse.cdt/* org.eclipse.cdt-initial-commit/
> cd org.eclipse.cdt-initial-commit/
// Add all the source to a new repository
> git init .
> git add -A .
> git commit

> du -sh .git
100M    .git
// pack the repository
> git repack -a
> du -sh .git
142M    .git
// garbage collect
> git gc --aggressive
> du -sh .git
45M     .git

So the git repository for source, with no history, is 45M.

If I gc --aggressive the org.eclipse.cdt checkout (with full history), the size reduces to 120M.

So if we want all source in one repository, but filter some history, initial size will be between 45M -> 120M and grow from there.


I think it's really neat having all the source and history in one place. However I wouldn't object to having a small number of repositories if this is what others wanted for bandwidth reasons.
Comment 26 Teodor Madan CLA 2010-06-29 13:01:20 EDT
I have used accurev, somewhat similar to git with regard to cloning and branching.

One aspect when deciding granularity is that branching/tagging is done for the whole repository (unlike CVS, where it is per module/file). Thus the sources in the repository should have the same life cycle in terms of maintenance branches, release tags etc.

From this point of view it makes sense for all bundles that are part of one single feature to be in the same git repository as they share the same life cycle.

IMO, the question would be whether the features provided with the CDT package should be split into different repos in case their life cycles could differ (e.g. an optional CDT feature to be released one month later than the annual Eclipse release train).
Comment 27 Doug Schaefer CLA 2010-06-30 13:56:40 EDT
(In reply to comment #25)
> So if we want all source in one repository, but filter some history, initial
> size will be between 45M -> 120M and grow from there.

That's great information. Thanks, James.
 
> I think it's really neat having all the source and history in one place.
> However I wouldn't object to having a small number of repositories if this is
> what others wanted for bandwidth reasons.

I agree with that. I thought it was wicked cool flipping between cdt 6 and head and having it only take seconds. You could also go way back and take a look at the CDT pre-new project model and see how things used to be :).
Comment 28 Elena Laskavaia CLA 2010-06-30 15:11:54 EDT
I think we should keep all history. What is the difference between full history and only 6.0 anyway?
Comment 29 Doug Schaefer CLA 2010-06-30 17:57:34 EDT
OK, I did a git cvsimport for the following plugins:

org.eclipse.cdt.core               org.eclipse.cdt.doc.isv
org.eclipse.cdt.core.aix           org.eclipse.cdt.doc.user
org.eclipse.cdt.core.linux         org.eclipse.cdt.make.core
org.eclipse.cdt.core.linux.ia64    org.eclipse.cdt.make.core.tests
org.eclipse.cdt.core.linux.ppc     org.eclipse.cdt.make.ui
org.eclipse.cdt.core.linux.x86     org.eclipse.cdt.managedbuilder.core
org.eclipse.cdt.core.linux.x86_64  org.eclipse.cdt.managedbuilder.core.tests
org.eclipse.cdt.core.macosx        org.eclipse.cdt.managedbuilder.gnu.ui
org.eclipse.cdt.core.qnx           org.eclipse.cdt.managedbuilder.ui
org.eclipse.cdt.core.solaris       org.eclipse.cdt.managedbuilder.ui.tests
org.eclipse.cdt.core.tests         org.eclipse.cdt.ui
org.eclipse.cdt.core.win32         org.eclipse.cdt.ui.tests

This gives the core and build and the doc plug-ins along with the tests. After a gc --aggressive, I get a .git directory of 68M which is more than half of the 120M for everything. Which is what I expected.

I assumed when we discussed breaking it into smaller repos we meant repos that were significantly less than half the size of the big one.
Comment 30 Marc Khouzam CLA 2010-07-01 08:39:01 EDT
Do we have a wiki page on how to use EGit with CDT?
I'm trying to get it setup to be able to give input, but it's not that obvious to me.  I don't have any git experience outside of the tutorial at EclipseCon.

Here is what I did (with points I wasn't sure of):
1- Install EGit :-)
2- Open EGit perspective
3- Right-click on Git Repositories view and select "Import Git Repository..."
4- Paste git://dev.eclipse.org/org.eclipse.cdt/org.eclipse.cdt.git into the URI box.  This fills most other boxes which I didn't touch. Press Next.  I assume the protocol we use should be 'git' for now, but when we officially move to git, we will use 'git+ssh' to do commits?
5- All branches appear.  Should I leave them all checked?  I did and pressed Next.
6- The next window shows the local dir to be <workspace>/org.eclipse.cdt with both other fields saying 'origin'.  I left as is and pressed Finish.

The entire creation of the local repo took only 2m45sec (at 8:15am on a Canadian holiday).

In my workspace I then have
org.eclipse.cdt .metadata/

> \du -ks org.eclipse.cdt .metadata/
299712  org.eclipse.cdt
404     .metadata/

That is 300M.  Isn't that bigger than what you guys get?  Maybe I should not have taken all the branches?

What do I do then to check-out?  If I try to select only the plugins I care about from under "Working directory" in the Git Repositories view, I notice I cannot do an Import for a multi-selection, which seems really bad.  This makes me think I should be importing a top-level element instead.  Should I really import the entire "Working directory" entry?  That is the only one that contains all the plugins I want.  Or?
Comment 31 Marc Khouzam CLA 2010-07-01 09:52:39 EDT
(In reply to comment #30)
> What do I do then to check-out?  If I try to select only the plugins I care
> about from under "Working directory" in the Git Repositories view, I notice I
> cannot do an Import for a multi-selection, which seems really bad.  This makes
> me think I should be importing a top-level element instead.  Should I really
> import the entire "Working directory" entry?  That is the only one that
> contains all the plugins I want.  Or?

I found after I try to Import the entire "Working directory" I can select which plugins I want.  So now I have my plugins of interest and everything compiles well.

Should I have a single Git repo for my machine, or should I have one for each workspace?  I guess I should have a single repo and access it from different workspaces.
Comment 32 Andrew Gvozdev CLA 2010-07-01 10:04:56 EDT
There is a little section "Git" in http://wiki.eclipse.org/CDT/Developer/FAQ#Git (prominently referring to James' repository) and "Git for Committers" link got tons of useful tips.

(In reply to comment #31)
> (In reply to comment #30)
> > What do I do then to check-out?  If I try to select only the plugins I care
> > about from under "Working directory" in the Git Repositories view, I notice I
> > cannot do an Import for a multi-selection, which seems really bad.  This makes
> > me think I should be importing a top-level element instead.  Should I really
> > import the entire "Working directory" entry?  That is the only one that
> > contains all the plugins I want.  Or?
> I found after I try to Import the entire "Working directory" I can select which
> plugins I want.  So now I have my plugins of interest and everything compiles
> well.
That flow is OK when you know it but I had the same problem as you checking out the first time.
 
> Should I have a single Git repo for my machine, or should I have one for each
> workspace?  I guess I should have a single repo and access it from different
> workspaces.
It depends on your needs but if you have an OK internet connection you can as well have a repository cloned from the remote one for each workspace. I would prefer that regardless of what I have said about slow connection (it was exaggerated btw). And, you can push/pull between your clones at will.
Comment 33 James Blackburn CLA 2010-07-01 10:13:04 EDT
(In reply to comment #31)
> Should I have a single Git repo for my machine, or should I have one for each
> workspace?  I guess I should have a single repo and access it from different
> workspaces.

The repository information is entirely contained within the .git directory at the top-level.  The source files corresponding to the HEAD of a particular branch are checked out next to this .git directory.  
As Eclipse doesn't like you having a workspace open more than once (or a project open concurrently in more than one workspace) you would, in general, git clone the repository for all the different workspaces you want.  
It's worth noting that git clone locally will use hard links to minimize the disk space of the repository (under .git).  The checkout will obviously be copied.
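
The hard-link behaviour can be observed by comparing inode numbers of an object file in the original and in a local clone. This is a sketch on a throwaway repository; on filesystems that support hard links the two numbers are expected to match, otherwise git falls back to copying.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q "$work/orig"
cd "$work/orig"
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial"
# Clone via a plain local path so git may hard-link the object files.
git clone -q "$work/orig" "$work/copy"
# Locate the loose object file that stores the HEAD commit:
# .git/objects/<first two hex digits>/<remaining digits>
sha=$(git rev-parse HEAD)
prefix=$(printf '%s' "$sha" | cut -c1-2)
rest=$(printf '%s' "$sha" | cut -c3-)
obj_path="$prefix/$rest"
orig_inode=$(ls -i "$work/orig/.git/objects/$obj_path" | awk '{print $1}')
copy_inode=$(ls -i "$work/copy/.git/objects/$obj_path" | awk '{print $1}')
# Equal inode numbers mean both repositories share one file on disk.
echo "orig inode: $orig_inode, copy inode: $copy_inode"
```

Passing `--no-hardlinks` to `git clone` forces a real copy of the object files instead.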

(In reply to comment #30)
> That is 300M.  Isn't that bigger than what you guys get?  Maybe I should not
> have taken all the branches?

This is both the repository and the checkout.  Compare du -sh org.eclipse.cdt/.git with du -sh org.eclipse.cdt/*.  When  you clone you're copying / linking to the contents of the .git directory.  When you checkout HEAD of a particular branch source files are created alongside the repository. All git actions then modify the checked out files' local repository, and when you're happy you can push / pull changes from your repository to other 'remote' repositories.
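
The comparison can be sketched as follows, using a throwaway repository with a single hypothetical source file; in a real clone the same two `du` calls would be run from the top-level org.eclipse.cdt directory.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo "some source" > Main.java
git add Main.java
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial"
# Size of the repository data alone, in kilobytes.
repo_kb=$(du -sk .git | awk '{print $1}')
# Size of everything: repository data plus checked-out files.
total_kb=$(du -sk . | awk '{print $1}')
echo "repository: ${repo_kb}K, checkout: $((total_kb - repo_kb))K, total: ${total_kb}K"
```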
Comment 34 James Blackburn CLA 2010-07-01 10:15:09 EDT
(In reply to comment #32)
> There is a little section "Git" in
> http://wiki.eclipse.org/CDT/Developer/FAQ#Git (prominently referring to James'
> repository) and "Git for Committers" link got tons of useful tips.

Hmm that's ancient and we should probably remove that from the FAQ. Perhaps better is the egit docs themselves:
http://wiki.eclipse.org/EGit/User_Guide
Comment 35 Marc Khouzam CLA 2010-07-01 10:50:42 EDT
(In reply to comment #33)
> (In reply to comment #31)
> > Should I have a single Git repo for my machine, or should I have one for each
> > workspace?  I guess I should have a single repo and access it from different
> > workspaces.
> 
> The repository information is entirely contained within the .git directory at
> the top-level.  The source files corresponding to the HEAD of a particular
> branch is checked out next to this .git directory.

This is where I got confused.  I didn't know HEAD was automatically checked out (from git
but not into Eclipse).  So, when I check-out (import) into Eclipse, it imports the files
from the already checked-out HEAD.

> As Eclipse doesn't like you having a workspace open more than once (or a project
> open concurrently in more than one workspace) you would, in general, git clone
> the repository for all the different workspaces you want.
> It's worth noting that git clone locally will use hard links to minimize the
> disk space of the repository (under .git).  The checkout will obviously be
> copied.

Ok, so instead of doing a new checkout in a new workspace like I do in CVS,
I need to clone the repo first and then import the plugins in the new workspace.
If not, then my import to the new workspace will contain all changes of the first
workspace.
It now makes sense why the EGit operation is to "import" and not to 'check-out'.

> (In reply to comment #30)
> > That is 300M.  Isn't that bigger than what you guys get?  Maybe I should not
> > have taken all the branches?
> 
> This is both the repository and the checkout.  Compare du -sh
> org.eclipse.cdt/.git with du -sh org.eclipse.cdt/*.  

Ah yes.  I get it now.

> When  you clone you're
> copying / linking to the contents of the .git directory.  When you checkout HEAD
> of a particular branch source files are created alongside the repository. All
> git actions then modify the checked out files' local repository, and when you're
> happy you can push / pull changes from your repository to other 'remote'
> repositories.

The subtlety is that EGit does the check-out automatically.
But it is much more clear now.  Thanks!

(In reply to comment #34)
> Perhaps better is the egit docs themselves:
> http://wiki.eclipse.org/EGit/User_Guide

Awesome, I'll have a look.
Comment 36 Doug Schaefer CLA 2010-07-02 12:46:54 EDT
Yeah, it is quite a different paradigm using egit but it's not unusual. The ClearCase eclipse plugins do the same, and I believe Perforce does as well.

But the idea is that the checkout is done into the file system, not the workspace. With CVS, you checkout Eclipse projects, but in git you check out a directory tree and then import Eclipse projects from there.

So, from there, the best way to think of EGit is as an automation for working with the external source tree. Unfortunately that also causes problems which I hope they resolve as they are also trying to fit into the Eclipse team system which isn't set up well for a non-resource managed source control system. But they are working on cleaning that up for Sept. as well.
Comment 37 James Blackburn CLA 2011-05-12 16:07:34 EDT
For the moment we're going with 1 repo, with history of the branches and the tags @ ~115M.  We can split out components later as required.
Comment 38 Andrew Gvozdev CLA 2011-05-12 18:17:23 EDT
There is a problem with some slowness of Synchronize (takes about 3 min for me) and commits for that big a clone of the CDT repository from eclipse.org but I suppose it's an EGit problem and it's fixable. Synchronize was down to an acceptable 30 sec. at one point of EGit development.
Comment 39 James Blackburn CLA 2011-05-12 18:19:55 EDT
The synchronise flow is still pretty useless - I'm not sure they know what they want there... In happier news they've just got a staging view :).
Comment 40 Andrew Gvozdev CLA 2011-05-12 18:36:07 EDT
I wouldn't say it's useless. I use it fairly often before committing changes in multiple files even if slowish. Will check out the staging view flow.