Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 324768

Summary: Remove the limitation: failed push if multiple committers in the commit list
Product: Community Reporter: Gabriel Petrovay <gabipetrovay>
Component: GitAssignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: cdtdoug, david_williams, d_a_carver, irbull, jamesblackburn+eclipse, janet.campbell, john.arthorne, mike.milinkovich, pwebster, wayne.beaton
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:

Description Gabriel Petrovay CLA 2010-09-08 13:05:19 EDT
There is this scenario that we came across that does not agree with the solution proposed by: http://wiki.eclipse.org/Git#Notes_on_DVCS

Scenario
 "Brad the Bad" Committer authors the code. He commits to his local repo.
 "Anne the Angry" Committer pulls from Brad's repo and commits to her local repo some bug fixes for Brad's broken code that would have broken the build.
 Anne pushes the whole work of Brad and Anne.
   Anne CAN push their committed code to us if both Brad and Anne are committers and have the rights to commit.

So instead of either Anne being frustrated or Brad to break the build, everybody is happy. Including us :)

The current Git setup prevents one committer to push changes that are not his., Such a push will fail with the message: "Cannot push changes that were not committed by you".

"Indeed, this is an imposed limitation" (Denis Roy). So this limitation can be lifted in some cases and avoid other problems caused by having one committer to take over other committers' commits:
- you loose track of statistics, who committed what and when;
- taking over other's commits might be an IP issue as well;
- if you have a slightly more complex scenario (>2 committers and you also have interlaced commits from other tasks) the job to solve this might become really painful

Another solution is for everybody to use Git in a CVS-like scenario where B pushes, A waits and pulls, A corrects and pushes again.

-------

A solution:

So the hook that guards the pushes should only check if all are committers and they all have the proper rights for the project. Otherwise, fail.


Other opinions on this?

Thanks!
Comment 1 David Williams CLA 2010-09-08 14:04:14 EDT
I don't know much about GIT, and on the surface there's certainly some logic and advantages to loosening, the restrictions ... but, I think the "hole" is that there is still some question of accountability. Who "really" committed the code. 

Using the above example, let's imagine Brad's code contained some IP questionable code, then Anne got it, and Anne committed both sets of code. Even though Brad would still be identified as the author, couldn't Brad claim "no, I wasn't ready for that code to be committed, I was still working on it and meant to remove that code". So, who's responsible? Anne for committing before it was ready? But Brad's identified as the author committer? Is there some way GIT or these tools can distinguish who's responsible in this case? My guess is, to make it clear who's accountable for what, the simple one-committer one-commit rule might have to be in place?
Comment 2 Gabriel Petrovay CLA 2010-09-09 18:31:47 EDT
(In reply to comment #1)
> I don't know much about GIT, and on the surface there's certainly some logic
> and advantages to loosening, the restrictions ... but, I think the "hole" is
> that there is still some question of accountability. Who "really" committed the
> code. 
> 
> Using the above example, let's imagine Brad's code contained some IP
> questionable code, then Anne got it, and Anne committed both sets of code.

In Git the commits are brought nearer to the users. Users make them in their local repos and pass sets of commits around. One such commit in a set does not loose any information by passing it around to other committers. So, in the above example Brad commits and Anne retrieves the set of commits from Brad. She does not take over Brads code. After one commit on her local repo, she will see:

-----------------------------------------------
commit b4260512dc9af2f2235f8ac807c38f5fb66a1b99
Author: Anne <anne@one_isp.com>
Date:   Wed Sep 8 15:22:02 2010 +0200

    Fix Brad's bad commit.

commit 7d898e63dde22d3468fb714a5746083cccff1a74
Author: Brad <brad@another_isp.com>
Date:   Wed Sep 8 10:37:40 2010 +0200

    Brad's comment for his last commit.

...
-----------------------------------------------

If Anne pushes this to Eclipse Git, the commits maintain their original IP violator. :)


> Even
> though Brad would still be identified as the author, couldn't Brad claim "no, I
> wasn't ready for that code to be committed, I was still working on it and meant
> to remove that code". So, who's responsible? Anne for committing before it was
> ready? But Brad's identified as the author committer? Is there some way GIT or
> these tools can distinguish who's responsible in this case? My guess is, to
> make it clear who's accountable for what, the simple one-committer one-commit
> rule might have to be in place?

Is this last sentence a question or a statement? I assume it is a statement.

As I see, Git is good enough to determine who is accountable for what. In the current deployment the example you gave above can happen. (Well, it happened already in our case :) )
Comment 3 Denis Roy CLA 2010-09-10 11:20:48 EDT
(In reply to comment #2)
> If Anne pushes this to Eclipse Git, the commits maintain their original IP
> violator. :)

The Author and Committer fields are user-defined, so here at Eclipse we cannot trust either of them.  The update hook will verify the committer entry to ensure it matches the committer performing the push, and that way we can trust that the "committer" entries in the log are valid, verified and true.

However, we don't want to impose a committer-only restriction on the Author entry, since it may be a non-committer.  Code can be commited by an Eclipse committer, but authored by someone else, and I feel preserving that in the log is important.

So in this case, if bad IP was put in by author@someone.com, we can't really validate that information during the push phase.
Comment 4 David Carver CLA 2010-09-10 17:12:47 EDT
(In reply to comment #3)
> (In reply to comment #2)
> > If Anne pushes this to Eclipse Git, the commits maintain their original IP
> > violator. :)
> 
> The Author and Committer fields are user-defined, so here at Eclipse we cannot
> trust either of them.  The update hook will verify the committer entry to
> ensure it matches the committer performing the push, and that way we can trust
> that the "committer" entries in the log are valid, verified and true.
> 
> However, we don't want to impose a committer-only restriction on the Author
> entry, since it may be a non-committer.  Code can be commited by an Eclipse
> committer, but authored by someone else, and I feel preserving that in the log
> is important.
> 
> So in this case, if bad IP was put in by author@someone.com, we can't really
> validate that information during the push phase.

Well, the problem here is that all a committer has to do is do a Rebase, and they can roll up the commit.   I understand the need for accountability, and the such.  But unless we allow Gerritt for all projects for code review, then we have a very strict commit difference.   The only thing I can think of to work around this is to have shared branches always on the central repo, so that when a merge happens everything is already on the repo and the pointers are just updated.   But that also hamstrings some of the advantages of git.
Comment 5 David Carver CLA 2010-09-11 12:28:46 EDT
(In reply to comment #1)
> I don't know much about GIT, and on the surface there's certainly some logic
> and advantages to loosening, the restrictions ... but, I think the "hole" is
> that there is still some question of accountability. Who "really" committed the
> code. 
 
> Using the above example, let's imagine Brad's code contained some IP
> questionable code, then Anne got it, and Anne committed both sets of code. Even
> though Brad would still be identified as the author, couldn't Brad claim "no, I
> wasn't ready for that code to be committed, I was still working on it and meant
> to remove that code". So, who's responsible? Anne for committing before it was
> ready? But Brad's identified as the author committer? Is there some way GIT or
> these tools can distinguish who's responsible in this case? My guess is, to
> make it clear who's accountable for what, the simple one-committer one-commit
> rule might have to be in place?

There is a difference here, one can cherry pick commits to be pushed to specific branches with git.  And in this particular case we are talking about, William had a local branch, that he and Gabriel collaborated on with their own git repos.  Gabriel pulled from William's branch, and vice versa.  Once they were both satisfied the changes were ready, then Gabriel tried the push the changes to the central repo.

Git allows you to work easily in separate branches, and then only bring in the changes into the main development branch when it is ready.  It's a very common work flow, and collaboration technique.
Comment 6 David Carver CLA 2010-09-11 12:30:36 EDT
Just to be clear, the situation we are talking about is committer to committer commits and collaboration.  We are not talking about non-committer to committer collaboration.  IP should be fine in this specific case as it is two committers on the same project.
Comment 7 Denis Roy CLA 2010-09-11 18:49:38 EDT
> Well, the problem here is that all a committer has to do is do a Rebase, and
> they can roll up the commit.

With CVS right now now, a committer can SSH in and alter files without leaving a trail.  However, that is a deliberate act of vandalism, and we do trust that committers will not do evil.

The major difference between Git and CVS/SVN is that with Git, the tooling itself makes it really easy to lose track of IP provenance.  In fact, since Git is decentralized, code provenance is maintained strictly by social convention. Eclipse.org's rigourous IP process is not a typical use case of Git, since a centralized, validated IP provenance is important for us.

All we (the EMO) need is a clear and trusted way to validate and track which committer has accepted each line of code that makes its way into the repository, and from what I know, our update hook accomplishes just that.

If there is a better way to do this that would allow for broader collaboration without compromising our provenance, I'm all ears.
Comment 8 Denis Roy CLA 2010-09-11 18:51:28 EDT
> Just to be clear, the situation we are talking about is committer to committer
> commits and collaboration.  We are not talking about non-committer to committer
> collaboration.  IP should be fine in this specific case as it is two committers
> on the same project.

That makes sense.  The update hook could easily be altered to ensure all the commits of an incoming pack are attributed to an eclipse.org committer.

I've cc'd Janet and Wayne, and I'll have a discussion with them about this.
Comment 9 Denis Roy CLA 2010-09-14 16:38:13 EDT
> IP should be fine in this specific case as it is two committers
> on the same project.

Indeed.  My update hook would either need to verify that all the commit entries in the transaction are from committers on the project (yuck), or we would have to trust our committers to Do The Right Thing(tm).

Janet will be able to tell us what is acceptable here.
Comment 10 Doug Schaefer CLA 2010-09-20 23:29:33 EDT
(In reply to comment #9)
> > IP should be fine in this specific case as it is two committers
> > on the same project.
> 
> Indeed.  My update hook would either need to verify that all the commit entries
> in the transaction are from committers on the project (yuck), or we would have
> to trust our committers to Do The Right Thing(tm).
> 
> Janet will be able to tell us what is acceptable here.

How much can you do with these hooks? Can they change the committer to the login id?

The only difference betweeen git and CVS seems to be recording of who did the actual commit to the master repo. CVS doesn't prevent me from committing a patch that I forget to mark iplog+ in bugzilla.
Comment 11 Janet Campbell CLA 2010-09-21 14:09:01 EDT
(In reply to comment #9)
> > IP should be fine in this specific case as it is two committers
> > on the same project.
> 
> Indeed.  My update hook would either need to verify that all the commit entries
> in the transaction are from committers on the project (yuck), or we would have
> to trust our committers to Do The Right Thing(tm).
> 
> Janet will be able to tell us what is acceptable here.

Updating the bug to reflect my discussions with Denis.  From a technical standpoint, we would verify at the committer level.  It would be the responsibility of the committer pushing the content (and the relevant PMC) to ensure that all content that is being pushed is consistent with the IP Policy.
Comment 12 Denis Roy CLA 2012-03-23 11:03:32 EDT
> Indeed.  My update hook would either need to verify that all the commit entries
> in the transaction are from committers on the project (yuck)

^^ That is in fact what we are doing today.  Fixed.