| Summary: | Remove the limitation: failed push if multiple committers in the commit list | ||
|---|---|---|---|
| Product: | Community | Reporter: | Gabriel Petrovay <gabipetrovay> |
| Component: | Git | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | cdtdoug, david_williams, d_a_carver, irbull, jamesblackburn+eclipse, janet.campbell, john.arthorne, mike.milinkovich, pwebster, wayne.beaton |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
|
Description
Gabriel Petrovay
I don't know much about GIT, and on the surface there's certainly some logic and advantages to loosening, the restrictions ... but, I think the "hole" is that there is still some question of accountability. Who "really" committed the code. Using the above example, let's imagine Brad's code contained some IP questionable code, then Anne got it, and Anne committed both sets of code. Even though Brad would still be identified as the author, couldn't Brad claim "no, I wasn't ready for that code to be committed, I was still working on it and meant to remove that code". So, who's responsible? Anne for committing before it was ready? But Brad's identified as the author committer? Is there some way GIT or these tools can distinguish who's responsible in this case? My guess is, to make it clear who's accountable for what, the simple one-committer one-commit rule might have to be in place? (In reply to comment #1) > I don't know much about GIT, and on the surface there's certainly some logic > and advantages to loosening, the restrictions ... but, I think the "hole" is > that there is still some question of accountability. Who "really" committed the > code. > > Using the above example, let's imagine Brad's code contained some IP > questionable code, then Anne got it, and Anne committed both sets of code. In Git the commits are brought nearer to the users. Users make them in their local repos and pass sets of commits around. One such commit in a set does not loose any information by passing it around to other committers. So, in the above example Brad commits and Anne retrieves the set of commits from Brad. She does not take over Brads code. After one commit on her local repo, she will see: ----------------------------------------------- commit b4260512dc9af2f2235f8ac807c38f5fb66a1b99 Author: Anne <anne@one_isp.com> Date: Wed Sep 8 15:22:02 2010 +0200 Fix Brad's bad commit. commit 7d898e63dde22d3468fb714a5746083cccff1a74 Author: Brad <brad@another_isp.com> Date: Wed Sep 8 10:37:40 2010 +0200 Brad's comment for his last commit. ... ----------------------------------------------- If Anne pushes this to Eclipse Git, the commits maintain their original IP violator. :) > Even > though Brad would still be identified as the author, couldn't Brad claim "no, I > wasn't ready for that code to be committed, I was still working on it and meant > to remove that code". So, who's responsible? Anne for committing before it was > ready? But Brad's identified as the author committer? Is there some way GIT or > these tools can distinguish who's responsible in this case? My guess is, to > make it clear who's accountable for what, the simple one-committer one-commit > rule might have to be in place? Is this last sentence a question or a statement? I assume it is a statement. As I see, Git is good enough to determine who is accountable for what. In the current deployment the example you gave above can happen. (Well, it happened already in our case :) ) (In reply to comment #2) > If Anne pushes this to Eclipse Git, the commits maintain their original IP > violator. :) The Author and Committer fields are user-defined, so here at Eclipse we cannot trust either of them. The update hook will verify the committer entry to ensure it matches the committer performing the push, and that way we can trust that the "committer" entries in the log are valid, verified and true. However, we don't want to impose a committer-only restriction on the Author entry, since it may be a non-committer. Code can be commited by an Eclipse committer, but authored by someone else, and I feel preserving that in the log is important. So in this case, if bad IP was put in by author@someone.com, we can't really validate that information during the push phase. (In reply to comment #3) > (In reply to comment #2) > > If Anne pushes this to Eclipse Git, the commits maintain their original IP > > violator. :) > > The Author and Committer fields are user-defined, so here at Eclipse we cannot > trust either of them. The update hook will verify the committer entry to > ensure it matches the committer performing the push, and that way we can trust > that the "committer" entries in the log are valid, verified and true. > > However, we don't want to impose a committer-only restriction on the Author > entry, since it may be a non-committer. Code can be commited by an Eclipse > committer, but authored by someone else, and I feel preserving that in the log > is important. > > So in this case, if bad IP was put in by author@someone.com, we can't really > validate that information during the push phase. Well, the problem here is that all a committer has to do is do a Rebase, and they can roll up the commit. I understand the need for accountability, and the such. But unless we allow Gerritt for all projects for code review, then we have a very strict commit difference. The only thing I can think of to work around this is to have shared branches always on the central repo, so that when a merge happens everything is already on the repo and the pointers are just updated. But that also hamstrings some of the advantages of git. (In reply to comment #1) > I don't know much about GIT, and on the surface there's certainly some logic > and advantages to loosening, the restrictions ... but, I think the "hole" is > that there is still some question of accountability. Who "really" committed the > code. > Using the above example, let's imagine Brad's code contained some IP > questionable code, then Anne got it, and Anne committed both sets of code. Even > though Brad would still be identified as the author, couldn't Brad claim "no, I > wasn't ready for that code to be committed, I was still working on it and meant > to remove that code". So, who's responsible? Anne for committing before it was > ready? But Brad's identified as the author committer? Is there some way GIT or > these tools can distinguish who's responsible in this case? My guess is, to > make it clear who's accountable for what, the simple one-committer one-commit > rule might have to be in place? There is a difference here, one can cherry pick commits to be pushed to specific branches with git. And in this particular case we are talking about, William had a local branch, that he and Gabriel collaborated on with their own git repos. Gabriel pulled from William's branch, and vice versa. Once they were both satisfied the changes were ready, then Gabriel tried the push the changes to the central repo. Git allows you to work easily in separate branches, and then only bring in the changes into the main development branch when it is ready. It's a very common work flow, and collaboration technique. Just to be clear, the situation we are talking about is committer to committer commits and collaboration. We are not talking about non-committer to committer collaboration. IP should be fine in this specific case as it is two committers on the same project. > Well, the problem here is that all a committer has to do is do a Rebase, and
> they can roll up the commit.
With CVS right now now, a committer can SSH in and alter files without leaving a trail. However, that is a deliberate act of vandalism, and we do trust that committers will not do evil.
The major difference between Git and CVS/SVN is that with Git, the tooling itself makes it really easy to lose track of IP provenance. In fact, since Git is decentralized, code provenance is maintained strictly by social convention. Eclipse.org's rigourous IP process is not a typical use case of Git, since a centralized, validated IP provenance is important for us.
All we (the EMO) need is a clear and trusted way to validate and track which committer has accepted each line of code that makes its way into the repository, and from what I know, our update hook accomplishes just that.
If there is a better way to do this that would allow for broader collaboration without compromising our provenance, I'm all ears.
> Just to be clear, the situation we are talking about is committer to committer
> commits and collaboration. We are not talking about non-committer to committer
> collaboration. IP should be fine in this specific case as it is two committers
> on the same project.
That makes sense. The update hook could easily be altered to ensure all the commits of an incoming pack are attributed to an eclipse.org committer.
I've cc'd Janet and Wayne, and I'll have a discussion with them about this.
> IP should be fine in this specific case as it is two committers
> on the same project.
Indeed. My update hook would either need to verify that all the commit entries in the transaction are from committers on the project (yuck), or we would have to trust our committers to Do The Right Thing(tm).
Janet will be able to tell us what is acceptable here.
(In reply to comment #9) > > IP should be fine in this specific case as it is two committers > > on the same project. > > Indeed. My update hook would either need to verify that all the commit entries > in the transaction are from committers on the project (yuck), or we would have > to trust our committers to Do The Right Thing(tm). > > Janet will be able to tell us what is acceptable here. How much can you do with these hooks? Can they change the committer to the login id? The only difference betweeen git and CVS seems to be recording of who did the actual commit to the master repo. CVS doesn't prevent me from committing a patch that I forget to mark iplog+ in bugzilla. (In reply to comment #9) > > IP should be fine in this specific case as it is two committers > > on the same project. > > Indeed. My update hook would either need to verify that all the commit entries > in the transaction are from committers on the project (yuck), or we would have > to trust our committers to Do The Right Thing(tm). > > Janet will be able to tell us what is acceptable here. Updating the bug to reflect my discussions with Denis. From a technical standpoint, we would verify at the committer level. It would be the responsibility of the committer pushing the content (and the relevant PMC) to ensure that all content that is being pushed is consistent with the IP Policy. > Indeed. My update hook would either need to verify that all the commit entries
> in the transaction are from committers on the project (yuck)
^^ That is in fact what we are doing today. Fixed.
|