Bug 422396

Summary: [pmi] Commit Activity should use Git author instead of committer (or in addition)
Product: Community
Component: Dashboard
Reporter: Robin Stocker <robin>
Assignee: Portal Bugzilla Dummy Inbox <portal-inbox>
QA Contact:
Status: RESOLVED FIXED
Severity: normal
Priority: P3
CC: daniel_megert, matthias.sohn, wayne.beaton
Version: unspecified
Target Milestone: 2014-Q1
Hardware: All
OS: All
Whiteboard:
Bug Depends on: 370151
Bug Blocks:
Attachments: Slight appearance change (flags: none)

Description Robin Stocker CLA 2013-11-23 12:20:55 EST
It looks like the Commit Activity pie charts in the "Participate" tab are currently based on Git committer information.

Git distinguishes between the author (the person who wrote the change) and the committer (the person who applied, amended or rebased it). IMO it would be better if the pie charts were based on the author, not the committer.

Or even better would be to have 2 sets of pie charts, one based on author information and one based on committer information.

You can see the difference using "git log --pretty=fuller". Here are some examples from the EGit project:

commit f5c314e92f68aa9681fd6e85dc830d39b521b4e2
Author:     Michael Keppler
AuthorDate: Fri Nov 22 14:31:57 2013 +0100
Commit:     Gerrit Code Review @ Eclipse.org
CommitDate: Fri Nov 22 08:55:03 2013 -0500

-> Michael Keppler did the code changes, Gerrit is the committer because it was rebased in Gerrit. Will be different once Gerrit 2.8 is used, see here: https://gerrit-review.googlesource.com/#/c/45733/

commit 8e19c7eae07a4a6ac0b247736be5d776a30e8bdd
Author:     Matthias Sohn
AuthorDate: Fri Nov 8 01:18:52 2013 +0100
Commit:     Robin Stocker
CommitDate: Fri Nov 15 18:12:30 2013 +0100

-> Matthias did the code changes, I just signed it off and rebased it.
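
To make the author/committer difference concrete, here is a minimal Python sketch that tallies both roles from "git log --pretty=fuller" text. It is an illustration only, not the PMI's code; the function name count_by_role and the SAMPLE constant are made up for this example, and the sample text is the two EGit commits quoted above (names only, as quoted).

```python
import re
from collections import Counter

# Matches only the "Author:" and "Commit:" header lines; "AuthorDate:",
# "CommitDate:" and the "commit <sha>" line do not match this pattern.
HEADER = re.compile(r"^(Author|Commit):\s+(.*)$")

def count_by_role(log_text):
    """Return (author_counts, committer_counts) for fuller-format log text."""
    authors, committers = Counter(), Counter()
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue
        role, name = match.groups()
        target = authors if role == "Author" else committers
        target[name.strip()] += 1
    return authors, committers

# The two EGit commits from the description above.
SAMPLE = """\
commit f5c314e92f68aa9681fd6e85dc830d39b521b4e2
Author:     Michael Keppler
AuthorDate: Fri Nov 22 14:31:57 2013 +0100
Commit:     Gerrit Code Review @ Eclipse.org
CommitDate: Fri Nov 22 08:55:03 2013 -0500

commit 8e19c7eae07a4a6ac0b247736be5d776a30e8bdd
Author:     Matthias Sohn
AuthorDate: Fri Nov 8 01:18:52 2013 +0100
Commit:     Robin Stocker
CommitDate: Fri Nov 15 18:12:30 2013 +0100
"""

authors, committers = count_by_role(SAMPLE)
```

For these two commits the author tally credits Michael Keppler and Matthias Sohn once each, while the committer tally credits Gerrit Code Review @ Eclipse.org and Robin Stocker instead.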
Comment 1 Matthias Sohn CLA 2014-01-23 19:14:51 EST
thanks for filing this

For most changes the author did the heavy lifting; if the committer is a different person, they usually only rebased the change, made some minor additional tweaks, or moved the commit to another branch.
Comment 2 Wayne Beaton CLA 2014-01-24 10:50:33 EST
I got inspired last night and wrote a routine to capture the author data. It's running now.

The new routine scans the Git repos and--for each commit--extracts the author, as well as any "Also-by" [1] entries.

Once it's done running (and I have some data to play with), I'll rework the charts to use the new data. Initially, I'm going to pull the author names as they've been provided to Git. 

This may result in some authors having multiple entries (e.g. some commits moved over from CVS may have committer ids instead of names, or something). If there is a need, I think that we can be a bit more clever and match Git authors up to committer records to consolidate. But I think I've just started mumbling...

The new implementation is a bit more resilient than the old; parallel and incremental scans are possible, meaning that we may be able to run the scan more than once a week.

[1]http://wiki.eclipse.org/Development_Resources/Contributing_via_Git#The_Commit_Record
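
As a rough illustration of the per-commit extraction described in this comment (not the actual routine, whose code is not shown in this bug), the following Python sketch collects the author plus any "Also-by" entries from a commit message. It assumes each entry sits on its own line in the form "Also-by: Name <email>"; the function name credited_people and the people in the usage note are hypothetical.

```python
import re

# Assumed trailer shape: "Also-by: Name <email>" on a line of its own.
ALSO_BY = re.compile(r"^Also-by:[ \t]*(.+?)[ \t]*$", re.IGNORECASE | re.MULTILINE)

def credited_people(author, message):
    """Return the author followed by any Also-by entries, without duplicates."""
    people = [author]
    for entry in ALSO_BY.findall(message):
        if entry not in people:
            people.append(entry)
    return people
```

Calling credited_people("John Roe <john@example.org>", message) on a message that contains the line "Also-by: Jane Doe <jane@example.org>" would return both people, so the commit counts once for each of them.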
Comment 3 Wayne Beaton CLA 2014-01-26 21:17:51 EST
The "Individual Commit Activity" chart is now based on the author of Git commits. As mentioned in comment #2, it includes "Also-by" entries, so any single commit may be credited to more than one individual. As before, the chart is based on the last three months of commits; it may be interesting to include a project lifetime chart as well (maybe somebody can open a new bug if that's considered valuable).

Note that the charts currently display the name of the author as it is provided in the commit record. This means that some authors' commits are split (e.g. "Tim Fox" and "purplefox" on the Vert.x project are the same person).

Note also that it is based on absolute commit activity and will include merge commits that do not include any intellectual property. I am already capturing the data that I need to count only those commits that actually have file changes. I'm not sure how big an impact this will have on the charts, but it is probably worth investigating.
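
One possible way to count only commits that touch files, sketched in Python under stated assumptions: the input text is shaped like the output of git log --name-only --pretty=format:"@%an", where, by default, merge commits list no paths. The "@" marker and the function name count_commits_with_changes are arbitrary choices for this sketch, and this is not how the PMI necessarily does it.

```python
from collections import Counter

def count_commits_with_changes(log_text):
    """Count commits per author, skipping commits that list no files.

    Each commit is expected to start with an "@<author name>" line,
    followed by the paths it touched (possibly none). A path that itself
    starts with "@" would confuse this simple parser.
    """
    counts = Counter()
    author, has_files = None, False
    # The trailing "@" is a sentinel that flushes the last commit.
    for line in log_text.splitlines() + ["@"]:
        if line.startswith("@"):
            if author is not None and has_files:
                counts[author] += 1
            author, has_files = line[1:], False
        elif line.strip():
            has_files = True
    return counts
```

Comparing these counts with the absolute counts would give a first idea of how much the merge commits affect the charts.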

This is experimental for now as it is based on some new code for gathering commit information.
Comment 4 Matthias Sohn CLA 2014-01-27 03:31:47 EST
thanks, this looks more reasonable (checked for jgit and egit), we should do the same for the other chart "Organization Commit Activity"
Comment 5 Robin Stocker CLA 2014-01-27 07:36:52 EST
Thanks! +1 for also changing Organization Commit Activity.
Comment 6 Wayne Beaton CLA 2014-01-27 16:59:29 EST
(In reply to Matthias Sohn from comment #4)
> thanks, this looks more reasonable (checked for jgit and egit), we should do
> the same for the other chart "Organization Commit Activity"

I agree. That's going to take a little more work to map committers to organizations (it's relatively simple to just extract the author information from the commits). I've laid some of the groundwork already; I'll keep poking at it.
Comment 7 Wayne Beaton CLA 2014-03-25 14:50:07 EDT
Created attachment 241235
Slight appearance change

Organization commit activity now takes the authors into account. 

You'll notice a small change in how it appears. Commits associated with "[Contribution]" come from a non-committer; those associated with "[Unaffiliated]" come from a committer who is not affiliated with a member company. Only member companies are shown (this part isn't new).
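
The labelling rule above can be sketched in a few lines of Python. This is a hypothetical illustration: the function name chart_label, the committer_orgs mapping and the idea of matching authors by email address are all assumptions, since this bug does not say how authors are matched to committer records.

```python
def chart_label(author_email, committer_orgs):
    """Return the chart label for one commit author.

    committer_orgs is an assumed mapping from a committer's email address
    to the name of their member company, or None if they have none.
    """
    if author_email not in committer_orgs:
        return "[Contribution]"   # author is not a committer
    # Committer with no member company falls back to "[Unaffiliated]".
    return committer_orgs[author_email] or "[Unaffiliated]"
```

For example, with committer_orgs = {"a@example.org": "Example Corp", "b@example.org": None}, the author a@example.org is counted under "Example Corp", b@example.org under "[Unaffiliated]", and any other address under "[Contribution]".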
Comment 8 Wayne Beaton CLA 2014-04-04 14:41:58 EDT
We're done here. All charts in the PMI use the author field.