Bug 400937

Summary: GITify CVS statistics
Product: Community
Reporter: Ed Willink <ed>
Component: Dashboard
Assignee: Portal Bugzilla Dummy Inbox <portal-inbox>
Status: RESOLVED WONTFIX
QA Contact:
Severity: normal
Priority: P3
CC: wayne.beaton
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Windows Vista
Whiteboard: stalebug

Description Ed Willink CLA 2013-02-15 10:34:23 EST
The Dashboard's statistics for the CVS and GIT epochs are seriously incompatible: the CVS figures count file commits rather than activity commits. The distinction is particularly serious for EMF projects, which may have large amounts of auto-generated code that gets refreshed in multiple transactions.

In order to normalize, suggest that runs of CVS transactions with the same comment on the same day be aggregated as a single equivalent GIT commit.
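
To make the proposal concrete, here is a minimal sketch of the aggregation, not Dash's actual code. It assumes each CVS transaction is available as an (author, timestamp, comment) tuple; the function name and that record shape are hypothetical. A "run" is taken to mean consecutive transactions by the same author with the same comment on the same calendar day.

```python
from datetime import datetime
from itertools import groupby


def aggregate_runs(transactions):
    """Collapse runs of CVS transactions into equivalent GIT-style commits.

    `transactions` is an iterable of (author, timestamp, comment) tuples
    (a hypothetical record shape). Consecutive transactions that share the
    author, the calendar day, and the comment are folded into one entry.
    Returns a list of ((author, day, comment), transaction_count) pairs.
    """
    ordered = sorted(transactions, key=lambda t: t[1])

    def run_key(t):
        author, when, comment = t
        return (author, when.date(), comment)

    # groupby only merges adjacent items, which is exactly the "run" semantics.
    return [(key, len(list(group))) for key, group in groupby(ordered, key=run_key)]
```

The number of equivalent commits is then `len(aggregate_runs(transactions))`. Because only adjacent transactions are merged, an unrelated commit in the middle of a run splits it in two.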
Comment 1 Wayne Beaton CLA 2013-02-15 10:49:11 EST
Actually... the numbers for Git do consider individual files.

As part of the Git commit processing, we get a list of all files in each commit and write a line for each of them into the database. This is exactly the same thing we do for CVS.

Dash completely rebuilds its database every time it runs (I know... this is stupid), so the statistics gathered from the CVS epoch are actually gathered in exactly the same way (for CVS repos that have been migrated to Git).
Comment 2 Ed Willink CLA 2013-02-15 11:12:44 EST
Using file commits should indeed keep the epochs consistent, but you must have a bug.

Take a relatively small project such as http://projects.eclipse.org/projects/modeling.mmt.qvtd. The project page shows 15 commits for Feb 2013, which is consistent with GIT transaction count. If you look inside the transactions there are a couple that are over 10 files in their own right.

[Still need to ensure that all bars are at least one pixel high.]

Again looking at QVTd, last 3 months is 24 commits, but only 15 can be seen in the hover.
Comment 3 Wayne Beaton CLA 2013-02-15 12:57:11 EST
Sorry, I mixed up some things. Dash records the files involved in a commit, but we only query on the actual commits themselves for these charts. I am absolutely confident that the charts reflect the actual number of commits for Git and for migrated CVS commits.

I walked through the QVTd repository and observed that many of the commits do have the same comment. I also noticed that these commits--which all seem to be concerned with a single directory--are often timestamped within a few seconds of each other. The amount of time between commits seems to be directly related to the size of the commit (i.e. larger commits take more time). See below.

My guess is that our clever CVS tools timestamped the files in directory groups as part of a single commit operation.

So... some thoughts... 

Come up with a clever heuristic that knows to consider commits with a shared comment that occur within a certain time threshold as "together".

Or, change what we report to be based on the size of the data committed (e.g. number of changed files or lines).

---

commit 602639d53ee48005a250166aaf6f1a817f06f366
Author:     ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:28 2008 +0000
Commit:     ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:28 2008 +0000

    Remove custom build properties

commit aaad289a14dd986b0a4d402b181e899a29ef61aa
Author:     ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:27 2008 +0000
Commit:     ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:27 2008 +0000

commit 2e7688937178032c9a28d28eb2c3f844a019fe84
Author:     ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:26 2008 +0000
Commit:     ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:26 2008 +0000

    Remove custom build properties

commit 34e363954fa7dd59a3bbeb0190e0fa7666c0f23b
Author:     ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:25 2008 +0000
Commit:     ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:25 2008 +0000

    Remove custom build properties
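
The first suggestion above (treating commits with a shared comment inside a time threshold as "together") could be sketched roughly as follows. This is an illustration only, not the Dash implementation: the function name, the tuple layout, and the 60-second default window are assumptions. The sample data is the four commits from the log above; the second one shows no message in that log, so it is represented here with an empty comment.

```python
from datetime import datetime, timedelta


def count_logical_commits(commits, window=timedelta(seconds=60)):
    """Count commits, merging those that share an author and comment
    and follow the previous such commit within `window`.

    `commits` is an iterable of (author, timestamp, comment) tuples
    (a hypothetical record shape, not Dash's schema).
    """
    last_seen = {}  # (author, comment) -> timestamp of the latest commit
    count = 0
    for author, when, comment in sorted(commits, key=lambda c: c[1]):
        key = (author, comment)
        previous = last_seen.get(key)
        # Start a new logical commit if this key is new or the gap is too big.
        if previous is None or when - previous > window:
            count += 1
        last_seen[key] = when
    return count


# The four commits from the log above (times are UTC).
LOG = [
    ("ewillink", datetime(2008, 8, 26, 19, 4, 28), "Remove custom build properties"),
    ("ewillink", datetime(2008, 8, 26, 19, 4, 27), ""),  # no message shown in the log
    ("ewillink", datetime(2008, 8, 26, 19, 4, 26), "Remove custom build properties"),
    ("ewillink", datetime(2008, 8, 26, 19, 4, 25), "Remove custom build properties"),
]

print(count_logical_commits(LOG))  # prints 2
```

The window is measured from the most recent commit in a group rather than from its first one, so a long sequence of closely spaced commits stays together however long the whole operation took.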
Comment 4 Ed Willink CLA 2013-02-15 13:12:52 EST
(In reply to comment #3)
> So... some thoughts... 
> 
> Come up with a clever heuristic that knows to consider commits with a shared
> comment that occur within a certain time threshold as "together".

I think shrinking commit counts better avoids the exaggerated impact of auto-generation.

After inspecting a few old commits, it seems that CVS broke each commit up into a separate transaction per project.

If you introduce the time window I would opt for perhaps an hour. Sometimes CVS or the network was obstinate and multiple tries were needed.
Comment 5 Wayne Beaton CLA 2013-11-25 17:13:02 EST
For consideration with the new implementation.
Comment 6 Wayne Beaton CLA 2014-12-09 11:12:35 EST
Removing as a blocker on 422525.
Comment 7 Ed Willink CLA 2014-12-09 11:18:00 EST
Probably obsolete now that the dashboard shows only the last year (mostly good).

An option for a longer time span might be nice, or just a link to a more comprehensive GIT statistics page.
Comment 8 Eclipse Genie CLA 2016-12-01 09:08:00 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 9 Wayne Beaton CLA 2016-12-01 11:15:42 EST
I think that we've done just about as much as we're able to do with the new dashboard implementation.