Bug 360614 - Commit statistics are wrong and unhelpful
Summary: Commit statistics are wrong and unhelpful
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: Dashboard
Version: unspecified
Hardware: PC Windows Vista
Importance: P3 normal
Target Milestone: ---
Assignee: Portal Bugzilla Dummy Inbox CLA
Reported: 2011-10-12 02:19 EDT by Ed Willink CLA
Modified: 2013-11-25 17:06 EST
Description Ed Willink CLA 2011-10-12 02:19:12 EDT
The Dashboard Commits for MDT/OCL appear to be nearly a month out of date and very unrepresentative of project activity.

The old LOC statistics gave me a ridiculously high contribution because of high volumes of auto-generated EMF and Xtext source.

The new commit statistics give every one-line releng contribution the same weight as a 1000-line code development.

LOC is in principle better, and perhaps a 1000-line upper bound would provide some guard against auto-generation.

Or perhaps file-commits might do.
Comment 1 Wayne Beaton CLA 2011-10-12 11:48:29 EDT
(In reply to comment #0)
> The Dashboard Commits for MDT/OCL appear to be nearly a month out of date and
> very unrepresentative of project activity.

The dashboard itself seems to show the correct data. It's the charts generated on the project summary page that seem out of date to me.

It seems that the weekly job that updates the data used to generate the charts hasn't been completing as a result of recent changes on the server. I am investigating further.
Comment 2 Wayne Beaton CLA 2011-10-12 18:34:34 EDT
(In reply to comment #1)
> It seems that the weekly job that updates the data used to generate the charts
> hasn't been completing as a result of recent changes on the server. I am
> investigating further.

This is a relatively easy fix. I had previously been uploading the commit statistics to the HTTP "writeable" area via the dev server. Since the recent server changes I can no longer log in there, so the copy operation was being rejected. I am now uploading the data file via the build server instead, which seems to work. AFAICT, MDT OCL stats look up-to-date.
Comment 3 Wayne Beaton CLA 2011-10-12 18:38:54 EDT
(In reply to comment #0)
> The old LOC statistics gave me a ridiculously high contribution because of high
> volumes of auto-generated EMF and Xtext source.

The current implementation of the charts on the project info pages is based on the absolute number of commits. A single commit of a bajillion auto-generated files should show up as 1 commit.

> The new commit statistics give every one-line releng contribution the same
> weight as a 1000-line code development.

This is by design. They are "commit statistics" after all :-)

> LOC is in principle better, and perhaps a 1000-line upper bound would provide
> some guard against auto-generation.

Better is a subjective term. I'm interested in exploring this further, but I need to better understand why it's better.
 
> Or perhaps file-commits might do.

Maybe. But that doesn't help the auto-generation problem.
Comment 4 Ed Willink CLA 2011-10-13 04:05:55 EDT
(In reply to comment #3)
> > LOC is in principle better, and perhaps a 1000-line upper bound would provide
> > some guard against auto-generation.
> 
> Better is a subjective term. I'm interested in exploring this further, but I
> need to better understand why it's better.

I'm assuming that the ideal statistic would be proportional to the intellectual effort, which is almost impossible to measure.

A one-line releng change may represent a week of effort, and 100,000 auto-generated lines may be 5 minutes' work. However, because releng tends to be live on Hudson, that one-line releng change may have 10 experimental commits, so it is multiplied up, whereas code commits are much closer to right first time.

I suggest that file-commits or LOC-up-to-100-line commits may approximate effort more closely (assuming you don't plan to have a smart auto-generation differencer). Perhaps a 1 + ln(LOC) metric would help avoid small changes getting swamped by huge changes. Massive auto-generation will normally be a single commit; a massive new contribution can be multiple commits if the author's vanity is important.
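
To make the comparison concrete, here is a rough Python sketch of the candidate metrics discussed so far: plain commit count, file-commits, LOC capped at 100 lines per commit, and the logarithmic metric (one plus the natural log of LOC). It is only an illustration, not how Dash computes anything. The Commit record, the function names, and the sample numbers (ten one-line releng commits, one 1000-line code change, one 100,000-line auto-generated commit) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical record of one commit; not a Dash data structure."""
    author: str
    files_changed: int
    lines_changed: int


def commit_count(commit: Commit) -> float:
    # Every commit counts as 1, whatever its size.
    return 1.0


def file_commits(commit: Commit) -> float:
    # One point per file touched by the commit.
    return float(commit.files_changed)


def capped_loc(commit: Commit, cap: int = 100) -> float:
    # Lines changed, limited to `cap` lines per commit.
    return float(min(commit.lines_changed, cap))


def log_loc(commit: Commit) -> float:
    # 1 + ln(LOC); a zero-line commit is treated as LOC = 1 so the log is defined.
    return 1.0 + math.log(max(commit.lines_changed, 1))


def score(commits, metric):
    """Sum the chosen per-commit metric for each author."""
    totals = {}
    for c in commits:
        totals[c.author] = totals.get(c.author, 0.0) + metric(c)
    return totals


# Made-up sample data for illustration only.
commits = (
    [Commit("releng", 1, 1)] * 10            # ten experimental one-line commits
    + [Commit("dev", 5, 1000)]               # one 1000-line code change
    + [Commit("generator", 400, 100_000)]    # one bulk auto-generated commit
)

for metric in (commit_count, file_commits, capped_loc, log_loc):
    print(metric.__name__, score(commits, metric))
```

With this made-up data, the plain commit count ranks the releng work ten times higher than either of the other two, and file-commits lets the auto-generated commit dominate. The capped and logarithmic variants keep all three contributions within roughly one order of magnitude of each other, though neither measures effort directly.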
Comment 5 Ed Willink CLA 2011-10-16 01:52:34 EDT
(In reply to comment #2)
> AFAICT, MDT OCL stats look up-to-date.

Yes. They were then but now they're six days out of date.
Comment 6 Wayne Beaton CLA 2011-10-16 09:39:50 EDT
(In reply to comment #5)
> (In reply to comment #2)
> > AFAICT, MDT OCL stats look up-to-date.
> 
> Yes. They were then but now they're six days out of date.

Gathering SCM stats is pretty expensive. We only run the batch process that pulls this data once a week on Sunday.
Comment 7 Ed Willink CLA 2011-11-14 13:43:57 EST
(In reply to comment #6)
> Gathering SCM stats is pretty expensive. We only run the batch process that
> pulls this data once a week on Sunday.

No update on November 6.
No update on November 13.
Comment 8 Wayne Beaton CLA 2011-11-14 14:18:13 EST
(In reply to comment #7)
> (In reply to comment #6)
> > Gathering SCM stats is pretty expensive. We only run the batch process that
> > pulls this data once a week on Sunday.
> 
> No update on November 6.
> No update on November 13.

Recent server changes messed up the upload of the statistics. I've fixed the problem and have manually run the script. Everything should be up-to-date now.
Comment 9 Wayne Beaton CLA 2013-11-25 17:06:58 EST
I believe that the Dash process is running and generating data as expected. FWIW, we are planning a reimplementation of Dash (see Bug 422525).