| Summary: | GITify CVS statistics | ||
|---|---|---|---|
| Product: | Community | Reporter: | Ed Willink <ed> |
| Component: | Dashboard | Assignee: | Portal Bugzilla Dummy Inbox <portal-inbox> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | wayne.beaton |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Windows Vista | ||
| Whiteboard: | stalebug | ||
|
Description
Ed Willink
Actually... the numbers for Git do consider individual files. As part of the Git commit processing, we get a list of all files in each commit and write a line for each of them into the database. This is exactly the same thing we do for CVS. Dash completely rebuilds its database every time it runs (I know... this is stupid), so the statistics gathered from the CVS epoch are actually gathered in exactly the same way (for CVS repos that have been migrated to Git).
Using file commits should indeed keep the epochs consistent, but you must have a bug. Take a relatively small project such as http://projects.eclipse.org/projects/modeling.mmt.qvtd. The project page shows 15 commits for Feb 2013, which is consistent with the Git transaction count. If you look inside the transactions, there are a couple that are over 10 files in their own right.
[Still need to ensure that all bars are at least one pixel high. Again looking at QVTd, last 3 months is 24 commits, but only 15 can be seen in the hover.]
Sorry, I mixed up some things. Dash records the files involved in a commit, but we only query on the actual commits themselves for these charts. I am absolutely confident that the charts reflect the actual number of commits for Git and for migrated CVS commits.
I walked through the QVTd repository and observed that many of the commits do have the same comment. I also noticed that these commits--which all seem to be concerned with a single directory--are often timestamped within a few seconds of each other. The amount of time between commits seems to be directly related to the size of the commit (i.e. larger commits take more time). See below.
My guess is that our clever CVS tools timestamped the files in directory groups as part of a single commit operation.
So... some thoughts...
Come up with a clever heuristic that knows to consider commits with a shared comment that occur within a certain time threshold as "together".
Or, change what we report to be based on the size of the data committed (e.g. number of changed files or lines).
---
commit 602639d53ee48005a250166aaf6f1a817f06f366
Author: ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:28 2008 +0000
Commit: ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:28 2008 +0000
Remove custom build properties
commit aaad289a14dd986b0a4d402b181e899a29ef61aa
Author: ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:27 2008 +0000
Commit: ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:27 2008 +0000
commit 2e7688937178032c9a28d28eb2c3f844a019fe84
Author: ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:26 2008 +0000
Commit: ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:26 2008 +0000
Remove custom build properties
commit 34e363954fa7dd59a3bbeb0190e0fa7666c0f23b
Author: ewillink <ewillink>
AuthorDate: Tue Aug 26 19:04:25 2008 +0000
Commit: ewillink <ewillink>
CommitDate: Tue Aug 26 19:04:25 2008 +0000
Remove custom build properties
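The first idea above (treat commits with a shared comment inside a time threshold as "together") could be sketched roughly as follows. This is only an illustration, not how Dash works: the `Commit` record, the `group_commits` function, and the 60-second default window are all hypothetical names and values chosen for the example. The sample data uses the three commits from the log above whose comment is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class Commit:
    # Hypothetical record; a real tool would fill this from `git log`.
    author: str
    message: str
    when: datetime


def group_commits(commits: List[Commit],
                  window: timedelta = timedelta(seconds=60)) -> List[List[Commit]]:
    """Group commits that share author and message and follow one another
    within `window`. Each returned list stands for one logical commit."""
    groups: List[List[Commit]] = []
    # Most recent group seen for each (author, message) pair.
    latest: Dict[Tuple[str, str], List[Commit]] = {}
    for commit in sorted(commits, key=lambda c: c.when):
        key = (commit.author, commit.message)
        group = latest.get(key)
        # Extend the group only if the gap to its last member is small enough.
        if group is not None and commit.when - group[-1].when <= window:
            group.append(commit)
        else:
            group = [commit]
            groups.append(group)
            latest[key] = group
    return groups


# Timestamps and comment taken from the log excerpt above.
commits = [
    Commit("ewillink", "Remove custom build properties", datetime(2008, 8, 26, 19, 4, 28)),
    Commit("ewillink", "Remove custom build properties", datetime(2008, 8, 26, 19, 4, 26)),
    Commit("ewillink", "Remove custom build properties", datetime(2008, 8, 26, 19, 4, 25)),
]
groups = group_commits(commits)
print(len(commits), "raw commits ->", len(groups), "logical commit(s)")
```

The gap is measured against the previous member of the group rather than the first one, so a long run of per-directory commits stays together as long as no single gap exceeds the window. The window is a parameter because the right threshold is a judgment call.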
(In reply to comment #3)
> So... some thoughts...
>
> Come up with a clever heuristic that knows to consider commits with a shared
> comment that occur within a certain time threshold as "together".
I think shrinking commit counts better avoids the exaggerated impact of auto-generation. After inspecting a few old commits, it seems that CVS broke up commits to a separate transaction per project. If you introduce the time window I would opt for perhaps an hour. Sometimes CVS or the network was obstinate and multiple tries were needed.
For consideration with the new implementation.
Removing as a blocker on 422525.
Probably obsolete now that the dashboard shows only the last year (mostly good). An option for a longer time span might be nice, or just a link to a more comprehensive GIT statistics page.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.
I think that we've done just about as much as we're able to do with the new dashboard implementation. |