Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 328895 - Bad results from Histogram Diff
Summary: Bad results from Histogram Diff
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 0.10.0   Edit
Hardware: PC Mac OS X - Carbon (unsup.)
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Shawn Pearce CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-27 18:15 EDT by Matthias Sohn CLA
Modified: 2010-11-01 17:35 EDT (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Sohn CLA 2010-10-27 18:15:55 EDT
We found several bad results from histogram diff when using it in our gerrit instance:

- looks like here
http://egit.eclipse.org/r/#patch,unified,1766,3,org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/history/GitHistoryPage.java
the new algorithm shows too coarse grained diffs. Large blocks seem to be changed
but these blocks only differ in a few lines, so that it's hard to spot what really changed.

- This diff also looks odd :
http://egit.eclipse.org/r/#patch,unified,1783,2,org.eclipse.egit.core/pom.xml 
look at the block containing <artifactId>maven-osgi-source-plugin</artifactId>

- Here is another case where the new diff fails: instead of showing just the addition of the method testDeleteVsModify()
a large block is shown to be changed compared to the old version
http://egit.eclipse.org/r/#patch,sidebyside,1809,1,org.eclipse.jgit.test/tst/org/eclipse/jgit/merge/MergeAlgorithmTest.java 

This problems were originally reported in the mail thread http://dev.eclipse.org/mhonarc/lists/egit-dev/msg01560.html

There was one more problem mentioned in that thread which has already been fixed.

Creating this bug to make tracking this easier
Comment 1 Shawn Pearce CLA 2010-10-29 19:58:11 EDT
I think I understand the problem.

When we found the initial LCS for a file, we started
the LCS region on a blank line.  This occurs often in
the file, and thus has a count that is higher than the
count for the line "\t/**\n".  (It is "less unique".)

When we later discover the insertion of "\t/**\n" at
the start of the new method, we select this as our
LCS because it has a lower count than the blank line
we initially set the LCS on earlier.

The problem here is, we aren't looking to see if any
of the common lines within the prior LCS region have
a lower count than the initial line.  We just used the
initial line's count and said "good enough".  But then
we apply this optimization where we don't even look at
the other lines in the LCS, and thus we never consider
if they have a lower count.

We need to use the lowest count within the LCS to set
the count for the LCS.
Comment 2 Shawn Pearce CLA 2010-10-29 21:09:20 EDT
(In reply to comment #0)
> We found several bad results from histogram diff

I think http://egit.eclipse.org/r/1829 fixes the issues
you identified on the gerrit instance.

Unfortunately there is still a problem with the output,
we aren't getting a very clean diff for this case:

@@ -188,6 +189,16 @@
 	public void testSameModification() throws IOException {
 		assertEquals(replace_C_by_Z,
 				merge(base, replace_C_by_Z, replace_C_by_Z));
+	}
+
+	/**
+	 * Check that a deleted vs. a modified line shows up as conflict (see Bug
+	 * 328551)
+	 *
+	 * @throws IOException
+	 */
+	public void testDeleteVsModify() throws IOException {
+		assertEquals(A+B+XXX_0+XXX_1+Z+XXX_2+D+E+F+G+H+I+J, merge(base, delete_C, replace_C_by_Z));
 	}
 
 	private String merge(String commonBase, String ours, String theirs) throws IOException {

The "\t}\n" and "\n" initial added lines should be
at the end of the new testDeleteVsModify method.
Comment 3 Shawn Pearce CLA 2010-10-29 22:44:40 EDT
(In reply to comment #2)
> 
> Unfortunately there is still a problem with the output,
> we aren't getting a very clean diff for this case:
> 
> @@ -188,6 +189,16 @@
>      public void testSameModification() throws IOException {
>          assertEquals(replace_C_by_Z,
>                  merge(base, replace_C_by_Z, replace_C_by_Z));
> +    }
> +

This bad case is fixed by http://egit.eclipse.org/r/1831
Comment 4 Shawn Pearce CLA 2010-11-01 17:35:18 EDT
Fixed, and a new Gerrit deployed to egit.eclipse.org.