Community
Participate
Working Groups
In the method createGeneratedImagePath of InternalImageFactory the code generates the same hashcode for two different images. This leads to display errors in out application.
Should mention that I use the latest repository version.
Created attachment 180134 [details] Path to Demo Use this path on the demo and add the images to the icons folder. You will see twice the same image altough they should be differen.
Created attachment 180135 [details] restore_view_act.gif
Created attachment 180136 [details] restore_view_selected_act.gif
Just a side note: The problem does not occur using CVS checkout of v1.3 from 20_05_2010.
I guess that this problem is related to the fix for this bug: 317685: Images loaded from ext. points don't work behind session based load balancers https://bugs.eclipse.org/bugs/show_bug.cgi?id=317685
Ah, so this is a bug that came in through a bug fix :-) Well, the question is, why the image data of both images create the same hash code then. The images themself are different.
If you compare the images byte-by-byte, you will see that they differ only by 3 bytes, which I assume are the color palette. The InternalImageFactory#getHashCode() uses only imageData.data, imageData.alpha, imageData.transparentPixel and imageData.type to calculated the hashCode.
Ok, i thought the image date were the bytes. So the new mechanism should be to generate the hash code of all bytes, right?
Fixed in CVS HEAD. imageData.palette is used for hash code calculation.
I can verify, that it now works as expected.
We ran into some images that generate the same hash. When computing the hash, it only looks at the value of each pixel not its position. We have some basic images being generated which are black boxes on a white background. If two have the same number of black pixels they generate the same hash. It would also happen if you had an image which is an exact mirrored version of another. I will attach a test project demonstrating the bug and a proposed fix which computes an MD5 hash instead of using the custom function.
Created attachment 204411 [details] Simple test project with various hash methods After further exploration, it appears that the current hash method simply doesn't include alphaData from the image. See the test case in the attached project, the original code fails on the sample images while including alphaData passes. Also there were a few other single fields missing: x, y, width, height, depth, delayTime, disposalMethod. I also change the computation to use a long rather than an int since all that is needed is a String. This should increase the overall hash space and hopefully reduce the probability of collision. Also included are a couple of other hash implementations for comparison. The last few test cases run each of the methods 1000 times to get an average performance metric. You can see the run time for each test in the junit view. The current implementation performs better than the other methods and I hope with all of the fields included it will be unlikely to have hash collisions. My only concern is that there is no validation of this method so we don't really know what the probability of collision is. However, it seems to work. --- After more testing --- Looking at the original hash function we discovered it was possible to generate hash collisions by just changing all the pixels values from one color to another. Of course these colors have to be specially crafted, but there is still a chance it could occur. The attached project has two pairs of images: the original which were failing for alpha data, and the new pair with only color differences. You have to uncomment the one you want in the test case. I created a modified version which passes the test for this new case. It processes each 8 bytes from the imageData.data array as a long so it doesn't have the same issue. Still I am not sure of the validity. Looking at the performance results and the simplicity of the code, my recommendation is to go with the CRC32 as the hash. It is part of the java.util.zip package and should exist in all versions of Java. The performance is only slightly lower than the old implementation, but I think that is acceptable to have a higher guarantee of reliability. Any thoughts?
+1 for CRC32. It isn't that much slower and the code is much simpler.
A comment about MD5 and other encryption methods: MD5 does not necessarily come with java and is governed by security manager. In the project we tested several methods, but the most universal and performant options boil down to CRC32 and a modified version of the algorithm that is there. CRC32 is error correction not encryption and it comes with java.
Thanks for this thorough investigation! The timing is indeed critical, because the hashes serve as keys to the cache and have to be calculated in every user session when a lightweight copy of the same image is created or when the image data are accessed. I like the CRC32 approach very much because a) it comes with the class library and b) it's a proven algorithm that I think is suitable for this case because it has been designed (in contrast to cryptographic hash functions like MD5 and SHA1) for exactly this purpose - to create hashes that are sensitive to data manipulations and to do that very quickly. Although CRC is for accidental data corruption, I think that it should be hard to find collisions. In my tests I found that when all fields are included in the hash calculation, CRC32 is even faster than the original algorithm. May be that's because it uses native code to loop over arrays. BTW, I guess I had some rationale for leaving out certain fields from the calculation, but I don't quite remember. Anyway, what really takes time are the data arrays. And since we _have_ to include the alpha array, I agree that it's best to simply include everything. Cole, could you provide a patch? It would be nice if you could also include some images in the test that produce collisions with the old implementation, as you already did in the attached code.
We all agreed to go with CRC32. +1
Created attachment 205193 [details] Patch using CRC32 and all data members As promised, the patch against CVS head.
Created attachment 205194 [details] New test cases for InternalImageFactory Patch for the test projects which include new tests that fail on the old implementation but pass on the new one.
Hi Cole, could you provide your test images as a separate zip? I can't extract them from the patch. Thanks, Ralf
Comment on attachment 204411 [details] Simple test project with various hash methods The sample images are the same as in the test project I attached previously.
Thanks, Cole. I applied the patches to HEAD with minor changes: * Moved the example images into the test bundle, since they are specific to this test. * Renamed the hash method to getHash(), since using CRC is an implementation detail. * Renamed the test methods to be more precise.