Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 222870

Summary: Enabled MD5 processing step
Product: [Eclipse Project] Equinox Reporter: John Arthorne <john.arthorne>
Component: p2Assignee: Pascal Rapicault <pascal>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: jeffmcaffer, pascal, Stefan.Liebig, timothym
Version: 3.4Keywords: helpwanted
Target Milestone: 3.5 M3   
Hardware: PC   
OS: Windows XP   
Whiteboard:

Description John Arthorne CLA 2008-03-15 23:27:10 EDT
We have an MD5 processing step but it doesn't appear to be currently in use.  We should enable this to ensure artifact contents are correct. If we don't have this, a single bad Ganymede mirror could result in a poor user experience because the user has no control over which mirror is used.
Comment 1 John Arthorne CLA 2008-03-15 23:30:10 EDT
Tim, can you look at this?

Stefan, are there any known problems/limitations of the MD5 verifier step? Have you had this working in the past?
Comment 2 Tim Mok CLA 2008-03-17 09:00:32 EDT
I'll take a look at this over the next day or so.
Comment 3 Stefan Liebig CLA 2008-03-25 11:23:48 EDT
The MD5Verifier has only been tested within unit tests. As far as I can remember the commented code in SimpleArtifactRepository.addPostSteps(..) has never been executed:
		//		if (md5Verification)
		//			steps.add(new MD5Verifier(descriptor.getProperty(IArtifactDescriptor.ARTIFACT_MD5)));
The CVS history of SimpleArtifactRepository is unfortunately not complete because refactoring.

In the MD5Verifier is a minor issue that I would like to correct. Should I do that after the M6 build?
Comment 4 Tim Mok CLA 2008-03-25 11:38:39 EDT
There doesn't seem to be an optimizer for this so there isn't a way to generate the MD5 data for artifacts. I can go ahead and work on that for after the M6 build. The commented out code doesn't even seem necessary after the optimizer is written. The optimizer should add some attributes to the artifacts.xml that will let the ProcessingStepHandler know to process the MD5.
Comment 5 Stefan Liebig CLA 2008-03-25 16:10:59 EDT
Yes, that is right, there is not yet an optimizer. Writing it should be pretty straightforward. However, the current MD5Verifier was created with two use cases in mind (pre-/post processing):
- verifying the downloaded bytes, which could be a delta, pack200
- verifying the reconstructed bytes (artifact), e.g. from applying a delta or unpacking
This means that the existing optimizers for delta and pack200 need additional parameters that would trigger the creation of the pre-/post- md5 verification and whether they are optional or not.
Comment 6 Jeff McAffer CLA 2008-03-26 17:24:26 EDT
keep in mind that we wanted to have MD5s come from somewhere else.  For example anyone can create a repo with a bogus bundle and add the bogus MD5 to their artifact.xml.  The idea is to get the content from foo.com but then go to trusted.org to get the md5.  Certainly having the md5 in the repo is a step towards ensuring that what you get is what the repo actually had but that is onlypart of the MD5 story.
Comment 7 John Arthorne CLA 2008-03-26 17:51:12 EDT
Isn't the goal of the MD5 just to verify download integrity (that bits weren't lost on the wire)? If you're looking to establish trust then the JAR signatures provide this...  The main use case I have in mind is for mirrors. The artifact.xml is read on the main site, but the actual bytes of the artifact are pulled from a dynamically chosen mirror. If there is a mirror with busted content (for example if it is in the middle of replicating) the MD5 gives a nice quick way to verify the correct bits arrived. If the MD5 fails we can hop to another mirror.
Comment 8 Jeff McAffer CLA 2008-03-26 20:27:49 EDT
yes, that is a good usecase.  that is sort of my point, the md5 should come from someplace trusted.  I am not really meaning that in terms of signatures etc but some place that will not have a corrupted md5. What you suggest is a valid usecase but then it feels more like an internal repo implementation detail than processing steps that go on the descriptor.  as it is we sometimes have logic that looks to see if there are processing steps (e.g., used to use this for determining if something was canonical).  Adding processing steps to the descriptor would defeat this.  It is possible for the repo itself to add processing steps as the thing is downloaded/uploaded.  This could be used to check/compute the md5 and stash it on the descriptor etc without tagging the descriptor with steps.

Perhaps this s what you have in mind already.  My point in bringing this up was to remind people of this perspective.  We talked about it quite a bit some time ago and I did not want the discussion to get lost
Comment 9 Pascal Rapicault CLA 2008-10-20 20:23:34 EDT
Fixed in HEAD.
In case of problems, it is possible to disable MD5 checking by setting the property eclipse.p2.MD5Check to false.