Eclipse 3.4.2 is available today, and this represents a great opportunity to try out p2 while the servers face incredible loads resembling those of a final release. The first thing this makes me realize is that we definitely need the ability to restart a failed download of the repo index files (content.jar / artifacts.jar), so that despite timeouts and other problems people can get the main index files p2 requires to show installable content to the user.
This seems more like an enhancement to me than a major bug... it seems very late in the game to add new features that haven't been on our docket until now. Also, this bug isn't just about transport: since we don't have any storage of partial results, storing them (either within a single session or across sessions) would have to be done at the download manager level. We do, however, have immediate support for downloading partial files.
This is not a big deal for the 3.5 release itself, but it will be a big deal if we want 3.5 users to be able to successfully update to 3.5.1, 3.5.2, and maybe even 3.6. The content.jar and artifacts.jar are not downloaded through the download manager but are obtained as regular files, so we have much more flexibility in what we can do.
*** Bug 266272 has been marked as a duplicate of this bug. ***
I looked into this, and I need a bit of help to make sure I am in the right place... It looks like SimpleMetadataRepositoryFactory is asked for a local file (getLocalFile), which it does by obtaining a CacheManager, which in turn calls ECFMetaDataTransport to download the file if it is not present. There is an opportunity to make the CacheManager preserve the state of a partial download (it currently deletes a partial download at the end). My idea is to move the partial file to a "partial" directory (to prevent someone from mistaking the partial file for the real thing), and then modify the cache manager to check for a partial download on a subsequent request. As it uses ECFMetaDataTransport to perform the download, a new method is needed to perform a partial download. Does this seem like a reasonable approach?
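A minimal sketch of the "partial directory" idea described above: before starting a transfer, check whether a partial file already exists and, if so, use its length as the offset for a partial download. `ResumeHelper` and `resumeOffset` are illustrative names, not actual p2 API.

```java
import java.io.File;
import java.nio.file.Files;

public class ResumeHelper {
    // Returns the byte offset at which a resumed download should start,
    // or 0 if no partial file exists and the download starts from scratch.
    public static long resumeOffset(File partialDir, String fileName) {
        File partial = new File(partialDir, fileName);
        return partial.isFile() ? partial.length() : 0L;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"), "p2-partial-demo");
        dir.mkdirs();
        File partial = new File(dir, "content.jar");
        Files.write(partial.toPath(), new byte[1234]); // simulate an interrupted download
        System.out.println(resumeOffset(dir, "content.jar"));   // 1234
        System.out.println(resumeOffset(dir, "artifacts.jar")); // 0
        partial.delete();
    }
}
```

The offset would then be handed to the new partial-download method on the transport.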
It looks like this is the right place. Can we rely on the number of bytes downloaded to restart? Could there be a case where the restarted download results in an invalid file? If so, we also need a way to flush the cache when the repo fails to load.
The cache flushing on failure was added recently... in the past, if there was a failure during download and we had a partial result with a new timestamp, we never went back to the server for the new file. We certainly have to make sure we discard/move the file on failure.
(In reply to comment #5)
> It looks like it is the good place. Can we rely on the number bytes downloaded
> to restart?
Were you thinking of some other way of doing it? Are checksums available somewhere? If they are available up front, my thought was to save both the partial file and the expected checksum. When resuming, a check is made that the checksum is unchanged (to make sure the server is not delivering different content with the same timestamp).
> Could there be a case where the restarted download could result in an invalid
> file? If this is the case then we need a way to flush the cache as well when
> the repo failed loading.
Yes, certainly. The outcome of a resume may naturally be another resume, a complete failure (missing file), or downloaded bits with the wrong size or checksum, in which case they must be discarded and the download started over. The worst case is probably when a resume hits a "false positive" error due to high server load, a malfunction, or a poor network, which is exactly the case resume is supposed to help with. What should be done here? I can imagine saving a bit more information about the partial download: timestamp of the last attempt, number of bytes obtained per attempt, status of attempts, etc. Then some heuristics could determine when an error really is an error (3 attempts and more than 24 hours elapsed?). My idea was to start with the simplest possible approach.
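The retry heuristic floated above (give up only after 3 attempts spread over more than 24 hours) could look roughly like this. The class name and thresholds are illustrative, not p2 code.

```java
public class RetryHeuristic {
    static final int MAX_ATTEMPTS = 3;
    static final long MIN_WINDOW_MS = 24L * 60 * 60 * 1000; // 24 hours

    // attemptTimestamps: millis of each failed attempt, oldest first.
    // Returns true only when the partial should be treated as permanently failed.
    public static boolean giveUp(long[] attemptTimestamps, long now) {
        if (attemptTimestamps.length < MAX_ATTEMPTS)
            return false; // too few attempts to conclude anything
        return now - attemptTimestamps[0] > MIN_WINDOW_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // one recent failure: keep trying
        System.out.println(giveUp(new long[] { now - 1000 }, now)); // false
        // three failures over 25 hours: give up
        System.out.println(giveUp(new long[] { now - 25 * 3600_000L, now - 1000, now }, now)); // true
    }
}
```

The per-attempt metadata (timestamps, byte counts, statuses) mentioned in the comment would be persisted alongside the partial file.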
> > It looks like it is the good place. Can we rely on the number bytes
> > downloaded to restart?
> Were you thinking of some other way of doing it?
No.
> Are checksums available somewhere?
No, but this looks like a good idea; maybe for after 3.5 unless it is absolutely critical here. I think Henrich had encountered servers that, upon restart, sent too many or too few bytes. If we can't detect that, it may jeopardize what we can do. Again, to me it would be a worse story than the one we have today if we could end up with a file that appears to be complete and up to date but cannot be parsed.
(In reply to comment #8)
> I think Henrich had encountered servers upon restart the server was sending too
> many / too few bytes. If we can't detect it then this may jeopardize what we
> can do
We have encountered cases where reading over the socket (from an HTTP server) would deliver fewer bytes than known to be on the server, without an exception being thrown (dubbed premature end-of-file). Our assumption is that this is a server-side or network connectivity issue. However, I do not know whether it is in any way related to partial gets (restart).
(In reply to comment #9)
> We have encountered cases where reading over the socket (from an HTTP server)
> would deliver fewer bytes than known to be on the server, without an exception
> being thrown (dubbed premature end-of-file). Our assumption is that this is a
> server-side or network connectivity issue.
> However, I do not know whether it is in any way related to partial gets
> (restart).
Not really; if it happens on a regular transfer, it can also happen on a partial transfer. In the end there are only two pieces of information that can help determine whether the transfer was good: authoritative information on the size of the file, and a checksum. So, how is the premature end-of-file detected? By a reported size (Content-Length) that differs from what is actually read?
(In reply to comment #10)
<stuff deleted>
> In the end there are only two pieces of information that can help determine
> whether the transfer was good: authoritative information on the size of the
> file, and a checksum.
>
> So, how is the premature end-of-file detected? By a reported size
> (Content-Length) that differs from what is actually read?
Yes, I agree that this requires authoritative validation info; in our app these are the expected size and a checksum (MD5).

Case 1: expected size and checksum are known. If the size does not agree with Content-Length (or, for partial gets, Content-Range), the download fails before any payload bytes are transferred. Otherwise the file is downloaded. If a premature EOF is detected based on the expected size, the download is repeated with a partial get for the remainder of the file. Once the entire file has been downloaded with exactly the expected number of bytes, the checksum is validated.

Case 2: no authoritative info is known. Then Content-Length (or Content-Range) is used as the expected size.
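The size-based decision described in the two cases above can be sketched as a small verdict function: accept when the byte count matches the expected size, resume with an HTTP Range request when the file is short (premature EOF), and discard when the server sent too many bytes. `TransferValidator` and its enum are illustrative names under these assumptions, not the actual implementation.

```java
public class TransferValidator {
    public enum Verdict { ACCEPT, RESUME, DISCARD }

    // expectedSize: authoritative size if known, else the reported Content-Length.
    public static Verdict check(long bytesOnDisk, long expectedSize) {
        if (bytesOnDisk == expectedSize)
            return Verdict.ACCEPT;   // full file; checksum validation would follow
        if (bytesOnDisk < expectedSize)
            return Verdict.RESUME;   // premature EOF: retry with "Range: bytes=<bytesOnDisk>-"
        return Verdict.DISCARD;      // server sent too many bytes; start over
    }

    public static void main(String[] args) {
        System.out.println(check(1000, 1000)); // ACCEPT
        System.out.println(check(400, 1000));  // RESUME
        System.out.println(check(1200, 1000)); // DISCARD
    }
}
```

ACCEPT would still be followed by the MD5 check from Case 1 when a checksum is available.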
I have almost finished the implementation of this for the p2 metadata repository. It would be great if the patch in bug 270667 could be committed, as I am adding more messages in the implementation of "resume download".
Created attachment 130884 [details] Implementation of resumed download of content.jar/content.xml This patch adds resume of download of content.jar, and content.xml to the CacheManager.
assigning back to inbox so someone can review and commit.
Henrik, I wanted to make sure that you saw the small API change in IFileTransferInfo as described in bug #270516...as this might be germane to your patch for this bug.
(In reply to comment #15)
> Henrik, I wanted to make sure that you saw the small API change in
> IFileTransferInfo as described in bug #270516...as this might be germane to
> your patch for this bug.
Thanks Scott. I was aware of that, and it does not affect my patch. Checks are already made with <= 0.
As I now have p2 committer status, should I go ahead and commit this change to HEAD? Pascal mentioned on the p2 call that it would be a good idea to be able to turn off the resume feature via a property; I can add a separate issue for that.
Release it, but with support for the property. Also, please make sure to create a test case for this. Thanks. John, please take a look at the code once it is released (John is away today, Tuesday).
The property org.eclipse.equinox.p2.metadata.repository.resumable can be set to "false" to turn off the resume-download feature. Added tests for both cases (resume and blocked resume).
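A minimal sketch of how a kill switch like the one described above can be read: resume stays enabled unless the property is explicitly set to "false". The property name is from the comment; the helper class is illustrative, not the actual p2 code.

```java
public class ResumeSwitch {
    static final String PROP = "org.eclipse.equinox.p2.metadata.repository.resumable";

    // Resume is enabled unless the property is explicitly set to "false".
    public static boolean resumeEnabled() {
        return !"false".equalsIgnoreCase(System.getProperty(PROP));
    }

    public static void main(String[] args) {
        System.out.println(resumeEnabled()); // true by default
        System.setProperty(PROP, "false");
        System.out.println(resumeEnabled()); // false once disabled
    }
}
```

The property would typically be passed on the command line, e.g. `-Dorg.eclipse.equinox.p2.metadata.repository.resumable=false`.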
Fixed in HEAD.
I released some cleanup:
- I removed several new message keys that were only for logged messages. Messages that only go to the log file are not translated in Eclipse; the end user's language is not necessarily the language of the support/development staff that will be receiving the log.
- The createCache method had become quite long and hard to read (>130 lines). I factored it into two separate methods (createCache and updateCache).
- Refactored computation of the cache directory into a helper method (getCacheDirectory).
- Deleted some lines of commented-out code in the getCache method.
New Gerrit change created: https://git.eclipse.org/r/73862
Gerrit change https://git.eclipse.org/r/73862 was merged to [master]. Commit: http://git.eclipse.org/c/equinox/rt.equinox.p2.git/commit/?id=258e4b40ab8ac916b6832bb0d24200f870f64cf3