| Summary: | [prov] Local caching of remote metadata repositories | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Equinox | Reporter: | John Arthorne <john.arthorne> | ||||||||||
| Component: | Incubator | Assignee: | equinox.incubator-inbox <equinox.incubator-inbox> | ||||||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||||||
| Severity: | enhancement | ||||||||||||
| Priority: | P3 | CC: | arlehmann, dj.houghton, jeffmcaffer, matthew_a_trees, pascal, simon_kaegi, timothym | ||||||||||
| Version: | 3.4 | Flags: | timothym:
review?
john.arthorne: review+ |
||||||||||
| Target Milestone: | 3.4 M5 | ||||||||||||
| Hardware: | PC | ||||||||||||
| OS: | Windows XP | ||||||||||||
| Whiteboard: | |||||||||||||
| Bug Depends on: | 213381 | ||||||||||||
| Bug Blocks: | |||||||||||||
| Attachments: |
|
||||||||||||
|
Description
John Arthorne
Simon, do you know of a way to find a timestamp or some similar value for a URL that can be used to determine if its contents have changed (without actually reading the entire contents)? The URLConnection has a getLastModified which should help us avoid parsing. We might be able to avoid downloading altogether by caching and using the previous value with setIfModifiedSince before opening the connection. I'm not really sure how generic support is across protocols other than HTTP though. We could still use a content checksum for other protocols pretty easily. Actually this reminds me that we should be using ECF to access any remote file. I have opened bug 214940. also it would be good if ECF had support for this sort of timestamp checking etc. Looks like ECF has already added support for directory browsing. It can provide a list of IRemoteFile objects that can be queried for a last modified date. I logged a request to get support for a specified file instead of having to use a directory browser then checking the file list, etc. See bug 215250 I've got some code done for caching the metadata index. It also caches local repositories but I'll be working to change that to reading the local repo directly as a part of bug 216047. My patch also needs the work done from bug 214940 so that all the file transport is through ECF. The metadata caching will also require the latest version of ECF that includes the file browsing to get the timestamp. DJ said he's working on getting the new ECF in later this week. Created attachment 87517 [details]
Metadata Caching WIP
Tim, is this ready to go in? Not yet. I've got a bit more testing with it now that the new ECF milestone is in. Created attachment 87772 [details]
Metadata Caching v01
Patch changes:
+Added a check to read directly from local index files.
+Checks the remote file's timestamp and compares to a cached file (if it exists). The newer file will be used.
The code is ready but can't go in yet. The new ECF milestone in the build doesn't have the code required for the timestamp check. I could rework the patch to use the ECF milestone being used but I think this can wait until ECF can be updated again.
Ok, this is ready to go now. I've confirmed the necessary version of ECF is in the build now. I quickly reviewed, and suggested switching from Date to long as the timestamp representation. This is the norm for file timestamps, and I recall performance and locale issues with using Date. I'm reworking the patch now. I'm just thinking of a different way of checking if the file doesn't exist. I used to have the method return a null timestamp so now I'm investigating a couple of options when using a long instead. I should have a new patch posted in a day. I suggest 0, consistent with File#lastModified (EFS also uses 0 for this case). Created attachment 88147 [details]
Metadata Caching v02
I've changed lastModified to return a long instead of a Date. I also added some checks for when index files are not found. They'll throw ProvisionExceptions.
Reviewing... Created attachment 88162 [details]
Updated patch
The load method was getting quite large and complicated, so I refactored to create a separate method to obtain the local file that handles the caching. There were also some code paths where stream weren't being closed. Finally, the logic to see if the cache is up to date looked backwards:
if (timestamp < cacheTimestamp) {
// retrieve the newer index file
It seems like it should retrieve the remote file if the remote timestamp is greater (newer) than the cache timestamp.
I haven't tested this yet...
I have released the patch. Tim, do you have some tests for this? There are many different paths depending on: local vs. remote repository, xml vs. jar format, stale cache vs. valid cache, etc. Also, there is currently no story for how the cache gets cleaned up when repositories go away. I can see how this could be implemented, but it requires support in IMetadataRepositoryManager to broadcast an event when a repository is removed. There may already be a bug report about it. Tim, can you enter a bug report to track cleanup of SimpleMetadataRepositoryFactory's local disk cache? It should be marked as depending on the bug for repository events if we have one. Question, does this support allows to work from the local cache when the actual repository is not reachable? Note: I have not looked at the code (In reply to comment #19) > Question, does this support allows to work from the local cache when the actual > repository is not reachable? > Note: I have not looked at the code > I specifically wrote it to not read the local cache when the repository is unreachable. I figure if the metadata is unavailable from the repository that the artifacts would probably be outdated or unavailable as well. I'll get working on some tests for this over the next few days. Marking fixed since this support was added in M5. Opened bug 218447 about adding some automated tests for this. I was wondering if someone could review my post regarding the local caching of remote metadata repositories and how it relates to Automatic Updates.. Not sure if I found a new bug or if this one should be reopened? http://www.eclipse.org/newsportal/article.php?id=5309&group=eclipse.technology.equinox#5309 |