Bug 88260 - We need a caching URI Resolver for Third Party dtd and xsd
Summary: We need a caching URI Resolver for Third Party dtd and xsd
Status: CLOSED FIXED
Alias: None
Product: WTP Source Editing
Classification: WebTools
Component: wst.xml
Version: 0.7
Hardware: All Windows XP
Importance: P3 normal with 1 vote
Target Milestone: ---
Assignee: Lawrence Mandel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 96824
Blocks:
Reported: 2005-03-16 17:41 EST by Arthur Ryman CLA
Modified: 2005-12-10 00:02 EST (History)
CC List: 4 users

See Also:


Attachments
Screen shot of web.xml validation error message detail. (22.58 KB, image/png)
2005-07-15 09:30 EDT, Arthur Ryman CLA

Description Arthur Ryman CLA 2005-03-16 17:41:35 EST
We have a problem with redistributing DTDs and XSDs from third parties such as
Sun and W3C. However these files are freely accessible on the Web. Instead of
redistributing them, we should create entries in our XML catalog and use a URI
resolver that downloads them and caches them. Therefore as long as you are
connected to the network once, you can get these and use them offline. The
caching function should be like that in a browser. The cache should periodically
check for a newer version (using HTTP HEAD) but use the old one if disconnected.
The user should be able to set preferences and control the cache, like for a Web
browser. There could even be a Refresh XML Catalog command to load everything in
the Catalog. Maybe run this as a background task when the user opens the XML editor.
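The freshness check described above (HTTP HEAD, falling back to the old copy when disconnected) could be sketched roughly as follows. This is only an illustration under assumptions: the class `CacheFreshness`, its `Entry` record, and both method names are hypothetical and are not taken from any WTP code. It compares the `Last-Modified` value from a HEAD response against the value stored with the cached copy.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch; none of these names come from WTP.
final class CacheFreshness {

    // Assumed cache record: local file location plus the Last-Modified
    // value (milliseconds since the epoch) seen when it was downloaded.
    static final class Entry {
        final String localPath;
        final long lastModifiedMillis;

        Entry(String localPath, long lastModifiedMillis) {
            this.localPath = localPath;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Decide whether to download again. A remote value <= 0 means the
    // server was unreachable or sent no Last-Modified header; in that
    // case an existing cached copy keeps being used.
    static boolean needsRefresh(Entry cached, long remoteLastModifiedMillis) {
        if (cached == null) {
            return true; // nothing cached yet
        }
        if (remoteLastModifiedMillis <= 0) {
            return false; // offline or unknown: keep the old copy
        }
        return remoteLastModifiedMillis > cached.lastModifiedMillis;
    }

    // Issue an HTTP HEAD request. Returns -1 when the resource cannot be
    // reached and 0 when the server sends no Last-Modified header.
    static long headLastModified(String uri) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(uri).toURL().openConnection();
            try {
                conn.setRequestMethod("HEAD");
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                    return -1L;
                }
                return conn.getLastModified();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return -1L; // treat any failure as "disconnected"
        }
    }
}
```

A caller would pass the result of `headLastModified(uri)` to `needsRefresh(entry, ...)` and download only when it returns true. Keeping the decision separate from the network call makes the offline behaviour easy to check without a connection.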
Comment 1 Arthur Ryman CLA 2005-03-31 18:54:48 EST
We should also create a low priority job to pre-cache all the URIs listed in 
the XML catalog. This would ensure that the URIs were cached as long as you 
were connected once.
Comment 2 Lawrence Mandel CLA 2005-04-14 18:36:05 EDT
I've started investigating this request.
Comment 3 Lawrence Mandel CLA 2005-04-18 03:28:02 EDT
I've created a plugin, org.eclipse.wst.internet.cache, which extends the URI 
resolver and provides caching facilities to WTP.
Some notes about the cache plugin.
1. The cache respects cache values set for remote resources. If a cache value 
is not set, the cache defaults to live for 1 day. I welcome comments/concerns 
about the 1 day default expiry.
2. The cache plugin provides a preference page which allows a user to view the 
entries in the cache, delete selected entries, clear the entire cache, and 
disable the caching facility. See the cache preference page under the Internet 
category.
3. The cache plugin provides a low priority job (the priority is set to DECORATE, 
the lowest possible priority in Eclipse) that will retrieve resources 
that a) could not be retrieved earlier (possibly because there was no 
network connection) and b) are prespecified via an Eclipse extension 
point. A resource to cache can be specified as follows.
<extension point="org.eclipse.wst.internet.cache.cacheresource">
  <cacheresource uri="RESOURCE_URI"/>*
</extension>

The cache plugin should be in WTP builds from 20050418 and later.
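The expiry rule from note 1 above (use the remote resource's own cache value when one is set, otherwise keep the entry for 1 day) might look like the sketch below. The class and method names are hypothetical, and treating "cache value" as an absolute expiry time, as an HTTP `Expires` header would give, is an assumption; the comment does not say which header the plugin reads.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the expiry rule; not the plugin's actual code.
final class CacheExpiry {

    // Default lifetime from the comment above: 1 day.
    static final long DEFAULT_TTL_MILLIS = TimeUnit.DAYS.toMillis(1);

    // serverExpiresMillis is the absolute expiry time reported by the
    // server, or 0 when the server did not set one (assumed convention).
    static long expiryTime(long fetchedAtMillis, long serverExpiresMillis) {
        if (serverExpiresMillis > 0) {
            return serverExpiresMillis;
        }
        return fetchedAtMillis + DEFAULT_TTL_MILLIS;
    }

    // An entry is stale once the current time reaches its expiry time.
    static boolean isExpired(long expiryMillis, long nowMillis) {
        return nowMillis >= expiryMillis;
    }
}
```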
Comment 4 Arthur Ryman CLA 2005-04-18 10:06:39 EDT
The extension point to precache resources MUST include a URL to any applicable 
licence associated with the resource. The Eclipse legal guidance is that we 
cannot precache anything that has licence terms without the user's explicit 
acceptance.

I suggest the following change:

<extension point="org.eclipse.wst.internet.cache.cacheresource">
  <cacheresource uri="RESOURCE_URI" licenceuri="LICENCE_URI" />*
</extension>

where licenceuri is optional, but MUST be provided if a licence applies. The 
background task should check for any licences, and prompt the user to accept 
them. The UI should display a list of the URIs to be precached, and link them 
to the licences. Each URI should have a check mark. The UI should have some 
buttons:

1. Select All - checks all URIs
2. Unselect All - unchecks all URIs
3. Accept - signals user acceptance of the licences for checked URIs
4. Cancel - cancels the precaching job

This dialog should only be displayed automatically once. After the first time, the 
user can launch it via the Preferences page.
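As a rough illustration of the rule proposed above, the filter below picks which precache entries may be downloaded once the user has responded to the dialog. Everything here is hypothetical (`PrecachePlan`, `Entry`, `downloadable`), and it assumes that entries with no `licenceuri` need no acceptance at all, which the comment implies but does not state outright.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed cacheresource entries.
final class PrecachePlan {

    static final class Entry {
        final String uri;
        final String licenceUri; // null when no licence applies

        Entry(String uri, String licenceUri) {
            this.uri = uri;
            this.licenceUri = licenceUri;
        }

        boolean requiresAcceptance() {
            return licenceUri != null && !licenceUri.isEmpty();
        }
    }

    // Entries that may be fetched in the background: those without a
    // licence (assumption), plus licensed ones whose URI the user
    // checked before pressing Accept.
    static List<Entry> downloadable(List<Entry> entries, Set<String> acceptedUris) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.requiresAcceptance() || acceptedUris.contains(e.uri)) {
                result.add(e);
            }
        }
        return result;
    }
}
```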

Comment 5 Lawrence Mandel CLA 2005-04-18 11:23:28 EDT
Can resources requested by the user via some operation, not the precache, 
still be cached silently or will we have to try and display some licence for 
these resources as well?
Comment 6 Arthur Ryman CLA 2005-04-18 11:30:40 EDT
If the user requests a resource then we don't have to present a licence. It's
just for the resources that we plan to cache in the background task without user
initiation.
Comment 7 Lawrence Mandel CLA 2005-04-18 12:40:14 EDT
Should the dialog prompting the user to agree to the licences be launched on 
Eclipse startup? If so, I think we should try to reduce the number of dialogs 
we hit the users with where possible. How about including these licence 
requirements in the same dialog as the third party requirements? 
Comment 8 Arthur Ryman CLA 2005-04-18 13:43:53 EDT
Yes, we don't want lots of dialogs.

Perhaps we could make this launchable only from the preferences page. Add a
button like: Download Resources

That way the user is in control. We'd need to make sure this function was
adequately documented.
Comment 9 Lawrence Mandel CLA 2005-05-31 13:38:48 EDT
Looking more closely at the precache feature (as I started implementing it) this
feature doesn't seem to fit naturally in the cache. I'd like to suggest that
this feature belongs in the XML catalog. The nature of a cache is that it stores
local copies of remote resources that are used. The XML catalog has the facility
to allow the user to add entries and includes entries for resources that are
already available locally. 

I suggest that instead of adding a download resources button and dialog to the
cache preference page that a new option be added to the new entry button on the
XML catalog preference page. The entry will allow users to enter their own
catalog entries or select from prelisted entries. I think this may be easier for
users to understand. 

Comments?
Comment 10 David Williams CLA 2005-06-15 01:21:44 EDT
Changed Version field given new release numbering.
Comment 11 David Williams CLA 2005-06-21 01:47:23 EDT
Lawrence, I agree with your last comment, that the XML Catalog is the natural
place to provide both the definition and "fetch and cache" capability you mention. 

Since it is "user directed" there should not be any issues. 

Perhaps a checkbox that says "fetch and cache when http: protocol given"
would suffice. 

Comment 12 Arthur Ryman CLA 2005-06-28 14:28:03 EDT
Save the user response for accepting a licence in the area common to the 
installation, i.e. not local to each workspace. This avoids asking the user the 
same question for each new workspace. It's annoying enough as it is, so let's 
make life a little easier for users by just asking them once (per install at 
least). David Williams knows where to store configuration preferences.
Comment 13 Arthur Ryman CLA 2005-06-30 10:54:06 EDT
While testing the current I-build, I noticed long time delays for Web projects 
and Web services. The problem turned out to be that the project builders trigger the 
validators, which try to get the J2EE schemas from the Web. If network 
throughput is slow, this causes a very noticeable time delay (last night around 
10 seconds). The problem went away when I enabled caching. The current default 
is to disable caching. Not many users will know about the caching preference 
page and they'll just think that WTP sucks.

I strongly recommend that caching be on by default. The user will get the 
prompt and become aware of the situation. However, we need to understand the 
implication from unattended JUnit tests. They will trigger this dialog. I 
recommend that we create a JVM property, e.g. -Dwtp.quiet=true to let our code 
know that it is running unattended. The caching code can check this property 
and automatically accept the licence during testing. 

Lawrence, BTW, shouldn't you change the status on this bug to ASSIGNED?
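A check for the suggested `-Dwtp.quiet=true` property can be very small. The sketch below uses the standard `Boolean.getBoolean` call to read the system property. The class `QuietMode` and the `shouldPrompt` helper are hypothetical names for illustration, and the property name is the example from the comment above, not necessarily what was finally implemented.

```java
// Hypothetical helper; only the property name comes from the comment above.
final class QuietMode {

    static final String PROPERTY = "wtp.quiet";

    // True when the JVM was started with -Dwtp.quiet=true.
    // Boolean.getBoolean reads the system property on every call.
    static boolean isQuiet() {
        return Boolean.getBoolean(PROPERTY);
    }

    // The licence dialog is only needed when the user has not accepted
    // yet and the code is not running unattended.
    static boolean shouldPrompt(boolean alreadyAccepted) {
        return !alreadyAccepted && !isQuiet();
    }
}
```

An unattended JUnit run would then be launched with `-Dwtp.quiet=true` so that `shouldPrompt` returns false and no dialog blocks the build.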
Comment 14 Lawrence Mandel CLA 2005-06-30 11:03:25 EDT
Changing status to assigned.

I agree that it makes sense to enable caching by default and will turn this back
on. I will also implement the JVM property to allow JUnit tests to run unattended.
Comment 15 Nitin Dahyabhai CLA 2005-06-30 11:32:58 EDT
I disagree about turning it on by default, since having it on precipitated bug
96824.  Unless we can throttle or somehow reduce the resources used by the
cache, turning it on by default could be disastrous to performance depending on
how many lookups are being performed at once.

The cache should make only one request per URL no matter how many times the
extension is asked to resolve that URL.  It would also be a good idea to not
block the caller of the lookup while we're downloading the resource to be
cached.  The download should be spun off to a queuing Job and only after it's
completed should the cached location be returned as a result to the lookup query.
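The "one request per URL, without blocking the caller" behaviour described above could be approximated with a map of in-flight downloads. This is a sketch only: `CoalescingResolver` and its `downloader` function are hypothetical, and it uses `CompletableFuture` from the standard library rather than an Eclipse Job, which is what the comment actually proposes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch: many lookups for the same URI share one download.
final class CoalescingResolver {

    private final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();
    private final Function<String, String> downloader; // URI -> cached location
    private final Executor executor;

    CoalescingResolver(Function<String, String> downloader, Executor executor) {
        this.downloader = downloader;
        this.executor = executor;
    }

    // Returns a future for the cached location. The first caller for a
    // URI schedules the download; later callers receive the same future,
    // so the downloader runs at most once per URI. Note that this sketch
    // never evicts entries, so a failed download is not retried.
    CompletableFuture<String> resolve(String uri) {
        return pending.computeIfAbsent(uri,
                u -> CompletableFuture.supplyAsync(() -> downloader.apply(u), executor));
    }
}
```

The caller decides whether to wait (`join()`) or attach a callback (`thenAccept`), so the lookup itself does not have to block while the resource is fetched.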
Comment 16 Arthur Ryman CLA 2005-06-30 11:43:24 EDT
Nitin, the cache should behave itself as you describe. That's a bug IMHO and
needs to be fixed asap.

However, most users will have no clue that we even have a cache, so shipping WTP
with it disabled will just generate a lot of bad performance comments.
Comment 17 David Williams CLA 2005-06-30 12:12:15 EDT
I would like to tweak Nitin's requirement of "only one request per URL no matter
how many times the extension is asked to resolve that URL" ... I think that
should be something like "only one request per URL per second" (or similar) ...
there are some occasions when networks come and go, so what could not be resolved
one time might be resolvable a few seconds later, after a cable is
plugged in, or wireless comes back in range, etc. (CVS support does similar
"retry but not too often", I believe). 

Plus, I've opened a "help wanted" enhancement to ensure WTP is well behaved in
this regard. See bug 102350. 
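The "retry, but not too often" idea from this comment could be sketched as a per-URL throttle. The class `RetryThrottle` and its method are hypothetical, and the one-second interval in the usage note is just the example figure from the comment; the caller passes in the current time so the behaviour does not depend on the system clock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: allow at most one request per URI per interval.
final class RetryThrottle {

    private final long minIntervalMillis;
    private final ConcurrentMap<String, Long> lastAttempt = new ConcurrentHashMap<>();

    RetryThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true if a request for this URI may be made at nowMillis and
    // records the attempt; returns false if the previous attempt was too
    // recent. compute() makes the check-and-update atomic per key.
    boolean tryAcquire(String uri, long nowMillis) {
        boolean[] allowed = {false};
        lastAttempt.compute(uri, (u, last) -> {
            if (last == null || nowMillis - last >= minIntervalMillis) {
                allowed[0] = true;
                return nowMillis;
            }
            return last;
        });
        return allowed[0];
    }
}
```

With `new RetryThrottle(1000)`, a lookup that failed because the network was down can be retried a second later, while repeated lookups inside that second are answered without touching the network.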

Comment 18 Nitin Dahyabhai CLA 2005-07-12 20:29:42 EDT
Lawrence, regarding your notification of enabling the cache by default, I
strongly disagree that that's the right thing to do at this time.  Unless a
limiter is in place, we're knowingly going to cause bug 96824 to happen.
Comment 19 Arthur Ryman CLA 2005-07-12 21:43:01 EDT
The cache should be on by default since users will experience performance
problems if it's off.
Comment 20 Lawrence Mandel CLA 2005-07-13 00:41:02 EDT
Nitin, I disagree and think the cache should be on by default. If the cache is
probing the same resource repeatedly this is a bug and needs to be addressed for
0.7. I don't think this bug is a reason to disable the cache by default.
Comment 21 Arthur Ryman CLA 2005-07-15 09:30:21 EDT
Created attachment 24837
Screen shot of web.xml validation error message detail.

The validator was unable to read a schema while caching was off. It could read
it when caching was enabled. Why?
Comment 22 Arthur Ryman CLA 2005-07-15 09:34:11 EDT
I just attached a screen shot. I had a very negative RC1 experience. Why isn't 
caching enabled by default?

Caching was disabled when I started RC1. When I created a Web project I got the 
validation error in web.xml because 
http://www.ibm.com/webservices/xsd/j2ee_web_services_client_1_1.xsd couldn't be 
read. Then I enabled caching and it could be read. I could also read it via a 
Web browser. Is this caching interfering with resource resolution?

Why are we using the IBM site? Is it official (guaranteed to be there)?
Comment 23 Lawrence Mandel CLA 2005-07-15 11:03:51 EDT
Arthur,

1. Caching is not enabled by default in RC1 as this caused a problem in the
build's automated tests. See bug 103614. (Specifically, the license dialog
caused the build to hang.) I have a fix for this that is ready to go for RC2.
Caching will be enabled in RC2.

2. Caching should not interfere with resource resolution when it is disabled.
This function works fine for me on RC1. You may have had intermittent network
connectivity which caused this failure. Please try this again and report
back whether you have the same problem.

3. The schema
http://www.ibm.com/webservices/xsd/j2ee_web_services_client_1_1.xsd is included
in http://java.sun.com/xml/ns/j2ee/j2ee_1_4.xsd.
<xsd:include
schemaLocation="http://www.ibm.com/webservices/xsd/j2ee_web_services_client_1_1.xsd"/>
Seems this schema, although hosted by IBM, is part of the J2EE set of schemas.
Comment 24 Lawrence Mandel CLA 2005-11-02 01:04:01 EST
Resolving to fixed. The cache was in 0.7. There are other open defects for
requests related to the cache. Please open new defects for other cache related
problems and requests.
Comment 25 Arthur Ryman CLA 2005-12-09 16:38:46 EST
Verified.
Comment 26 Lawrence Mandel CLA 2005-12-10 00:02:22 EST
Closing bug.