Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 271681

Summary: [transport][ui] Details error message not shown in the UI
Product: [Eclipse Project] Equinox Reporter: Pascal Rapicault <pascal>
Component: p2Assignee: Henrik Lindberg <henrik.lindberg>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: john.arthorne, matthew, susan
Version: 3.5   
Target Milestone: 3.5 M7   
Hardware: PC   
OS: Mac OS X - Carbon (unsup.)   
Whiteboard:
Bug Depends on: 272604    
Bug Blocks: 267887    
Attachments:
Description Flags
Surface Error Messages
none
Surface Errors from Artifact Repositories
john.arthorne: iplog+
Surface Errors from Metadata Repositories john.arthorne: iplog+

Description Pascal Rapicault CLA 2009-04-08 14:58:48 EDT
I entered a bogus URL in the available software dialog  and the error message just said "No repository found: http://..." I have attached a screenshot.
I was rather disappointed given that a non trivial amount of work went into providing better error message 
s. We need to see why the UI is not showing more.
Comment 1 Matthew Piggott CLA 2009-04-08 17:06:52 EDT
Created attachment 131341 [details]
Surface Error Messages

This should surface more specific error messages for transport errors from SimpleArtifactRepositories and CompositeArtifactRepositories.  

I've also made a slight change to the RepositoryStatus class to change text associated with UnknownHost errors, previously the error was:
 Unknown Host:  http://somehost.com/artifacts.xml

This is changed to simply:
 Unknown Host:  somehost.com

The display of the message is still a bit wonky, the Status output is duplicated (bug 267887) and the text of the Exception appears.  So for an Unknown Host the dialog is:

 Unknown Host:  somehost.com

Expanded Details:
 Unknown Host:  somehost.com
  Unknown Host:  somehost.com
    somehost.com
Comment 2 Susan McCourt CLA 2009-04-09 11:26:55 EDT
fyi, I'm using bug 267887 to track any UI changes needed.  For the most part I am just using the status manager to report any statuses or exceptions returned from underneath but it's possible I'm building statuses with redundant messages...
Comment 3 Matthew Piggott CLA 2009-04-09 13:12:27 EDT
Actually, I looked at the duplicated messages yesterday as well, and it appeared to be the way the StatusHandler displays errors.  I believe the status message is used for the text, and the details section has an entry for the status message and the exception.getCause().

I noticed that if the Status were to have an INFO or WARNING severity then a MessageDialog would be used instead which lacks a Details button/section.  Would it make more sense to display test connection errors in a MessageDialog.openError()?  The status would presumably not be in the Error log, but is it useful to have test connection failures in the log?
Comment 4 Henrik Lindberg CLA 2009-04-15 11:55:00 EDT
Matthew, could you please update the patch, there has been some changes that makes it difficult to apply.
Comment 5 Susan McCourt CLA 2009-04-15 19:35:16 EDT
Henrik, please don't commit the UI-part of the patch, I'd like to take a look at that part separately once we've decided what error codes are surfaced by core.  
The patch as is will suppress any future reporting of problems with the problem sites and while we do want to do that for not-found, I'm not sure we want to do it for the other cases.  There is another way to fix the problem on the UI side.

Also, I'd like to understand the difference between REPOSITORY_NOT_FOUND and REPOSITORY_INVALID_LOCATION and decide how to treat those.

Once you've validated that the core-side stuff is working as expected, can you attach the remaining UI part of the patch to bug 267887 and I will modify as needed...this also relates to UI bug 269507.

For certain errors I'd like to offer the user the chance to fix a bogus URL or delete a bad site, etc...but I need to understand what the codes mean.
Comment 6 Henrik Lindberg CLA 2009-04-15 21:48:00 EDT
(In reply to comment #5)
> Henrik, please don't commit the UI-part of the patch, I'd like to take a look
> at that part separately once we've decided what error codes are surfaced by
> core.  
I think the UI should be prepared to show meaningful info for all of the specified status codes.
Right now, not all of the codes are used, and in some cases the preferred code can't be used as the caller does not handle the status correctly. (I think REPOSITORY_NOT_FOUND is overused).

> The patch as is will suppress any future reporting of problems with the problem
> sites and while we do want to do that for not-found, I'm not sure we want to do
> it for the other cases.  There is another way to fix the problem on the UI
> side.
> 
I am not going to commit the UI code. I am also not certain about several other things suggested in the patch.

> Also, I'd like to understand the difference between REPOSITORY_NOT_FOUND and
> REPOSITORY_INVALID_LOCATION and decide how to treat those.
> 
Me too :)
We now have:
REPOSITORY_INVALID_LOCATION
REPOSITORY_NOT_FOUND
REPOSITORY_FAILED_READ
REPOSITORY_UNKNOWN_TYPE
REPOSITORY_FAILED_AUTHENTICATION

as well as these that deal with creation and writing
REPOSITORY_EXISTS
INTERNAL_ERROR
REPOSITORY_FAILED_WRITE
REPOSITORY_READ_ONLY

and the general
INTERNAL_ERROR

There are two issues - what the codes actually mean, and layering/interpretation.
As I see it, the cases are:
a) The address itself is invalid (not a valid URI, or an un-supported URI form (like relative)). This is caused by a typo - user should edit.
b) There is no handler for the scheme (user typed in a CVS "pserver" URI). User either made a typo, or there is something wrong with the ECF configuration - impossible to say which without checking scheme against a number of non supported schemes.
c) Connection errors (host unknown, wrong port, not connected at all) - can not determine cause, either a typo, network error, or problems at the host end
d) Authentication errors - either wrong address, or user entered wrong credentials (or has none)
e) It is not possible to determine if the address is a reference to a repository because of errors when trying to get the files that tells what sort of repository it is (for example: request for content.jar times out so we don't know if it is a 404-not found, a host problem (say 500), or a network issue).
f) The (marker) file is not found (we get a 404)
g) The (marker) file is found, but can not be completely read (network, io, authentication issues)
h) The repository (marker) file was found, but there is no factory for the type (or version of that type). The configuration is wrong - a factory for the type should either have been in the configuration from the start, or installed from a "built in" repository type using InstallHandlers.
i) A marker file is found, and can be read, there is a factory, but there are syntax/semantic/versioning errors in the read meta data
j) The repository was successfully loaded.

Then consider layering:
- The lowest level can only tell if the (marker) file could be read, and if not, why it could not be read. It should not set codes that have meaning higher up (like retry something else) - the lowest level simply can not know,
- A repository factory can determine if i) is a loadable instance ii) not possible to determine if it is a loadable instance or iii) (almost) certain that it is not a loadable instance or iv) a loadable instance with errors
- the level above factories needs to decide based on the result from each factory in turn if it should try another factory. The factory itself should not decide this.
- A composite factory sort of acts like the level above factories, but should return as if it was a single factory (as there is a level above).

I think it is not strange, that "REPOSITORY_NOT_FOUND" is overused in the code....

Anyway - matching the scenarios with error codes... (*= codes we do not have)
a) INVALID_LOCATION
b) * NO_HANDLER_FOR_SCHEME
c) * CAN_NOT_CONNECT
d) FAILED_AUTHENTICATION
e) * COMMUNICATION_ERROR
f) NOT_FOUND
g) FAILED_READ
h) UNKNOWN_TYPE
i) * SYNTAX_ERROR
j) OK

(c and e, could be combined - from a users perspective the difference is probably not meaningful).

If we don't add the "missing codes" the mapping is a bit difficult:
a-b) INVALID_LOCATION
d) FAILED_AUTHENTICATION
c, e, g, i) FAILED_READ
f) NOT_FOUND
h) UNKNOWN_TYPE
j) OK

A repo factory could return
OK
FAIL - it is this type of repo, there was something wrong when loading it (can not be used)
WARNING - it is this type of repo, there were warnings when loading it (can be used)
NOT_LOADED - it was not this type of repo 

I think these codes should be combinable with the lower level codes
Search for a factory is done when !NOT_LOADED is returned.
When a NOT_LOADED is returned, a check of the detail code determines if it is meaningful to continue the search (i.e. not meaningful to continue after connection errors and authentication errors.

The overall search returns:
OK
FAIL
WARNING
NOT_LOADED

with suitable detail code, and possibly with detailed exception.

> Once you've validated that the core-side stuff is working as expected, can you
> attach the remaining UI part of the patch to bug 267887 and I will modify as
> needed...this also relates to UI bug 269507.
> 
> For certain errors I'd like to offer the user the chance to fix a bogus URL or
> delete a bad site, etc...but I need to understand what the codes mean.
> 
So do I :)

Comment 7 Susan McCourt CLA 2009-04-15 22:18:52 EDT
It seems like what we are getting at in this bug is that there are all these specific things that can go wrong underneath (different exceptions, bad returns, timeouts, etc.), and then we are mapping these to some generic codes, and then the UI is deciding what to do with those codes and what the user might do about it.  Three layers.

So I completely agree with what you say about layering and intepretation.  In order to shed some light on which codes might be meaningful, here are the kinds of errors that would cause us to do different things in the UI.  I think I'm basically restating what you said above, but by putting it in terms of the user corrections, I'm hoping it helps.  The UI is only going to check codes based on what it can suggest/do, whereas the underlying suggestion could provide much more detail to a support person.  I'm not suggesting these are the actual codes, but here is the ideal way that the UI would classify errors:

- NETWORK_IS_DEAD.
If there is a REPOSITORY_NOT_FOUND because the network is dead, the possible user correction (use only local sites, disable networked sites for now, etc.) is different than if the URI was wrong.
- BOGUS HOST.
There is a network, but the host is not there.  This is useful to know because it could just be a typo.  This has nothing to do with whether a particular marker was there, the host itself is not there and the user should try something different.
- REPOSITORY_NOT_FOUND
- the host is there, but there was not a repository found.  At this point, the user doesn't really care which marker was gone, which factory had a problem.  If the host is there but the file names are not present, that is probably the true case of "repo not found."
- BOGUS REPOSITORY
- a repository was found but there was some problem with it.  From the user's point of view, this is pretty close to REPOSITORY_NOT_FOUND, but it's helpful to know that there was *supposed* to be one there and that it's not a typo.  The message could tell them to contact the provider and provide the detailed spew.
Comment 8 Henrik Lindberg CLA 2009-04-15 22:52:45 EDT
(In reply to comment #7)

> - REPOSITORY_NOT_FOUND
> - the host is there, but there was not a repository found.  At this point, the
> user doesn't really care which marker was gone, which factory had a problem. 
> If the host is there but the file names are not present, that is probably the
> true case of "repo not found."

Yes, this is the "trivial case". (i.e positive proof that location is not referencing a repository),

> - BOGUS REPOSITORY
> - a repository was found but there was some problem with it.  From the user's
> point of view, this is pretty close to REPOSITORY_NOT_FOUND, but it's helpful
> to know that there was *supposed* to be one there and that it's not a typo. 
> The message could tell them to contact the provider and provide the detailed
> spew.
> 
This is also trivial (i.e. positive proof - there is something there, but it is somehow "broken").

The rest are all some sort of "error" from invalid location syntax to not having an internet connection. It is useful for the user to know where they problem may be:
- what user entered (the address, username, password, etc)
- local environment (is the ethernet cable plugged in)
- "the network" - can I talk to this host
- "the remote host" - is it responding

> - NETWORK_IS_DEAD.
> If there is a REPOSITORY_NOT_FOUND because the network is dead, the possible
> user correction (use only local sites, disable networked sites for now, etc.)
> is different than if the URI was wrong.
> - BOGUS HOST.
> There is a network, but the host is not there.  This is useful to know because
> it could just be a typo.  This has nothing to do with whether a particular
> marker was there, the host itself is not there and the user should try
> something different.

Unfortunately, it is difficult to differentiate - I think that the codes I proposed would give the user a clue to what is going on. The typical message to the user would be "Could not load/use repository because...." (as opposed to "repository not found") and then one of "authentication issues", "connection issues", "errors communicating with the server"...

I believe we are saying the same thing.
Comment 9 Matthew Piggott CLA 2009-04-16 11:16:43 EDT
Created attachment 132091 [details]
Surface Errors from Artifact Repositories
Comment 10 Susan McCourt CLA 2009-04-16 11:18:35 EDT
> I believe we are saying the same thing.
> 
Yep...so are you going to take a stab at this?
I could add the UI handling after the fact per bug 267887.

One thing to consider is that the repo manager currently has special handling for REPOSITORY_NOT_FOUND (rememberNotFound...), so that it doesn't constantly try to communicate with a missing repo.  For which error codes would this happen...  I think this is why so many errors underneath currently get mapped to NOT_FOUND at the repo manager level...
Comment 11 Matthew Piggott CLA 2009-04-16 11:32:15 EDT
Created attachment 132094 [details]
Surface Errors from Metadata Repositories

The previous should surface problems using TestConnection (once the UI is updated) as it first checks Artifact repositories.  

This patch should surface errors when adding repositories through the UI, it modifies the CacheManager to surface errors.  Unfortunately there seems to be a timing problem in FileInfoReader which can result in most problems becoming simply Not Found.

In the FileInfoReader#sendBrowseRequest() method, if the transfer exception is returned via listener (as is the case for UnknownHost) it does not always occur prior to the check for null.  This also leads to breaking out of the loop before the maximum retry attempts.
Comment 12 Henrik Lindberg CLA 2009-04-16 20:47:40 EDT
(In reply to comment #11)
> In the FileInfoReader#sendBrowseRequest() method, if the transfer exception is
> returned via listener (as is the case for UnknownHost) it does not always occur
> prior to the check for null.  This also leads to breaking out of the loop
> before the maximum retry attempts.
> 
Nice catch! Thanks Matthew- I created Bug 272604 to track fixing that part of the issue.
Comment 13 Henrik Lindberg CLA 2009-04-17 18:50:52 EDT
Released patch to HEAD. 
The presentation is not perfect, but now erros do surface to the UI. 

Comment 14 John Arthorne CLA 2009-06-01 12:58:31 EDT
Removing iplog+ from bug - this indicates an IP contribution in a comment
rather than a patch.

http://wiki.eclipse.org/Development_Resources/Automatic_IP_Log