Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 275398 - [ui] cancel in install new software of add repo while pending does not cancel
Summary: [ui] cancel in install new software of add repo while pending does not cancel
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: 3.5   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 3.5 RC1   Edit
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-07 20:49 EDT by Henrik Lindberg CLA
Modified: 2009-06-03 11:49 EDT (History)
2 users (show)

See Also:
susan: review+


Attachments
Stack trace (17.34 KB, text/plain)
2009-05-12 15:14 EDT, John Arthorne CLA
no flags Details
Possible fix (1.05 KB, patch)
2009-05-14 00:24 EDT, John Arthorne CLA
no flags Details | Diff
Fix part two (1.77 KB, patch)
2009-05-14 00:41 EDT, John Arthorne CLA
no flags Details | Diff
patch - replacement for fix part one (1.22 KB, patch)
2009-05-14 12:00 EDT, Susan McCourt CLA
no flags Details | Diff
Replacement for part 1 (1.59 KB, patch)
2009-05-14 13:49 EDT, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Henrik Lindberg CLA 2009-05-07 20:49:51 EDT
Canceling when contacting repos in "install new software" causes later cancelation to hang.

To reproduce:
1. Run testserver
2. Add the site http://localhost:8080/proxy/declerate 
     (a really slow connection to updates 3.4)
3. See "Pending" appear
4. Cancel the dialog
5. Start "software updates"
     Notice that is reports "cancel requested"
6. Try to cancel "software updates"
     It marks everything as canceled requested
     The UI is now stuck.

This may be related to Bug 263600
Comment 1 Susan McCourt CLA 2009-05-08 10:48:39 EDT
Can't reproduce the hang, and I'm not sure this is a UI-level cancellation problem.

(In reply to comment #0)
> Canceling when contacting repos in "install new software" causes later
> cancelation to hang.
> 
> To reproduce:
> 1. Run testserver
> 2. Add the site http://localhost:8080/proxy/declerate 
>      (a really slow connection to updates 3.4)

I had to use http://localhost:8080/proxy/decelerate 

> 3. See "Pending" appear
> 4. Cancel the dialog

so far so good, at this point the progress view shows cancel is requested but the fetch job (the load) is not responding

> 5. Start "software updates"
>      Notice that is reports "cancel requested"

What is "software updates".  Do you mean Help>Check for Updates?
(Software Updates was the 3.4 menu)
I did "Help>Check for Updates"

I saw the "Contacting Software Sites" job in the progress view

> 6. Try to cancel "software updates"
>      It marks everything as canceled requested
>      The UI is now stuck.

I saw the job marked canceled and the UI was responsive.

> 
> This may be related to Bug 263600
> 

need to understand the steps better and how we know this to be a UI issue.
It seems to me there are still transport cases where repos are not responding to cancellation.  The UI code paths for cancellation are the same, so when one repo responds and another doesn't, this usually indicates a non-responsive site.



Comment 2 Henrik Lindberg CLA 2009-05-08 11:18:03 EDT
(In reply to comment #1)
> I had to use http://localhost:8080/proxy/decelerate 
>
sorry, typo.
 
> > 5. Start "software updates"
> >      Notice that is reports "cancel requested"
> 
> What is "software updates".  Do you mean Help>Check for Updates?
yes.
> I saw the "Contacting Software Sites" job in the progress view
>
Expand the details.
 
> > 6. Try to cancel "software updates"
> >      It marks everything as canceled requested
> >      The UI is now stuck.
> 
> I saw the job marked canceled and the UI was responsive.
> 
What I mean is: the UI is responsive, but it never cancels. So dialog is stuck.
> need to understand the steps better and how we know this to be a UI issue.
> It seems to me there are still transport cases where repos are not responding
> to cancellation.  The UI code paths for cancellation are the same, so when one
> repo responds and another doesn't, this usually indicates a non-responsive
> site.
> 
I think there is something wrong with the cancelation/job that runs when adding a new site in "install new software" (i.e. the thing that runs when "Pending" is shown. When canceling in the other dialogs, the job is canceled the correct way - all the way down to transport. Does it really call setCancel(true) on the monitor? That should cancel the transport.

Comment 3 Susan McCourt CLA 2009-05-08 18:16:28 EDT
Thanks, I see what you mean now, I'll look into this.
Comment 4 Susan McCourt CLA 2009-05-11 14:16:54 EDT
Assigning to John as he has some spare cycles.
I took a quick look at the DeferredTreeContentManager, RemoteQueriedElement, and QueriedElement.  The monitor is being passed around properly, so I think it's ok on the element/query side.

The load job is supposed to be cancelled when the content manager is disposed, but it's entirely possible that the content manager is not being disposed when the wizard is closed.
Comment 5 John Arthorne CLA 2009-05-12 15:14:14 EDT
Created attachment 135434 [details]
Stack trace

I can reproduce the fact that there are still two threads running after cancelation, but the update check dialog does close immediately when cancel is pressed.
Comment 6 Susan McCourt CLA 2009-05-13 18:00:36 EDT
Looking at the stack trace, I don't see anything unexplainable from the UI's point of view.

The update check thread is waiting because the FileReader is in a join().  All of this is part of a metadata repo load happening in the update check.

Meanwhile, the UI tries to load a metadata repo in the fetching children job created by the content manager.  This job will be blocked on the load lock until the first load returns.  During this lock, cancellation requests on the UI fetch job won't respond.  Once the other load completes, the UI job will resume and then the next check of the monitor will result in the load cancel.

I'm not sure there's anything to be fixed for 3.5 in this bug.
Comment 7 John Arthorne CLA 2009-05-14 00:22:29 EDT
>org.eclipse.equinox.internal.p2.ui.DefaultQueryProvider.getQueryDescriptor(DefaultQueryProvider.java:55)
>org.eclipse.equinox.internal.p2.ui.model.QueriedElement.fetchChildren(QueriedElement.java:93)

There is a disconnect between these two lines that prevents cancelation from being propagated. QueriedElement.fetchChildren takes a progress monitor argument, but then calls the following with passing the monitor:

getQueryProvider().getQueryDescriptor(this);

This ends up calling MetadataRepositoryElement.getQueryable, which does this:

if (queryable == null)
	return getMetadataRepository(new NullProgressMonitor());

So the progress monitor chain is broken and cancelation is not propagated down.
 
Comment 8 John Arthorne CLA 2009-05-14 00:24:46 EDT
Created attachment 135725 [details]
Possible fix

I don't know if this is the right fix, but it was the only way I could see to ensure the repository was loaded with the correct monitor so it could repond to cancelation.
Comment 9 John Arthorne CLA 2009-05-14 00:41:09 EDT
Created attachment 135726 [details]
Fix part two

Part two of the fix is to pass the monitor into the enterLoad method, and respond to cancelation. With these two patches, the update check bails promptly on cancel.

Note I'm not ready to ask for review here. I need to look them over again tomorrow.
Comment 10 Susan McCourt CLA 2009-05-14 01:20:16 EDT
(In reply to comment #7)
> >org.eclipse.equinox.internal.p2.ui.DefaultQueryProvider.getQueryDescriptor(DefaultQueryProvider.java:55)
> >org.eclipse.equinox.internal.p2.ui.model.QueriedElement.fetchChildren(QueriedElement.java:93)
> 
> There is a disconnect between these two lines that prevents cancelation from
> being propagated. QueriedElement.fetchChildren takes a progress monitor
> argument, but then calls the following with passing the monitor:
> 
> getQueryProvider().getQueryDescriptor(this);
> 
> This ends up calling MetadataRepositoryElement.getQueryable, which does this:
> 
> if (queryable == null)
>         return getMetadataRepository(new NullProgressMonitor());
> 
> So the progress monitor chain is broken and cancelation is not propagated down.

Good catch.  For verification purposes, note that the broken progress monitor chain in MetadataRepositoryElement will be observed when a single site is chosen in the "Work With" combo.  When "All Available" or "Only Local" are chosen, the queryable is a QueryableMetadataRepositoryManager.

I know you aren't asking for review (yet), but FWIW, the fix looks reasonable.  It ensures the repo is loaded outside of the query provider method.  I also agree that without the second fix, the first fix often won't matter.  Because we start a parallel background load of all repos when we open the wizard, the chances are high that there's already a load lock on the manager when we are populating the wizard.  


Comment 11 John Arthorne CLA 2009-05-14 11:30:46 EDT
Thanks Susan. I'm quite comfortable with "part two" because this is a big general improvement to cancelation responsiveness. I'm sure that will help out in a number of cases. Since the loadRepository method will already throw OperationCanceledException on cancelation, I think it is a very low risk change.

I just didn't have the context to know if the UI fix looked reasonable. Based on your feedback I would be inclined to go ahead with this. I'm doing some more testing today with patches from both 275975 and this bug, and so far it is looking really good. Cancelation is prompt for various operations even when dealing with the "timeout" and "decelerate" test repositories.
Comment 12 Susan McCourt CLA 2009-05-14 12:00:08 EDT
Created attachment 135805 [details]
patch - replacement for fix part one

updated patch.  This patch creates child monitors so that the potential load and fetch share the monitor passed into the fetchChildren method.  I'm going to use this patch and your patch for testing today.
Comment 13 Susan McCourt CLA 2009-05-14 12:04:12 EDT
Comment on attachment 135726 [details]
Fix part two

(sorry for all the noise, I obsoleted the wrong patch)
Comment 14 John Arthorne CLA 2009-05-14 13:49:21 EDT
Created attachment 135829 [details]
Replacement for part 1

I'm finding the previous patch isn't good enough. Here is what happens:

- MetadataRepositoryElement.getMetadataRepository is called
- It catches and swallows cancelation
- We then call super.fetchChildren which never checks for cancelation.
- Since the queryable is still null, this ends up calling loadRepository with a null progress monitor

-> Thus we end up in the same hanging state.

This patch propagates the cancelation, which from some basic testing seems to do the trick.
Comment 15 John Arthorne CLA 2009-05-14 14:21:08 EDT
I think these two patches (comment 9 and comment 14) are ready for review. In conjunction with the patch on Bug 275975 I tested a variety of different scenarios (cancel while pending in available software dialog, cancel check for updates, cancel test connection) using several different test servers (timeout, readtimeout, decelerate). Cancelation is prompt in all cases and there are no errors either logged or in dialogs.
Comment 16 John Arthorne CLA 2009-05-14 14:52:55 EDT
I accidentally released part two to HEAD while releasing another fix. I will revert it if this isn't reviewed in the next 3 hours.
Comment 17 Susan McCourt CLA 2009-05-14 16:11:40 EDT
Fixed in HEAD >20090514 (released the UI part).
This is much improved.  The cancellation of the "Fetching children" job in the UI is immediate, even with bad servers.  

Verified that the repo is remembered is bad once the timeout fails.
I also retested some (albeit unrelated) cancel scenarios that were previously problematic to make sure this didn't add any new gotchas:
- cancelling during resolution shouldn't cancel the entire wizard
- cancelling authentication of one repo shouldn't cancel the entire operation.

Thx for the help on this, John (and to Henrik for pointing it out), it makes a nice difference.
Comment 18 John Arthorne CLA 2009-06-03 11:49:20 EDT
*** Bug 278973 has been marked as a duplicate of this bug. ***