Community
Participate
Working Groups
Canceling when contacting repos in "install new software" causes later cancelation to hang. To reproduce: 1. Run testserver 2. Add the site http://localhost:8080/proxy/declerate (a really slow connection to updates 3.4) 3. See "Pending" appear 4. Cancel the dialog 5. Start "software updates" Notice that is reports "cancel requested" 6. Try to cancel "software updates" It marks everything as canceled requested The UI is now stuck. This may be related to Bug 263600
Can't reproduce the hang, and I'm not sure this is a UI-level cancellation problem. (In reply to comment #0) > Canceling when contacting repos in "install new software" causes later > cancelation to hang. > > To reproduce: > 1. Run testserver > 2. Add the site http://localhost:8080/proxy/declerate > (a really slow connection to updates 3.4) I had to use http://localhost:8080/proxy/decelerate > 3. See "Pending" appear > 4. Cancel the dialog so far so good, at this point the progress view shows cancel is requested but the fetch job (the load) is not responding > 5. Start "software updates" > Notice that is reports "cancel requested" What is "software updates". Do you mean Help>Check for Updates? (Software Updates was the 3.4 menu) I did "Help>Check for Updates" I saw the "Contacting Software Sites" job in the progress view > 6. Try to cancel "software updates" > It marks everything as canceled requested > The UI is now stuck. I saw the job marked canceled and the UI was responsive. > > This may be related to Bug 263600 > need to understand the steps better and how we know this to be a UI issue. It seems to me there are still transport cases where repos are not responding to cancellation. The UI code paths for cancellation are the same, so when one repo responds and another doesn't, this usually indicates a non-responsive site.
(In reply to comment #1) > I had to use http://localhost:8080/proxy/decelerate > sorry, typo. > > 5. Start "software updates" > > Notice that is reports "cancel requested" > > What is "software updates". Do you mean Help>Check for Updates? yes. > I saw the "Contacting Software Sites" job in the progress view > Expand the details. > > 6. Try to cancel "software updates" > > It marks everything as canceled requested > > The UI is now stuck. > > I saw the job marked canceled and the UI was responsive. > What I mean is: the UI is responsive, but it never cancels. So dialog is stuck. > need to understand the steps better and how we know this to be a UI issue. > It seems to me there are still transport cases where repos are not responding > to cancellation. The UI code paths for cancellation are the same, so when one > repo responds and another doesn't, this usually indicates a non-responsive > site. > I think there is something wrong with the cancelation/job that runs when adding a new site in "install new software" (i.e. the thing that runs when "Pending" is shown. When canceling in the other dialogs, the job is canceled the correct way - all the way down to transport. Does it really call setCancel(true) on the monitor? That should cancel the transport.
Thanks, I see what you mean now, I'll look into this.
Assigning to John as he has some spare cycles. I took a quick look at the DeferredTreeContentManager, RemoteQueriedElement, and QueriedElement. The monitor is being passed around properly, so I think it's ok on the element/query side. The load job is supposed to be cancelled when the content manager is disposed, but it's entirely possible that the content manager is not being disposed when the wizard is closed.
Created attachment 135434 [details] Stack trace I can reproduce the fact that there are still two threads running after cancelation, but the update check dialog does close immediately when cancel is pressed.
Looking at the stack trace, I don't see anything unexplainable from the UI's point of view. The update check thread is waiting because the FileReader is in a join(). All of this is part of a metadata repo load happening in the update check. Meanwhile, the UI tries to load a metadata repo in the fetching children job created by the content manager. This job will be blocked on the load lock until the first load returns. During this lock, cancellation requests on the UI fetch job won't respond. Once the other load completes, the UI job will resume and then the next check of the monitor will result in the load cancel. I'm not sure there's anything to be fixed for 3.5 in this bug.
>org.eclipse.equinox.internal.p2.ui.DefaultQueryProvider.getQueryDescriptor(DefaultQueryProvider.java:55) >org.eclipse.equinox.internal.p2.ui.model.QueriedElement.fetchChildren(QueriedElement.java:93) There is a disconnect between these two lines that prevents cancelation from being propagated. QueriedElement.fetchChildren takes a progress monitor argument, but then calls the following with passing the monitor: getQueryProvider().getQueryDescriptor(this); This ends up calling MetadataRepositoryElement.getQueryable, which does this: if (queryable == null) return getMetadataRepository(new NullProgressMonitor()); So the progress monitor chain is broken and cancelation is not propagated down.
Created attachment 135725 [details] Possible fix I don't know if this is the right fix, but it was the only way I could see to ensure the repository was loaded with the correct monitor so it could repond to cancelation.
Created attachment 135726 [details] Fix part two Part two of the fix is to pass the monitor into the enterLoad method, and respond to cancelation. With these two patches, the update check bails promptly on cancel. Note I'm not ready to ask for review here. I need to look them over again tomorrow.
(In reply to comment #7) > >org.eclipse.equinox.internal.p2.ui.DefaultQueryProvider.getQueryDescriptor(DefaultQueryProvider.java:55) > >org.eclipse.equinox.internal.p2.ui.model.QueriedElement.fetchChildren(QueriedElement.java:93) > > There is a disconnect between these two lines that prevents cancelation from > being propagated. QueriedElement.fetchChildren takes a progress monitor > argument, but then calls the following with passing the monitor: > > getQueryProvider().getQueryDescriptor(this); > > This ends up calling MetadataRepositoryElement.getQueryable, which does this: > > if (queryable == null) > return getMetadataRepository(new NullProgressMonitor()); > > So the progress monitor chain is broken and cancelation is not propagated down. Good catch. For verification purposes, note that the broken progress monitor chain in MetadataRepositoryElement will be observed when a single site is chosen in the "Work With" combo. When "All Available" or "Only Local" are chosen, the queryable is a QueryableMetadataRepositoryManager. I know you aren't asking for review (yet), but FWIW, the fix looks reasonable. It ensures the repo is loaded outside of the query provider method. I also agree that without the second fix, the first fix often won't matter. Because we start a parallel background load of all repos when we open the wizard, the chances are high that there's already a load lock on the manager when we are populating the wizard.
Thanks Susan. I'm quite comfortable with "part two" because this is a big general improvement to cancelation responsiveness. I'm sure that will help out in a number of cases. Since the loadRepository method will already throw OperationCanceledException on cancelation, I think it is a very low risk change. I just didn't have the context to know if the UI fix looked reasonable. Based on your feedback I would be inclined to go ahead with this. I'm doing some more testing today with patches from both 275975 and this bug, and so far it is looking really good. Cancelation is prompt for various operations even when dealing with the "timeout" and "decelerate" test repositories.
Created attachment 135805 [details] patch - replacement for fix part one updated patch. This patch creates child monitors so that the potential load and fetch share the monitor passed into the fetchChildren method. I'm going to use this patch and your patch for testing today.
Comment on attachment 135726 [details] Fix part two (sorry for all the noise, I obsoleted the wrong patch)
Created attachment 135829 [details] Replacement for part 1 I'm finding the previous patch isn't good enough. Here is what happens: - MetadataRepositoryElement.getMetadataRepository is called - It catches and swallows cancelation - We then call super.fetchChildren which never checks for cancelation. - Since the queryable is still null, this ends up calling loadRepository with a null progress monitor -> Thus we end up in the same hanging state. This patch propagates the cancelation, which from some basic testing seems to do the trick.
I think these two patches (comment 9 and comment 14) are ready for review. In conjunction with the patch on Bug 275975 I tested a variety of different scenarios (cancel while pending in available software dialog, cancel check for updates, cancel test connection) using several different test servers (timeout, readtimeout, decelerate). Cancelation is prompt in all cases and there are no errors either logged or in dialogs.
I accidentally released part two to HEAD while releasing another fix. I will revert it if this isn't reviewed in the next 3 hours.
Fixed in HEAD >20090514 (released the UI part). This is much improved. The cancellation of the "Fetching children" job in the UI is immediate, even with bad servers. Verified that the repo is remembered is bad once the timeout fails. I also retested some (albeit unrelated) cancel scenarios that were previously problematic to make sure this didn't add any new gotchas: - cancelling during resolution shouldn't cancel the entire wizard - cancelling authentication of one repo shouldn't cancel the entire operation. Thx for the help on this, John (and to Henrik for pointing it out), it makes a nice difference.
*** Bug 278973 has been marked as a duplicate of this bug. ***