Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 226052

Summary: [ui] - first filter text keystroke in available IU list can hang UI if no repos have been loaded
Product: [Eclipse Project] Equinox Reporter: Susan McCourt <susan>
Component: p2Assignee: Susan McCourt <susan>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: greg.johnson, john.arthorne, overholt
Version: 3.4Flags: john.arthorne: review+
john.arthorne: review+
Target Milestone: 3.4 RC1   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description Flags
patch for background filtering
none
this one includes the NLS changes
none
updated patch to create a new scheduling rule on input change
none
patch for delaying the synchronous setting and disabling the available page none

Description Susan McCourt CLA 2008-04-07 18:38:33 EDT
Bug #204826 discusses the difficulty in synchronizing deferred content fetching with the FilteredTree support.  The solution is to switch to synchronous viewer content fetching when the user starts using filtering.

The first time a filter keystroke is typed, if the repos involved are not loaded, the filter job (which runs in the UI) will cause a refresh of the tree and there will be a long wait.  There is not a good way to provide cancellation or iterative filtering without reimplementing the wheel.

If we find that users are accidentally typing in this field and getting unexpected delays, we could consider disabling the filter initially, background loading all the repos, and enabling the filter when that is done. 

I didn't do this as a first approach because I don't see taking the time and memory to load everything if the user never intends to type in the filter text at all.
Comment 1 Susan McCourt CLA 2008-04-14 19:17:20 EDT
I had an idea that doesn't require eager loading of the repos.

When the user first types into filter text, save away the keystroke, disable the filter text box while starting a background job that sets the content viewer synchronous mode to true and does a full expansion.  When the job is finished, enable the filter field and restore the typed text, which will then run the filter job on the already expanded contents.

This may be a bit squirrely, but is better than the long hang.
Comment 2 Susan McCourt CLA 2008-04-29 17:14:04 EDT
*** Bug 229405 has been marked as a duplicate of this bug. ***
Comment 3 Susan McCourt CLA 2008-05-07 14:41:23 EDT
Created attachment 99141 [details]
patch for background filtering
Comment 4 Susan McCourt CLA 2008-05-07 14:47:13 EDT
John, can you review this patch?

1 - adds "API" to QueryableMetadataRepositoryManager to force a load of all repos
2 - adds scheduling rule to the filter job for synching the background load with the filtering
3 - adds "API" to ProvisioningOperationRunner to manage jobs created by someone else so that the progress can be shown the way it is for other operations.

The patch is fairly surgical/specific.  At a different time in the release I would do it differnetly 
- It is specific to loading metadata repositories whereas the rest of the code in DeferredFetchFilteredTree is trying to be a bit more generic.  
- ideally the load would be an operation managed by ProvisioningOperationRunner but I need to control exactly how the job is scheduled and created.

Also, the strings aren't NLS'ed because I have conflicting changes in my workspace but I will do it before releasing.

Comment 5 Susan McCourt CLA 2008-05-07 15:57:20 EDT
Created attachment 99161 [details]
this one includes the NLS changes

This patch externalizes the strings AND uses the same string constant for both the job name and in the filter text ("Retrieving List")
Comment 6 Susan McCourt CLA 2008-05-07 16:28:33 EDT
Created attachment 99174 [details]
updated patch to create a new scheduling rule on input change

John pointed out that the ability to reset the input on the scheduling rule that coordinates the filter and the load job could cause problems in the job scheduler, because it means the answer to isConflicting(ISchedulingRule) could change during a jobs lifetime.

Instead, this patch cancels both the filter and the load job when input changes, nulls out/recreates the rule, and resets the new rule into the filter job.  The load job will get the new rule when it is next created.
Comment 7 John Arthorne CLA 2008-05-07 18:02:51 EDT
+1

This change is big enough that I have a hard time saying for sure that it will all work perfectly, but I think it's worth releasing. We can then hammer on it and make sure there are no weird issues. I see no more obvious problems in the code after looking at it for about an hour.
Comment 8 Susan McCourt CLA 2008-05-07 19:31:36 EDT
I agree with your comments.  I ran the original patch for most of the morning before attaching it and for the mainstream usage it was a huge improvement.  Given how bad the problem is that this fixes, some unexpected glitches would still be a big improvement.

Released to HEAD >20080507
Comment 9 Susan McCourt CLA 2008-05-08 13:20:17 EDT
glitch #1
If you try to expand the repo that is loading in the background job, you can still lock up the UI until the repo is finished loading.  I believe this is because the request to load a repository that is being loaded will be locked since access to the repo instances is synchronized.  What's needed here is to make sure the content provider is not set to a synchronous mode until we know the load is done.  I'm working on a new patch.
Comment 10 Susan McCourt CLA 2008-05-08 13:22:32 EDT
Note - I think I didn't notice this in yesterday's testing because I had a lot of remote repos enabled.  So the chances that I would expand the same one being loaded were not as high as today, when I only have one remote repo enabled.  So clicking it always causes the hang.
Comment 11 Susan McCourt CLA 2008-05-12 12:02:44 EDT
The discussion in the p2 call this morning was that we should simply disable the dialog while the load is happening, making sure the user knows why they are waiting.
Comment 12 Susan McCourt CLA 2008-05-12 17:37:50 EDT
Created attachment 99829 [details]
patch for delaying the synchronous setting and disabling the available page

This patch changes the synchronization of the filter and load job
Comment 13 Susan McCourt CLA 2008-05-12 17:57:18 EDT
John, can you review this patch?
It changes the way the filter job and load job are synchronized, most importantly the timing at which the content provider is set to a synchronous mode.  It also disables more than just the filter control.

Previously we immediately set the content provider to a synchronous retrieval mode when a load job was started.  The problem here is that during the time the load is taking place, any other user action, or repository event received, could cause a refresh and subsequent query in the UI, and with the content provider already set to synchronous mode, the UI would hang again until the load was done.

This patch 
- does not set the synchronous flag until the load is finished.
- therefore, it cannot use the synchronous flag to determine if it needs to trigger a load.  Instead, it looks for the presence of the load job to know if it needs to start another one.
- I separated the detection of the need for a load job and the job management code.  It was previously occurring in a display.syncExec and this seemed dangerous to me.  This meant the introduction of a boolean to know what happened in the UI thread.
- changed the filter disabling and enabling code to disable/enable the entire
page.  This prevents the user refreshing during the load, or trying to install, etc.  

I've now run with this code for a couple hours while fixing some other things, and have had no hangs.

I should also note that before adding the code to disable the page, I did run the rest of the patch to see if the user could do anything interesting while the load is occurring.  When expanding and collapsing tree items, you would see blank content.  You could refresh or manage sites without incident.  However, if you had a previous selection such that the install button was enabled, you could still try to install while the viewers were repopulating.  This is why I ended up disabling the page, but I wanted to mention that the change in synchronization of the content provider did eliminate the freezing inside the viewer.

I also tested that you could successfully change to the installed feature page, click around, view properties, check for updates, etc.. while the other stuff was going on.  
Comment 14 John Arthorne CLA 2008-05-13 12:09:39 EDT
Apart from one problem with the filter job cancelation, this patch looks good. I have reviewed the code, and installed a patched bundle into my environment and tested with various large sites. The experience is much improved, and I have had no hangs. Before I applied the patch I was getting frequent hangs when filtering immediately after startup with several sites in my list.
Comment 15 Susan McCourt CLA 2008-05-13 14:31:28 EDT
Patch released to HEAD >20080513 with the following changes based on discussion with John:

1) don't reset the filtering rule when cancelling the filter job, because that is not safe to do until the job actually stops
2) move the reset of the filtering rule to a done() listener on the filter job
3) had to null out the scheduling rule *before* making the calls to cancel the filter or load job when input changed.  Not doing so meant that the scheduling rule wouldn't necessarily be recreated when cancelling the jobs
4) doing #3 exposed a bug...the input on the viewer is not reset until after the inputChanged method is called in the filtered tree.  This meant that before, resetting the rule could reset it using the old input.  The filtered tree has to cache the input when it receives the inputChanged so that the rule can be set up using the cached input rather than the viewer input (in case the cancel listener fires before the viewer resets the input).

Without doing #4, the scheduling rule was recreated on a null input and the load job was skipped over.  This was immediately obvious in testing because the filter text would immediately reenable, then the UI would hang while the viewer fetch triggered a load.

I did not discuss #4 specifically with John because he was not available, but he ok'ed the general approach and said I could release and attach the patch for him to double check later. I tested with a large set of repos and different filters and am seeing the same behavior as the patch he reviewed.

(I am releasing vs. waiting for another review in the interest of time.  My next fix which is in progress touches the same code and may actually slightly change the job sequence again. )
Comment 16 Susan McCourt CLA 2008-05-13 14:33:41 EDT
John - after all that I forgot to attach a patch, but you can simply review the differences between the revisions of DeferredFetchFilteredTree 1.5 and 1.6.  
Comment 17 John Arthorne CLA 2008-05-26 09:37:07 EDT
*** Bug 233845 has been marked as a duplicate of this bug. ***