Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 311756 - When resuming from yieldRule(), the monitor's blocked state is not cleared.
Summary: When resuming from yieldRule(), the monitor's blocked state is not cleared.
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 3.6   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.6 RC1   Edit
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-05 11:22 EDT by Min Idzelis CLA
Modified: 2010-05-10 16:34 EDT (History)
2 users (show)

See Also:
tjwatson: review+


Attachments
Test case (4.81 KB, patch)
2010-05-05 11:25 EDT, Min Idzelis CLA
no flags Details | Diff
Fix (799 bytes, patch)
2010-05-05 11:38 EDT, Min Idzelis CLA
no flags Details | Diff
Additional test case (5.61 KB, patch)
2010-05-07 13:57 EDT, John Arthorne CLA
no flags Details | Diff
Fix v02 (1.46 KB, patch)
2010-05-07 14:32 EDT, John Arthorne CLA
no flags Details | Diff
Tests all four unblocking cases (8.64 KB, patch)
2010-05-07 14:33 EDT, John Arthorne CLA
no flags Details | Diff
Fix v03 (3.01 KB, patch)
2010-05-07 15:35 EDT, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Min Idzelis CLA 2010-05-05 11:22:48 EDT
Build Identifier: 

A Job that successfully performs a yieldRule() sets the blocked state of the monitor, but does not clear it. This results (sometimes) in the BlockedJobsDialog lingering much longer than it should be.  

Reproducible: Always
Comment 1 Min Idzelis CLA 2010-05-05 11:25:15 EDT
Created attachment 167161 [details]
Test case
Comment 2 Min Idzelis CLA 2010-05-05 11:38:15 EDT
Created attachment 167164 [details]
Fix
Comment 3 John Arthorne CLA 2010-05-07 13:05:27 EDT
I see. In ImplicitJobs#begin we clear this flag in the finally block, but it isn't done in the yield case.
Comment 4 John Arthorne CLA 2010-05-07 13:25:54 EDT
It concerns me that we don't call waitEnd in all cases, which means we won't be clearing the blocked flag in all cases on yield. From ThreadJob#waitForRun:

	if (threadJob == result) {
		waitEnd(threadJob, monitor);

Specifically we won't clear the blocked flag in "Condition #3", where the rule was transferred to the waiting thread.
Comment 5 John Arthorne CLA 2010-05-07 13:57:06 EDT
Created attachment 167516 [details]
Additional test case

New test case involving yield and transfer. This one fails to unblock. I'm look into a fix.
Comment 6 Min Idzelis CLA 2010-05-07 14:20:02 EDT
Hey, good catch John!

On a related note, I am also seeing a deadlock triggered by yieldRule() (unrelated to this monitor not being cleared). 

It looks like it might be related to endWait(). 

It does not not require transferRule() however. 

I'm still debugging (it's very time consuming) but the deadlock is caused because there are two ThreadJobs being created for the same Job. (Both ThreadJobs's have the exact same 'realJob'). This is clearly wrong, but I'm having a hard time tracking down why a second ThreadJob is being created. It might have something to do with one ThreadJob not being removed from the waiting JobQueue. 

I will be independently tracking this down. I think that possibly we are seeing two sides to the same problem. 

Keep me posted with anything you find, and I'll do the same. 

Thanks.
Comment 7 John Arthorne CLA 2010-05-07 14:32:20 EDT
Created attachment 167527 [details]
Fix v02

Min, this is very similar to your fix, but the call to clearBlocked is taken out of the waitEnd method. This way it runs even in the rule transfer case. What do you think? All four permutations now pass (yield/no yield and transfer/no transfer).
Comment 8 John Arthorne CLA 2010-05-07 14:33:13 EDT
Created attachment 167529 [details]
Tests all four unblocking cases
Comment 9 Min Idzelis CLA 2010-05-07 14:44:44 EDT
I agree with your changes. 

From a style standpoint, it is a little odd that waitStart() will report blocked, but waitEnd() will not. Perhaps we should move the reportBlocked call to outside of waitStart()? What do you think about that? 


I'll be opening another bug about the deadlock I mentioned before.
Comment 10 John Arthorne CLA 2010-05-07 15:35:57 EDT
Created attachment 167552 [details]
Fix v03

I agree with your style point, that was bugging me too but I was trying to keep the change simple. However this patch is equivalent to v02 but reportBlocked is moved into waitEnd again like your original patch. I use an extra parameter to indicate if the lock state should be updated. I like this because now waitStart/waitEnd is symmetric, and it avoids cluttering up the waitForRun method with additional setup and cleanup. The original bug stemmed from the asymmetry between waitStart and waitEnd so it's nice for this to be cleaner.
Comment 11 John Arthorne CLA 2010-05-07 16:09:46 EDT
We are in RC1 so another reviewer is needed. The following summary is for the benefit of the other reviewer. Note this bug is relatively simple but the surrounding code involved is incredibly complex. I suggest just focusing on the simple bug rather than trying to understand all of the unrelated code you pass through ;)

IProgressMonitorWithBlocking has two symmetric methods that are called when a thread is blocked or unblocked:

	public void setBlocked(IStatus reason);
	public void clearBlocked();

These methods are used by the UI to, for example, open a dialog saying the user operation is blocked. If we call setBlocked but never call clearBlocked, the dialog is never closed. This can lead to the UI being blocked forever and the user has to kill the process.

Within the job implementation we have two helper methods for managing this: JobManager#reportBlocked and JobManager#reportUnblocked. As with the above, calls to these methods must be symmetric or bad things happen.

In the method ThreadJob#joinRun we call reportBlocked (joinRun > waitForRun > waitStart), but we never call reportUnblocked(). In previous releases, there was only ever one caller of this method: ImplicitJobs#begin. That method called reportUnblocked within a finally block, so we didn't have any bug.

In 3.6, we introduced another place where we invoked ThreadJob#joinRun (JobManager#yieldRule). So for this caller, the reportUnblocked() method was never called, hence this bug report.

The fix is to ensure we always call reportUnblocked in all cases. For consistency this was put in waitEnd because reportBlocked is called from waitStart. Since waitEnd is called in a finally block, this ensures the calls are symmetric.

The attached test exercises the four main variants where joinRun is called. Two of these fail before the fix is applied, and all four pass once the fix is applied.
Comment 12 Thomas Watson CLA 2010-05-07 17:13:35 EDT
(In reply to comment #11)
> We are in RC1 so another reviewer is needed. The following summary is for the
> benefit of the other reviewer. Note this bug is relatively simple but the
> surrounding code involved is incredibly complex. I suggest just focusing on the
> simple bug rather than trying to understand all of the unrelated code you pass
> through ;)
> 

Thanks for the details John.  I did focus on the actual fix.  I reviewed the changes and confirmed that the test pass and we now ensure the method reportUnblocked(monitor) is called appropriately.

In my review I did notice some questionable reads and writes to the following fields (according to the javadoc at least):

org.eclipse.core.internal.jobs.ThreadJob.isBlocked
org.eclipse.core.internal.jobs.ThreadJob.acquireRule

Both document @GuardedBy("JobManager.implicitJobs") but there are quite a few reads and writes to these fields outside of a sync(implicitJobs) block.  I discussed this with John briefly.  We believe the javadoc has gone a bit overboard.  These fields should only be access by a single thread and there should not really be any need to synchronize in that case.

+1 to the fix and tests.

John, if you based the fix off of one of Min's attachements then I assume a +iplog flag is in order when you release?
Comment 13 John Arthorne CLA 2010-05-07 17:40:49 EDT
Thanks Tom. I have entered bug 312136 for the @GuardedBy documentation problem.
Comment 14 Min Idzelis CLA 2010-05-10 16:34:20 EDT
(In reply to comment #9)
> 
> I'll be opening another bug about the deadlock I mentioned before.

see bug 312334.

Great write up John (in comment #11), thanks for the review Tom!