| Summary: | When resuming from yieldRule(), the monitor's blocked state is not cleared. | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Min Idzelis <min123> | ||||||||||||||
| Component: | Runtime | Assignee: | John Arthorne <john.arthorne> | ||||||||||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||||||||||
| Severity: | normal | ||||||||||||||||
| Priority: | P3 | CC: | jsholl, tjwatson | ||||||||||||||
| Version: | 3.6 | Flags: | tjwatson:
review+
|
||||||||||||||
| Target Milestone: | 3.6 RC1 | ||||||||||||||||
| Hardware: | PC | ||||||||||||||||
| OS: | Windows XP | ||||||||||||||||
| Whiteboard: | |||||||||||||||||
| Attachments: |
|
||||||||||||||||
|
Description
Min Idzelis
Created attachment 167161 [details]
Test case
Created attachment 167164 [details]
Fix
I see. In ImplicitJobs#begin we clear this flag in the finally block, but it isn't done in the yield case. It concerns me that we don't call waitEnd in all cases, which means we won't be clearing the blocked flag in all cases on yield. From ThreadJob#waitForRun:
if (threadJob == result) {
waitEnd(threadJob, monitor);
Specifically we won't clear the blocked flag in "Condition #3", where the rule was transferred to the waiting thread.
Created attachment 167516 [details]
Additional test case
New test case involving yield and transfer. This one fails to unblock. I'm look into a fix.
Hey, good catch John! On a related note, I am also seeing a deadlock triggered by yieldRule() (unrelated to this monitor not being cleared). It looks like it might be related to endWait(). It does not not require transferRule() however. I'm still debugging (it's very time consuming) but the deadlock is caused because there are two ThreadJobs being created for the same Job. (Both ThreadJobs's have the exact same 'realJob'). This is clearly wrong, but I'm having a hard time tracking down why a second ThreadJob is being created. It might have something to do with one ThreadJob not being removed from the waiting JobQueue. I will be independently tracking this down. I think that possibly we are seeing two sides to the same problem. Keep me posted with anything you find, and I'll do the same. Thanks. Created attachment 167527 [details]
Fix v02
Min, this is very similar to your fix, but the call to clearBlocked is taken out of the waitEnd method. This way it runs even in the rule transfer case. What do you think? All four permutations now pass (yield/no yield and transfer/no transfer).
Created attachment 167529 [details]
Tests all four unblocking cases
I agree with your changes. From a style standpoint, it is a little odd that waitStart() will report blocked, but waitEnd() will not. Perhaps we should move the reportBlocked call to outside of waitStart()? What do you think about that? I'll be opening another bug about the deadlock I mentioned before. Created attachment 167552 [details]
Fix v03
I agree with your style point, that was bugging me too but I was trying to keep the change simple. However this patch is equivalent to v02 but reportBlocked is moved into waitEnd again like your original patch. I use an extra parameter to indicate if the lock state should be updated. I like this because now waitStart/waitEnd is symmetric, and it avoids cluttering up the waitForRun method with additional setup and cleanup. The original bug stemmed from the asymmetry between waitStart and waitEnd so it's nice for this to be cleaner.
We are in RC1 so another reviewer is needed. The following summary is for the benefit of the other reviewer. Note this bug is relatively simple but the surrounding code involved is incredibly complex. I suggest just focusing on the simple bug rather than trying to understand all of the unrelated code you pass through ;) IProgressMonitorWithBlocking has two symmetric methods that are called when a thread is blocked or unblocked: public void setBlocked(IStatus reason); public void clearBlocked(); These methods are used by the UI to, for example, open a dialog saying the user operation is blocked. If we call setBlocked but never call clearBlocked, the dialog is never closed. This can lead to the UI being blocked forever and the user has to kill the process. Within the job implementation we have two helper methods for managing this: JobManager#reportBlocked and JobManager#reportUnblocked. As with the above, calls to these methods must be symmetric or bad things happen. In the method ThreadJob#joinRun we call reportBlocked (joinRun > waitForRun > waitStart), but we never call reportUnblocked(). In previous releases, there was only ever one caller of this method: ImplicitJobs#begin. That method called reportUnblocked within a finally block, so we didn't have any bug. In 3.6, we introduced another place where we invoked ThreadJob#joinRun (JobManager#yieldRule). So for this caller, the reportUnblocked() method was never called, hence this bug report. The fix is to ensure we always call reportUnblocked in all cases. For consistency this was put in waitEnd because reportBlocked is called from waitStart. Since waitEnd is called in a finally block, this ensures the calls are symmetric. The attached test exercises the four main variants where joinRun is called. Two of these fail before the fix is applied, and all four pass once the fix is applied. (In reply to comment #11) > We are in RC1 so another reviewer is needed. The following summary is for the > benefit of the other reviewer. Note this bug is relatively simple but the > surrounding code involved is incredibly complex. I suggest just focusing on the > simple bug rather than trying to understand all of the unrelated code you pass > through ;) > Thanks for the details John. I did focus on the actual fix. I reviewed the changes and confirmed that the test pass and we now ensure the method reportUnblocked(monitor) is called appropriately. In my review I did notice some questionable reads and writes to the following fields (according to the javadoc at least): org.eclipse.core.internal.jobs.ThreadJob.isBlocked org.eclipse.core.internal.jobs.ThreadJob.acquireRule Both document @GuardedBy("JobManager.implicitJobs") but there are quite a few reads and writes to these fields outside of a sync(implicitJobs) block. I discussed this with John briefly. We believe the javadoc has gone a bit overboard. These fields should only be access by a single thread and there should not really be any need to synchronize in that case. +1 to the fix and tests. John, if you based the fix off of one of Min's attachements then I assume a +iplog flag is in order when you release? Thanks Tom. I have entered bug 312136 for the @GuardedBy documentation problem. (In reply to comment #9) > > I'll be opening another bug about the deadlock I mentioned before. see bug 312334. Great write up John (in comment #11), thanks for the review Tom! |