| Summary: | JobManager.yieldRule() does not update lock graph correctly for jobs that manage rules | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Carlin Rogers <carlin.rogers> | ||||||||||
| Component: | Runtime | Assignee: | John Arthorne <john.arthorne> | ||||||||||
| Status: | VERIFIED FIXED | QA Contact: | |||||||||||
| Severity: | major | ||||||||||||
| Priority: | P3 | CC: | Andrew.McCulloch, cameron.bateman, carlin.rogers, david_williams, jamesblackburn+eclipse, john.arthorne, min123, raghunathan.srinivasan | ||||||||||
| Version: | 3.6 | Flags: | john.arthorne:
review+
|
||||||||||
| Target Milestone: | 3.6.1 | ||||||||||||
| Hardware: | PC | ||||||||||||
| OS: | Windows XP | ||||||||||||
| Whiteboard: | ORACLE_P1 | ||||||||||||
| Attachments: |
|
||||||||||||
|
Description
Carlin Rogers
Created attachment 172070 [details]
patch to yieldRule() to get rule from likeThreadJob
This issue is impacting performance on the adopter product. We would appreciate a speedy review and feedback. (In reply to comment #1) > Created an attachment (id=172070) [details] > patch to yieldRule() to get rule from likeThreadJob I've looked over the patch and it seems logical, except for I don't think it needs to be in an else block. There can in fact be jobs that have both implicit and explicit rules. So, I think that the patch should be changed to remove the lock thread for likeThreadJobs whenever they are have a non-null rule. (Add it to the if block after line 1341). Additionally, whenever a patch is made for the Job manager there should be a corresponding regression JUnit test in the org.eclipse.core.tests.runtime plugin (in the package org.eclipse.core.tests.runtime.jobs). Please attach a test case that will shows an error before applying the patch and no error after applying the patch. Created attachment 172175 [details]
updated patch with recommended changes
Thanks for reviewing this issue. I've attached a revised patch that includes the suggested changes.
Unfortunately, I'm trying to get out the door for a vacation so I will not get the JUnit test completed until a later date.
We are waiting for feedback on this bug. We would like to see this issue addressed in the first maintenance release of Helios. Is there a testcase for this? Created attachment 175993 [details]
Test to reproduce bug
The test runs two jobs, the first one of which manages a scheduling rule, calling getJobManager().beginRule() and getJobManager().endRule(). This first job will also call Job.yieldRule() in same manner as ModelManagerImpl, to simulate the scenario we see this issue occur. It will yield and the second job will run.
To reproduce the issue, run the test and put breakpoints in the DeadlockDetector at
- lockAcquired()
- lockReleased()
- lockReleasedCompletely()
Note that with the job that manages the scheduling rule, in the call to beginRule() there is a call to DeadlockDetector.lockAcquired(). Then in the call to yieldRule(), line 1343, the job.getRule() call is null so there's no call to getLockManager().removeLockThread() and in turn, DeadlockDetector.lockReleased(). This is the bug. The lock graph is now incorrect!
Then the second job runs (the one with the 'root' rule) with usual calls to DeadlockDetector.lockAcquired() and
DeadlockDetector.lockReleasedCompletely().lockAcquired().
Finally, after the second job completes, the first job continues and calls ThreadJob.joinRun() which will update the lock graph with a call to DeadlockDetector. This is now the second call to lockAcquired() without having called lockReleased() at all for this job. At the end, this job manages the call to endRule() which updates the lock graph with a call to DeadlockDetector.lockReleased() but this is the only call for this job.
The patch looks good to me. (In reply to comment #8) > The patch looks good to me. Clarifying: I was referring the code patch. The JUnit test is more like an example than a test. Doesn't do any verification. Still, I'd recommend going for it based on a code review of the actual patch. Created attachment 176938 [details]
Test case
It took awhile to isolate a test case with appropriate assertions. This test fails without the fix, and passes with the fix.
I have released the fix and test case to HEAD. I will look into backporting to 3.6.1 once we have a successful build with the fix. It turns out my JUnit wasn't valuable. Another unrelated job could be running during the test suites, which would interfere and cause my assertion to be false. We can't assume the lock graph will be entirely empty here. I have just disabled the assertion in the test, since the test is still useful for demonstrating and manually reproducing it. I have released this fix to the next 3.6.1 build. |