Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 316839 - JobManager.yieldRule() does not update lock graph correctly for jobs that manage rules
Summary: JobManager.yieldRule() does not update lock graph correctly for jobs that man...
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 3.6   Edit
Hardware: PC Windows XP
: P3 major (vote)
Target Milestone: 3.6.1   Edit
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard: ORACLE_P1
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-14 17:58 EDT by Carlin Rogers CLA
Modified: 2011-10-28 15:45 EDT (History)
8 users (show)

See Also:
john.arthorne: review+


Attachments
patch to yieldRule() to get rule from likeThreadJob (1022 bytes, patch)
2010-06-16 16:02 EDT, Carlin Rogers CLA
no flags Details | Diff
updated patch with recommended changes (1.05 KB, patch)
2010-06-17 19:01 EDT, Carlin Rogers CLA
no flags Details | Diff
Test to reproduce bug (7.82 KB, application/octet-stream)
2010-08-05 20:15 EDT, Carlin Rogers CLA
no flags Details
Test case (5.24 KB, patch)
2010-08-18 16:14 EDT, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlin Rogers CLA 2010-06-14 17:58:01 EDT
We're seeing the following Assert / IllegalArgumentException in our adopter project. It's followed by the output of LockManager.handleException()

java.lang.IllegalArgumentException: A thread with no real locks was chosen to resolve deadlock.
	at org.eclipse.core.runtime.Assert.isLegal(Assert.java:63)
	at org.eclipse.core.internal.jobs.DeadlockDetector.realLocksForThread(DeadlockDetector.java:495)
	at org.eclipse.core.internal.jobs.DeadlockDetector.lockWaitStart(DeadlockDetector.java:400)
	at org.eclipse.core.internal.jobs.LockManager.addLockWaitThread(LockManager.java:158)
	at org.eclipse.core.internal.jobs.ThreadJob.waitStart(ThreadJob.java:429)
	at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:212)
	at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:199)
	at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:92)
	at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:285)
	at org.eclipse.core.internal.utils.StringPoolJob.run(StringPoolJob.java:99)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

!ENTRY org.eclipse.core.jobs 4 2 2010-06-10 12:33:10.484
!MESSAGE LockManager.handleException
!STACK 0
java.lang.Exception:  :: 
 R/, P/testSimpleHeaderFooterIncludes_JspIncludeProject_, P/testCreationBoundaryCases_JspIncludeProject_, OrderedLock (4999), OrderedLock (5000), OrderedLock (0),
 Worker-26 :  -1, 6, 6, 0, 0, 0,
 Worker-10 :  3, -1, 3, 0, 0, 0,
 Thread-7 :  1, 1, 1, 0, 0, -1,
 main :  -1, 0, 0, 0, 0, 0,
 Java indexing :  0, 0, 0, 1, 1, 0,
 Worker-9 :  1, 1, -1, 0, 0, 0,
 Worker-27 :  0, 0, 0, 0, 0, -1,
 Worker-21 :  0, 0, 0, 0, 0, -1,
 Worker-22 :  0, 0, 0, 0, 0, -1,
-------

	at org.eclipse.core.internal.jobs.LockManager.handleInternalError(LockManager.java:204)
	at org.eclipse.core.internal.jobs.LockManager.addLockWaitThread(LockManager.java:176)
	at org.eclipse.core.internal.jobs.ThreadJob.waitStart(ThreadJob.java:429)
	at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:212)
	at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:199)
	at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:92)
	at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:285)
	at org.eclipse.core.internal.utils.StringPoolJob.run(StringPoolJob.java:99)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

Notice some of the entries in the graph are greater than one. The issue occurs because the JobManager.yieldRule() does not seem to update the lock manager's graph correctly for jobs that manage rules themselves. We have a class that extends org.eclipse.core.runtime.jobs.Job, overriding run() and makes the required calls to the JobManager beginRule() and endRule() as required but without a call to setRule(). The getRule() method is final on Job so we have no override of it.

With the integration of Helios and the new support to yield, there's now a call to getRule() that impacts the desired behavior of JobManager.yieldRule(). After getting the corresponding ThreadJob, likeThreadJob, from the current Job, there's a check that (job.getRule() != null)  on/near line 1343. In our case it will return null.

However, if there was an else clause you could check the rule of the actual threadJon, likeThreadJob. The else clause would check that (likeThreadJob.getRule() != null), then call the LockManager..removeLockThread(currentThread, likeThreadJob.getRule())
Comment 1 Carlin Rogers CLA 2010-06-16 16:02:20 EDT
Created attachment 172070 [details]
patch to yieldRule() to get rule from likeThreadJob
Comment 2 Raghunathan Srinivasan CLA 2010-06-16 19:44:29 EDT
This issue is impacting performance on the adopter product. We would appreciate a speedy review and feedback.
Comment 3 Min Idzelis CLA 2010-06-17 11:27:27 EDT
(In reply to comment #1)
> Created an attachment (id=172070) [details]
> patch to yieldRule() to get rule from likeThreadJob

I've looked over the patch and it seems logical, except for I don't think it needs to be in an else block. 

There can in fact be jobs that have both implicit and explicit rules. So, I think that the patch should be changed to remove the lock thread for likeThreadJobs whenever they are have a non-null rule. (Add it to the if block after line 1341). 

Additionally, whenever a patch is made for the Job manager there should be a corresponding regression JUnit test in the org.eclipse.core.tests.runtime plugin (in the package org.eclipse.core.tests.runtime.jobs). Please attach a test case that will shows an error before applying the patch and no error after applying the patch.
Comment 4 Carlin Rogers CLA 2010-06-17 19:01:06 EDT
Created attachment 172175 [details]
updated patch with recommended changes

Thanks for reviewing this issue. I've attached a revised patch that includes the suggested changes.

Unfortunately, I'm trying to get out the door for a vacation so I will not get the JUnit test completed until a later date.
Comment 5 Raghunathan Srinivasan CLA 2010-08-04 14:12:11 EDT
We are waiting for feedback on this bug. We would like to see this issue addressed in the first maintenance release of Helios.
Comment 6 James Blackburn CLA 2010-08-04 15:25:17 EDT
Is there a testcase for this?
Comment 7 Carlin Rogers CLA 2010-08-05 20:15:31 EDT
Created attachment 175993 [details]
Test to reproduce bug

The test runs two jobs, the first one of which manages a scheduling rule, calling getJobManager().beginRule() and getJobManager().endRule(). This first job will also call Job.yieldRule() in same manner as ModelManagerImpl, to simulate the scenario we see this issue occur. It will yield and the second job will run.

To reproduce the issue, run the test and put breakpoints in the DeadlockDetector at
- lockAcquired()
- lockReleased()
- lockReleasedCompletely()

Note that with the job that manages the scheduling rule, in the call to beginRule() there is a call to DeadlockDetector.lockAcquired(). Then in the call to yieldRule(), line 1343, the job.getRule() call is null so there's no call to getLockManager().removeLockThread() and in turn, DeadlockDetector.lockReleased(). This is the bug. The lock graph is now incorrect!

Then the second job runs (the one with the 'root' rule) with usual calls to DeadlockDetector.lockAcquired() and
DeadlockDetector.lockReleasedCompletely().lockAcquired().

Finally, after the second job completes, the first job continues and calls ThreadJob.joinRun() which will update the lock graph with a call to DeadlockDetector. This is now the second call to lockAcquired() without having called lockReleased() at all for this job. At the end, this job  manages the call to endRule() which updates the lock graph with a call to DeadlockDetector.lockReleased() but this is the only call for this job.
Comment 8 Min Idzelis CLA 2010-08-16 16:42:17 EDT
The patch looks good to me.
Comment 9 Min Idzelis CLA 2010-08-16 16:44:42 EDT
(In reply to comment #8)
> The patch looks good to me.

Clarifying: I was referring the code patch. The JUnit test is more like an example than a test. Doesn't do any verification. 

Still, I'd recommend going for it based on a code review of the actual patch.
Comment 10 John Arthorne CLA 2010-08-18 16:14:49 EDT
Created attachment 176938 [details]
Test case

It took awhile to isolate a test case with appropriate assertions. This test fails without the fix, and passes with the fix.
Comment 11 John Arthorne CLA 2010-08-18 16:16:27 EDT
I have released the fix and test case to HEAD. I will look into backporting to 3.6.1 once we have a successful build with the fix.
Comment 12 John Arthorne CLA 2010-08-20 15:12:30 EDT
It turns out my JUnit wasn't valuable. Another unrelated job could be running during the test suites, which would interfere and cause my assertion to be false. We can't assume the lock graph will be entirely empty here. I have just disabled the assertion in the test, since the test is still useful for demonstrating and manually reproducing it.
Comment 13 John Arthorne CLA 2010-08-24 16:15:08 EDT
I have released this fix to the next 3.6.1 build.