Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 205857 - Transaction lock can dangle on Jobmanager worker thread
Summary: Transaction lock can dangle on Jobmanager worker thread
Status: VERIFIED FIXED
Alias: None
Product: EMF Services
Classification: Modeling
Component: Transaction (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P1 major
Target Milestone: ---   Edit
Assignee: Christian Damus CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-09 15:13 EDT by Christian Damus CLA
Modified: 2017-02-24 15:11 EST (History)
2 users (show)

See Also:
Ed.Merks: review+


Attachments
Proposed fix (84 bytes, patch)
2007-10-09 15:23 EDT, Christian Damus CLA
no flags Details | Diff
The real patch (20.61 KB, patch)
2007-10-09 15:27 EDT, Christian Damus CLA
no flags Details | Diff
Same patch without formatting diffs (3.62 KB, patch)
2007-10-09 15:39 EDT, Christian Damus CLA
give.a.damus: review?
Details | Diff
JUnit test simulating the original application scenario (4.08 KB, patch)
2007-10-09 17:23 EDT, Christian Damus CLA
ahunter.eclipse: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Christian Damus CLA 2007-10-09 15:13:17 EDT
The UI-safe lock acquisition procedure for starting transactions can result in a transaction lock being acquired by an AcquireJob, implemented by one of the JobManager's pooled worker threads, and never released by that thread.  The result is a "livelock" scenario (as opposed to deadlock) of a sort:  because the worker thread is re-used for other jobs, including potentially AcquireJobs, this dangling lock may be passed around from thread to thread, finally ending up on some non-Jobmanager-worker thread, from which it will never be transferred away.  In the mean-time, and afterwards, other threads will be starved for the lock as they attempt to start transactions.

For a reproducible test case, see http://www.eclipse.org/newsportal/article.php?id=27735&group=eclipse.tools.emf#27735.

The problem is, that the thread (often but not always the UI thread) that attempts to wait for an Acquirejob to get the transaction lock may fail to beginRule() because it already has some rule that doesn't include the AcquireRule.  This fails with an IllegalArgumentException, which results in the thread that attempted to start a transaction getting an InterruptedException instead.  Meanwhile, however, the AcquireJob continues to wait, eventually gets the lock, and then transfers it to the original thread which has already given up on it.  Thus the dangling lock.

The beginRule failing with an IllegalArgumentException is hard to avoid, because the Lock::uiSafeAcquire() checks up-front whether the current thread already has some scheduling rule and attempts another path in that case.  However, there are numerous ways in which it can concurrently be given a scheduling rule while the uiSafeAcquire method is in progress.
Comment 1 Christian Damus CLA 2007-10-09 15:21:56 EDT
Target for 1.1.2 release (Europa WM).
Comment 2 Christian Damus CLA 2007-10-09 15:23:30 EDT
Created attachment 79981 [details]
Proposed fix

Attached a fix that aborts the AcquireJob or releases the lock if already transferred by it when beginRule() fails.
Comment 3 Christian Damus CLA 2007-10-09 15:27:49 EDT
Created attachment 79982 [details]
The real patch

Attached wrong clipboard contents on first attempt.
Comment 4 Christian Damus CLA 2007-10-09 15:39:10 EDT
Created attachment 79983 [details]
Same patch without formatting diffs

One last try:  attaching an updated version of the previous patch that doesn't have formatting diffs to obscure the change.
Comment 5 Christian Damus CLA 2007-10-09 15:42:11 EDT
Ed, could you have a look at the proposed patch?  I would like to commit this bug for the Europa Winter Maintenance release owing to the severity of its effect, but it is a rather crucial bit of code.
Comment 6 Christian Damus CLA 2007-10-09 17:23:41 EDT
Created attachment 80004 [details]
JUnit test simulating the original application scenario

Attached a JUnit test that simulates the original application scenario reported on the newsgroup.  This test starts a separate thread and, while it is attempting to acquire a lock (depending on an accurate Thread.sleep timing implementation in the JVM) transfers the workspace root's scheduling rule to it so that the IJobManager::beginRule() call will fail.
Comment 7 Nicolai Kamenzky CLA 2007-10-09 17:38:19 EDT
My GMF editors (not just my prototype which I provided for debugging) previously showed an even higher probability to end in the described livelock. After applying the patch and playing araound with them I didn't get a single livelock. It seems that the bug is fixed. But if I get a livelock again I will report it here.

Christian, thank you very much!
Comment 8 Ed Merks CLA 2007-10-15 10:36:00 EDT
It looks good to me.
Comment 9 Christian Damus CLA 2007-10-15 12:31:37 EDT
The fix is committed to CVS R1_1_maintenance (1.1.2 branch) and HEAD (1.2 branch).
Comment 10 Christian Damus CLA 2007-10-17 16:39:51 EDT
Fixed in EMF TRANSACTION 1.1.2 M200710151235 build.
Fixed in EMF TRANSACTION 1.2.0 I200710171604 build.
Comment 11 Nick Boldt CLA 2008-01-28 16:35:53 EST
Move to verified as per bug 206558.
Comment 12 Christian Damus CLA 2008-02-28 11:58:34 EST
Fix available in R1_1_maintenance: 1.1.2 (R200712061336).