Community
Participate
Working Groups
The UI-safe lock acquisition procedure for starting transactions can result in a transaction lock being acquired by an AcquireJob, implemented by one of the JobManager's pooled worker threads, and never released by that thread. The result is a "livelock" scenario (as opposed to deadlock) of a sort: because the worker thread is re-used for other jobs, including potentially AcquireJobs, this dangling lock may be passed around from thread to thread, finally ending up on some non-Jobmanager-worker thread, from which it will never be transferred away. In the mean-time, and afterwards, other threads will be starved for the lock as they attempt to start transactions. For a reproducible test case, see http://www.eclipse.org/newsportal/article.php?id=27735&group=eclipse.tools.emf#27735. The problem is, that the thread (often but not always the UI thread) that attempts to wait for an Acquirejob to get the transaction lock may fail to beginRule() because it already has some rule that doesn't include the AcquireRule. This fails with an IllegalArgumentException, which results in the thread that attempted to start a transaction getting an InterruptedException instead. Meanwhile, however, the AcquireJob continues to wait, eventually gets the lock, and then transfers it to the original thread which has already given up on it. Thus the dangling lock. The beginRule failing with an IllegalArgumentException is hard to avoid, because the Lock::uiSafeAcquire() checks up-front whether the current thread already has some scheduling rule and attempts another path in that case. However, there are numerous ways in which it can concurrently be given a scheduling rule while the uiSafeAcquire method is in progress.
Target for 1.1.2 release (Europa WM).
Created attachment 79981 [details] Proposed fix Attached a fix that aborts the AcquireJob or releases the lock if already transferred by it when beginRule() fails.
Created attachment 79982 [details] The real patch Attached wrong clipboard contents on first attempt.
Created attachment 79983 [details] Same patch without formatting diffs One last try: attaching an updated version of the previous patch that doesn't have formatting diffs to obscure the change.
Ed, could you have a look at the proposed patch? I would like to commit this bug for the Europa Winter Maintenance release owing to the severity of its effect, but it is a rather crucial bit of code.
Created attachment 80004 [details] JUnit test simulating the original application scenario Attached a JUnit test that simulates the original application scenario reported on the newsgroup. This test starts a separate thread and, while it is attempting to acquire a lock (depending on an accurate Thread.sleep timing implementation in the JVM) transfers the workspace root's scheduling rule to it so that the IJobManager::beginRule() call will fail.
My GMF editors (not just my prototype which I provided for debugging) previously showed an even higher probability to end in the described livelock. After applying the patch and playing araound with them I didn't get a single livelock. It seems that the bug is fixed. But if I get a livelock again I will report it here. Christian, thank you very much!
It looks good to me.
The fix is committed to CVS R1_1_maintenance (1.1.2 branch) and HEAD (1.2 branch).
Fixed in EMF TRANSACTION 1.1.2 M200710151235 build. Fixed in EMF TRANSACTION 1.2.0 I200710171604 build.
Move to verified as per bug 206558.
Fix available in R1_1_maintenance: 1.1.2 (R200712061336).