Community
Participate
Working Groups
The cache range unit tests experience intermittent deadlocks. I can reproduce them 100% of the time by adding a Thread.sleep(100) at the beginning of RequestCache.canceled(). I was hopeful at first; it seemed a simple adjustment to RequestCache.canceled() would fix the problem--I made it unsynchronized and tweaked the implementation to: RequestMonitor rm = fRm; if (rm != null) { rm.cancel(); } That fixed that particular deadlock (cache_deadlock.gif), but a second, similar one was close behind (cache_deadlock2.gif). Both are the result of threads locking the RM and cache object in opposing order.
Created attachment 182425 [details] stack crawls showing deadlock
Created attachment 182426 [details] stack crawls showing deadlock (second case)
Created attachment 182681 [details] A different fix. At first I thought of the change in RequestMonitor.canceled() also. Another option is to force handleCanceledRm() to always be called on the DSF executor thread... as in this patch. There is still a problem though. I think canceled() should be called from isCanceled() when a cancellation is detected, but I think I couldn't find a way to do that safely. The cancellation handling is particularly difficult because it needs to be thread safe (since RequestMonitor.isCanceled()) is thread safe, but the rest of the cache logic is synchronized using the executor thread.
(In reply to comment #3) > The cancellation handling is particularly difficult because > it needs to be thread safe (since RequestMonitor.isCanceled()) is thread safe, > but the rest of the cache logic is synchronized using the executor thread. Indeed. I couldn't think of a good solution, either. But since we're talking about changing the cancel behavior, maybe we should first sort that out and then see if this is still an issue.
Created attachment 182862 [details] Additional fix in RM. The previous fix was committed with bug 329481. This fix addresses another issue that contributed to this deadlock: The whole RequestMonitor.isCancel() method is synchronized and it calls the parent RM's isCanceled() method while holding the lock. This is unnecessary and in this case it contributed to a deadlock.
(In reply to comment #3) > There is still a problem though. I think canceled() should be called from > isCanceled() when a cancellation is detected, but I think I couldn't find a way > to do that safely. The cancellation handling is particularly difficult because > it needs to be thread safe (since RequestMonitor.isCanceled()) is thread safe, > but the rest of the cache logic is synchronized using the executor thread. I still have to think about this one a little more.
Created attachment 182925 [details] Patch to call cancelled(). (In reply to comment #3) > There is still a problem though. I think canceled() should be called from > isCanceled() when a cancellation is detected, but I think I couldn't find a way > to do that safely. The cancellation handling is particularly difficult because > it needs to be thread safe (since RequestMonitor.isCanceled()) is thread safe, > but the rest of the cache logic is synchronized using the executor thread. This patch extends on the last one and changes the logic in isCanceled() so that it does not modify the list of waiting requests. Instead it uses a separate dispatch call to handle the canceled RMs and to call AbstractCache.canceled().
I committed the fixes, John please review.
*** cdt cvs genie on behalf of ppiech *** Bug 329488 - DSF ACPM deadlocks [*] RequestMonitor.java 1.13 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/RequestMonitor.java?root=Tools_Project&r1=1.12&r2=1.13 [*] AbstractCache.java 1.7 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/AbstractCache.java?root=Tools_Project&r1=1.6&r2=1.7 [*] CacheTests.java 1.7 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.tests.dsf/src/org/eclipse/cdt/tests/dsf/concurrent/CacheTests.java?root=Tools_Project&r1=1.6&r2=1.7
1. Very minor. We can avoid the local variable in RequestMonitor.isCanceled() with synchronized { if (fCanceled) { return true; } } 2. This is not new behavior, but it seems to me AbstractCache.isCanceled() returning 'true' when fWaitingList is null is either incorrect or the method is poorly named. fWaitingList is nulled after a call to set(). Someone calling isCanceled() after that would get an incorrect response--at least a misleading one. If the current behavior is intended, then renaming the method to isBeingWaitedOn() would reduce confusion, IMO. 3. Asynchronously calling handleCanceledRm() opens the door for isCanceled() being called again and re-requesting the removal of the same set of canceled RMs. It appears to be harmless, but it is confusing. There should be a comment acknowledging that possibility. 4. The new local TestCache() instance in CacheTests.cancelWhilePendingWithoutClientNotificationTest() should probably call super.canceled() for thoroughness.