| Summary: | Deadlock in CacheAdapter in multithreaded environment | | |
|---|---|---|---|
| Product: | [Modeling] MDT.UML2 | Reporter: | Ralf Zozmann <r.zozmann> |
| Component: | Core | Assignee: | Christian Damus <give.a.damus> |
| Status: | VERIFIED FIXED | QA Contact: | |
| Severity: | major | | |
| Priority: | P2 | CC: | eclipse, Kenn.Hussey, nyssen |
| Version: | unspecified | Flags: | give.a.damus: luna+, Kenn.Hussey: review+ |
| Target Milestone: | M6 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
Description
Ralf Zozmann
There seems to be a difference between the following two implementations for unloading the resource set:
1. Deadlock:

```java
final ResourceSet resourceSet = editingDomain.getResourceSet();
for (final Iterator<Resource> it = resourceSet.getResources().iterator(); it.hasNext();)
{
    final Resource tmpResource = it.next();
    tmpResource.unload();
    it.remove();
} // for
```
2. Currently no deadlock:

```java
final ResourceSet resourceSet = editingDomain.getResourceSet();
for (final Iterator<Resource> it = resourceSet.getResources().iterator(); it.hasNext();)
{
    final Resource tmpResource = it.next();
    tmpResource.unload();
} // for
resourceSet.getResources().clear();
```
Do the changes attached to bug 335125 address your issues?

After some months of observation: with the workaround from bug 335125 and other changes (in our own code), the number of deadlocks is reduced, but they are still present.

Reducing the severity of this issue given that there is a (partial) workaround.

See also bug 335135, which is about data structure corruption (and subsequent exceptions or infinite loops, but no deadlocks). That one was worked around by using thread-specific CacheAdapters.

(In reply to comment #5)
> See also bug 335135, which is about data structure corruption (and subsequent
> exceptions or infinite loops - but no deadlocks). That one was worked around by
> using thread-specific CacheAdapters.

You are right; in most cases an infinite loop is the reason for our problems.

The changes described in bug 335125 are now available in an integration build. I don't think there's much more that can be done in UML2 itself to address thread safety issues. Please re-open this bug if you believe otherwise.

Nice, thanks, Kenn!

We very recently ran into the same issue when migrating one of our tools to use Sphinx for loading UML models. We identified that the direct cause of this issue (which very much explains the stack trace in comment #1) is actually not a deadlock but a livelock: the hash entries within the InverseCrossReferencer may get corrupted during simultaneous loading of a model (from two different threads) in such a way that they actually form a cycle. As a result, selfAdapt can never be exited again once both threads have called it (both are stuck in an endless loop). We could reproduce this by starting a runtime workspace in which we had previously left the Eclipse UML2 editor open on a big model (Sphinx then automatically loaded the model during startup as well). However, we were not able to debug it properly: we tried a conditional breakpoint that checks when the hash map entries become cyclic, but its evaluation was so slow that we could not reproduce the problem while debugging. I am thus not able to tell what the root cause is, only that the corruption seems to happen while the InverseCrossReferencer is being filled, not afterwards. Maybe that helps...
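The failure mode just described can be illustrated with a minimal, self-contained sketch (hypothetical code, not the EMF/UML2 implementation): once two entries in a chained hash bucket point at each other, any traversal that does not find its key spins forever, with all threads still runnable, which is a livelock rather than a deadlock.

```java
/*
 * Minimal sketch of the cyclic-bucket livelock described above.
 * Hypothetical code: not the EMF/UML2 implementation.
 */
public class CyclicBucketSketch {

    static final class Entry {
        final String key;
        Entry next;
        Entry(String key, Entry next) { this.key = key; this.next = next; }
    }

    // One bucket of a chained hash map, kept to a single chain for simplicity.
    static Entry bucket;

    static boolean contains(String key) {
        // Ordinary bucket traversal; it terminates only if the chain ends in null.
        for (Entry e = bucket; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Deterministically construct the corrupt state that unsynchronized,
        // concurrent updates to the same bucket can produce.
        Entry a = new Entry("a", null);
        Entry b = new Entry("b", a);
        a.next = b; // cycle: a -> b -> a
        bucket = a;

        System.out.println(contains("a"));       // true: found before cycling
        System.out.println(contains("missing")); // never returns: endless loop
    }
}
```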
Alexander, did you try using thread-specific cache adapters? Rafael indicates in comment #5 that this may be a workaround for the problems you are seeing...

(In reply to Kenn Hussey from comment #10)
> Alexander, did you try using thread-specific cache adapters? Rafael
> indicates in comment #5 that this may be a workaround for the problems you
> are seeing...

Yes, we also tried this but ran into the problem my colleague Patrick reported in comment #9 of bug #335125, so this is no solution for us either. It seems as if:

- CacheAdapter (including its nested InverseCrossReferencer) is not thread-safe (reported here)
- not all parts of the framework know how to deal with thread-local cache adapters properly (see bug #335125)

It would be fine if we could solve at least one of those...

I have pushed a new branch bugs/332057 with the following changes:

* Commit 143f8f0: A JUnit test suite, including two test methods that may be renamed by removing the leading underscore to run some performance statistics (see below). The principal test faithfully reproduces the livelock on the circular hash-bucket corruption described in comment #9.
* Commit f59f382: A crude solution to this particular problem, introducing overrides in the CacheAdapter to synchronize critical sections that operate on the inverse cross referencer map (a schematic sketch of the idea follows below).
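A schematic sketch of what such overrides could look like, assuming the lock is taken around the ECrossReferenceAdapter callbacks that touch the shared map (the choice of overridden methods here is hypothetical; the actual commit may synchronize different sections):

```java
import org.eclipse.emf.common.notify.Notification;
import org.eclipse.emf.ecore.util.ECrossReferenceAdapter;

/*
 * Hypothetical sketch, not commit f59f382 itself: serialize the sections
 * that update the inverse-cross-referencer map on a single lock, trading
 * parallelism for consistency of the shared hash structures.
 */
public class SynchronizedCacheAdapter extends ECrossReferenceAdapter {

    private final Object lock = new Object();

    @Override
    protected void selfAdapt(Notification notification) {
        // Concurrent updates here can corrupt the inverse cross referencer's
        // buckets (see comment #9), so interleaving must be prevented.
        synchronized (lock) {
            super.selfAdapt(notification);
        }
    }

    @Override
    protected void handleContainment(Notification notification) {
        synchronized (lock) {
            super.handleContainment(notification);
        }
    }
}
```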
Performance Impact Analysis:

Using the JUnit test suite, I collected some statistics on a 2x2 matrix of these dimensions:

* original CacheAdapter vs. "fixed" CacheAdapter (with additional synchronization)
* parallel execution of a scenario on multiple threads vs. execution of the same amount of work on a single thread

The "work" performed in each scenario is loading and resolving all references in the UML metamodel in a discrete new resource set, then unloading everything in that resource set, 95 times. That is, 95 times in sequence on one thread, or 5 times in sequence on each of 19 concurrent threads (a hypothetical harness with the same shape is sketched at the end of this thread). In each cell of the 2x2 matrix, the test executed 11 runs of the scenario: one to warm up and then 10 more runs, over which the average running time was computed. The results are as follows:

* single thread, original cache adapter: mean 29.75 seconds (std dev 0.20 seconds)
* parallel threads, original cache adapter (in thread-local mode): mean 7.11 seconds (std dev 0.32 seconds)
* single thread, "fixed" cache adapter: mean 29.51 seconds (std dev 0.09 seconds)
* parallel threads, "fixed" cache adapter (singleton, not thread-local): mean 22.18 seconds (std dev 2.57 seconds)

So, as we can see, the performance of single-threaded applications is essentially the same with or without this fix. That is good, as it indicates that the performance of most applications is not regressed (I am assuming that most applications with multiple threads accessing models already have some external synchronization mechanism in place that effectively serializes access to the models anyway). However, with the fix, the performance of the parallel configuration is quite comparable to that of the single-threaded configuration: parallelization still provides a statistically significant gain (as indicated by the standard deviations), but the "fixed" CacheAdapter largely serializes its book-keeping.

Note that I ran these tests on a 2012 Retina MacBook Pro with solid-state storage. Parallelization could conceivably make a bigger difference on other systems (especially those with slower storage media).

Finally, note that I write "fixed" in scare quotes because this, so far, addresses only the specific use case described in the re-opened bug (comment #9); there are probably other synchronization holes to be plugged that might show themselves in other use cases. Also, it may be worth investigating approaches other than this very simple introduction of new synchronized blocks, which could provide more elegant, complete, or performant solutions.

Thanks, Christian, the code looks good. But when I ran the tests, I witnessed a number of memory errors:

Exception in thread "CacheAdapterTest-1" java.lang.OutOfMemoryError: Java heap space

Does the launch config for the tests need to be updated to launch with more heap space?

(In reply to comment #13)
> Does the launch config for the tests need to be updated to launch with more heap
> space?

I've pushed a new commit 9787fb0 that adds heap sizing parameters to the launch configs and otherwise aligns them with actual run-time requirements. My local Hudson build completes normally on this commit, but it did before, too.

Of course, most VMs default their max heap size according to the system memory available at the time they start up, so I wouldn't be surprised that the build at Eclipse had trouble (although 19 concurrent instances of the UML metamodel and its dependencies shouldn't be *that* big ... right?).

Thanks, Christian. The changes have been merged/pushed to the 'master' branch in Git and will be available in an integration build soon.

The fix is now available in an integration build for UML2 5.0.
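For reference, the parallel scenario measured above has roughly the following shape. This is a hypothetical harness, not the committed JUnit suite; the class name and structure are invented, and it assumes standalone setup via UMLResourcesUtil:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.util.EcoreUtil;
import org.eclipse.uml2.uml.resource.UMLResource;
import org.eclipse.uml2.uml.resources.util.UMLResourcesUtil;

/*
 * Hypothetical harness (not the committed test suite): 19 threads each
 * load, resolve, and unload the UML metamodel 5 times in a fresh resource
 * set, for the same 95 loads as the single-threaded configuration.
 */
public class ParallelLoadBenchmark {

    static final int THREADS = 19;
    static final int ITERATIONS_PER_THREAD = 5;

    static void loadResolveUnload() {
        ResourceSet resourceSet = new ResourceSetImpl();
        UMLResourcesUtil.init(resourceSet); // standalone pathmap/factory setup

        resourceSet.getResource(URI.createURI(UMLResource.UML_METAMODEL_URI), true);
        EcoreUtil.resolveAll(resourceSet); // resolve all cross-references

        // Unload everything, mirroring the reporter's second variant above.
        for (Resource resource : resourceSet.getResources()) {
            resource.unload();
        }
        resourceSet.getResources().clear();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.execute(() -> {
                for (int i = 0; i < ITERATIONS_PER_THREAD; i++) {
                    loadResolveUnload();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        System.out.printf("elapsed: %.2f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```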