Community
Participate
Working Groups
Hi, In our application, we sometimes encounter deadlocks if we commit while the session is invalidated. Indeed: - During the commit, the transaction is locked and waits for the SessionInvalidator in session.startLocalCommit() - During the session invalidation, the SessionInvalidator is locked and waits for the transaction in the updatePermissions method when comparing it with the head view. The deadlock can be avoided by adding the following code in the doAfterActivate method in CDOSessionImpl, as done in CDOViewImpl: ExecutorService executorService = getExecutorService(); invalidator.setDelegate(executorService); It allows to not keep the lock on the SessionInvalidator, but is it the best solution?
Thank you very much for the analysis. I'm on it...
(In reply to comment #0) > It allows to not keep the lock on the SessionInvalidator, but is it the best > solution? No, I don't think it's good to change the threading model. In particular, the session invalidator is designed to execute the invalidations synchronously. It's hard to anticipate what other problems result if that is changed. I rather think about changing the CDOTransactionImpl.commitSynced() such that the session.startLocalCommit() call happens before the view lock is taken. Something like: private CDOCommitInfo commitSynced() { synchronized (session) { synchronized (getViewMonitor()) { commitToken = (CommitToken)session.startLocalCommit(); } } synchronized (getViewMonitor()) { transactionStrategy.commit(); ... } } So this means fixing the deadlock through lock ordering. But first I need to analyze more carefully what it means to take the view lock twice. And write a test case that reproduces the problem...
It turns out that the session.startLocalCommit() call doesn't need the view lock because it already takes the session invalidator lock and that one is broader, anyway. That makes the fix really simple. The test case to reproduce the specific timing was much harder. To avoid the configuration of a security manager for the updatePermissions() call it instruments the getViews() call that comes a bit earlier. The changes are committed: https://git.eclipse.org/c/cdo/cdo.git/commit/?id=4c5cc26ab9266fcd3b0fcbb3d620b162f5234b1d Thank you again for providing the steps to reproduce!
Thank very much you for the correction, I appreciate your help! But I can still have the deadlock on the session.endLocalCommit() this time. By the way, I have another question about the javadoc from the transaction.commit() method. The sample code recommends to lock the transaction before changing the model and commit, but in this case the deadlock on session.startLocalCommit() comes back. So, is it better to avoid locking the transaction?
Unfortunately these are both very good points! While the issue with the endLocalCommit() call could be fixed easily the fact that the user could (often should, in fact) apply highest-level synchronization on the view is the killer for the lock-ordering approach ;-( Reopening to investigate what of updatePermissions() exactly needs to execute against the views and how much of it can be moved into the view invalidation process (which then happens asynchronously)...
I found a new solution that seems to work well. The updatePermissions() method did two things: a) Assert that neither auditing nor branching is used together with a security manager. b) Request new permissions for revisions that are used by "viewed" objects. The check a) is now moved into the asynchronous view invalidation. That is a bit late (i.e., session state is already adjusted when the assertion barks), but since the underlying reason is really a server configuration problem this seems okay. Then I've changed b) to request new permissions for all current revisions of the entire session. This avoids taking any locks on the views. And I think it's even a bit more correct because now even revisions that are not currently viewed, but could become viewed later, will receive the new permissions. In theory it may take a bit longer because more revisions can be updated, but this price is only paid by secured deployments. All tests, including the new deadlock test, are green. I'm looking forward to hear what your eagle eyes find now :P The changes are committed: https://git.eclipse.org/c/cdo/cdo.git/commit/?id=7d10d7c9d86eb2cad9ed5ac6df60bcb7a5f71b39
Unfortunately, I still have a deadlock :P The session invalidator attempts to lock the transaction in the RemoteInvalidationEventQueue. It’s at the same moment as mentioned by Robert Schulk on the forum: https://www.eclipse.org/forums/index.php?t=rview&goto=1821881#msg_1821881 Furthermore, I encountered another error after your last commit, the updatePermissions() method attempts to update detached objects. The error no longer occurs if I remove all revisions that are an instance of StubCDORevision before.
(In reply to comment #7) > Unfortunately, I still have a deadlock :P > The session invalidator attempts to lock the transaction in the > RemoteInvalidationEventQueue. It’s at the same moment as mentioned by Robert > Schulk on the forum: > https://www.eclipse.org/forums/index.php?t=rview&goto=1821881#msg_1821881 Ok, I get the impression that that use case is entirely different from the one I test. The new stack trace suggests that the two invalidations are caused by two local commits. I'll have to analyze that more carefully when I return from my current travel on Tuesday. Here's the new stack trace for reference: "pool-1-thread-1" Id=92 BLOCKED on org.eclipse.emf.internal.cdo.session.CDOSessionImpl$SessionInvalidator@771b23 owned by "Thread-26" Id=164 org.eclipse.emf.internal.cdo.session.CDOSessionImpl$SessionInvalidator.reorderInvalidations(CDOSessionImpl.java:2001) org.eclipse.emf.internal.cdo.session.CDOSessionImpl.invalidate(CDOSessionImpl.java:1193) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOTransactionImpl.java:4800) org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:66) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commitSynced(CDOTransactionImpl.java:1670) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1623) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1589) <<<<<<my code calls transaction.commit here>>>>>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748) "Thread-26" Id=164 BLOCKED on org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl@283ab206 owned by "pool-1-thread-1" Id=92 org.eclipse.emf.internal.cdo.view.AbstractCDOView.getBranch(AbstractCDOView.java:926) org.eclipse.emf.spi.cdo.AbstractChangeSetsConflictResolver$RemoteInvalidationEventQueue.handleEvent(AbstractChangeSetsConflictResolver.java:273) org.eclipse.emf.spi.cdo.CDOSessionInvalidationEventQueue$1.notifyEvent(CDOSessionInvalidationEventQueue.java:48) org.eclipse.net4j.util.event.Notifier.fireEventSafe(Notifier.java:172) org.eclipse.net4j.util.event.Notifier.fireEvent(Notifier.java:118) org.eclipse.net4j.util.event.Notifier.fireEvent(Notifier.java:93) org.eclipse.net4j.util.container.Container.fireEvent(Container.java:63) org.eclipse.emf.internal.cdo.session.CDOSessionImpl$SessionInvalidation.doRun(CDOSessionImpl.java:2120) org.eclipse.net4j.util.concurrent.RunnableWithName.run(RunnableWithName.java:29) org.eclipse.net4j.util.concurrent.SerializingExecutor.run(SerializingExecutor.java:82) org.eclipse.net4j.util.concurrent.SynchronousExecutor.execute(SynchronousExecutor.java:32) org.eclipse.net4j.util.concurrent.SerializingExecutor.schedule(SerializingExecutor.java:131) org.eclipse.net4j.util.concurrent.SerializingExecutor.execute(SerializingExecutor.java:61) org.eclipse.emf.internal.cdo.session.CDOSessionImpl$SessionInvalidator.scheduleInvalidations(CDOSessionImpl.java:2020) org.eclipse.emf.internal.cdo.session.CDOSessionImpl$SessionInvalidator.reorderInvalidations(CDOSessionImpl.java:2012) org.eclipse.emf.internal.cdo.session.CDOSessionImpl.invalidate(CDOSessionImpl.java:1193) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOTransactionImpl.java:4800) org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:66) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commitSynced(CDOTransactionImpl.java:1670) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1623) org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1589) <<<<<<my code calls transaction.commit here>>>>>>>>>> java.lang.Thread.run(Thread.java:748) > Furthermore, I encountered another error after your last commit, the > updatePermissions() method attempts to update detached objects. The error no > longer occurs if I remove all revisions that are an instance of StubCDORevision > before. I don't understand this. The new updatePermissions() method does not deal with objects or views anymore. Nevertheless I've tried to modify the test case such that it deletes objects here or there. Still no failures. Maybe it's related to the fact that your entire scenario is different from the test case. I've committed the new test case: https://git.eclipse.org/c/cdo/cdo.git/commit/?id=0927747b99bcd0de1ea42369dd98fe527bbd795d Are you using auditing or branching?
From the stack trace it's clear now that this deadlock is totally different. It's definitely not related to updatePermissions(), but rather caused by AbstractChangeSetsConflictResolver, probably its subclass CDOMergingConflictResolver. AbstractChangeSetsConflictResolver mostly operates on view/transaction level but also registers, through RemoteInvalidationEventQueue, a session event listener. This listener is notified while the session's invalidator lock is held and that causes the deadlock. Next week I'll probably try to move the session listener notification out of the invalidator's synchronized scope...
It can only help if I describe our uses cases. We use a CDOResourceFolder as a project and create Role for each project to give read/write permissions to it. Each time a project is created, renamed or deleted, we create, update or delete roles on the server. The deadlocks appear mainly during these operations and when it’s done on a remote session. We create a project on a remote session, the system session is invalidated and we react to the invalidation event to update the roles. This update invalidates the remote session next. And if we create projects quickly the deadlock appears. The problem with the updatePermissions() method which try to update detached objects appears only after the roles update when deleting a project.
> Are you using auditing or branching? Sorry, I forgot this question. Yes, we are using auditing and branching. In fact we create the repository with branching but don’t use new branches as we know it’s incompatible with the security. About the auditing, we use it to get a history of what has been committed and allow opening a view in an old timestamp but in read-only mode and we made sure to close it before the call of updatePermissions() to avoid the exception. In the use cases I mentioned before, no view with old timestamp was open.
Regarding the deadlock-stacktrace that Eicke posted: this came from my post in the EMF forums. We are not using auditing/branches in our repository. Yes, we are indeed using a CDOMergingConflictResolver ;).
(In reply to comment #11) > > Are you using auditing or branching? Ok, I just wanted to make sure that you're using security within the constraints that are imposed by the design. I don't think it has a relation to this problem.
(In reply to comment #10) > It can only help if I describe our uses cases. > [...] A description is better than nothing, but a test case that helps to reproduce your exact problem would be even greater ;-) Do you think you can come up with one? > The problem with the updatePermissions() method which try to update detached > objects appears only after the roles update when deleting a project. In comment #8 I already mentioned that "The new updatePermissions() method does not deal with objects or views anymore.". I still don't understand how you mean this.
(In reply to Eike Stepper from comment #14) > A description is better than nothing, but a test case that helps to > reproduce your exact problem would be even greater ;-) > Do you think you can come up with one? I will try to do it quickly > In comment #8 I already mentioned that "The new updatePermissions() method > does not deal with objects or views anymore.". I still don't understand how > you mean this. I mean during the call of revisionManager.getCache().forEachCurrentRevision(), some revisions are DetachedCDORevision
(In reply to Vincent Sennedot from comment #15) > I mean during the call of > revisionManager.getCache().forEachCurrentRevision(), some revisions are > DetachedCDORevision Ok, Tomorrow I'll try to reproduce that ;-)
(In reply to comment #16) > (In reply to Vincent Sennedot from comment #15) > > I mean during the call of > > revisionManager.getCache().forEachCurrentRevision(), some revisions are > > DetachedCDORevision I could reproduce it and am about to commit the fix: revisionManager.getCache().forEachCurrentRevision(r -> { if (!(r instanceof SyntheticCDORevision)) { revisions.add((InternalCDORevision)r); } });
The changes are committed: https://git.eclipse.org/c/cdo/cdo.git/commit/?id=446530253edac54411077148347823de57c12c4d I'll kick another M3 milestone build...
Thank you very much again! I didn't managed to create a test case for it as I hadn't really identified what was leading to the creation of DetachedCDORevision. For the remaining deadlock, adding "transactionUnderTest.options().addConflictResolver(new CDOMergingConflictResolver());" to the test case is enough to have it. And I think it's my last problem for now :)
(In reply to comment #19) > Thank you very much again! You're welcome. > I didn't managed to create a test case for it as I > hadn't really identified what was leading to the creation of > DetachedCDORevision. Indeed that's not so easy to know. A hint for the future: Set a breakpoint in DetachedCDORevision's constructor and execute the "CDO AllTests" test suite under debug control. In this case it was even better to set a conditional breakpoint in AbstractCDORevisionCache.addRevision(CDORevision) ;-) > For the remaining deadlock, adding > "transactionUnderTest.options().addConflictResolver(new > CDOMergingConflictResolver());" to the test case is enough to have it. And I > think it's my last problem for now :) Yes, that nicely shows the deadlock. And I'm about to commit a fix...
The changes are committed: https://git.eclipse.org/c/cdo/cdo.git/commit/?id=17712c8490d2244692a4f3bac37964243da5deca
After few tests, all seems to work for this part. Thank you :)
You're welcome. Sorry that it took a few rounds ;-)
I created a new bugzilla for the remaining deadlock when using the CDOMergingConflictResolver: https://bugs.eclipse.org/bugs/show_bug.cgi?id=560957
Closing.