Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 351912 - Lock coordination with SynchronizableRepositories
Summary: Lock coordination with SynchronizableRepositories
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core (show other bugs)
Version: 4.1   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: ---   Edit
Assignee: Caspar D. CLA
QA Contact: Eike Stepper CLA
URL:
Whiteboard: Power to the People
Keywords:
Depends on: 353691
Blocks:
  Show dependency tree
 
Reported: 2011-07-13 00:18 EDT by Caspar D. CLA
Modified: 2012-09-21 07:17 EDT (History)
2 users (show)

See Also:
stepper: review-
stepper: review+


Attachments
Patch v1 (15.58 KB, patch)
2011-07-20 04:26 EDT, Caspar D. CLA
no flags Details | Diff
Patch v2 (18.65 KB, patch)
2011-07-20 06:46 EDT, Caspar D. CLA
no flags Details | Diff
Patch v3 (28.94 KB, patch)
2011-07-25 06:54 EDT, Caspar D. CLA
no flags Details | Diff
Patch v4 (39.54 KB, patch)
2011-07-26 07:04 EDT, Caspar D. CLA
no flags Details | Diff
Patch v5 (41.72 KB, patch)
2011-07-28 01:59 EDT, Caspar D. CLA
no flags Details | Diff
Patch v6 (50.50 KB, patch)
2011-07-28 06:59 EDT, Caspar D. CLA
no flags Details | Diff
Patch v7 (54.53 KB, patch)
2011-08-01 04:07 EDT, Caspar D. CLA
no flags Details | Diff
Patch v8 (63.05 KB, patch)
2011-08-02 07:14 EDT, Caspar D. CLA
no flags Details | Diff
Patch v9 (SVN patch, not an Eclipse WS patch) (31.54 KB, patch)
2011-08-22 07:35 EDT, Caspar D. CLA
no flags Details | Diff
Patch v10 (156.45 KB, patch)
2011-08-30 03:25 EDT, Caspar D. CLA
no flags Details | Diff
Patch v11 (163.06 KB, patch)
2011-08-30 06:56 EDT, Eike Stepper CLA
no flags Details | Diff
Test failures (4.99 KB, text/plain)
2011-08-30 07:24 EDT, Caspar D. CLA
no flags Details
Patch v12 (163.19 KB, patch)
2011-08-31 03:11 EDT, Caspar D. CLA
no flags Details | Diff
Patch v13 (207.63 KB, patch)
2011-09-01 03:25 EDT, Eike Stepper CLA
no flags Details | Diff
Patch v14 (incremental) (18.46 KB, patch)
2011-09-01 06:21 EDT, Caspar D. CLA
no flags Details | Diff
Patch v15 (incremental) (41.97 KB, patch)
2011-09-05 06:03 EDT, Caspar D. CLA
no flags Details | Diff
Patch for test breakage (incremental) (1.68 KB, patch)
2011-09-05 23:31 EDT, Caspar D. CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Caspar D. CLA 2011-07-13 00:18:52 EDT
Currently locks are largely meaningless in a master-clone (M/C)
replication config, because they are only obtained in the clone,
which typically has only a single client. The locks therefore do
not constrain the operations of other clients on the master repo.

To make locks meaningful in an M/C config, locks should be
obtained on both the master and the clone, whenever the clone is
ONLINE. This should be analogous to a write-through transaction,
in the sense that the client requests the locks on the clone repo
it's working with, and this clone repo transaparently tries to
obtain the locks on the master its synchronizing with. We could
call this a "lock-through" strategy ;-) A lock request would
succeed only if the lock is obtained on both the master and the
clone.

When the clone is OFFLINE, the clients lock requests should
automatically be targeted at its local branch. Again, this is
analogous to how commit behavior varies between OFFLINE and ONLINE
modes.
Comment 1 Eike Stepper CLA 2011-07-13 00:27:39 EDT
+1

(In reply to comment #0)
> When the clone is OFFLINE, the clients lock requests should
> automatically be targeted at its local branch. 

Note that, as of now, a local branch is only created when a commit happens in state OFFLINE. So the new locking behaviour must also be capable of *creating* new local branches.
Comment 2 Caspar D. CLA 2011-07-20 04:26:13 EDT
Created attachment 199960 [details]
Patch v1

Refactoring only, so far. Moved invocation of lockManager from
the signal to the repository, so that SynchronizableRepo will
be able to override it.
Comment 3 Caspar D. CLA 2011-07-20 06:46:12 EDT
Created attachment 199971 [details]
Patch v2
Comment 4 Caspar D. CLA 2011-07-25 06:54:46 EDT
Created attachment 200266 [details]
Patch v3
Comment 5 Caspar D. CLA 2011-07-26 07:04:36 EDT
Created attachment 200347 [details]
Patch v4
Comment 6 Caspar D. CLA 2011-07-28 01:59:15 EDT
Created attachment 200489 [details]
Patch v5
Comment 7 Caspar D. CLA 2011-07-28 04:44:03 EDT
(In reply to comment #1)
> Note that, as of now, a local branch is only created when a commit happens in
> state OFFLINE. So the new locking behaviour must also be capable of *creating*
> new local branches.

I was working on this and it made me wonder whether this really makes
sense.

Do we really want to *silenty* create a local branch when the user
is in fact trying to lock something in a non-local branch?

The typical use of locks is of course to prevent other users from
touching the objects being locked, so it seems unlikely that
a user would want his locking request on a non-local branch to
*apparently* succeed, while in fact he has only obtained local
locks on a silently created local branch -- and is not made aware
of that.

I suggest that we don't silently create local branches just for the
purpose of allowing lock requests to succeed. I think it's more
reasonable to reject the lock request under these circumstances.
If some user really does want local locks (an exotic proposition by
any standard), he should switch to a local branch explicitly, and
then request the locks.
Comment 8 Eike Stepper CLA 2011-07-28 05:28:12 EDT
+1
Comment 9 Caspar D. CLA 2011-07-28 06:59:11 EDT
Created attachment 200517 [details]
Patch v6
Comment 10 Caspar D. CLA 2011-08-01 04:07:04 EDT
Created attachment 200645 [details]
Patch v7
Comment 11 Caspar D. CLA 2011-08-01 04:58:26 EDT
Latest patch delivers what appears to be a working implementation
of the functionality. Still need to add tests.

I wonder whether lockAreas should get replicated from the master
to the clones. (While online, lockAreas are created on both the
clone that the client is connected to, and the master; the question
concerns *other* clones than the one where the lockArea is
initially created.)

I think they need to be replicated. If they don't get replicated,
then the lockArea can only get loaded by a client that's connected
to the clone where it was originally created.
Comment 12 Caspar D. CLA 2011-08-02 04:01:58 EDT
I think we need 2 additional things:

1. signal for notifying *other* clones/clients of lock changes
   when those other clones are online

2. extension of the SYNC mechanism, for updating clones with
   the latest locking info after an offline period
Comment 13 Caspar D. CLA 2011-08-02 04:52:00 EDT
Item (1) we can approach as a general locking notification
mechanism. I'll open a separate Bugzilla for this.

Item (2) seems pretty complicated to me. I think it comes
down to syncing the lockAreas, and it has to be made to
work in both the normal and raw replication modes. But 
unlike the syncing of data, for locks we only need to 
sync the latest state; how do we do this? Do we compute
a set of lock changes since the last sync?
Comment 14 Caspar D. CLA 2011-08-02 07:14:16 EDT
Created attachment 200701 [details]
Patch v8

Added a unit test.
Comment 16 Eike Stepper CLA 2011-08-03 04:05:29 EDT
Renaming this bug to cover fail-over (-participants), too.
Comment 17 Caspar D. CLA 2011-08-22 07:35:14 EDT
Created attachment 201909 [details]
Patch v9 (SVN patch, not an Eclipse WS patch)
Comment 19 Eike Stepper CLA 2011-08-29 04:07:00 EDT
I just got an NPE in LockManager:

org.eclipse.emf.cdo.tests.config.impl.ConfigTestException: Error in Bugzilla_316444_Test.testLockParentWithEAttributeChange [Combined, H2-branching-ranges, JVM, Native]
	at org.eclipse.emf.cdo.tests.config.impl.ConfigTest.runBare(ConfigTest.java:535)
	at junit.framework.TestResult$1.protect(TestResult.java:110)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.framework.TestResult.run(TestResult.java:113)
	at junit.framework.TestCase.run(TestCase.java:124)
	at org.eclipse.net4j.util.tests.AbstractOMTest.run(AbstractOMTest.java:260)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at org.eclipse.emf.cdo.tests.config.impl.ConfigTestSuite$TestWrapper.runTest(ConfigTestSuite.java:119)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.eclipse.emf.cdo.util.CommitException: Rollback in DBStore: java.lang.NullPointerException
	at org.eclipse.emf.cdo.internal.server.LockManager.getLockEntryObject(LockManager.java:115)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.isContainerLocked(TransactionCommitContext.java:836)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.isContainerLocked(TransactionCommitContext.java:820)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.lockObjects(TransactionCommitContext.java:749)
	at org.eclipse.emf.cdo.tests.bugzilla.Bugzilla_316444_Test$1$1.lockObjects(Bugzilla_316444_Test.java:111)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.write(TransactionCommitContext.java:443)
	at org.eclipse.emf.cdo.spi.server.InternalCommitContext$1.runLoop(InternalCommitContext.java:42)
	at org.eclipse.emf.cdo.spi.server.InternalCommitContext$1.runLoop(InternalCommitContext.java:1)
	at org.eclipse.net4j.util.om.monitor.ProgressDistributor.run(ProgressDistributor.java:96)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CommitTransactionIndication.indicatingCommit(CommitTransactionIndication.java:248)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CommitTransactionIndication.indicating(CommitTransactionIndication.java:96)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CDOServerIndicationWithMonitoring.indicating(CDOServerIndicationWithMonitoring.java:109)
	at org.eclipse.net4j.signal.IndicationWithMonitoring.indicating(IndicationWithMonitoring.java:84)
	at org.eclipse.net4j.signal.IndicationWithResponse.doExtendedInput(IndicationWithResponse.java:90)
	at org.eclipse.net4j.signal.Signal.doInput(Signal.java:326)
	at org.eclipse.net4j.signal.IndicationWithResponse.execute(IndicationWithResponse.java:63)
	at org.eclipse.net4j.signal.IndicationWithMonitoring.execute(IndicationWithMonitoring.java:63)
	at org.eclipse.net4j.signal.Signal.runSync(Signal.java:251)
	at org.eclipse.net4j.signal.Signal.run(Signal.java:147)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

	at org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:94)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1099)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1119)
	at org.eclipse.emf.cdo.tests.bugzilla.Bugzilla_316444_Test$ThreadA.doRun(Bugzilla_316444_Test.java:433)
	at org.eclipse.emf.cdo.tests.bugzilla.Bugzilla_316444_Test$AbstactTestThread.run(Bugzilla_316444_Test.java:383)

Please fix it in this bugzilla.
Comment 20 Caspar D. CLA 2011-08-30 03:25:07 EDT
Created attachment 202387 [details]
Patch v10

It works. See OfflineLockingTest.

Still many TODOs, will work in those in the coming days.
Comment 21 Caspar D. CLA 2011-08-30 03:28:19 EDT
Flagging for review... I know it's a big patch, but hey at least
it's smaller than the one for bug 353691 was.

Also, this one contains a considerable amount of renames, small
improvements, etc., which should be trivial to review.
Comment 22 Eike Stepper CLA 2011-08-30 06:56:55 EDT
Created attachment 202402 [details]
Patch v11

2 tests are now failing ;-(
Comment 23 Caspar D. CLA 2011-08-30 07:23:26 EDT
Curiously, I cannot reproduce the failures. All tests are passing
with your patch, as they were with mine.

What I do notice though, is that your line numbers are slightly 
different in the stack traces you posted in Skype. Your first
failure happens at LockingNotificationsTest.java:191, but
that's a blank line for me. In my trunk + patch codebase
the assertion is on line 193 . . .

And your second failure is on line 331, but there I have
"tx.commit()". I have the assertion on line 327.

Not sure what's going on here.
Comment 24 Caspar D. CLA 2011-08-30 07:24:58 EDT
Created attachment 202407 [details]
Test failures

Copied from our Skype convo.
Comment 25 Caspar D. CLA 2011-08-31 03:11:30 EDT
Created attachment 202479 [details]
Patch v12
Comment 26 Eike Stepper CLA 2011-08-31 05:10:08 EDT
- CDOSessionImpl.OptionsImpl.setLockNotificationMode(LockNotificationMode) does not fire a CDOCommonSession.Options.LockNotificationModeEvent (also missing).

Committed revision 8997:
- trunk/plugins/org.eclipse.emf.cdo.tests
Comment 27 Eike Stepper CLA 2011-09-01 03:25:36 EDT
Created attachment 202586 [details]
Patch v13

Ignore the previous comment!

- I've added org.eclipse.emf.cdo.common.CDOCommonView.Options.
- CDOSessionImpl.OptionsImpl.setLockNotificationMode(LockNotificationMode) does not fire a CDOCommonSession.Options.LockNotificationModeEvent (also missing).
- Javadoc missing on CDOLockChangeInfoHandler, CDOSessionLocksChangedEvent.
- I've consolidated the two LocksChangedEvents and added a getSender() method.
- Please double-check that the notification timestamps are correct (needed for raw replication).
- I've added CDOSessionConfiguration.setLockNotificationMode() and used it in OpenSessionRequest/Indication
- Please consolidate replication of commits and locks (e.g. retry behaviour).

All tests in the main suites are passing. Please request a brief review after the above action items are addressed. Thanks!!
Comment 28 Caspar D. CLA 2011-09-01 03:55:58 EDT
Committed revision 8999.
Comment 29 Caspar D. CLA 2011-09-01 06:21:26 EDT
Created attachment 202598 [details]
Patch v14 (incremental)

(In reply to comment #27)
> CDOSessionImpl.OptionsImpl.setLockNotificationMode(LockNotificationMode)
> does not fire a
> CDOCommonSession.Options.LockNotificationModeEvent (also
> missing).

Added.

> Javadoc missing on CDOLockChangeInfoHandler,
> CDOSessionLocksChangedEvent.

Added.

> I've consolidated the two LocksChangedEvents and added a
> getSender() method.

OK.

> Please double-check that the notification timestamps are correct
> (needed for raw replication).

TBD.

> - I've added CDOSessionConfiguration.setLockNotificationMode()
> and used it in OpenSessionRequest/Indication

OK. I added a signal to change this option while the session is open.

> - Please consolidate replication of commits and locks (e.g.
> retry behaviour).

TBD.

I also implemented #updateLockArea in the DBStoreAccessor. H2
offline suite passing now.
Comment 30 Caspar D. CLA 2011-09-01 06:51:13 EDT
> - Please consolidate replication of commits and locks (e.g.
> retry behaviour).

Actually I'm not sure what you mean. Both are governed by the
same try/catch in ReplicateRunnable... what is there to consolidate?
Comment 31 Caspar D. CLA 2011-09-05 05:48:01 EDT
(In reply to comment #30)
> > - Please consolidate replication of commits and locks (e.g.
> > retry behaviour).
> 
> Actually I'm not sure what you mean. Both are governed by the
> same try/catch in ReplicateRunnable... what is there to consolidate?

OK, I got it. Effected by moving the retry behavior into an abstract
baseclass RetryingRunnable.
Comment 32 Caspar D. CLA 2011-09-05 05:49:37 EDT
> Please double-check that the notification timestamps are correct
> (needed for raw replication).

Done.
Comment 33 Caspar D. CLA 2011-09-05 06:03:58 EDT
Created attachment 202743 [details]
Patch v15 (incremental)
Comment 34 Eike Stepper CLA 2011-09-05 06:20:57 EDT
(In reply to comment #31)
> > Actually I'm not sure what you mean. Both are governed by the
> > same try/catch in ReplicateRunnable... what is there to consolidate?
> 
> OK, I got it. Effected by moving the retry behavior into an abstract
> baseclass RetryingRunnable.

Hehe, it pays off not always to reply immediately :P
Comment 35 Eike Stepper CLA 2011-09-05 06:53:32 EDT
Good work!
Comment 36 Caspar D. CLA 2011-09-05 07:10:10 EDT
Committed revision 9036.

I consider this feature to be in a reasonable working state now,
albeit still a bit beta-ish. Will continue to polish this, but
I'll log new Bugzillas for further enhancements. Therefore,
resolving this one.
Comment 37 Eike Stepper CLA 2011-09-05 13:19:52 EDT
With the incremental patch committed lots of tests in the H2All suite are failing (~50 failures per scenario). I guess it's caused by a single minor problem. Can you please look at it?
Comment 38 Caspar D. CLA 2011-09-05 23:31:13 EDT
Created attachment 202776 [details]
Patch for test breakage (incremental)

(In reply to comment #37)
> With the incremental patch committed lots of tests in the H2All suite are
> failing (~50 failures per scenario). I guess it's caused by a single minor
> problem. Can you please look at it?

Hmm... I shouldn't have missed that.

Assuming that you want the build stable ASAP, I've committed
the attached fix in revision 9039.
Comment 39 Caspar D. CLA 2011-09-05 23:59:57 EDT
H2 AllTests passing cleanly now.
Comment 40 Eike Stepper CLA 2011-09-06 00:56:34 EDT
Thanks, better now ;-)
Comment 41 Eike Stepper CLA 2012-09-21 07:17:47 EDT
Closing.