Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 312535 - Partial commits
Summary: Partial commits
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core (show other bugs)
Version: 4.0   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: ---   Edit
Assignee: Caspar D. CLA
QA Contact: Eike Stepper CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-12 03:29 EDT by Caspar D. CLA
Modified: 2010-09-13 04:56 EDT (History)
1 user (show)

See Also:
stepper: review+
stepper: review+


Attachments
Patch (81.62 KB, patch)
2010-09-08 03:27 EDT, Caspar D. CLA
no flags Details | Diff
Patch v2 - ready to be committed (84.01 KB, patch)
2010-09-11 06:26 EDT, Eike Stepper CLA
no flags Details | Diff
Patch (incremental) (3.38 KB, patch)
2010-09-13 02:30 EDT, Caspar D. CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Caspar D. CLA 2010-05-12 03:29:04 EDT
It would be useful to somehow support partial commits; that
is, to somehow allow for only some of the dirty data in a tx
(i.e.  updates, deletions and new objects) to be committed,
instead of all.

The use case for this is when, from the user's conceptual
perspective, the repository contains distinct groups of data
(e.g. "modules"). The user will likely edit multiple groups
at the same time, but might want to save them separately.
(Viewing the groups in different tx's is not generally a
solution, as there will likely be relations across group
boundaries.)

One difficulty with this idea is that we can't allow
inconsistencies to arise. So if, for example, an object A is
moved from parent P1 to parent P2, then a partial commit
including A, must check that P1 and P2 are also included, as
otherwise the resulting persisted state would be
inconsistent.

Solution ideas that Eike and I discussed elsewhere:

1) Add sth like: 
   CDOTransaction.commit(Set<CDOObject> committables)

   IMO, this approach is clean from the caller's
perspective, but admittedly, the drawback is that this
undermines the fundamental idea that a TX is the mechanism
for separating one set of changes from another.

2) Implement sth like
   SomeUtil.copyChanges(sourceTx, targetTx, cdoObjects)

  This would allow partial commits to be realized by copying
the subset of changes over to another tx, and committing
only that 2nd tx. The difficulty with this approach, is that
we need the committed object's state to transition to CLEAN
not only in the 2nd tx, but also in the original sourceTx, 
and retain their updated state in the sourceTx.
   Another question is how to avoid/handle the "phantom
conflicts" that will arise as a consequence of the sourceTx
receiving a CommitNotification from the server, concerning
the targetTx's commit, which includes the sourceTx's own
changes.
Comment 1 Caspar D. CLA 2010-05-25 05:52:52 EDT
As discussed on Skype between Eike and myself, another
possible approach is to introduce something like 

 CDOTransaction.setCommittables(Set<CDOObject> committables)

This set would be null by default, and when left null, the
commit behavior would revert to what it currently is. This
way, the current commit API would not be impacted.

When the set is not null, it acts as a filter on the commit.
That is, only the dirty/new/detached objects contained in
the set, are actually committed.

Yet another alternative is to use the inverse concept:

 CDOTransaction.setExcludeFromCommit(Set<CDOObject> excludes)

But the first approach corresponds more closely to what is
used in DCVS systems such as Git, and is probably also more
straightforward from a user point of view.

Will explore further.
Comment 2 Caspar D. CLA 2010-06-01 06:47:13 EDT
The major challenge that I'm encountering, is this: during
preCommit, we need to check the integrity of the commit.
That is, we need to check that the set of committables
includes enough to constitute a coherent set of updates. For
example, if an object was moved from parent1 to parent2,
then both parent1 and parent2 must be included in the
commit. But this is not easy...

The difficulty is this: once an object is dirty, the
revision that was originally loaded for it, is no longer
guaranteed to be available from the revisionManager's cache.
It may have been GC'ed, in which case fetching it from the
revMgr will actually cause it to be retrieved from the
server, and that retrieved revision may be different from
what was originally loaded in the client's session. (On a
side note: this also gives rise to bug 315026.) 

So, knowing this, how does one check the consistency of a
partial commit? How to check what the old parent was, for
example? This info is contained only in the revision that
was originally loaded, and that may have disappeared from
the session.

It may be necessary to introduce a map that for every dirty
object, holds on to the revisions that were originally
loaded. I'm not sure how complicated this will get. There
are quite a few places in the code where
setInternalCDORevision(..) is called, and I suspect that
each of those cases will have to be thought through.

Another possibility is to keep the old values in the
featureDeltas. But I'm not sure what currenlty happens in
the case of multiple deltas for the same feature, so this
too could be less easy than it sounds.
Comment 3 Eike Stepper CLA 2010-06-29 04:50:26 EDT
Rebasing all outstanding enhancements requests to version 4.0
Comment 4 Caspar D. CLA 2010-09-08 03:27:08 EDT
Created attachment 178373 [details]
Patch

Here's a working implementation of the 'partial commit' concept,
including a mechanism that thoroughly checks if the collection
of objects being committed, are a complete and consistent whole
(see CommitIntegrityCheck.java), i.e. will not cause corruption
of the graph on the server side.
Comment 5 Caspar D. CLA 2010-09-08 03:28:57 EDT
Forgot to mention: unit tests are also included in the patch:
PartialCommitTest.java.
Comment 6 Eike Stepper CLA 2010-09-11 06:26:59 EDT
Created attachment 178678 [details]
Patch v2 - ready to be committed
Comment 7 Eike Stepper CLA 2010-09-11 06:27:39 EDT
I wonder if we really need the get/setCommittables() method in XA transaction. Shouldn't we move them down from CDOUserTransaction to CDOTransaction?
Comment 8 Caspar D. CLA 2010-09-13 01:47:04 EDT
Committed to HEAD
Comment 9 Caspar D. CLA 2010-09-13 02:30:48 EDT
Created attachment 178715 [details]
Patch (incremental)

(In reply to comment #7)
> I wonder if we really need the get/setCommittables() method in XA transaction.
> Shouldn't we move them down from CDOUserTransaction to CDOTransaction?

Yeah I see your point.

I guess it's unnecessary to have get/setCommittables on the XA tx, because
the caller can (should?) call these methods on the underlying normal tx's. 
And the caller probably wants to set different committables for each 
tx...right? I'm not sure about this; I don't know the use cases that XA 
tx's are meant to support.

So.. up to you really. Here's a patch for it anyway.
Comment 10 Eike Stepper CLA 2010-09-13 03:35:56 EDT
Go ahead, please ;-)
Comment 11 Caspar D. CLA 2010-09-13 04:54:22 EDT
Committed to HEAD