Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 334981

Summary: LockObjectsIndication inappropriately throws exception for stale revisions
Product: [Modeling] EMF Reporter: Caspar D. <caspar_d>
Component: cdo.coreAssignee: Caspar D. <caspar_d>
Status: CLOSED FIXED QA Contact: Eike Stepper <stepper>
Severity: minor    
Priority: P3 CC: saulius.tvarijonas, stepper
Version: 4.0Flags: stepper: review+
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Attachments:
Description Flags
Patch v1
none
Patch v2
none
Patch v3
none
Patch v4 none

Description Caspar D. CLA 2011-01-21 03:55:09 EST
As I already noted a while ago in a comment at LockObjectsIndication.java,
lines 76 ff., it doesn't make much sense that an exception is thrown
when the logic in #handleViewedRevision discovers that one of the objects
that the client is trying to lock, is not up to date. In this situation,
nothing is really "wrong"; it's normal for a client (especially ones
with passiveUpdates=false) to hold some stale revisions.

One could argue that an exception is simply the easiest way of
reporting the problem back to the client, but it has unfortunate 
side effects. A server-side exception gets wrapped in a RemoteException,
sent to the client, and is dumped to the console there, suggesting
serious breakage to the user while in fact the problem was to be
expected. It's also problematic that the user app can't deal with this
expected problem in a straightforward manner.

I propose that we rework this to report the problem back through
the responding/confirming mechanism.
Comment 1 Caspar D. CLA 2011-01-21 03:55:49 EST
Sample stacktrace:
[ERROR] Client's revision of object OID39 is not the latest version in branch Branch[id=0, name=MAIN]
java.lang.IllegalArgumentException: Client's revision of object OID39 is not the latest version in branch Branch[id=0, name=MAIN]
        at org.eclipse.emf.cdo.server.internal.net4j.protocol.LockObjectsIndication.handleViewedRevision(LockObjectsIndication.java:91)
        at org.eclipse.emf.cdo.server.internal.net4j.protocol.LockObjectsIndication.indicating(LockObjectsIndication.java:59)
        at org.eclipse.emf.cdo.server.internal.net4j.protocol.CDOServerIndication.indicating(CDOServerIndication.java:84)
        at org.eclipse.net4j.signal.IndicationWithResponse.doExtendedInput(IndicationWithResponse.java:90)
        at org.eclipse.net4j.signal.Signal.doInput(Signal.java:316)
        at org.eclipse.net4j.signal.IndicationWithResponse.execute(IndicationWithResponse.java:63)
        at org.eclipse.emf.cdo.server.internal.net4j.protocol.CDOReadIndication.execute(CDOReadIndication.java:36)
        at org.eclipse.net4j.signal.Signal.runSync(Signal.java:241)
        at org.eclipse.net4j.signal.Signal.run(Signal.java:147)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Comment 2 Eike Stepper CLA 2011-01-25 14:06:20 EST
Caspar, can you work on this? I agree with your prososal.
Comment 3 Caspar D. CLA 2011-01-25 22:08:54 EST
OK, assigning to myself..
Comment 4 Caspar D. CLA 2011-02-09 05:29:18 EST
Created attachment 188576 [details]
Patch v1
Comment 5 Caspar D. CLA 2011-02-09 05:40:22 EST
The patch I posted makes the desired changes in the signal classes
involved, and that works perfectly fine.

But the remaining question is how to work this into the CDOObject API.

Consider a call cdoObject.readLock().lock()

The return type is currently void. We could keep it that way and
translate a failure (which, with my patch in place, is no longer
communicated as a RemoteException, but as a normal result through
the response stream) back to a local exception, e.g. a 
'StaleRevisionEx' or whatever.

But a lock could also fail because of a timeout. (This I haven't
yet changed to be communicated back as a regular result. Should I?
I suppose so.)

Or should we have a return type enum LockResult {OK,STALE,TIMEOUT} ?

It's a matter of preference really. But the spirit of this bugzilla
seems to be that certain reasons for a lock to fail are "normal" and
therefore should be communicated as a regular result of the call,
rather than as exceptions. I think both stale revisions and timeouts
are such "normal" reasons for a lock to fail.

Comments anyone?
Comment 6 Eike Stepper CLA 2011-02-10 02:49:09 EST
LockObjectsResult seems to have no effect. LockObjectsResult.isSuccessful is not called from anywhere. Is something missing?

Oh, you didn't request a review, yet. Maybe it's work in progress...
Comment 7 Caspar D. CLA 2011-02-10 03:17:08 EST
(In reply to comment #6)
> Oh, you didn't request a review, yet. Maybe it's work in progress...

Yeah.. I explained the status of the patch in my my last post
of 2011-02-09, and asked for comments...
Comment 8 Eike Stepper CLA 2011-02-10 08:15:34 EST
Maybe I don't see all consequences, yet, but I'd vote for another unchecked exception. My arguments:

1) Part of the CDOLock API is inherited from java.util.concurrent.locks.Lock, where we can not change return types.

2) There is also CDOView.lockObject(...) and it would requre a different return type.

3) Return values are overlooked all too easily
Comment 9 Caspar D. CLA 2011-02-10 23:45:49 EST
Created attachment 188736 [details]
Patch v2

This patch uses the same exceptions that were thrown before, i.e.
an TimeoutRuntimeEx for lock timeouts, and an IllegalArgEx for
lock failures due to stale revisions. But unlike the original 
code, the failures are communicated back through the normal
response stream instead of using RemoteExceptions and an additional
signal. The exceptions are constructed in CDOViewImpl.lockObjects
Comment 10 Caspar D. CLA 2011-02-11 00:06:14 EST
Created attachment 188737 [details]
Patch v3

This patch uses custom exception types.
Comment 11 Eike Stepper CLA 2011-02-11 02:59:00 EST
Created attachment 188747 [details]
Patch v4

I've tried to simplify it a little by directly returning the appropriate exception from confirming().

I was just about to approve the patch, when I realized that it breaks the tests. Both patches do that, v3 and v4.

The failure is strange, it happens during commit because of revision.setRevised(revision.getTimeStamp()-1).

And it only seems to happen if all tests are executed in sequence. Executing the failures only does not fail. Some static state involved??
Comment 12 Eike Stepper CLA 2011-02-11 03:03:10 EST
The failures seem to occur less frequently if executing in debug mode, but they do occur, too.
Comment 13 Eike Stepper CLA 2011-02-11 03:05:20 EST
Darn, the failures also occur without any of the two patches. Just even less frequently. Investigating...
Comment 14 Eike Stepper CLA 2011-02-11 03:14:19 EST
Okay, that was just a timing bug in the RepositoryConfig setup. I fixed it ;-)
Comment 15 Eike Stepper CLA 2011-02-11 03:21:03 EST
Please go ahead and commit, if you don't object to my changes...
Comment 16 Caspar D. CLA 2011-02-11 04:46:52 EST
Committed to trunk, rev. 7061
Comment 17 Eike Stepper CLA 2011-06-23 03:40:23 EDT
Available in R20110608-1407