Bug 346792 - OfflineCloneExample fails with IllegalArgumentException: Cannot end transaction with unknown timestamp 1305996854765
Status: CLOSED WORKSFORME
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core
Version: 4.2
Hardware: PC Windows XP
Importance: P3 major
Target Milestone: ---
Assignee: Eike Stepper CLA
QA Contact: Eike Stepper CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 336806
 
Reported: 2011-05-21 13:03 EDT by Martin Fluegge CLA
Modified: 2012-12-31 04:52 EST (History)

See Also:
Ed.Merks: pmc_approved+


Attachments
Patch v1 (4.13 KB, patch)
2011-05-24 02:57 EDT, Caspar D. CLA
Proposed updated checkEvent() method. (1.00 KB, patch)
2011-06-08 21:50 EDT, Steve Robenalt CLA
Revised test cases. (1.00 KB, patch)
2011-06-08 21:52 EDT, Steve Robenalt CLA
Prototype Offline Persistence Service source code (14.92 KB, application/octet-stream)
2011-06-21 13:17 EDT, Steve Robenalt CLA

Description Martin Fluegge CLA 2011-05-21 13:03:49 EDT
While working on the OfflineCloneExample I got the exception shown below. Steps to reproduce:
-	Start Master
-	Start Clone
-	Start Client
-	Hit <Enter> to make the client commit some data
-	Watch the exception appear
I had a quick look and noticed that there is a comment about Bug 297940, which seems to be related somehow. So I added Caspar; maybe he has an idea of what is going wrong here.

Committing an object to MAIN
Exception in thread "main" java.lang.RuntimeException: org.eclipse.emf.cdo.util.CommitException: Rollback in DBStore: java.lang.IllegalArgumentException: Cannot end transaction with unknown timestamp 1305996854765
	at org.eclipse.emf.cdo.internal.server.TimeStampAuthority.endCommit(TimeStampAuthority.java:119)
	at org.eclipse.emf.cdo.internal.server.Repository.endCommit(Repository.java:795)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.commit(TransactionCommitContext.java:436)
	at org.eclipse.emf.cdo.internal.server.syncing.SynchronizableRepository$WriteThroughCommitContext.commit(SynchronizableRepository.java:526)
	at org.eclipse.emf.cdo.spi.server.InternalCommitContext$2.runLoop(InternalCommitContext.java:52)
	at org.eclipse.emf.cdo.spi.server.InternalCommitContext$2.runLoop(InternalCommitContext.java:1)
	at org.eclipse.net4j.util.om.monitor.ProgressDistributor.run(ProgressDistributor.java:96)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CommitTransactionIndication.indicatingCommit(CommitTransactionIndication.java:244)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CommitTransactionIndication.indicating(CommitTransactionIndication.java:92)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CDOServerIndicationWithMonitoring.indicating(CDOServerIndicationWithMonitoring.java:109)
	at org.eclipse.net4j.signal.IndicationWithMonitoring.indicating(IndicationWithMonitoring.java:84)
	at org.eclipse.net4j.signal.IndicationWithResponse.doExtendedInput(IndicationWithResponse.java:90)
	at org.eclipse.net4j.signal.Signal.doInput(Signal.java:326)
	at org.eclipse.net4j.signal.IndicationWithResponse.execute(IndicationWithResponse.java:63)
	at org.eclipse.net4j.signal.IndicationWithMonitoring.execute(IndicationWithMonitoring.java:63)
	at org.eclipse.net4j.signal.Signal.runSync(Signal.java:251)
	at org.eclipse.net4j.signal.Signal.run(Signal.java:147)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
	at java.lang.Thread.run(Thread.java:595)

	at org.eclipse.emf.cdo.examples.server.offline.OfflineExampleClient.addObject(OfflineExampleClient.java:66)
	at org.eclipse.emf.cdo.examples.server.offline.OfflineExampleClient.main(OfflineExampleClient.java:158)
Caused by: org.eclipse.emf.cdo.util.CommitException: Rollback in DBStore: java.lang.IllegalArgumentException: Cannot end transaction with unknown timestamp 1305996854765
	at org.eclipse.emf.cdo.internal.server.TimeStampAuthority.endCommit(TimeStampAuthority.java:119)
	at org.eclipse.emf.cdo.internal.server.Repository.endCommit(Repository.java:795)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.commit(TransactionCommitContext.java:436)
	at org.eclipse.emf.cdo.internal.server.syncing.SynchronizableRepository$WriteThroughCommitContext.commit(SynchronizableRepository.java:526)
	at org.eclipse.emf.cdo.spi.server.InternalCommitContext$2.runLoop(InternalCommitContext.java:52)
	at org.eclipse.emf.cdo.spi.server.InternalCommitContext$2.runLoop(InternalCommitContext.java:1)
	at org.eclipse.net4j.util.om.monitor.ProgressDistributor.run(ProgressDistributor.java:96)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CommitTransactionIndication.indicatingCommit(CommitTransactionIndication.java:244)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CommitTransactionIndication.indicating(CommitTransactionIndication.java:92)
	at org.eclipse.emf.cdo.server.internal.net4j.protocol.CDOServerIndicationWithMonitoring.indicating(CDOServerIndicationWithMonitoring.java:109)
	at org.eclipse.net4j.signal.IndicationWithMonitoring.indicating(IndicationWithMonitoring.java:84)
	at org.eclipse.net4j.signal.IndicationWithResponse.doExtendedInput(IndicationWithResponse.java:90)
	at org.eclipse.net4j.signal.Signal.doInput(Signal.java:326)
	at org.eclipse.net4j.signal.IndicationWithResponse.execute(IndicationWithResponse.java:63)
	at org.eclipse.net4j.signal.IndicationWithMonitoring.execute(IndicationWithMonitoring.java:63)
	at org.eclipse.net4j.signal.Signal.runSync(Signal.java:251)
	at org.eclipse.net4j.signal.Signal.run(Signal.java:147)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
	at java.lang.Thread.run(Thread.java:595)

	at org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:94)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1058)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:1078)
	at org.eclipse.emf.cdo.examples.server.offline.OfflineExampleClient.addObject(OfflineExampleClient.java:59)
	... 1 more
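
The thread does not state the root cause, so the following is only an illustrative sketch and not CDO's TimeStampAuthority. It assumes a hypothetical authority (SimpleTimeStampAuthority, a made-up name) that remembers every timestamp handed out by startCommit() and rejects endCommit() for any timestamp it never registered. That is one way a message like the one above can arise, for instance if a commit is ended with a timestamp that originated elsewhere and was never registered locally.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for a commit timestamp authority; NOT CDO's actual class. */
final class SimpleTimeStampAuthority {
    private final Set<Long> runningCommits = new HashSet<>();
    private long lastIssued;

    /** Issues a fresh, strictly increasing timestamp and remembers it as a running commit. */
    synchronized long startCommit() {
        long timeStamp = Math.max(System.currentTimeMillis(), lastIssued + 1);
        lastIssued = timeStamp;
        runningCommits.add(timeStamp);
        return timeStamp;
    }

    /** Registers a timestamp dictated from outside (assumption: e.g. by another repository). */
    synchronized long startCommit(long externalTimeStamp) {
        lastIssued = Math.max(lastIssued, externalTimeStamp);
        runningCommits.add(externalTimeStamp);
        return externalTimeStamp;
    }

    /** Ends a running commit; fails if the timestamp was never registered via startCommit. */
    synchronized void endCommit(long timeStamp) {
        if (!runningCommits.remove(timeStamp)) {
            throw new IllegalArgumentException(
                "Cannot end transaction with unknown timestamp " + timeStamp);
        }
    }
}
```

In this sketch, calling endCommit(1305996854765L) without a matching startCommit(1305996854765L) reproduces the shape of the reported failure, while registering the external timestamp first lets the commit end normally.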
Comment 1 Eike Stepper CLA 2011-05-23 12:10:20 EDT
I suspect that the entire RepositorySynchronizer might be broken: Caspar, can you please look at this urgently?
Comment 2 Eike Stepper CLA 2011-05-23 13:01:41 EDT
Indeed, all H2 offline tests fail with:

[ERROR] Cannot end transaction with unknown timestamp 1306169975876
java.lang.IllegalArgumentException: Cannot end transaction with unknown timestamp 1306169975876
	at org.eclipse.emf.cdo.internal.server.TimeStampAuthority.endCommit(TimeStampAuthority.java:119)
	at org.eclipse.emf.cdo.internal.server.Repository.endCommit(Repository.java:795)
	at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.commit(TransactionCommitContext.java:445)
	at org.eclipse.emf.cdo.internal.server.syncing.SynchronizableRepository.handleCommitInfo(SynchronizableRepository.java:215)
	at org.eclipse.emf.cdo.tests.config.impl.RepositoryConfig$OfflineConfig$1.handleCommitInfo(RepositoryConfig.java:599)
	at org.eclipse.emf.cdo.internal.server.syncing.RepositorySynchronizer$CommitRunnable.run(RepositorySynchronizer.java:511)
	at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:26)
	at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:1)
	at org.eclipse.net4j.util.concurrent.QueueWorker.doWork(QueueWorker.java:81)
	at org.eclipse.net4j.util.concurrent.QueueWorker.work(QueueWorker.java:72)
	at org.eclipse.net4j.util.concurrent.Worker$WorkerThread.run(Worker.java:206)
Comment 3 Caspar D. CLA 2011-05-24 02:57:50 EDT
Created attachment 196397 [details]
Patch v1

This allows some tests to pass, but not all.
Comment 4 Eike Stepper CLA 2011-06-08 06:12:00 EDT
Committed revision 7962
Comment 5 Eike Stepper CLA 2011-06-08 06:14:28 EDT
This is a minor fix that does not cure all the problems I discovered in the tests, but it is essential just to get the OfflineExample up and running. PMC, please approve for late contribution to Indigo.
Comment 6 Steve Robenalt CLA 2011-06-08 12:02:27 EDT
(In reply to comment #5)
> This is a minor fix that does not cure all the problems I discovered in the
> tests, but it is essential just to get the OfflineExample up and running. PMC,
> please approve for late contribution to Indigo.

Let me know if there is any way I can help with this issue.
Comment 7 Eike Stepper CLA 2011-06-08 13:09:48 EDT
(In reply to comment #6)
> Let me know if there is any way I can help with this issue.

I *think* I've fixed this particular problem, but the "CDO AllTests (H2 offline)" test launch reveals other problems. If you want to and have time, you could try to run these tests and see if you can find a cause or fix. I'm currently quite busy completing the docs...
Comment 8 Steve Robenalt CLA 2011-06-08 13:17:38 EDT
(In reply to comment #7)
> (In reply to comment #6)
> > Let me know if there is any way I can help with this issue.
> 
> I *think* I've fixed this particular problem, but the "CDO AllTests (H2
> offline)" test launch reveals other problems. If you want to and have time,
> you could try to run these tests and see if you can find a cause or fix. I'm
> currently quite busy completing the docs...

I'll set up a workspace and see if I can get it to run.
Comment 9 Eike Stepper CLA 2011-06-08 13:49:26 EDT
It's best to follow this simple tutorial: http://wiki.eclipse.org/CDO_Source_Installation
Comment 10 Steve Robenalt CLA 2011-06-08 21:50:46 EDT
Created attachment 197661 [details]
Proposed updated checkEvent() method.

Allows the event listener to be polled properly.
Comment 11 Steve Robenalt CLA 2011-06-08 21:52:55 EDT
Created attachment 197662 [details]
Revised test cases.

Updated assertions to reflect creation of 2 folders.
Comment 12 Steve Robenalt CLA 2011-06-08 22:09:39 EDT
Sorry to put the proposed patches before the text. I thought the attachments would be included when I committed the entire bug.

Attached are 2 proposed patches to cover problems I found with the offline tests in MEM configuration, rather than H2. Based on the nature of the issues, I thought it best to post these now since they will almost certainly affect other test cases.

First, there are 2 test cases that were failing due to a discrepancy in the assert statements. Apparently, the original expected values for the number of CDO objects created and the number of events produced assumed a single folder, but the folder created with the company contains a primary and a subfolder component, which causes two folder objects to be created. Since the test case doesn't appear to have changed with respect to the folder being created, I presume that CDO itself has changed, but I did not verify that any further.
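
To make the off-by-one in the expected counts concrete, here is a minimal sketch. It is not CDO code; the class name FolderPathSketch and the example paths are hypothetical. It only counts the folder segments that precede the resource name in a path, under the assumption that one folder object is created per segment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of CDO: lists the folder segments of a resource path. */
final class FolderPathSketch {
    /** Returns every path segment except the last one, which is taken to be the resource. */
    static List<String> folderSegments(String resourcePath) {
        List<String> folders = new ArrayList<>();
        String[] parts = resourcePath.split("/");
        for (int i = 0; i < parts.length - 1; i++) {
            if (!parts[i].isEmpty()) { // skip the empty segment before a leading slash
                folders.add(parts[i]);
            }
        }
        return folders;
    }
}
```

With a made-up path such as "/primary/sub/company", the helper reports two folder segments, so an assertion written for a single folder would be off by one.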

The second problem affected the testMasterCommits_NotificationsFromBackup test case and caused it to miss the events that were produced when the CDO Objects were created/updated. The origin of the problem is with the CDOSessionImpl.waitForUpdate() method, which was revised as part of the fix for bug #339064. 

The updated waitForUpdate() method does not wait if there are no views registered. Since there were no views on the backupSession, it simply returned a value of true immediately, which is expected by the test case. However, the checkEvent() method as it was implemented did not properly poll the listener, so the event would only be caught if it existed when the checkEvent method was first called. I've submitted a proposed patch which fixes the checkEvent method so that it polls properly, and thus fixes the test cases.
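
The difference between checking once and polling can be sketched as follows. This is not the attached patch and not the test framework's checkEvent(); EventPolling and pollUntil are hypothetical names, and the timeout and interval values are arbitrary. The idea is simply to re-evaluate the condition until it holds or a deadline passes, instead of looking at the listener a single time.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; illustrates the idea only, not the actual checkEvent() fix. */
final class EventPolling {
    /**
     * Re-checks the condition every intervalMillis until it holds or timeoutMillis elapses.
     * Returns true if the condition was observed, false on timeout or interruption.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without ever seeing the event
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A test could then call something like EventPolling.pollUntil(() -> listenerHasEvent(), 2000, 10), where listenerHasEvent() stands for whatever check the listener offers, so that an event arriving shortly after the first check is still caught.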

I'll defer to the CDO experts as to whether or not the waitForUpdate method should actually wait in the absence of views. It seems to me that it should wait, but maybe that's only a concern for this test case.

I'm continuing to test as well.
Comment 13 Steve Robenalt CLA 2011-06-08 22:13:02 EDT
Note that the proposed patches are not directly related to this bug. Rather, they are related to the failed test cases associated with offline mode.
Comment 14 Steve Robenalt CLA 2011-06-10 17:22:07 EDT
I verified today that my prototype offline mode application works correctly when I run a locally built CDO 4.0.0 incorporating the fix in revision 7962 mentioned by Eike in comment #4.

Using the same build, my automatic merge to main when the connection is restored also works.

I'm continuing to look at failing/error test cases in the H2 offline configuration - there are quite a few. Progress is a bit slow as I'm learning the CDO and test codebase in the process.
Comment 15 Eike Stepper CLA 2011-06-11 01:49:42 EDT
(In reply to comment #14)
> I verified today that my prototype offline mode application works correctly
> when I run a locally built CDO 4.0.0 incorporating the fix in revision 7962
> mentioned by Eike in comment #4.

That's good news.

> Using the same build, my automatic merge to main when the connection is
> restored also works.

Even better ;-)

> I'm continuing to look at failing/error test cases in the H2 offline
> configuration - there are quite a few. 

It may be related to the test setups, or not...

> Progress is a bit slow as I'm learning
> the CDO and test codebase in the process.

Well, I expect that's pure fun :P
Comment 16 Steve Robenalt CLA 2011-06-14 19:52:45 EDT
(In reply to comment #15)
Okay, I found the cause of at least some of the test errors in the H2 offline group (and possibly other tests as well). There's a minor flaw in the test setUp() methods regarding the activation of the acceptors.

ManagedContainer.getElement(String, String, String) gets the acceptor and activates it (the activate flag is forced to true). Its first action, however, is to call checkActive(), which returns normally if the acceptor is already active but throws IllegalStateException if the acceptor is currently inactive. As a result, it never gets to the point where the acceptor can be activated.

In practice, the first test in a test case passes (or fails) normally and closes the acceptor during tearDown(), leaving it inactive. The remaining tests then throw an exception in the test setup, resulting in an error.

It seems to me that the best fix would be to make checkActive() a postcondition (if the activate flag is true) for the method, rather than a precondition. I'm experimenting with this change locally, but wanted to be sure I'm not missing something.
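
The precondition versus postcondition idea can be sketched as follows. This is a simplified, hypothetical model (SketchContainer and its methods are made-up names), not Net4j's ManagedContainer; it only mirrors the behaviour described in this comment, where a cached but deactivated element trips the check before it can be reactivated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical container model; NOT Net4j's ManagedContainer. */
final class SketchContainer {
    static final class Element {
        private boolean active;
        boolean isActive() { return active; }
        void activate() { active = true; }
        void deactivate() { active = false; }
    }

    private final Map<String, Element> elements = new HashMap<>();

    private static void checkActive(Element element) {
        if (!element.isActive()) {
            throw new IllegalStateException("Not active");
        }
    }

    /** Precondition style: a cached, inactive element fails before it can be reactivated. */
    Element getElementCheckingFirst(String key, boolean activate) {
        Element element = elements.get(key);
        if (element == null) {
            element = new Element();
            if (activate) {
                element.activate();
            }
            elements.put(key, element);
            return element;
        }
        checkActive(element); // throws for an element deactivated by an earlier tearDown()
        return element;
    }

    /** Postcondition style: activate if requested, then verify the result. */
    Element getElementCheckingLast(String key, boolean activate) {
        Element element = elements.computeIfAbsent(key, k -> new Element());
        if (activate) {
            if (!element.isActive()) {
                element.activate();
            }
            checkActive(element); // now guards the outcome instead of blocking activation
        }
        return element;
    }
}
```

In this model, deactivating the element after the first lookup makes getElementCheckingFirst throw IllegalStateException on the next call, whereas getElementCheckingLast reactivates it and returns normally.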

Should I create a new bug for this?
Comment 17 Caspar D. CLA 2011-06-20 03:11:10 EDT
(In reply to comment #14)
> I verified today that my prototype offline mode application
> works correctly

Steve, would you be willing to share this prototype, or is this
something you need to keep proprietary? Either way is fine
of course, just asking so I won't duplicate your efforts if I
don't have to.

Thanks very much
--
Caspar
Comment 18 Steve Robenalt CLA 2011-06-20 11:02:32 EDT
(In reply to comment #17)
> (In reply to comment #14)
> > I verified today that my prototype offline mode application
> > works correctly
> 
> Steve, would you be willing to share this prototype, or is this
> something you need to keep proprietary? Either way is fine
> of course, just asking so I won't duplicate your efforts if I
> don't have to.
> 
> Thanks very much
> --
> Caspar

Hi Caspar,

The part that handles CDO interactions is implemented as an OSGI service using Declarative Services (DS) for wiring, so I can easily share the CDO part without sharing the entire application, which is too big for a demo app in any case. I've been intending to do a writeup on it and post it on the CDO wiki. To that end, I've expanded the example a bit and have created 3 versions of the same service:

1) Local CDO repository using a JVM connector.
2) Remote CDO repository using a TCP connector.
3) Offline CDO repository using JVM local and TCP remote.

All use the same interface and are interchangeable.

Also, the DS wiring makes it very easy to use, but is not critical to the example.

Steve
Comment 19 Caspar D. CLA 2011-06-21 02:36:22 EDT
(In reply to comment #18)

Sounds great!

Will it be possible for you to make this stuff available soon? I don't
mean to rush you, so if you're not ready that's fine. It's just that
I'd like to get all these broken unit tests working again, and right
now I don't have an example of a working setup.

Thanks
--
Caspar
Comment 20 Steve Robenalt CLA 2011-06-21 13:06:54 EDT
(In reply to comment #19)
> (In reply to comment #18)
> 
> Sounds great!
> 
> Will it be possible for you to make this stuff available soon? I don't
> mean to rush you, so if you're not ready that's fine. It's just that
> I'd like to get all these broken unit tests working again, and right
> now I don't have an example of a working setup.
> 
> Thanks
> --
> Caspar

Hi Caspar,

I'll post my persistence class on this bug for you to work with. As a prototype, it has some config props hard coded, and a few places where possible failures are ignored, but it should be suitable for testing. If you need the DS component.xml, I can provide it, but it would probably be simpler to put the same code into a plugin/activator class.

Steve
Comment 21 Steve Robenalt CLA 2011-06-21 13:17:21 EDT
Created attachment 198348 [details]
Prototype Offline Persistence Service source code

Attaching a working prototype of CDO offline persistence including automatic merge to the main branch when connectivity is reestablished.

For IP purposes, the code was created by me - based heavily on examples in the Eclipse wiki and CDO examples code. I've indicated in the main comment that it's under the EPL and will provide any information necessary to allow it to be used under the same licensing terms as the rest of CDO and EMF.
Comment 22 Steve Robenalt CLA 2011-06-22 11:42:32 EDT
Hi Caspar,

FYI, further testing has shown that my example still has problems with some cases when merging an offline branch. I'm currently looking at the CDOWorkspaceImpl class, which contains comments relative to the problem I'm having (Local Ids need to be mapped to proper temp ids), to determine how best to apply it to my case and will post an updated version when it is ready.

Steve
Comment 23 Steve Robenalt CLA 2011-06-22 13:20:34 EDT
Merge problem identified. The fix for bug #341081 causes the merge from the offline branch to fail. I've added a comment to that bug with details.
Comment 24 Eike Stepper CLA 2011-07-06 00:58:23 EDT
(In reply to comment #16)

Hi Steve,

I guess you've seen that Caspar has made some significant improvements in Bug 350649. Together with some simple changes in Scenario.java, it now seems that all offline tests are passing again. Could you re-evaluate the changes you proposed here so that I get a clue about what's left to fix?
Comment 25 Steve Robenalt CLA 2011-07-06 11:15:56 EDT
(In reply to comment #24)
> (In reply to comment #16)
> 
> Hi Steve,
> 
> I guess you've seen that Caspar has made some significant improvements in Bug
> 350649. Together with some simple changes in Scenario.java, it now seems that
> all offline tests are passing again. Could you re-evaluate the changes you
> proposed here so that I get a clue about what's left to fix?

Yes, I'll go back and take a look and post a followup.
Comment 26 Eike Stepper CLA 2011-07-06 12:03:11 EDT
Thank you Steve!
Comment 27 Eike Stepper CLA 2012-06-05 07:29:01 EDT
Moving all open bug reports to 4.1 because the release is very near and it's highly unlikely that there will be spare time to address 4.0 problems.

Please make sure that your patches can be applied against the master branch and that your problem is not already fixed there!!!
Comment 28 Eike Stepper CLA 2012-08-14 22:57:23 EDT
Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.
Comment 29 Eike Stepper CLA 2012-12-31 04:52:51 EST
I believe that most of the mentioned problems have been fixed recently through other bugs (namely the new offline example app).