| Summary: | OfflineCloneExample fails with IllegalArgumentException: Cannot end transaction with unknown timestamp 1305996854765 | | |
|---|---|---|---|
| Product: | [Modeling] EMF | Reporter: | Martin Fluegge <martin.fluegge> |
| Component: | cdo.core | Assignee: | Eike Stepper <stepper> |
| Status: | CLOSED WORKSFORME | QA Contact: | Eike Stepper <stepper> |
| Severity: | major | | |
| Priority: | P3 | CC: | caspar_d, cyril.jaquier, Ed.Merks, martin.fluegge, steve |
| Version: | 4.2 | Flags: | Ed.Merks: pmc_approved+ |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 336806 | | |
| Attachments: | | | |
|
Description
Martin Fluegge
I suspect that the entire RepositorySynchronizer might be broken: Caspar, can you please look at this urgently?

Indeed, all H2 offline tests fail with:

[ERROR] Cannot end transaction with unknown timestamp 1306169975876
java.lang.IllegalArgumentException: Cannot end transaction with unknown timestamp 1306169975876
    at org.eclipse.emf.cdo.internal.server.TimeStampAuthority.endCommit(TimeStampAuthority.java:119)
    at org.eclipse.emf.cdo.internal.server.Repository.endCommit(Repository.java:795)
    at org.eclipse.emf.cdo.internal.server.TransactionCommitContext.commit(TransactionCommitContext.java:445)
    at org.eclipse.emf.cdo.internal.server.syncing.SynchronizableRepository.handleCommitInfo(SynchronizableRepository.java:215)
    at org.eclipse.emf.cdo.tests.config.impl.RepositoryConfig$OfflineConfig$1.handleCommitInfo(RepositoryConfig.java:599)
    at org.eclipse.emf.cdo.internal.server.syncing.RepositorySynchronizer$CommitRunnable.run(RepositorySynchronizer.java:511)
    at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:26)
    at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:1)
    at org.eclipse.net4j.util.concurrent.QueueWorker.doWork(QueueWorker.java:81)
    at org.eclipse.net4j.util.concurrent.QueueWorker.work(QueueWorker.java:72)
    at org.eclipse.net4j.util.concurrent.Worker$WorkerThread.run(Worker.java:206)

Created attachment 196397 [details]
Patch v1
This allows some tests to pass, but not all.
Committed revision 7962

This is a minor fix that does not cure all problems I discovered in the tests, but it is essential to only get the OfflineExample up and running. PMC, please approve for late contribution to Indigo.

(In reply to comment #5)
> This is a minor fix that does not cure all problems I discovered in the tests,
> but it is essential to only get the OfflineExample up and running. PMC, please
> approve for late contribution to Indigo.

Let me know if there is any way I can help with this issue.

(In reply to comment #6)
> Let me know if there is any way I can help with this issue.

I *think* I've fixed this particular problem but the "CDO AllTests (H2 offline)" test launch demos other problems. If you want to and have time you could try to run these tests and see if you can find a reason/fix. I'm currently quite busy completing the docs...

(In reply to comment #7)
> (In reply to comment #6)
> > Let me know if there is any way I can help with this issue.
>
> I *think* I've fixed this particular problem but the "CDO AllTests (H2
> offline)" test launch demos other problems. If you want to and have time you
> could try to run these tests and see if you can find a reason/fix. I'm
> currently quite busy completing the docs...

I'll set up a workspace and see if I can get it to run.

It's best to follow this simple tutorial: http://wiki.eclipse.org/CDO_Source_Installation

Created attachment 197661 [details]
Proposed updated checkEvent() method.
Allows the event listener to be polled properly.
Created attachment 197662 [details]
Revised test cases.
Updated assertions to reflect creation of 2 folders.
Sorry to put the proposed patches before the text. I thought the attachments would be included when I committed the entire bug.

Attached are 2 proposed patches to cover problems I found with the offline tests in MEM configuration, rather than H2. Based on the nature of the issues, I thought it best to post these now since they will almost certainly affect other test cases.

First, there are 2 test cases that were failing due to a discrepancy in the assert statements. Apparently, the original expected values for the number of CDO objects created and the number of events produced assumed a single folder, but the folder created with the company contains a primary and a subfolder component, which causes two folder objects to be created. Since the test case doesn't appear to have changed with respect to the folder being created, I presume that CDO itself has changed, but I did not verify that any further.

The second problem affected the testMasterCommits_NotificationsFromBackup test case and caused it to miss the events that were produced when the CDO objects were created/updated. The origin of the problem is with the CDOSessionImpl.waitForUpdate() method, which was revised as part of the fix for bug #339064. The updated waitForUpdate() method does not wait if there are no views registered. Since there were no views on the backupSession, it simply returned a value of true immediately, which is expected by the test case. However, the checkEvent() method as it was implemented did not properly poll the listener, so the event would only be caught if it existed when the checkEvent() method was first called. I've submitted a proposed patch which fixes the checkEvent() method so that it polls properly, and thus fixes the test cases.

I'll defer to the CDO experts as to whether or not the waitForUpdate() method should actually wait in the absence of views. It seems to me that it should wait, but maybe that's only a concern for this test case. I'm continuing to test as well.
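For readers without access to the attachment, the difference between a one-shot check and proper polling can be sketched roughly as follows. This is only an illustration of the general technique, not the contents of the proposed checkEvent() patch: the class EventPoller and the method pollUntil are hypothetical names that do not exist in the CDO test framework.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (assumption); not part of the CDO test framework. */
final class EventPoller {
    private EventPoller() {
    }

    /**
     * Re-evaluates the condition until it holds or the timeout elapses.
     * A single up-front check would miss any event that arrives later.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // the event has been observed
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the event
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

A test could then call something like `EventPoller.pollUntil(() -> listenerHasSeenEvent(), 5000, 10)`, where `listenerHasSeenEvent()` is again a placeholder for whatever check the test performs on its listener.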
Note that the proposed patches are not directly related to this bug. Rather, they are related to the failed test cases associated with offline mode.

I verified today that my prototype offline mode application works correctly when I run a locally built CDO 4.0.0 incorporating the fix in revision 7962 mentioned by Eike in comment #4. Using the same build, my automatic merge to main when the connection is restored also works.

I'm continuing to look at failing/error test cases in the H2 offline configuration - there are quite a few. Progress is a bit slow as I'm learning the CDO and test codebase in the process.

(In reply to comment #14)
> I verified today that my prototype offline mode application works correctly
> when I run a locally built CDO 4.0.0 incorporating the fix in revision 7962
> mentioned by Eike in comment #4.

That's good news.

> Using the same build, my automatic merge to main when the connection is
> restored also works.

Even better ;-)

> I'm continuing to look at failing/error test cases in the H2 offline
> configuration - there are quite a few.

It can be related with the test setups, or not...

> Progress is a bit slow as I'm learning
> the CDO and test codebase in the process.

Well, I expect that's pure fun :P

(In reply to comment #15)

Okay, I found the cause of at least some of the test errors in the H2 offline group (and possibly other tests as well). There's a minor flaw in the test setUp() methods regarding the activation of the acceptors. In ManagedContainer.getElement(String, String, String), which gets the acceptor and activates it (because the activate flag is forced to true), the first action is to call checkActive(), which returns normally if the acceptor is already active, but throws IllegalStateException if the acceptor is currently inactive. As a result, it never gets to the point where it can be activated.
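The precondition-versus-postcondition distinction in this comment can be illustrated with a toy model. The sketch below only mirrors the behaviour as described in the comment; ToyAcceptor and ToyContainer are hypothetical classes and do not reproduce the actual Net4j ManagedContainer code.

```java
/** Hypothetical stand-in for an acceptor (assumption); not the Net4j API. */
final class ToyAcceptor {
    private boolean active;

    void activate() {
        active = true;
    }

    void deactivate() {
        active = false;
    }

    boolean isActive() {
        return active;
    }
}

/** Hypothetical container modelling the two orderings discussed in the comment. */
final class ToyContainer {
    private final ToyAcceptor acceptor = new ToyAcceptor();

    /** Check as a precondition: an inactive element can never be activated. */
    ToyAcceptor getElementCheckFirst(boolean activate) {
        checkActive(acceptor); // throws before activation is attempted
        if (activate) {
            acceptor.activate();
        }
        return acceptor;
    }

    /** Check as a postcondition: activate first, then verify the result. */
    ToyAcceptor getElementCheckLast(boolean activate) {
        if (activate) {
            acceptor.activate();
            checkActive(acceptor); // only fails if activation did not take effect
        }
        return acceptor;
    }

    private static void checkActive(ToyAcceptor a) {
        if (!a.isActive()) {
            throw new IllegalStateException("Not active");
        }
    }
}
```

In this model, `getElementCheckFirst(true)` throws IllegalStateException for an inactive acceptor, while `getElementCheckLast(true)` returns it activated.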
The effect is that the first test in a test case passes (or fails) normally and closes the acceptor during tearDown() (leaving it inactive), and the remaining tests then throw an exception in the test setup, resulting in an error.

It seems to me that the best fix would be to make checkActive() a postcondition (if the activate flag is true) for the method, rather than a precondition. I'm experimenting with this change locally, but wanted to be sure I'm not missing something. Should I create a new bug for this?

> (In reply to comment #14)
> > I verified today that my prototype offline mode application works correctly
> > when I run a locally built CDO 4.0.0 incorporating the fix in revision 7962
> > mentioned by Eike in comment #4.
>
> That's good news.
>
> > Using the same build, my automatic merge to main when the connection is
> > restored also works.
>
> Even better ;-)
>
> > I'm continuing to look at failing/error test cases in the H2 offline
> > configuration - there are quite a few.
>
> It can be related with the test setups, or not...
>
> > Progress is a bit slow as I'm learning
> > the CDO and test codebase in the process.
>
> Well, I expect that's pure fun :P

(In reply to comment #14)
> I verified today that my prototype offline mode application
> works correctly

Steve, would you be willing to share this prototype, or is this something you need to keep proprietary? Either way is fine of course, just asking so I won't duplicate your efforts if I don't have to.

Thanks very much
--
Caspar

(In reply to comment #17)
> (In reply to comment #14)
> > I verified today that my prototype offline mode application
> > works correctly
>
> Steve, would you be willing to share this prototype, or is this
> something you need to keep proprietary? Either way is fine
> of course, just asking so I won't duplicate your efforts if I
> don't have to.
>
> Thanks very much
> --
> Caspar

Hi Caspar,

The part that handles CDO interactions is implemented as an OSGi service using Declarative Services (DS) for wiring, so I can easily share the CDO part without sharing the entire application, which is too big for a demo app in any case. I've been intending to do a writeup on it and post it on the CDO wiki. To that end, I've expanded the example a bit and have created 3 versions of the same service:

1) Local CDO repository using a JVM connector.
2) Remote CDO repository using a TCP connector.
3) Offline CDO repository using JVM local and TCP remote.

All use the same interface and are interchangeable. Also, the DS wiring makes it very easy to use, but is not critical to the example.

Steve

(In reply to comment #18)

Sounds great!

Will it be possible for you to make this stuff available soon? I don't mean to rush you, so if you're not ready that's fine. It's just that I'd like to get all these broken unit tests working again, and right now I don't have an example of a working setup.

Thanks
--
Caspar

(In reply to comment #19)
> (In reply to comment #18)
>
> Sounds great!
>
> Will it be possible for you to make this stuff available soon? I don't
> mean to rush you, so if you're not ready that's fine. It's just that
> I'd like to get all these broken unit tests working again, and right
> now I don't have an example of a working setup.
>
> Thanks
> --
> Caspar

Hi Caspar,

I'll post my persistence class on this bug for you to work with. As a prototype, it has some config props hard coded, and a few places where possible failures are ignored, but it should be suitable for testing. If you need the DS component.xml, I can provide it, but it would probably be simpler to put the same code into a plugin/activator class.

Steve

Created attachment 198348 [details]
Prototype Offline Persistence Service source code
Attaching a working prototype of CDO offline persistence including automatic merge to the main branch when connectivity is reestablished.
For IP purposes, the code was created by me - based heavily on examples in the Eclipse wiki and CDO examples code. I've indicated in the main comment that it's under the EPL and will provide any information necessary to allow it to be used under the same licensing terms as the rest of CDO and EMF.
Hi Caspar,

FYI, further testing has shown that my example still has problems with some cases when merging an offline branch. I'm currently looking at the CDOWorkspaceImpl class, which contains comments relevant to the problem I'm having (Local Ids need to be mapped to proper temp ids), to determine how best to apply it to my case and will post an updated version when it is ready.

Steve

Merge problem identified. The fix for bug #341081 causes the merge from the offline branch to fail. I've added a comment to that bug with details.

(In reply to comment #16)

Hi Steve,

I guess you've seen that Caspar has done some significant improvements in Bug 350649. Together with some simple changes in Scenario.java it seems now that all offline tests are passing again. Is it possible for you to re-evaluate the changes you proposed here so that I get a clue what's left to fix?

(In reply to comment #24)
> (In reply to comment #16)
>
> Hi Steve,
>
> I guess you've seen that Caspar has done some significant improvements in Bug
> 350649. Together with some simple changes in Scenario.java it seems now that
> all offline tests are passing again. Is it possible for you to re-evaluate the
> changes you proposed here so that I get a clue what's left to fix?

Yes, I'll go back and take a look and post a followup.

Thank you Steve!

Moving all open bug reports to 4.1 because the release is very near and it's highly unlikely that there will be spare time to address 4.0 problems. Please make sure that your patches can be applied against the master branch and that your problem is not already fixed there!!!

Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.

I believe that most of the mentioned problems have been fixed recently through other bugs (namely the new offline example app). |