User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1
Build Identifier: CDO 2.0.0.v200906160459

Set up a test environment that keeps a CDO server very busy. Open a CDO session with a short commitTimeout, e.g. 3 seconds. Create a resource, add an object, and commit. Observe the timeout (see stacktrace1.txt): org.eclipse.net4j.util.concurrent.TimeoutRuntimeException. This is as expected.

Then catch the exception and retry committing the same transaction. In our test environment this gives (see stacktrace2.txt): java.lang.IllegalStateException: Timer already cancelled.

I must admit that I have been unable to reproduce this is a pure CDO testcase. This is puzzling, but our stacktraces are pretty convincing: the problem is real. So how does the Timer get canceled? I've been unable to find any code in CDO that might do this, but hopefully you guys can think of something.

Reproducible: Sometimes
Created attachment 148866: Stacktrace 1 (first commit)
Created attachment 148867: Stacktrace 2 (2nd commit, this is the bug)
In the bug description, "I must admit that I have been unable to reproduce this is a pure CDO testcase." should have read: "I must admit that I have been unable to reproduce this IN a pure CDO testcase." It would be nice if bug reporters could edit the bug description of their own bugs...
I remember running into that one once or twice, and I discussed it with Eike back then. As far as I can remember, our final theory was that the Timer got cancelled because the garbage collector collected it for some reason. But that was only a theory.

Basically, I think there are two ways to solve this:
1. If it really is a GC-related problem, we could move the static TIMER to the OM class, which should never be GC'd.
2. If it is not, we could try replacing the if(TIMER == null) condition with if(TIMER == null || TIMER.isCanceled()).

Eike, do you remember our talk a few weeks ago (the GC guess was Ed's idea)? Do you have a suggestion?
Stefan,

Thanks for your comments. I don't see how the GC explanation could hold, but perhaps Ed had a sound argument for it. I also wanted to try the TIMER.isCanceled() approach, but java.util.Timer has no such method, nor (it seems) any other way of checking whether an instance has been canceled.

With a bit of Googling I stumbled on a simpler explanation: a Timer behaves as if canceled once an unhandled exception escapes one of its tasks. A basic test confirms that this is indeed the case, and a glance at CDO's subclasses of TimerTask gives me the impression that their top-level catches only handle *checked* exceptions. Any RuntimeException or Error occurring in a task could therefore cancel the Timer.

Shall we patch all TimerTasks to catch Throwable instead of Exception?

/Caspar
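The "basic test" mentioned above is not included in the thread. The following is a minimal sketch of such a test using only plain java.util.Timer (no CDO or Net4j code); the class TimerCancelDemo and its methods are hypothetical names chosen for illustration. One task throws a RuntimeException; once that kills the timer's background thread, further schedule() calls are rejected with IllegalStateException, the same exception type reported in stacktrace2.txt.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical demo class; uses only java.util.Timer, no CDO/Net4j code.
class TimerCancelDemo {

    /** A task that does nothing; used only to probe whether the timer still accepts work. */
    private static TimerTask noOp() {
        return new TimerTask() {
            @Override
            public void run() {
            }
        };
    }

    /**
     * Schedules a task that throws an unchecked exception, then probes the timer
     * until schedule() is rejected. Returns true if the timer ends up unusable.
     */
    static boolean timerIsDeadAfterUncheckedException() {
        Timer timer = new Timer("demo-timer", true); // daemon thread: never blocks JVM exit

        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Nothing catches this, so it terminates the timer's background thread.
                throw new RuntimeException("simulated failure inside a TimerTask");
            }
        }, 0);

        long deadline = System.currentTimeMillis() + 5_000;
        while (System.currentTimeMillis() < deadline) {
            try {
                // Far-future delay so the probe task itself never runs.
                timer.schedule(noOp(), 60_000);
            } catch (IllegalStateException expected) {
                // Thrown once the timer thread has died.
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Timer unusable after unchecked exception: "
                + timerIsDeadAfterUncheckedException());
    }
}
```

Running main should report true, and the JVM's default uncaught-exception handler prints the RuntimeException for the demo-timer thread to stderr. The probe loop polls instead of sleeping once because the timer thread dies asynchronously.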
I think the catch idea is worth trying.
Created attachment 149419: Patch for 2.0.0

Uploading patch. This makes all TimerTasks in CDO/Net4j catch Throwable in their run() method.
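The patch itself is not reproduced here. As a hedged illustration of the idea (catch Throwable at the top of run()), here is a sketch of a reusable base class. SafeTimerTask, safeRun() and handle() are hypothetical names, not CDO or Net4j API, and printing to System.err stands in for whatever logging the real code uses.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical base class: subclasses put their work in safeRun(), and run()
 * guarantees that nothing thrown there escapes into the Timer's thread.
 */
abstract class SafeTimerTask extends TimerTask {

    @Override
    public final void run() {
        try {
            safeRun();
        } catch (Throwable t) {
            // Catch Throwable, not just Exception: RuntimeExceptions and Errors
            // escaping run() would otherwise kill the timer thread.
            handle(t);
        }
    }

    /** The task's actual work. */
    protected abstract void safeRun() throws Exception;

    /** Placeholder handling; real code would use its own logging facility. */
    protected void handle(Throwable t) {
        System.err.println("TimerTask failed: " + t);
    }
}

class SafeTimerTaskDemo {

    /** Returns true if a task scheduled after a failing one still runs. */
    static boolean timerSurvivesFailingTask() {
        Timer timer = new Timer("safe-demo-timer", true);
        CountDownLatch secondTaskRan = new CountDownLatch(1);

        timer.schedule(new SafeTimerTask() {
            @Override
            protected void safeRun() {
                throw new IllegalStateException("simulated failure");
            }
        }, 0);

        // Without the catch in SafeTimerTask, this second task could be rejected by
        // schedule() or silently dropped, depending on when the timer thread dies.
        timer.schedule(new SafeTimerTask() {
            @Override
            protected void safeRun() {
                secondTaskRan.countDown();
            }
        }, 100);

        try {
            return secondTaskRan.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel(); // stop the demo timer explicitly
        }
    }

    public static void main(String[] args) {
        System.out.println("Second task ran: " + timerSurvivesFailingTask());
    }
}
```

A base class is only one way to apply the fix; adding the same try/catch directly to each existing run() method has the same effect. Catching Throwable also swallows Errors such as OutOfMemoryError, which is the trade-off accepted to keep a shared Timer alive.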
Good catches! Committed to R2_0_maintenance.
Comment on attachment 149419: Patch for 2.0.0

Jasper, please confirm that:
1) The number of lines that you changed is smaller than 250.
2) You are the only author of these changed lines.
3) You apply the EPL to these changed lines.
(In reply to comment #9)
> (From update of attachment 149419)
> Jasper, please confirm that:
>
> 1) The number of lines that you changed is smaller than 250.
> 2) You are the only author of these changed lines.
> 3) You apply the EPL to these changed lines.

I confirm.
Available in 2.0.2: https://build.eclipse.org/hudson/job/emf-cdo-maintenance/44/artifact/result/site.p2/