Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 291462 - Second commit after first one times out, throws "Timer already cancelled"
Summary: Second commit after first one times out, throws "Timer already cancelled"
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core (show other bugs)
Version: 2.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Eike Stepper CLA
QA Contact: Eike Stepper CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-06 07:51 EDT by Caspar D. CLA
Modified: 2010-06-29 09:22 EDT (History)
3 users (show)

See Also:
stepper: galileo+
stepper: review+


Attachments
Stacktrace 1 (first commit) (1.73 KB, text/plain)
2009-10-06 07:54 EDT, Caspar D. CLA
no flags Details
Stacktrace 2 (2nd commit, this is the bug) (1.92 KB, text/plain)
2009-10-06 07:55 EDT, Caspar D. CLA
no flags Details
Patch for 2.0.0 (5.38 KB, patch)
2009-10-13 05:00 EDT, Caspar D. CLA
stepper: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Caspar D. CLA 2009-10-06 07:51:47 EDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1
Build Identifier: CDO 2.0.0.v200906160459

Set up a test environment that keeps a CDO server very busy. Open a CDO session with short commitTimeout, e.g. 3 seconds. Create resource, add an object, commit. Observe timeout (see stacktrace1.txt):

  org.eclipse.net4j.util.concurrent.TimeoutRuntimeException. 

This is as expected. Then, catch and retry committing same transaction. In our test environment this gives (see stacktrace2.txt): 

  java.lang.IllegalStateException: Timer already cancelled."

I must admit that I have been unable to reproduce this is a pure CDO testcase. This is puzzling. But our stacktraces are pretty convincing. The problem is real. But how does the Timer get canceled? I've been unable to find any code in CDO that might do this, but hopefully you guys can think of something.

Reproducible: Sometimes
Comment 1 Caspar D. CLA 2009-10-06 07:54:58 EDT
Created attachment 148866 [details]
Stacktrace 1 (first commit)
Comment 2 Caspar D. CLA 2009-10-06 07:55:44 EDT
Created attachment 148867 [details]
Stacktrace 2 (2nd commit, this is the bug)
Comment 3 Caspar D. CLA 2009-10-06 08:00:01 EDT
In the bug description,

  "I must admit that I have been unable to reproduce this is a pure CDO testcase."

should have read:

  "I must admit that I have been unable to reproduce this IN a pure CDO testcase."

It would be nice if bug reporters could edit the bug description of their own bugs...
Comment 4 Stefan Winkler CLA 2009-10-06 08:18:21 EDT
I can remember running into that one once or twice. I had a chat with Eike about it back then. 

As far as I can remember, our final theory about the timer being cancelled was that the garbage collector would collect it for some reason. But that was only a theory. 

Basically, I think there are two ways to solve this.

1. If it is really some GC-related problem, we could move the static TIMER to the OM-class, which should never be GC'd. 

2. If it is not, we could try to replace the 
  if(TIMER == null) 
condition with
  if(TIMER == null || TIMER.isCanceled())

Eike, can you remember our talk a few weeks ago (the guess about GC was an idea from Ed)? Do you have a suggestion?
Comment 5 Caspar D. CLA 2009-10-07 01:13:52 EDT
Stefan,

Thanks for your comments. I don't see how the GC explanation could hold, but perhaps Ed had a sound argument for it. And I wanted to try the "TIMER.isCanceled()" approach -- but java.util.Timer has no such method, nor any way (it seems) of checking whether an instance has been canceled or not.

With a bit of Googling I stumbled on a simpler explanation: a Timer may consider itself canceled if an unhandled exception occurs during execution of one of its tasks. A basic test confirms that indeed this is the case, and a glance at CDO's subclasses of TimerTask gives me the impression that they only have top-level catches for *checked* exceptions. Any RuntimeException or Error occuring in the task could therefore cancel the Timer.

Shall we patch all TimerTasks to catch Throwable instead of Exception?

/Caspar
Comment 6 Eike Stepper CLA 2009-10-07 03:14:06 EDT
I think the catch idea is worth trying.
Comment 7 Caspar D. CLA 2009-10-13 05:00:51 EDT
Created attachment 149419 [details]
Patch for 2.0.0

Uploading patch.

This makes all TimerTasks in CDO/Net4J catch Throwable in their run() method.
Comment 8 Eike Stepper CLA 2009-11-05 07:42:12 EST
Good catches!

Committed to R2_0_maintenance.
Comment 9 Eike Stepper CLA 2009-11-05 07:47:09 EST
Comment on attachment 149419 [details]
Patch for 2.0.0

Jasper, please confirm that:

1) The number of lines that you changed is smaller than 250.
2) You are the only author of these changed lines.
3) You apply the EPL to these changed lines.
Comment 10 Caspar D. CLA 2009-11-05 22:32:04 EST
(In reply to comment #9)
> (From update of attachment 149419 [details])
> Jasper, please confirm that:
> 
> 1) The number of lines that you changed is smaller than 250.
> 2) You are the only author of these changed lines.
> 3) You apply the EPL to these changed lines.

I confirm.