Community
Participate
Working Groups
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 Build Identifier: It is impossible to interrupt a Job.join(). This is because the job catches the interrupted exception and spins around again. The javadoc indicates that the Job may be interrupted. The method signature declares that it can throw an InterruptedException, but it never does. I can create a patch for this, but I'm waiting for bug 283449 to be approved first. Reproducible: Always
This is a pretty hideous bug and is a blocker for some adopters. It seems absolutely incorrect, and directly against javadoc, that this is the current implementation. Either change the javadoc, or fix the interruption. Current code is as follows, at the end of the JobManager.join(InternalJob) method: try { while (true) { //notify hook to service pending syncExecs before falling asleep lockManager.aboutToWait(job.getThread()); try { if (barrier.acquire(Long.MAX_VALUE)) break; } catch (InterruptedException e) { //loop and keep trying } } } finally { lockManager.aboutToRelease(); job.removeJobChangeListener(listener); } It is fully ignoring the interruption.
According to git, the offending code has been there since 2006, so it has "always been this way", but, I'd still like to know for what reason or under what circumstances, the join() call would be interrupted and NOT want to be propagated. The javadoc indicates interrupts should be propagated, but clearly they aren't and never were. There must be a reason.
The current behaviour exists to handle a particular class of deadlock: - UI thread attempts to join a background job - The background job attempt to perform a syncExec -> Deadlock. The UI thread is blocked so it cannot service the syncExec, which blocks the background job. This deadlock is resolved as follows: - While UI thread is waiting on a join, it processes syncExec requests each time it is interrupted. - When a job posts a syncExec and the UI thread is blocked by a lock owned by that thread, the job thread interrupts the UI thread. In this way the syncExec is processed and the system never deadlocks. Because of this, an interrupt signal in the join method has a particular meaning, so it is not treated as a complete interrupt. The above deadlock scenario is quite common so it would be a breaking change to start propagating exceptions here. So it is not a very satisfying answer, but if your main concern is the disconnect between javadoc and current behaviour, I am inclined to fix the javadoc to match the long-standing behaviour. Unfortunately removing throws clause would be a breaking change so this would be a documentation only change.
I think the issue is the swallowing of the interrupt. That method doesn't actually has to support interruption. Instead of just looping, why not set a boolean that an interrupt occurred? Then, when leaving the method, check the boolean and if set then throw a new interrupted exception. Alternatively, you could call Thread.currentThread().interrupt(). But since this method is allowed to actually throw the exception, it could throw it. I don't think it really matters either way, as long as its restored upon exit.
(In reply to comment #4) > I think the issue is the swallowing of the interrupt. That method doesn't > actually has to support interruption. Instead of just looping, why not set > a boolean that an interrupt occurred? Then, when leaving the method, check > the boolean and if set then throw a new interrupted exception. Because we are sending the interrupt signal internally within the job framework to resolve the deadlock described above. Once we have caught and handled our own interrupt, and the job we are joining has completed, propagating the interrupt back to the client is misleading. It would imply the join failed and the job may still be running. Of course the problem is that the client might have been the one sending the interrupt and we have prevented them from aborting the join.
In the case where the origin of the interrupt is the Job manager/Lock manager system itself, it does seem odd that the interrupt would be sent to the client. However, int the 2nd point you brought up, it is very much expected. i.e. perhaps you have a long running task that depends on an interrupt signal. Say the client has interrupted this task. If that task had been running the Job.join() call, the task would miss the interrupt signal, and fail to be interrupted. i.e. just because the job.join() isn't interruptible, doesn't mean that other parts of the callers tasks are not. If the interrupt isn't restored, then the callers job continues to run even though it shouldn't have. It might be complicated, but it should be possible to determine if the interrupt was caused by the internal framework vs a foreign thread. That is, don't really on just the interrupt signal - also check a variable that says that this is an internal interrupt. The lock manager (or whatever is doing the interrupt) should set a flag + do the interrupt.
> but if your main concern is the disconnect between javadoc and current behaviour, This is not my main concern. My main concern is that there's absolutely no way for me to interrupt a thread which has joined a Job thread I don't have a reference to if that Job thread itself is frozen. For example, take this method body: Thread t1 = new Thread() { public void run() { // Launch a job which will freeze Job j = new Job () { public void run(IProgressMonitor mon) { Do Something someLock.wait(); Do something Else return Status.OK; } // Job can be canceled, but nobody has reference to do it protected void canceling() { synchronized(someLock) { // I'm being canceled. Interrupt the lock and let the job complete someLock.interrupt(); } } }.schedule(); // Now join try { j.join(); // Frozen } catch(InterruptedException ie) { // never reachable. Goal is if t1 is interrupted, cancel the frozen job j.cancel(); } } } t1.start(); // wait 5 seconds t1.interrupt(); // Does nothing // don't have access to job, so can't cancel it Imagine the Job freezes. I cannot interrupt it or cancel it because I have no reference to it. I do however have a reference to t1, which is joined to the job. After interrupting my t1 thread which is stopped on join(), I could then cancel the frozen job. But, an interrupt on t1 does nothing at all. Therefore, there's nothing I can do at all. I'm stuck with a frozen thread and a frozen job and both will remain that way forever no matter how many things I try to interrupt. It's obvious that the Job.join method was designed to be interruptable and the code that was added re: not throwing interrupts was done to fix the UI deadlock you mention. This doesn't absolve you from following javadoc, rather it compels you to find a workable solution that satisifies both. IF that involves using a package-protected or internal method to set a flag before doing your UI work, then that definitely makes sense. I'm sure a little bit of imagination can come up with a solution that solves both.
I have been thinking about this for some time and it seems there could be a way to reconcile both approaches. Comment 3 says that "an interrupt signal in the join method has a particular meaning" because of the deadlock scenario described there. But this deadlock scenario can happen only in case the join method is called from the UI thread, which means an interrupt signal in the join method has a particular meaning only for the UI thread. Unless there is another deadlock scenario not mentioned here, non-UI threads calling join are not deadlock-prone with syncExec's, thus we could propagate the exception. The proposal would be: - if in UI thread, then loop and keep trying - if in non-UI thread, propagate the exception There is already a method that tells whether this is the UI thread or not, so the fix could look like: } catch (InterruptedException e) { // if non-UI thread, propagate the exception if (lockManager.canBlock()) throw e; // if UI thread, loop and keep trying } This approach would solve the case from comment 7 under condition that the t1 thread is not the UI thread which would be better than not allowing to interrupt threads at all, plus we would still be safe from the deadlock scenario in comment 3.
Created attachment 233828 [details] Patch v.0.1 Patch implementing solution proposed in comment 8. This contribution complies with http://www.eclipse.org/legal/CoO.php. I will convert it to Gerrit if it looks reasonable.
John, can you please have a look?
It looks good.
Thanks so much for looking into this guys.
Thanks John! Gerrit change is here: https://git.eclipse.org/r/#/c/20645/
(In reply to John Arthorne from comment #11) > It looks good. Shall we push this for M6?
I pushed your change.
Here is the commit in master: http://git.eclipse.org/c/platform/eclipse.platform.runtime.git/commit/?id=de700a09e57a02876a9c5a888b76f604df300555
Thanks for all the help Szymon!
Verified in I20140303-2000.