This Bugzilla instance is deprecated, and most Eclipse projects now use GitHub or Eclipse GitLab. Please see the deprecation plan for details.
Bug 293312 - Job.join() can not be interrupted even though the javadoc says it can
Summary: Job.join() can not be interrupted even though the javadoc says it can
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 3.5   Edit
Hardware: PC Windows XP
: P3 critical (vote)
Target Milestone: 4.4 M6   Edit
Assignee: Szymon Ptaszkiewicz CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-26 09:54 EDT by Min Idzelis CLA
Modified: 2014-03-04 10:39 EST (History)
6 users (show)

See Also:
john.arthorne: review+


Attachments
Patch v.0.1 (3.58 KB, patch)
2013-07-26 09:37 EDT, Szymon Ptaszkiewicz CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Min Idzelis CLA 2009-10-26 09:54:30 EDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Build Identifier: 

It is impossible to interrupt a Job.join(). This is because the job catches the interrupted exception and spins around again. The javadoc indicates that the Job may be interrupted. The method signature declares that it can throw an InterruptedException, but it never does. 

I can create a patch for this, but I'm waiting for bug 283449 to be approved first. 

Reproducible: Always
Comment 1 Rob Stryker CLA 2013-05-07 08:22:50 EDT
This is a pretty hideous bug and is a blocker for some adopters. It seems absolutely incorrect, and directly against javadoc, that this is the current implementation. 

Either change the javadoc, or fix the interruption. 

Current code is as follows, at the end of the JobManager.join(InternalJob) method:

		try {
			while (true) {
				//notify hook to service pending syncExecs before falling asleep
				lockManager.aboutToWait(job.getThread());
				try {
					if (barrier.acquire(Long.MAX_VALUE))
						break;
				} catch (InterruptedException e) {
					//loop and keep trying
				}
			}
		} finally {
			lockManager.aboutToRelease();
			job.removeJobChangeListener(listener);
		}

It is fully ignoring the interruption.
Comment 2 Rob Stryker CLA 2013-05-07 13:05:30 EDT
According to git, the offending code has been there since 2006, so it has "always been this way", but, I'd still like to know for what reason or under what circumstances, the join() call would be interrupted and NOT want to be propagated. 

The javadoc indicates interrupts should be propagated, but clearly they aren't and never were. There must be a reason.
Comment 3 John Arthorne CLA 2013-05-07 14:31:46 EDT
The current behaviour exists to handle a particular class of deadlock:

 - UI thread attempts to join a background job
 - The background job attempt to perform a syncExec

-> Deadlock. The UI thread is blocked so it cannot service the syncExec, which blocks the background job.

This deadlock is resolved as follows:

- While UI thread is waiting on a join, it processes syncExec requests each time it is interrupted.
- When a job posts a syncExec and the UI thread is blocked by a lock owned by that thread, the job thread interrupts the UI thread.

In this way the syncExec is processed and the system never deadlocks. Because of this, an interrupt signal in the join method has a particular meaning, so it is not treated as a complete interrupt. The above deadlock scenario is quite common so it would be a breaking change to start propagating exceptions here.

So it is not a very satisfying answer, but if your main concern is the disconnect between javadoc and current behaviour, I am inclined to fix the javadoc to match the long-standing behaviour. Unfortunately removing throws clause would be a breaking change so this would be a documentation only change.
Comment 4 Min Idzelis CLA 2013-05-07 14:48:17 EDT
I think the issue is the swallowing of the interrupt. That method doesn't actually  has to support interruption. Instead of just looping, why not set a boolean that an interrupt occurred? Then, when leaving the method, check the boolean and if set then throw a new interrupted exception.

Alternatively, you could call Thread.currentThread().interrupt(). But since this method is allowed to actually throw the exception, it could throw it. I don't think it really matters either way, as long as its restored upon exit.
Comment 5 John Arthorne CLA 2013-05-07 15:04:54 EDT
(In reply to comment #4)
> I think the issue is the swallowing of the interrupt. That method doesn't
> actually  has to support interruption. Instead of just looping, why not set
> a boolean that an interrupt occurred? Then, when leaving the method, check
> the boolean and if set then throw a new interrupted exception.

Because we are sending the interrupt signal internally within the job framework to resolve the deadlock described above. Once we have caught and handled our own interrupt, and the job we are joining has completed, propagating the interrupt back to the client is misleading. It would imply the join failed and the job may still be running. Of course the problem is that the client might have been the one sending the interrupt and we have prevented them from aborting the join.
Comment 6 Min Idzelis CLA 2013-05-07 15:12:41 EDT
In the case where the origin of the interrupt is the Job manager/Lock manager system itself, it does seem odd that the interrupt would be sent to the client. 

However, int the 2nd point you brought up, it is very much expected. i.e. perhaps you have a long running task that depends on an interrupt signal. Say the client has interrupted this task. If that task had been running the Job.join() call, the task would miss the interrupt signal, and fail to be interrupted. 

i.e. just because the job.join() isn't interruptible, doesn't mean that other parts of the callers tasks are not. If the interrupt isn't restored, then the callers job continues to run even though it shouldn't have. 

It might be complicated, but it should be possible to determine if the interrupt was caused by the internal framework vs a foreign thread. That is, don't really on just the interrupt signal - also check a variable that says that this is an internal interrupt. The lock manager (or whatever is doing the interrupt) should set a flag + do the interrupt.
Comment 7 Rob Stryker CLA 2013-05-07 16:29:09 EDT
> but if your main concern is the disconnect between javadoc and current behaviour, 

This is not my main concern. My main concern is that there's absolutely no way for me to interrupt a thread which has joined a Job thread I don't have a reference to if that Job thread itself is frozen. 

For example, take this method body:

Thread t1 = new Thread() {
  public void run() {
   // Launch a job which will freeze
   Job j = new Job () { 
     public void run(IProgressMonitor mon) {
              Do Something
              someLock.wait(); 
              Do something Else
              return Status.OK;
          }

     // Job can be canceled, but nobody has reference to do it
     protected void canceling() {
        synchronized(someLock) {
          // I'm being canceled. Interrupt the lock and let the job complete
          someLock.interrupt();
        }
    }
  }.schedule();

   // Now join
   try {
      j.join();
      // Frozen
   } catch(InterruptedException ie) {
     // never reachable. Goal is if t1 is interrupted, cancel the frozen job
     j.cancel();
   }
  }
}

t1.start();

// wait 5 seconds
t1.interrupt(); // Does nothing

// don't have access to job, so can't cancel it
Imagine the Job freezes. I cannot interrupt it or cancel it because I have no reference to it. I do however have a reference to t1, which is joined to the job. After interrupting my t1 thread which is stopped on join(), I could then cancel the frozen job.  But, an interrupt on t1 does nothing at all. Therefore, there's nothing I can do at all. I'm stuck with a frozen thread and a frozen job and both will remain that way forever no matter how many things I try to interrupt. 

It's obvious that the Job.join method was designed to be interruptable and the code that was added re: not throwing interrupts was done to fix the UI deadlock you mention. This doesn't absolve you from following javadoc, rather it compels you to find a workable solution that satisifies both. 

IF that involves using a package-protected or internal method to set a flag before doing your UI work, then that definitely makes sense. 

I'm sure a little bit of imagination can come up with a solution that solves both.
Comment 8 Szymon Ptaszkiewicz CLA 2013-07-25 12:38:20 EDT
I have been thinking about this for some time and it seems there could be a way to reconcile both approaches.

Comment 3 says that "an interrupt signal in the join method has a particular meaning" because of the deadlock scenario described there. But this deadlock scenario can happen only in case the join method is called from the UI thread, which means an interrupt signal in the join method has a particular meaning only for the UI thread. Unless there is another deadlock scenario not mentioned here, non-UI threads calling join are not deadlock-prone with syncExec's, thus we could propagate the exception.

The proposal would be:
 - if in UI thread, then loop and keep trying
 - if in non-UI thread, propagate the exception

There is already a method that tells whether this is the UI thread or not, so the fix could look like:

	} catch (InterruptedException e) {
		// if non-UI thread, propagate the exception
		if (lockManager.canBlock())
			throw e;
		// if UI thread, loop and keep trying
	}

This approach would solve the case from comment 7 under condition that the t1 thread is not the UI thread which would be better than not allowing to interrupt threads at all, plus we would still be safe from the deadlock scenario in comment 3.
Comment 9 Szymon Ptaszkiewicz CLA 2013-07-26 09:37:48 EDT
Created attachment 233828 [details]
Patch v.0.1

Patch implementing solution proposed in comment 8. This contribution complies with http://www.eclipse.org/legal/CoO.php. I will convert it to Gerrit if it looks reasonable.
Comment 10 Szymon Ptaszkiewicz CLA 2014-01-14 11:34:16 EST
John, can you please have a look?
Comment 11 John Arthorne CLA 2014-01-14 14:15:44 EST
It looks good.
Comment 12 Rob Stryker CLA 2014-01-15 00:38:02 EST
Thanks so much for looking into this guys.
Comment 13 Szymon Ptaszkiewicz CLA 2014-01-15 05:56:48 EST
Thanks John! Gerrit change is here: https://git.eclipse.org/r/#/c/20645/
Comment 14 Szymon Ptaszkiewicz CLA 2014-02-14 07:48:07 EST
(In reply to John Arthorne from comment #11)
> It looks good.

Shall we push this for M6?
Comment 15 John Arthorne CLA 2014-02-16 11:43:29 EST
I pushed your change.
Comment 17 Rob Stryker CLA 2014-02-24 02:27:28 EST
Thanks for all the help  Szymon!
Comment 18 Szymon Ptaszkiewicz CLA 2014-03-04 10:39:25 EST
Verified in I20140303-2000.