Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 403271 - JobManager.join() can hang when the JobManager is suspended
Summary: JobManager.join() can hang when the JobManager is suspended
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 4.2.1   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 4.3 M7   Edit
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-13 17:33 EDT by Mike Kucera CLA
Modified: 2013-04-29 14:16 EDT (History)
1 user (show)

See Also:


Attachments
Patch v.0.1 (4.20 KB, patch)
2013-03-19 07:59 EDT, Szymon Ptaszkiewicz CLA
no flags Details | Diff
Patch v.0.2 (10.19 KB, patch)
2013-03-20 13:03 EDT, Szymon Ptaszkiewicz CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kucera CLA 2013-03-13 17:33:48 EDT
I'm using the JobManager in my JUnits to work around a defect in CDT (http://bugs.eclipse.org/327126). The problem is that if you programatically delete the project while an indexer thread is running it can deadlock. So to work around it I'm using code that looks like this...

IJobManager jobManager = Job.getJobManager();
try {
    jobManager.suspend();
    jobManager.join(CCorePlugin.getIndexManager(), null);	                                 
    cdtProject.delete(true, true, null); 
}
finally {
    jobManager.resume();
}

The idea here is that the suspend call prevents any new indexer jobs from starting, then the join call waits until any existing indexer jobs are done. Therefore the call to delete should be in the clear. But... the call to join hangs forever.

I traced this in the debugger and it looks like a defect in the join method.

Here is how the hang occurs... First it creates a set that should only contain jobs from the job family that are in the running state.  Then it creates a listener object to remove or add jobs from the set. The problem is the scheduled method of the listener does not do the same filtering on job state that was done when the set was first created. This means that new jobs that are in the waiting state can get added to the set, but they shouldn't be. So in my case what's happening is that an indexer job in the waiting state is being added to the set at some point. But since the JobManager is suspended the indexer job will always stay in the waiting state and will never be removed from the set. This causes the while loop at the bottom of the method to spin forever.
Comment 1 John Arthorne CLA 2013-03-14 09:34:26 EDT
I think you are correct that this is a bug in JobManager#join. The API states that when the job manager is suspended it will only join running jobs, but this is not the case as you point out. Thanks for the detailed report and analysis.
Comment 2 John Arthorne CLA 2013-03-14 09:35:40 EDT
Szymon P if you'd like to contribute a fix here it would be welcome.
Comment 3 Szymon Ptaszkiewicz CLA 2013-03-14 21:04:52 EDT
(In reply to comment #2)
> Szymon P if you'd like to contribute a fix here it would be welcome.

I will take a look.
Comment 4 Szymon Ptaszkiewicz CLA 2013-03-19 07:59:17 EDT
Created attachment 228624 [details]
Patch v.0.1

The fix is simple but the test requires some explanation.

There are three jobs required to reproduce the problem:
a) main job that calls manager.join(..)
a) running job for which the main job is waiting for
c) waiting job scheduled when the job manager is suspended and the main job is already joining the running job

In order to make the test repeatable and run the scenario always when the exact schedule conditions of waiting job are met additional synchronization using barrier is used. As usual with barriers, instead of a dedalock we get timeout when the test fails.
Comment 5 Szymon Ptaszkiewicz CLA 2013-03-19 08:14:00 EDT
(In reply to comment #4)
> There are three jobs required to reproduce the problem:
> a) ...
> a) ...
> c) ...

Typo, should be a), b) and c).
Comment 6 Szymon Ptaszkiewicz CLA 2013-03-20 08:57:07 EDT
(In reply to comment #4)
> Created attachment 228624 [details]
> Patch v.0.1
> 
> The fix is simple but the test requires some explanation.

I was wrong. It is simple only when we know when the suspend method is called. It can be called by any thread at any time and even multiple times so Patch v.0.1 does not solve the problem completely, only the case described in comment 0.
Comment 7 Szymon Ptaszkiewicz CLA 2013-03-20 13:03:20 EDT
Created attachment 228730 [details]
Patch v.0.2

The patch fixes problem described in comment 0 and additional cases when job is scheduled when job manager is suspended.
Comment 8 Szymon Ptaszkiewicz CLA 2013-04-09 08:33:34 EDT
John, when you get a moment, can you please review? Thanks!
Comment 9 John Arthorne CLA 2013-04-29 14:16:21 EDT
Thanks Szymon. I released fix and test with one small change. If the job manager is suspended we don't need to do anything when a job is scheduled. We only need to wait on it if/when the job starts running.

http://git.eclipse.org/c/platform/eclipse.platform.runtime.git/commit/?id=c912fe926531d35e8d5f53b3b781bf85762262c2