Community
Participate
Working Groups
I'm using the JobManager in my JUnits to work around a defect in CDT (http://bugs.eclipse.org/327126). The problem is that if you programatically delete the project while an indexer thread is running it can deadlock. So to work around it I'm using code that looks like this... IJobManager jobManager = Job.getJobManager(); try { jobManager.suspend(); jobManager.join(CCorePlugin.getIndexManager(), null); cdtProject.delete(true, true, null); } finally { jobManager.resume(); } The idea here is that the suspend call prevents any new indexer jobs from starting, then the join call waits until any existing indexer jobs are done. Therefore the call to delete should be in the clear. But... the call to join hangs forever. I traced this in the debugger and it looks like a defect in the join method. Here is how the hang occurs... First it creates a set that should only contain jobs from the job family that are in the running state. Then it creates a listener object to remove or add jobs from the set. The problem is the scheduled method of the listener does not do the same filtering on job state that was done when the set was first created. This means that new jobs that are in the waiting state can get added to the set, but they shouldn't be. So in my case what's happening is that an indexer job in the waiting state is being added to the set at some point. But since the JobManager is suspended the indexer job will always stay in the waiting state and will never be removed from the set. This causes the while loop at the bottom of the method to spin forever.
I think you are correct that this is a bug in JobManager#join. The API states that when the job manager is suspended it will only join running jobs, but this is not the case as you point out. Thanks for the detailed report and analysis.
Szymon P if you'd like to contribute a fix here it would be welcome.
(In reply to comment #2) > Szymon P if you'd like to contribute a fix here it would be welcome. I will take a look.
Created attachment 228624 [details] Patch v.0.1 The fix is simple but the test requires some explanation. There are three jobs required to reproduce the problem: a) main job that calls manager.join(..) a) running job for which the main job is waiting for c) waiting job scheduled when the job manager is suspended and the main job is already joining the running job In order to make the test repeatable and run the scenario always when the exact schedule conditions of waiting job are met additional synchronization using barrier is used. As usual with barriers, instead of a dedalock we get timeout when the test fails.
(In reply to comment #4) > There are three jobs required to reproduce the problem: > a) ... > a) ... > c) ... Typo, should be a), b) and c).
(In reply to comment #4) > Created attachment 228624 [details] > Patch v.0.1 > > The fix is simple but the test requires some explanation. I was wrong. It is simple only when we know when the suspend method is called. It can be called by any thread at any time and even multiple times so Patch v.0.1 does not solve the problem completely, only the case described in comment 0.
Created attachment 228730 [details] Patch v.0.2 The patch fixes problem described in comment 0 and additional cases when job is scheduled when job manager is suspended.
John, when you get a moment, can you please review? Thanks!
Thanks Szymon. I released fix and test with one small change. If the job manager is suspended we don't need to do anything when a job is scheduled. We only need to wait on it if/when the job starts running. http://git.eclipse.org/c/platform/eclipse.platform.runtime.git/commit/?id=c912fe926531d35e8d5f53b3b781bf85762262c2