This Bugzilla instance is deprecated, and most Eclipse projects now use GitHub or Eclipse GitLab. Please see the deprecation plan for details.
Bug 412138 - deadlock in JobManager
Summary: deadlock in JobManager
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 4.3   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 4.4 M1   Edit
Assignee: Szymon Ptaszkiewicz CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-02 15:00 EDT by Daniel Friederich CLA
Modified: 2013-07-17 10:28 EDT (History)
1 user (show)

See Also:


Attachments
stack traces of deadlock as reported by jstack (win32) (20.01 KB, text/plain)
2013-07-02 15:01 EDT, Daniel Friederich CLA
no flags Details
Patch (8.52 KB, patch)
2013-07-16 11:20 EDT, Szymon Ptaszkiewicz CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Friederich CLA 2013-07-02 15:00:19 EDT
I just tried using Kepler for the first time and immediately run into a deadlock :-(.
The jstack utility shows a deadlock with 3 threads all threads blocked in the JobManager class.

This is with build R-4.3-201306052000
Comment 1 Daniel Friederich CLA 2013-07-02 15:01:33 EDT
Created attachment 233001 [details]
stack traces of deadlock as reported by jstack (win32)

Search for "Found one Java-level deadlock" at the end
Comment 2 Szymon Ptaszkiewicz CLA 2013-07-03 06:47:11 EDT
The problem seems to be in Worker-3. There is a call to com.accurev.ide.eclipse.ResourcesManager$3.runInWorkspace(ResourcesManager.java:416) which tries to join a job. By comparing hash codes you can see that 0x1314f5b8 belongs to AutoBuildJob (see Worker-4). AutoBuildJob is an internal job and it is not exposed anywhere, so the question is: how is it possible that com.accurev.ide.eclipse.ResourcesManager$3.runInWorkspace has a reference to AutoBuildJob to be able to call AutoBuildJob.join()?

The only technical way to do that is that com.accurev.ide.eclipse.ResourcesManager$3.runInWorkspace could be using Eclipse internal classes to get access to AutoBuildJob and that is not supported. Please file a bug against vendor of com.accurev.ide.eclipse.* to investigate this issue.
Comment 3 Paul Webster CLA 2013-07-04 14:21:07 EDT
(In reply to comment #2)
> The problem seems to be in Worker-3. There is a call to
> com.accurev.ide.eclipse.ResourcesManager$3.runInWorkspace(ResourcesManager.
> java:416) which tries to join a job. By comparing hash codes you can see
> that 0x1314f5b8 belongs to AutoBuildJob (see Worker-4). AutoBuildJob is an
> internal job and it is not exposed anywhere, 

Can't they get it from the IJobManager (depending on when they call the methods, it could be returned from there or passed in an IJobChangeListener)

PW
Comment 4 Szymon Ptaszkiewicz CLA 2013-07-05 05:51:20 EDT
(In reply to comment #3)
> Can't they get it from the IJobManager (depending on when they call the
> methods, it could be returned from there or passed in an IJobChangeListener)
> 
> PW

You're right. I will prepare a test case for that.
Comment 5 Szymon Ptaszkiewicz CLA 2013-07-16 11:20:18 EDT
Created attachment 233526 [details]
Patch

Job is an API class so it can be used as a synchronization object by clients. We shouldn't perform any locking directly on a job since we can get deadlocks like the one in comment 1. The patch solves the deadlock by adding a helper object for synchronization purposes. Test for this bug is also added.

There is also another place where we synchronize on a job internally: ThreadJob.run(IProgressMonitor) and ThreadJob.isRunning(). This is an internal job implementation not exposed to clients so I haven't changed that to use the new object, but it may be worth to consider changing it as well.
Comment 6 John Arthorne CLA 2013-07-16 11:50:46 EDT
Thanks for the patch Szymon. For future patches, we need you to "sign off" on the commit to indicate accepting the Eclipse CoO [1]. This can be done either of the following ways:

- add your sign-off in a commit, and push the commit to gerrit for review
- Attach patch in the git format-patch format with the required header. For example if you create the change in a new commit in master you can generate a patch with required headers as follows:

git format-patch HEAD~1 --stdout -s
Comment 7 Szymon Ptaszkiewicz CLA 2013-07-16 12:03:52 EDT
(In reply to comment #6)
> Thanks for the patch Szymon. For future patches, we need you to "sign off"
> on the commit to indicate accepting the Eclipse CoO [1]. This can be done
> either of the following ways:
> 
> - add your sign-off in a commit, and push the commit to gerrit for review
> - Attach patch in the git format-patch format with the required header. For
> example if you create the change in a new commit in master you can generate
> a patch with required headers as follows:
> 
> git format-patch HEAD~1 --stdout -s

Ok, will do.
Comment 8 John Arthorne CLA 2013-07-17 10:26:52 EDT
I took a slightly different approach for a fix. The only reason for the synchronization there is to lazily initialize the listener list object. The reason for that is to avoid the extra overhead of an object per job if it is not needed. Adding a lock object defeats the purpose of that. So I have removed the synchronization, and eagerly initialized the listener list instead:

http://git.eclipse.org/c/platform/eclipse.platform.runtime.git/commit/?id=3a89bc4229d1c798e8e72d58624be452dc177f28
Comment 9 John Arthorne CLA 2013-07-17 10:28:50 EDT
Changing assignee to Szymon because he did most of the work here.