Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 331719 - Deadlock in DependencyGraphImpl call to yieldRule from the AutoBuildJob
Summary: Deadlock in DependencyGraphImpl call to yieldRule from the AutoBuildJob
Status: RESOLVED FIXED
Alias: None
Product: WTP Common Tools
Classification: WebTools
Component: wst.common (show other bugs)
Version: 3.2.2   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.2.3   Edit
Assignee: Roberto Sanchez Herrera CLA
QA Contact: Carl Anderson CLA
URL:
Whiteboard: PMC_approved
Keywords:
: 324502 334050 335423 (view as bug list)
Depends on:
Blocks: 335645
  Show dependency tree
 
Reported: 2010-12-02 16:10 EST by Carlin Rogers CLA
Modified: 2011-09-13 09:43 EDT (History)
5 users (show)

See Also:
shr31223: pmc_approved? (david_williams)
carlin.rogers: pmc_approved?
shr31223: pmc_approved? (naci.dai)
shr31223: pmc_approved? (deboer)
shr31223: pmc_approved? (neil.hauge)
shr31223: pmc_approved? (kaloyan)
raghunathan.srinivasan: pmc_approved+
ccc: review+
cbridgha: review+


Attachments
deadlock thread dump (16.43 KB, text/plain)
2010-12-02 16:11 EST, Carlin Rogers CLA
no flags Details
Patch 1 (8.68 KB, patch)
2011-01-24 17:49 EST, Roberto Sanchez Herrera CLA
no flags Details | Diff
Updated patch (12.34 KB, patch)
2011-01-25 17:04 EST, Roberto Sanchez Herrera CLA
no flags Details | Diff
thread dumps of deadlock from running tests with patch (6.35 KB, application/zip)
2011-01-26 01:42 EST, Carlin Rogers CLA
no flags Details
Patch version 3 (16.60 KB, patch)
2011-01-26 11:55 EST, Roberto Sanchez Herrera CLA
no flags Details | Diff
Patch version 4 (17.17 KB, patch)
2011-01-26 15:46 EST, Roberto Sanchez Herrera CLA
ccc: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlin Rogers CLA 2010-12-02 16:10:01 EST
This was found in an Mbuild of WTP 3.2.3,  wtp-sdk-M-3.2.3-20101130001821.zip

We've been seeing deadlocks occur in runs of the JSF Tools JUnit tests. It does not reproduce easily in our development environments. I was able to hit this while running in the debugger and could then check the JobManager's DeadlockDetector lock graph to help verify the issue.

The deadlock is between the main test thread which has a MultiRule as a scheduling rule and is waiting to acquire the WorkManager.lock and the AutoBuildJob which has yielded the WorkspaceRule but still has the WorkManager.lock, and is waiting to get the WorkspaceRule. I'm not sure why the WorkManager.lock OrderedLock was not release with the WorkspaceRule during the yield rule.

Here's the thread stack from the debugger. I will also attach the full thread dump.

org.eclipse.equinox.launcher.Main at localhost:1876  
  Daemon System Thread [Attach Listener] (Running)  
  Daemon System Thread [Signal Dispatcher] (Running)  
  Daemon System Thread [Finalizer] (Running)  
  Daemon System Thread [Reference Handler] (Running)  
  Thread [main] (Suspended)  
    owns: RunnableLock  (id=93)  
    waiting for: Semaphore  (id=92)  
    Object.wait(long) line: not available [native method]  
    Semaphore.acquire(long) line: 39  
    OrderedLock.doAcquire(Semaphore, long) line: 176  
    OrderedLock.acquire(long) line: 110  
    OrderedLock.acquire() line: 84  
    WorkManager.checkIn(ISchedulingRule, IProgressMonitor) line: 118  
    Workspace.prepareOperation(ISchedulingRule, IProgressMonitor) line: 1914  
    Workspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor) line: 1970  
    SetClasspathOperation(JavaModelOperation).runOperation(IProgressMonitor) line: 793  
    JavaProject.setRawClasspath(IClasspathEntry[], IClasspathEntry[], IPath, boolean, IProgressMonitor) line: 3026  
    JavaProject.setRawClasspath(IClasspathEntry[], IPath, boolean, IProgressMonitor) line: 2988  
    JavaProject.setRawClasspath(IClasspathEntry[], IProgressMonitor) line: 3041  
    JDTTestEnvironment.<init>(ProjectTestEnvironment) line: 82  
    TestJDTBeanPropertyWorkingCopy.setUp() line: 58  
    TestJDTBeanPropertyWorkingCopy(TestCase).runBare() line: 132  
    TestResult$1.protect() line: 110  
    TestResult.runProtected(Test, Protectable) line: 128  
    TestResult.run(TestCase) line: 113  
    TestJDTBeanPropertyWorkingCopy(TestCase).run(TestResult) line: 124  
    TestSuite.runTest(Test, TestResult) line: 232  
    TestSuite.run(TestResult) line: 227  
    TestSuite.runTest(Test, TestResult) line: 232  
    TestSuite.run(TestResult) line: 227  
    SuiteMethod(JUnit38ClassRunner).run(RunNotifier) line: 83  
    JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 49  
    TestExecution.run(ITestReference[]) line: 38  
    RemotePluginTestRunner(RemoteTestRunner).runTests(String[], String, TestExecution) line: 467  
    RemotePluginTestRunner(RemoteTestRunner).runTests(TestExecution) line: 683  
    RemotePluginTestRunner(RemoteTestRunner).run() line: 390  
    RemotePluginTestRunner.main(String[]) line: 62  
    UITestApplication$1.run() line: 116  
    RunnableLock.run() line: 35  
    UISynchronizer(Synchronizer).runAsyncMessages(boolean) line: 134  
    Display.runAsyncMessages(boolean) line: 4041  
    Display.readAndDispatch() line: 3660  
    Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 2640  
    Workbench.runUI() line: 2604  
    Workbench.access$4(Workbench) line: 2438  
    Workbench$7.run() line: 671  
    Realm.runWithDefault(Realm, Runnable) line: 332  
    Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 664  
    PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149  
    IDEApplication.start(IApplicationContext) line: 115  
    UITestApplication.start(IApplicationContext) line: 47  
    EclipseAppHandle.run(Object) line: 196  
    EclipseAppLauncher.runApplication(Object) line: 110  
    EclipseAppLauncher.start(Object) line: 79  
    EclipseStarter.run(Object) line: 369  
    EclipseStarter.run(String[], Runnable) line: 179  
    NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]  
    NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39  
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25  
    Method.invoke(Object, Object...) line: 597  
    Main.invokeFramework(String[], URL[]) line: 619  
    Main.basicRun(String[]) line: 574  
    Main.run(String[]) line: 1407  
    Main.main(String[]) line: 1383  
  Daemon Thread [Framework Event Dispatcher] (Running)  
  Daemon Thread [Start Level Event Dispatcher] (Running)  
  Daemon Thread [Bundle File Closer] (Running)  
  Daemon Thread [[Timer] - Main Queue Handler] (Running)  
  Thread [Worker-JM] (Running)  
  Thread [Worker-0] (Running)  
  Daemon Thread [Java indexing] (Running)  
  Thread [Worker-2] (Running)  
  Thread [WorkbenchTestable] (Running)  
  Thread [Worker-3] (Running)  
  Thread [Worker-4] (Suspended)  
    waiting for: Object  (id=94)  
    Object.wait(long) line: not available [native method]  
    ThreadJob.waitForRun(ThreadJob, IProgressMonitor, InternalJob, Thread) line: 269  
    ThreadJob.joinRun(ThreadJob, IProgressMonitor) line: 199  
    JobManager.yieldRule(InternalJob, IProgressMonitor) line: 1398  
    AutoBuildJob(InternalJob).yieldRule(IProgressMonitor) line: 600  
    AutoBuildJob(Job).yieldRule(IProgressMonitor) line: 709  
    DependencyGraphImpl.waitForAllUpdates(IProgressMonitor) line: 681  
    DependencyGraphImpl.getReferencingComponents(IProject) line: 102  
    J2EEModuleVirtualComponent(VirtualComponent).getReferencingComponents() line: 577  
    EarUtilities.getReferencingEARProjects(IProject) line: 274  
    J2EEModuleVirtualComponent.calculateManifestReferences(IVirtualComponent, boolean) line: 370  
    J2EEModuleVirtualComponent.cacheManifestReferences() line: 360  
    J2EEModuleVirtualComponent.getReferences(Map<String,Object>) line: 169  
    WebAppLibrariesContainer.computeReferences(IVirtualComponent) line: 100  
    WebAppLibrariesContainer(FlexibleProjectContainer).computeClasspathEntries() line: 183  
    WebAppLibrariesContainer.computeClasspathEntries() line: 119  
    WebAppLibrariesContainer(FlexibleProjectContainer).isOutOfDate(IResourceDelta) line: 156  
    FlexibleProjectContainer$Listener.resourceChanged(IResourceChangeEvent) line: 476  
    NotificationManager$2.run() line: 291  
    SafeRunner.run(ISafeRunnable) line: 42  
    NotificationManager.notify(ResourceChangeListenerList$ListenerEntry[], IResourceChangeEvent, boolean) line: 285  
    NotificationManager.broadcastChanges(ElementTree, ResourceChangeEvent, boolean) line: 149  
    Workspace.broadcastBuildEvent(Object, int, int) line: 311  
    AutoBuildJob.doBuild(IProgressMonitor) line: 140  
    AutoBuildJob.run(IProgressMonitor) line: 242  
    Worker.run() line: 54  
  Thread [ReaderThread] (Running)  
  Daemon Thread [JavaScript indexing] (Running)  
  Daemon Thread [[ThreadPool Manager] - Idle Thread] (Running)  

The following was copied from the lock graph displayed in the variables view...

this  WorkManager  (id=136)  
  checkInFailed  ThreadLocal<T>  (id=170)  
  hasBuildChanges  false  
  jobManager  JobManager  (id=172)  
    active  true  
    implicitJobs  ImplicitJobs  (id=176)  
    internalWorker  InternalWorker  (id=32)  
    jobListeners  JobListeners  (id=178)  
    lock  Object  (id=180)  
    lockManager  LockManager  (id=181)  
      lockListener  UILockListener  (id=192)  
      locks  DeadlockDetector  (id=195)  
        graph   (id=201)  
          [0]   (id=320)  
            [0]  1  
            [1]  0  
            [2]  -1  
          [1]   (id=321)  
            [0]  -1  
            [1]  1  
            [2]  0  
        locks  ArrayList<E>  (id=203)  
          elementData  Object[10]  (id=294)  
            [0]  OrderedLock  (id=135)  
            [1]  MultiRule  (id=164)  
            [2]  WorkspaceRoot  (id=295)  
            [3]  null  
            ...
          modCount  5651  
          size  3  
        lockThreads  ArrayList<E>  (id=205)  
          elementData  Object[10]  (id=206)  
            [0]  Worker  (id=48)  
            [1]  Thread  (id=1)  
            [2]  null  
            ...
          modCount  448  
          size  2  


Main thread has a MultiRule that contains the test project as the scheduling rule for the JDT SetClasspathOperation and is waiting to acquire the WorkManager.lock.

Worker-4 is the AutoBuildJob which has the WorkManager.lock OrderedLock and is notifying the WebAppLibrariesContainer of a resource change. While checking to see if a container is out of date, the DependencyGraphImpl is in a state such that its graph update job needs to be rescheduled, and calls yieldRule. Appears the scheduling rule is released but the JobManager's DeadlockDetector lock graph indicates that it still has the WorkManager.lock... causing the deadlock when this AutoBuildJob waits to get back the WorkspaceRule. The value of canBlock in the ThreadJob.waitForRun() is true so there's no timeout.
Comment 1 Carlin Rogers CLA 2010-12-02 16:11:17 EST
Created attachment 184409 [details]
deadlock thread dump
Comment 2 Carl Anderson CLA 2010-12-02 16:27:02 EST
Assigning to Jason for initial investigation.
Comment 3 Carlin Rogers CLA 2010-12-08 19:00:31 EST
The JSF core tests (org.eclipse.jst.jsf.core.tests.AllTests) have failed to finish in the last couple of builds. It is the same tests suite that I identified this deadlock in. We have also seen this same deadlock in JUnit testing of our adopter product.

I gather that a DNF status on a test suite indicates it was killed after not completing within a given time limit, correct?

Curious if these tests are failing to complete before the required time limit or if they fail because they've hit this deadlock. Are there files on the build machine that show the individual tests from the suite that have completed before the suite is killed? Just wondering how far along the tests are before they a killed. If were not close to the end then likely a deadlock.

Also wondering if the test build script could force a thread dump of the test process with jstack before actually killing the test process. If it is a deadlock the thread dump would be extremely helpful.

Thanks,
Carlin
Comment 4 Roberto Sanchez Herrera CLA 2011-01-24 17:16:24 EST
*** Bug 334050 has been marked as a duplicate of this bug. ***
Comment 5 Roberto Sanchez Herrera CLA 2011-01-24 17:49:47 EST
Created attachment 187481 [details]
Patch 1

Hello Carlin. It looks like you are able to reproduce this deadlock. Would you be willing to try the attached patch out? I already ran the JEE tools JUnits and the JSF core tests (org.eclipse.jst.jsf.core.tests.AllTests)using this patch.

Thank you.
Comment 6 Carlin Rogers CLA 2011-01-24 20:03:02 EST
Hi Roberto,

Reproducing the bug in a development environment has proven to be very difficult. The time I managed to reproduce it was after several attempts and was just fortunate to hit it. This afternoon I ran the tests from my dev env (without the patch) many times without a deadlock so I'm not sure just adding the patch and running is going to guarantee that it solves the problem. 

Have you been able to force the deadlock in the debugger by managing/stepping threads? Just curious if we can simulate it that way and use the same technique to verify the patch resolves the condition.

I noticed that the core tests still hang on occasion - most recent core test DNF for a build of 3.2.3 was 1/21, 19:28 UTC. I will try testing some more but maybe the automated tests will be the best indicator that the patch works.

Carlin
Comment 7 Roberto Sanchez Herrera CLA 2011-01-25 10:00:56 EST
No, I haven't been able to force the deadlock.
Comment 8 Roberto Sanchez Herrera CLA 2011-01-25 17:04:42 EST
Created attachment 187584 [details]
Updated patch

I was able to "simulate" the deadlock by putting a break point in line 458 of DependencyGraphImpl.java:

protected IStatus run(IProgressMonitor monitor) {
......
incrementModStamp();
				if(ResourcesPlugin.getPlugin().getBundle().getState() != Bundle.ACTIVE){ <--- Here

.....

}


I updated the patch based on the testing.
Comment 9 Carlin Rogers CLA 2011-01-26 01:42:43 EST
Created attachment 187611 [details]
thread dumps of deadlock from running tests with patch

Hi Roberto,

I ran the JSF core tests with the updated patch today and twice I hit the deadlock. I will attach a zip with the thread dumps from both deadlocks. The first deadlock is between Main and Worker-4, the second is between Main and Worker-1.

Carlin
Comment 10 Danail Branekov CLA 2011-01-26 07:10:34 EST
*** Bug 335423 has been marked as a duplicate of this bug. ***
Comment 11 Roberto Sanchez Herrera CLA 2011-01-26 08:36:53 EST
(In reply to comment #9)
> Created attachment 187611 [details]
> thread dumps of deadlock from running tests with patch
> 
> Hi Roberto,
> 
> I ran the JSF core tests with the updated patch today and twice I hit the
> deadlock. I will attach a zip with the thread dumps from both deadlocks. The
> first deadlock is between Main and Worker-4, the second is between Main and
> Worker-1.
> 
> Carlin

Thank you for the dumps, I will take a look. Just to confirm, did you use the updated patch?
Comment 12 Roberto Sanchez Herrera CLA 2011-01-26 10:11:49 EST
Carlin, by looking at the thread dumps, I see that the line numbers in WebAppLibrariesContainer does not match with the line numbers in the same class after applying the patch. Could you confirm if you had the latest patch please? Thank you.
Comment 13 Carl Anderson CLA 2011-01-26 11:37:56 EST
My initial review shows that the latest patch needs some more cleanup.
First, there are two new classes- these need the EPL header as a comment at the top.
Second, since those classes are referenced in J2EE code, the MANIFEST.MF files for both org.eclipse.jst.j2ee and org.eclipse.jst.j2ee.web need to be updated to raise the minimum version of org.eclipse.wst.common.modulecore to 1.2.5.
Third, since J2EE code is being modified, this bug will need Chuck's review, too.
Comment 14 Carlin Rogers CLA 2011-01-26 11:48:12 EST
(In reply to comment #12)
> Carlin, by looking at the thread dumps, I see that the line numbers in
> WebAppLibrariesContainer does not match with the line numbers in the same class
> after applying the patch. Could you confirm if you had the latest patch please?
> Thank you.

Sorry Roberto, you're correct. WebAppLibrariesContainer was not updated correctly. I applied the most recent patch but the org.eclipse.jst.j2ee.web plugin was closed in my workspace so WebAppLibrariesContainer did not get updated. Looks like the other plugin files in my dev workspace were patched correctly.
Comment 15 Roberto Sanchez Herrera CLA 2011-01-26 11:55:09 EST
Created attachment 187652 [details]
Patch version 3

Updated patch based on Carl's comments
Comment 16 Roberto Sanchez Herrera CLA 2011-01-26 11:56:14 EST
Adding Chuck as reviewer
Comment 17 Roberto Sanchez Herrera CLA 2011-01-26 15:46:55 EST
Created attachment 187684 [details]
Patch version 4

Added code to handle one more scenario.
Comment 18 Chuck Bridgham CLA 2011-01-26 15:50:55 EST
approve
Comment 19 Carl Anderson CLA 2011-01-26 16:41:56 EST
I successfully ran the entire Java EE JUnit bucket with patch version 4.
Comment 20 Roberto Sanchez Herrera CLA 2011-01-26 16:50:38 EST
1. Explain why you believe this is a stop-ship defect. Or, if it is a "hotbug" (requested by an adopter) please document it as such. 

Deadlocks might occur during startup while classpath containers are installed

2. Is there a work-around? If so, why do you believe the work-around is insufficient? 

The only work-around is restart eclipse

3. How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added? 

The fix was tested running the existing JEE JUnits and by hand


4. Give a brief technical overview. Who has reviewed this fix? 

Validations were added in the JEE and Web classpath containers to check if the state of the dependency graph is current. If it is not, then do not call a deadlock-prone code in the dependency graph until it is updated. 
The fix was reviewed by Chuck Bridgham, Jason Sholl and Carl Anderson


5. What is the risk associated with this fix? 
Medium
Comment 21 Carlin Rogers CLA 2011-01-26 17:54:04 EST
The JSF core tests (org.eclipse.jst.jsf.core.tests.AllTests) pass with patch version 4.
Comment 22 Carl Anderson CLA 2011-01-27 14:08:30 EST
Committed to R3_2_maintenance
Comment 23 Carlin Rogers CLA 2011-01-31 14:22:08 EST
One of the JSF JUnit test suites (org.eclipse.jst.jsf.context.symbol.tests.AllTests) failed to
finish in the most recent 3.2.3 build (Jan 31, M-3.2.3-20110131093146). I wonder if we still have a deadlock issue or something else is going on.

Does anyone know if there is a way we could get a thread dump from the tests that have status DNF? Just wondering if the test build script could force a thread dump of the test process with jstack before actually killing the test process. If it is still a deadlock, the thread dump would be extremely helpful.

Thanks,
Carlin
Comment 24 Carl Anderson CLA 2011-01-31 14:25:54 EST
Carlin, we opened another bug 335645 for a DNF we saw.  We believe we have a fix for that issue.
Comment 25 Carlin Rogers CLA 2011-01-31 14:28:52 EST
(In reply to comment #24)
> Carlin, we opened another bug 335645 for a DNF we saw.  We believe we have a
> fix for that issue.

OK, thanks Carl. I will add myself to the bug.
Comment 26 Carlin Rogers CLA 2011-02-11 17:41:21 EST
Nice job. I have not seen a DNF in the tests from the recent 3.2.3 builds. Looks like the patches from this bug and bug 335645 resolve the deadlock. Thanks!

I noticed that the code patches have not yet been merged into HEAD for the 3.3 builds and we still see DNF results there. Should I clone the bug for 3.3 or will the changes get merged from 3.2_maintenance into HEAD?
Comment 27 Carl Anderson CLA 2011-09-13 09:43:23 EDT
*** Bug 324502 has been marked as a duplicate of this bug. ***