Community
Participate
Working Groups
This was found in an Mbuild of WTP 3.2.3, wtp-sdk-M-3.2.3-20101130001821.zip We've been seeing deadlocks occur in runs of the JSF Tools JUnit tests. It does not reproduce easily in our development environments. I was able to hit this while running in the debugger and could then check the JobManager's DeadlockDetector lock graph to help verify the issue. The deadlock is between the main test thread which has a MultiRule as a scheduling rule and is waiting to acquire the WorkManager.lock and the AutoBuildJob which has yielded the WorkspaceRule but still has the WorkManager.lock, and is waiting to get the WorkspaceRule. I'm not sure why the WorkManager.lock OrderedLock was not release with the WorkspaceRule during the yield rule. Here's the thread stack from the debugger. I will also attach the full thread dump. org.eclipse.equinox.launcher.Main at localhost:1876 Daemon System Thread [Attach Listener] (Running) Daemon System Thread [Signal Dispatcher] (Running) Daemon System Thread [Finalizer] (Running) Daemon System Thread [Reference Handler] (Running) Thread [main] (Suspended) owns: RunnableLock (id=93) waiting for: Semaphore (id=92) Object.wait(long) line: not available [native method] Semaphore.acquire(long) line: 39 OrderedLock.doAcquire(Semaphore, long) line: 176 OrderedLock.acquire(long) line: 110 OrderedLock.acquire() line: 84 WorkManager.checkIn(ISchedulingRule, IProgressMonitor) line: 118 Workspace.prepareOperation(ISchedulingRule, IProgressMonitor) line: 1914 Workspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor) line: 1970 SetClasspathOperation(JavaModelOperation).runOperation(IProgressMonitor) line: 793 JavaProject.setRawClasspath(IClasspathEntry[], IClasspathEntry[], IPath, boolean, IProgressMonitor) line: 3026 JavaProject.setRawClasspath(IClasspathEntry[], IPath, boolean, IProgressMonitor) line: 2988 JavaProject.setRawClasspath(IClasspathEntry[], IProgressMonitor) line: 3041 JDTTestEnvironment.<init>(ProjectTestEnvironment) line: 82 TestJDTBeanPropertyWorkingCopy.setUp() line: 58 TestJDTBeanPropertyWorkingCopy(TestCase).runBare() line: 132 TestResult$1.protect() line: 110 TestResult.runProtected(Test, Protectable) line: 128 TestResult.run(TestCase) line: 113 TestJDTBeanPropertyWorkingCopy(TestCase).run(TestResult) line: 124 TestSuite.runTest(Test, TestResult) line: 232 TestSuite.run(TestResult) line: 227 TestSuite.runTest(Test, TestResult) line: 232 TestSuite.run(TestResult) line: 227 SuiteMethod(JUnit38ClassRunner).run(RunNotifier) line: 83 JUnit4TestClassReference(JUnit4TestReference).run(TestExecution) line: 49 TestExecution.run(ITestReference[]) line: 38 RemotePluginTestRunner(RemoteTestRunner).runTests(String[], String, TestExecution) line: 467 RemotePluginTestRunner(RemoteTestRunner).runTests(TestExecution) line: 683 RemotePluginTestRunner(RemoteTestRunner).run() line: 390 RemotePluginTestRunner.main(String[]) line: 62 UITestApplication$1.run() line: 116 RunnableLock.run() line: 35 UISynchronizer(Synchronizer).runAsyncMessages(boolean) line: 134 Display.runAsyncMessages(boolean) line: 4041 Display.readAndDispatch() line: 3660 Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 2640 Workbench.runUI() line: 2604 Workbench.access$4(Workbench) line: 2438 Workbench$7.run() line: 671 Realm.runWithDefault(Realm, Runnable) line: 332 Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 664 PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149 IDEApplication.start(IApplicationContext) line: 115 UITestApplication.start(IApplicationContext) line: 47 EclipseAppHandle.run(Object) line: 196 EclipseAppLauncher.runApplication(Object) line: 110 EclipseAppLauncher.start(Object) line: 79 EclipseStarter.run(Object) line: 369 EclipseStarter.run(String[], Runnable) line: 179 NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method] NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39 DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25 Method.invoke(Object, Object...) line: 597 Main.invokeFramework(String[], URL[]) line: 619 Main.basicRun(String[]) line: 574 Main.run(String[]) line: 1407 Main.main(String[]) line: 1383 Daemon Thread [Framework Event Dispatcher] (Running) Daemon Thread [Start Level Event Dispatcher] (Running) Daemon Thread [Bundle File Closer] (Running) Daemon Thread [[Timer] - Main Queue Handler] (Running) Thread [Worker-JM] (Running) Thread [Worker-0] (Running) Daemon Thread [Java indexing] (Running) Thread [Worker-2] (Running) Thread [WorkbenchTestable] (Running) Thread [Worker-3] (Running) Thread [Worker-4] (Suspended) waiting for: Object (id=94) Object.wait(long) line: not available [native method] ThreadJob.waitForRun(ThreadJob, IProgressMonitor, InternalJob, Thread) line: 269 ThreadJob.joinRun(ThreadJob, IProgressMonitor) line: 199 JobManager.yieldRule(InternalJob, IProgressMonitor) line: 1398 AutoBuildJob(InternalJob).yieldRule(IProgressMonitor) line: 600 AutoBuildJob(Job).yieldRule(IProgressMonitor) line: 709 DependencyGraphImpl.waitForAllUpdates(IProgressMonitor) line: 681 DependencyGraphImpl.getReferencingComponents(IProject) line: 102 J2EEModuleVirtualComponent(VirtualComponent).getReferencingComponents() line: 577 EarUtilities.getReferencingEARProjects(IProject) line: 274 J2EEModuleVirtualComponent.calculateManifestReferences(IVirtualComponent, boolean) line: 370 J2EEModuleVirtualComponent.cacheManifestReferences() line: 360 J2EEModuleVirtualComponent.getReferences(Map<String,Object>) line: 169 WebAppLibrariesContainer.computeReferences(IVirtualComponent) line: 100 WebAppLibrariesContainer(FlexibleProjectContainer).computeClasspathEntries() line: 183 WebAppLibrariesContainer.computeClasspathEntries() line: 119 WebAppLibrariesContainer(FlexibleProjectContainer).isOutOfDate(IResourceDelta) line: 156 FlexibleProjectContainer$Listener.resourceChanged(IResourceChangeEvent) line: 476 NotificationManager$2.run() line: 291 SafeRunner.run(ISafeRunnable) line: 42 NotificationManager.notify(ResourceChangeListenerList$ListenerEntry[], IResourceChangeEvent, boolean) line: 285 NotificationManager.broadcastChanges(ElementTree, ResourceChangeEvent, boolean) line: 149 Workspace.broadcastBuildEvent(Object, int, int) line: 311 AutoBuildJob.doBuild(IProgressMonitor) line: 140 AutoBuildJob.run(IProgressMonitor) line: 242 Worker.run() line: 54 Thread [ReaderThread] (Running) Daemon Thread [JavaScript indexing] (Running) Daemon Thread [[ThreadPool Manager] - Idle Thread] (Running) The following was copied from the lock graph displayed in the variables view... this WorkManager (id=136) checkInFailed ThreadLocal<T> (id=170) hasBuildChanges false jobManager JobManager (id=172) active true implicitJobs ImplicitJobs (id=176) internalWorker InternalWorker (id=32) jobListeners JobListeners (id=178) lock Object (id=180) lockManager LockManager (id=181) lockListener UILockListener (id=192) locks DeadlockDetector (id=195) graph (id=201) [0] (id=320) [0] 1 [1] 0 [2] -1 [1] (id=321) [0] -1 [1] 1 [2] 0 locks ArrayList<E> (id=203) elementData Object[10] (id=294) [0] OrderedLock (id=135) [1] MultiRule (id=164) [2] WorkspaceRoot (id=295) [3] null ... modCount 5651 size 3 lockThreads ArrayList<E> (id=205) elementData Object[10] (id=206) [0] Worker (id=48) [1] Thread (id=1) [2] null ... modCount 448 size 2 Main thread has a MultiRule that contains the test project as the scheduling rule for the JDT SetClasspathOperation and is waiting to acquire the WorkManager.lock. Worker-4 is the AutoBuildJob which has the WorkManager.lock OrderedLock and is notifying the WebAppLibrariesContainer of a resource change. While checking to see if a container is out of date, the DependencyGraphImpl is in a state such that its graph update job needs to be rescheduled, and calls yieldRule. Appears the scheduling rule is released but the JobManager's DeadlockDetector lock graph indicates that it still has the WorkManager.lock... causing the deadlock when this AutoBuildJob waits to get back the WorkspaceRule. The value of canBlock in the ThreadJob.waitForRun() is true so there's no timeout.
Created attachment 184409 [details] deadlock thread dump
Assigning to Jason for initial investigation.
The JSF core tests (org.eclipse.jst.jsf.core.tests.AllTests) have failed to finish in the last couple of builds. It is the same tests suite that I identified this deadlock in. We have also seen this same deadlock in JUnit testing of our adopter product. I gather that a DNF status on a test suite indicates it was killed after not completing within a given time limit, correct? Curious if these tests are failing to complete before the required time limit or if they fail because they've hit this deadlock. Are there files on the build machine that show the individual tests from the suite that have completed before the suite is killed? Just wondering how far along the tests are before they a killed. If were not close to the end then likely a deadlock. Also wondering if the test build script could force a thread dump of the test process with jstack before actually killing the test process. If it is a deadlock the thread dump would be extremely helpful. Thanks, Carlin
*** Bug 334050 has been marked as a duplicate of this bug. ***
Created attachment 187481 [details] Patch 1 Hello Carlin. It looks like you are able to reproduce this deadlock. Would you be willing to try the attached patch out? I already ran the JEE tools JUnits and the JSF core tests (org.eclipse.jst.jsf.core.tests.AllTests)using this patch. Thank you.
Hi Roberto, Reproducing the bug in a development environment has proven to be very difficult. The time I managed to reproduce it was after several attempts and was just fortunate to hit it. This afternoon I ran the tests from my dev env (without the patch) many times without a deadlock so I'm not sure just adding the patch and running is going to guarantee that it solves the problem. Have you been able to force the deadlock in the debugger by managing/stepping threads? Just curious if we can simulate it that way and use the same technique to verify the patch resolves the condition. I noticed that the core tests still hang on occasion - most recent core test DNF for a build of 3.2.3 was 1/21, 19:28 UTC. I will try testing some more but maybe the automated tests will be the best indicator that the patch works. Carlin
No, I haven't been able to force the deadlock.
Created attachment 187584 [details] Updated patch I was able to "simulate" the deadlock by putting a break point in line 458 of DependencyGraphImpl.java: protected IStatus run(IProgressMonitor monitor) { ...... incrementModStamp(); if(ResourcesPlugin.getPlugin().getBundle().getState() != Bundle.ACTIVE){ <--- Here ..... } I updated the patch based on the testing.
Created attachment 187611 [details] thread dumps of deadlock from running tests with patch Hi Roberto, I ran the JSF core tests with the updated patch today and twice I hit the deadlock. I will attach a zip with the thread dumps from both deadlocks. The first deadlock is between Main and Worker-4, the second is between Main and Worker-1. Carlin
*** Bug 335423 has been marked as a duplicate of this bug. ***
(In reply to comment #9) > Created attachment 187611 [details] > thread dumps of deadlock from running tests with patch > > Hi Roberto, > > I ran the JSF core tests with the updated patch today and twice I hit the > deadlock. I will attach a zip with the thread dumps from both deadlocks. The > first deadlock is between Main and Worker-4, the second is between Main and > Worker-1. > > Carlin Thank you for the dumps, I will take a look. Just to confirm, did you use the updated patch?
Carlin, by looking at the thread dumps, I see that the line numbers in WebAppLibrariesContainer does not match with the line numbers in the same class after applying the patch. Could you confirm if you had the latest patch please? Thank you.
My initial review shows that the latest patch needs some more cleanup. First, there are two new classes- these need the EPL header as a comment at the top. Second, since those classes are referenced in J2EE code, the MANIFEST.MF files for both org.eclipse.jst.j2ee and org.eclipse.jst.j2ee.web need to be updated to raise the minimum version of org.eclipse.wst.common.modulecore to 1.2.5. Third, since J2EE code is being modified, this bug will need Chuck's review, too.
(In reply to comment #12) > Carlin, by looking at the thread dumps, I see that the line numbers in > WebAppLibrariesContainer does not match with the line numbers in the same class > after applying the patch. Could you confirm if you had the latest patch please? > Thank you. Sorry Roberto, you're correct. WebAppLibrariesContainer was not updated correctly. I applied the most recent patch but the org.eclipse.jst.j2ee.web plugin was closed in my workspace so WebAppLibrariesContainer did not get updated. Looks like the other plugin files in my dev workspace were patched correctly.
Created attachment 187652 [details] Patch version 3 Updated patch based on Carl's comments
Adding Chuck as reviewer
Created attachment 187684 [details] Patch version 4 Added code to handle one more scenario.
approve
I successfully ran the entire Java EE JUnit bucket with patch version 4.
1. Explain why you believe this is a stop-ship defect. Or, if it is a "hotbug" (requested by an adopter) please document it as such. Deadlocks might occur during startup while classpath containers are installed 2. Is there a work-around? If so, why do you believe the work-around is insufficient? The only work-around is restart eclipse 3. How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added? The fix was tested running the existing JEE JUnits and by hand 4. Give a brief technical overview. Who has reviewed this fix? Validations were added in the JEE and Web classpath containers to check if the state of the dependency graph is current. If it is not, then do not call a deadlock-prone code in the dependency graph until it is updated. The fix was reviewed by Chuck Bridgham, Jason Sholl and Carl Anderson 5. What is the risk associated with this fix? Medium
The JSF core tests (org.eclipse.jst.jsf.core.tests.AllTests) pass with patch version 4.
Committed to R3_2_maintenance
One of the JSF JUnit test suites (org.eclipse.jst.jsf.context.symbol.tests.AllTests) failed to finish in the most recent 3.2.3 build (Jan 31, M-3.2.3-20110131093146). I wonder if we still have a deadlock issue or something else is going on. Does anyone know if there is a way we could get a thread dump from the tests that have status DNF? Just wondering if the test build script could force a thread dump of the test process with jstack before actually killing the test process. If it is still a deadlock, the thread dump would be extremely helpful. Thanks, Carlin
Carlin, we opened another bug 335645 for a DNF we saw. We believe we have a fix for that issue.
(In reply to comment #24) > Carlin, we opened another bug 335645 for a DNF we saw. We believe we have a > fix for that issue. OK, thanks Carl. I will add myself to the bug.
Nice job. I have not seen a DNF in the tests from the recent 3.2.3 builds. Looks like the patches from this bug and bug 335645 resolve the deadlock. Thanks! I noticed that the code patches have not yet been merged into HEAD for the 3.3 builds and we still see DNF results there. Should I clone the bug for 3.3 or will the changes get merged from 3.2_maintenance into HEAD?
*** Bug 324502 has been marked as a duplicate of this bug. ***