Bug 309781 - Deadlock during debugging
Summary: Deadlock during debugging
Status: RESOLVED FIXED
Alias: None
Product: WTP ServerTools
Classification: WebTools
Component: wst.server
Version: 3.2
Hardware: PC Windows XP
Importance: P2 critical
Target Milestone: 3.0.5 P
Assignee: Angel Vera CLA
QA Contact: Angel Vera CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 327247 328067
Reported: 2010-04-20 05:56 EDT by Krzysztof Daniel CLA
Modified: 2010-10-18 15:09 EDT
CC List: 11 users

See Also:


Attachments
Javacore + Jobs tracing (1.64 MB, application/x-zip-compressed), 2010-04-21 03:20 EDT, Krzysztof Daniel CLA
First approach to solve the problem (1.36 KB, patch), 2010-04-26 06:40 EDT, Krzysztof Daniel CLA
CorrectedPatch - still first attempt to solve the problem (2.14 KB, patch), 2010-05-04 07:09 EDT, Krzysztof Daniel CLA
Plugin for Eclipse 3.4.2 for testing (75.48 KB, application/java-archive), 2010-05-04 07:10 EDT, Krzysztof Daniel CLA
A temporal fix (1.10 KB, patch), 2010-05-24 08:34 EDT, Krzysztof Daniel CLA
Patch that defers republishing until the server is started (2.47 KB, patch), 2010-05-24 08:45 EDT, Krzysztof Daniel CLA
Patch for 3.0 stream (3.32 KB, patch), 2010-05-25 04:10 EDT, Krzysztof Daniel CLA
Plugin for testing (3.0.5 stream) (306.68 KB, application/octet-stream), 2010-05-25 04:11 EDT, Krzysztof Daniel CLA
Simplified test case (11.08 KB, application/x-zip-compressed), 2010-06-03 06:54 EDT, Krzysztof Daniel CLA
modified project (zip) (11.03 KB, application/octet-stream), 2010-06-03 11:05 EDT, Darin Wright CLA
deadlock example 2 (11.17 KB, application/octet-stream), 2010-06-03 11:12 EDT, Darin Wright CLA
patch for 3.7 or 3.6 (10.78 KB, patch), 2010-08-10 15:49 EDT, Darin Wright CLA
patch for 3.4.x (13.35 KB, patch), 2010-08-10 15:50 EDT, Darin Wright CLA
alternate patch for 3.7 (19.92 KB, application/octet-stream), 2010-08-13 11:53 EDT, Darin Wright CLA
simpler patch (13.04 KB, patch), 2010-08-18 14:55 EDT, Darin Wright CLA
eqivalent patch for 3.4.x (16.43 KB, patch), 2010-08-18 15:08 EDT, Darin Wright CLA
screen shot of "Waiting" dialog (20.07 KB, image/png), 2010-08-19 12:59 EDT, Darin Wright CLA
project that can be imported to reproduce the problem (16.69 KB, application/octet-stream), 2010-08-19 13:01 EDT, Darin Wright CLA
v1.0 (18.39 KB, patch), 2010-10-05 17:21 EDT, Angel Vera CLA

Description Krzysztof Daniel CLA 2010-04-20 05:56:29 EDT
The following stack trace was reported. The issue is caused by the fact that, during the isCanceled() call, a separate event loop is run, and DebugUIPlugin tries to save the modified editor again.

The reproduction steps are for an Eclipse-based product, but they give good insight into the situation:
1. Start WAS in debug mode from RAD.

2. Develop a Dynamic Web project v2.5 targeting WAS 7.

3. Add a servlet with a sysout in doGet.

4. Add the project to WAS.

5. Add a breakpoint inside doGet.

6. Debug the servlet so that it reaches the breakpoint.

7. Click the "bug" button in the Debug view to restart the server in debug mode.

8. While the server is restarting, add a new space character inside the doGet method of the servlet.

If you do this while the server is still shutting down, you can save the servlet.

If you wait until the server has initiated the restart, you cannot save the servlet.

You see a progress dialog with three entries:

- Restarting WAS 7.0 server
- Save
- Starting WAS 7.0 server

9. If you wait for some time (minutes), you get a dialog asking whether you want to save. Click OK.

10. RAD is still hanging, and the progress dialog shows:

Save
Starting WAS
Building prerequisite project list

However, the server appears to be in the started state.


        at java/lang/Object.wait(Native Method)
        at java/lang/Object.wait(Bytecode PC:3(Compiled Code))
        at org/eclipse/core/internal/jobs/ThreadJob.joinRun(Bytecode PC:277)
        at org/eclipse/core/internal/jobs/ImplicitJobs.begin(Bytecode PC:207)
        at org/eclipse/core/internal/jobs/JobManager.beginRule(Bytecode PC:16)
        at org/eclipse/core/internal/resources/WorkManager.checkIn(Bytecode PC:40)
        at org/eclipse/core/internal/resources/Workspace.prepareOperation(Bytecode PC:50)
        at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:39)
        at org/eclipse/ui/actions/WorkspaceModifyOperation.run(Bytecode PC:27)
        at org/eclipse/ui/internal/editors/text/WorkspaceOperationRunner.run(Bytecode PC:18)
        at org/eclipse/ui/internal/editors/text/WorkspaceOperationRunner.run(Bytecode PC:20)
        at org/eclipse/ui/editors/text/TextFileDocumentProvider.executeOperation(Bytecode PC:16)
        at org/eclipse/ui/editors/text/TextFileDocumentProvider.saveDocument(Bytecode PC:24)
        at org/eclipse/ui/texteditor/AbstractTextEditor.performSave(Bytecode PC:44)
        at org/eclipse/jdt/internal/ui/javaeditor/CompilationUnitEditor.performSave(Bytecode PC:32)
        at org/eclipse/jdt/internal/ui/javaeditor/CompilationUnitEditor.doSave(Bytecode PC:115)
        at org/eclipse/ui/texteditor/AbstractTextEditor$TextEditorSavable.doSave(Bytecode PC:7)
        at org/eclipse/ui/Saveable.doSave(Bytecode PC:2)
        at org/eclipse/ui/internal/SaveableHelper.doSaveModel(Bytecode PC:70)
        at org/eclipse/ui/internal/EditorManager$6.run(Bytecode PC:102)
        at org/eclipse/ui/internal/SaveableHelper$4.run(Bytecode PC:7)
        at org/eclipse/jface/operation/ModalContext.runInCurrentThread(Bytecode PC:8)
        at org/eclipse/jface/operation/ModalContext.run(Bytecode PC:48)
        at org/eclipse/jface/window/ApplicationWindow$1.run(Bytecode PC:19)
        at org/eclipse/swt/custom/BusyIndicator.showWhile(Bytecode PC:118)
        at org/eclipse/jface/window/ApplicationWindow.run(Bytecode PC:302)
        at org/eclipse/ui/internal/WorkbenchWindow.run(Bytecode PC:170)
        at org/eclipse/ui/internal/SaveableHelper.runProgressMonitorOperation(Bytecode PC:24)
        at org/eclipse/ui/internal/EditorManager.saveAll(Bytecode PC:676)
        at org/eclipse/ui/internal/Workbench$17.run(Bytecode PC:253)
        at org/eclipse/core/runtime/SafeRunner.run(Bytecode PC:7(Compiled Code))
        at org/eclipse/ui/internal/Workbench.saveAllEditors(Bytecode PC:23)
        at org/eclipse/debug/internal/ui/DebugUIPlugin.saveAllEditors(Bytecode PC:14)
        at org/eclipse/debug/internal/ui/DebugUIPlugin.preLaunchSave(Bytecode PC:34)
        at org/eclipse/debug/internal/ui/launchConfigurations/SaveScopeResourcesHandler.handleStatus(Bytecode PC:131)
        at org/eclipse/debug/internal/ui/sourcelookup/Prompter$1.run(Bytecode PC:19)
        at org/eclipse/ui/internal/UILockListener.doPendingWork(Bytecode PC:29)
        at org/eclipse/ui/internal/UISynchronizer$3.run(Bytecode PC:7)
        at org/eclipse/swt/widgets/RunnableLock.run(Bytecode PC:13(Compiled Code))
        at org/eclipse/swt/widgets/Synchronizer.runAsyncMessages(Bytecode PC:29(Compiled Code))
        at org/eclipse/swt/widgets/Display.runAsyncMessages(Bytecode PC:5(Compiled Code))
        at org/eclipse/swt/widgets/Display.readAndDispatch(Bytecode PC:74(Compiled Code))
        at org/eclipse/ui/internal/dialogs/EventLoopProgressMonitor.runEventLoop(Bytecode PC:39)
        at org/eclipse/ui/internal/dialogs/EventLoopProgressMonitor.isCanceled(Bytecode PC:1)
        at org/eclipse/core/runtime/ProgressMonitorWrapper.isCanceled(Bytecode PC:6(Compiled Code))
        at org/eclipse/core/runtime/SubMonitor$RootInfo.isCanceled(Bytecode PC:6(Compiled Code))
        at org/eclipse/core/runtime/SubMonitor.isCanceled(Bytecode PC:4(Compiled Code))
        at org/eclipse/core/internal/jobs/ThreadJob.isCanceled(Bytecode PC:3)
        at org/eclipse/core/internal/jobs/ThreadJob.joinRun(Bytecode PC:84)
        at org/eclipse/core/internal/jobs/ImplicitJobs.begin(Bytecode PC:207)
        at org/eclipse/core/internal/jobs/JobManager.beginRule(Bytecode PC:16)
        at org/eclipse/core/internal/resources/WorkManager.checkIn(Bytecode PC:40)
        at org/eclipse/core/internal/resources/Workspace.prepareOperation(Bytecode PC:50)
        at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:39)
        at org/eclipse/ui/actions/WorkspaceModifyOperation.run(Bytecode PC:27)
        at org/eclipse/ui/internal/editors/text/WorkspaceOperationRunner.run(Bytecode PC:18)
        at org/eclipse/ui/internal/editors/text/WorkspaceOperationRunner.run(Bytecode PC:20)
        at org/eclipse/ui/editors/text/TextFileDocumentProvider.executeOperation(Bytecode PC:16)
        at org/eclipse/ui/editors/text/TextFileDocumentProvider.saveDocument(Bytecode PC:24)
        at org/eclipse/ui/texteditor/AbstractTextEditor.performSave(Bytecode PC:44)
        at org/eclipse/jdt/internal/ui/javaeditor/CompilationUnitEditor.performSave(Bytecode PC:32)
        at org/eclipse/jdt/internal/ui/javaeditor/CompilationUnitEditor.doSave(Bytecode PC:115)
        at org/eclipse/ui/texteditor/AbstractTextEditor$TextEditorSavable.doSave(Bytecode PC:7)
        at org/eclipse/ui/Saveable.doSave(Bytecode PC:2)
        at org/eclipse/ui/internal/SaveableHelper.doSaveModel(Bytecode PC:70)
        at org/eclipse/ui/internal/SaveableHelper$2.run(Bytecode PC:84)
        at org/eclipse/ui/internal/SaveableHelper$4.run(Bytecode PC:7)
        at org/eclipse/jface/operation/ModalContext.runInCurrentThread(Bytecode PC:8)
        at org/eclipse/jface/operation/ModalContext.run(Bytecode PC:48)
        at org/eclipse/jface/window/ApplicationWindow$1.run(Bytecode PC:19)
        at org/eclipse/swt/custom/BusyIndicator.showWhile(Bytecode PC:118)
        at org/eclipse/jface/window/ApplicationWindow.run(Bytecode PC:302)
        at org/eclipse/ui/internal/WorkbenchWindow.run(Bytecode PC:170)
        at org/eclipse/ui/internal/SaveableHelper.runProgressMonitorOperation(Bytecode PC:24)
        at org/eclipse/ui/internal/SaveableHelper.runProgressMonitorOperation(Bytecode PC:4)
        at org/eclipse/ui/internal/SaveableHelper.saveModels(Bytecode PC:83)
        at org/eclipse/ui/internal/SaveableHelper.savePart(Bytecode PC:176)
        at org/eclipse/ui/internal/EditorManager.savePart(Bytecode PC:7)
        at org/eclipse/ui/internal/WorkbenchPage.savePart(Bytecode PC:7)
        at org/eclipse/ui/internal/WorkbenchPage.saveEditor(Bytecode PC:4)
        at org/eclipse/ui/internal/SaveAction.run(Bytecode PC:71)
        at org/eclipse/jface/action/Action.runWithEvent(Bytecode PC:1)
        at org/eclipse/jface/action/ActionContributionItem.handleWidgetSelection(Bytecode PC:356)
        at org/eclipse/jface/action/ActionContributionItem.access$2(Bytecode PC:3)
        at org/eclipse/jface/action/ActionContributionItem$6.handleEvent(Bytecode PC:60)
        at org/eclipse/swt/widgets/EventTable.sendEvent(Bytecode PC:216(Compiled Code))
        at org/eclipse/swt/widgets/Widget.sendEvent(Bytecode PC:25(Compiled Code))
        at org/eclipse/swt/widgets/Display.runDeferredEvents(Bytecode PC:84(Compiled Code))
        at org/eclipse/swt/widgets/Display.readAndDispatch(Bytecode PC:59(Compiled Code))
        at org/eclipse/ui/internal/Workbench.runEventLoop(Bytecode PC:9(Compiled Code))
        at org/eclipse/ui/internal/Workbench.runUI(Bytecode PC:393)
        at org/eclipse/ui/internal/Workbench.access$4(Bytecode PC:1)
        at org/eclipse/ui/internal/Workbench$5.run(Bytecode PC:23)
        at org/eclipse/core/databinding/observable/Realm.runWithDefault(Bytecode PC:14)
        at org/eclipse/ui/internal/Workbench.createAndRunWorkbench(Bytecode PC:18)
        at org/eclipse/ui/PlatformUI.createAndRunWorkbench(Bytecode PC:2)
        at org/eclipse/ui/internal/ide/application/IDEApplication.start(Bytecode PC:84)
        at org/eclipse/equinox/internal/app/EclipseAppHandle.run(Bytecode PC:137)
        at org/eclipse/core/runtime/internal/adaptor/EclipseAppLauncher.runApplication(Bytecode PC:105)
        at org/eclipse/core/runtime/internal/adaptor/EclipseAppLauncher.start(Bytecode PC:29)
        at org/eclipse/core/runtime/adaptor/EclipseStarter.run(Bytecode PC:149)
        at org/eclipse/core/runtime/adaptor/EclipseStarter.run(Bytecode PC:183)
        at sun/reflect/NativeMethodAccessorImpl.invoke0(Native Method)
        at sun/reflect/NativeMethodAccessorImpl.invoke(Bytecode PC:83)
        at sun/reflect/DelegatingMethodAccessorImpl.invoke(Bytecode PC:6)
        at java/lang/reflect/Method.invoke(Bytecode PC:163)
        at org/eclipse/equinox/launcher/Main.invokeFramework(Bytecode PC:211)
        at org/eclipse/equinox/launcher/Main.basicRun(Bytecode PC:114)
        at org/eclipse/equinox/launcher/Main.run(Bytecode PC:4)
        at org/eclipse/equinox/launcher/Main.main(Bytecode PC:10)
Comment 1 Eric Moffatt CLA 2010-04-20 16:02:22 EDT
Christopher, do you have any idea of where the deadlock can be fixed? I'm trying to triage this defect but can't determine who to send it to for a fix...
Comment 2 Dani Megert CLA 2010-04-21 03:13:19 EDT
Is it really a deadlock? Please attach the complete stack dump.
Comment 3 Krzysztof Daniel CLA 2010-04-21 03:20:32 EDT
Created attachment 165530 [details]
Javacore + Jobs tracing
Comment 4 Krzysztof Daniel CLA 2010-04-21 03:33:59 EDT
If this is not a deadlock caused by two WorkspaceModifyOperations trying to use the same rule, then I'd expect it is a VM issue.

I believe a comment from the Resources team could shed more light on the cause.
Comment 5 Dani Megert CLA 2010-04-21 10:44:18 EDT
It's indeed a deadlock: when the user requests to save file F, the modify rule for F is taken. While the save is still in progress, a progress monitor spins the event loop. In parallel, Debug launches in a separate thread, which posts a runnable (save all editors) to the UI thread. Since the event loop is running, this runnable is executed while the save is still in progress; it tries to acquire the same modify rule for F, which now blocks the UI thread.
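The re-entrancy described above can be reproduced in a small, self-contained model. This is a sketch under stated assumptions, not Eclipse code: the modify rule on F is modelled as a non-reentrant `java.util.concurrent.Semaphore`, the UI event queue as a plain deque, and the Debug thread's posting step is collapsed into a direct enqueue. The names `NestedEventLoopModel`, `isCanceled` and `nestedSave` are hypothetical. The nested save uses `tryAcquire()` so the demo reports the blocked state instead of hanging.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Semaphore;

// Hypothetical single-threaded model of Comment 5 (not Eclipse API).
class NestedEventLoopModel {
    static final Semaphore rule = new Semaphore(1);            // stands in for the modify rule on F (non-reentrant)
    static final Queue<Runnable> uiQueue = new ArrayDeque<>(); // pending runnables posted to the UI thread
    static final List<String> log = new ArrayList<>();

    // Like a progress monitor whose isCanceled() spins the event loop: runs queued work inline.
    static boolean isCanceled() {
        Runnable r;
        while ((r = uiQueue.poll()) != null) {
            r.run();
        }
        return false;
    }

    // The nested "save all editors" request; tryAcquire() reports instead of blocking forever.
    static void nestedSave() {
        boolean acquired = rule.tryAcquire();
        log.add(acquired ? "nested save acquired rule"
                         : "nested save blocked: rule held by outer save on same thread");
        if (acquired) rule.release();
    }

    static List<String> run() {
        log.clear();
        uiQueue.clear();
        rule.acquireUninterruptibly();                  // the user's save takes the rule for F
        log.add("outer save acquired rule");
        try {
            uiQueue.add(NestedEventLoopModel::nestedSave); // Debug posts "save all editors"
            isCanceled();                                   // the save spins the event loop mid-operation
        } finally {
            rule.release();
            log.add("outer save released rule");
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running `main` prints the outer acquire, then the nested save reporting that it is blocked by the outer save on the same thread, then the outer release. Later comments in this bug establish that real workspace rules can be acquired twice by the same thread, so this model only captures the hypothesis as stated in this comment.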

Moving to Debug where org.eclipse.debug.internal.ui.sourcelookup.Prompter lives.

John, any good advice for Debug?
Comment 6 Dani Megert CLA 2010-04-21 10:48:19 EDT
See bug 126630 - which obviously didn't get fully fixed.
Comment 7 Krzysztof Daniel CLA 2010-04-23 02:59:11 EDT
This issue is slightly different from bug 126630, which was a problem with syncExec/asyncExec and the workspace rule (two threads, two synchronization mechanisms).

Here we have only one thread that locks the resource, spins the event loop, and tries to lock the same resource a second time. It is difficult to come up with a fix because every component works correctly on its own; only the combined result is wrong.

Debug cannot know beforehand whether it is safe to run, because it only waits for a UI event to be executed.

We cannot prevent EventLoopProgressMonitor from processing UI events.

We cannot simply let the Debug save proceed, even if it uses the same rule on the same thread, because the later save may differ from the first one, and the nested operation need not be a save at all.

On the other hand, it should be possible to detect this kind of deadlock and abort the second save operation, although this may have unpredictable consequences if Debug relies on the 'Save' behavior.
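A possible way to detect this situation is to check, before waiting, whether the calling thread already owns the rule, and to refuse the nested operation if it does. The sketch below is hypothetical: `ReentryDetectingRule` and `Outcome` are illustrative names, not platform API, and `ReentrantLock` is used only for its `isHeldByCurrentThread()` ownership check. Other threads still wait normally.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical guard (not Eclipse API): a nested request from the owning thread is
// aborted instead of waiting, which in the scenario above would never complete.
class ReentryDetectingRule {
    enum Outcome { RAN, ABORTED_REENTRANT }

    // Used only for ownership tracking via isHeldByCurrentThread().
    private final ReentrantLock lock = new ReentrantLock();

    Outcome runExclusive(Runnable operation) {
        if (lock.isHeldByCurrentThread()) {
            // Nested call on the owning thread (e.g. reached through a spun event loop).
            return Outcome.ABORTED_REENTRANT;
        }
        lock.lock(); // callers on other threads block here as usual
        try {
            operation.run();
            return Outcome.RAN;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ReentryDetectingRule rule = new ReentryDetectingRule();
        Outcome[] nested = new Outcome[1];
        // Outer "save" runs; inside it, a nested "save" is attempted on the same thread.
        Outcome outer = rule.runExclusive(() -> nested[0] = rule.runExclusive(() -> {}));
        System.out.println(outer + " " + nested[0]); // prints "RAN ABORTED_REENTRANT"
    }
}
```

The caller of the aborted operation must cope with `ABORTED_REENTRANT`, which is exactly where the "unpredictable consequences" concern applies if Debug expects the save to have happened.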

Any suggestions are welcome.
Comment 8 Krzysztof Daniel CLA 2010-04-26 06:40:51 EDT
Created attachment 166059 [details]
First approach to solve the problem

I am considering modifying JobManager#runNow to allow running a job that is a ThreadJob, has a compatible scheduling rule, and is called from the same thread. I am not sure this will work, though. JobManager.findBlockingJob might be a better place to solve this.
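The proposed condition (same thread, compatible rule) can be stated as a small predicate. This is an assumption-laden model rather than the JobManager internals: `PathRule` is a hypothetical stand-in that mirrors the `contains`/`isConflicting` pair declared by `ISchedulingRule`, with `/` playing the workspace root, and `RunNowCheck.mayRunNow` is an illustrative name.

```java
// Hypothetical path-based rule mirroring ISchedulingRule's contains()/isConflicting().
final class PathRule {
    final String path; // e.g. "/software/src/com/C1.java"; "/" stands for the workspace root

    PathRule(String path) { this.path = path; }

    /** Contains another rule if it is the same path or an ancestor folder of it. */
    boolean contains(PathRule other) {
        if (path.equals("/") || path.equals(other.path)) return true;
        return other.path.startsWith(path + "/");
    }

    /** Two path rules conflict if either one contains the other. */
    boolean isConflicting(PathRule other) {
        return contains(other) || other.contains(this);
    }
}

final class RunNowCheck {
    /** Sketch: may the requester run immediately, given who holds which rule? */
    static boolean mayRunNow(PathRule held, Thread owner, PathRule requested, Thread requester) {
        if (held == null) return true;                           // nothing held, nothing blocks
        if (owner == requester) return held.contains(requested); // same thread: nested and compatible
        return !held.isConflicting(requested);                   // other thread: must not conflict
    }

    public static void main(String[] args) {
        PathRule file = new PathRule("/software/src/com/C1.java");
        PathRule root = new PathRule("/");
        Thread ui = Thread.currentThread();
        Thread worker = new Thread(() -> {});
        System.out.println(mayRunNow(file, ui, file, ui));     // true: same thread, same rule
        System.out.println(mayRunNow(file, worker, file, ui)); // false: held by another thread
        System.out.println(mayRunNow(file, ui, root, ui));     // false: root is wider than the held rule
    }
}
```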
Comment 9 Krzysztof Daniel CLA 2010-05-04 07:01:14 EDT
Setting the severity to something more appropriate.
Comment 10 Krzysztof Daniel CLA 2010-05-04 07:09:38 EDT
Created attachment 166933 [details]
CorrectedPatch - still first attempt to solve the problem

The previous patch was completely wrong.
Comment 11 Krzysztof Daniel CLA 2010-05-04 07:10:20 EDT
Created attachment 166934 [details]
Plugin for Eclipse 3.4.2 for testing
Comment 12 John Arthorne CLA 2010-05-04 09:53:11 EDT
The problem must be more complex. When the UI thread is blocked waiting for a rule, it allows syncExecs to run. The thread I'm looking at blocks on:

at org/eclipse/core/internal/jobs/ThreadJob.joinRun(Bytecode PC:277(Compiled Code))

This method sits in a loop waiting for the rule. If interrupted, it calls UILockListener which processes any pending syncExecs. The code in the attached patch should not be required for this simple kind of deadlock.
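The waiting behaviour described here (loop waiting for the rule, and service pending syncExecs between attempts) can be modelled with standard `java.util.concurrent` types. This is a hypothetical sketch, not the `ThreadJob.joinRun` implementation: a worker holds the rule and blocks until the "UI" thread runs a posted runnable, and the UI thread uses timed `tryLock` attempts so that it drains the queue instead of deadlocking. `WaitForRuleModel` and its members are illustrative names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model (not Eclipse code) of a UI thread that services syncExecs while waiting.
class WaitForRuleModel {
    static final ReentrantLock rule = new ReentrantLock();
    static final Queue<Runnable> pendingSyncExecs = new ConcurrentLinkedQueue<>();

    /** Waits for the rule, running pending UI work between short timed attempts. */
    static void acquireWhileServicingUi() throws InterruptedException {
        while (!rule.tryLock(10, TimeUnit.MILLISECONDS)) {
            Runnable r;
            while ((r = pendingSyncExecs.poll()) != null) {
                r.run(); // analogous to UILockListener processing pending work
            }
        }
    }

    /** Returns true if the "UI" thread got the rule after servicing the worker's request. */
    static boolean demo() throws InterruptedException {
        CountDownLatch ruleTaken = new CountDownLatch(1);
        CountDownLatch uiWorkDone = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            rule.lock();                                  // worker holds the rule...
            try {
                ruleTaken.countDown();
                pendingSyncExecs.add(uiWorkDone::countDown); // ...and needs the UI thread (syncExec)
                uiWorkDone.await();                       // blocks until the UI thread runs it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rule.unlock();
            }
        });
        worker.start();
        ruleTaken.await();
        acquireWhileServicingUi(); // a plain rule.lock() here would hang forever
        try {
            return uiWorkDone.getCount() == 0;
        } finally {
            rule.unlock();
            worker.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // prints "true"
    }
}
```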
Comment 13 Krzysztof Daniel CLA 2010-05-05 05:43:44 EDT
John is right. I have written a small test case showing that it is possible to lock the same rule twice using workspace.prepareOperation.

Let's compare the following four stack traces:

1. 
Worker-24: Waiting on condition (sleeps)
Worker-26: Blocked by main, waiting on lock, LastReferenceProvider#ReadJob-> ThreadJob.joinRun
main: Waiting on condition, Owns the lock, ThreadJob.joinRun

2. 
Worker-24: Blocked, StringPoolJob.run->ThreadJob.joinRun
Worker-26: Blocked by main, waiting on lock, ThreadJob.joinRun
main: Waiting on condition, Owns the lock, ThreadJob.joinRun

3.
Worker-24: Waiting on condition, Owns the lock, ThreadJob.joinRun
Worker-26: Waiting on condition, waiting on lock, ThreadJob.joinRun
main: Blocked by Worker-24, waiting on lock, ThreadJob.joinRun

4.
Worker-24: Blocked, ThreadJob.joinRun
Worker-26: Waiting on condition, Owns the lock, ThreadJob.joinRun
main: Blocked by Worker-26, waiting on lock, ThreadJob.joinRun

It looks like the thread that owns a lock is waiting, so the other threads cannot run. After the lock is released, another thread acquires it and starts waiting. Is it possible that the rule is being transferred between those threads, and that we are in fact affected by bug 283449? I can imagine a situation in which we always wake a thread that does not hold the rule, then transfer the rule, miss it, and keep spinning. Does that sound plausible?
Comment 14 John Arthorne CLA 2010-05-07 16:18:52 EDT
Krzysztof, what version of Eclipse are you running? The bug says 3.4.2, but you mention possibly being affected by bug 283449, which is a new feature that was added in Eclipse 3.6.
Comment 15 Krzysztof Daniel CLA 2010-05-10 02:55:08 EDT
I am using 3.4.2.

Bug 283449 description says: 
"[...] a *thread* can not transfer a rule to Job that is trying to acquire
it. [...] if a *different* Job wanted to transfer its rule to another
Job, it will not work."

I know it was fixed in 3.6. I am just wondering if it is the same issue.
Comment 16 Krzysztof Daniel CLA 2010-05-14 08:09:45 EDT
I have parsed the jobs trace to find which rules are locked but never unlocked. The results are:

main :
 18:26:35.320 L/software/src/com/C1.java
 18:30:21.617 L/software/src/com/C1.java
Worker-26 :
 18:26:26.758 L/software/src/com/C1.java
Worker-24 :
 18:31:20.758 R/

The first javacore had already been taken by the time Worker-24 tried to lock the workspace, so the deadlock had already happened.

The stack trace shows that main acquires the first lock successfully and proceeds with execution. In the meantime, Worker-26 tries to lock the file (and waits), and then main tries to lock it again (and waits). Is it possible that main cannot detect that it already holds the rule because another thread is waiting for it?
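Finding rules that are "locked but never unlocked" amounts to pairing begin/end events per thread and reporting what is left over. The sketch below assumes a simplified, hypothetical line format of `<time> <thread> begin|end <rule>`; the real jobs tracing output is formatted differently, so the parsing step would need adapting. The sample input in `main` is invented for illustration and is not taken from the attached trace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the input line format is an assumption, not the Eclipse jobs tracing format.
class UnreleasedRules {
    /** Returns, per thread, begin-rule events never matched by an end-rule event (oldest first). */
    static Map<String, List<String>> find(List<String> traceLines) {
        Map<String, Deque<String>> open = new LinkedHashMap<>();
        for (String line : traceLines) {
            String[] f = line.trim().split("\\s+", 4); // time, thread, action, rule
            if (f.length < 4) continue;               // ignore lines in any other format
            Deque<String> stack = open.computeIfAbsent(f[1], k -> new ArrayDeque<>());
            if (f[2].equals("begin")) {
                stack.push(f[0] + " " + f[3]);
            } else if (f[2].equals("end") && !stack.isEmpty()) {
                stack.pop();                          // beginRule/endRule pairs nest per thread
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        open.forEach((thread, stack) -> {
            if (!stack.isEmpty()) {
                List<String> held = new ArrayList<>(stack); // newest first
                Collections.reverse(held);                  // oldest first
                result.put(thread, held);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Invented sample input, for illustration only.
        List<String> sample = List.of(
            "10:00:00.000 main begin L/p/A.java",
            "10:00:01.000 main begin L/p/A.java",
            "10:00:02.000 main end L/p/A.java",
            "10:00:03.000 Worker-1 begin R/",
            "10:00:04.000 Worker-1 end R/");
        System.out.println(find(sample)); // prints "{main=[10:00:00.000 L/p/A.java]}"
    }
}
```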
Comment 17 Krzysztof Daniel CLA 2010-05-20 07:14:49 EDT
New stack traces gathered after reproducing the issue on plain Eclipse 3.6 M7 + Tomcat:


"Worker-20" prio=6 tid=0x077fe400 nid=0xfb4 in Object.wait() [0x0b38f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.eclipse.ui.internal.Semaphore.acquire(Semaphore.java:43)
        - locked <0x1bb35b90> (a org.eclipse.ui.internal.Semaphore)
        at org.eclipse.ui.internal.UISynchronizer.syncExec(UISynchronizer.java:168)
        at org.eclipse.swt.widgets.Display.syncExec(Display.java:4584)
        at org.eclipse.debug.internal.ui.sourcelookup.Prompter.handleStatus(Prompter.java:79)
        at org.eclipse.debug.core.model.LaunchConfigurationDelegate.saveBeforeLaunch(LaunchConfigurationDelegate.java:256)
        at org.eclipse.debug.core.model.LaunchConfigurationDelegate.preLaunchCheck(LaunchConfigurationDelegate.java:208)
        at org.eclipse.jdt.launching.AbstractJavaLaunchConfigurationDelegate.preLaunchCheck(AbstractJavaLaunchConfigurationDelegate.java:921)
        at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:808)
        at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:702)
        at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:695)
        at org.eclipse.wst.server.core.internal.Server.startImpl2(Server.java:3160)
        at org.eclipse.wst.server.core.internal.Server.startImpl(Server.java:3110)
        at org.eclipse.wst.server.core.internal.Server$StartJob.run(Server.java:358)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

"Worker-15" prio=6 tid=0x05038800 nid=0x12f0 runnable [0x0a78f000]
   java.lang.Thread.State: RUNNABLE
        at org.eclipse.ui.internal.progress.ProgressManager.internalGetJobInfo(ProgressManager.java:694)
        at org.eclipse.ui.internal.progress.ProgressManager$JobMonitor.isCanceled(ProgressManager.java:256)
        at org.eclipse.core.internal.jobs.ThreadJob.isCanceled(ThreadJob.java:146)
        at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:235)
        at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:199)
        at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:92)
        at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:285)
        at org.eclipse.core.internal.utils.StringPoolJob.run(StringPoolJob.java:99)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

"main" prio=6 tid=0x00a46800 nid=0x780 runnable [0x0012d000]
   java.lang.Thread.State: RUNNABLE
        at org.eclipse.ui.internal.UILockListener.isUI(UILockListener.java:183)
        at org.eclipse.ui.internal.UILockListener.aboutToWait(UILockListener.java:117)
        at org.eclipse.core.internal.jobs.LockManager.aboutToWait(LockManager.java:123)
        at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:258)
        at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:199)
        at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:92)
        at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:285)
        at org.eclipse.core.internal.resources.WorkManager.checkIn(WorkManager.java:117)
        at org.eclipse.core.internal.resources.Workspace.prepareOperation(Workspace.java:1914)
        at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:1970)
        at org.eclipse.ui.actions.WorkspaceModifyOperation.run(WorkspaceModifyOperation.java:118)
        - locked <0x12c40098> (a org.eclipse.ui.actions.WorkspaceModifyDelegatingOperation)
        at org.eclipse.ui.internal.editors.text.WorkspaceOperationRunner.run(WorkspaceOperationRunner.java:75)
        at org.eclipse.ui.internal.editors.text.WorkspaceOperationRunner.run(WorkspaceOperationRunner.java:65)
        at org.eclipse.ui.editors.text.TextFileDocumentProvider.executeOperation(TextFileDocumentProvider.java:456)
        at org.eclipse.ui.editors.text.TextFileDocumentProvider.saveDocument(TextFileDocumentProvider.java:772)
        at org.eclipse.ui.texteditor.AbstractTextEditor.performSave(AbstractTextEditor.java:4868)
        at org.eclipse.jdt.internal.ui.javaeditor.CompilationUnitEditor.performSave(CompilationUnitEditor.java:1230)
        at org.eclipse.jdt.internal.ui.javaeditor.CompilationUnitEditor.doSave(CompilationUnitEditor.java:1283)
        - locked <0x1b151a68> (a org.eclipse.jdt.internal.core.CompilationUnit)
        at org.eclipse.ui.texteditor.AbstractTextEditor$TextEditorSavable.doSave(AbstractTextEditor.java:6991)
        at org.eclipse.ui.Saveable.doSave(Saveable.java:214)
        at org.eclipse.ui.internal.SaveableHelper.doSaveModel(SaveableHelper.java:349)
        at org.eclipse.ui.internal.EditorManager$8.run(EditorManager.java:1239)
        at org.eclipse.ui.internal.SaveableHelper$5.run(SaveableHelper.java:277)
        at org.eclipse.jface.operation.ModalContext.runInCurrentThread(ModalContext.java:464)
        at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:372)
        at org.eclipse.jface.window.ApplicationWindow$1.run(ApplicationWindow.java:759)
        at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
        at org.eclipse.jface.window.ApplicationWindow.run(ApplicationWindow.java:756)
        at org.eclipse.ui.internal.WorkbenchWindow.run(WorkbenchWindow.java:2600)
        at org.eclipse.ui.internal.SaveableHelper.runProgressMonitorOperation(SaveableHelper.java:285)
        at org.eclipse.ui.internal.EditorManager.saveAll(EditorManager.java:1249)
        at org.eclipse.ui.internal.Workbench$19.run(Workbench.java:1174)
        at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
        at org.eclipse.ui.internal.Workbench.saveAllEditors(Workbench.java:1123)
        at org.eclipse.debug.internal.ui.DebugUIPlugin.saveAllEditors(DebugUIPlugin.java:720)
        at org.eclipse.debug.internal.ui.DebugUIPlugin.preLaunchSave(DebugUIPlugin.java:903)
        at org.eclipse.debug.internal.ui.launchConfigurations.SaveScopeResourcesHandler.handleStatus(SaveScopeResourcesHandler.java:205)
        at org.eclipse.debug.internal.ui.sourcelookup.Prompter$1.run(Prompter.java:70)
        at org.eclipse.ui.internal.UILockListener.doPendingWork(UILockListener.java:164)
        at org.eclipse.ui.internal.UILockListener.aboutToWait(UILockListener.java:126)
        at org.eclipse.core.internal.jobs.LockManager.aboutToWait(LockManager.java:123)
        at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:258)
        at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:199)
        at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:92)
        at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:285)
        at org.eclipse.core.internal.resources.WorkManager.checkIn(WorkManager.java:117)
        at org.eclipse.core.internal.resources.Workspace.prepareOperation(Workspace.java:1914)
        at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:1970)
        at org.eclipse.ui.actions.WorkspaceModifyOperation.run(WorkspaceModifyOperation.java:118)
        - locked <0x1b8a6348> (a org.eclipse.ui.actions.WorkspaceModifyDelegatingOperation)
        at org.eclipse.ui.internal.editors.text.WorkspaceOperationRunner.run(WorkspaceOperationRunner.java:75)
        at org.eclipse.ui.internal.editors.text.WorkspaceOperationRunner.run(WorkspaceOperationRunner.java:65)
        at org.eclipse.ui.editors.text.TextFileDocumentProvider.executeOperation(TextFileDocumentProvider.java:456)
        at org.eclipse.ui.editors.text.TextFileDocumentProvider.saveDocument(TextFileDocumentProvider.java:772)
        at org.eclipse.ui.texteditor.AbstractTextEditor.performSave(AbstractTextEditor.java:4868)
        at org.eclipse.jdt.internal.ui.javaeditor.CompilationUnitEditor.performSave(CompilationUnitEditor.java:1230)
        at org.eclipse.jdt.internal.ui.javaeditor.CompilationUnitEditor.doSave(CompilationUnitEditor.java:1283)
        - locked <0x1b151a68> (a org.eclipse.jdt.internal.core.CompilationUnit)
        at org.eclipse.ui.texteditor.AbstractTextEditor$TextEditorSavable.doSave(AbstractTextEditor.java:6991)
        at org.eclipse.ui.Saveable.doSave(Saveable.java:214)
        at org.eclipse.ui.internal.SaveableHelper.doSaveModel(SaveableHelper.java:349)
        at org.eclipse.ui.internal.SaveableHelper$3.run(SaveableHelper.java:195)
        at org.eclipse.ui.internal.SaveableHelper$5.run(SaveableHelper.java:277)
        at org.eclipse.jface.operation.ModalContext.runInCurrentThread(ModalContext.java:464)
        at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:372)
        at org.eclipse.jface.window.ApplicationWindow$1.run(ApplicationWindow.java:759)
        at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
        at org.eclipse.jface.window.ApplicationWindow.run(ApplicationWindow.java:756)
        at org.eclipse.ui.internal.WorkbenchWindow.run(WorkbenchWindow.java:2600)
        at org.eclipse.ui.internal.SaveableHelper.runProgressMonitorOperation(SaveableHelper.java:285)
        at org.eclipse.ui.internal.SaveableHelper.runProgressMonitorOperation(SaveableHelper.java:264)
        at org.eclipse.ui.internal.SaveableHelper.saveModels(SaveableHelper.java:207)
        at org.eclipse.ui.internal.SaveableHelper.savePart(SaveableHelper.java:144)
        at org.eclipse.ui.internal.EditorManager.savePart(EditorManager.java:1369)
        at org.eclipse.ui.internal.WorkbenchPage.savePart(WorkbenchPage.java:3334)
        at org.eclipse.ui.internal.WorkbenchPage.saveEditor(WorkbenchPage.java:3347)
        at org.eclipse.ui.internal.SaveAction.run(SaveAction.java:76)
        at org.eclipse.jface.action.Action.runWithEvent(Action.java:498)
        at org.eclipse.ui.commands.ActionHandler.execute(ActionHandler.java:185)
        at org.eclipse.ui.internal.handlers.LegacyHandlerWrapper.execute(LegacyHandlerWrapper.java:109)
        at org.eclipse.core.commands.Command.executeWithChecks(Command.java:476)
        at org.eclipse.core.commands.ParameterizedCommand.executeWithChecks(ParameterizedCommand.java:508)
        at org.eclipse.ui.internal.handlers.HandlerService.executeCommand(HandlerService.java:169)
        at org.eclipse.ui.internal.keys.WorkbenchKeyboard.executeCommand(WorkbenchKeyboard.java:468)
        at org.eclipse.ui.internal.keys.WorkbenchKeyboard.press(WorkbenchKeyboard.java:786)
        at org.eclipse.ui.internal.keys.WorkbenchKeyboard.processKeyEvent(WorkbenchKeyboard.java:885)
        at org.eclipse.ui.internal.keys.WorkbenchKeyboard.filterKeySequenceBindings(WorkbenchKeyboard.java:567)
        at org.eclipse.ui.internal.keys.WorkbenchKeyboard.access$3(WorkbenchKeyboard.java:508)
        at org.eclipse.ui.internal.keys.WorkbenchKeyboard$KeyDownFilter.handleEvent(WorkbenchKeyboard.java:123)
        at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
        at org.eclipse.swt.widgets.Display.filterEvent(Display.java:1253)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1051)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1076)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1061)
        at org.eclipse.swt.widgets.Widget.sendKeyEvent(Widget.java:1102)
        at org.eclipse.swt.widgets.Widget.sendKeyEvent(Widget.java:1098)
        at org.eclipse.swt.widgets.Widget.wmChar(Widget.java:1507)
        at org.eclipse.swt.widgets.Control.WM_CHAR(Control.java:4267)
        at org.eclipse.swt.widgets.Canvas.WM_CHAR(Canvas.java:345)
        at org.eclipse.swt.widgets.Control.windowProc(Control.java:4159)
        at org.eclipse.swt.widgets.Canvas.windowProc(Canvas.java:341)
        at org.eclipse.swt.widgets.Display.windowProc(Display.java:4873)
        at org.eclipse.swt.internal.win32.OS.DispatchMessageW(Native Method)
        at org.eclipse.swt.internal.win32.OS.DispatchMessage(OS.java:2459)
        at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3655)
        at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2601)
        at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2565)
        at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2399)
        at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:669)
        at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
        at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:662)
        at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
        at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:115)
        at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:619)
        at org.eclipse.equinox.launcher.Main.basicRun(Main.java:574)
        at org.eclipse.equinox.launcher.Main.run(Main.java:1407)
Comment 18 Krzysztof Daniel CLA 2010-05-20 07:59:29 EDT
Now it looks more reasonable.

Thread 20 owns the resource rule and tries to syncExec, while main keeps the UI thread occupied and waits for the resource rule.
Comment 19 Darin Wright CLA 2010-05-20 14:39:39 EDT
(In reply to comment #18)
> Now it looks more reasonable.
> Thread 20 owns the resource rule and tries to syncExec, while main keeps the UI
> thread occupied and waits for the resource rule.

I must be missing something... I can't see that Thread 20 owns the resource rule; it's just performing a launch and attempting a syncExec, so it will wait for the UI thread. It owns monitor 0x1bb35b90, but I don't see where the thread has obtained any resource locks with beginRule(...), etc.
Comment 20 Krzysztof Daniel CLA 2010-05-24 08:30:14 EDT
Results of the debugging session

0. The dialog shows that 'Starting Tomcat v6.0 Server at localhost' is running, while the workbench says that it is 'Saving modifications'.

1. Thread-7 is running a server StartJob with the rule P/Server. The rule is set inside the job constructor. The job is waiting inside UISynchronizer.syncExec.

2. Main, in JobManager#findReturnJob(ThreadJob), gets:
   Starting Tomcat v6.0 Server at localhost(366)  (Server$StartJob) - no conflict
   previous: Updating status for Tomcat v6.0 Server at localhost...(529) (Server$ResourceChangedJob) - actually conflicts with ThreadJob

So, Thread-7 is running StartJob, and it must have the P/Server rule locked, although I have no idea why this is not shown in the stack trace. This thread also tries to execute Display.syncExec and is waiting for the event to be processed.

On the other hand, Server$ResourceChangedJob is scheduled and does not have its thread field set. Moreover, it cannot run until Server$StartJob has finished. It is also blocking main from proceeding.
Comment 21 Krzysztof Daniel CLA 2010-05-24 08:34:19 EDT
Created attachment 169670 [details]
A temporal fix

The patch prevents reacting to changes while the server is restarting. Changes will not be noticed unless they happen after the server has been started. That's still better than a deadlock.
Comment 22 Krzysztof Daniel CLA 2010-05-24 08:45:46 EDT
Created attachment 169671 [details]
Patch that defers republishing until the server is started

I am not sure, however, whether it is really a good approach.
Comment 23 Krzysztof Daniel CLA 2010-05-25 04:10:24 EDT
Created attachment 169780 [details]
Patch for 3.0 stream
Comment 24 Krzysztof Daniel CLA 2010-05-25 04:11:17 EDT
Created attachment 169781 [details]
Plugin for testing (3.0.5 stream)
Comment 25 Angel Vera CLA 2010-05-27 11:52:10 EDT
I had a long, detailed explanation of my thoughts on this issue, but my browser crashed. So I will try a more condensed approach and see where it takes us.

From my observations (I could be wrong, so please correct me if I am):

1. I think this situation can arise for any job that holds a rule on a workspace resource and then calls launchConfig.launch. Thus other jobs, such as Server.PublishJob or Server.StopJob, can potentially have similar problems; the only reason they currently don't is that they don't need a launch. I bring this up because someone recently reported a similar hang with the PublishJob to me. There is no bugzilla for that problem, but I wonder whether this was an adopter making a similar syncExec call, or using some other scheduling rule that can make the code fall into this trap.

2. On our side, in Server.startImpl2(), we create a launchConfig and save it before calling launchConfig.launch, so why do they need to save? Is it possible for them to check whether something has changed, prompt to save if so, and otherwise continue normal execution?

3. During your debugging, did you check whether editor.save was trying to acquire a rule on the entire workspace? I saw some code that sets the workspace as the rule when only one file is being edited. That seems like a wide scope for a small change.

4. In comment #17, I see that "Worker-20" is inside

> org.eclipse.debug.internal.ui.sourcelookup.Prompter.handleStatus(Prompter.java:79)

and I also see "main" in: 

> org.eclipse.debug.internal.ui.launchConfigurations.SaveScopeResourcesHandler.handleStatus(SaveScopeResourcesHandler.java:205) at
> org.eclipse.debug.internal.ui.sourcelookup.Prompter$1.run(Prompter.java:70)

It is not clear to me whether this is what you meant in comment #18 by "now it looks more reasonable".
Comment 26 Angel Vera CLA 2010-05-27 17:15:39 EDT
I am re-routing this to the platform: although the patch resolves the problem, it does so only for the StartJob, and there are other places where the problem can happen, as described in bullet 1 of comment #25.
Comment 27 Krzysztof Daniel CLA 2010-05-28 05:06:11 EDT
I think you can safely ignore everything I wrote before comment 20 (except the stack traces); it was a rather long path to discovering what was actually happening.

The Server.StartJob holds the rule "P/Servers" and asks the user whether the file should be saved, because the file was dirty at the moment of the check. It executes Display.syncExec and waits until this method returns. The debug result from comment 16 says that this job also owns the rule "File", although I have no idea where it was obtained.

Unfortunately, the UI thread is busy at this moment because the user has pressed Ctrl-S. In response to the resource change, Server.PublishJob is scheduled with the rules "P/Servers, Module". The UI thread is not able to acquire the "File" rule to actually save the file. At this point the deadlock is bound to happen and nothing else can be done about it. A moment later the UI thread will try to process the remaining UI event from Server.StartJob, but it cannot succeed, and the UI hangs.

The only open question is how Server.StartJob can prevent the save from happening. The only explanation I have is that Server.StartJob blocks Server.PublishJob and is therefore seen as holding the rule (which is confirmed by the fact that my patch works).

Given that, the issue cannot happen for StopJob, because StopJob never acquires any rule other than "P/Servers". The PublishJob also looks safe: if it runs in auto mode, it does not prompt for save, and if it is invoked manually, it prompts for save before the job is scheduled.

For me, the root cause is this rule transfer. On the other hand, changing the order of jobs could result in very demanding jobs being postponed forever. This kind of deadlock should be detectable, but I do not think this is something that can be done for Eclipse 3.6.x.
Comment 28 Krzysztof Daniel CLA 2010-06-03 06:54:39 EDT
Created attachment 170930 [details]
Simplified test case

This test case illustrates my understanding of the situation.
Comment 29 Krzysztof Daniel CLA 2010-06-03 08:27:38 EDT
In my opinion, the test case proves that the issue cannot be fixed in the Eclipse Platform, because that would require changing the order of scheduled operations. This sounds tempting, but we would have to guarantee that the job is eventually executed, which may be difficult (how long can we postpone a job?), and it may give unpredictable results (we could, e.g., swap server start and publish).

Even if such a change turned out to be logically correct, it could not be released into Eclipse 3.6.

Right now I am fully convinced that my patch is a proper approach.
Comment 30 Darin Wright CLA 2010-06-03 11:05:30 EDT
Created attachment 170967 [details]
modified project (zip)
Comment 31 Darin Wright CLA 2010-06-03 11:10:50 EDT
I modified the example to avoid the deadlock by using a job in the syncExec rather than a workspace modify operation. Of course, that allows the syncExec to complete; if I try to join on "job3", I still get the deadlock.

John, I'm not sure why job3 is not allowed to proceed, since the running job1 has a lock on project1, and a queued job needs a lock on project1 & project2, but the job3 only needs a lock on project2. I would expect that job3 could proceed with a lock on project2?
Comment 32 Darin Wright CLA 2010-06-03 11:12:30 EDT
Created attachment 170969 [details]
deadlock example 2

This example still causes the deadlock, using all jobs rather than a workspace runnable.
Comment 33 John Arthorne CLA 2010-06-03 11:45:14 EDT
(In reply to comment #31)
> John, I'm not sure why job3 is not allowed to proceed, since the running job1
> has a lock on project1, and a queued job needs a lock on project1 & project2,
> but the job3 only needs a lock on project2. I would expect that job3 could
> proceed with a lock on project2?

From the javadoc of Job#schedule:

 * Jobs of equal priority and <code>delay</code> with conflicting scheduling 
 * rules are guaranteed to run in the order they are scheduled. 

If we allowed jobs to leapfrog ahead of others it would violate this principle. This guarantee is very important for clients because there might be an ordering dependency between two jobs (say for example a first job creates a resource and second one modifies it).

Like Krzysztof, I'm not seeing a platform bug here in either of these two examples.
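To make the quoted ordering guarantee concrete for the comment #31 scenario, here is a simplified, hypothetical model in Java. It is not JobManager's code: the class and method names are illustrative, and scheduling rules are reduced to sets of resource names that conflict when they overlap (real resource rules also conflict hierarchically, e.g. a project with a file inside it). A waiting job may start only if it conflicts with no running job and with no conflicting job scheduled ahead of it, which is why job3 stays blocked behind job2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the "no leapfrogging" rule quoted from Job#schedule. */
class RuleQueueModel {

    /** A simulated job: a name plus the resources its scheduling rule covers. */
    static final class SimJob {
        final String name;
        final Set<String> rule;

        SimJob(String name, String... resources) {
            this.name = name;
            this.rule = new HashSet<>(Arrays.asList(resources));
        }

        /** Simplification: two rules conflict when they share a resource name. */
        boolean conflictsWith(SimJob other) {
            return !Collections.disjoint(rule, other.rule);
        }
    }

    private final List<SimJob> running = new ArrayList<>();
    private final List<SimJob> waiting = new ArrayList<>(); // kept in schedule order

    void start(SimJob job) { running.add(job); }

    void enqueue(SimJob job) { waiting.add(job); }

    /** True if the waiting job could start now without breaking schedule order. */
    boolean canStart(SimJob job) {
        for (SimJob r : running) {
            if (r.conflictsWith(job)) return false;
        }
        for (SimJob w : waiting) {
            if (w == job) break;                // only jobs scheduled earlier matter
            if (w.conflictsWith(job)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RuleQueueModel m = new RuleQueueModel();
        SimJob job1 = new SimJob("job1", "project1");
        SimJob job2 = new SimJob("job2", "project1", "project2");
        SimJob job3 = new SimJob("job3", "project2");
        m.start(job1);
        m.enqueue(job2);
        m.enqueue(job3);
        System.out.println("job2 can start: " + m.canStart(job2)); // blocked by running job1
        System.out.println("job3 can start: " + m.canStart(job3)); // blocked by queued job2
    }
}
```

Letting job3 run here would be exactly the leapfrogging the javadoc rules out, since job2 was scheduled first and its rule overlaps job3's.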
Comment 34 Darin Wright CLA 2010-06-03 11:55:33 EDT
(In reply to comment #33)
>  * Jobs of equal priority and <code>delay</code> with conflicting scheduling 
>  * rules are guaranteed to run in the order they are scheduled. 

Thanks, that explains it. I think this bug should be moved back to WST.
Comment 35 Krzysztof Daniel CLA 2010-06-04 08:49:12 EDT
Moving to WebTools, as it is confirmed that this is not a platform bug.
Comment 36 Angel Vera CLA 2010-06-07 16:59:44 EDT
Christoper, 
I had a chance to sit down and understand the details of this bug today. Regarding comment #27: is there any way to find out which File the StartJob is locked on? That is not to say that identifying the file will resolve the problem, but I would like to understand all the details.
Comment 37 Krzysztof Daniel CLA 2010-06-08 01:41:37 EDT
All locks are either on modified file (save) or on a project containing the file (publish).
Comment 38 Krzysztof Daniel CLA 2010-06-08 11:36:52 EDT
Clear explanation of the cause:

Server is being started. P/Server is locked.
The user modifies the file and saves it. A resource change is generated, and as a result PublishJob is started. PublishJob will lock the whole parent project of the file.
Now if the save happens again, from a different thread, it cannot complete, because PublishJob is locking the parent project.
PublishJob cannot complete until StartJob completes. StartJob is waiting for the save, which is waiting for PublishJob.
The save happens twice: once triggered by the user, and once scheduled by debug.
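The chain described here is a wait-for cycle. As a hedged illustration only (the class, method, and party names are hypothetical and do not come from the Eclipse jobs API), this Java sketch records who waits on whom and follows the edges until it either runs out or revisits a party, which is the point at which nobody can proceed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical wait-for graph: each party is blocked on at most one other party. */
class WaitForGraph {
    private final Map<String, String> waitsFor = new HashMap<>();

    /** Records that waiter cannot proceed until holder does. */
    void addWait(String waiter, String holder) {
        waitsFor.put(waiter, holder);
    }

    /** Follows wait-for edges from start; returns the cycle, or an empty list if there is none. */
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null) {
            int seenAt = path.indexOf(current);
            if (seenAt >= 0) {
                return new ArrayList<>(path.subList(seenAt, path.size()));
            }
            path.add(current);
            current = waitsFor.get(current);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        g.addWait("StartJob", "debug save (UI thread)");   // syncExec while holding P/Servers
        g.addWait("debug save (UI thread)", "PublishJob"); // PublishJob locks the parent project
        g.addWait("PublishJob", "StartJob");               // PublishJob needs P/Servers
        System.out.println(g.findCycle("StartJob"));
    }
}
```

Running main prints [StartJob, debug save (UI thread), PublishJob]; removing any one edge, for example by not queuing PublishJob while the start is in progress, breaks the cycle.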

Patch justification:

In theory, it would be possible to patch debug to check, in the UI thread, whether the file is really dirty before triggering the save a second time. However, this does not solve the issue completely, because the user could modify the file again (WebSphere needs almost 5 minutes to start), and the scope of the changes would be unacceptable. That's why a patch for WTP has been proposed: it postpones scheduling PublishJob until the server start is done, so the save is no longer blocked.
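A minimal sketch of the deferral idea, assuming a single server and a simple "starting" state. DeferredPublisher and its methods are hypothetical names, not WTP's Server API, and the list of strings stands in for scheduling a PublishJob. Changes seen while the start is in progress only set a flag, so no job is queued for the project rule until the start has finished.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hold back publishing while the server is starting. */
class DeferredPublisher {
    private boolean starting;
    private boolean changedDuringStart;
    // Stands in for scheduling a PublishJob (which would take the project rule).
    private final List<String> scheduled = new ArrayList<>();

    synchronized void startBegins() {
        starting = true;
        changedDuringStart = false;
    }

    /** Called by the resource change listener for each changed resource. */
    synchronized void resourceChanged(String path) {
        if (starting) {
            changedDuringStart = true;           // remember it, but do not queue a job yet
        } else {
            scheduled.add("publish " + path);
        }
    }

    /** Called once the start has completed and the server rule is released. */
    synchronized void startDone() {
        starting = false;
        if (changedDuringStart) {
            changedDuringStart = false;
            scheduled.add("publish (deferred)"); // one publish covers all changes seen during start
        }
    }

    synchronized List<String> scheduledPublishes() {
        return new ArrayList<>(scheduled);
    }

    public static void main(String[] args) {
        DeferredPublisher p = new DeferredPublisher();
        p.startBegins();
        p.resourceChanged("P1/index.jsp");         // user saves while the server is starting
        p.startDone();
        System.out.println(p.scheduledPublishes()); // [publish (deferred)]
    }
}
```

Whether the deferred publish should be merged with any publish that runs after the start anyway is a separate design question that this sketch does not decide.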
Comment 39 Angel Vera CLA 2010-07-13 15:29:55 EDT
The suggested changes cause a second publish that we don't fully agree with, so this should be fixed on the Debug side by removing the saveAll that occurs, since that save is not required in this launch: nothing has been modified.

Also, is this related to Bugzilla #318996? If so, I will pass this to the platform.

With the patch applied, the scenario goes something like this (RCJ = ResourceChangeJob):

1) StartJob.run (server is now locked)
2) change happens, but start is not yet completed 
3) Operation change listener is created but waiting on StartJob.done to create the RCJ
4) server.publishAfterStart schedules a publish (waiting for server to be free, first in line for the Server lock)
5) StartJob.done
6) RCJ is now scheduled (but waiting on server to be free, second in line for the Server lock)
7) PublishJob.run
8) RCJ.run
9) publish occurs (second publish)

In the code as it is today, publishAfterStart will pick up the changes, because the ResourceChangeJob will be scheduled after the start operation but before the publish from publishAfterStart occurs.
Here is the sequence of events from the scenario without the suggested patch.

1) StartJob.run (server is now locked)
2) change happens, but start is not yet completed
3) RCJ is now scheduled (waiting for server to be free)
4) server.publishAfterStart schedules a publish (waiting for server to be free, second in line)
5) StartJob.done
6) RCJ.run (PublishJob is still waiting on server to be free)
7) RCJ.done
8) PublishJob.run
Comment 40 Darin Wright CLA 2010-08-09 17:08:47 EDT
I inserted breakpoints into the launching and editor saving code to simulate the stack trace shown in the original bug report comment. However, I was unable to force a deadlock. I still don't see a platform bug here.
Comment 41 Darin Wright CLA 2010-08-10 15:49:40 EDT
Created attachment 176274 [details]
patch for 3.7 or 3.6

Although I cannot reproduce the deadlock, this patch implements the desired behavior. When the user presses Run or Debug (or DebugUITools.launch(...) is called), the current set of dirty editors is cached and associated with the configuration being launched. Later, when the user is prompted to save editors, only the cached set of editors is considered for saving. This way only editors that were dirty at the time the launch was initiated are considered.
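A minimal sketch of the "remember which editors were dirty when the launch began" idea, assuming editors are identified by name; DirtyEditorSnapshots and its methods are hypothetical. Two choices here go beyond the description above and are labeled as such: the snapshot is keyed by an individual launch id rather than by the configuration, and at prompt time it is intersected with the editors that are still dirty.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of caching the dirty-editor set at launch time. */
class DirtyEditorSnapshots {
    // Keyed by an individual launch id (an assumption, not the configuration),
    // so two concurrent launches of one configuration keep separate snapshots.
    private final Map<Long, Set<String>> snapshots = new HashMap<>();
    private long nextLaunchId = 1;

    /** Called on the UI thread when Run/Debug is pressed. */
    synchronized long launchStarted(Set<String> dirtyEditorsNow) {
        long id = nextLaunchId++;
        snapshots.put(id, new HashSet<>(dirtyEditorsNow));
        return id;
    }

    /** Only editors dirty at launch time AND still dirty now are offered for saving. */
    synchronized Set<String> editorsToSave(long launchId, Set<String> dirtyEditorsNow) {
        Set<String> result = new HashSet<>(snapshots.getOrDefault(launchId, Collections.emptySet()));
        result.retainAll(dirtyEditorsNow);
        return result;
    }

    synchronized void launchFinished(long launchId) {
        snapshots.remove(launchId);
    }

    public static void main(String[] args) {
        DirtyEditorSnapshots s = new DirtyEditorSnapshots();
        long id = s.launchStarted(new HashSet<>(Arrays.asList("A.java", "B.java")));
        // A.java was closed or saved before the prompt; C.java became dirty later.
        System.out.println(s.editorsToSave(id, new HashSet<>(Arrays.asList("B.java", "C.java")))); // [B.java]
        s.launchFinished(id);
    }
}
```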
Comment 42 Darin Wright CLA 2010-08-10 15:50:55 EDT
Created attachment 176275 [details]
patch for 3.4.x
Comment 43 Dani Megert CLA 2010-08-13 07:14:57 EDT
The suggested patches are not good:
- If the same launch config is started several times but with a different
   set of currently open editors the last one will overwrite the cached
   info from the previous launches (which still run).

- The set of dirty files can change after being cached, which results in a
  wrong dialog and the wrong files being saved. As a test case, simply do this:
  0. add breakpoint to Prompter.handleStatus(IStatus, Object) line 79
  1. start workspace and open a file and make it dirty
  2. launch ==> breakpoint is hit in non-UI thread
  3. close the file (with or without saving)
  4. resume
  ==> dialog asks to save that file


Also, I can reproduce this very rarely with these steps:
1. breakpoint as above, set preference to save all without prompting
2. make file dirty and launch ==> breakpoint hit in non-UI thread
3. add breakpoints during save so that you end up in the progress monitor that
   drives the event loop
4. resume the first breakpoint
5. resume the other breakpoint
==> with bad luck, the event loop processes the Save All before saving the file ==> lock due to the same scheduling rule on the file.
Comment 44 Darin Wright CLA 2010-08-13 11:53:01 EDT
Created attachment 176566 [details]
alternate patch for 3.7

Here's an alternate patch for 3.7. This patch immediately saves dirty editors before invoking a launch based on user preferences. The launch delegate no longer triggers the save.

Backporting this one is a little trickier, as it introduces new API on LaunchConfigurationDelegate to make the project scope visible. The new method #getProjectScope(...) delegates to the existing protected method #getBuildOrder(...) in order to save the relevant projects. However, delegates may want to re-implement #getProjectScope(...) to be more efficient (since the order of projects is not important).
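A minimal sketch of the ordering this patch aims for, assuming the caller is the UI thread and a single-thread executor stands in for the background launch job; SaveThenLaunch and its methods are hypothetical and are not the LaunchConfigurationDelegate API. Because all saving finishes before the background work is handed off, the background part never needs a syncExec back to the UI thread while holding a rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: save on the calling (UI) thread, then launch in the background. */
class SaveThenLaunch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    /** Saves every dirty editor first, then hands the launch to the background worker. */
    void launch(List<String> dirtyEditors) {
        for (String editor : dirtyEditors) {
            events.add("save " + editor);   // no launch job holds a rule yet
        }
        worker.submit(() -> {
            events.add("launch");           // the background part never prompts or saves
        });
    }

    /** Waits for the background work to finish and returns the event log. */
    List<String> awaitAndGetEvents() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }

    public static void main(String[] args) {
        SaveThenLaunch l = new SaveThenLaunch();
        l.launch(Arrays.asList("A.java", "B.java"));
        System.out.println(l.awaitAndGetEvents()); // [save A.java, save B.java, launch]
    }
}
```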
Comment 45 Darin Wright CLA 2010-08-18 14:52:50 EDT
Dani and I discussed the above patch, but decided it was a risky change - it removes the save behavior from the ILaunchConfiguration#launch(...) API and relies on implementation details of launch delegates #getBuildOrder(...) to return a set of projects associated with the launch.
Comment 46 Darin Wright CLA 2010-08-18 14:55:52 EDT
Created attachment 176931 [details]
simpler patch

This is a simpler fix that keeps track of any editors saved between pressing the launch button (or an API call to DebugUITools.launch(...)) and the point when the debug platform prompts to save dirty editors. Any editors that have been saved by the user (or have initiated a save operation) via the "Save" or "Save All" commands are not considered when the debug platform saves; this avoids triggering a secondary save on an editor that is already being saved or has been saved by the user. This patch is for 3.7.
Comment 47 Darin Wright CLA 2010-08-18 15:08:52 EDT
Created attachment 176932 [details]
equivalent patch for 3.4.x

This is the same simpler patch for 3.4.x
Comment 48 Dani Megert CLA 2010-08-19 09:35:32 EDT
The general idea will work, but your patch is not good. You need to keep the list of saved editors per launch, otherwise this will end up in a mess (at least that's what I think from looking at the code). Even worse, 'fLaunchCounter' will never reach 0 again once a 'Build in Progress' prompt has been canceled. Also, since the execution listener is invoked for every command execution in the workbench, we should first check for the two command IDs before doing anything else, so as to have the least impact on performance.
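A minimal sketch of the three review points, assuming editors are identified by name and launches by a numeric id: bookkeeping is kept per launch in a map rather than in a shared counter, the entry is removed on every exit path, and the command listener returns immediately unless the command is Save or Save All. SavedEditorTracker and its methods are hypothetical; the two command ID strings are assumed to be the workbench's Save and Save All command IDs.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: per-launch bookkeeping plus a cheap command filter. */
class SavedEditorTracker {
    // Assumed to be the workbench's Save and Save All command IDs.
    static final String SAVE = "org.eclipse.ui.file.save";
    static final String SAVE_ALL = "org.eclipse.ui.file.saveAll";

    // One set of user-saved editors per active launch, instead of a shared counter.
    private final Map<Long, Set<String>> savedByLaunch = new HashMap<>();

    synchronized void launchBegins(long launchId) {
        savedByLaunch.put(launchId, new HashSet<>());
    }

    /** Called for every command executed in the workbench. */
    void commandExecuted(String commandId, Collection<String> editorsBeingSaved) {
        // Check the command ID first so unrelated commands cost almost nothing.
        if (!SAVE.equals(commandId) && !SAVE_ALL.equals(commandId)) {
            return;
        }
        synchronized (this) {
            for (Set<String> saved : savedByLaunch.values()) {
                saved.addAll(editorsBeingSaved);
            }
        }
    }

    /** Dirty editors that the debug platform should still save for this launch. */
    synchronized Set<String> editorsToSave(long launchId, Set<String> dirtyNow) {
        Set<String> result = new HashSet<>(dirtyNow);
        result.removeAll(savedByLaunch.getOrDefault(launchId, Collections.emptySet()));
        return result;
    }

    /** Must be called on every exit path, including a cancelled "Build in Progress" prompt. */
    synchronized void launchEnds(long launchId) {
        savedByLaunch.remove(launchId);
    }

    public static void main(String[] args) {
        SavedEditorTracker t = new SavedEditorTracker();
        t.launchBegins(1L);
        t.commandExecuted("some.other.command", Arrays.asList("B.java")); // ignored
        t.commandExecuted(SAVE, Arrays.asList("A.java"));
        System.out.println(t.editorsToSave(1L, new HashSet<>(Arrays.asList("A.java", "B.java"))));
        t.launchEnds(1L);
    }
}
```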
Comment 49 Darin Wright CLA 2010-08-19 12:59:22 EDT
I attempted to fabricate the problem by writing a bundle that has the following:

* A "Start Server" job that runs with a SERVER rule.
* The start server job creates a launch configuration and calls launch()
* A resource listener starts a "Publish" job whenever a file is modified. The Publish job is scheduled with the associated PROJECT and SERVER rules.
* The launch method delays for 5 seconds before the user is prompted to save resources, such that they can dirty/save a file during this time.

I found that this does cause a deadlock, but that the "User Operation is Waiting" dialog appears and the save (user operation) can be cancelled, allowing the launch to proceed. Since both the user save operation and the debug save operation occur in the UI thread, only *one* save can occur at a time.

The deadlock only happens when there are two dirty editors and the user saves one, and the launch saves the other. The second save triggers the problem (but it has to be saving a different editor, since the first was already saved).

The approach we were taking in the previous patches does not address this problem. The patch avoided saving the same editor twice, but I don't see how that could ever happen, since the saves are serial in the UI thread. As well, the patch relied on the client using the Run/Debug buttons, but in this case the client is calling ILaunchConfiguration.launch(...) itself.

My question is - does the user see the "User Operation is Waiting" dialog in this case? It has never been mentioned. I will attach my example bundle as well.
Comment 50 Darin Wright CLA 2010-08-19 12:59:57 EDT
Created attachment 177025 [details]
screen shot of "Waiting" dialog
Comment 51 Darin Wright CLA 2010-08-19 13:01:44 EDT
Created attachment 177026 [details]
project that can be imported to reproduce the problem
Comment 52 Darin Wright CLA 2010-08-19 13:31:03 EDT
My conclusion: I do not see how debug can be triggering a save for something that was already saved. Saving occurs in the UI thread serially (user and then debug platform). The debug platform's save cannot proceed until the previous (user) save has completed and the UI thread is free. However, debug can trigger a save of another resource in the same project if there are two dirty editors, and this does cause a deadlock (which can be averted by cancelling the save in the dialog that pops up).

The problem is that the start server job holds SERVER lock, while needing a PROJECT lock (to save a file). The publish job holds the PROJECT lock and needs the SERVER lock.

I do not see a way for the platform to fix this. Perhaps the publish job's locking is too aggressive? Could the publish job obtain the server lock before locking the project? This would allow resource modifications to proceed until the server is free.
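A minimal sketch of the lock-ordering suggestion, assuming plain ReentrantLocks stand in for the SERVER and PROJECT scheduling rules and ignoring the UI thread entirely; LockOrderingDemo is hypothetical. The deadlock above needs one party holding SERVER while waiting for PROJECT and another holding PROJECT while waiting for SERVER; if every task acquires SERVER first, that cycle cannot form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: every task takes SERVER before PROJECT, so no wait cycle can form. */
class LockOrderingDemo {
    static final ReentrantLock SERVER = new ReentrantLock();  // stands in for the server rule
    static final ReentrantLock PROJECT = new ReentrantLock(); // stands in for the project rule

    /** Acquires both locks in the single agreed order, runs the body, releases in reverse. */
    static void withServerThenProject(Runnable body) {
        SERVER.lock();
        try {
            PROJECT.lock();
            try {
                body.run();
            } finally {
                PROJECT.unlock();
            }
        } finally {
            SERVER.unlock();
        }
    }

    /** Runs a "start" task and a "publish" task concurrently and returns what completed. */
    static List<String> runBoth() {
        List<String> done = Collections.synchronizedList(new ArrayList<>());
        Thread start = new Thread(() -> withServerThenProject(() -> done.add("start: file saved")));
        Thread publish = new Thread(() -> withServerThenProject(() -> done.add("publish")));
        start.start();
        publish.start();
        try {
            start.join(5000);
            publish.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (done) {
            return new ArrayList<>(done);
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth().size() + " of 2 tasks completed");
    }
}
```

The trade-off is the one raised above: the publish job then waits for the server before touching the project, so resource modifications are not blocked while the server is busy.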
Comment 53 Darin Wright CLA 2010-08-19 13:39:10 EDT
For those interested, here are the steps for using the sample bundle to simulate a deadlock:

* Import the zip as an existing project, launch target workbench
* Have (or create) a project with 2 files in it (the files must be in the same project)
* Open an editor on each file
* Make each file dirty, but don't save them
* Trigger the launch by using the "Sample Menu > Sample Action"
* The message "LAUNCHING - SAVE A FILE NOW" will appear in the host console
* Use Ctrl-S to save one of the dirty editors (you have 5 seconds to do this)
* You will be prompted to save the next (or it will be saved automatically depending on preferences). Save it when (if) prompted
* "User Operation is Waiting" dialog appears
Comment 54 Krzysztof Daniel CLA 2010-08-19 13:54:22 EDT
The root cause of this bug is bug 318996, which in my opinion is not fixable in 3.x stream, and should be accepted as an API limitation. 

Only hacks can improve Eclipse's behavior. A proper patch for debug or WTP may not exist, because the problem has been lying in the UI since Eclipse 2.0. We can only choose which component we want to hack. The fix for WTP seems to be much safer.
Comment 55 Dani Megert CLA 2010-08-20 07:02:28 EDT
Darin, I only quickly looked at the code and that will not produce a deadlock. I'm pretty sure you can stop one of the jobs in the waiting dialog and things go on.

Christopher, in the steps you mention in comment 0, did you try to stop one of the jobs in the dialog? If you are not sure, can you please retry it. In case it is a deadlock, when does it happen for your customer? Does it happen randomly or only when the 'Save' action is invoked?
Comment 56 Krzysztof Daniel CLA 2010-08-20 09:33:18 EDT
This is not a deadlock, but a busy wait in ThreadJob.joinRun.
I am on vacation right now and I do not have access to the test machine, but I am pretty sure that the job will not get a chance to check if it was cancelled.
Comment 57 Dani Megert CLA 2010-08-20 09:36:01 EDT
>This is not a deadlock,
What do you mean by "this", Darin's example or this bug (which says "Deadlock...")?
Comment 58 Darin Wright CLA 2010-08-20 09:42:40 EDT
(In reply to comment #55)
> Darin, I only quickly looked at the code and that will not produce a deadlock.
> I'm pretty sure you can stop one of the jobs in the waiting dialog and things
> go on.

Correct - you can cancel the save operation and things will proceed.
Comment 59 Krzysztof Daniel CLA 2010-09-07 03:56:20 EDT
(In reply to comment #57)
OK, I am back; time to work.

Darin's example indeed produces a stacktrace very similar to the original report from comment 1. One file is also sufficient for reproducing the issue, although the modification-save cycle must be repeated twice.
When the 'The user operation is waiting for "publish" to complete' dialog is open, the busy cursor is also shown after a fraction of a second, and therefore I could not terminate any save job.

My understanding of the issue is:

The first thread processes one job and locks a certain resource. A second thread requests access to that resource. But the first thread stops processing the first job and starts processing a third job, which also needs access to the same resource.

The first job cannot be processed anymore (although it owns the lock), because the thread is busy with the third job. The third job cannot be executed before the second job, which in turn waits for the resource owned by the first thread.

The same scenario, rewritten with Save, Server and ResourceChangeJob (rcj).

(1) StartServer notices the unsaved editor. It tries to save it using syncExec. It waits until the UI thread processes the event.
(2) At that time the UI thread is busy, because the user invoked 'Save'. The 'Save' completes, and the RCJ is scheduled.
(3) The UI thread tries to proceed with the save from StartServer. It cannot, because the RCJ is waiting for the resource and for StartServer to complete.

The root cause is that Eclipse somehow allowed one thread to stop executing one job and execute another one. This is not something that can be worked around in the debug component, because even if you check whether the editor is saved, the user can modify and save the file many times and leave it dirty. The RCJ will be scheduled, and the StartJob will not be able to save the dirty file.
Comment 60 Darin Wright CLA 2010-09-07 09:40:56 EDT
Moving back to WST
Comment 61 Angel Vera CLA 2010-10-05 17:21:25 EDT
Created attachment 180289 [details]
v1.0

After several offline talks with the parties, we have decided on a possible hack in the servertools code to improve the usability of the product.

I am attaching my suggested changes and marking a few of the patches attached as obsolete as they don't apply anymore.
Comment 62 Angel Vera CLA 2010-10-05 17:22:38 EDT
Daniel or Darin, 

Will this problem be fixed in the platform for 3.6x?
Comment 63 Darin Wright CLA 2010-10-05 21:47:42 EDT
(In reply to comment #62)
> Daniel or Darin, 
> Will this problem be fixed in the platform for 3.6x?

There is currently no plan item to fix this.

To fix this in the platform would require new API to properly separate saving and launching. Clients would need to migrate to the new API to get new behavior. As new API is required, this would not be a candidate for a maintenance release (i.e. 3.6.x).
Comment 64 Dani Megert CLA 2010-10-06 02:40:07 EDT
(In reply to comment #63)
> (In reply to comment #62)
> > Daniel or Darin, 
> > Will this problem be fixed in the platform for 3.6x?
> 
> There is currently no plan item to fix this.
> 
> To fix this in the platform would require new API to properly separate saving
> and launching. Clients would need to migrate to the new API to get new
> behavior. As new API is required, this would not be a candidate for a
> maintenance release (i.e. 3.6.x).

I agree with Darin: this is out of scope for 3.6.x and there are currently no plans to work on this for 3.7.
Comment 65 Carl Anderson CLA 2010-10-06 12:25:40 EDT
Committed patch v1.0 to R3_0_5_patches