Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 318996

Summary: Detect deadlock that involves syncExec
Product: [Eclipse Project] Platform Reporter: Krzysztof Daniel <krzysztof.daniel>
Component: UIAssignee: Platform UI Triaged <platform-ui-triaged>
Status: CLOSED WONTFIX QA Contact:
Severity: normal    
Priority: P3 CC: daniel_megert, jamesblackburn+eclipse, john.arthorne, kazm, krzysztof.daniel, remy.suen, sptaszkiewicz, tjbishop
Version: 3.4.2   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard: stalebug
Attachments:
Description Flags
Minimal test case none

Description Krzysztof Daniel CLA 2010-07-06 08:07:00 EDT
The scenario is the same:

(a) UIJob (1) tries to lock the rule. 
(b) Another job (2) tries to lock the same rule.
(c) EventLoopProgressMonitor.isCanceled is called (a part of UIJob processing). As a result, another UIJob (3) may try to lock the same rule.

The job (3) cannot complete, because (2) tried to lock the rule earlier.
(2) cannot complete, because (1) is running.
(1) cannot complete, because it is waiting for (3) to complete. 

This deadlock is undetectable by taking regular javacores.
Comment 1 Krzysztof Daniel CLA 2010-07-12 04:41:44 EDT
Fixing this bug will not be trivial. 
EventLoopProgressMonitor was introduced in order to improve the user experience during file save, which happens in the ui thread.

I believe that there are only two possible approaches here:
* change the way in which files are written (entirely get rid off the EventLoopProgressMonitor)
* detect the deadlock and find a way to proceeding.
Comment 2 James Blackburn CLA 2010-09-29 14:14:26 EDT
See also Bug 262032 and Bug 312179
Comment 3 Prakash Rangaraj CLA 2011-03-23 07:07:09 EDT
Not for this release. Will revisit later.
Comment 4 Szymon Ptaszkiewicz CLA 2013-03-22 12:17:02 EDT
Maybe I'm missing the point here, but how do you actually get UIJob with EventLoopProgressMonitor?
Comment 5 Krzysztof Daniel CLA 2013-03-22 12:54:28 EDT
Szymon, 

this bugs is 3 years old and it had some PMR attached to it ;-).

UIJob got into dialog when save was taking more time then expected f.e. a build is running or save is on filesystem... I don't remember exactly :-(, but https://www.cct.lsu.edu/~rguidry/eclipse-doc36/org/eclipse/ui/internal/dialogs/EventLoopProgressMonitor.html javadoc says that "Although save operations should be written to do the work in the non-UI thread, this was not done for 1.0, so this was added to keep the UI live (including allowing the cancel button to work). ".
Comment 6 Szymon Ptaszkiewicz CLA 2013-03-22 13:17:31 EDT
(In reply to comment #5)
> Szymon, 
> 
> this bugs is 3 years old and it had some PMR attached to it ;-).
> 
> UIJob got into dialog when save was taking more time then expected f.e. a
> build is running or save is on filesystem... I don't remember exactly :-(,
> but
> https://www.cct.lsu.edu/~rguidry/eclipse-doc36/org/eclipse/ui/internal/
> dialogs/EventLoopProgressMonitor.html javadoc says that "Although save
> operations should be written to do the work in the non-UI thread, this was
> not done for 1.0, so this was added to keep the UI live (including allowing
> the cancel button to work). ".

Yes, but that means deadlock can occur only during save not during any UIJob as stated in comment 0.
Comment 7 Szymon Ptaszkiewicz CLA 2013-03-22 14:54:16 EDT
It would be easier if you provided steps and/or stack traces where you see/saw deadlocks. It is possible that it is not related to EventLoopProgressMonitor and save operation at all. There are a few bugs (bug 312179, bug 269121, bug 311021) which describe more general problems related to acquiring scheduling rules inside syncExec or asyncExec. This bug may be a dup of one of these bugs.
Comment 8 Krzysztof Daniel CLA 2013-03-25 07:19:38 EDT
This bug is a generic bug describing a problem discovered first in WTP and fixed there with a hack (bug 309781). Comments 30 & 59 contains some analysis, 61 & 63 acknowledge the generic problem.

There are even some steps how to reproduce it (comment 1).

Your move!
Comment 9 Szymon Ptaszkiewicz CLA 2013-03-27 09:23:20 EDT
(In reply to comment #8)
> This bug is a generic bug describing a problem discovered first in WTP and
> fixed there with a hack (bug 309781). Comments 30 & 59 contains some
> analysis, 61 & 63 acknowledge the generic problem.
> 
> There are even some steps how to reproduce it (comment 1).
> 
> Your move!

If you had provided this information when reporting the bug, I wouldn't have had to ask all these questions ;-)

I have downloaded deadlock example project from bug 309781 comment 51 and I was able to reproduce the deadlock after repeating modify-save cycle twice on one file (thanks for the hint in bug 309781 comment 59). This deadlock doesn't happen if you refactor the code to use beginRule/endRule inside Job.run instead of setRule before scheduling the job. There are two jobs to be changed in the sample project: StartServer job and the publish job inside dead.lock.DeadlockLaunchDelegate.Visitor.visit(IResourceDelta). Deadlock does not happen with these changes.

If the sample project from bug 309781 comment 51 serves as a reproduction scenario for this bug, please try to apply the above changes and verify if it solves the deadlock. The project is of course correct so the general deadlock problem still exists, but I would say it is no longer critical as there is an easy workaround to refactor the code to get deadlock-free functionally equivalent code.
Comment 10 Szymon Ptaszkiewicz CLA 2013-03-28 13:06:15 EDT
Created attachment 229161 [details]
Minimal test case

There are two tests here: testBug_2_deadlock mimics behaviour of the project from bug 309781 comment 51, testBug_1_ok is a modified version of testBug_2_deadlock that avoids using Job.setRule method. You can see that running testBug_2_deadlock gives the same deadlock dialog as shown in bug 309781 comment 50.
Comment 11 Szymon Ptaszkiewicz CLA 2013-03-28 13:10:21 EDT
As John said in bug 309781 comment 33, the order of scheduled jobs must be preserved, so I'd say that deadlock in testBug_2_deadlock is expected since there is no way to proceed. Therefore, this could be considered as a bug to write such code and expect it be deadlock-free.

Unless your bug description in comment 0 is about something different than project from bug 309781 comment 51, I would opt for changing this from critical to enhancement since deadlock is expected not because of EventLoopProgressMonitor.isCanceled but because of unsolvable dependency cycle between jobs involved as shown in the test.
Comment 12 Krzysztof Daniel CLA 2013-03-29 03:25:05 EDT
(In reply to comment #11)
> As John said in bug 309781 comment 33, the order of scheduled jobs must be
> preserved, so I'd say that deadlock in testBug_2_deadlock is expected since
> there is no way to proceed.

Sure the deadlock is expected in this situation, but that's the point! A job should not be allowed to be paused to process other jobs in a synchronous way, and this is exactly what EventLoopProgressMonitor is doing.
Comment 13 Szymon Ptaszkiewicz CLA 2013-03-29 04:03:15 EDT
(In reply to comment #12)
> Sure the deadlock is expected in this situation, but that's the point! A job
> should not be allowed to be paused to process other jobs in a synchronous
> way, and this is exactly what EventLoopProgressMonitor is doing.

Not true. This has nothing to do with EventLoopProgressMonitor. You can modify ProgressManager.getDefaultMonitor() to return NullProgressMonitor instead of EventLoopProgressMonitor and the same deadlock will occur. The only difference is that Eclipse will not be responsive.
Comment 14 Szymon Ptaszkiewicz CLA 2013-03-29 04:10:07 EDT
The "thing" that you are referring to is a tandem of UILockListener and UISynchronizer which serves as a bridge between SWT and Jobs framework.
Comment 15 Szymon Ptaszkiewicz CLA 2013-03-29 07:28:00 EDT
Changing bug summary to something more accurate. This bug can serve as a UI extension for our deadlock detection story (bug 296562). I will take a look if there is anything that we can do here.
Comment 16 Eclipse Genie CLA 2020-04-12 15:43:27 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug.

If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.