Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 293995

Summary: [Widgets] Deadlock while UI thread displaying/computing a change proposal and non-UI thread creating image
Product: [Eclipse Project] Platform Reporter: Stephan Herrmann <stephan.herrmann>
Component: SWTAssignee: Platform-SWT-Inbox <platform-swt-inbox>
Status: CLOSED WONTFIX QA Contact:
Severity: major    
Priority: P3 CC: carsten.pfeiffer, daniel_megert, ericwill, irbull, markus.kell.r, pinnamur, Silenio_Quarti, steffen.pingel
Version: 3.6Keywords: triaged
Target Milestone: ---   
Hardware: PC   
OS: Linux-GTK   
Whiteboard: stalebug
Bug Depends on:    
Bug Blocks: 312189    
Attachments:
Description Flags
Workaround for JDT/UI
none
Patch using SWT internals to release lock none

Description Stephan Herrmann CLA 2009-11-02 16:35:27 EST
3.6 M3

During display of completion proposals my workbench just deadlocked
with in the below threads. This happens due to the fix for bug 195834:

Thread "main" waits for the lock on ChangeCorrectionProposal.getChange(),
however, thread "Worker-52" which owns the lock needs the UI thread
to create an image.

Obviously, any stack frame beginning with org.objectteams is not part
of the JDT ;-)

So if the contract of getChange() should actually be specified to 
NEVER access the UI thread, than my plugin needs to be changed 
(the change is pretty straight forward in this case: precompute images).

However, I figured the same problem might occur in JDT proposals, too?

FYI: the code that triggers Image creation simply tries to create
a LinkedProposalPositionGroup, invoking addProposal(String,Image,int)



"Worker-52" prio=10 tid=0x0af39000 nid=0x679d in Object.wait() [0xae272000..0xae272e74]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x58ed9bc0> (a org.eclipse.swt.internal.Lock)
        at java.lang.Object.wait(Object.java:502)
        at org.eclipse.swt.internal.Lock.lock(Lock.java:34)
        - locked <0x58ed9bc0> (a org.eclipse.swt.internal.Lock)
        at org.eclipse.swt.internal.gtk.OS.g_utf16_to_utf8(OS.java:2874)
        at org.eclipse.swt.internal.Converter.wcsToMbcs(Converter.java:63)
        at org.eclipse.swt.graphics.Image.initNative(Image.java:541)
        at org.eclipse.swt.graphics.Image.<init>(Image.java:531)
        at org.eclipse.jface.resource.URLImageDescriptor.createImage(URLImageDescriptor.java:162)
        at org.eclipse.jface.resource.ImageDescriptor.createImage(ImageDescriptor.java:227)
        at org.eclipse.jface.resource.ImageDescriptor.createImage(ImageDescriptor.java:205)
        at org.eclipse.jdt.internal.ui.viewsupport.ImageDescriptorRegistry.get(ImageDescriptorRegistry.java:70)
        at org.objectteams.otdt.internal.ui.util.Images.getImage(Images.java:29)
        at org.objectteams.otdt.ui.assist.CompletionAdaptor$__OT__CalloutCompletionProposal.setupRewrite(CalloutCompletionProposal.java:158)
        at org.objectteams.otdt.ui.assist.CompletionAdaptor$__OT__MethodMappingCompletionProposal.getRewrite(MethodMappingCompletionProposal.java:132)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.ASTRewriteCorrectionProposal.addEdits(ASTRewriteCorrectionProposal.java:93)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.CUCorrectionProposal.createTextChange(CUCorrectionProposal.java:380)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.CUCorrectionProposal.createChange(CUCorrectionProposal.java:389)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.ChangeCorrectionProposal.getChange(ChangeCorrectionProposal.java:301)
        - locked <0x6005e450> (a org.objectteams.otdt.ui.assist.CompletionAdaptor$__OT__CalloutCompletionProposal)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.CUCorrectionProposal.getTextChange(CUCorrectionProposal.java:399)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.CUCorrectionProposal.getAdditionalProposalInfo(CUCorrectionProposal.java:151)
        at org.eclipse.jface.text.contentassist.AdditionalInfoController$3.run(AdditionalInfoController.java:106)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)



"main" prio=10 tid=0x097e5400 nid=0x4c76 waiting for monitor entry [0xbf9b4000..0xbf9b61b0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.ChangeCorrectionProposal.getChange(ChangeCorrectionProposal.java:300)
        - waiting to lock <0x6005e450> (a org.objectteams.otdt.ui.assist.CompletionAdaptor$__OT__CalloutCompletionProposal)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.ChangeCorrectionProposal.performChange(ChangeCorrectionProposal.java:119)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.CUCorrectionProposal.performChange(CUCorrectionProposal.java:323)
        at org.eclipse.jdt.internal.ui.text.correction.proposals.CUCorrectionProposal.apply(CUCorrectionProposal.java:301)
        at org.eclipse.jface.text.contentassist.CompletionProposalPopup.insertProposal(CompletionProposalPopup.java:933)
        at org.eclipse.jface.text.contentassist.CompletionProposalPopup.insertSelectedProposalWithMask(CompletionProposalPopup.java:879)
        at org.eclipse.jface.text.contentassist.CompletionProposalPopup.verifyKey(CompletionProposalPopup.java:1306)
        at org.eclipse.jface.text.contentassist.ContentAssistant$InternalListener.verifyKey(ContentAssistant.java:806)
        at org.eclipse.jface.text.TextViewer$VerifyKeyListenersManager.verifyKey(TextViewer.java:489)
        at org.eclipse.swt.custom.StyledTextListener.handleEvent(StyledTextListener.java:63)
        at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1193)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1217)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1202)
        at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1042)
        at org.eclipse.swt.custom.StyledText.handleKeyDown(StyledText.java:5899)
        at org.eclipse.swt.custom.StyledText$7.handleEvent(StyledText.java:5598)
        at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1193)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1217)
        at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1202)
        at org.eclipse.swt.widgets.Widget.sendKeyEvent(Widget.java:1229)
        at org.eclipse.swt.widgets.Widget.gtk_key_press_event(Widget.java:715)
        at org.eclipse.swt.widgets.Control.gtk_key_press_event(Control.java:2817)
        at org.eclipse.swt.widgets.Composite.gtk_key_press_event(Composite.java:703)
        at org.eclipse.swt.widgets.Widget.windowProc(Widget.java:1664)
        at org.eclipse.swt.widgets.Control.windowProc(Control.java:4566)
        at org.eclipse.swt.widgets.Display.windowProc(Display.java:4206)
        at org.eclipse.swt.internal.gtk.OS._gtk_main_do_event(Native Method)
        at org.eclipse.swt.internal.gtk.OS.gtk_main_do_event(OS.java:7643)
        at org.eclipse.swt.widgets.Display.eventProc(Display.java:1204)
        at org.eclipse.swt.internal.gtk.OS._g_main_context_iteration(Native Method)
        at org.eclipse.swt.internal.gtk.OS.g_main_context_iteration(OS.java:1871)
        at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3110)
        at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2404)
Comment 1 Dani Megert CLA 2009-11-03 04:51:03 EST
For now simply use 'null' as the image.

This looks like an SWT issue where we wait on a UI lock in the non-UI thread. I think creating an image in the non-UI thread should be OK and not block.

SWT team: maybe the lock is hold because we call Shell.setEnabled(false), but again, it should still be able to create a new image in a non-UI thread.
Comment 2 Markus Keller CLA 2009-11-03 05:42:56 EST
> SWT team: maybe the lock is hold because we call Shell.setEnabled(false), but
> again, it should still be able to create a new image in a non-UI thread.

That was the other change for bug 195834, but on closer look, I don't think that's the issue here. The problem is really that the main loop (OS.g_main_context_iteration(..)) takes the global lock, and the constructor of Image assumes the lock will become available eventually. But the main loop is now blocked since we had to make ChangeCorrectionProposal.getChange() synchronized.
Comment 3 Markus Keller CLA 2010-03-29 05:07:02 EDT
This is actually quite a severe problem. We only didn't see this earlier because the images we use in JDT quick fix popups are pre-initialized in the image registry.

The general pattern is:
- main thread holds org.eclipse.swt.internal.Platform.lock
- non-UI thread holds a monitor
- main thread tries to acquire the same monitor (but can't, so it waits)
- non-UI thread calls a constructor of Image, which tries to acquire Platform.lock (but can't, so it waits)
=> deadlock

This problem can only be prevented at the client side if we stipulate that new Image(..) may only be called in a non-UI thread if the thread does not hold any monitors that could also be requested in the UI thread. But that would be quite hard to guarantee in existing code. In the end, it would not allow the use of Java object monitors in any case where anywhere in the execution stack, an image is created in a non-UI thread.
Comment 4 Silenio Quarti CLA 2010-03-29 14:00:26 EDT
I do not think there is a fix for this in the SWT side.  I believe the only option here is for client code to create the image in the ui-thread or to avoid doing synchronization between the UI thread and other threads.
Comment 5 Markus Keller CLA 2010-04-14 14:32:05 EDT
*** Bug 309182 has been marked as a duplicate of this bug. ***
Comment 6 Markus Keller CLA 2010-04-14 14:37:16 EDT
*** Bug 302079 has been marked as a duplicate of this bug. ***
Comment 7 Markus Keller CLA 2010-04-14 15:07:26 EDT
I thought this wouldn't happen in the SDK, but bug 309182 proved that we need to fix this for 3.6. The trace in that dump shows an example where we compute images on demand. Pre-caching all the images would be possible for that case, but I doubt it's the last one. Bug 276438 also looks like the same problem.

The last call on this issue was WONTFIX in bug 236149. If that's going to stay the answer, then this must be documented in Image and needs an entry in the Readme. Even then, clients won't be able to fix this everywhere, since that breaking change ripples through a huge number of APIs (basically every method is affected that could create an Image and take a lock *anywhere in the call stack*).

So at least the constructors of Image must be protected such that they detect image creations in a non-UI thread and throw an illegal thread access exception instead of deadlocking.
Comment 8 Silenio Quarti CLA 2010-04-15 12:17:00 EDT
We cannot start throwing an exception in the image constructors. This is a breaking API change. Applications that do not perfom synchronization between the UI thread and the thread creating the image are still valid (and work).

We could add warnings in the java doc saying that causing the UI thread to wait in a look may cause deadlocks.
Comment 9 Markus Keller CLA 2010-04-25 10:48:38 EDT
Created attachment 166018 [details]
Workaround for JDT/UI

I proposed to detect the problematic cases and only throw an exception in those cases, but I guess that's not possible: Thread.holdsLock(..) could be used to detect whether the UI thread holds the Platform.lock, but there's no way to detect (e.g. in the constructor of Image) whether a given thread holds any lock that the UI thread is waiting for.

We can't ship 3.6 with a potential deadlock, so I've released this workaround to HEAD. It opens bug 195834 again for GTK, but that's much less severe than risking a deadlock.

I still think this should be fixed in SWT. We can't detect the problem in SWT, and it's almost impossible to ensure that client code never holds a lock and waits for it in the UI thread. So the only solution I see is to avoid using 'synchronized' to implement the Platform.lock on GTK. I don't have a silver bullet, but a solution could be:
- detect that an Image is created in a non-UI thread
- in that, case try to take the lock, and if you don't get it in the next 5 seconds, assume that something's wrong and throw an exception. That's a heuristic, but it's still better than risking a deadlock.
Comment 10 Ian Bull CLA 2010-04-25 11:58:11 EDT
(In reply to comment #9)
Thanks Markus,

If you tag this for tonight's IBuild (8:00pm EST) then I'll hammer on content assist for GTK during the test pass tomorrow.
Comment 11 Markus Keller CLA 2010-04-25 19:01:46 EDT
(In reply to comment #10)
> If you tag this for tonight's IBuild (8:00pm EST) then I'll hammer on content
> assist for GTK during the test pass tomorrow.

It's released for I20100425-2000, but there's not much to test via "hammering" ;-). The deadlock situation is well understood. In JDT proposals, it happens when an image for a Java element is created in a background thread. Those composite images are cached when they are first created, so this most often happens shortly after starting Eclipse. To provoke the problem you need exact timing or breakpoints in the right places, and you need to use an example where the image is used for the first time in the non-UI thread. I'll test this during the M7 test pass.

The downside of my fix is that it reverted bug 195834 on GTK, so if you try hard (apply the quick fix exactly at the time when the additional info hover with the preview is computed), you can see the BadLocationExceptions from bug 195834 again.

Anyway, thanks for adding this comment! It made me look at my fix again, and I discovered that I forgot to remove the 'synchronized' modifier from the getChange() method. I've fixed this for I20100425-2000.
Comment 12 Markus Keller CLA 2010-04-26 06:31:54 EDT
Verified in I20100425-2000 that the workaround in ChangeCorrectionProposal
#getChange() "works" (i.e. just throws an exception with the steps below).

Steps to reproduce the deadlock without the workaround:
- set method entry breakpoint in ChangeCorrectionProposal#getChange()
- disable Prefs > Debug > Activate workbench when breakpoint is hit
- launch clean runtime workbench
- paste this snippet:
public class C {
    public static void main(String[] args) {
        new Cloneable() {};
    }
}
- set caret before "new" (make sure you don't hover over "Cloneable", since that would destroy the effect)
- press Ctrl+1
- wait at least 1 second
- press Enter
- in host workbench, resume all suspended threads
=> deadlock
Comment 13 Markus Keller CLA 2010-04-26 07:11:55 EDT
Created attachment 166061 [details]
Patch using SWT internals to release lock

There's another solution: If we don't want to guess when the problem could happen, then we could put this burden on the client, but provide a way for the client to resolve the problem. The idea is to protect code that runs in the UI thread and that needs to synchronize, such that the GTK lock is released at the time the UI thread enters the synchronized block.

This pattern is already used in Display#sleep(). AFAICS, that's the reason why "new Image(..)" can work at all in a non-UI thread. In the attached patch, I've copied that code, and it worked nice (does not deadlock, and properly synchronizes such that bug 195834 doesn't happen). But I can't refer to internals.

If you could offer this as API, e.g. Display#runWithoutOSLock(Runnable), then client code that knows it can run into this problem can use this API and make sure it only synchronizes in the Runnable, but not outside it.
Comment 14 Stephan Herrmann CLA 2010-05-02 16:47:48 EDT
(In reply to comment #11)
> The downside of my fix is that it reverted bug 195834 on GTK, so if you try
> hard (apply the quick fix exactly at the time when the additional info hover
> with the preview is computed), you can see the BadLocationExceptions from bug
> 195834 again.

FWIW, yes, in M7 bug 195834 is alive. Even without trying very hard I just
saw it happen again during normal operation.
Comment 15 Dani Megert CLA 2010-05-02 23:51:17 EDT
(In reply to comment #14)
> FWIW, yes, in M7 bug 195834 is alive. Even without trying very hard I just
> saw it happen again during normal operation.

According to comment 13 this should not happen. Please file a new bug report against JDT UI.
Comment 16 Stephan Herrmann CLA 2010-05-03 02:38:50 EDT
(In reply to comment #15)
> (In reply to comment #14)
> > FWIW, yes, in M7 bug 195834 is alive. Even without trying very hard I just
> > saw it happen again during normal operation.
> 
> According to comment 13 this should not happen. 

AFAICS the patch from comment 13 hasn't been committed yet.
So I was more like encouraging Markus to keep going with that :)
Comment 17 Markus Keller CLA 2010-05-03 05:27:39 EDT
(In reply to comment #13)
> [..] But I can't refer to internals.
> [..]
> If you could offer this as API, e.g. Display#runWithoutOSLock(Runnable), then
> client code that knows it can run into this problem can use this API and make
> sure it only synchronizes in the Runnable, but not outside it.

I can't release this patch, since it refers to internals from SWT. If the SWT team agrees with the approach, I can implement the proposed API. I'm confident that the PMC will approve the API addition, since it's the only solution we have that prevents deadlocks (a blocker) and bug 195834 (a major inconvenience with many dups).
Comment 18 Dani Megert CLA 2010-05-03 10:40:55 EDT
I agree with the previous comment. However, we should either have a new bug for  bug 195834 or reopen it, since it is indeed not fixed on Linux.
Comment 19 Silenio Quarti CLA 2010-05-04 13:51:36 EDT
(In reply to comment #17)
> > If you could offer this as API, e.g. Display#runWithoutOSLock(Runnable), 
> > sure it only synchronizes in the Runnable, but not outside it.
> I can't release this patch, since it refers to internals from SWT. If the SWT
> team agrees with the approach, I can implement the proposed API. I'm confident
> that the PMC will approve the API addition, since it's the only solution we
> have that prevents deadlocks (a blocker) and bug 195834 (a major inconvenience
> with many dups).

Is this what you have in mind?

void runWithoutOSLock (Runnable runnable) {
	Lock lock = Platform.lock;
	int count = lock.lock ();
	for (int i = 0; i < count; i++) lock.unlock ();
	try {
		runnable.run();
	} finally {
		for (int i = 0; i < count; i++) lock.lock ();
		lock.unlock ();
	}
}

I do not think the API make sense. The OS lock is needed to avoid multiple threads in OS libraries which are not thread-safe. If we give such API, we are basically opening the lock up without any control. I believe this would defeat its purpose and we would start getting random crashes.  Display.sleep() is the only place where the UI thread can safely leave the lock and let other threads call OS libraries because we know there is nothing else happening in the UI thread.
Comment 20 Markus Keller CLA 2010-05-07 09:57:58 EDT
Yes, that's about what I had in mind, with the important difference that the first line should be:
    checkDevice ();

I agree that all hell breaks loose when clients can do that from any thread, but is it also a problem when they are in the UI thread? It's similar to what happens when a client decides to call Display#sleep(), but with the difference that no other events are being processed when #runWithoutOSLock(Runnable) is called.

The usage of the Platform.lock ensures that the UI thread only proceeds when it got the lock back, but while the Runnable is executed, the lock is free for another process to take (and release). The constructor of Image needs the lock, and if it's called while the UI thread is inside the runnable of Display#runWithoutOSLock(Runnable), then it can do its business without being locked.
Comment 21 Markus Keller CLA 2010-05-09 16:12:59 EDT
I've filed bug 312189 for the BadLocationException from bug 195834 that can now happen again on GTK.
Comment 22 Markus Keller CLA 2010-05-20 11:01:53 EDT
*** Bug 313739 has been marked as a duplicate of this bug. ***
Comment 23 Silenio Quarti CLA 2010-05-26 17:22:49 EDT
Reducing severity since there is a workaround in JDT.
Comment 24 Stephan Herrmann CLA 2011-06-28 13:51:58 EDT
I must admit I lost track what part is fixed and what remains.

Is it safe now to create images during ChangeCorrectionProposal.getChange()?
Comment 25 Markus Keller CLA 2011-06-28 14:10:22 EDT
(In reply to comment #24)
> Is it safe now to create images during ChangeCorrectionProposal.getChange()?

Yes.

For 3.6, we removed the deadlock but reopened the door for BadLocationExceptions on GTK. Since 3.6.2, the workaround from bug 312189 also fixes that for the ChangeCorrectionProposal.

However, the general problem outlined in comment 3 is still present (that's the subject of this bug).
Comment 26 Stephan Herrmann CLA 2011-06-28 14:28:25 EDT
(In reply to comment #25)

Thanks for clarifying!
Comment 27 Eric Williams CLA 2017-07-25 16:54:43 EDT
Looks like there is a patch pending for this but the bug is not resolved. Moving to triaged.
Comment 28 Eclipse Genie CLA 2020-03-16 10:33:49 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug.

If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.