Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 476989

Summary: Hang terminating debug session waiting for input after step into selection
Product: [Tools] CDT Reporter: Marc-André Laperle <malaperle>
Component: cdt-debug-dsf-gdbAssignee: Project Inbox <cdt-debug-dsf-gdb-inbox>
Status: CLOSED FIXED QA Contact: Marc Khouzam <marc.khouzam>
Severity: normal    
Priority: P3 CC: cdtdoug, marc.khouzam, marcel.bruch, teodor.madan
Version: 8.8.0   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
See Also: https://git.eclipse.org/r/#/c/55772/
Whiteboard:
Attachments:
Description Flags
Stack trace none

Description Marc-André Laperle CLA 2015-09-09 11:07:39 EDT
Created attachment 256465 [details]
Stack trace

Using CDT 8.8.0.201509021823 (master nightly)
Ubuntu 15.04

1. Build and start debugging the following code:
 
#include <stdio.h>

int main() {
  char c = getchar();
}

2. Use step into selection on getchar (Ctrl-F5 or hyperlink)

3. The program should be waiting for input. Terminate the program with the Terminate button.

Eclipse hangs. See attached stack trace.
Comment 1 Marc Khouzam CLA 2015-09-09 11:21:28 EDT
Looking at the stack trace it looks like the stepIntoSelection is logging an error at GDBRunControl.java:566 on the DSF executor.  The logging needs the UI thread it seems.  At the same time, the UI thread is waiting for the DSF executor to return from  GdbSteppingModeTarget.supportsInstructionStepping()

Deadlock.

The simplest thing to do would be to put a timeout on the query in GdbSteppingModeTarget.  We should probably do that anyway.

But then I wonder about the logging call.  Should we not do that on the DSF executor if it needs the UI thread?
Comment 2 Teodor Madan CLA 2015-09-09 12:36:40 EDT
The org.eclipse.core.runtime.ILog implementation returned by  org.eclipse.core.runtime.Plugin.getLog() were thread safe 'till Mars release. Requiring for it to be called from UI thread is a big change and not just for DSF/CDT codebase. Meaning to be transformed as "Display.asyncExec'

I think it is more of a bug in EPP logger component ( AERI support). Any implementation of org.osgi.service.log.LogListener.logged(LogEntry) should not require any locks including UI lock. 

At first glance it seems to be a side effect of Bug 472704.
Comment 3 Marc Khouzam CLA 2015-09-09 13:02:36 EDT
(In reply to Teodor Madan from comment #2)
> The org.eclipse.core.runtime.ILog implementation returned by 
> org.eclipse.core.runtime.Plugin.getLog() were thread safe 'till Mars
> release. Requiring for it to be called from UI thread is a big change and
> not just for DSF/CDT codebase. Meaning to be transformed as
> "Display.asyncExec'

Are you saying this is a change in platform that was introduced in mars?


> I think it is more of a bug in EPP logger component ( AERI support). Any
> implementation of org.osgi.service.log.LogListener.logged(LogEntry) should
> not require any locks including UI lock. 
> 
> At first glance it seems to be a side effect of Bug 472704.

Is this the correct bug number?  I don't see much of a relationship.
Comment 4 Teodor Madan CLA 2015-09-10 03:29:49 EDT
(In reply to Marc Khouzam from comment #3)
> 
> Are you saying this is a change in platform that was introduced in mars?
> 
Mars introduced AERI. http://eclipsesource.com/blogs/2015/06/23/error-reporting-top-eclipse-mars-feature-2/ 

> 
> > I think it is more of a bug in EPP logger component ( AERI support). Any
> > implementation of org.osgi.service.log.LogListener.logged(LogEntry) should
> > not require any locks including UI lock. 
> > 
> > At first glance it seems to be a side effect of Bug 472704.
> 
> Is this the correct bug number?  I don't see much of a relationship.

The patch contains the Display.syncExec (Shells#hasActiveWindow())  that indirectly introduced a dependency of not having UI thread locked.

http://git.eclipse.org/c/epp/org.eclipse.epp.logging.git/commit/?id=34ad9b1cdea7837d8d087352eb1434cec10571b8
Comment 5 Marcel Bruch CLA 2015-09-11 09:08:06 EDT
Marc-Andre,
is this problem always reproducible for you? 
Is the behavior different in Mars.0 and Mars.1?


After looking at the thread dump, I don't see (yet) how AERI causes the "Potential deadlock in AERI" as mentioned by Teodor in bug 477185.

The thread "org.eclipse.cdt.dsf.gdb - 1" is running in a background thread. Somewhere near the top, AERI requests a sync call but waits for getting the lock. This is per-se not a problem. So, what is actually causing the lock? All locks acquired by AERI have different ids then those in the main threads.

Both the main thread and "Process monitor" #764 access org.eclipse.cdt.dsf.concurrent.Query.get(Query.java:102). Apparently accessing this method requires a lock from org.eclipse.cdt.dsf.concurrent.Query$QueryRm. Then, query.get calls Object.wait. I don't know what Query.get is waiting for. 

Is that somehow related to the work happening in org.eclipse.cdt.dsf.gdb thread?
Comment 6 Teodor Madan CLA 2015-09-11 09:25:54 EDT
Hi Macel, 

The stack of interest is:

"org.eclipse.cdt.dsf.gdb - 1" #751 prio=5 os_prio=0 tid=0x00007f4e742a6000 nid=0x3dab in Object.wait() [0x00007f4e798ca000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.eclipse.swt.widgets.Synchronizer.syncExec(Synchronizer.java:199)
	- locked <0x00000000ec0cb680> (a org.eclipse.swt.widgets.RunnableLock)
	at org.eclipse.ui.internal.UISynchronizer.syncExec(UISynchronizer.java:145)
	at org.eclipse.swt.widgets.Display.syncExec(Display.java:4634)
	at org.eclipse.epp.internal.logging.aeri.ui.utils.Shells.hasActiveWindow(Shells.java:68)
	at org.eclipse.epp.internal.logging.aeri.ui.ReportingController.on(ReportingController.java:123)
	- locked <0x00000000b6dc3ef0> (a org.eclipse.epp.internal.logging.aeri.ui.ReportingController)

.....

org.eclipse.epp.internal.logging.aeri.ui.log.LogListener.logging(LogListener.java:120)

....

org.eclipse.core.internal.runtime.RuntimeLog.logToListeners(RuntimeLog.java:161)
	at org.eclipse.core.internal.runtime.PlatformLogWriter.logged(PlatformLogWriter.java:103)
	at org.eclipse.osgi.internal.log.ExtendedLogServiceImpl.log(ExtendedLogServiceImpl.java:87)
	at org.eclipse.osgi.internal.log.LoggerImpl.log(LoggerImpl.java:54)
	at org.eclipse.core.internal.runtime.Log.log(Log.java:65)
	at org.eclipse.cdt.dsf.gdb.service.GDBRunControl$6.handleFailure(GDBRunControl.java:566)
	at org.eclipse.cdt.dsf.concurrent.RequestMonitor.handleCompleted(RequestMonitor.java:387)

I have also added a test plugin always reproducible that isolates from DSF.
Comment 7 Marcel Bruch CLA 2015-09-11 09:50:00 EDT
(In reply to Teodor Madan from comment #6)

Thank you. I imported your project and execute you action + job in there. Indeed I get a ui freeze (and a NPE before because the activator.getDefault returns null) 

BUT:

I don't have any AERI plugins in my runtime:


WARNING: This console is connected to the current running instance of Eclipse!
osgi> ss aeri
"Framework is launched."


id	State       Bundle
osgi> 

It's not yet clear to me what this is showing. But I've (atm) hard times to relate this to AERI.
Comment 8 Teodor Madan CLA 2015-09-11 10:11:50 EDT
You should use a standard product (e.g. Eclipse IDE for Eclipse Committers 4.5.0) . In my case: 
osgi> ss aeri
"Framework is launched."
id	State       Bundle
591	RESOLVED    org.eclipse.epp.logging.aeri.ide_1.0.1.v20150828-1308
	            Master=893
893	ACTIVE      org.eclipse.epp.logging.aeri.ui_1.0.1.v20150819-1436
	            Fragments=591
osgi> 


The threads deadlocked in your case due to activator.getDefault() being null (strange that test plugin was not loaded) should not happen. 

The test code is quite simple. The snippet bellow if placed in a command handler would deadlock (after replacing Activator.getDefault() with appropriate plugin reference)

=================

		final Object lock = new Object();
		Job longRunningJob = Job.create("Test Log running job", new IJobFunction() {			
			@Override
			public IStatus run(IProgressMonitor monitor) {
				Activator.getDefault().getLog().log(new Status(IStatus.ERROR, Activator.PLUGIN_ID, "Test logging errors"));
				synchronized (lock) {
					lock.notifyAll();					
				}
				return Status.OK_STATUS;
			}
		});
		longRunningJob.schedule();
		synchronized (lock) {
			try {
				lock.wait();
			} catch (InterruptedException e) {
				Activator.getDefault().getLog().log(Status.CANCEL_STATUS);
			}
		}
Comment 9 Marcel Bruch CLA 2015-09-11 10:15:50 EDT
(In reply to Teodor Madan from comment #8)
I can produce a freeze with AERI enabled. Investigating.
Comment 10 Marcel Bruch CLA 2015-09-11 14:15:35 EDT
(In reply to Marcel Bruch from comment #9)
> (In reply to Teodor Madan from comment #8)
> I can produce a freeze with AERI enabled. Investigating.


I think there is a reasonable way around the syncExec by using

 IWorkbench wb = PlatformUI.getWorkbench();
 wb.addWindowListener(new IWindowListener() {...});


I'm confident we get this under control on AERI side before RC4. Thanks for providing the sample plugin to illustrate the issue.
Comment 11 Marc-André Laperle CLA 2015-09-11 17:36:37 EDT
(In reply to Marcel Bruch from comment #5)
> Marc-Andre,
> is this problem always reproducible for you? 
> Is the behavior different in Mars.0 and Mars.1?

Sorry for the slow reply. I guess you probably don't need this information now but yes it's reproducible.
I'm using primarily:
Eclipse SDK 4.6.0.I20150901-0800
Eclipse Automated Error Reporting 1.0.1.v20150828-1308

But I also reproduced the hang with
Eclipse IDE for C/C++ Developers 20150910-0603 (Mars.1)
But I cannot reproduce the issue with
Eclipse IDE for C/C++ Developers 20150621-1200 (Mars.0)
Comment 12 Marcel Bruch CLA 2015-09-12 03:01:18 EDT
(In reply to Marcel Bruch from comment #10)

The WindowListener approach does not work well when dialogs pop up. The only way around the ui freeze (and knowing whether Eclipse currently has focus) is a listener registered on the workbench's display as follows:

    private static final class ShellTracker implements Runnable, Listener {
        @Override
        public void run() {
            IWorkbench workbench = PlatformUI.getWorkbench();

            Display display = workbench.getDisplay();
            display.addFilter(SWT.Deactivate, this);
            display.addFilter(SWT.Activate, this);

            IWorkbenchWindow activeWindow = workbench.getActiveWorkbenchWindow();
            if (activeWindow != null) {
                active = activeWindow.getShell();
            }
        }

        @Override
        public void handleEvent(Event event) {
            switch (event.type) {
            case SWT.Activate:
                if (isShell(event)) {
                    active = (Shell) event.widget;
                }
                break;
            case SWT.Deactivate:
                if (isShell(event)) {
                    active = null;
                }
            default:
                break;
            }
        }

        private boolean isShell(Event event) {
            return !(event.widget instanceof Shell);
        }
    }

Does anyone know a better solution? I'm concerned about potential performance regressions (but I don't know if that listener would cause them)
Comment 13 Marcel Bruch CLA 2015-09-13 03:41:15 EDT
I've changed the code as outlined in comment 12.
It's available for testing on the Milestones update site [1].

I still have concerns about using a display listener. I've asked platform-ui-dev for an alternate solution. If there is one, the code for RC4 might change.


Teodor,
your test case does not freeze with the current approach.


Marc-Andre,
I'd be glad if you could try out the milestone version.



[1] http://download.eclipse.org/technology/epp/logging/milestones/
Comment 14 Marc-André Laperle CLA 2015-09-13 13:41:40 EDT
(In reply to Marcel Bruch from comment #13)
> Marc-Andre,
> I'd be glad if you could try out the milestone version.
> [1] http://download.eclipse.org/technology/epp/logging/milestones/

I can't reproduce the hang anymore. I installed/updated this on top of Mars.1 RC3 (C/C++ EPP). Maybe thanks for fixing this!
Comment 15 Marc-André Laperle CLA 2015-09-13 16:11:09 EDT
(In reply to Marc-Andre Laperle from comment #14)
> Maybe thanks for fixing this!

*Many thanks