Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 326365

Summary: Workbench hang when run a program to termination while expression being displayed
Product: [Tools] PTP Reporter: Xuan Chen <xuanchen>
Component: Debug UIAssignee: Greg Watson <g.watson>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: P3 CC: justinko, xuanchen
Version: 5.0   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description Flags
Patch g.watson: iplog+

Description Xuan Chen CLA 2010-09-27 19:31:48 EDT
I saw this kind of hanging quite often, and now I was able to recreate it using a simple C program.


And this is the source for the C program:

************************************
#include <stdio.h>




int
main()
{
int debug = 1;
int a1[10];
a1[0] = 0;
a1[1] = 1;
a1[2] = 2;
a1[3] = 3;
a1[4] = 4;
a1[5] = 5;

struct MappingStruct0 {
		int intOne;
		int intTwo;
		int* intPtr;
		long longVal;
	} myStruct0;

	myStruct0.intOne = 5;
	myStruct0.intTwo = 6;
	myStruct0.longVal = 155;

struct MappingStruct {
		int intOne;
		int intTwo;
		int a[10];
		int* intPtr;
		long longVal;
	} myStruct;
	myStruct.a[0] = 0;
	myStruct.a[1] = 1;
	myStruct.a[2] = 2;
	myStruct.a[3] = 3;
	myStruct.a[4] = 4;
	myStruct.a[5] = 5;
printf("Hello World1!\n");

	return 0;
}
************************************

I set a breakpoint at the printf line at the end, and run into it.
Then I put structure myStruct into the Expression view.  Expand myStruct, and then a under it.
Run this program to termination.
I can see the job was end (red stop icon beside it), but the thread and call stack are still displayed in the Debug view.
Right click on the job in Parallel Debug view, and select Remote Terminated.  The workbench hang.

The following is part is the trace I saw in the console of my development workspace:
*********************************************
Msg: EventRequestManager - addEventRequest(): Request: Partial expression evaluation for tasks {0}, exp: *(((myStruct).a)+0)@10, id: null, is list children: true, is expression: false
**** Msg: AbstractEventManager - registerEventRequest(): Request: Partial expression evaluation for tasks {0}, exp: *(((myStruct).a)+0)@10, id: null, is list children: true, is expression: false
Msg: SDMEventManager - update(): Event: EVENT_DBG_DATA transid=0 1:01 org.eclipse.ptp.proxy.debug.client.ProxyDebugAIF@1963c7b, name: var2
Msg: SDMEventManager - fireEvent(): event EVENT_DBG_DATA transid=0 1:01 org.eclipse.ptp.proxy.debug.client.ProxyDebugAIF@1963c7b, name: var2
        ======= int: is4
        ======== range: r0,10is4
        ======= int: is4
        ======= array: [r0,10is4]is4
**** Msg: AbstractEventManager - notifyEventRequest(): Request: Partial expression evaluation for tasks {}, exp: *(((myStruct).a)+0)@10, id: null, is list children: true, is expression: false
Msg: EventRequestManager - addEventRequest(): Request: Resume request in status [UNKNOWN] for {0}.
**** Msg: AbstractEventManager - registerEventRequest(): Request: Resume request in status [UNKNOWN] for {0}.
Msg: SDMEventManager - update(): Event: EVENT_DBG_OK transid=0 1:01
Msg: SDMEventManager - fireEvent(): event EVENT_DBG_OK transid=0 1:01
Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.ResumedEvent@37f1db
Msg: NotifyJob - size of runnables: 1
**** Msg: AbstractEventManager - notifyEventRequest(): Request: Resume request in status [DONE] for {}.
Msg: NotifyJob - size of runnables: 22
Msg: SDMEventManager - update(): Event: EVENT_DBG_EXIT transid=0 1:01 0
Msg: SDMEventManager - fireEvent(): event EVENT_DBG_EXIT transid=0 1:01 0
Msg: EventRequestManager - addEventRequest(): Request: Delete partial expression request in status [UNKNOWN] for {0}.
================= TIME RESUME: 1285618426976
**** Msg: AbstractEventManager - registerEventRequest(): Request: Delete partial expression request in status [UNKNOWN] for {0}.
Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.DestroyedEvent@1c3dcdc
Msg: EventRequestManager - addEventRequest(): Request: Stop debugger request in status [UNKNOWN] for {}.
**** Msg: AbstractEventManager - notifyEventRequest(): Request: Stop debugger request in status [DONE] for {}.
Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.DisconnectedEvent@97388b
Msg: SDMEventManager - update(): Event: EVENT_DBG_OUTPUT transid=0 1:01 Hello World1!

Msg: SDMEventManager - fireEvent(): event EVENT_DBG_OUTPUT transid=0 1:01 Hello World1!

Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.OutputEvent@189eebb
*******************************************************************

And these are the stack trace of two threads which could be related to this problem:

Thread [main] (Suspended)	
	PDebugTarget(PDebugElement).resetStatus() line: 385	
	PDebugTarget.cleanup() line: 824	
	PDebugTarget.disconnected() line: 1120	
	PDebugTarget.launchRemoved(ILaunch) line: 614	
	LaunchManager$LaunchNotifier.run() line: 450	
	SafeRunner.run(ISafeRunnable) line: 42	
	LaunchManager$LaunchNotifier.notify(ILaunch, int) line: 435	
	LaunchManager.fireUpdate(ILaunch, int) line: 997	
	LaunchManager.removeLaunch(ILaunch) line: 2207	
	UIDebugManager(JobManager).removeJob(IPJob) line: 404	
	UIDebugManager.removeJob(IPJob) line: 495	
	ParallelJobsView$QueueChildListener.handleEvent(IRemoveJobEvent) line: 228	
	PQueue.fireRemoveJobs(Collection<IPJob>) line: 191	
	PQueue.removeJobs(Collection<IPJobControl>) line: 300	
	GenericResourceManager(AbstractRuntimeResourceManager).doRemoveTerminatedJobs(IPQueueControl) line: 339	
	GenericResourceManager(AbstractResourceManager).removeTerminatedJobs(IPQueue) line: 1030	
	UIDebugManager(JobManager).removeAllStoppedJobs() line: 384	
	RemoveAllTerminatedAction.run() line: 52	
	RemoveAllTerminatedAction(Action).runWithEvent(Event) line: 498	
	ActionContributionItem.handleWidgetSelection(Event, boolean) line: 584	
	ActionContributionItem.access$2(ActionContributionItem, Event, boolean) line: 501	
	ActionContributionItem$5.handleEvent(Event) line: 411	
	EventTable.sendEvent(Event) line: 84	
	MenuItem(Widget).sendEvent(Event) line: 1053	
	Display.runDeferredEvents() line: 4066	
	Display.readAndDispatch() line: 3657	
	Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 2629	
	Workbench.runUI() line: 2593	
	Workbench.access$4(Workbench) line: 2427	
	Workbench$7.run() line: 670	
	Realm.runWithDefault(Realm, Runnable) line: 332	
	Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 663	
	PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149	
	IDEApplication.start(IApplicationContext) line: 115	
	EclipseAppHandle.run(Object) line: 196	
	EclipseAppLauncher.runApplication(Object) line: 110	
	EclipseAppLauncher.start(Object) line: 79	
	EclipseStarter.run(Object) line: 369	
	EclipseStarter.run(String[], Runnable) line: 179	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available	
	Method.invoke(Object, Object...) line: not available	
	Main.invokeFramework(String[], URL[]) line: 619	
	Main.basicRun(String[]) line: 574	
	Main.run(String[]) line: 1407	
	Main.main(String[]) line: 1383	

	
	Thread [Worker-7] (Suspended)	
	Object.wait(long) line: not available [native method]	
	DeletePartialExpressionRequest(AbstractEventResultRequest).lockRequest(long) line: 194	
	DeletePartialExpressionRequest(AbstractEventResultRequest).waiting() line: 230	
	DeletePartialExpressionRequest(AbstractEventResultRequest).waitUntilCompleted(TaskSet) line: 127	
	ExpressionManager.deleteVariable(IPDIVariable) line: 178	
	Expression.dispose() line: 54	
	PExpression.dispose() line: 88	
	PStackFrame.disposeExpressions() line: 917	
	PStackFrame.dispose() line: 893	
	PThread.disposeStackFrames() line: 718	
	PThread.resumedByTarget(int, List<DebugEvent>) line: 859	
	PDebugTarget.resumeThreads(List<DebugEvent>, int) line: 1366	
	PDebugTarget.handleResumedEvent(IPDIResumedEvent) line: 969	
	PDebugTarget.handleDebugEvents(IPDIEvent[]) line: 495	
	AbstractEventManager$1.run() line: 137	
	Session$NotifyJob.run(IProgressMonitor) line: 965	
	Worker.run() line: 54
Comment 1 Greg Watson CLA 2010-09-29 10:58:32 EDT
It looks like there is some kind of race condition involving the expressions view as I'm seeing strange behavior when this view is active. I would suggest using the variables view until I can work out the problem.
Comment 2 Xuan Chen CLA 2010-09-29 10:59:42 EDT
I tried to debug this problem to have a better understanding of it.  And I set
a breakpoint at EventRequestManager#addEventRequest.

When I removed all the breakpoints in the PTP debug session, and do a run (so
the program will run to complete), I can see the breakpoint in
EventRequestManager#addEventRequest() was hit, and the request wass partial
evaluation expression:
Thread [Worker-10] (Suspended (breakpoint at line 182 in EventRequestManager))  
    EventRequestManager.addEventRequest(IPDIEventRequest) line: 182    
    LocalVariable(Variable).getChildren() line: 130    
    ExpressionManager.deleteVariable(IPDIVariable) line: 185    
    Expression.dispose() line: 54    
    PExpression.dispose() line: 88    
    PStackFrame.disposeExpressions() line: 917    
    PStackFrame.dispose() line: 893    
    PThread.disposeStackFrames() line: 718    
    PThread.resumedByTarget(int, List<DebugEvent>) line: 859    
    PDebugTarget.resumeThreads(List<DebugEvent>, int) line: 1366    
    PDebugTarget.handleResumedEvent(IPDIResumedEvent) line: 969    
    PDebugTarget.handleDebugEvents(IPDIEvent[]) line: 495    
    AbstractEventManager$1.run() line: 137    
    Session$NotifyJob.run(IProgressMonitor) line: 965    
    Worker.run() line: 54    

    This request was invoke because of the deleteVariable() call, which I think
should not be sent.

    And then I can see this thread was hang on this line in
Variable#getChildren:
    IAIF aif = request.getPartialAIF(getTasks());

    And this is the call stack at this point:
    Object.wait(long) line: not available [native method]    
   
SDMEvaluatePartialExpressionRequest(AbstractEventResultRequest).lockRequest(long)
line: 194    
    SDMEvaluatePartialExpressionRequest(AbstractEventResultRequest).waiting()
line: 230    
   
SDMEvaluatePartialExpressionRequest(AbstractEventResultRequest).waitUntilCompleted(TaskSet)
line: 127    
   
SDMEvaluatePartialExpressionRequest(AbstractEvaluatePartialExpressionRequest).getPartialAIF(TaskSet)
line: 80    
    LocalVariable(Variable).getChildren() line: 131    
    ExpressionManager.deleteVariable(IPDIVariable) line: 185    
    Expression.dispose() line: 54    
    PExpression.dispose() line: 88    
    PStackFrame.disposeExpressions() line: 917    
    PStackFrame.dispose() line: 893    
    PThread.disposeStackFrames() line: 718    
    PThread.resumedByTarget(int, List<DebugEvent>) line: 859    
    PDebugTarget.resumeThreads(List<DebugEvent>, int) line: 1366    
    PDebugTarget.handleResumedEvent(IPDIResumedEvent) line: 969    
    PDebugTarget.handleDebugEvents(IPDIEvent[]) line: 495    
    AbstractEventManager$1.run() line: 137    
    Session$NotifyJob.run(IProgressMonitor) line: 965
Comment 3 Xuan Chen CLA 2010-09-29 11:06:36 EDT
Thanks for the information.

Yeah, I tried the similar scenario in Variables view, and did not run into this issue.
Comment 4 Justin Kong CLA 2010-10-20 20:48:11 EDT
I tried to get a better understanding of this problem, and it appears to be occurring due to threading issues from processing both a ProxyDebugOKEvent from the Resume command and a ProxyDebugExitEvent. 


I'm going to refer to three threads throughout my description:
- SDMEventManager Thread 
- PDebugTarget Thread 
- EventRequestManager Thread


Here are the sequence of events of how the problem occurs after the Resume command is sent:
1. The SDMEventManager Thread receives a ProxyDebugOKEvent from the Resume command, which prompts it to fire this event off to all listeners, one of which is the PDebugTarget, and this event is of course handled in what I'm calling the PDebugTarget Thread.  

2. The PDebugTarget Thread will call PDebugTarget.resumeThreads (a synchronized method), which will eventually call ExpressionManger.deleteVariable. This method will send a DeletePartialExpressionRequest to the EventRequestManager Thread, and will then wait until the request is complete. 

3. At the same time, because the program has run to completion, a ProxyDebugExitEvent is being processed in the SDMEventManager Thread. This thread will eventually call the EventRequestManager Thread to perform a StopDebuggerRequest. 

4. For this problem to occur, the timing has to be such that:

a) The DeletePartialExpressionRequest is performed before the StopDebuggerRequest 
b) The StopDebuggerRequest runs before the SDMEventManager Thread processes the ProxyDebugOKEvent for the DeletePartialExpressionRequest

This is because when the StopDebuggerRequest is performed, SDMEventManager.notifyEventRequest is called, which removes the DeletePartialExpressionRequest from the requestList. Then, when the ProxyDebugOKEvent for the DeletePartialExpression is handled in SDMEventManager.update, it will not receive any request when it calls #getCurrentRequest, and thus #verifyEvent will not set the request's status to completed. This causes the PDebugTarget Thread to believe that the request is still running and forces the thread to remain there indefinitely.

5. Because the PDebugTarget Thread is stuck waiting indefinitely within a synchronized method, when the user performs the "Remove All Terminated" command, the UI thread will be stuck waiting until it can enter the sychronized method. The thread and call stack are also not removed due to the PDebugTarget Thread hang, as it blocks other listeners from performing their proper shutdown methods.


There may be other scenarios in which this problem occurs, but I found that this was the one that was causing the main freeze
Comment 5 Justin Kong CLA 2010-10-27 10:53:40 EDT
Created attachment 181840 [details]
Patch

To address the previous issues I mentioned, I've written up a patch. Here is the reasoning behind the major changes:

EventRequestManager
- #addEventRequest: If the session has a status of EXITING or EXITED, requests are no longer added as there doesn't seem to be any reason to process them.
- EventRequestDispatchJob.run: The request's status is set to RUNNING prior to entering the execute method. This is to provide more accurate status of the request for other methods.
- EventRequestDispatchJob.flushEventRequests: Instead of simply clearing all of the requests, now only those requests that aren't currently in the RUNNING status are removed. This is because if a request is already running, it makes sense to have it simply continue running and complete (or run into an error) normally, as opposed to having it simply removed. If the request is not in the RUNNING status, then the request is canceled, so that any threads waiting on the request are properly informed.

AbstractEventManager
- #notifyEventRequest: No elements are added to requestList if the request is IPDIStopDebuggerRequest, as such, no elements should be removed either.
- #removeAllRegisteredEventRequests: This method will now only remove those requests that are not currently in the RUNNING status. I think this is the proper behavior because if a request is added to requestList, that means it's already being processed and should continue running and complete (or run into an error) normally. If a request is removed from requestList, it can no longer be processed properly by SDMEventManager.verifyEvent, and in the case of the Expressions view, this means a thread is left waiting forever for an update on the request's status.


One area that I haven't looked at in depth yet, is when IPDIInternalEventRequest.waitUntilCompleted is called for any request, and that request is canceled, there doesn't seem to be any indication that the request was canceled and not just completed normally (other than having to check the request's status).
Comment 6 Greg Watson CLA 2010-11-01 10:49:02 EDT
Applied to HEAD. I've also applied this to 4.0. Even though I can't reproduce the problem, the same code is being used.