| Summary: | Workbench hang when run a program to termination while expression being displayed | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Tools] PTP | Reporter: | Xuan Chen <xuanchen> | ||||
| Component: | Debug UI | Assignee: | Greg Watson <g.watson> | ||||
| Status: | RESOLVED FIXED | QA Contact: | |||||
| Severity: | major | ||||||
| Priority: | P3 | CC: | justinko, xuanchen | ||||
| Version: | 5.0 | ||||||
| Target Milestone: | --- | ||||||
| Hardware: | PC | ||||||
| OS: | Windows XP | ||||||
| Whiteboard: | |||||||
| Attachments: |
|
||||||
It looks like there is some kind of race condition involving the expressions view as I'm seeing strange behavior when this view is active. I would suggest using the variables view until I can work out the problem. I tried to debug this problem to have a better understanding of it. And I set
a breakpoint at EventRequestManager#addEventRequest.
When I removed all the breakpoints in the PTP debug session, and do a run (so
the program will run to complete), I can see the breakpoint in
EventRequestManager#addEventRequest() was hit, and the request wass partial
evaluation expression:
Thread [Worker-10] (Suspended (breakpoint at line 182 in EventRequestManager))
EventRequestManager.addEventRequest(IPDIEventRequest) line: 182
LocalVariable(Variable).getChildren() line: 130
ExpressionManager.deleteVariable(IPDIVariable) line: 185
Expression.dispose() line: 54
PExpression.dispose() line: 88
PStackFrame.disposeExpressions() line: 917
PStackFrame.dispose() line: 893
PThread.disposeStackFrames() line: 718
PThread.resumedByTarget(int, List<DebugEvent>) line: 859
PDebugTarget.resumeThreads(List<DebugEvent>, int) line: 1366
PDebugTarget.handleResumedEvent(IPDIResumedEvent) line: 969
PDebugTarget.handleDebugEvents(IPDIEvent[]) line: 495
AbstractEventManager$1.run() line: 137
Session$NotifyJob.run(IProgressMonitor) line: 965
Worker.run() line: 54
This request was invoke because of the deleteVariable() call, which I think
should not be sent.
And then I can see this thread was hang on this line in
Variable#getChildren:
IAIF aif = request.getPartialAIF(getTasks());
And this is the call stack at this point:
Object.wait(long) line: not available [native method]
SDMEvaluatePartialExpressionRequest(AbstractEventResultRequest).lockRequest(long)
line: 194
SDMEvaluatePartialExpressionRequest(AbstractEventResultRequest).waiting()
line: 230
SDMEvaluatePartialExpressionRequest(AbstractEventResultRequest).waitUntilCompleted(TaskSet)
line: 127
SDMEvaluatePartialExpressionRequest(AbstractEvaluatePartialExpressionRequest).getPartialAIF(TaskSet)
line: 80
LocalVariable(Variable).getChildren() line: 131
ExpressionManager.deleteVariable(IPDIVariable) line: 185
Expression.dispose() line: 54
PExpression.dispose() line: 88
PStackFrame.disposeExpressions() line: 917
PStackFrame.dispose() line: 893
PThread.disposeStackFrames() line: 718
PThread.resumedByTarget(int, List<DebugEvent>) line: 859
PDebugTarget.resumeThreads(List<DebugEvent>, int) line: 1366
PDebugTarget.handleResumedEvent(IPDIResumedEvent) line: 969
PDebugTarget.handleDebugEvents(IPDIEvent[]) line: 495
AbstractEventManager$1.run() line: 137
Session$NotifyJob.run(IProgressMonitor) line: 965
Thanks for the information. Yeah, I tried the similar scenario in Variables view, and did not run into this issue. I tried to get a better understanding of this problem, and it appears to be occurring due to threading issues from processing both a ProxyDebugOKEvent from the Resume command and a ProxyDebugExitEvent. I'm going to refer to three threads throughout my description: - SDMEventManager Thread - PDebugTarget Thread - EventRequestManager Thread Here are the sequence of events of how the problem occurs after the Resume command is sent: 1. The SDMEventManager Thread receives a ProxyDebugOKEvent from the Resume command, which prompts it to fire this event off to all listeners, one of which is the PDebugTarget, and this event is of course handled in what I'm calling the PDebugTarget Thread. 2. The PDebugTarget Thread will call PDebugTarget.resumeThreads (a synchronized method), which will eventually call ExpressionManger.deleteVariable. This method will send a DeletePartialExpressionRequest to the EventRequestManager Thread, and will then wait until the request is complete. 3. At the same time, because the program has run to completion, a ProxyDebugExitEvent is being processed in the SDMEventManager Thread. This thread will eventually call the EventRequestManager Thread to perform a StopDebuggerRequest. 4. For this problem to occur, the timing has to be such that: a) The DeletePartialExpressionRequest is performed before the StopDebuggerRequest b) The StopDebuggerRequest runs before the SDMEventManager Thread processes the ProxyDebugOKEvent for the DeletePartialExpressionRequest This is because when the StopDebuggerRequest is performed, SDMEventManager.notifyEventRequest is called, which removes the DeletePartialExpressionRequest from the requestList. Then, when the ProxyDebugOKEvent for the DeletePartialExpression is handled in SDMEventManager.update, it will not receive any request when it calls #getCurrentRequest, and thus #verifyEvent will not set the request's status to completed. This causes the PDebugTarget Thread to believe that the request is still running and forces the thread to remain there indefinitely. 5. Because the PDebugTarget Thread is stuck waiting indefinitely within a synchronized method, when the user performs the "Remove All Terminated" command, the UI thread will be stuck waiting until it can enter the sychronized method. The thread and call stack are also not removed due to the PDebugTarget Thread hang, as it blocks other listeners from performing their proper shutdown methods. There may be other scenarios in which this problem occurs, but I found that this was the one that was causing the main freeze Created attachment 181840 [details]
Patch
To address the previous issues I mentioned, I've written up a patch. Here is the reasoning behind the major changes:
EventRequestManager
- #addEventRequest: If the session has a status of EXITING or EXITED, requests are no longer added as there doesn't seem to be any reason to process them.
- EventRequestDispatchJob.run: The request's status is set to RUNNING prior to entering the execute method. This is to provide more accurate status of the request for other methods.
- EventRequestDispatchJob.flushEventRequests: Instead of simply clearing all of the requests, now only those requests that aren't currently in the RUNNING status are removed. This is because if a request is already running, it makes sense to have it simply continue running and complete (or run into an error) normally, as opposed to having it simply removed. If the request is not in the RUNNING status, then the request is canceled, so that any threads waiting on the request are properly informed.
AbstractEventManager
- #notifyEventRequest: No elements are added to requestList if the request is IPDIStopDebuggerRequest, as such, no elements should be removed either.
- #removeAllRegisteredEventRequests: This method will now only remove those requests that are not currently in the RUNNING status. I think this is the proper behavior because if a request is added to requestList, that means it's already being processed and should continue running and complete (or run into an error) normally. If a request is removed from requestList, it can no longer be processed properly by SDMEventManager.verifyEvent, and in the case of the Expressions view, this means a thread is left waiting forever for an update on the request's status.
One area that I haven't looked at in depth yet, is when IPDIInternalEventRequest.waitUntilCompleted is called for any request, and that request is canceled, there doesn't seem to be any indication that the request was canceled and not just completed normally (other than having to check the request's status).
Applied to HEAD. I've also applied this to 4.0. Even though I can't reproduce the problem, the same code is being used. |
I saw this kind of hanging quite often, and now I was able to recreate it using a simple C program. And this is the source for the C program: ************************************ #include <stdio.h> int main() { int debug = 1; int a1[10]; a1[0] = 0; a1[1] = 1; a1[2] = 2; a1[3] = 3; a1[4] = 4; a1[5] = 5; struct MappingStruct0 { int intOne; int intTwo; int* intPtr; long longVal; } myStruct0; myStruct0.intOne = 5; myStruct0.intTwo = 6; myStruct0.longVal = 155; struct MappingStruct { int intOne; int intTwo; int a[10]; int* intPtr; long longVal; } myStruct; myStruct.a[0] = 0; myStruct.a[1] = 1; myStruct.a[2] = 2; myStruct.a[3] = 3; myStruct.a[4] = 4; myStruct.a[5] = 5; printf("Hello World1!\n"); return 0; } ************************************ I set a breakpoint at the printf line at the end, and run into it. Then I put structure myStruct into the Expression view. Expand myStruct, and then a under it. Run this program to termination. I can see the job was end (red stop icon beside it), but the thread and call stack are still displayed in the Debug view. Right click on the job in Parallel Debug view, and select Remote Terminated. The workbench hang. The following is part is the trace I saw in the console of my development workspace: ********************************************* Msg: EventRequestManager - addEventRequest(): Request: Partial expression evaluation for tasks {0}, exp: *(((myStruct).a)+0)@10, id: null, is list children: true, is expression: false **** Msg: AbstractEventManager - registerEventRequest(): Request: Partial expression evaluation for tasks {0}, exp: *(((myStruct).a)+0)@10, id: null, is list children: true, is expression: false Msg: SDMEventManager - update(): Event: EVENT_DBG_DATA transid=0 1:01 org.eclipse.ptp.proxy.debug.client.ProxyDebugAIF@1963c7b, name: var2 Msg: SDMEventManager - fireEvent(): event EVENT_DBG_DATA transid=0 1:01 org.eclipse.ptp.proxy.debug.client.ProxyDebugAIF@1963c7b, name: var2 ======= int: is4 ======== range: r0,10is4 ======= int: is4 ======= array: [r0,10is4]is4 **** Msg: AbstractEventManager - notifyEventRequest(): Request: Partial expression evaluation for tasks {}, exp: *(((myStruct).a)+0)@10, id: null, is list children: true, is expression: false Msg: EventRequestManager - addEventRequest(): Request: Resume request in status [UNKNOWN] for {0}. **** Msg: AbstractEventManager - registerEventRequest(): Request: Resume request in status [UNKNOWN] for {0}. Msg: SDMEventManager - update(): Event: EVENT_DBG_OK transid=0 1:01 Msg: SDMEventManager - fireEvent(): event EVENT_DBG_OK transid=0 1:01 Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.ResumedEvent@37f1db Msg: NotifyJob - size of runnables: 1 **** Msg: AbstractEventManager - notifyEventRequest(): Request: Resume request in status [DONE] for {}. Msg: NotifyJob - size of runnables: 22 Msg: SDMEventManager - update(): Event: EVENT_DBG_EXIT transid=0 1:01 0 Msg: SDMEventManager - fireEvent(): event EVENT_DBG_EXIT transid=0 1:01 0 Msg: EventRequestManager - addEventRequest(): Request: Delete partial expression request in status [UNKNOWN] for {0}. ================= TIME RESUME: 1285618426976 **** Msg: AbstractEventManager - registerEventRequest(): Request: Delete partial expression request in status [UNKNOWN] for {0}. Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.DestroyedEvent@1c3dcdc Msg: EventRequestManager - addEventRequest(): Request: Stop debugger request in status [UNKNOWN] for {}. **** Msg: AbstractEventManager - notifyEventRequest(): Request: Stop debugger request in status [DONE] for {}. Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.DisconnectedEvent@97388b Msg: SDMEventManager - update(): Event: EVENT_DBG_OUTPUT transid=0 1:01 Hello World1! Msg: SDMEventManager - fireEvent(): event EVENT_DBG_OUTPUT transid=0 1:01 Hello World1! Msg: SDMEventManager - fireEvent(): added PDIEvent: org.eclipse.ptp.debug.internal.core.pdi.event.OutputEvent@189eebb ******************************************************************* And these are the stack trace of two threads which could be related to this problem: Thread [main] (Suspended) PDebugTarget(PDebugElement).resetStatus() line: 385 PDebugTarget.cleanup() line: 824 PDebugTarget.disconnected() line: 1120 PDebugTarget.launchRemoved(ILaunch) line: 614 LaunchManager$LaunchNotifier.run() line: 450 SafeRunner.run(ISafeRunnable) line: 42 LaunchManager$LaunchNotifier.notify(ILaunch, int) line: 435 LaunchManager.fireUpdate(ILaunch, int) line: 997 LaunchManager.removeLaunch(ILaunch) line: 2207 UIDebugManager(JobManager).removeJob(IPJob) line: 404 UIDebugManager.removeJob(IPJob) line: 495 ParallelJobsView$QueueChildListener.handleEvent(IRemoveJobEvent) line: 228 PQueue.fireRemoveJobs(Collection<IPJob>) line: 191 PQueue.removeJobs(Collection<IPJobControl>) line: 300 GenericResourceManager(AbstractRuntimeResourceManager).doRemoveTerminatedJobs(IPQueueControl) line: 339 GenericResourceManager(AbstractResourceManager).removeTerminatedJobs(IPQueue) line: 1030 UIDebugManager(JobManager).removeAllStoppedJobs() line: 384 RemoveAllTerminatedAction.run() line: 52 RemoveAllTerminatedAction(Action).runWithEvent(Event) line: 498 ActionContributionItem.handleWidgetSelection(Event, boolean) line: 584 ActionContributionItem.access$2(ActionContributionItem, Event, boolean) line: 501 ActionContributionItem$5.handleEvent(Event) line: 411 EventTable.sendEvent(Event) line: 84 MenuItem(Widget).sendEvent(Event) line: 1053 Display.runDeferredEvents() line: 4066 Display.readAndDispatch() line: 3657 Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 2629 Workbench.runUI() line: 2593 Workbench.access$4(Workbench) line: 2427 Workbench$7.run() line: 670 Realm.runWithDefault(Realm, Runnable) line: 332 Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 663 PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149 IDEApplication.start(IApplicationContext) line: 115 EclipseAppHandle.run(Object) line: 196 EclipseAppLauncher.runApplication(Object) line: 110 EclipseAppLauncher.start(Object) line: 79 EclipseStarter.run(Object) line: 369 EclipseStarter.run(String[], Runnable) line: 179 NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method] NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available Method.invoke(Object, Object...) line: not available Main.invokeFramework(String[], URL[]) line: 619 Main.basicRun(String[]) line: 574 Main.run(String[]) line: 1407 Main.main(String[]) line: 1383 Thread [Worker-7] (Suspended) Object.wait(long) line: not available [native method] DeletePartialExpressionRequest(AbstractEventResultRequest).lockRequest(long) line: 194 DeletePartialExpressionRequest(AbstractEventResultRequest).waiting() line: 230 DeletePartialExpressionRequest(AbstractEventResultRequest).waitUntilCompleted(TaskSet) line: 127 ExpressionManager.deleteVariable(IPDIVariable) line: 178 Expression.dispose() line: 54 PExpression.dispose() line: 88 PStackFrame.disposeExpressions() line: 917 PStackFrame.dispose() line: 893 PThread.disposeStackFrames() line: 718 PThread.resumedByTarget(int, List<DebugEvent>) line: 859 PDebugTarget.resumeThreads(List<DebugEvent>, int) line: 1366 PDebugTarget.handleResumedEvent(IPDIResumedEvent) line: 969 PDebugTarget.handleDebugEvents(IPDIEvent[]) line: 495 AbstractEventManager$1.run() line: 137 Session$NotifyJob.run(IProgressMonitor) line: 965 Worker.run() line: 54