Bug 333284 - Thread state is shown incorrectly after attaching to an app in non-stop mode
Summary: Thread state is shown incorrectly after attaching to an app in non-stop mode
Status: RESOLVED FIXED
Alias: None
Product: CDT
Classification: Tools
Component: cdt-debug-dsf-gdb
Version: 8.0
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 8.0
Assignee: Marc Khouzam CLA
QA Contact: Marc Khouzam CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-28 17:53 EST by Sergey Prigogin CLA
Modified: 2012-01-30 10:12 EST

See Also:
marc.khouzam: review? (eclipse.sprigogin)


Attachments
Prototype for a new approach to attaching in non-stop mode (6.73 KB, patch)
2011-01-14 21:22 EST, Marc Khouzam CLA
marc.khouzam: iplog-
Fix (6.11 KB, patch)
2011-04-03 21:23 EDT, Marc Khouzam CLA
marc.khouzam: iplog-

Description Sergey Prigogin CLA 2010-12-28 17:53:16 EST
To reproduce:

1. Compile and link the following program and run it from a terminal window:

#include <assert.h>
#include <pthread.h>
#include <unistd.h>
#include <cstdio>

void* RunInThread(void* arg) {
  for (int i = 0; i < 3600; i++) {
    sleep(1);
    printf("secondary thread: %d\n", i);
  }
  return NULL;
}

int main() {
  pthread_attr_t attr;
  assert(pthread_attr_init(&attr) == 0);
  pthread_t tid;
  assert(pthread_create(&tid, &attr, &RunInThread, NULL) == 0);
  assert(pthread_attr_destroy(&attr) == 0);

  for (int i = 0; i < 3600; i++) {
    sleep(1);
    printf("main thread: %d\n", i);
  }
  return 0;
}

2. Create a C/C++ Attach to Application debug configuration and enable non-stop mode. 
3. Start a debugging session. The Debug window shows that the main thread is suspended and the second thread is still running.
  test attach [C/C++ Attach to Application]	
    tmp/test [11329] [cores: 0]	
      Thread [2] 11332 [core: 0] (Running)	
      Thread [1] 11329 [core: 0] (Suspended : User Request)	
        nanosleep() at 0x7fe170f6effd	
        sleep() at 0x7fe170ffe9e0	
        main() at test.cc:22 0x460642	
      gdb	

The state of the secondary thread is shown incorrectly since the thread doesn't print anything to the terminal window and therefore must be suspended.

4. Resume the main thread. The Debug window shows that both threads are running but, in fact, the secondary thread is still suspended.
5. Detach from the program. Now both threads are running again.
6. Repeat step #3
7. Select the program node (tmp/test) in the Debug view and click Resume. Both threads are resumed as indicated by their output to the terminal window.

Attaching to a program without the non-stop mode appears to work fine.
Comment 1 Sergey Prigogin CLA 2010-12-28 18:02:24 EST
Here is a fragment of the gdb trace indicating that gdb correctly reported both threads as "stopped":

887,383 17^done,groups=[{id="i2",type="process",pid="11979",executable="tmp/test",cores=["0","1"]},{id="i1",type="process"}]
887,383 (gdb) 
887,417 18-list-thread-groups
887,417 18^done,groups=[{id="i2",type="process",pid="11979",executable="tmp/test",cores=["0","1"]},{id="i1",type="process"}]
887,417 (gdb) 
887,424 19-list-thread-groups i2
887,425 19^done,threads=[{id="2",target-id="Thread 0x7f0bc8f50710 (LWP 11980)",frame={level="0",addr\
="0x00007f0bc918affd",func="nanosleep",args=[],from="/usr/lib64/libc.so.6"},state="stopped",\
core="1"},{id="1",target-id="Thread 0x7f0bca0c1740 (LWP 11979)",frame={level="0",addr="0x00007f0bc91\
8affd",func="nanosleep",args=[],from="/usr/lib64/libc.so.6"},state="stopped",core="0"}]
887,426 (gdb) 
887,434 20-stack-info-depth --thread 1 11
887,434 20^done,depth="3"
887,435 (gdb) 
887,435 21-stack-list-frames --thread 1
887,436 21^done,stack=[frame={level="0",addr="0x00007f0bc918affd",func="nanosleep",from="/usr/grte/v\
2/lib64/libc.so.6"},frame={level="1",addr="0x00007f0bc921a9e0",func="sleep",from="/usr/lib64\
/libc.so.6"},frame={level="2",addr="0x0000000000460642",func="main",file="test.cc",line="22"}]
887,436 (gdb) 
887,457 22-thread-info 1
887,458 22^done,threads=[{id="1",target-id="Thread 0x7f0bca0c1740 (LWP 11979)",frame={level="0",addr\
="0x00007f0bc918affd",func="nanosleep",args=[],from="/usr/lib64/libc.so.6"},state="stopped",\
core="0"}]
887,458 (gdb) 
887,587 23-thread-info 2
887,588 23^done,threads=[{id="2",target-id="Thread 0x7f0bc8f50710 (LWP 11980)",frame={level="0",addr\
="0x00007f0bc918affd",func="nanosleep",args=[],from="/usr/lib64/libc.so.6"},state="stopped",\
core="1"}]
887,588 (gdb) 
887,590 24-stack-list-frames --thread 1 0 2
887,591 24^done,stack=[frame={level="0",addr="0x00007f0bc918affd",func="nanosleep",from="/usr/grte/v\
2/lib64/libc.so.6"},frame={level="1",addr="0x00007f0bc921a9e0",func="sleep",from="/usr/lib64\
/libc.so.6"},frame={level="2",addr="0x0000000000460642",func="main",file="test.cc",line="22"}]
887,591 (gdb)
Comment 2 Sergey Prigogin CLA 2011-01-04 13:00:42 EST
Does anybody with debugging foo have spare cycles to take a look at this bug? This bug is pretty serious since it's almost guaranteed to confuse the hell out of an unsuspecting user.
Comment 3 Marc Khouzam CLA 2011-01-06 14:26:40 EST
I'll have a look sometime next week.

Did you see the problem with CDT 7.0.1?
Which GDB version were you using?
Comment 4 Sergey Prigogin CLA 2011-01-06 14:36:08 EST
(In reply to comment #3)
> I'll have a look sometime next week.
> 
> Did you see the problem with CDT 7.0.1?

I haven't tried with 7.0.1.

> Which GDB version were you using?

7.2
Comment 5 Marc Khouzam CLA 2011-01-12 16:27:31 EST
The problem seems to be that we don't get a *stopped event for each of the threads, when we attach to the process.  Currently, we rely solely on the *stopped events to mark a thread as suspended.

We really should take into account the state reported by GDB in -list-thread-groups i2 and in -thread-info to make sure we are aware of the real current state.

The problem never came up before because *stopped events are normally reliable, but immediately after an attach they apparently are not.

I'll look into a solution.
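
The idea of reading the state field from the MI result records, instead of relying only on *stopped events, can be sketched roughly as follows. This is an illustrative Python sketch, not the CDT implementation: the names RECORD, thread_states and suspended_threads are hypothetical, the regular expression is a deliberately simple stand-in for a real MI parser, and the sample record is the -list-thread-groups i2 reply from comment 1 with the line wraps removed.

```python
import re

# Sample MI result record, taken from the trace in comment 1 (line wraps removed).
RECORD = (
    '19^done,threads=[{id="2",target-id="Thread 0x7f0bc8f50710 (LWP 11980)",'
    'frame={level="0",addr="0x00007f0bc918affd",func="nanosleep",args=[],'
    'from="/usr/lib64/libc.so.6"},state="stopped",core="1"},'
    '{id="1",target-id="Thread 0x7f0bca0c1740 (LWP 11979)",'
    'frame={level="0",addr="0x00007f0bc918affd",func="nanosleep",args=[],'
    'from="/usr/lib64/libc.so.6"},state="stopped",core="0"}]'
)

# Hypothetical, simplified pattern: captures the id and state of each thread
# tuple. A real front end would use a proper MI parser instead.
_THREAD_RE = re.compile(r'\{id="(\d+)",target-id="[^"]*".*?,state="(\w+)"')


def thread_states(mi_record):
    """Return {thread id: state} for every thread listed in an MI record."""
    return dict(_THREAD_RE.findall(mi_record))


def suspended_threads(mi_record):
    """Return the sorted ids of the threads GDB reports as stopped."""
    states = thread_states(mi_record)
    return sorted(tid for tid, state in states.items() if state == "stopped")


if __name__ == "__main__":
    # Both threads are reported as stopped, even without any *stopped event.
    print(thread_states(RECORD))
    print(suspended_threads(RECORD))
```

With a check like this after an attach, the front end could mark both threads as suspended even though no *stopped event was received for thread 2.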
Comment 6 Marc Khouzam CLA 2011-01-12 16:30:06 EST
Note that we may only need to fix this for the non-stop case, since it is the only case where threads don't all have the same state.  That would be in GDBRunControl_7_0_NS (for information purposes).
Comment 7 Marc Khouzam CLA 2011-01-14 21:22:42 EST
Created attachment 186869 [details]
Prototype for a new approach to attaching in non-stop mode

Instead of a fix of the current problem, I wondered if we should take a different approach altogether.

In non-stop mode, GDB allows the use of an asynchronous attach 
-target-attach <pid>&

This would cause all threads to remain running when we attach to an application.  This is much less intrusive than the current behavior.  It would allow the application to only stop once it hits a breakpoint, which looks nicer.

This patch illustrates the result.  It _must_ be run with assertions off (no -ea flag) because there is a problem with setting the first breakpoints when launching such a session.  But if this approach is the way we want to go, we can fix this problem.

Sergey, what do you think of this idea?
Anyone else?

P.S. this asynchronous attach is even allowed in all-stop mode if we enable target-async mode.  We may want to do that if we like this approach.
Comment 8 Sergey Prigogin CLA 2011-01-14 21:27:18 EST
(In reply to comment #7)

I really like this approach. Non-stop is supposed to be non-stop, isn't it :-)?
Comment 9 Marc Khouzam CLA 2011-01-14 21:43:50 EST
(In reply to comment #8)
> (In reply to comment #7)
> 
> I really like this approach. Non-stop is supposed to be non-stop, isn't it :-)?

Right :-)

But if you think about it, even in all-stop, when the user attaches to a process, why would they want to interrupt it immediately?  Doesn't it make more sense to only stop the process once we actually reach a point of interest (a breakpoint)?  But that is a separate enhancement bug.

This will actually be the approach for 'global breakpoints', where the user will be able to set a breakpoint in a piece of code, without even attaching to a process at all.
Comment 10 Marc Khouzam CLA 2011-04-03 21:23:48 EDT
Created attachment 192429 [details]
Fix

Now that Bug 337893 is resolved, which was causing breakpoint problems in this case, we can fix the current bug.

This patch uses 
  -target-attach <pid> & 
when in non-stop mode, to avoid interrupting the target when attaching.  This of course applies only to GDB >= 7.0, since those are the only versions that support non-stop.

A side-effect of this fix is that the thread states are now shown properly, which was the real issue with this bug.

We cannot use this form of -target-attach for all-stop mode because we currently don't use 'target-async on' when using all-stop, which is a prerequisite.  But that is ok, because we didn't have any thread-state issues for all-stop.
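
The choice between the two attach forms described above could be sketched like this. It is a minimal Python illustration, not the code from the attached patch: the function name attach_command and its parameters are hypothetical, and it assumes that 'target-async on' has already been enabled whenever non_stop is true.

```python
def attach_command(pid, non_stop):
    """Build the MI attach command for the given process id.

    Hypothetical helper: in non-stop mode the trailing '&' requests an
    asynchronous attach so the threads keep running; in all-stop mode the
    plain, interrupting form is used because target-async is not enabled.
    """
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    if non_stop:
        return "-target-attach {} &".format(pid)
    return "-target-attach {}".format(pid)


if __name__ == "__main__":
    print(attach_command(11329, non_stop=True))
    print(attach_command(11329, non_stop=False))
```

Keeping the decision in one place makes it easy to extend the asynchronous form to all-stop later, should 'target-async on' be enabled there as well.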

There are some minor backwards-compatible API changes which I feel are worth adding to fix this.

Committed to HEAD.
Comment 11 Marc Khouzam CLA 2011-04-03 21:24:23 EDT
Fixed.

Sergey, can you review?
Comment 13 Sergey Prigogin CLA 2011-04-05 20:44:10 EDT
I can't judge the code, but attaching to a multi-threaded process in non-stop mode now works flawlessly.

A few issues popped up in a subsequent debugging session. Please let me know if I should file separate bugs for them.

1. Attempts to step over function calls behaved as Step Into and triggered error messages like:
    Warning:
    Cannot insert breakpoint 0.
    Error accessing memory address 0x6ac2287: Input/output error.

2. Over time all worker threads stopped on
    (Suspended : Signal : SIGPWR:Power fail/restart)

Suspension on SIGPWR does not happen until the first stepping action.	

Is there a way to disable thread suspension on SIGPWR?
Comment 14 Marc Khouzam CLA 2011-04-05 21:26:45 EDT
(In reply to comment #13)
> I can't judge the code, but attaching to a multi-threaded process in non-stop
> mode now works flawlessly.

Excellent.  I'm happy about the bug fix, but also about the new behavior.  I think it is much nicer for non-stop to not interrupt the process.

> A few issues popped up in a subsequent debugging session. Please let me know if I
> should file separate bugs for them.
> 
> 1. Attempts to step over function calls behaved as Step Into and triggered
> error messages like:
>     Warning:
>     Cannot insert breakpoint 0.
>     Error accessing memory address 0x6ac2287: Input/output error.

Hm, setting a breakpoint when doing a step over?  That may be GDB trying to set the breakpoint implicitly to step past the function.  If that is the case, it would be a GDB error.

Can you file a new bug, attach the 'gdb traces' console logs and, if possible, a simple way to reproduce the problem?

> 2. Over time all worker threads stopped on
>     (Suspended : Signal : SIGPWR:Power fail/restart)
> 
> Suspension on SIGPWR does not happen until the first stepping action.    
> 
> Is there a way to disable thread suspension on SIGPWR?

I never heard of SIGPWR.  Why are threads getting that signal?
Comment 15 Sergey Prigogin CLA 2011-04-05 21:42:35 EDT
(In reply to comment #14)
> I never heard of SIGPWR.  Why are threads getting that signal?

This could be caused by the application shutting itself down because it was not happy with something. Is there a way to disable suspension on signals in general?
Comment 16 Marc Khouzam CLA 2011-04-05 22:21:57 EDT
(In reply to comment #15)
> (In reply to comment #14)
> > I never heard of SIGPWR.  Why are threads getting that signal?
> 
> This could be caused by the application shutting itself down because it was not
> happy with something. Is there a way to disable suspension on signals in
> general?

http://sourceware.org/gdb/onlinedocs/gdb/Signals.html

(gdb) help handle
Specify how to handle a signal.
Args are signals and actions to apply to those signals.
Symbolic signals (e.g. SIGSEGV) are recommended but numeric signals
from 1-15 are allowed for compatibility with old versions of GDB.
Numeric ranges may be specified with the form LOW-HIGH (e.g. 1-5).
The special arg "all" is recognized to mean all signals except those
used by the debugger, typically SIGTRAP and SIGINT.
Recognized actions include "stop", "nostop", "print", "noprint",
"pass", "nopass", "ignore", or "noignore".
Stop means reenter debugger if this signal happens (implies print).
Print means print a message if this signal happens.
Pass means let program see this signal; otherwise program doesn't know.
Ignore is a synonym for nopass and noignore is a synonym for pass.
Pass and Stop may be combined.


Note that this is not supported in DSF-GDB but you can type it in the gdb console.  It could be an enhancement request.
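
For the SIGPWR case, the command to type in the gdb console would be something like "handle SIGPWR nostop noprint pass". The small Python sketch below only shows how such a command line could be assembled and checked against the actions listed in the help text above; handle_command is a hypothetical helper and not part of DSF-GDB.

```python
# Actions listed by `help handle` in the help text quoted above.
_ACTIONS = {
    "stop", "nostop", "print", "noprint",
    "pass", "nopass", "ignore", "noignore",
}


def handle_command(signal, *actions):
    """Build a gdb console `handle` command for one signal.

    Hypothetical helper for illustration: it rejects actions that the
    help text does not list and requires at least one action.
    """
    if not actions:
        raise ValueError("at least one action is required")
    unknown = [action for action in actions if action not in _ACTIONS]
    if unknown:
        raise ValueError("unknown action(s): " + ", ".join(unknown))
    return " ".join(["handle", signal, *actions])


if __name__ == "__main__":
    # Do not stop or report on SIGPWR, but still deliver it to the program.
    print(handle_command("SIGPWR", "nostop", "noprint", "pass"))
```

Passing the signal on with "pass" lets the program still see SIGPWR, while "nostop" and "noprint" keep the debugger from suspending the thread or reporting the signal.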
Comment 17 Marc Khouzam CLA 2011-04-05 23:25:36 EDT
(In reply to comment #13)

> 1. Attempts to step over function calls behaved as Step Into and triggered
> error messages like:
>     Warning:
>     Cannot insert breakpoint 0.
>     Error accessing memory address 0x6ac2287: Input/output error.

I found a minor problem in my fix to Bug 337893 which I committed a change for just now.  I don't know if it might be the cause of the error you saw or not, but you may want to try it out again after updating your DSF-GDB code.
Comment 18 Tim Jiang CLA 2012-01-29 03:24:13 EST
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > 
> > I really like this approach. Non-stop is supposed to be non-stop, isn't it :-)?
> 
> Right :-)
> 
> But if you think about it, even in all-stop, when the user attaches to a
> process, why would they want to interrupt it immediately?  Doesn't it make more
> sense to only stop the process once we actually reach a point of interest (a
> breakpoint)?  But that is a separate enhancement bug.
> 
> This will actually be the approach for 'global breakpoints', where the user
> will be able to set a breakpoint in a piece of code, without even attaching to
> a process at all.

I would love to see such a feature: even in all-stop mode, attaching to a process would not interrupt it (or any of its threads) until a thread hits a breakpoint.

Is this feature/enhancement planned or already covered?

Thanks a lot,

Tim Jiang
Comment 19 Marc Khouzam CLA 2012-01-30 10:12:05 EST
(In reply to comment #18)

> I would love to see such a feature: even in all-stop mode, attaching to a
> process would not interrupt it (or any of its threads) until a thread hits a
> breakpoint.
> 
> Is this feature/enhancement planned or already covered?

In Eclipse, we could automatically resume all threads after attaching, which would give the user the impression that nothing stopped.

I personally have no plans to work on that since I am waiting for GDB's global breakpoints:
http://sourceware.org/ml/gdb-patches/2011-06/msg00163.html

But if someone else wants to contribute this feature, that would be fine.