Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 194081

Summary: after re-attach, no more data is collected for methods already entered
Product: z_Archived Reporter: Alan Haggarty <haggarty>
Component: TPTPAssignee: Chengrui Deng <chengrui.deng>
Status: CLOSED FIXED QA Contact: Kathy Chan <kathy>
Severity: critical    
Priority: P1 CC: analexee, asaf.yaffe, chengrui.deng, chris.l.elford, ewchan, jgwest, kathy, paulslau, sluiman, yunan.he
Version: unspecifiedKeywords: plan
Target Milestone: ---Flags: kathy: pmc_approved? (oec)
kathy: pmc_approved+
kathy: pmc_approved? (sluiman)
ernest: pmc_approved+
kathy: pmc_approved+
paulslau: pmc_approved+
ewchan: pmc_approved+
asaf.yaffe: review+
jgwest: review+
Hardware: All   
OS: All   
Whiteboard: Milestone3 adopter
Attachments:
Description Flags
patch
none
Shadow Stack Solution Implementation patch. This patch is used for TPTP 4.5.2.1
none
Profilers binary files for TPTP 4.5.2.1 testing usage
none
Shadow stack solution patch to solve reattach and attach problem with CGProf
none
Patch - 4521 - Update Shadow stack solution to solve reattach and attach problem with CGProf
none
Reattach problem patch for TPTP 4.5.2.1
chengrui.deng: review?
screen capture of failure
none
screen capture of failure
none
Java crash dialog detail none

Description Alan Haggarty CLA 2007-06-22 21:50:36 EDT
Build ID: I20070608-1718

Steps To Reproduce:
Noticed behavior running DetachAttachTest? test cases
from the automation framework using tiagent.

Example:
1. Using test class StartStop
2. launch class collecting execution statistics (execution flow details)
3. After launch if I open the statistics I see something like:
Method      Cumulative Time
StartStop   0.00025
main        0.00025
4. I detach the agent and there is no change
5. hit enter in console (advances this test class)
6. reattach
7. hit enter in the console

After this the execution statistics in the new
monitor are:

Method         Cumulative Time
StartStop      0.000000
main           0.000000
methodB        0.001270

methodB was entered after the reattach.

Since the rest of main was run through after this
shouldn't there be an updated data for this?

InOut is a simpler test class where the only other method
methodA is not being monitored because of the detach. In
this case the second monitor shows all 0s.

In a different piagent test run on linux the second 
monitor shows no data (maybe ti shows data for methods 
with 0 calls but pi does not?)

More information:
StartStop and InOut are from the test automation framework in:
/org.eclipse.tptp.ac.testautomation/automation-files/external_classes
Comment 1 Asaf Yaffe CLA 2007-06-25 09:56:22 EDT
Alan,

I cannot reproduce this with the tptp.runtime-TPTP-4.4.0-200706140100C driver. I believe this problem is a manifestation of another bug which was resolved.

Please check again and close if you cannot reproduce.

Thanks,
Asaf
Comment 2 Guru Nagarajan CLA 2007-06-25 13:22:40 EDT
WOrks for me as well.
Comment 3 Alan Haggarty CLA 2007-06-29 16:38:47 EDT
I just retried on XP and get the same behavior. I am pretty sure I was
using C at the time of logging as well.
In this case I am using the GA download:
tptp.runtime-TPTP-4.4.0.zip
and the IAC with a new workspace and jre 1.5.

When you say you cannot reproduce - what behavior/results do you get?
Comment 4 Asaf Yaffe CLA 2007-07-01 04:30:39 EDT
(In reply to comment #3)
The behavior depends on the initial configuration used when launching the CPU Profiler:

If the profiler is launched in "Show execution flow graphical details" mode, the behavior described in this bug is fully reproducible with the GA build.

If the profiler is launched in "Show execution statistics" (aggregated) mode, the behavior is different: after re-attach (step 6), execution statistics for method "main" are correctly updated to reflect the total execution time of the application.

The reason for this is probably related to the fact that when you re-attach to the CPU Profiler, it internally switches to "aggregated" mode regardless of the initial configuration (a built-in limitation, described in comment 19 of bug 190687). This "switch" is probably the cause for this bug.

Guru - more investigation is needed here. Please assign and prioritize this bug.
Comment 5 Guru Nagarajan CLA 2007-07-02 19:36:35 EDT
This needs to be addressed and exposed to the end user appropriately, updating the tabs and presenting a simple message to the end user.
This has to be in the 4.4.1 
Comment 6 Guru Nagarajan CLA 2007-08-09 21:29:50 EDT
This use case has to be revisited. Cannot be contained in 4.4.1
Comment 7 Alexander N. Alexeev CLA 2008-01-24 07:24:16 EST
This defect should be targeted for future since it can be resolved without context reconstruction after attach.
Comment 8 Alexander N. Alexeev CLA 2008-03-04 05:45:00 EST
target for future
Comment 9 Harm Sluiman CLA 2008-03-11 09:28:41 EDT
(In reply to comment #8)
> target for future
> 

Can we have a discussion on this? I don't think it all has to fall on the agent. Although that is certainly one way to do it.

The PI agent had the behavour of caching the last few 100 events before an attach, but it also kept track fo the stack and all associated "defintion" events that had been sent. SO it in effect had a rough idea of what the client knew. This is sort of what you are proposing, but is not completely required.

If the agent assumes the client has no data and just sends the events it has at and after attach, and also sends the definition events it has not sent in that attach session, the client has enough information to recover.
If exit events come with out entry events, the entry time can be default to attach  time (base time) or the time of the last know entry event.. If the definition is not received we use the default "unknown" but if the agent sends the def event the  definition will be created and used or the ID will match and the previous unknown will be updated with the correct info..

I know that some bugs have creeped into the loaders that may cause this to break initially, but we can and should fix those bugs. 

Does this make the implementation more viable? and simpler?
Comment 10 Alexander N. Alexeev CLA 2008-05-27 15:43:50 EDT
Created attachment 102226 [details]
patch

This fix can cover only narrow set of use cases. The main limitation for stack reconstruction only by "method exit" event is JVM instrumentation policy. It doesn't allow to change bytecode for methods already entered. So, if profiler attaches to the application first time it shouldn't obtain any "exit method" events even for methods which covered by filters and should be instrumented. Instrumented methods are used in execution only if they are entered after instrumentation.
The only one case in which this fix works is reattaching to application where instrumentation remains since detach because instrumented methods haven't been left yet.
Even this approach is arguable because filtering and hence set of classes instrumented after attach can be changed. Such mixing of instrumentation's versions can disfigure results.
Probably it is rare situation in common use cases. 
One solution to avoid mixing is add version of instrumentation to events and collect only corresponding to current session, but it can introduce extra overhead. 

With fix profiler builds up call graph if unbound "exit method" event comes.
Comment 11 Alexander N. Alexeev CLA 2008-05-27 15:46:58 EDT
Asaf, could you review this patch?
Comment 12 Asaf Yaffe CLA 2008-05-28 03:46:51 EDT
(In reply to comment #10)
I am not sure about the value of applying this patch vs. the potential risk of introducing regression issues. As Alex pointed out, this patch correctly covers only a handful on scenarios that can be reproduced in testing but will not usually occur with real-world applications.

Harm - the main difference between the PI and the TI agent is that the PI agent was able to keep tracking of JMVPI events during detach (either silently processing or ignoring them). In TI, this is not possible because once detached, the TI Agent removes all BCI from the code and does not get any events from the running application. There is another issue (mentioned by Alex) of the inability to instrument methods currently on-stack (a JVMTI limitation). 

My understanding of Harm's suggestion (in comment #9) is that this can be fixed at the workbench/client side by correctly processing "method exit" events that have no corresponding "method enter" events. If my understanding it correct, then the fix should target the workbench code and not the agent code.
Comment 13 Alexander N. Alexeev CLA 2008-05-28 06:33:40 EDT
My decision to put recovery code to the agent is based on the following:
- Loaders code is common for PI and TI agents but this issue is only actual for
TI 
- Fix for deviation in behavior is reasonable to put to the component which
causes it.
- I think at the end agent should drop out method exit events which are not
corresponds to the current instrumentation, and this fix would be removed.

I agree with Asaf, proposed usecase isn't applicable for TI agent, and the fix
can't help in general.
Comment 14 Alexander N. Alexeev CLA 2008-05-29 04:40:13 EDT
Harm, Joanna, what do you recommend for proceeding with this bug?
Comment 15 Asaf Yaffe CLA 2008-05-30 07:29:09 EDT
Patch is OK.

Comment 16 Chris Elford CLA 2008-05-30 19:19:18 EDT
I would suggest that we discuss this with documentation as a difference between PI and TI and _not_ fix it.

the beauty of TI is changable BCI instead of performance limiting always executing enter/exit events.  However, this comes with certain limitations the most notable of which is that BCI does not take effect for currently executing instances of methods (i.e., only for subsequent method invocations).  As a result, information for methods already entered when instrumentation occurs is ragged and/or non-existant.  This behavior is as designed.

For the reasons discussed in the bug, you're not going to have a simple patch that emulates enterevents when exitevents occur you'll really only further confuse folks.   Thus, I suggest not doing this patch 

I don't think we're going to create an efficient solution that fully closes this PI/TI difference (without forcing TI to emulate PI which I do not think is a worthwhile endeavour).

thx,Chris
Comment 17 Harm Sluiman CLA 2008-05-31 19:07:13 EDT
Unfortunately, at this stage it is hard to debate any fix. We just need to avoid confusion. In March we had some options to explore, now it is a bit late to do much.

This is also not about being PI equivalent, and I thought several months ago we had gotten past that issue. If all the instrumentation is removed at detach then re-attach is the same as any delayed attach. 

The stack does not have to be monitored as with PI, it can be dumped when needed, if needed, at the time if instrumentation to get an understanding of the code that can not be instrumented. There is always going to be something on the stack with a delayed attach. I think we are fortunate that often those classes are in the exclude set anyway.

What is needed is a strategy to instrument things on the stack. For example as they are released from the stack lazily instrument them.

We also discussed recreating the stack entry events using the time stamp of the attach and appropriate ticket #s. The question is then just about how to catch exit events and how to instrument things freed from the stack.

Another possible action is to optionally not remove the instrumentation. Default to the current removal behavior but make it optional. You could even imagine more advanced filtering to allow for permanent instrumentation at some point in our future...

The re-attach is a specific use case that from what I have seen is most commonly used to learn about the code. The user typically triggers various usage patterns and wants to capture activity only during that time, all into one "trace" file. The key being is that the user understands the impact of merging these collections of data versus creating separate traces.
Comment 18 Asaf Yaffe CLA 2008-06-01 02:50:25 EDT
(In reply to comment #17)
> The stack does not have to be monitored as with PI, it can be dumped when
> needed, if needed, at the time if instrumentation to get an understanding of
> the code that can not be instrumented. There is always going to be something on
> the stack with a delayed attach. I think we are fortunate that often those
> classes are in the exclude set anyway.

There is no guarantee that a "stack dump" will match the set "method enter/leave" events that would be generated for the same code sequence. A typical example is inlined methods, which will *not* appear in the stack dump but will generate "enter/leave" events through BCI. Therefore, constructing a correct stack view at the point of attach that is consistent with future enter/leave events is nearly impossible, and the Agent will have to account for the discrepancies. The current implementation didn't go down this path, and therefore the Agent always switch to "aggregated" mode upon attach.

> 
> What is needed is a strategy to instrument things on the stack. For example as
> they are released from the stack lazily instrument them.
> 

There is no limitation on instrumenting methods currently on stack. It is just that this cause the JVM to create an *additional version* of the instrumented method. The copy on stack is not affected, but future copies will be affected. There is no need for lazy instrumentation.

> The re-attach is a specific use case that from what I have seen is most
> commonly used to learn about the code. The user typically triggers various
> usage patterns and wants to capture activity only during that time, all into
> one "trace" file. The key being is that the user understands the impact of
> merging these collections of data versus creating separate traces.

Currently, "reattach" creates a separate trace file. This is the main conceptual difference between attach/detach and pause/resume: the aforementioned scenario is more appropriate for pause/resume than for attach/detach, and is already supported with the existing TI agent.
Comment 19 Harm Sluiman CLA 2008-06-01 16:15:03 EDT
(In reply to comment #18)
I agree the stack dump may not be a replacement, and that is why I said "if needed" ;-) My point was that we don't need to be equal to PI and alternatives could be done. Stack dump was one, I mentioned before that the even loaders could handle this recovery as well, but we could not introduce such code in this release anymore. 
As I mentioned to Alex at one point the event loaders actually handled these conditions as error recovery but that support was removed in error at some point.

> 
> There is no limitation on instrumenting methods currently on stack. It is just
> that this cause the JVM to create an *additional version* of the instrumented
> method. The copy on stack is not affected, but future copies will be affected.
> There is no need for lazy instrumentation.
> 
> > The re-attach is a specific use case that from what I have seen is most
> > commonly used to learn about the code. The user typically triggers various
> > usage patterns and wants to capture activity only during that time, all into
> > one "trace" file. The key being is that the user understands the impact of
> > merging these collections of data versus creating separate traces.
> 
> Currently, "reattach" creates a separate trace file. This is the main
> conceptual difference between attach/detach and pause/resume: the
> aforementioned scenario is more appropriate for pause/resume than for
> attach/detach, and is already supported with the existing TI agent.
> 

So if a re-attach is going to a new trace file, there is no issue about instrumenting loaded classes, and all old instrumentation is removed at detach time, I have lost why there is a problem. This make sit sound like re-attach and attach should behave the same at the agent end, and there would not be any lost data.
I guess this is why I suggested a discussion back in March ;-) It seems I get easily confused.
Comment 20 Alexander N. Alexeev CLA 2008-06-03 08:24:48 EDT
> So if a re-attach is going to a new trace file, there is no issue about
> instrumenting loaded classes, and all old instrumentation is removed at detach
> time, I have lost why there is a problem.
Sorry, but could you clarify what problem do you mean?
Original statement of this bug was caused by incorrect interpretation of displayed information. The fact that method "main" appears in statistic can mislead user about certain point at call stack where attach happened and amount of date which profiler should provide, but anyway data is correct. Possibly it would be worth to remove all entities with zeros from views. But it isn't so urgent. 

So since we are not going to eliminate difference in behavior between PI and TI agent I would close this issue as WONTFIX.
Do you have any objections?

Thanks,
Alex.

Comment 21 Alexander N. Alexeev CLA 2008-06-04 05:03:50 EDT
closed since no objections
Comment 22 Harm Sluiman CLA 2008-06-06 08:50:48 EDT
(In reply to comment #21)
> closed since no objections
> 

Sorry for not being connect while traveling and being able to respond within 21 hours ;-)

I think the base premise in this case is that after the re-attach, time was spent in main and it is not reflected in the trace. regardless of the agent technology used, it seems reasonable to expect this data to be collected.

If Alan can confirm that this is the basic issue I suggested we not close this defect and consider it a restriction until fixed.
Comment 23 Alexander N. Alexeev CLA 2008-06-06 09:35:22 EDT
> Sorry for not being connect while traveling and being able to respond within 21
> hours ;-)
Harm, sorry, anyway you always can reopen it ;)

> I think the base premise in this case is that after the re-attach, time was
> spent in main and it is not reflected in the trace. regardless of the agent
> technology used, it seems reasonable to expect this data to be collected.
We can't receive any information about time spent in "main" because at the moment of attach execution is already in this method without instrumentation and it can't be introduced at this point. So there isn't any code which can generate leave event and record rest of time spent in "main". Among that I think partial time of method execution ("main" in our case) is meaningless, we event can't guess about proportion.
Comment 24 Harm Sluiman CLA 2008-06-06 13:36:15 EDT
I think we need to carry this problem as an open issue and resolve it later. We won't resolve it in 4.5.0 (or in a dialog via bugzilla ;-) ) but there are some serious use cases that are impacted and I believe some viable ways to address the issue without adopting the PI approach.
Comment 25 Alexander N. Alexeev CLA 2008-07-08 03:32:21 EDT
Harm, since we couldn't find consensus about this defect so long before, I think we should start discussion now to be in time for 4.5.1. :)

So, could you provide description of valuable scenarios? Base on that we can try to find appropriate solutions, or maybe you already have some. 

Thanks, 
Alex.
Comment 26 Eugene Chan CLA 2008-07-16 13:33:47 EDT
This is related to bug 194385.
Comment 27 Chengrui Deng CLA 2008-11-11 04:32:18 EST
The bug has been reproduced on Build 200810210100. The test result is the same with Comment#4 by Asaf. I will continue to work on the problem and report the latest progress.

Thanks,
Chengrui
Comment 28 Eugene Chan CLA 2008-12-09 17:14:26 EST
Cheungrui, reassigning to you for investigation.
Comment 29 Raheel Ashraf CLA 2008-12-31 10:50:35 EST
Cheungrui, any  updates on this?
Comment 30 Chengrui Deng CLA 2009-01-04 02:23:53 EST
(In reply to comment #29)
> Cheungrui, any  updates on this?

 After investigating this bug carefully and referring to discussions above, two possible solutions are listed as below:

1) Light weight byte code instrumentation at methods' entry and exit with the help of conditional branch for execution time analysis in enabled mode (i.e. implement in Java code, do not call JNI to check the condition). It¡¯s  similar to ¡°pause/resume¡±. However, this will affect original meaning of ¡°enabled mode¡±. Whether users apply attach operations or not, additional byte code at methods' entry and exit will be added.

2) Use JVMTI events MethodEnter and MethodExit to solve the problem. This will introduce big overhead and we should note that not all JVMs implement these two events. If this solution is used, we can discuss how to optimize and decrease the overhead. 

  Please give some suggestions and other better solutions are welcomed.

Thanks,
Chengrui
Comment 31 Asaf Yaffe CLA 2009-01-04 03:21:54 EST
(In reply to comment #30)
> 1) Light weight byte code instrumentation at methods' entry and exit with the
> help of conditional branch for execution time analysis in enabled mode (i.e.
> implement in Java code, do not call JNI to check the condition). It¡¯s  similar
> to ¡°pause/resume¡±. However, this will affect original meaning of ¡°enabled
> mode¡±. Whether users apply attach operations or not, additional byte code at
> methods' entry and exit will be added.

The attach/detach feature in the JVMTI Profiler was designed to support a true attach scenario as defined by the Java 6 JVMTI specification (see http://java.sun.com/javase/6/docs/platform/jvmti/jvmti.html#onattach). In this scenario, the Agent may be loaded when the application is already running. There's no JVMTI Agent in the JVM process up to this point. Any solution based on the assumption that you can load a "lightweight BCI" agent on JVM start-up cannot support true attach/detach scenarios.

Additionally, even a "lightweight BCI agent" that only maintains a stack-context will have between 1.5X to 4X overhead, depending on whether JNI callbacks are being used or not.

I therefore restate my original objection on solving this issue by implementing a lightweight BCI mechanism of some sort (but see some other ideas below).

> 
> 2) Use JVMTI events MethodEnter and MethodExit to solve the problem. This will
> introduce big overhead and we should note that not all JVMs implement these two
> events. If this solution is used, we can discuss how to optimize and decrease
> the overhead. 
> 

As far as I know, NONE of the JVMs TPTP supports (from IBM, Sun and BEA) support the MethodEnter/Leave capability. This is not a viable option.

IMHO, the correct way to resolve this issue is to have the Agent use the JVMTI stack dump mechanism upon attach to create an execution context (i.e. shadow-stack) for each thread. As previously discussed, this solution must take into account and resolve any discrepancies between the stack dump and the BCI-generated enter/leave notifications (see comment 18). The resolution can be at the agent side, data loader side or both. 

Thanks,
Asaf
Comment 32 Harm Sluiman CLA 2009-01-04 20:18:54 EST
(In reply to comment #31)
I agree with Asaf. As I have noted earlier there are two parts to the solution.
A stack dump/parse at attach time to trigger the sending of appropriate def/obj/entry/exit events etc., and a clean up of the event loaders to handle unmatched events with "unknown". Also s Asaf notes, this was done once already in the pi agent, and there is likely code to be reused.

There can be a discussion of what values to use for time stamps etc. but this is a reasonable and simple approach to the problem, and actually the only workable one proposed to-date.

An additional variation to the solution is to cover "re-attach" and whne some collection has already taken place, meaning class and method definition events have already be sent and duplicates should be avoided. This implies the ti agent keeps a list of the def events already sent. This is more complex, but can not be addressed until basic attach works.
Comment 33 Asaf Yaffe CLA 2009-01-05 01:11:46 EST
(In reply to comment #32)
> An additional variation to the solution is to cover "re-attach" and whne some
> collection has already taken place, meaning class and method definition events
> have already be sent and duplicates should be avoided. This implies the ti
> agent keeps a list of the def events already sent. This is more complex, but
> can not be addressed until basic attach works.

As far as I know, re-attach opens a new trace file, and therefore all class and method "def" events must be sent (again) for each class and method encountered. Optimizing this is a good idea but probably requires changes at the data layer to enable the reuse of this information between traces.
Comment 34 Harm Sluiman CLA 2009-01-05 09:04:58 EST
(In reply to comment #33)

> As far as I know, re-attach opens a new trace file, and therefore all class and
> method "def" events must be sent (again) for each class and method encountered.
> Optimizing this is a good idea but probably requires changes at the data layer
> to enable the reuse of this information between traces.
> 

In the past there was a pause mode, and in attach (re-attach) the user decided if they wanted a new trace or to extend an existing one. I believe this has been disabled in some scenarios because it obviously will not work with TI right now since attach is not fully working.
 
This style of data collection was also useful when the user could turn collection on and off with triggers in the application being monitored.

However to be clear these are improvements we can consider after we get the basic attach scenario working properly which is a very important use case that has been broken for some time now.
Comment 35 Raheel Ashraf CLA 2009-01-05 10:29:26 EST
Since this is targeted for 4.5.2 and the development phase is about to finish, are we going to target it for the shutdown phase? 
Comment 36 jkubasta CLA 2009-01-08 09:12:36 EST
Not containable in 4.5.2
Comment 37 Harm Sluiman CLA 2009-01-08 09:22:47 EST
As discussed on the PMC call this fix will require a lot of corner case testing so it should be started asap and in the 4.5.3 build asap.

I will try to help with the loader default and error handling fixes. Perhaps this defect should be split into the agent work and the model loader work. Eugene is likely the optimal owner of the model work.
Comment 38 Kathy Chan CLA 2009-03-18 14:57:10 EDT
We would like to target this for TPTP 4.5.2.1 Milestone 2.  

Cheungrui, could you please take a look at this as a priority item?  Thanks!
Comment 39 Chengrui Deng CLA 2009-03-19 05:36:33 EDT
(In reply to comment #38)
> We would like to target this for TPTP 4.5.2.1 Milestone 2.  
> 
> Cheungrui, could you please take a look at this as a priority item?  Thanks!
> 

Ok. I am working on it. Thanks.

Comment 40 Kathy Chan CLA 2009-04-03 14:59:57 EDT
Please note that this defect fix is requested by an adopter product.
Comment 41 Kathy Chan CLA 2009-04-06 16:50:16 EDT
Hi Chengrui,

Have you been able to start lookiing into this problem?  Do you still need anything from us to make progress on this?
Comment 42 Chengrui Deng CLA 2009-04-06 20:41:29 EDT
(In reply to comment #41)
> Hi Chengrui,
> 
> Have you been able to start lookiing into this problem?  Do you still need
> anything from us to make progress on this?
> 
Hi, Kathy,
  I am working on stack dump solution coding part now. The discrepancy problem will be considered later. As Eugene said in comment #26, this bug is also related with bug 194385. I will report the latest progress when initial testing is finished.
  Thanks,
  Chengrui
Comment 43 Chengrui Deng CLA 2009-04-15 11:15:35 EDT
Dear members,
  I am working on shadow stack solution implementation referring to discussions on Bugzilla. Below is a shadow stack solution design brief descriptions. If you have any suggestions, please give me your feedback. When initial coding and testing tasks are finished, the patch will be submitted. 

1. Two typical scenarios
(1.1) Attach happens for the first time in enabled mode
  In this case, methods which already entered in runtime stack are not instrumented. When attach happens for the first time, we need to generate MethodEnter and MethodLeave events for the runtime-stack methods. 

(1.2) Reattach after detach in enabled mode
  The methods in runtime stack when reattach happens can be divided into two categories: Not instrumented and instrumented before.
  For not instrumented methods, it is similar to (1.1), we need to generate MethodEnter and MethodLeave events for them.
  For instrumented methods in runtime stack, we need to handle additional MethodLeave events reported by instrumented methods themselves.

2. Shadow stack solution design description
(2.1) Data structure
  Two stacks are used for each thread. One stack is a shadow stack to trace methods entered before attach (Stack_1), the other is original methods call record in profiler for tracing instrumented methods after attach (Stack_2). (For description simplicity, here just describes how to handle in one thread.)

(2.2) Algorithm

  (a) Data:
    Stack_1 - Shadow stack for tracing methods entered before attach.
    Stack_2 - Stack for tracing instrumented methods after attach.

  (b) Initial condition: 
    Stack_1 and Stack_2 are empty.
  
  (c) Modified MethodEnter and MethodLeave events handlers for attach in enabled mode:

      (c.1) MethodEnter:
        If (Stack_2 is empty)
        {
          Get latest stack snapshot into temp_Stack_1 from JVM stack trace;

          Compare Stack_1 and temp_Stack_1:
             If (Methods exist in temp_Stack_1 but not exist in Stack_1)
             {
               Generate MethodEnter events for such kind of each method according to methods call sequence;
             }
             
             If (Methods exist in Stack_1 but not exist in temp_Stack_1)
             {
               Generate MethodLeave events for such kind of each method according to methods exit sequence;
             }

          Stack_1 = temp_Stack_1;
        }

        Push new instrumented method into Stack_2, its MethodEnter event will be generated automatically by after attach instrumented code.

      (c.2) MethodLeave:
         if (Stack_2 is empty)
         {
           Discard MethodLeave notification, because it is generated by instrumeted code before attach for in-runtime-stack;
           Return;
         }

         Pop method from Stack_2, its MethodLeave event will be generated automatically by after attach instrumented code.

       (c.3) When VM shutdown or detach happens:
          Pop up and generate MethodLeave events for methods left in Stack_1.
  
(2.3) Examples
      ----------------------------
      MethodA()
      {
        MethodB();
        MethodD();
      }

      MethodB()
      {
        // Attach happen here
        MethodC();
      }
      ----------------------------

      Algorithm Execution Process (AEP):
      AEP1: Attach happen. MethodA and MethodB are not instrumented and in runtime-stack. Stack_1 and Stack_2 are empty stacks.
        

      AEP2: Instrumented MethodC is called, MethodEnter handle:
          Stack_1: Empty; temp_Stack_1(From bottom to top): MethodA->MethodB
          Generate MethodEnter events for MethodA and MethodB: 
                    =MethodA_Enter
                       ==MethodB_Enter
          Stack_1 = temp_Stack_1: MethodA->MethodB;

          Push MethodC into Stack_2: MethodC, generate MethodC_Enter event: 
                       ===MethodC_Enter

      AEP3: Instrumented MethodC exit, MethodLeave handle:
          Stack_2: MethodC ,not empty, generate MethodC_Leave event:
                       ===MethodC_Leave
          Pop MethodC, Stack_2 becomes empty.

      AEP4: Not instrumented MethodB exit, no event generate now;
      
      AEP5: Instrumented MethodD is called, Method Enter handle:
          Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
          Generate MethodLeave event for MethodB:
                       ==MethodB_Leave
          Stack1 = temp_Stack_1: MethodA

          Push MethodD into Stack_2: MethodD, generate MethodD_Enter event:
                       ==MethodD_Enter

      AEP6: Instrumented MethodD exit, MethodLeave handle, similar to AEP3:
                       ==MethodD_Leave

      AEP7: When detach or VM shutdown, pop up Stack_1 left methods and generate MethodLeave events for them:
                    =MethodALeave               
           Now Stack_1 and Stack_2 become empty.

      
     Method enter and leave events above summary:
         =MethodA_Enter
            ==MethodB_Enter
               ===MethodC_Enter
               ===MethodC_Leave
            ==MethodB_Leave
            ==MethodD_Enter
            ==MethodD_Leave
         =MethodALeave  

3. Shadow stack solution limitation
  The call graph maybe not the same with actual call process because of inlined methods etc.

  Thanks,
  Chengrui
Comment 44 Eugene Chan CLA 2009-04-15 15:06:56 EDT
Cheungrui,

Thanks first for your proposal. I have the following comment:
1. In (1.1), Def and Obj events should also be sent. Similar applies to (1.2).
2. In (c.2) MethodLeave, method not instrumented should also send out event at this point, Because incorrect time will be sent if there is code execution in MethodA between MethodB and MethodD due to the delay in MethodB_Leave event.

    AEP4: Not instrumented MethodB exit, no event generate now;
    AEP5: Instrumented MethodD is called, Method Enter handle:
          Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
          Generate MethodLeave event for MethodB:
                       ==MethodB_Leave
          Stack1 = temp_Stack_1: MethodA

In your example, MethodB_Leave is not sent until MethodD is called. But there could be code execution between call of MethodB and MethodD.

I think we are in the right direction but have to fine-tune the algorithm.

Thanks,
Eugene 

Comment 45 Chengrui Deng CLA 2009-04-15 20:46:21 EDT
(In reply to comment #44)
Hi, Eugene,
  Thanks for your suggestion. Comments for discussion are as below:

> 1. In (1.1), Def and Obj events should also be sent. Similar applies to (1.2).
I will consider this point in the solution. Thanks.

> 2. In (c.2) MethodLeave, method not instrumented should also send out event at
> this point, Because incorrect time will be sent if there is code execution in
> MethodA between MethodB and MethodD due to the delay in MethodB_Leave event.
> 
>     AEP4: Not instrumented MethodB exit, no event generate now;
>     AEP5: Instrumented MethodD is called, Method Enter handle:
>           Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
>           Generate MethodLeave event for MethodB:
>                        ==MethodB_Leave
>           Stack1 = temp_Stack_1: MethodA
> 
> In your example, MethodB_Leave is not sent until MethodD is called. But there
> could be code execution between call of MethodB and MethodD.
  For not instrumented methods already in stack, we can not get the exact exit time information. But we should assure the Method Call sequence correctly.
  If code execution happens between MethodB and MethodD, it also can be handeled by the algorithm.
----------------
  MethodA()
  {
     MethodB();
     ...
     MethodX();
     MethodD();
  }
----------------
  This case is similar to original case. MethodB_Leave will be sent until MethodX is called. We can not sure when MethodB exits exactly, but we can sure MethodB_Leave event happens before enter follow up MethodX.

Thanks,
Chengrui
Comment 46 Yunan, He CLA 2009-04-16 00:15:49 EDT
(In reply to comment #44)
> Cheungrui,
> Thanks first for your proposal. I have the following comment:
> 1. In (1.1), Def and Obj events should also be sent. Similar applies to (1.2).

Thanks. Yes, MethodDef and ClassDef evens should be sent out. In current codebase, these two events will be sent in MethodEnterHandler. So no extra work for Def events. 
Eugene, what's Obj events?

> 2. In (c.2) MethodLeave, method not instrumented should also send out event at
> this point, Because incorrect time will be sent if there is code execution in
> MethodA between MethodB and MethodD due to the delay in MethodB_Leave event.
>     AEP4: Not instrumented MethodB exit, no event generate now;
>     AEP5: Instrumented MethodD is called, Method Enter handle:
>           Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
>           Generate MethodLeave event for MethodB:
>                        ==MethodB_Leave
>           Stack1 = temp_Stack_1: MethodA
> In your example, MethodB_Leave is not sent until MethodD is called. But there
> could be code execution between call of MethodB and MethodD.

If the method call between B and D is instrumented method, It should be no problem to report the event because in method enter event we will check the stack. 
If the method call between B and D is non-instrumented method (for example: function call in a optimized loop), yes, we may delay send out the event and the time is not accurate. We can not catch the funtion between B and D, you know, we do not have chance to break the execution to get the function name. 


> I think we are in the right direction but have to fine-tune the algorithm.

Yes, there're also some corner cases we need to considerate.

> Thanks,
> Eugene 

Comment 47 Chengrui Deng CLA 2009-04-16 00:20:38 EDT
(Shadow Stack Solution Algorithm Description Update)

2. Shadow stack solution design description
(2.1) Data structure
  Two stacks are used for each thread. One stack is a shadow stack to trace methods entered before attach (Stack_1), the other is original methods call record stack in profiler for tracing instrumented methods after attach (Stack_2). (For simplicity, here just describes how to handle for one thread.)

(2.2) Algorithm

  (a) Data:
    Stack_1 - Shadow stack for tracing methods entered before attach.
    Stack_2 - Stack for tracing instrumented methods after attach.

  (b) Initial condition: 
    Stack_1 is snapshot of VM stack when attach happens and send MethodEnter events for each methods. Stack_2 is empty.
    
  
  (c) Modified MethodEnter and MethodLeave events handlers for attach in enabled mode:

      (c.1) MethodEnter:
        If (Stack_2 is empty)
        {
          Get latest stack snapshot into temp_Stack_1 from JVM stack trace;

          bool isShadowStackChanged = false;
          
          Compare Stack_1 and temp_Stack_1:
          {
             If (Methods exist in temp_Stack_1 but not exist in Stack_1)
             {
               isShadowStackChanged = true;
               Generate MethodEnter events for such kind of each method according to methods call sequence;
             }
             
             If (Methods exist in Stack_1 but not exist in temp_Stack_1)
             {
               isShadowStackChanged = true;
               Generate MethodLeave events for such kind of each method according to methods exit sequence;
             }
          }

          If (isShadowStackChanged) { Stack_1 = temp_Stack_1; }
        }

        Push new instrumented method into Stack_2, its MethodEnter event will be generated automatically by after attach instrumented code.

      (c.2) MethodLeave:
         if (Stack_2 is empty)
         {
           Discard MethodLeave notification, because it is generated by instrumeted code before attach for in-runtime-stack;
           Return;
         }

         Pop method from Stack_2, its MethodLeave event will be generated automatically by after attach instrumented code.

       (c.3) When VM shutdown or detach happens or thread exit:
          Pop up and generate MethodLeave events for methods left in Stack_1.
  
(2.3) Examples
      ----------------------------
      MethodA()
      {
        MethodB();
        MethodD();
      }

      MethodB()
      {
        // Attach happen here
        MethodC();
      }
      ----------------------------

      Algorithm Execution Process (AEP):
      AEP1: Attach happen. MethodA and MethodB are not instrumented and in runtime-stack. Snapshot VM stack trace into Stack_1.
          Stack_1(From bottom to top): MethodA->MethodB. 
          Stack_2 is empty.
          Generate Method_Enter events (MethodDef, ClassDef will be generated in current TPTP code automatically when MethodEnter is called):
                    =MethodA_Enter
                       ==MethodB_Enter            
        

      AEP2: Instrumented MethodC is called, MethodEnter handle:
          Stack_1: MethodA->MethodB; temp_Stack_1: MethodA->MethodB

          Push MethodC into Stack_2: MethodC, generate MethodC_Enter event: 
                       ===MethodC_Enter

      AEP3: Instrumented MethodC exit, MethodLeave handle:
          Stack_2: MethodC ,not empty, generate MethodC_Leave event:
                       ===MethodC_Leave
          Pop MethodC, Stack_2 becomes empty.

      AEP4: Not instrumented MethodB exit, no event generate now;
      
      AEP5: Instrumented MethodD is called, Method Enter handle:
          Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
          Generate MethodLeave event for MethodB:
                       ==MethodB_Leave
          Stack1 = temp_Stack_1: MethodA

          Push MethodD into Stack_2: MethodD, generate MethodD_Enter event:
                       ==MethodD_Enter

      AEP6: Instrumented MethodD exit, MethodLeave handle, similar to AEP3:
                       ==MethodD_Leave

      AEP7: When detach or VM shutdown, pop up Stack_1 left methods and generate MethodLeave events for them:
                    =MethodALeave               
           Now Stack_1 and Stack_2 become empty.

      
     Method enter and leave events above summary:
         =MethodA_Enter
            ==MethodB_Enter
               ===MethodC_Enter
               ===MethodC_Leave
            ==MethodB_Leave
            ==MethodD_Enter
            ==MethodD_Leave
         =MethodALeave
Comment 48 Eugene Chan CLA 2009-04-16 00:24:53 EDT
(In reply to comment #46)

> Eugene, what's Obj events?
I meant Method and ClassDef. sorry for the confusion.
Comment 49 Asaf Yaffe CLA 2009-04-20 09:44:03 EDT
(In reply to comment #47)

Getting a stack dump from JVMTI in the context of "Method Enter" may be too costly. In some cases, Stack_2 may become empty many times (depending on the exact point of attach) and this will cause large number of "stack dumps" which may be detrimental to performance.

Here's another direction worth checking (not a complete solution, there are probably some corner cases as well, needs prototyping):

Upon attach, suspend all threads and for each thread:
1. Get a stack dump
2. Report "Method Enter" (and all related "def" events) for all functions currently on stack.
3. Use the JVMTI "NotifyFramePop" function to mark all active frames. This will cause JVMTI to generate a "FramePop" event for these functions when they go out of stack, and these events can be easily converted to "Method Leave" events.

The main difference with this approach is that it only does a stack dump once (upon attach). If the overhead of "NotifyFramePop" is not very high, it can work pretty well.

BTW - I verified that all supported JVMs support the "can_generate_frame_pop_events" JVNTI capability.

My 2 cents....

Asaf
Comment 50 Chengrui Deng CLA 2009-04-21 21:02:47 EDT
(In reply to comment #49)
> (In reply to comment #47)
> 
> Getting a stack dump from JVMTI in the context of "Method Enter" may be too
> costly. In some cases, Stack_2 may become empty many times (depending on the
> exact point of attach) and this will cause large number of "stack dumps" which
> may be detrimental to performance.
> 
> Here's another direction worth checking (not a complete solution, there are
> probably some corner cases as well, needs prototyping):
> 
> Upon attach, suspend all threads and for each thread:
> 1. Get a stack dump
> 2. Report "Method Enter" (and all related "def" events) for all functions
> currently on stack.
> 3. Use the JVMTI "NotifyFramePop" function to mark all active frames. This will
> cause JVMTI to generate a "FramePop" event for these functions when they go out
> of stack, and these events can be easily converted to "Method Leave" events.
> 
> The main difference with this approach is that it only does a stack dump once
> (upon attach). If the overhead of "NotifyFramePop" is not very high, it can
> work pretty well.
> 
> BTW - I verified that all supported JVMs support the
> "can_generate_frame_pop_events" JVNTI capability.
> 
> My 2 cents....
> 
> Asaf
> 
  If "FramePop" is supported by JVMs, "NotifyFramePop" will decrease stack dump cost. I will try it. Asaf, thanks for your information.
  
  Chengrui


Comment 51 Chengrui Deng CLA 2009-04-30 03:57:30 EDT
Created attachment 133879 [details]
Shadow Stack Solution Implementation patch. This patch is used for TPTP 4.5.2.1 

  This patch is the shadow stack solution implementation. Three modules are modified. They are JPIAgent, CGProf and BaseProf. The design and implementation documents will be submitted later.

  NotifyFramePop is not applied, because it will need modify not only Profilers code, MPI interface and Martini framework code will need be modified too. If the current implementation have efficiency problem, we can improve it later.

  Dear committers, please help to review the patch, I will submit profiler binary files on WindowsXP for testing usage soon.

  Thanks,
  Chengrui
Comment 52 Chengrui Deng CLA 2009-04-30 04:02:35 EDT
Created attachment 133881 [details]
Profilers binary files for TPTP 4.5.2.1 testing usage

Please test StartStop.java with attached profilers binary files.
Comment 53 Eugene Chan CLA 2009-04-30 11:05:40 EDT
(In reply to comment #52)
> Created an attachment (id=133881) [details]
> Profilers binary files for TPTP 4.5.2.1 testing usage
> 
> Please test StartStop.java with attached profilers binary files. 
> 

Hi Chengrui,

I tried the patch in 4521 build with IBM JVM 1.5SR9.
My application (PerformanceExample) crashed when I performed the reattach action. There was no crash report created.
It seems to work for some other application (eg. StartStop), but I see additional profiling data in the collection post reattach. ie. data under packages java.io*,sun.nio.cs*, or.geclipse.tptp.martini*

Would you please take a look a the problem? Thanks.
Comment 54 Chengrui Deng CLA 2009-04-30 12:18:16 EDT
(In reply to comment #53)
> (In reply to comment #52)
> > Created an attachment (id=133881) [details] [details]
> > Profilers binary files for TPTP 4.5.2.1 testing usage
> > 
> > Please test StartStop.java with attached profilers binary files. 
> > 
> 
> Hi Chengrui,
> 
> I tried the patch in 4521 build with IBM JVM 1.5SR9.
> My application (PerformanceExample) crashed when I performed the reattach
> action. There was no crash report created.
> It seems to work for some other application (eg. StartStop), but I see
> additional profiling data in the collection post reattach. ie. data under
> packages java.io*,sun.nio.cs*, or.geclipse.tptp.martini*
> 
> Would you please take a look a the problem? Thanks.
> 

  I will investigate crash problem. It is also reproduced on my machine with Sun's JVM. It should be related with shadow stack MethodEnter track process.

  Additional profiling data problem will be handled too. If any update, I will report it. Eugene, Thanks for your help.

  Regards,
  Chengrui
Comment 55 Chengrui Deng CLA 2009-05-04 06:14:52 EDT
 (In reply to comment #53)
> (In reply to comment #52)
> > Created an attachment (id=133881) [details] [details]
> > Profilers binary files for TPTP 4.5.2.1 testing usage
> > 
> > Please test StartStop.java with attached profilers binary files. 
> > 
> 
> Hi Chengrui,
> 
> I tried the patch in 4521 build with IBM JVM 1.5SR9.
> My application (PerformanceExample) crashed when I performed the reattach
> action. There was no crash report created.

  The failure on PerformanceExample is caused by:
  1) Race condition when track MethodEnter;
  2) Call on original function CProfEnv::GetStackTrace(...). In GetStackTrace,
SMethodInfo* methodInfo = GetMethodData(stackTrace->pStackEntries[i].methodId); Then the pointer methodInfo is used but not check whether it is a null pointer. Because GetMethodData can return a valid pointer or not.

  The crash on PerformanceExample has been fixed now. 

> It seems to work for some other application (eg. StartStop), but I see
> additional profiling data in the collection post reattach. ie. data under
> packages java.io*,sun.nio.cs*, or.geclipse.tptp.martini*
> 
> Would you please take a look a the problem? Thanks.
> 
  It seems some classes are not instrumented in profiling process (CCGAdaptor::CanInstrumentClass). But when we get stack trace, methods belong to these classes are in the runtime stack. If we should not report these methods, similar filter rules as in CCGAdaptor::CanInstrumentClass will be needed in track process. I am working on this problem now. The profilers binary for testing will be provided soon.
  
  Thanks,
  Chengrui

Comment 56 Chengrui Deng CLA 2009-05-14 04:54:51 EDT
Created attachment 135756 [details]
Shadow stack solution patch to solve reattach and attach problem with CGProf

This patch is used to solve re-attach problem caused by JVMTI limitation in TPTP 4.5.2.1. The corresponding patch for TPTP 4.6.0 has been checked into CVS (Please refer to comment 17 of bug 270767). Additional patch code to solve HeapProf crash problem (bug 275321) is also included comparing with patch for TPTP 4.6.0.

  Thanks,
  Chengrui
Comment 57 Chengrui Deng CLA 2009-05-20 00:47:21 EDT
Created attachment 136422 [details]
Patch - 4521 - Update Shadow stack solution to solve reattach and attach problem with CGProf

  Since HeapProf crash problem has been fixed in a separate bug 276921 for TPTP 4.5.2.1, I modified the patch - only including shadow stack solution related code in the patch.
Comment 58 Chengrui Deng CLA 2009-05-20 05:43:24 EDT
The shadow stack design and implementation document has been submitted to Bugzilla in bug 270767.
Comment 59 Chengrui Deng CLA 2009-05-22 02:32:25 EDT
Requesting approval to include the following defect in TPTP 4.5.2.1 branch:

After re-attach, no more data is collected for methods already entered. (https://bugs.eclipse.org/bugs/show_bug.cgi?id=194081)

1. Explain why you believe this is a stop-ship defect. How does the defect manifest itself, and how will users of TPTP / consuming products be affected if the defect is not fixed? 
  After re-attach, no more data is collected for methods already entered for CGProf with execDetals option in enabled mode. This is caused by JVMTI instrumentation profiling method limitation. It will disable attach to CGProf with execDetails option in enabled mode.

2. Is there a work-around? If so, why do you believe the work-around is insufficient? 
  No.

3. Is this a regression or API breakage? Explain. 
  No.

4. Does this require new API? 
  Yes.

5. Who performed the code review? 
  Jonathan.

6. Is there a test case attached to the bugzilla record? 
  No. Since the test cases path in CVS has been recorded. We can get these test cases anytime.

7. What is the nature of the fix? What is the scope of the fix? What is the risk associated with this fix? 
  This fix is a complex solution. It uses stack dump mechanism and a shadow stack algorithm is designed. About 500 lines code has been added. The fix just affects CGProf profiling with execDetals in enabled mode. Risk is medium.

8. Is this fix related to any standards that TPTP adheres to? If so, who has validated that the fix continues to adhere to the standard? 
  No.
Comment 60 Kathy Chan CLA 2009-05-22 10:25:28 EDT
Since we are running out of time for TPTP 4.5.2.1 Milestone 2.  I'm retargetting this to milestone 3.
Comment 61 Chengrui Deng CLA 2009-05-27 03:58:41 EDT
Comment on attachment 136422 [details]
Patch - 4521 - Update Shadow stack solution to solve reattach and attach problem with CGProf

Since a regression is found in TPTP 4.6.0 related this submitted patch, a new updated patch will be needed for bug 194081. Please refer to comment 6 of bug 277431 about details. 

  The updated patch for TPTP 4.5.2.1 milestone 3 will be submitted later.
  Thanks,
  Chengrui
Comment 62 Kathy Chan CLA 2009-06-19 21:08:56 EDT
Please submit an updated patch to back port this fix from TPTP 4.6 to TPTP 4.5.2.1.  We are opening up the 4.5.2.1 stream now for check-ins.
Comment 63 Kathy Chan CLA 2009-06-19 21:10:25 EDT
Please submit an updated patch to back port this fix from TPTP 4.6 to TPTP 4.5.2.1.  We are opening up the 4.5.2.1 stream now for check-ins.  Please also fill in the PMC approval template so that I could bring it forward for PMC approval.  Thanks!
Comment 64 Chengrui Deng CLA 2009-06-21 23:55:44 EDT
Created attachment 139720 [details]
Reattach problem patch for TPTP 4.5.2.1

This patch is a counterpart for reattach problem in TPTP 4.5.2.1. It has been fixed in TPTP 4.6.0.

This submitted patch actually merge the patches code for bug 270767, bug 277431 and bug 278437 in TPTP 4.6.0. Dear committers, please help to review the patch code.

Thanks,
Chengrui
Comment 65 Chengrui Deng CLA 2009-06-21 23:57:50 EDT
PMC approval template will be filled after the submitted patch is reviewed.
Comment 66 Kathy Chan CLA 2009-06-22 16:09:59 EDT
Jonathan,

Could you please take a look at the combined patch for TPTP 4.5.2.1 as well?
Comment 67 Jonathan West CLA 2009-06-24 11:39:38 EDT
Patch looks good.
Comment 68 Kathy Chan CLA 2009-06-24 13:06:47 EDT
In the interest of time, Jonathan, could you please post a PMC approval template to this defect so that we could get PMC approval and check in the code today?
Comment 69 Jonathan West CLA 2009-06-24 13:41:04 EDT
On behalf of Chengrui, I would like to request that the following defect be considered for PMC approval for 4.5.2.1

1.  Explain why you believe this is a stop-ship defect. How does the defect
manifest itself, and how will users of TPTP / consuming products be affected if the defect is not fixed?
After re-attach, no more data is collected for methods already entered. This is an important profiler requirement.

2. Is there a work-around? If so, why do you believe the work-around is
insufficient?
No.

3. Is this a regression or API breakage? Explain.
Regression from behaviour of previous profiler.

4. Does this require new API?
No.

5. Who performed the code review?
Myself, for Chengrui

6. Is there a test case attached to the bugzilla record?
Part of standard JVMTI tests

7. What's the nature of the fix?  What is the risk associated with this fix? 
" This fix is a complex solution. It uses stack dump mechanism and a shadow
stack algorithm is designed. About 500 lines code has been added. The fix just
affects CGProf profiling with execDetals in enabled mode. Risk is medium.
". Fix has already been tested in 4.6.0.

8. Is this fix related to any standards that TPTP adheres to? If so, who has
validated that the fix continues to adhere to the standard?
No.
Comment 70 Chris Elford CLA 2009-06-24 13:52:09 EDT
We have known for a while that IBM would want this in 4.5.2.1.  If we are happy with the solution in 4.6.0, I see no problems with 4.5.2.1 backpatch.
Comment 71 Kathy Chan CLA 2009-06-24 15:13:26 EDT
Putting the list of PMC member in the PMC_approved flag so that they are being notified of this PMC approval request for TPTP 4.5.2.1. Note that Chris has already provided his PMC approval earlier.
Comment 72 Jonathan West CLA 2009-06-24 15:55:52 EDT
Patch checked into 4521 w/ PMC approval.
Comment 73 Eugene Chan CLA 2009-06-25 15:59:49 EDT
Using tptp.sdk-TPTP-4.5.2.1-200906241706 build,
the application crash after re-attach.
Comment 74 Kathy Chan CLA 2009-06-26 14:11:43 EDT
Chengrui and Yunan,

Please take a look to see why the patch in the TPTP-4.5.2.1-200906241706 build did not fix the problem it was intended to fix.
Comment 75 Chengrui Deng CLA 2009-06-27 02:45:06 EDT
(In reply to comment #73)
> Using tptp.sdk-TPTP-4.5.2.1-200906241706 build,
> the application crash after re-attach.
> 

Hi, Eugene,
I tested tptp.sdk-TPTP-4.5.2.1-200906241706 build with test cases StartStop, PerformanceExample, InOut, StatisticsTest and ArgumentsTest. There is no crash happen on my machine. 

Please try again. If the problem is still there, please tell me which test case crash in your testing?

Thx,
Chengrui
  
Comment 76 Eugene Chan CLA 2009-06-29 11:45:01 EDT
The problem is reproducible with PerformanceExample on the same build (tptp.sdk-TPTP-4.5.2.1-200906241706.zip) using IBM JRE 1.5 (There is an IBM JRE shipped in this IBM Download Manager product that you can use for debugging > http://www-01.ibm.com/support/docview.wss?rs=3352&uid=swg24021016). 

I do not see the problem with StartStop and I do not have StatisticsTest and ArgumentsTest for further testing. The problem with PerformanceExample does not happen with Sun JRE.

Attaching screen capture. 
Comment 77 Eugene Chan CLA 2009-06-29 11:46:00 EDT
Created attachment 140395 [details]
screen capture of failure
Comment 78 Eugene Chan CLA 2009-06-29 11:46:20 EDT
Created attachment 140396 [details]
screen capture of failure
Comment 79 Eugene Chan CLA 2009-06-29 11:46:46 EDT
Created attachment 140397 [details]
Java crash dialog detail
Comment 80 Kathy Chan CLA 2009-06-29 12:53:59 EDT
Eugene, could you please see if similar problem exist in the TPTP 4.6 driver?
Comment 81 Eugene Chan CLA 2009-06-29 13:43:04 EDT
(In reply to comment #80)
> Eugene, could you please see if similar problem exist in the TPTP 4.6 driver?
> 

Problem is reproducible in TPTP 4.6 driver.
Comment 82 Chengrui Deng CLA 2009-06-30 07:19:52 EDT
I have installed IBM JVM. It seems the problem is caused by difference between IBM's and SUN's JVM at first glance. I will do some investigation.

Thanks,
Chengrui
Comment 83 Chengrui Deng CLA 2009-07-06 09:08:04 EDT
It is found the crash with PerformanceExample on IBM JVM caused by null pointer access in a self-defined list data structure in Martini profiler part(CThreadSet). 

Besides this problem, the hard-coded filter mechanism for CGProf reattach tracker can be replaced by external module (i.e. JPIAgent) filter. I am preparing the patch code. After finish coding and testing, the incremental patch will be submitted soon.

Thanks,
Chengrui
Comment 84 Kathy Chan CLA 2009-07-06 10:43:18 EDT
Resolving this bug as the patch for fixing the Sun JRE problem has been checked into TPTP-4_5_2_1.  Bug 282531 has been opened to track the remaining problem specific to IBM JRE.
Comment 85 Kathy Chan CLA 2009-10-16 16:47:26 EDT
Verified in consuming product.