| Summary: | after re-attach, no more data is collected for methods already entered | ||
|---|---|---|---|
| Product: | z_Archived | Reporter: | Alan Haggarty <haggarty> |
| Component: | TPTP | Assignee: | Chengrui Deng <chengrui.deng> |
| Status: | CLOSED FIXED | QA Contact: | Kathy Chan <kathy> |
| Severity: | critical | ||
| Priority: | P1 | CC: | analexee, asaf.yaffe, chengrui.deng, chris.l.elford, ewchan, jgwest, kathy, paulslau, sluiman, yunan.he |
| Version: | unspecified | Keywords: | plan |
| Target Milestone: | --- | Flags: | kathy: pmc_approved? (oec), kathy: pmc_approved+, kathy: pmc_approved? (sluiman), ernest: pmc_approved+, kathy: pmc_approved+, paulslau: pmc_approved+, ewchan: pmc_approved+, asaf.yaffe: review+, jgwest: review+ |
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | Milestone3 adopter | ||
| Attachments: | |||
Description
Alan Haggarty
Alan, I cannot reproduce this with the tptp.runtime-TPTP-4.4.0-200706140100C driver. I believe this problem is a manifestation of another bug which was resolved. Please check again and close if you cannot reproduce. Thanks, Asaf

Works for me as well.

I just retried on XP and get the same behavior. I am pretty sure I was using C at the time of logging as well. In this case I am using the GA download: tptp.runtime-TPTP-4.4.0.zip and the IAC with a new workspace and JRE 1.5. When you say you cannot reproduce, what behavior/results do you get?

(In reply to comment #3) The behavior depends on the initial configuration used when launching the CPU Profiler: If the profiler is launched in "Show execution flow graphical details" mode, the behavior described in this bug is fully reproducible with the GA build. If the profiler is launched in "Show execution statistics" (aggregated) mode, the behavior is different: after re-attach (step 6), execution statistics for method "main" are correctly updated to reflect the total execution time of the application. The reason for this is probably that when you re-attach to the CPU Profiler, it internally switches to "aggregated" mode regardless of the initial configuration (a built-in limitation, described in comment 19 of bug 190687). This "switch" is probably the cause of this bug.

Guru - more investigation is needed here. Please assign and prioritize this bug.

This needs to be addressed and exposed to the end user appropriately, updating the tabs and presenting a simple message to the end user. This has to be in 4.4.1.

This use case has to be revisited. Cannot be contained in 4.4.1.

This defect should be targeted for future since it can be resolved without context reconstruction after attach.

target for future

(In reply to comment #8) > target for future > Can we have a discussion on this? I don't think it all has to fall on the agent, although that is certainly one way to do it.
The PI agent had the behavior of caching the last few hundred events before an attach, but it also kept track of the stack and all associated "definition" events that had been sent. So it in effect had a rough idea of what the client knew. This is sort of what you are proposing, but is not completely required. If the agent assumes the client has no data and just sends the events it has at and after attach, and also sends the definition events it has not sent in that attach session, the client has enough information to recover. If exit events come without entry events, the entry time can default to the attach time (base time) or the time of the last known entry event. If the definition is not received, we use the default "unknown"; but if the agent sends the def event, the definition will be created and used, or the ID will match and the previous "unknown" will be updated with the correct info. I know that some bugs have crept into the loaders that may cause this to break initially, but we can and should fix those bugs. Does this make the implementation more viable and simpler?

Created attachment 102226 [details]
patch
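As an aside on the loader-recovery idea sketched above (default the entry time to the attach/base time when an exit arrives without an entry, and show "unknown" until the def event arrives), here is a rough Java illustration. All names (RecoveringLoader, onDef, etc.) are invented for this sketch; the real TPTP loaders are structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of loader-side recovery: an exit event with no matching
// entry gets a synthesized entry at the attach (base) time, and a method whose
// definition has not arrived yet is reported as "unknown" until its def event
// comes in.
public class RecoveringLoader {
    static final class MethodInfo {
        String name = "unknown";   // placeholder until a def event arrives
        long entryTime;
        MethodInfo(long entryTime) { this.entryTime = entryTime; }
    }

    private final long baseTime;                      // time of attach
    private final Map<Integer, MethodInfo> live = new HashMap<>();

    public RecoveringLoader(long baseTime) { this.baseTime = baseTime; }

    public void onEntry(int methodId, long time) {
        live.put(methodId, new MethodInfo(time));
    }

    // Def event: fill in (or correct) the placeholder definition.
    public void onDef(int methodId, String name) {
        MethodInfo m = live.get(methodId);
        if (m != null) m.name = name;
    }

    // Exit event: if no entry was seen, synthesize one at base time.
    public String onExit(int methodId, long time) {
        MethodInfo m = live.remove(methodId);
        if (m == null) m = new MethodInfo(baseTime);
        return m.name + " ran for " + (time - m.entryTime);
    }
}
```

The point is only that the client can stay consistent with partial data; the recovered durations are approximations bounded by the attach time.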
This fix can cover only a narrow set of use cases. The main limitation for stack reconstruction based only on "method exit" events is the JVM instrumentation policy: it does not allow changing the bytecode of methods already entered. So if the profiler attaches to the application for the first time, it should not receive any "method exit" events, even for methods which are covered by filters and should be instrumented. Instrumented methods take part in execution only if they are entered after instrumentation.
The only case in which this fix works is reattaching to an application where instrumentation remains from before the detach, because the instrumented methods have not been left yet.
Even this approach is arguable, because filtering, and hence the set of classes instrumented after attach, can change. Such mixing of instrumentation versions can distort results.
It is probably a rare situation in common use cases.
One solution to avoid mixing is to add an instrumentation version to events and collect only those corresponding to the current session, but this can introduce extra overhead.
With this fix, the profiler builds up the call graph when an unbound "exit method" event arrives.
Asaf, could you review this patch?

(In reply to comment #10) I am not sure about the value of applying this patch vs. the potential risk of introducing regression issues. As Alex pointed out, this patch correctly covers only a handful of scenarios that can be reproduced in testing but will not usually occur with real-world applications.

Harm - the main difference between the PI and the TI agent is that the PI agent was able to keep track of JVMPI events during detach (either silently processing or ignoring them). In TI, this is not possible because once detached, the TI agent removes all BCI from the code and does not get any events from the running application. There is another issue (mentioned by Alex) of the inability to instrument methods currently on-stack (a JVMTI limitation). My understanding of Harm's suggestion (in comment #9) is that this can be fixed at the workbench/client side by correctly processing "method exit" events that have no corresponding "method enter" events. If my understanding is correct, then the fix should target the workbench code and not the agent code.

My decision to put the recovery code in the agent is based on the following: - The loader code is common to the PI and TI agents, but this issue only affects TI. - A fix for a deviation in behavior is reasonable to put in the component which causes it. - I think in the end the agent should drop method exit events which do not correspond to the current instrumentation, and this fix would then be removed.

I agree with Asaf; the proposed use case isn't applicable to the TI agent, and the fix can't help in general.

Harm, Joanna, what do you recommend for proceeding with this bug?

Patch is OK.

I would suggest that we discuss this with documentation as a difference between PI and TI and _not_ fix it. The beauty of TI is changeable BCI instead of performance-limiting, always-executing enter/exit events.
However, this comes with certain limitations, the most notable of which is that BCI does not take effect for currently executing instances of methods (i.e., only for subsequent method invocations). As a result, information for methods already entered when instrumentation occurs is ragged and/or non-existent. This behavior is as designed. For the reasons discussed in the bug, you're not going to have a simple patch that emulates enter events when exit events occur; you'll really only further confuse folks. Thus, I suggest not doing this patch. I don't think we're going to create an efficient solution that fully closes this PI/TI difference (without forcing TI to emulate PI, which I do not think is a worthwhile endeavour). thx, Chris

Unfortunately, at this stage it is hard to debate any fix. We just need to avoid confusion. In March we had some options to explore; now it is a bit late to do much. This is also not about being PI-equivalent, and I thought several months ago we had gotten past that issue. If all the instrumentation is removed at detach, then re-attach is the same as any delayed attach. The stack does not have to be monitored as with PI; it can be dumped when needed, if needed, at the time of instrumentation to get an understanding of the code that cannot be instrumented. There is always going to be something on the stack with a delayed attach. I think we are fortunate that often those classes are in the exclude set anyway. What is needed is a strategy to instrument things on the stack. For example, as they are released from the stack, lazily instrument them. We also discussed recreating the stack entry events using the time stamp of the attach and appropriate ticket #s. The question is then just about how to catch exit events and how to instrument things freed from the stack. Another possible action is to optionally not remove the instrumentation. Default to the current removal behavior but make it optional.
You could even imagine more advanced filtering to allow for permanent instrumentation at some point in our future... The re-attach is a specific use case that, from what I have seen, is most commonly used to learn about the code. The user typically triggers various usage patterns and wants to capture activity only during that time, all into one "trace" file. The key is that the user understands the impact of merging these collections of data versus creating separate traces.

(In reply to comment #17)
> The stack does not have to be monitored as with PI, it can be dumped when
> needed, if needed, at the time of instrumentation to get an understanding of
> the code that cannot be instrumented. There is always going to be something on
> the stack with a delayed attach. I think we are fortunate that often those
> classes are in the exclude set anyway.

There is no guarantee that a "stack dump" will match the set of "method enter/leave" events that would be generated for the same code sequence. A typical example is inlined methods, which will *not* appear in the stack dump but will generate "enter/leave" events through BCI. Therefore, constructing a correct stack view at the point of attach that is consistent with future enter/leave events is nearly impossible, and the Agent would have to account for the discrepancies. The current implementation didn't go down this path, and therefore the Agent always switches to "aggregated" mode upon attach.

> What is needed is a strategy to instrument things on the stack. For example as
> they are released from the stack lazily instrument them.

There is no limitation on instrumenting methods currently on the stack. It is just that this causes the JVM to create an *additional version* of the instrumented method. The copy on the stack is not affected, but future copies will be affected. There is no need for lazy instrumentation.

> The re-attach is a specific use case that from what I have seen is most
> commonly used to learn about the code. The user typically triggers various
> usage patterns and wants to capture activity only during that time, all into
> one "trace" file.

Currently, "reattach" creates a separate trace file. This is the main conceptual difference between attach/detach and pause/resume: the aforementioned scenario is more appropriate for pause/resume than for attach/detach, and is already supported with the existing TI agent.

(In reply to comment #18) I agree the stack dump may not be a replacement, and that is why I said "if needed" ;-) My point was that we don't need to be equal to PI, and alternatives could be found. The stack dump was one; I mentioned before that even the loaders could handle this recovery, but we could not introduce such code in this release anymore. As I mentioned to Alex at one point, the event loaders actually handled these conditions as error recovery, but that support was removed in error at some point.

> There is no limitation on instrumenting methods currently on the stack. It is just
> that this causes the JVM to create an *additional version* of the instrumented
> method. The copy on the stack is not affected, but future copies will be affected.
> There is no need for lazy instrumentation.

> Currently, "reattach" creates a separate trace file. This is the main
> conceptual difference between attach/detach and pause/resume: the
> aforementioned scenario is more appropriate for pause/resume than for
> attach/detach, and is already supported with the existing TI agent.

So if a re-attach is going to a new trace file, there is no issue about instrumenting loaded classes, and all old instrumentation is removed at detach time, I have lost why there is a problem. This makes it sound like re-attach and attach should behave the same at the agent end, and there would not be any lost data. I guess this is why I suggested a discussion back in March ;-) It seems I get easily confused.

> So if a re-attach is going to a new trace file, there is no issue about
> instrumenting loaded classes, and all old instrumentation is removed at detach
> time, I have lost why there is a problem.
Sorry, but could you clarify which problem you mean?
The original statement of this bug was caused by an incorrect interpretation of the displayed information. The fact that method "main" appears in the statistics can mislead the user about the exact point in the call stack where the attach happened and about the amount of data the profiler should provide, but the data itself is correct. It might be worth removing all entities with zeros from the views, but that isn't urgent.
So, since we are not going to eliminate the difference in behavior between the PI and TI agents, I would close this issue as WONTFIX.
Do you have any objections?
Thanks,
Alex.
closed since no objections

(In reply to comment #21) > closed since no objections > Sorry for not being connected while traveling and being able to respond within 21 hours ;-) I think the base premise in this case is that after the re-attach, time was spent in "main" and it is not reflected in the trace. Regardless of the agent technology used, it seems reasonable to expect this data to be collected. If Alan can confirm that this is the basic issue, I suggest we not close this defect and consider it a restriction until fixed.

> Sorry for not being connected while traveling and being able to respond within 21
> hours ;-)

Harm, sorry; anyway, you can always reopen it ;)

> I think the base premise in this case is that after the re-attach, time was
> spent in main and it is not reflected in the trace. Regardless of the agent
> technology used, it seems reasonable to expect this data to be collected.

We can't receive any information about the time spent in "main" because at the moment of attach, execution is already inside this method without instrumentation, and instrumentation can't be introduced at this point. So there isn't any code which can generate the leave event and record the rest of the time spent in "main". Besides that, I think the partial execution time of a method ("main" in our case) is meaningless; we can't even guess at the proportion.

I think we need to carry this problem as an open issue and resolve it later. We won't resolve it in 4.5.0 (or in a dialog via bugzilla ;-) ) but there are some serious use cases that are impacted, and I believe some viable ways to address the issue without adopting the PI approach.

Harm, since we couldn't find consensus on this defect for so long before, I think we should start the discussion now to be in time for 4.5.1. :) So, could you provide a description of the valuable scenarios? Based on that, we can try to find appropriate solutions, or maybe you already have some. Thanks, Alex.

This is related to bug 194385.

The bug has been reproduced on Build 200810210100.
The test result is the same as in Comment #4 by Asaf. I will continue to work on the problem and report the latest progress. Thanks, Chengrui

Chengrui, reassigning to you for investigation.

Chengrui, any updates on this?

(In reply to comment #29) > Chengrui, any updates on this?

After investigating this bug carefully and referring to the discussions above, two possible solutions are listed below:

1) Lightweight bytecode instrumentation at methods' entry and exit, with the help of a conditional branch, for execution time analysis in enabled mode (i.e. implemented in Java code; do not call JNI to check the condition). It's similar to "pause/resume". However, this will affect the original meaning of "enabled mode": whether users apply attach operations or not, additional bytecode at methods' entry and exit will be added.

2) Use the JVMTI events MethodEnter and MethodExit to solve the problem. This will introduce big overhead, and we should note that not all JVMs implement these two events. If this solution is used, we can discuss how to optimize and decrease the overhead.

Please give some suggestions; other, better solutions are welcome. Thanks, Chengrui

(In reply to comment #30)
> 1) Lightweight bytecode instrumentation at methods' entry and exit with the
> help of a conditional branch for execution time analysis in enabled mode (i.e.
> implemented in Java code, do not call JNI to check the condition). It's similar
> to "pause/resume". However, this will affect the original meaning of "enabled
> mode". Whether users apply attach operations or not, additional bytecode at
> methods' entry and exit will be added.

The attach/detach feature in the JVMTI Profiler was designed to support a true attach scenario as defined by the Java 6 JVMTI specification (see http://java.sun.com/javase/6/docs/platform/jvmti/jvmti.html#onattach). In this scenario, the Agent may be loaded when the application is already running. There's no JVMTI Agent in the JVM process up to this point.
Any solution based on the assumption that you can load a "lightweight BCI" agent on JVM start-up cannot support true attach/detach scenarios. Additionally, even a "lightweight BCI agent" that only maintains a stack context will have between 1.5X and 4X overhead, depending on whether JNI callbacks are being used or not. I therefore restate my original objection to solving this issue by implementing a lightweight BCI mechanism of some sort (but see some other ideas below).

> 2) Use the JVMTI events MethodEnter and MethodExit to solve the problem. This will
> introduce big overhead and we should note that not all JVMs implement these two
> events. If this solution is used, we can discuss how to optimize and decrease
> the overhead.

As far as I know, NONE of the JVMs TPTP supports (from IBM, Sun and BEA) implement the MethodEnter/Leave capability. This is not a viable option.

IMHO, the correct way to resolve this issue is to have the Agent use the JVMTI stack dump mechanism upon attach to create an execution context (i.e. shadow stack) for each thread. As previously discussed, this solution must take into account and resolve any discrepancies between the stack dump and the BCI-generated enter/leave notifications (see comment 18). The resolution can be at the agent side, the data loader side, or both. Thanks, Asaf

(In reply to comment #31) I agree with Asaf. As I have noted earlier, there are two parts to the solution: a stack dump/parse at attach time to trigger the sending of the appropriate def/obj/entry/exit events etc., and a clean-up of the event loaders to handle unmatched events with "unknown". Also, as Asaf notes, this was done once already in the PI agent, and there is likely code to be reused. There can be a discussion of what values to use for time stamps etc., but this is a reasonable and simple approach to the problem, and actually the only workable one proposed to date.
An additional variation to the solution is to cover "re-attach" when some collection has already taken place, meaning class and method definition events have already been sent and duplicates should be avoided. This implies the TI agent keeps a list of the def events already sent. This is more complex, but cannot be addressed until basic attach works.

(In reply to comment #32)
> An additional variation to the solution is to cover "re-attach" when some
> collection has already taken place, meaning class and method definition events
> have already been sent and duplicates should be avoided. This implies the TI
> agent keeps a list of the def events already sent. This is more complex, but
> cannot be addressed until basic attach works.

As far as I know, re-attach opens a new trace file, and therefore all class and method "def" events must be sent (again) for each class and method encountered. Optimizing this is a good idea but probably requires changes at the data layer to enable the reuse of this information between traces.

(In reply to comment #33)
> As far as I know, re-attach opens a new trace file, and therefore all class and
> method "def" events must be sent (again) for each class and method encountered.
> Optimizing this is a good idea but probably requires changes at the data layer
> to enable the reuse of this information between traces.

In the past there was a pause mode, and in attach (re-attach) the user decided if they wanted a new trace or to extend an existing one. I believe this has been disabled in some scenarios because it obviously will not work with TI right now, since attach is not fully working. This style of data collection was also useful when the user could turn collection on and off with triggers in the application being monitored. However, to be clear, these are improvements we can consider after we get the basic attach scenario working properly, which is a very important use case that has been broken for some time now.
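The "keep a list of the def events already sent" idea above could look roughly like this. This is a minimal Java sketch with invented names (the real TI agent is native code); it only shows the bookkeeping, not the event transport.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of tracking which class/method definition events have
// already been sent in the current attach session, so duplicates are avoided
// on re-attach.
public class DefTracker {
    private final Set<String> sentDefs = new HashSet<>();

    // Returns true if a def event still needs to be emitted, i.e. this
    // class/method definition has not been sent in this session yet.
    public boolean markDefSent(String defId) {
        return sentDefs.add(defId);
    }

    // On detach we could either clear the set (new trace file per attach, all
    // defs resent next time) or keep it (defs reused across traces).
    public void onDetach(boolean newTracePerAttach) {
        if (newTracePerAttach) {
            sentDefs.clear();
        }
    }
}
```

Whether the set survives detach is exactly the open question in the comments above: with one trace file per attach, it must be cleared; reusing it across traces would require the data-layer changes Asaf mentions.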
Since this is targeted for 4.5.2 and the development phase is about to finish, are we going to target it for the shutdown phase?

Not containable in 4.5.2.

As discussed on the PMC call, this fix will require a lot of corner-case testing, so it should be started asap and be in the 4.5.3 build asap. I will try to help with the loader default and error-handling fixes. Perhaps this defect should be split into the agent work and the model loader work. Eugene is likely the optimal owner of the model work.

We would like to target this for TPTP 4.5.2.1 Milestone 2. Chengrui, could you please take a look at this as a priority item? Thanks!

(In reply to comment #38)
> We would like to target this for TPTP 4.5.2.1 Milestone 2.
> Chengrui, could you please take a look at this as a priority item? Thanks!

Ok. I am working on it. Thanks.

Please note that this defect fix is requested by an adopter product.

Hi Chengrui, Have you been able to start looking into this problem? Do you still need anything from us to make progress on this?

(In reply to comment #41)
> Hi Chengrui,
> Have you been able to start looking into this problem? Do you still need
> anything from us to make progress on this?

Hi Kathy, I am working on the stack dump solution coding part now. The discrepancy problem will be considered later. As Eugene said in comment #26, this bug is also related to bug 194385. I will report the latest progress when initial testing is finished. Thanks, Chengrui

Dear members,
I am working on the shadow stack solution implementation, referring to the discussions in Bugzilla. Below is a brief description of the shadow stack solution design. If you have any suggestions, please give me your feedback. When the initial coding and testing tasks are finished, the patch will be submitted.
1. Two typical scenarios
(1.1) Attach happens for the first time in enabled mode
In this case, methods which have already been entered and are on the runtime stack are not instrumented. When attach happens for the first time, we need to generate MethodEnter and MethodLeave events for these runtime-stack methods.
(1.2) Reattach after detach in enabled mode
The methods on the runtime stack when reattach happens can be divided into two categories: not instrumented, and instrumented before the detach.
For not instrumented methods, it is similar to (1.1): we need to generate MethodEnter and MethodLeave events for them.
For previously instrumented methods on the runtime stack, we need to handle the additional MethodLeave events reported by the instrumented methods themselves.
2. Shadow stack solution design description
(2.1) Data structure
Two stacks are used for each thread. One is a shadow stack to trace methods entered before attach (Stack_1); the other is the profiler's original method-call record for tracing instrumented methods after attach (Stack_2). (For simplicity, the description below covers one thread.)
(2.2) Algorithm
(a) Data:
Stack_1 - Shadow stack for tracing methods entered before attach.
Stack_2 - Stack for tracing instrumented methods after attach.
(b) Initial condition:
Stack_1 and Stack_2 are empty.
(c) Modified MethodEnter and MethodLeave events handlers for attach in enabled mode:
(c.1) MethodEnter:
If (Stack_2 is empty)
{
Get latest stack snapshot into temp_Stack_1 from JVM stack trace;
Compare Stack_1 and temp_Stack_1:
If (Methods exist in temp_Stack_1 but not exist in Stack_1)
{
Generate MethodEnter events for each such method according to the method call sequence;
}
If (Methods exist in Stack_1 but not exist in temp_Stack_1)
{
Generate MethodLeave events for each such method according to the method exit sequence;
}
Stack_1 = temp_Stack_1;
}
Push the new instrumented method onto Stack_2; its MethodEnter event will be generated automatically by the code instrumented after attach.
(c.2) MethodLeave:
if (Stack_2 is empty)
{
Discard the MethodLeave notification, because it was generated by code instrumented before the attach, for a method still on the runtime stack;
Return;
}
Pop the method from Stack_2; its MethodLeave event will be generated automatically by the code instrumented after attach.
(c.3) When VM shutdown or detach happens:
Pop and generate MethodLeave events for the methods left in Stack_1.
(2.3) Examples
----------------------------
MethodA()
{
MethodB();
MethodD();
}
MethodB()
{
// Attach happen here
MethodC();
}
----------------------------
Algorithm Execution Process (AEP):
AEP1: Attach happens. MethodA and MethodB are not instrumented and on the runtime stack. Stack_1 and Stack_2 are empty.
AEP2: Instrumented MethodC is called, MethodEnter handler:
Stack_1: Empty; temp_Stack_1(From bottom to top): MethodA->MethodB
Generate MethodEnter events for MethodA and MethodB:
=MethodA_Enter
==MethodB_Enter
Stack_1 = temp_Stack_1: MethodA->MethodB;
Push MethodC into Stack_2: MethodC, generate MethodC_Enter event:
===MethodC_Enter
AEP3: Instrumented MethodC exits, MethodLeave handler:
Stack_2: MethodC, not empty; generate MethodC_Leave event:
===MethodC_Leave
Pop MethodC, Stack_2 becomes empty.
AEP4: Not instrumented MethodB exits; no event is generated yet;
AEP5: Instrumented MethodD is called, MethodEnter handler:
Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
Generate MethodLeave event for MethodB:
==MethodB_Leave
Stack_1 = temp_Stack_1: MethodA
Push MethodD into Stack_2: MethodD, generate MethodD_Enter event:
==MethodD_Enter
AEP6: Instrumented MethodD exits, MethodLeave handler, similar to AEP3:
==MethodD_Leave
AEP7: On detach or VM shutdown, pop the methods left in Stack_1 and generate MethodLeave events for them:
=MethodA_Leave
Now Stack_1 and Stack_2 become empty.
Summary of the method enter and leave events above:
=MethodA_Enter
==MethodB_Enter
===MethodC_Enter
===MethodC_Leave
==MethodB_Leave
==MethodD_Enter
==MethodD_Leave
=MethodA_Leave
3. Shadow stack solution limitation
The call graph may not match the actual call process, because of inlined methods, etc.
Thanks,
Chengrui
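The algorithm above can be simulated in plain Java to check the event ordering of the (2.3) example. This is only a hedged illustration: the real agent is native code driven by JVMTI callbacks, the stack snapshot here is passed in explicitly instead of being read from the VM, and all class and method names are invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Rough simulation of the shadow-stack reconstruction described above.
public class ShadowStackSim {
    private List<String> shadow = new ArrayList<>();               // Stack_1, bottom to top
    private final Deque<String> instrumented = new ArrayDeque<>(); // Stack_2
    private final List<String> events = new ArrayList<>();

    // (c.1) MethodEnter for an instrumented method; snapshot is the current
    // JVM stack (bottom to top) below the entered method.
    public void methodEnter(String method, List<String> snapshot) {
        if (instrumented.isEmpty()) {
            // Find the common prefix of Stack_1 and the fresh snapshot.
            int common = 0;
            while (common < shadow.size() && common < snapshot.size()
                    && shadow.get(common).equals(snapshot.get(common))) {
                common++;
            }
            // Leave events for frames that disappeared, top-down.
            for (int i = shadow.size() - 1; i >= common; i--) {
                events.add(shadow.get(i) + "_Leave");
            }
            // Enter events for new frames, bottom-up.
            for (int i = common; i < snapshot.size(); i++) {
                events.add(snapshot.get(i) + "_Enter");
            }
            shadow = new ArrayList<>(snapshot);
        }
        instrumented.push(method);
        events.add(method + "_Enter");
    }

    // (c.2) MethodLeave reported by instrumented code.
    public void methodLeave() {
        if (instrumented.isEmpty()) {
            return; // stale leave from a pre-attach instrumented frame: discard
        }
        events.add(instrumented.pop() + "_Leave");
    }

    // (c.3) Detach / VM shutdown: unwind the shadow stack.
    public void detach() {
        for (int i = shadow.size() - 1; i >= 0; i--) {
            events.add(shadow.get(i) + "_Leave");
        }
        shadow.clear();
    }

    public List<String> events() { return events; }
}
```

Replaying the (2.3) example (attach inside MethodB, then C enters/leaves, then D enters/leaves, then detach) against this sketch reproduces the event summary listed above, including the delayed MethodB_Leave that Eugene's comment discusses.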
Chengrui,
Thanks first for your proposal. I have the following comments:
1. In (1.1), Def and Obj events should also be sent. The same applies to (1.2).
2. In (c.2) MethodLeave, a method that is not instrumented should also send out its event at this point, because an incorrect time will be recorded if there is code execution in MethodA between MethodB and MethodD, due to the delay of the MethodB_Leave event.
AEP4: Not instrumented MethodB exit, no event generate now;
AEP5: Instrumented MethodD is called, Method Enter handle:
Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
Generate MethodLeave event for MethodB:
==MethodB_Leave
Stack1 = temp_Stack_1: MethodA
In your example, MethodB_Leave is not sent until MethodD is called. But there could be code execution between the calls of MethodB and MethodD.
I think we are moving in the right direction but have to fine-tune the algorithm.
Thanks,
Eugene
(In reply to comment #44) Hi Eugene, Thanks for your suggestion. Comments for discussion are below:

> 1. In (1.1), Def and Obj events should also be sent. Similar applies to (1.2).

I will consider this point in the solution. Thanks.

> 2. In (c.2) MethodLeave, method not instrumented should also send out event at
> this point, because incorrect time will be sent if there is code execution in
> MethodA between MethodB and MethodD due to the delay in MethodB_Leave event.
>
> In your example, MethodB_Leave is not sent until MethodD is called. But there
> could be code execution between call of MethodB and MethodD.

For not instrumented methods already on the stack, we cannot get the exact exit time. But we can ensure the method call sequence is correct. If code execution happens between MethodB and MethodD, it can also be handled by the algorithm.

----------------
MethodA()
{
MethodB();
...
MethodX();
MethodD();
}
----------------

This case is similar to the original case: MethodB_Leave will not be sent until MethodX is called. We cannot be sure when MethodB exits exactly, but we can be sure the MethodB_Leave event happens before entering the following MethodX. Thanks, Chengrui

(In reply to comment #44)
> Chengrui,
> Thanks first for your proposal. I have the following comments:
> 1. In (1.1), Def and Obj events should also be sent. Similar applies to (1.2).

Thanks. Yes, MethodDef and ClassDef events should be sent out. In the current codebase, these two events are sent in MethodEnterHandler, so no extra work is needed for Def events. Eugene, what are Obj events?

> 2.
In (c.2) MethodLeave, method not instrumented should also send out event at
> this point, because incorrect time will be sent if there is code execution in
> MethodA between MethodB and MethodD due to the delay in MethodB_Leave event.
> In your example, MethodB_Leave is not sent until MethodD is called. But there
> could be code execution between call of MethodB and MethodD.

If the method called between B and D is an instrumented method, there should be no problem reporting the event, because in the method enter event we check the stack. If the method called between B and D is a non-instrumented method (for example, a function call in an optimized loop), yes, we may delay sending out the event, and the time will not be accurate. We cannot catch the function between B and D; we do not have a chance to break the execution to get the function name.

> I think we are in the right direction but have to fine-tune the algorithm.

Yes, there are also some corner cases we need to consider.

> Thanks,
> Eugene

(Shadow Stack Solution Algorithm Description Update)
2. Shadow stack solution design description
(2.1) Data structure
Two stacks are used for each thread. One is a shadow stack to trace methods entered before attach (Stack_1); the other is the profiler's original method-call record stack for tracing instrumented methods after attach (Stack_2). (For simplicity, the description below covers one thread.)
(2.2) Algorithm
(a) Data:
Stack_1 - Shadow stack for tracing methods entered before attach.
Stack_2 - Stack for tracing instrumented methods after attach.
(b) Initial condition:
Stack_1 is a snapshot of the VM stack taken when attach happens, and a MethodEnter event is sent for each method in it. Stack_2 is empty.
(c) Modified MethodEnter and MethodLeave events handlers for attach in enabled mode:
(c.1) MethodEnter:
If (Stack_2 is empty)
{
Get latest stack snapshot into temp_Stack_1 from JVM stack trace;
bool isShadowStackChanged = false;
Compare Stack_1 and temp_Stack_1:
{
If (methods exist in temp_Stack_1 but not in Stack_1)
{
isShadowStackChanged = true;
Generate MethodEnter events for each such method, in call order;
}
If (methods exist in Stack_1 but not in temp_Stack_1)
{
isShadowStackChanged = true;
Generate MethodLeave events for each such method, in exit order;
}
}
If (isShadowStackChanged) { Stack_1 = temp_Stack_1; }
}
Push the new instrumented method onto Stack_2; its MethodEnter event is generated automatically by the code instrumented after attach.
(c.2) MethodLeave:
if (Stack_2 is empty)
{
Discard the MethodLeave notification, because it was generated for a method that was already in the runtime stack before attach;
Return;
}
Pop the method from Stack_2; its MethodLeave event is generated automatically by the code instrumented after attach.
(c.3) When VM shutdown, detach, or thread exit happens:
Pop and generate MethodLeave events for the methods left in Stack_1.
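As a sketch of the (c.1) comparison step, the fragment below models both stacks as vectors of method names ordered bottom-to-top. The function name reconcileShadowStack and the "<name>_Enter"/"<name>_Leave" event strings are illustrative assumptions, not the actual TPTP/Martini data structures:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Compare the shadow stack (Stack_1) against a fresh JVM snapshot
// (temp_Stack_1) and return the MethodLeave/MethodEnter events to emit.
std::vector<std::string> reconcileShadowStack(
    std::vector<std::string>& shadow,            // Stack_1
    const std::vector<std::string>& snapshot) {  // temp_Stack_1
    std::vector<std::string> events;
    // Length of the common bottom segment of the two stacks.
    size_t common = 0;
    while (common < shadow.size() && common < snapshot.size() &&
           shadow[common] == snapshot[common]) ++common;
    // Methods in Stack_1 but not in temp_Stack_1: MethodLeave, exit order.
    for (size_t i = shadow.size(); i > common; --i)
        events.push_back(shadow[i - 1] + "_Leave");
    // Methods in temp_Stack_1 but not in Stack_1: MethodEnter, call order.
    for (size_t i = common; i < snapshot.size(); ++i)
        events.push_back(snapshot[i] + "_Enter");
    if (!events.empty()) shadow = snapshot;      // Stack_1 = temp_Stack_1
    return events;
}
```

For instance, at AEP5 in the example below, reconciling Stack_1 = MethodA->MethodB against the snapshot MethodA yields the single event MethodB_Leave.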
(2.3) Examples
----------------------------
MethodA()
{
MethodB();
MethodD();
}
MethodB()
{
// Attach happen here
MethodC();
}
----------------------------
Algorithm Execution Process (AEP):
AEP1: Attach happens. MethodA and MethodB are not instrumented and are in the runtime stack. Snapshot the VM stack trace into Stack_1.
Stack_1 (from bottom to top): MethodA->MethodB.
Stack_2 is empty.
Generate MethodEnter events (MethodDef and ClassDef events are generated automatically by the current TPTP code when MethodEnter is called):
=MethodA_Enter
==MethodB_Enter
AEP2: Instrumented MethodC is called, MethodEnter handler:
Stack_1: MethodA->MethodB; temp_Stack_1: MethodA->MethodB
Push MethodC into Stack_2: MethodC, generate MethodC_Enter event:
===MethodC_Enter
AEP3: Instrumented MethodC exits, MethodLeave handler:
Stack_2: MethodC, not empty; generate MethodC_Leave event:
===MethodC_Leave
Pop MethodC, Stack_2 becomes empty.
AEP4: Non-instrumented MethodB exits; no event is generated now.
AEP5: Instrumented MethodD is called, MethodEnter handler:
Stack_1: MethodA->MethodB, temp_Stack_1: MethodA
Generate MethodLeave event for MethodB:
==MethodB_Leave
Stack_1 = temp_Stack_1: MethodA
Push MethodD into Stack_2: MethodD, generate MethodD_Enter event:
==MethodD_Enter
AEP6: Instrumented MethodD exits, MethodLeave handler, similar to AEP3:
==MethodD_Leave
AEP7: On detach or VM shutdown, pop the methods left in Stack_1 and generate MethodLeave events for them:
=MethodA_Leave
Now Stack_1 and Stack_2 become empty.
Summary of the method enter and leave events above:
=MethodA_Enter
==MethodB_Enter
===MethodC_Enter
===MethodC_Leave
==MethodB_Leave
==MethodD_Enter
==MethodD_Leave
=MethodA_Leave
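The walk-through above can be replayed end to end with a small self-contained simulation. Method names are plain strings, jvmStack stands in for the JVMTI stack snapshot, and the Tracker type and its member names are invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Replays AEP1..AEP7 using the two-stack algorithm from section (2.2).
struct Tracker {
    std::vector<std::string> stack1;   // Stack_1: shadow stack
    std::vector<std::string> stack2;   // Stack_2: instrumented methods
    std::vector<std::string> events;   // emitted event log

    // AEP1 / initial condition: snapshot the VM stack, report MethodEnter.
    void attach(const std::vector<std::string>& jvmStack) {
        stack1 = jvmStack;
        for (const std::string& m : stack1) events.push_back(m + "_Enter");
    }
    // (c.1) MethodEnter for an instrumented method.
    void methodEnter(const std::string& m,
                     const std::vector<std::string>& jvmStack) {
        if (stack2.empty()) reconcile(jvmStack);
        stack2.push_back(m);
        events.push_back(m + "_Enter");
    }
    // (c.2) MethodLeave: discard if Stack_2 is empty (pre-attach frame).
    void methodLeave(const std::string& m) {
        if (stack2.empty()) return;
        stack2.pop_back();
        events.push_back(m + "_Leave");
    }
    // (c.3) Detach / VM shutdown: drain the shadow stack.
    void detach() {
        while (!stack1.empty()) {
            events.push_back(stack1.back() + "_Leave");
            stack1.pop_back();
        }
    }
private:
    // Compare Stack_1 with the fresh snapshot, emit Leave/Enter events.
    void reconcile(const std::vector<std::string>& jvmStack) {
        size_t common = 0;
        while (common < stack1.size() && common < jvmStack.size() &&
               stack1[common] == jvmStack[common]) ++common;
        for (size_t i = stack1.size(); i > common; --i)
            events.push_back(stack1[i - 1] + "_Leave");   // exit order
        for (size_t i = common; i < jvmStack.size(); ++i)
            events.push_back(jvmStack[i] + "_Enter");     // call order
        stack1 = jvmStack;
    }
};
```

Driving this with the snapshots from AEP1, AEP2, and AEP5, then detaching, reproduces exactly the eight-event summary listed above, including the deferred MethodB_Leave.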
(In reply to comment #46)
> Eugene, what's Obj events?

I meant MethodDef and ClassDef. Sorry for the confusion.

(In reply to comment #47)

Getting a stack dump from JVMTI in the context of "Method Enter" may be too costly. In some cases, Stack_2 may become empty many times (depending on the exact point of attach), and this will cause a large number of "stack dumps", which may be detrimental to performance.

Here's another direction worth checking (not a complete solution, there are probably some corner cases as well, needs prototyping):

Upon attach, suspend all threads and for each thread:
1. Get a stack dump.
2. Report "Method Enter" (and all related "def" events) for all functions currently on the stack.
3. Use the JVMTI "NotifyFramePop" function to mark all active frames. This will cause JVMTI to generate a "FramePop" event for these functions when they go out of the stack, and these events can easily be converted to "Method Leave" events.

The main difference with this approach is that it only does a stack dump once (upon attach). If the overhead of "NotifyFramePop" is not very high, it can work pretty well.

BTW - I verified that all supported JVMs support the "can_generate_frame_pop_events" JVMTI capability.

My 2 cents....

Asaf

(In reply to comment #49)
> (In reply to comment #47)
>
> Getting a stack dump from JVMTI in the context of "Method Enter" may be too
> costly. In some cases, Stack_2 may become empty many times (depending on the
> exact point of attach) and this will cause a large number of "stack dumps"
> which may be detrimental to performance.
>
> Here's another direction worth checking (not a complete solution, there are
> probably some corner cases as well, needs prototyping):
>
> Upon attach, suspend all threads and for each thread:
> 1. Get a stack dump
> 2. Report "Method Enter" (and all related "def" events) for all functions
> currently on stack.
> 3. Use the JVMTI "NotifyFramePop" function to mark all active frames.
> This will cause JVMTI to generate a "FramePop" event for these functions when
> they go out of stack, and these events can be easily converted to
> "Method Leave" events.
>
> The main difference with this approach is that it only does a stack dump once
> (upon attach). If the overhead of "NotifyFramePop" is not very high, it can
> work pretty well.
>
> BTW - I verified that all supported JVMs support the
> "can_generate_frame_pop_events" JVMTI capability.
>
> My 2 cents....
>
> Asaf

If "FramePop" is supported by the JVMs, "NotifyFramePop" will decrease the stack dump cost. I will try it. Asaf, thanks for your information.

Chengrui

Created attachment 133879 [details]
Shadow Stack Solution Implementation patch. This patch is used for TPTP 4.5.2.1
This patch is the shadow stack solution implementation. Three modules are modified: JPIAgent, CGProf, and BaseProf. The design and implementation documents will be submitted later.
NotifyFramePop is not applied, because it would require modifying not only the profilers' code but also the MPI interface and the Martini framework code. If the current implementation has efficiency problems, we can improve it later.
Dear committers, please help to review the patch; I will submit profiler binary files on Windows XP for testing soon.
Thanks,
Chengrui
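For comparison, the frame-pop direction proposed in comment #49 can be modeled without a JVM. In this sketch the marked set stands in for JVMTI's NotifyFramePop bookkeeping; FramePopModel and popFrame are invented names for illustration, not real JVMTI calls:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Conceptual model: on attach, do one stack dump, report MethodEnter for
// every frame and mark it (as NotifyFramePop would). When a marked frame
// later pops, the FramePop-style notification becomes a MethodLeave event,
// so no further stack dumps are needed.
class FramePopModel {
public:
    explicit FramePopModel(const std::vector<std::string>& stackAtAttach)
        : frames(stackAtAttach) {
        for (size_t depth = 0; depth < frames.size(); ++depth) {
            events.push_back(frames[depth] + "_Enter");
            marked.insert(depth);   // stands in for NotifyFramePop(thread, depth)
        }
    }
    // A frame leaves the stack; JVMTI delivers FramePop only for frames
    // previously marked, which maps directly to a MethodLeave event.
    void popFrame() {
        size_t depth = frames.size() - 1;
        if (marked.erase(depth)) {
            events.push_back(frames.back() + "_Leave");  // FramePop -> MethodLeave
        }
        frames.pop_back();
    }
    std::vector<std::string> events;
private:
    std::vector<std::string> frames;
    std::set<size_t> marked;
};
```

The appeal of this bookkeeping is visible in the model: the stack is inspected exactly once, at construction, and every later leave event costs only a set lookup.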
Created attachment 133881 [details]
Profilers binary files for TPTP 4.5.2.1 testing usage
Please test StartStop.java with attached profilers binary files.
(In reply to comment #52)
> Created an attachment (id=133881) [details]
> Profilers binary files for TPTP 4.5.2.1 testing usage
>
> Please test StartStop.java with attached profilers binary files.

Hi Chengrui,

I tried the patch in the 4521 build with IBM JVM 1.5 SR9. My application (PerformanceExample) crashed when I performed the reattach action. There was no crash report created. It seems to work for some other applications (e.g. StartStop), but I see additional profiling data in the collection post-reattach, i.e. data under the packages java.io*, sun.nio.cs*, org.eclipse.tptp.martini*.

Would you please take a look at the problem? Thanks.

(In reply to comment #53)
> Hi Chengrui,
>
> I tried the patch in 4521 build with IBM JVM 1.5SR9.
> My application (PerformanceExample) crashed when I performed the reattach
> action. There was no crash report created.
> It seems to work for some other application (eg. StartStop), but I see
> additional profiling data in the collection post reattach. ie. data under
> packages java.io*, sun.nio.cs*, org.eclipse.tptp.martini*
>
> Would you please take a look at the problem? Thanks.

I will investigate the crash problem. It is also reproduced on my machine with Sun's JVM. It should be related to the shadow stack MethodEnter tracking process. The additional profiling data problem will be handled too. If there is any update, I will report it.

Eugene, thanks for your help.

Regards,
Chengrui

(In reply to comment #53)
> Hi Chengrui,
>
> I tried the patch in 4521 build with IBM JVM 1.5SR9.
> My application (PerformanceExample) crashed when I performed the reattach
> action. There was no crash report created.

The failure on PerformanceExample is caused by:
1) A race condition when tracking MethodEnter;
2) A call to the original function CProfEnv::GetStackTrace(...). In GetStackTrace,
SMethodInfo* methodInfo = GetMethodData(stackTrace->pStackEntries[i].methodId);
The pointer methodInfo is then used without checking whether it is null, although GetMethodData may return either a valid pointer or NULL.

The crash on PerformanceExample has been fixed now.

> It seems to work for some other application (eg. StartStop), but I see
> additional profiling data in the collection post reattach. ie. data under
> packages java.io*, sun.nio.cs*, org.eclipse.tptp.martini*
>
> Would you please take a look at the problem? Thanks.

It seems some classes are not instrumented in the profiling process (CCGAdaptor::CanInstrumentClass), but when we get the stack trace, methods belonging to these classes are in the runtime stack. If we should not report these methods, filter rules similar to those in CCGAdaptor::CanInstrumentClass will be needed in the tracking process. I am working on this problem now. The profiler binaries for testing will be provided soon.

Thanks,
Chengrui

Created attachment 135756 [details]
Shadow stack solution patch to solve reattach and attach problem with CGProf

This patch is used to solve the re-attach problem caused by a JVMTI limitation in TPTP 4.5.2.1. The corresponding patch for TPTP 4.6.0 has been checked into CVS (please refer to comment 17 of bug 270767). Additional patch code to solve the HeapProf crash problem (bug 275321) is also included, compared with the patch for TPTP 4.6.0.

Thanks,
Chengrui

Created attachment 136422 [details]
Patch - 4521 - Update Shadow stack solution to solve reattach and attach problem with CGProf

Since the HeapProf crash problem has been fixed in a separate bug 276921 for TPTP 4.5.2.1, I modified the patch so that it only includes the shadow-stack-solution code.
The shadow stack design and implementation document has been submitted to Bugzilla in bug 270767.

Requesting approval to include the following defect in the TPTP 4.5.2.1 branch: After re-attach, no more data is collected for methods already entered. (https://bugs.eclipse.org/bugs/show_bug.cgi?id=194081)

1. Explain why you believe this is a stop-ship defect. How does the defect manifest itself, and how will users of TPTP / consuming products be affected if the defect is not fixed?
After re-attach, no more data is collected for methods already entered, for CGProf with the execDetails option in enabled mode. This is caused by a limitation of the JVMTI instrumentation profiling method. Without the fix, attach to CGProf with the execDetails option in enabled mode is unusable.

2. Is there a work-around? If so, why do you believe the work-around is insufficient?
No.

3. Is this a regression or API breakage? Explain.
No.

4. Does this require new API?
Yes.

5. Who performed the code review?
Jonathan.

6. Is there a test case attached to the bugzilla record?
No. Since the test cases' path in CVS has been recorded, we can get these test cases at any time.

7. What is the nature of the fix? What is the scope of the fix? What is the risk associated with this fix?
The fix is a complex solution: it uses a stack dump mechanism and a newly designed shadow stack algorithm. About 500 lines of code have been added. The fix only affects CGProf profiling with execDetails in enabled mode. Risk is medium.

8. Is this fix related to any standards that TPTP adheres to? If so, who has validated that the fix continues to adhere to the standard?
No.

Since we are running out of time for TPTP 4.5.2.1 Milestone 2, I'm retargeting this to Milestone 3.

Comment on attachment 136422 [details]
Patch - 4521 - Update Shadow stack solution to solve reattach and attach problem with CGProf

Since a regression related to this submitted patch was found in TPTP 4.6.0, a new updated patch will be needed for bug 194081.
Please refer to comment 6 of bug 277431 for details. The updated patch for TPTP 4.5.2.1 Milestone 3 will be submitted later.

Thanks,
Chengrui

Please submit an updated patch to back-port this fix from TPTP 4.6 to TPTP 4.5.2.1. We are opening up the 4.5.2.1 stream now for check-ins. Please also fill in the PMC approval template so that I could bring it forward for PMC approval. Thanks!

Created attachment 139720 [details]
Reattach problem patch for TPTP 4.5.2.1

This patch is the counterpart for the reattach problem in TPTP 4.5.2.1, which has already been fixed in TPTP 4.6.0. The submitted patch actually merges the patch code for bug 270767, bug 277431, and bug 278437 from TPTP 4.6.0. Dear committers, please help to review the patch code.

Thanks,
Chengrui

The PMC approval template will be filled in after the submitted patch is reviewed.

Jonathan, could you please take a look at the combined patch for TPTP 4.5.2.1 as well?

Patch looks good.

In the interest of time, Jonathan, could you please post a PMC approval template to this defect so that we could get PMC approval and check in the code today?

On behalf of Chengrui, I would like to request that the following defect be considered for PMC approval for 4.5.2.1:

1. Explain why you believe this is a stop-ship defect. How does the defect manifest itself, and how will users of TPTP / consuming products be affected if the defect is not fixed?
After re-attach, no more data is collected for methods already entered. This is an important profiler requirement.

2. Is there a work-around? If so, why do you believe the work-around is insufficient?
No.

3. Is this a regression or API breakage? Explain.
Regression from the behaviour of the previous profiler.

4. Does this require new API?
No.

5. Who performed the code review?
Myself, for Chengrui.

6. Is there a test case attached to the bugzilla record?
Part of the standard JVMTI tests.

7. What is the nature of the fix? What is the risk associated with this fix?
"The fix is a complex solution: it uses a stack dump mechanism and a newly designed shadow stack algorithm. About 500 lines of code have been added. The fix only affects CGProf profiling with execDetails in enabled mode. Risk is medium." The fix has already been tested in 4.6.0.

8. Is this fix related to any standards that TPTP adheres to? If so, who has validated that the fix continues to adhere to the standard?
No.

We have known for a while that IBM would want this in 4.5.2.1. If we are happy with the solution in 4.6.0, I see no problems with the 4.5.2.1 backpatch.

Putting the list of PMC members in the PMC_approved flag so that they are notified of this PMC approval request for TPTP 4.5.2.1. Note that Chris has already provided his PMC approval earlier.

Patch checked into 4521 with PMC approval.

Using the tptp.sdk-TPTP-4.5.2.1-200906241706 build, the application crashes after re-attach.

Chengrui and Yunan, please take a look to see why the patch in the TPTP-4.5.2.1-200906241706 build did not fix the problem it was intended to fix.

(In reply to comment #73)
> Using tptp.sdk-TPTP-4.5.2.1-200906241706 build,
> the application crash after re-attach.

Hi Eugene,

I tested the tptp.sdk-TPTP-4.5.2.1-200906241706 build with the test cases StartStop, PerformanceExample, InOut, StatisticsTest, and ArgumentsTest. No crash happens on my machine. Please try again. If the problem is still there, please tell me which test case crashes in your testing.

Thx,
Chengrui

The problem is reproducible with PerformanceExample on the same build (tptp.sdk-TPTP-4.5.2.1-200906241706.zip) using IBM JRE 1.5 (there is an IBM JRE shipped in this IBM Download Manager product that you can use for debugging: http://www-01.ibm.com/support/docview.wss?rs=3352&uid=swg24021016). I do not see the problem with StartStop, and I do not have StatisticsTest and ArgumentsTest for further testing.
The problem with PerformanceExample does not happen with Sun JRE. Attaching screen capture. Created attachment 140395 [details]
screen capture of failure
Created attachment 140396 [details]
screen capture of failure
Created attachment 140397 [details]
Java crash dialog detail
Eugene, could you please see if a similar problem exists in the TPTP 4.6 driver?

(In reply to comment #80)
> Eugene, could you please see if similar problem exist in the TPTP 4.6 driver?

The problem is reproducible in the TPTP 4.6 driver.

I have installed the IBM JVM. At first glance, it seems the problem is caused by a difference between IBM's and Sun's JVMs. I will do some investigation.

Thanks,
Chengrui

It was found that the crash with PerformanceExample on the IBM JVM is caused by a null pointer access in a self-defined list data structure in the Martini profiler part (CThreadSet). Besides this problem, the hard-coded filter mechanism for the CGProf reattach tracker can be replaced by the external module (i.e. JPIAgent) filter. I am preparing the patch code. After coding and testing are finished, the incremental patch will be submitted soon.

Thanks,
Chengrui

Resolving this bug, as the patch fixing the Sun JRE problem has been checked into TPTP-4_5_2_1. Bug 282531 has been opened to track the remaining problem specific to the IBM JRE.

Verified in consuming product.