Bug 108646

Summary: Develop an Aggregating Agent for execution trace
Product: z_Archived Reporter: amehrega
Component: TPTP Assignee: Bob Duncan <duncan>
Status: CLOSED FIXED QA Contact:
Severity: enhancement    
Priority: P1 CC: anandik, andrea.aime, apratt
Version: unspecified Keywords: plan
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
URL: http://www.eclipse.org/tptp/groups/Architecture/documents/features/hf_108646.html
Whiteboard: closed460
Bug Depends on:    
Bug Blocks: 108938, 134137    

Description amehrega CLA 2005-09-01 19:16:52 EDT
This enhancement is opened in response to a message posted on the newsgroup:
-----------------------------------------------------
“More strongly, I would argue that the current TPTP profiling method that 
collects complete execution history is useless for the following 
reason:  Collecting data about each method call in a new data record 
each time increases the runtime of the profiled application by at least 
a factor of 5.  This means that when your application runs under the 
profiler, over 80% of the running time is not application runtime, but 
profiling overhead.  This means that your profiling data essentially 
profiles the profiler itself and not the application to be profiled.

The profiling overhead is essentially proportional to the number of 
profiled method calls.  This is not at all proportional to the true 
running time of the methods; especially when the methods cause external 
communication like database access.

From my experience (over two years profiling and optimizing 
applications in C++ and Java), I would say that the profiling overhead 
must never exceed 100% to obtain a meaningful profile, i.e. the runtime 
under the profiler must not be more than 2x the runtime without 
profiler. Overhead of 10% to 50% is even better, as then the actual 
running time dominates the profiler overhead. Collecting aggregate 
statistics helps a lot, because per profiled method call only a few 
system calls plus a few arithmetic operations are needed, rather than 
the generation and storage of a new data record for the complete 
execution history (my guess is that this can reduce the profiling 
overhead by a factor of 10 to 100, thus making it much easier to reach 
the goal of low profiling overhead compared to the normal running time 
of the application).

I have found the Hyades profiler 3.0.0 useless for this reason, and 
currently work with the Eclipse Profiler 
(http://eclipsecolorer.sourceforge.net/index_profiler.html), which does 
exactly the aggregate data collection that Pratt describes, which works 
very well for me too.

Note that there are several bug reports related to the extraordinary 
memory overhead of complete execution history, like bugs 56645, 75266, 
and 88917, and the following trivial example numbers make it clear why 
approaches that improve only linearly, like improved storage format or 
database storage of execution history will not solve the problem:  When 
method A is called 1000 times and calls method B 1000 times each, the 
complete execution history will contain over a million call records (and 
these numbers are still small!).  The aggregate statistics only show 2 
records, namely

    Method A:      1000 calls, total real time: ... total CPU time: ...
    Method B: 1000000 calls, total real time: ... total CPU time: ...

There is no way the million-fold extra data storage overhead of the 
complete execution history can be compensated for by improved data 
storage technology; the only viable solution to reduce both storage and 
runtime overhead of profiling is to collect aggregate statistics.

At the moment, I can only recommend the Eclipse Profiler for serious 
performance work, although it seems to be unmaintained and can cause 
deadlocks on application server startup occasionally.

Regards,

Oliver Schoett”

--------------------------------------------------------------

This is a convincing argument.  Many users are dissatisfied with TPTP’s
profiler because of its scalability issues on the workbench side.  The
loaders simply take a long time to process the events that are received from
the Java agent profiler.  Collecting aggregate statistics can significantly
reduce the amount of memory and processing required on the client side.
I propose creating a new profiling set called “Aggregate Result” that will only
increment counters based on the events that it receives from the workbench.  The
data model and the loaders will be significantly lighter than what is currently
present in TPTP.  This will also help improve the scalability of the product.
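
As an illustration only, the counter-based idea could be sketched as follows in Java. The class and method names (AggregateCollector, MethodStats, onMethodExit) are hypothetical and are not part of TPTP; the sketch assumes some agent reports each method exit together with its elapsed time. It replays the example from the quoted message (method A called 1000 times, each call invoking method B 1000 times) and keeps one record per method instead of one per call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical aggregating collector: one record per method, not per call. */
public class AggregateCollector {
    /** Per-method totals; the field names are illustrative, not TPTP's model. */
    public static final class MethodStats {
        long calls;
        long totalNanos;
    }

    private final Map<String, MethodStats> stats = new HashMap<>();

    /** Called on method exit with the elapsed time of that one invocation. */
    public void onMethodExit(String method, long elapsedNanos) {
        MethodStats s = stats.computeIfAbsent(method, k -> new MethodStats());
        s.calls++;                    // only a counter increment per call
        s.totalNanos += elapsedNanos; // and one addition
    }

    /** Number of recorded calls for a method (0 if never seen). */
    public long calls(String method) {
        MethodStats s = stats.get(method);
        return s == null ? 0 : s.calls;
    }

    /** Number of stored records, i.e. distinct methods seen so far. */
    public int recordCount() {
        return stats.size();
    }

    public static void main(String[] args) {
        AggregateCollector c = new AggregateCollector();
        // Simulated events: A runs 1000 times, each run calls B 1000 times.
        for (int a = 0; a < 1000; a++) {
            for (int b = 0; b < 1000; b++) {
                c.onMethodExit("B", 1);
            }
            c.onMethodExit("A", 1000);
        }
        System.out.println("records=" + c.recordCount()
                + " A=" + c.calls("A") + " B=" + c.calls("B"));
        // prints: records=2 A=1000 B=1000000
    }
}
```

The stored data grows with the number of distinct methods rather than with the number of calls, which is the property the quoted message argues for.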
Comment 1 Marius Slavescu CLA 2005-09-07 12:15:04 EDT
This will cover the first item in Bug 108938.
Comment 2 Anandi Krishnamurthy CLA 2005-10-18 16:12:47 EDT
Bug 108890 is also about aggregating data collection.
Comment 3 amehrega CLA 2005-11-02 21:25:34 EST
Marius,

I think this bug should be declared as being dependent on 108890.
Comment 4 Valentina Popescu CLA 2005-11-10 11:51:03 EST
Theme: Scaling Up
Comment 5 Allan Pratt CLA 2005-11-30 15:23:31 EST
I believe that this bug and bug 108890 are identical in their intent: collecting and storing aggregated data instead of streaming a complete execution trace over the wire.

I see that the draft 4.2 feature list includes this bug (that is, 108646); because of that, I would say that bug 108890 should be closed as a duplicate of this one.

In addition, if this is the bug that survives, then I think its Component should be changed from "Platform.model" to "Platform.agents." I say that because this isn't a model issue: the model can already hold aggregated data, using the "TRCAggregatedMethodInvocation" data structure. This enhancement request isn't about the model, it's about adding the capabilities in the agent to collect and transmit the aggregated data, and in the loader to receive this data and populate the model.

Comment 6 Bob Duncan CLA 2005-12-08 23:38:28 EST
Bug 108890 has been closed as a dup of this defect and this defect has been transferred to the Platform.Agent.JVMPI component (as per Allan's comments). Allan's description doc has been transferred from 108890 to this defect. 
Comment 7 Bob Duncan CLA 2005-12-08 23:39:45 EST
*** Bug 108890 has been marked as a duplicate of this bug. ***
Comment 8 Andrea Aime CLA 2005-12-20 03:48:33 EST
This is a very important improvement indeed. The simple hprof fares much better in this respect, and stays light by only gathering information on a statistical basis (taking regular snapshots of the stack instead of processing all method entries/exits). See http://java.sun.com/developer/technicalArticles/Programming/HPROF.html, cpu=samples setting.
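
For illustration, the sampling approach can be sketched in a few lines of Java. This is not how hprof is implemented; StackSampler and its methods are hypothetical names, and the only real API assumed is the standard Thread.getStackTrace(). Each snapshot credits the method on top of the stack, so the cost depends on the sampling rate rather than on the number of method calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sampling collector: counts which method is on top of the stack. */
public class StackSampler {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Records one snapshot; an empty stack (e.g. a finished thread) is ignored. */
    public void record(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return;
        }
        StackTraceElement top = stack[0]; // index 0 is the most recent frame
        hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
    }

    /** Takes a snapshot of a live thread using the standard Thread API. */
    public void sample(Thread t) {
        record(t.getStackTrace());
    }

    /** Number of snapshots in which the given method was on top of the stack. */
    public int hits(String method) {
        return hits.getOrDefault(method, 0);
    }
}
```

A caller would invoke sample(thread) from a timer at a fixed interval; methods that run longer show up in proportionally more snapshots.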
Comment 9 Bob Duncan CLA 2006-02-02 12:26:49 EST
Retargeted as per yesterday's Platform call.
Comment 10 Bob Duncan CLA 2006-03-29 10:49:43 EST
Account for work so far: 60h+
Comment 11 Bob Duncan CLA 2006-04-05 12:28:41 EDT
Code has been checked-in and unit tested (including piAgent, loader, and UI) since April 2 EOD. Full test pass underway. Defects are being opened/tracked as per usual.
Comment 12 Paul Slauenwhite CLA 2009-06-30 13:21:41 EDT
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.