Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 333114 - [LTTng] Incorrect statistics for processes in statistic view
Summary: [LTTng] Incorrect statistics for processes in statistic view
Status: CLOSED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: LinuxTools (show other bugs)
Version: unspecified   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Bernd Hufmann CLA
QA Contact: Francois Chouinard CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-22 13:13 EST by Bernd Hufmann CLA
Modified: 2022-01-13 14:53 EST (History)
1 user (show)

See Also:
fchouinard: review+


Attachments
example trace file (56.58 KB, application/x-bzip)
2010-12-22 13:13 EST, Bernd Hufmann CLA
jjohnstn: iplog-
Details
Proposed patch for bug (2.85 KB, patch)
2011-01-11 16:34 EST, Bernd Hufmann CLA
bernd.hufmann: iplog-
Details | Diff
Fix for incorrect statistics (8.31 KB, patch)
2011-01-12 15:46 EST, Bernd Hufmann CLA
jjohnstn: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Bernd Hufmann CLA 2010-12-22 13:13:52 EST
Created attachment 185728 [details]
example trace file

When executing a program in Linux from a command shell (e.g. bash), a new process is forked that has a new process ID (PID) and has the process name of the parent shell. The name is changed from bash to the program name only when fs.exec is executed. These 2 processes refer to same process. However, the Statistics View  shows distinctive statistics for both processes because the relevant statistics are counted based on the process name. The statistics for this case should be merged (as it is done in LTTV).

I attached a trace with such a case. The relevant program is called ./open-write and the PID is 570. The time when the fork happens is 205.868239395 seconds and the fs.exec is done at 205.868588049. In the statistic view under Processes, there are 2 entries with PID 570, one with the name "./open-write - 570" and one with the name "bash - 570".
Comment 1 Bernd Hufmann CLA 2011-01-11 16:34:08 EST
Created attachment 186569 [details]
Proposed patch for bug

To merge statistic entries with same process ID (PID) and start time would degrade the performance. I did a prototype and it was ~20% slower. Since performance is a key objective, I propose to not merge these two entries. The proposed patch will display the start time of each process and sort the entries by PID so that these 2 entries are under each other in the statistic tree. 

To have 2 entries for the reported case has also the advantage that the  statistics before and after the fs.exec() call is available.
Comment 2 Bernd Hufmann CLA 2011-01-11 16:35:30 EST
Hi Francois
Could you please review the patch before I commit it?
Thank you very much!
Comment 3 Francois Chouinard CLA 2011-01-11 18:06:57 EST
Hi Bernd,

I had a look at your solution and I have a few comments (this is only for the sake of discussion).

First, I understand why you populate the processes sub-tree entries with [pid + name + birth time] but I find it looks a bit heavy (and I wonder if Steve would have done it like this :-)

Without completely re-doing the underlying Statistics model, I wonder if it would be possible to use [pid + birth time] instead of [pid + name + birth time] as the key to store counters and then have an extra map from [pid + birth time] to [name]. This way, we could 1) correctly update the stats counters and 2) use the new map to populate the processes sub-tree with [name] only (or [name + pid] which would be event better).

At fork() time, the new map would be populated with the parent's name (same as what we have now) and would subsequently be overwritten by the child's name after the fs.exec().

The tricky part would be detect when to update the name but it should be possible to do it efficiently.

What do you think?
Comment 4 Bernd Hufmann CLA 2011-01-12 15:43:41 EST
(In reply to comment #3)
> Hi Bernd,
> 
> I had a look at your solution and I have a few comments (this is only for the
> sake of discussion).
> 
> First, I understand why you populate the processes sub-tree entries with [pid +
> name + birth time] but I find it looks a bit heavy (and I wonder if Steve would
> have done it like this :-)
> 
> Without completely re-doing the underlying Statistics model, I wonder if it
> would be possible to use [pid + birth time] instead of [pid + name + birth
> time] as the key to store counters and then have an extra map from [pid + birth
> time] to [name]. This way, we could 1) correctly update the stats counters and
> 2) use the new map to populate the processes sub-tree with [name] only (or
> [name + pid] which would be event better).
> 
> At fork() time, the new map would be populated with the parent's name (same as
> what we have now) and would subsequently be overwritten by the child's name
> after the fs.exec().
> 
> The tricky part would be detect when to update the name but it should be
> possible to do it efficiently.
> 
> What do you think?

Hi Francois

thank you very much for your comments.

I agree that in this case the correct way to create the key is [pid + birth time]. Actually, we also need to used the cpu ID to be able to distinguish the case of PID = 0, as we do it in the CFV. This as a side node.

The performance degradation was caused, because I had to create the process String to display every time the relevant statistic node for the process is updated. This is very costly. To use a Map as you suggested won't help much, because we don't know when the name changes. 

However, your comment made me think about it more. The key to solve this problem without degrading the performance is that we need to know when the name of the process changes. If we know that, then we create the String to be displayed and name of the statistic only if it's necessary. Luckily, when updating the process statistic, we have the LTTngEvent available which contains the event type (= marker name). We can check for the marker "exec" which is the time when the name will change. Then we create the new name String and store it the the statistic node.

I going to provide a patch for that.
Comment 5 Bernd Hufmann CLA 2011-01-12 15:46:42 EST
Created attachment 186676 [details]
Fix for incorrect statistics

This patch contains a fix for the reported problem. The implementation is as described in comment #4.
Comment 6 Francois Chouinard CLA 2011-01-12 15:53:24 EST
Cool!
Comment 7 Bernd Hufmann CLA 2011-01-12 16:35:13 EST
Comment on attachment 186676 [details]
Fix for incorrect statistics

Patch committed
Comment 8 Bernd Hufmann CLA 2011-01-12 16:36:54 EST
Bug fixed with committed patch of attachment 186676 [details]
Comment 9 Francois Chouinard CLA 2011-07-22 14:57:19 EDT
Delivered with 0.8