Community
Participate
Working Groups
Created attachment 185728 [details] example trace file When executing a program in Linux from a command shell (e.g. bash), a new process is forked that has a new process ID (PID) and has the process name of the parent shell. The name is changed from bash to the program name only when fs.exec is executed. These 2 processes refer to same process. However, the Statistics View shows distinctive statistics for both processes because the relevant statistics are counted based on the process name. The statistics for this case should be merged (as it is done in LTTV). I attached a trace with such a case. The relevant program is called ./open-write and the PID is 570. The time when the fork happens is 205.868239395 seconds and the fs.exec is done at 205.868588049. In the statistic view under Processes, there are 2 entries with PID 570, one with the name "./open-write - 570" and one with the name "bash - 570".
Created attachment 186569 [details] Proposed patch for bug To merge statistic entries with same process ID (PID) and start time would degrade the performance. I did a prototype and it was ~20% slower. Since performance is a key objective, I propose to not merge these two entries. The proposed patch will display the start time of each process and sort the entries by PID so that these 2 entries are under each other in the statistic tree. To have 2 entries for the reported case has also the advantage that the statistics before and after the fs.exec() call is available.
Hi Francois Could you please review the patch before I commit it? Thank you very much!
Hi Bernd, I had a look at your solution and I have a few comments (this is only for the sake of discussion). First, I understand why you populate the processes sub-tree entries with [pid + name + birth time] but I find it looks a bit heavy (and I wonder if Steve would have done it like this :-) Without completely re-doing the underlying Statistics model, I wonder if it would be possible to use [pid + birth time] instead of [pid + name + birth time] as the key to store counters and then have an extra map from [pid + birth time] to [name]. This way, we could 1) correctly update the stats counters and 2) use the new map to populate the processes sub-tree with [name] only (or [name + pid] which would be event better). At fork() time, the new map would be populated with the parent's name (same as what we have now) and would subsequently be overwritten by the child's name after the fs.exec(). The tricky part would be detect when to update the name but it should be possible to do it efficiently. What do you think?
(In reply to comment #3) > Hi Bernd, > > I had a look at your solution and I have a few comments (this is only for the > sake of discussion). > > First, I understand why you populate the processes sub-tree entries with [pid + > name + birth time] but I find it looks a bit heavy (and I wonder if Steve would > have done it like this :-) > > Without completely re-doing the underlying Statistics model, I wonder if it > would be possible to use [pid + birth time] instead of [pid + name + birth > time] as the key to store counters and then have an extra map from [pid + birth > time] to [name]. This way, we could 1) correctly update the stats counters and > 2) use the new map to populate the processes sub-tree with [name] only (or > [name + pid] which would be event better). > > At fork() time, the new map would be populated with the parent's name (same as > what we have now) and would subsequently be overwritten by the child's name > after the fs.exec(). > > The tricky part would be detect when to update the name but it should be > possible to do it efficiently. > > What do you think? Hi Francois thank you very much for your comments. I agree that in this case the correct way to create the key is [pid + birth time]. Actually, we also need to used the cpu ID to be able to distinguish the case of PID = 0, as we do it in the CFV. This as a side node. The performance degradation was caused, because I had to create the process String to display every time the relevant statistic node for the process is updated. This is very costly. To use a Map as you suggested won't help much, because we don't know when the name changes. However, your comment made me think about it more. The key to solve this problem without degrading the performance is that we need to know when the name of the process changes. If we know that, then we create the String to be displayed and name of the statistic only if it's necessary. Luckily, when updating the process statistic, we have the LTTngEvent available which contains the event type (= marker name). We can check for the marker "exec" which is the time when the name will change. Then we create the new name String and store it the the statistic node. I going to provide a patch for that.
Created attachment 186676 [details] Fix for incorrect statistics This patch contains a fix for the reported problem. The implementation is as described in comment #4.
Cool!
Comment on attachment 186676 [details] Fix for incorrect statistics Patch committed
Bug fixed with committed patch of attachment 186676 [details]
Delivered with 0.8