Bug 245314 - Populating variables view with logical structures is really slow
Summary: Populating variables view with logical structures is really slow
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Debug
Version: 3.4
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.4.2
Assignee: Samantha Chan CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-26 18:08 EDT by Samantha Chan CLA
Modified: 2009-01-09 16:57 EST
CC List: 5 users

See Also:
Flags: philippe_mulet: pmc_approved+


Attachments
screen cap (286.84 KB, image/jpeg)
2008-08-26 18:09 EDT, Samantha Chan CLA
Patch for DebugContextService. (3.41 KB, patch)
2008-10-07 00:24 EDT, Pawel Piech CLA
patch (13.44 KB, patch)
2008-12-17 13:08 EST, Darin Wright CLA
patch (17.35 KB, patch)
2008-12-17 13:58 EST, Darin Wright CLA
copyright update (20.65 KB, patch)
2009-01-05 11:59 EST, Darin Wright CLA

Description Samantha Chan CLA 2008-08-26 18:08:26 EDT
We have our custom logical structures in our product.  Our logical structures make use of evaluations to populate the variables view.  When creating the children of the logical structures, we call:

JDIThread.queueRunnable(runnable) to queue our evaluation runnable.

In the runnable, we call JDIThread.runEvaluation(...) to perform the evaluation.
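
Roughly, the pattern looks like this (a simplified sketch, not our actual product code; the class name and the runEvaluation arguments are illustrative, since the real arguments are elided above):

// Simplified sketch of the pattern described above: queue a runnable on the
// Java thread, then run an implicit evaluation from inside it. Written against
// the public IJavaThread API that JDIThread implements.
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.debug.core.DebugEvent;
import org.eclipse.debug.core.DebugException;
import org.eclipse.jdt.debug.core.IEvaluationRunnable;
import org.eclipse.jdt.debug.core.IJavaThread;

public class LogicalStructureChildrenEvaluator {

    public void computeChildren(final IJavaThread thread) {
        // Queue the work so it runs in the thread's evaluation queue.
        thread.queueRunnable(new Runnable() {
            public void run() {
                IEvaluationRunnable evaluation = new IEvaluationRunnable() {
                    public void run(IJavaThread t, IProgressMonitor monitor) throws DebugException {
                        // ... compute the logical children here ...
                    }
                };
                try {
                    // Fires RESUME/SUSPEND debug events with the
                    // EVALUATION_IMPLICIT detail for each evaluation.
                    thread.runEvaluation(evaluation, null, DebugEvent.EVALUATION_IMPLICIT, false);
                } catch (DebugException e) {
                    // log or ignore the failed evaluation
                }
            }
        });
    }
}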

Steps to reproduce problem:
1.  Turn on logical structures
2.  Start testcase that enables our logical structure
3.  Step a few times in testcase, and expand a few variables that use the logical structure
4.  Switch to breakpoints view
5.  Step once to clear any possible caching, and wait a little for all jobs and other activity to settle down.
6.  Bring the variables view to the top.

We found that it takes a long time for the variables view to populate if logical structures are enabled and our logical structure is in play.

We used Yourkit to profile and found the following:
1.  Of all the time spent populating the variables, half is spent scheduling jobs.  Furthermore, the majority of that time is in the startJob and endJob methods.  <See attached screen capture>
2.  The Worker.run method is invoked 19 times.  However, JobManager.findBlockingJob is called 9,063 times and InternalJob.isConflictingJob is called 51,000 times!
3.  I put breakpoints and tracing in the JobManager.findBlockingJob method.  I found that when the JobManager is trying to find a blocking job, 90% of the time it is looking for a blocking job for the "Check Debug Command" job, which is the job used to enable/disable debug actions.

Are we leaking Check Debug Command jobs in the Job Manager, and is that why we are trying to find conflicting jobs for it so many times?

Could our extra evaluations have caused more Check Debug Command jobs to be scheduled and run?  I didn't think that an evaluation would cause debug actions to update.
Comment 1 Samantha Chan CLA 2008-08-26 18:09:08 EDT
Created attachment 111012 [details]
screen cap
Comment 2 Pawel Piech CLA 2008-08-26 18:26:59 EDT
Samantha,
Can you turn on model proxy traces to see how often the Debug view content is refreshed?  
Comment 3 Samantha Chan CLA 2008-10-03 16:27:22 EDT
I haven't had a chance to turn on the trace.  However, is it possible to target this for 3.4.2?
Comment 4 Pawel Piech CLA 2008-10-06 23:38:17 EDT
From our discussion a week ago, I got the impression that the problem was that the evaluations resulted in lots of "CHANGED" debug events, which in turn would result in deltas with the CONTENT flag.  I think this performance degradation is a side effect of fixing bug 224477, which introduced updating the actions when a parent of the active context generates a delta with the CONTENT flag.

Since it is probably not possible to avoid issuing the content delta (doing so would introduce other bugs), the best way to handle this is to try to coalesce the events somewhere in the chain between where the CHANGED debug event is issued and where the job is processed.  The possible places where the coalescing logic could be added are:

1) Where the changed event is issued.
2) In the model proxy, where the changed event is converted into a delta.
3) In DebugCommandService, where the various command handlers are called as a result of the delta.
4) In the DebugCommand implementation itself.

At first glance, I think the easiest place to implement this would be in the command service.  There, the requests to the command handlers are created but never tracked in any way.  It shouldn't be hard to add a map of outstanding requests (keyed by action) and, with every new update request for a given action, cancel the previous one.
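
Roughly, that idea amounts to something like this (a hypothetical sketch; the tracker class and the action keys are made-up names, while org.eclipse.debug.core.IRequest is the real request interface):

// Hypothetical sketch of the coalescing idea above: track the last outstanding
// update request per action and cancel it when a newer one arrives.
import java.util.HashMap;
import java.util.Map;

import org.eclipse.debug.core.IRequest;

public class RequestTracker {

    private final Map<String, IRequest> outstanding = new HashMap<String, IRequest>();

    // Called whenever a new update request is created for a given action id.
    public synchronized void track(String actionId, IRequest newRequest) {
        IRequest previous = outstanding.put(actionId, newRequest);
        if (previous != null) {
            // Cancel the stale request; its handler should notice via isCanceled().
            previous.cancel();
        }
    }

    // Called when a request completes, so it is no longer tracked.
    public synchronized void untrack(String actionId, IRequest request) {
        if (outstanding.get(actionId) == request) {
            outstanding.remove(actionId);
        }
    }
}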
Comment 5 Pawel Piech CLA 2008-10-07 00:24:06 EDT
Created attachment 114381 [details]
Patch for DebugContextService.

Here's a quick and dirty patch that does coalescing in DebugContextService.  It's actually cancellation rather than coalescing, but hopefully handling the canceled jobs will be a lot more efficient than scheduling them.
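
The core of the check is a command-update job that declines to run once its request has been canceled, along these lines (a simplified, hypothetical illustration, not the patch itself):

// Hypothetical update job that skips its work if the request was canceled while
// the job was still waiting. Job.shouldRun() and IRequest.isCanceled() are real
// platform API; the class itself is invented for illustration.
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.Job;
import org.eclipse.debug.core.IRequest;

public class UpdateActionJob extends Job {

    private final IRequest request;

    public UpdateActionJob(IRequest request) {
        super("Check Debug Command");
        this.request = request;
        setSystem(true);
    }

    public boolean shouldRun() {
        // Skip the update entirely if the request was canceled while queued.
        return !request.isCanceled();
    }

    protected IStatus run(IProgressMonitor monitor) {
        // ... perform the command update and report back through the request ...
        return Status.OK_STATUS;
    }
}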
Comment 6 Samantha Chan CLA 2008-10-07 09:53:58 EDT
Thanks a lot Pawel!  :)

I will give this patch a try and see if it helps.

Samantha
Comment 7 Samantha Chan CLA 2008-10-07 18:05:23 EDT
I tried out the patch, but unfortunately, I don't think it has helped. :(

The bottleneck is the number of jobs being scheduled to update the debug commands.  The majority of the time is spent by the Job Manager figuring out whether certain jobs conflict with each other.  The more jobs that are scheduled, the longer it takes the job manager to figure out what's conflicting.

The patch cancels the command update request.  However, canceling the command update request does not cancel the job that is going to run.  The command request's isCanceled() is only checked inside the job that gets scheduled to update the commands.  I was looking at the job being scheduled in the DebugCommand.canExecute method.  This job gets scheduled regardless of whether the request is canceled or not.  And I think by the time the request is canceled, the job is already scheduled, and we don't really have a handle to cancel it.

I have tried to figure out what's causing all the commands to update and jobs to be scheduled.  We have a variable with two logical children.  For us to populate the two logical children, we run about 15-20 evaluations (i.e., JDIThread.runEvaluation gets called about 20 times).

Every time an evaluation is run, JDT fires a RESUME event with the EVALUATION_IMPLICIT detail, which causes the debug commands to be updated.  For every resume event, 9 jobs get scheduled to update the debug actions.

Every time an evaluation completes, JDT fires a SUSPEND event with the EVALUATION_IMPLICIT detail, which also causes the debug commands to be updated.  Again, I found 9 jobs being scheduled to update the actions.

I haven't figured out how to fix this yet.  Maybe we can do better in our logical structures and not run 20 evaluations.  Perhaps we can group all debug command updates into one job, rather than having a job for each type of command update.  Not sure yet.
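
For reference, the implicit-evaluation event traffic can be counted with a small listener like this (a diagnostic sketch, not part of any attached patch):

// Counts RESUME/SUSPEND debug events fired with the EVALUATION_IMPLICIT detail,
// i.e. the event traffic described above that triggers the command-update jobs.
import org.eclipse.debug.core.DebugEvent;
import org.eclipse.debug.core.DebugPlugin;
import org.eclipse.debug.core.IDebugEventSetListener;

public class ImplicitEvaluationCounter implements IDebugEventSetListener {

    private int count;

    public void install() {
        DebugPlugin.getDefault().addDebugEventListener(this);
    }

    public void handleDebugEvents(DebugEvent[] events) {
        for (DebugEvent event : events) {
            boolean resumeOrSuspend = event.getKind() == DebugEvent.RESUME
                    || event.getKind() == DebugEvent.SUSPEND;
            if (resumeOrSuspend && event.getDetail() == DebugEvent.EVALUATION_IMPLICIT) {
                count++;
                System.out.println("implicit evaluation event #" + count + ": " + event);
            }
        }
    }
}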
Comment 8 Pawel Piech CLA 2008-10-07 18:32:01 EDT
(In reply to comment #7)
> ...
> The patch cancels the command update request.  However, canceling the command
> update request does not cancel the job that is going to run.  The command
> request's isCanceled() is only checked inside the job that gets scheduled to
> update the commands.  I was looking at the job being scheduled in the
> DebugCommand.canExecute method.  This job gets scheduled regardless of whether
> the request is canceled or not.  And I think by the time the request is
> canceled, the job is already scheduled, and we don't really have a handle to
> cancel it.

This is not very surprising.  I implemented the shouldRun() method to check the request's cancel status, hoping that the job manager was designed to take advantage of it before scheduling the job.  A better approach would have been to have a listener for the request cancellation and in turn cancel the job directly.  Canceling a job removes it from the waiting queue and would hopefully avoid the performance overhead that you're seeing.  Maybe we could revisit my request for a cancel listener: bug 179308 ...

Absent that, I agree that the next best chance is to implement the coalescing logic inside the DebugCommand class.  The command implementation could simply save a reference to the last job it submitted, and actively cancel it if its corresponding request was canceled.  Or it could re-target that job to service the new request.  Additional synchronization may be needed for the latter.
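
For example, the command could keep a handle to its last update job along these lines (a hypothetical sketch; the class name is made up):

// Hypothetical scheduler that remembers the last update job submitted for a
// command and cancels it when a newer update arrives. Canceling a job that is
// still waiting removes it from the job manager's queue, so it no longer
// participates in conflict checks.
import org.eclipse.core.runtime.jobs.Job;

public class CommandUpdateScheduler {

    private Job lastUpdateJob;

    public synchronized void schedule(Job newUpdateJob) {
        if (lastUpdateJob != null) {
            // Harmless if the previous job already finished; pulls it out of the
            // waiting queue if it has not started yet.
            lastUpdateJob.cancel();
        }
        lastUpdateJob = newUpdateJob;
        newUpdateJob.schedule();
    }
}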
Comment 9 Darin Wright CLA 2008-11-05 22:01:10 EST
investigate for 3.4.2
Comment 10 Darin Wright CLA 2008-12-17 13:08:19 EST
Created attachment 120733 [details]
patch

This patch cancels associated jobs when new jobs are scheduled. I believe it works as is, but we should probably be careful to uphold the contract of calling cancel()/done() on each request even if it is canceled before its update job is run. We could do this with a job listener.
Comment 11 Pawel Piech CLA 2008-12-17 13:35:48 EST
Well, that's an interesting way to cancel the job.  I never would have thought of looking at what JobManager has to offer.  I guess the listener will need to be on the job manager, because looking at the implementation, it doesn't seem that a listener on the job itself will be called.
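
Something along these lines, for example (a hypothetical sketch; the map of pending jobs and the class name are invented, and it assumes the job manager sends a done notification with a CANCEL result for jobs canceled while waiting):

// Registers with the job manager so that requests whose update job is canceled
// before it runs still get cancel()/done() called on them, per the contract
// mentioned in comment 10.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.jobs.IJobChangeEvent;
import org.eclipse.core.runtime.jobs.Job;
import org.eclipse.core.runtime.jobs.JobChangeAdapter;
import org.eclipse.debug.core.IRequest;

public class RequestCleanupListener extends JobChangeAdapter {

    // Update jobs we scheduled, mapped to the requests they are servicing.
    private final Map<Job, IRequest> pending = new ConcurrentHashMap<Job, IRequest>();

    public void install() {
        Job.getJobManager().addJobChangeListener(this);
    }

    public void register(Job updateJob, IRequest request) {
        pending.put(updateJob, request);
    }

    public void done(IJobChangeEvent event) {
        IRequest request = pending.remove(event.getJob());
        if (request != null && event.getResult() != null
                && event.getResult().getSeverity() == IStatus.CANCEL) {
            // The update job never ran (or was canceled); still honor the
            // request contract.
            request.cancel();
            request.done();
        }
    }
}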
Comment 12 Darin Wright CLA 2008-12-17 13:58:43 EST
Created attachment 120738 [details]
patch

This adds cancel/done handling for canceled jobs.
Comment 13 Darin Wright CLA 2009-01-05 11:59:36 EST
Created attachment 121543 [details]
copyright update

Updated patch with 2009 copyrights.
Comment 14 Darin Wright CLA 2009-01-05 12:23:08 EST
Released fix to HEAD, requesting approval for 3.4.2.
Comment 15 Philippe Mulet CLA 2009-01-06 11:18:14 EST
+1 for 3.4.2
Comment 16 Darin Wright CLA 2009-01-06 13:08:55 EST
Released to 3.4.2
Comment 17 Darin Wright CLA 2009-01-06 13:09:18 EST
Please verify, Sam. (Will be in Jan 7 M-build).
Comment 18 Samantha Chan CLA 2009-01-09 16:57:20 EST
I reran the scenario with Yourkit and found that the majority of the time is now spent waiting for responses from the JVM.  The amount of time spent figuring out whether a job should be run has been significantly reduced.  There is still a lot of flashing on the screen as a result of the actions being updated, but that's another problem that needs to be looked at.

Marking this as verified.