Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 360052 - New API for monitoring UI delays
Summary: New API for monitoring UI delays
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 3.7   Edit
Hardware: All All
: P3 enhancement with 1 vote (vote)
Target Milestone: 4.4 M2   Edit
Assignee: Bogdan Gheorghe CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 79503 (view as bug list)
Depends on:
Blocks: 441015
  Show dependency tree
 
Reported: 2011-10-05 18:10 EDT by Terry Parker CLA
Modified: 2014-09-18 19:16 EDT (History)
16 users (show)

See Also:


Attachments
Proposed new API for monitoring UI delays (8.92 KB, patch)
2011-10-07 15:47 EDT, Terry Parker CLA
no flags Details | Diff
Proposal #2: instrumenting events (34.95 KB, patch)
2013-06-10 11:58 EDT, Steve Foreman CLA
no flags Details | Diff
Proposal #3 - Remove multiple registration (34.95 KB, patch)
2013-06-27 15:13 EDT, Steve Foreman CLA
no flags Details | Diff
Diff of proposal #2 and #3 (8.37 KB, patch)
2013-06-27 15:14 EDT, Steve Foreman CLA
no flags Details | Diff
Proposal #3 - Remove multiple registration (for real this time) (35.66 KB, patch)
2013-07-17 12:29 EDT, Steve Foreman CLA
no flags Details | Diff
Proposal #3 - Remove multiple registration (for real this time) (35.61 KB, patch)
2013-07-17 12:31 EDT, Steve Foreman CLA
no flags Details | Diff
Another small tweak (992 bytes, patch)
2013-07-31 18:38 EDT, Steve Foreman CLA
no flags Details | Diff
Proposal #4 - Doesn't require Java 1.7 just for Integer.compare (35.61 KB, patch)
2013-07-31 18:39 EDT, Steve Foreman CLA
no flags Details | Diff
Proposal #4 - Doesn't require Java 1.7 just for Integer.compare (35.59 KB, patch)
2013-07-31 18:42 EDT, Steve Foreman CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Terry Parker CLA 2011-10-05 18:10:32 EDT
Build Identifier: I20110613-1736

Sometimes Eclipse can become unresponsive. It would be useful to have a lightweight Eclipse API to facilitate monitoring events that lock up the Eclipse UI.

To be lightweight I would propose that the API only store transient information. Gathering and logging data about Eclipse unresponsiveness would be the responsibility of a watchdog timer task.  A couple of useful things that the API can provide are:

1) instrument the UI thread to record when the currently dispatched event was started
2) instrument the Blocked Jobs dialog to record what caused it to get popped up

I've been working on a patch and will upload it when it is a little better shape.

Reproducible: Always

Steps to Reproduce:
None, this is a feature request.
Comment 1 Terry Parker CLA 2011-10-07 15:47:55 EDT
Created attachment 204775 [details]
Proposed new API for monitoring UI delays
Comment 2 Boris Bokowski CLA 2011-10-25 15:29:00 EDT
It would be good if API along these lines could be added, ideally to 3.8 and 4.2. The purpose of the new API is to enable automatic monitoring by additional plug-ins, for detecting pauses and hangs, and capturing information like a thread dump for further analysis. Delays and UI hangs are a major pain point when using Eclipse within Google.

Adding John and Silenio for their take - would it make sense to add API to SWT instead of the Workbench for tracking the time when event processing has started? Would it be easier to expose information about blocked jobs, and in particular the UI thread being blocked by another job, at the level of the Jobs framework?
Comment 3 Paul Webster CLA 2011-11-02 13:54:45 EDT
Can we comment on this in M4?

PW
Comment 4 Boris Bokowski CLA 2011-11-09 07:15:13 EST
A short response from John and Silenio would be a great first step :-)
Comment 5 John Arthorne CLA 2011-11-09 17:13:04 EST
The general approach of having a minimal API that exposes transitory information to be collected by another plugin sounds reasonable. I wonder if this functionality could be implemented using the existing PerformanceStats API in org.eclipse.core.runtime rather than a special new API just for blocked job events.

Personally I find when things are blocked, the only way to get really reliable data is from a stack dump or JVM level event monitor like jvisualvm. Saying I am blocked by "X" isn't always helpful if it turns out that "X" is actually blocked by "Y" which is blocked by "Z". However for ongoing monitoring in the background I guess you need that trigger in the application code to say, "the event loop is blocked, now would be a good time to capture profile data".
Comment 6 John Arthorne CLA 2011-11-09 17:14:00 EST
FWIW here is a document giving background on the PerformanceStats stuff:

http://eclipse.org/eclipse/platform-core/documents/3.1/perf_stats.html
Comment 7 Sergey Prigogin CLA 2011-12-20 18:58:50 EST
(In reply to comment #5)
PerformanceStats API doesn't address the problem that we are are trying to solve with the proposed patch. http://eclipse.org/eclipse/platform-core/documents/3.1/perf_stats.html in "Instrumenting Code" section says: "The first step to using PerformanceStats is to instrument potentially long-running code." The whole point of the proposed patch is that it allows for discovery of slowness in unexpected places. Obviously, such places cannot be instrumented before they are discovered.

In our custom code that will be based on the proposed API we are going to log stacks of all threads, not just the blocked ones. The stacks will be sampled over regular time intervals more or less uniformly throughout the duration of blockage.
Comment 8 Sergey Prigogin CLA 2012-01-06 14:01:55 EST
John, do you have any objections against applying the patch?
Comment 9 Paul Webster CLA 2012-01-06 15:05:01 EST
How does this work if the operation in question doesn't deal with a BlockedJobsDialog?

PW
Comment 10 Terry Parker CLA 2012-01-06 16:16:28 EST
(In reply to comment #9)
> How does this work if the operation in question doesn't deal with a
> BlockedJobsDialog?
> 
> PW

I'm not sure I understand the scenario you are anticipating.  This patch allows a monitoring plug-in to catch the following two cases:

1) The UI thread itself is unresponsive, by watching the changing state of MonitoringUtil.getCurrentUIEventStartTimeInMilliseconds()
2) A user-initiated operation results in the BlockedJobsDialog being displayed

The BlockedJobsDialog commonly gets displayed when code in the UI thread wants to directly run or join a job that is blocked.  If code that wants to run or join a job forgets to do so, wouldn't is just block the UI thread, which is covered by case #1?
Comment 11 John Arthorne CLA 2012-01-09 16:53:31 EST
The API and patch look fine to me. I guess currentEventThread should be nulled when the event loop exits (minor detail).
Comment 12 Paul Webster CLA 2012-01-10 09:46:13 EST
If John is fine with the API, then it's OK :-)

This patch should be against R3_development.  See http://wiki.eclipse.org/Platform_UI/How_to_Contribute for what to get from where.

We also need an implementation for R4.2 before this can go in, branch master in eclipse.platform.ui.git

PW
Comment 13 Sergey Prigogin CLA 2012-01-11 20:26:20 EST
(In reply to comment #12)
> This patch should be against R3_development.

The patch cleanly applies to R3_development branch.
 
> We also need an implementation for R4.2 before this can go in, branch master in
> eclipse.platform.ui.git

In 4.2 the event loop was moved to org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(MApplicationElement, IEclipseContext). To expose currentEventStartTimeInMilliseconds and currentEventThread to the outside world we would have to add accessors for them to IPresentationEngine interface. This looks like an incompatible API change since IPresentationEngine doesn't have @noimplement annotation. Surprisingly, API tooling doesn't give any warning when I add the accessors. Please let us know if adding getCurrentEventStartTimeInMilliseconds() and getCurrentEventThread() to IPresentationEngine is acceptable.
Comment 14 Paul Webster CLA 2012-01-12 08:37:02 EST
(In reply to comment #13)
> org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(MApplicationElement,
> IEclipseContext). To expose currentEventStartTimeInMilliseconds and
> currentEventThread to the outside world we would have to add accessors for them
> to IPresentationEngine interface. This looks like an incompatible API change
> since IPresentationEngine doesn't have @noimplement annotation. Surprisingly,
> API tooling doesn't give any warning when I add the accessors. Please let us
> know if adding getCurrentEventStartTimeInMilliseconds() and
> getCurrentEventThread() to IPresentationEngine is acceptable.

On further reflection, there are a number of places where display.readAndDispatch() are called (workbench, different in 4.2, dialogs, etc).  It seems to me that the timing and thread information should come from the SWT Display, no?


If we wanted API with the current pattern, I would suggest adding an interface in org.eclipse.jface that can provide the useful information.  Then owner of the event loop can register an OSGi service which can be queried.  Both event loop owners  use JFace, and it's the place for common UI patterns used in the workbench.  That interface could return a more generic BlockedTaskInfo if available, that BlockedJobsDialog could fill in for the o.e.ui.workbench case.

But I'm still not a big fan of the static methods in o.e.ui.workbench.  If we provide the JFace interface and OSGi service that could be used instead of the static class.

Is it worth inverting the interface and having the monitoring class provide the OSGi service, which could be fetched and the information fed into?  Although I guess that would be a lot of calls to setCurrentUIEventStartTimeInMilliseconds(long)

PW
Comment 15 Paul Webster CLA 2012-01-19 21:18:36 EST
Next week is M5 milestone week, so changes should ideally be done by Friday.

PW
Comment 16 Terry Parker CLA 2012-01-20 20:56:22 EST
(In reply to comment #15)
> Next week is M5 milestone week, so changes should ideally be done by Friday.
> 
Paul, you made some very good points and I've had other work stacked up, so unfortunately no new patch for M5.

I haven't spent time yet looking at the e4 workbench architecture, so could you give some additional guidance by further sketching out the possible solutions?  Do you have any new thoughts on how to best structure this monitoring service?
Comment 17 Paul Webster CLA 2012-01-22 21:06:32 EST
How about something like:

org.eclipse.jface.monitoring.ITimingService {

long getCurrentUIEventStartTimeInMilliseconds();

Thread getCurrentEventThread();

TaskInfo getTaskInfo();

}

Your TaskInfo can be the POJO with the same information (that could be filled in by the blocked task dialog in the 3.8 workbench, and maybe a Job info from a UIJob if running from the IJobManager if available in 4.2).

Then have the implementation be registered as an OSGi service by the main event loop runner, the Workbench in 3.8 and org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine in 4.2.


PW
Comment 18 Paul Webster CLA 2012-02-14 11:50:49 EST
Just a heads up, M6 is the last milestone for API and we only have <3 weeks left (that has to include a couple of round trips).

PW
Comment 19 Terry Parker CLA 2012-02-15 20:50:54 EST
(In reply to comment #18)
> Just a heads up, M6 is the last milestone for API and we only have <3 weeks
> left (that has to include a couple of round trips).
> 
> PW

Please retarget this bug for 4.3M1.  I'll include a reference implementation of a monitoring plug-in that will report the UI delays.  My plan is to get something in place before 4.2 is released, but I'll have a learning curve for the e4 implementation and won't be able to supply a high-quality patch before the 4.2M6 deadline.
Comment 20 Paul Webster CLA 2012-02-16 07:22:33 EST
deferred until Kepler

PW
Comment 21 Lars Vogel CLA 2012-10-03 00:11:15 EDT
@Terry: Any news on the implementation? There are lots of complaints about Eclipse performance and a way to trace it would be useful.
Comment 22 Terry Parker CLA 2012-10-03 11:24:49 EDT
(In reply to comment #21)
> @Terry: Any news on the implementation? There are lots of complaints about
> Eclipse performance and a way to trace it would be useful.

No, I haven't had a chance to investigate how to do the performance monitoring in e4 (and I had planned to by this point). The way this quarter is shaping up I'm at least a few weeks away from looking at it, unfortunately.

Inside Google we have deployed a feature patch to 3.8 based on the patch in this bug and are successfully using it to log thousands of UI freezes per day. Paul is right that the event loop monitoring in the current patch is not at the right level--we log dispatches to dialogs that run their own event loops and just need to assume that the UI is responsive in those cases. Even so we get a lot of very useful data.

If anyone has time to adapt the proposed API for e4, I can very quickly provide a reference implementation of a monitoring plug-in that uses the API to gather stack traces and log them.
Comment 23 Paul Webster CLA 2013-03-28 12:07:34 EDT
Another possible pattern for monitoring UI events on top of your instrumenting readAndDispatch()

1) add Display filters for the events that you are interested in.  You'll get a call right before each event is dispatched to the widget listeners.

2) Create a new SWT.EventType and then add a send to org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(boolean) 

Event event = new Event();
event.data = lock.runnable;
display.sendEvent(SWT.Exec, event);

This will fire an event on every a/syncExec

PW
Comment 24 Terry Parker CLA 2013-03-28 14:27:26 EDT
(In reply to comment #23)
> Another possible pattern for monitoring UI events on top of your
> instrumenting readAndDispatch()
> 
> 1) add Display filters for the events that you are interested in.  You'll
> get a call right before each event is dispatched to the widget listeners.
> 
> 2) Create a new SWT.EventType and then add a send to
> org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(boolean) 
> 
> Event event = new Event();
> event.data = lock.runnable;
> display.sendEvent(SWT.Exec, event);
> 
> This will fire an event on every a/syncExec
> 
> PW

This sounds good.  Since Display filtering gives us a callback immediately before dispatching an event, we have two issues we need to solve:
A) Knowing when processing of an event is finished.
B) Knowing whether the next event we see is occurring at the same level as the previous one, or whether it is the result of a nested Display.readAndDispatch() call.

To handle "A", for the resolution we need when capturing UI freezes, we can use the start of the next event as the end of the previous event.  This means we will need to register our listener with Display.addFilter() for all event types and will need to add a new SWT.All event type.

To handle "B", we will need to keep a depth counter in Display.readAndDispatch() and expose it through a public Display.getReadAndDispatchDepthCount() function.

How does that sound?
Comment 25 Steve Foreman CLA 2013-06-10 11:58:00 EDT
Created attachment 232187 [details]
Proposal #2: instrumenting events

Here is our current proposal.

We decided to avoid API changes in Display because it has many platform-dependent implementations. Instead, we instrumented 2 common pieces: the Synchronizer (for a/syncExec) and the EventTable (for system events).

The instrumentation handles timing and collecting some basic information in real time and uses callbacks to inform clients of state changes.

We've also included a sample plugin to demonstrate the API and to serve as a sample for others wanting to extend the functionality. The API is very simple: the Synchronizer exposes the registerWatchdog method that takes a set of callbacks and a duration threshold. The instrumentation handles timing events and calling the appropriate client callbacks. Multiple clients may be registered independently, each with its own threshold. All callbacks are made on the UI thread so clients don't need extra memory synchronization when sharing data between them.

The implementation has been designed to have minimal impact when no events have exceeded any client duration threshold. When a long event does occur, performance is no longer a critical issue so long operations (e.g. taking stack traces and I/O) are acceptable. Similar to the rest of SWT, we've removed the costs of memory synchronization by requiring all actions to happen on the UI thread.
Comment 26 Steve Foreman CLA 2013-06-19 12:26:09 EDT
Any comments or feedback on our proposed patch?
Comment 27 Paul Webster CLA 2013-06-20 12:15:27 EDT
(In reply to comment #26)
> Any comments or feedback on our proposed patch?

Hi Steve,

I hope to review this by next Tuesday.

PW
Comment 28 Paul Webster CLA 2013-06-25 09:34:10 EDT
Hi Steve, that definitely seems reasonable.  I'm sending it to SWT for them to review since that's where the code lives.

The only comment I have is w.r.t. adding and removing the listeners.  Is there a reason you allow the same listener instance multiple times?  If not, I'd recommend making a second add a no-op, that's how most of our listener lists work.

PW
Comment 29 Steve Foreman CLA 2013-06-25 13:29:12 EDT
Thanks Paul.

I had thought about ignoring multiple registrations, but it leads to weird cases in the API. We want to allow the same watchdog to be registered with multiple thresholds, which would create a weird case for the user. For example, you could register the same handler at threshold T and again at T+dt to create a filter for events of duration in [T, T+dt) unless dt==0.

While it doesn't make a lot of sense to allow multiple registrations at the same threshold, it seems like a weird case to ignore given that we allow multiple registrations at different thresholds.

For the sake of consistency we could disallowing multiple registrations altogether. Subsequent registrations could either ignore the new instance (leaving the existing threshold unchanged) or update the threshold to the newly registered value, effectively unregistering the previous instance. Both of these seem consistent with existing API, but the latter (updating the threshold) seems better from a user perspective.
Comment 30 Paul Webster CLA 2013-06-25 14:20:05 EDT
(In reply to comment #29)
> Thanks Paul.
> 
> I had thought about ignoring multiple registrations, but it leads to weird
> cases in the API. We want to allow the same watchdog to be registered with
> multiple thresholds, which would create a weird case for the user. For
> example, you could register the same handler at threshold T and again at
> T+dt to create a filter for events of duration in [T, T+dt) unless dt==0.

It sounds like it's allowed for a reason, which addresses my concern.

PW
Comment 31 Steve Foreman CLA 2013-06-27 15:13:02 EDT
Created attachment 232861 [details]
Proposal #3 - Remove multiple registration

Disallowing multiple registrations (even at different thresholds) doesn't cost any functionality since you can always just use 2 small helper objects with a reference to some shared master to do the same thing, so for the sake of consistency with the rest of the API, I removed it.
Comment 32 Steve Foreman CLA 2013-06-27 15:14:32 EDT
Created attachment 232862 [details]
Diff of proposal #2 and #3
Comment 33 Steve Foreman CLA 2013-07-10 15:34:10 EDT
Any comments from the SWT team?
Comment 34 Silenio Quarti CLA 2013-07-10 16:03:12 EDT
(In reply to comment #33)
> Any comments from the SWT team?

Sorry, we have not had a chance to review the patch yet. It should happen in the next couple of weeks.
Comment 35 Steve Foreman CLA 2013-07-17 12:29:09 EDT
Created attachment 233568 [details]
Proposal #3 - Remove multiple registration (for real this time)

Terry and I noticed that I had accidently re-uploaded proposal #2 as #3.
The diff between 2 and 3 was correct, and now the #3 patch should be as well.
Comment 36 Steve Foreman CLA 2013-07-17 12:31:34 EDT
Created attachment 233569 [details]
Proposal #3 - Remove multiple registration (for real this time)
Comment 37 Steve Foreman CLA 2013-07-31 18:38:10 EDT
Created attachment 233999 [details]
Another small tweak

I didn't realize that Integer.compare was new in Java 1.7, so I removed it.
Comment 38 Steve Foreman CLA 2013-07-31 18:39:03 EDT
Created attachment 234000 [details]
Proposal #4 - Doesn't require Java 1.7 just for Integer.compare
Comment 39 Steve Foreman CLA 2013-07-31 18:42:30 EDT
Created attachment 234001 [details]
Proposal #4 - Doesn't require Java 1.7 just for Integer.compare

New, up-to-date patch with all changes so far
Comment 40 Steve Foreman CLA 2013-08-07 12:18:31 EDT
Any feedback?
Comment 41 Lars Vogel CLA 2013-08-08 08:41:07 EDT
Steve, could you upload this as a Gerrit review, if you are familiar with Gerrit this should not require a lot of work on your side, I hope.
Comment 42 Steve Foreman CLA 2013-08-08 12:56:25 EDT
All set to push to Gerrit, but waiting for my CLA to propagate.
Comment 43 Steve Foreman CLA 2013-08-08 13:54:38 EDT
https://git.eclipse.org/r/15260

Should I add anyone specific as a reviewer?
Comment 44 Lars Vogel CLA 2013-08-09 05:03:27 EDT
I did a first pass of the review and suggest some style changes. I suggest to add Bogdan Gheorghe and Silenio Quarti as reviewer once you reviewed the suggestions and acted upon them.
Comment 45 Steve Foreman CLA 2013-08-12 13:26:45 EDT
I've implemented all of your proposed changes and added both Bogdan and Silenio as reviewers.

Thanks Lars.
Comment 46 Bogdan Gheorghe CLA 2013-09-11 15:37:44 EDT
Patch released. Thanks Steve!
Comment 47 Lars Vogel CLA 2013-09-12 04:22:25 EDT
I suggest to add this item to the Note&Newsworthy in M2.
Comment 48 Arun Thondapu CLA 2013-09-12 07:42:54 EDT
(In reply to Bogdan Gheorghe from comment #46)
> Patch released. Thanks Steve!

This patch is causing a compile error (and consequent build failure) in Test_org_eclipse_swt_widgets_Display.java, looks like this class was not reworked after modifying the earlier versions of the patches.
Comment 49 Bogdan Gheorghe CLA 2013-09-12 12:04:10 EDT
Updated test to reflect new API.
Comment 50 Lars Vogel CLA 2014-02-17 12:41:51 EST
Looks like you forgot to include the plugin.xml into build.properties. 

Fix: https://git.eclipse.org/r/22113
Comment 51 Arun Thondapu CLA 2014-02-25 08:12:03 EST
(In reply to Lars Vogel from comment #50)
> Looks like you forgot to include the plugin.xml into build.properties. 
> 
> Fix: https://git.eclipse.org/r/22113

Thanks for the patch Lars, but this is apparently fixed via http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=897b92cc4a46c61a007f4f4d92e4bdd69c3ae557
Comment 52 Sergey Prigogin CLA 2014-08-01 19:57:41 EDT
Bug 441015 proposes a logical next step - out-of-the-box monitoring of UI delays in Platform UI.
Comment 53 Lars Vogel CLA 2014-09-02 06:07:15 EDT
.
Comment 54 Sergey Prigogin CLA 2014-09-18 19:16:15 EDT
*** Bug 79503 has been marked as a duplicate of this bug. ***