Community
Participate
Working Groups
Build Identifier: I20110613-1736 Sometimes Eclipse can become unresponsive. It would be useful to have a lightweight Eclipse API to facilitate monitoring events that lock up the Eclipse UI. To be lightweight I would propose that the API only store transient information. Gathering and logging data about Eclipse unresponsiveness would be the responsibility of a watchdog timer task. A couple of useful things that the API can provide are: 1) instrument the UI thread to record when the currently dispatched event was started 2) instrument the Blocked Jobs dialog to record what caused it to get popped up I've been working on a patch and will upload it when it is a little better shape. Reproducible: Always Steps to Reproduce: None, this is a feature request.
Created attachment 204775 [details] Proposed new API for monitoring UI delays
It would be good if API along these lines could be added, ideally to 3.8 and 4.2. The purpose of the new API is to enable automatic monitoring by additional plug-ins, for detecting pauses and hangs, and capturing information like a thread dump for further analysis. Delays and UI hangs are a major pain point when using Eclipse within Google. Adding John and Silenio for their take - would it make sense to add API to SWT instead of the Workbench for tracking the time when event processing has started? Would it be easier to expose information about blocked jobs, and in particular the UI thread being blocked by another job, at the level of the Jobs framework?
Can we comment on this in M4? PW
A short response from John and Silenio would be a great first step :-)
The general approach of having a minimal API that exposes transitory information to be collected by another plugin sounds reasonable. I wonder if this functionality could be implemented using the existing PerformanceStats API in org.eclipse.core.runtime rather than a special new API just for blocked job events. Personally I find when things are blocked, the only way to get really reliable data is from a stack dump or JVM level event monitor like jvisualvm. Saying I am blocked by "X" isn't always helpful if it turns out that "X" is actually blocked by "Y" which is blocked by "Z". However for ongoing monitoring in the background I guess you need that trigger in the application code to say, "the event loop is blocked, now would be a good time to capture profile data".
FWIW here is a document giving background on the PerformanceStats stuff: http://eclipse.org/eclipse/platform-core/documents/3.1/perf_stats.html
(In reply to comment #5) PerformanceStats API doesn't address the problem that we are are trying to solve with the proposed patch. http://eclipse.org/eclipse/platform-core/documents/3.1/perf_stats.html in "Instrumenting Code" section says: "The first step to using PerformanceStats is to instrument potentially long-running code." The whole point of the proposed patch is that it allows for discovery of slowness in unexpected places. Obviously, such places cannot be instrumented before they are discovered. In our custom code that will be based on the proposed API we are going to log stacks of all threads, not just the blocked ones. The stacks will be sampled over regular time intervals more or less uniformly throughout the duration of blockage.
John, do you have any objections against applying the patch?
How does this work if the operation in question doesn't deal with a BlockedJobsDialog? PW
(In reply to comment #9) > How does this work if the operation in question doesn't deal with a > BlockedJobsDialog? > > PW I'm not sure I understand the scenario you are anticipating. This patch allows a monitoring plug-in to catch the following two cases: 1) The UI thread itself is unresponsive, by watching the changing state of MonitoringUtil.getCurrentUIEventStartTimeInMilliseconds() 2) A user-initiated operation results in the BlockedJobsDialog being displayed The BlockedJobsDialog commonly gets displayed when code in the UI thread wants to directly run or join a job that is blocked. If code that wants to run or join a job forgets to do so, wouldn't is just block the UI thread, which is covered by case #1?
The API and patch look fine to me. I guess currentEventThread should be nulled when the event loop exits (minor detail).
If John is fine with the API, then it's OK :-) This patch should be against R3_development. See http://wiki.eclipse.org/Platform_UI/How_to_Contribute for what to get from where. We also need an implementation for R4.2 before this can go in, branch master in eclipse.platform.ui.git PW
(In reply to comment #12) > This patch should be against R3_development. The patch cleanly applies to R3_development branch. > We also need an implementation for R4.2 before this can go in, branch master in > eclipse.platform.ui.git In 4.2 the event loop was moved to org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(MApplicationElement, IEclipseContext). To expose currentEventStartTimeInMilliseconds and currentEventThread to the outside world we would have to add accessors for them to IPresentationEngine interface. This looks like an incompatible API change since IPresentationEngine doesn't have @noimplement annotation. Surprisingly, API tooling doesn't give any warning when I add the accessors. Please let us know if adding getCurrentEventStartTimeInMilliseconds() and getCurrentEventThread() to IPresentationEngine is acceptable.
(In reply to comment #13) > org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(MApplicationElement, > IEclipseContext). To expose currentEventStartTimeInMilliseconds and > currentEventThread to the outside world we would have to add accessors for them > to IPresentationEngine interface. This looks like an incompatible API change > since IPresentationEngine doesn't have @noimplement annotation. Surprisingly, > API tooling doesn't give any warning when I add the accessors. Please let us > know if adding getCurrentEventStartTimeInMilliseconds() and > getCurrentEventThread() to IPresentationEngine is acceptable. On further reflection, there are a number of places where display.readAndDispatch() are called (workbench, different in 4.2, dialogs, etc). It seems to me that the timing and thread information should come from the SWT Display, no? If we wanted API with the current pattern, I would suggest adding an interface in org.eclipse.jface that can provide the useful information. Then owner of the event loop can register an OSGi service which can be queried. Both event loop owners use JFace, and it's the place for common UI patterns used in the workbench. That interface could return a more generic BlockedTaskInfo if available, that BlockedJobsDialog could fill in for the o.e.ui.workbench case. But I'm still not a big fan of the static methods in o.e.ui.workbench. If we provide the JFace interface and OSGi service that could be used instead of the static class. Is it worth inverting the interface and having the monitoring class provide the OSGi service, which could be fetched and the information fed into? Although I guess that would be a lot of calls to setCurrentUIEventStartTimeInMilliseconds(long) PW
Next week is M5 milestone week, so changes should ideally be done by Friday. PW
(In reply to comment #15) > Next week is M5 milestone week, so changes should ideally be done by Friday. > Paul, you made some very good points and I've had other work stacked up, so unfortunately no new patch for M5. I haven't spent time yet looking at the e4 workbench architecture, so could you give some additional guidance by further sketching out the possible solutions? Do you have any new thoughts on how to best structure this monitoring service?
How about something like: org.eclipse.jface.monitoring.ITimingService { long getCurrentUIEventStartTimeInMilliseconds(); Thread getCurrentEventThread(); TaskInfo getTaskInfo(); } Your TaskInfo can be the POJO with the same information (that could be filled in by the blocked task dialog in the 3.8 workbench, and maybe a Job info from a UIJob if running from the IJobManager if available in 4.2). Then have the implementation be registered as an OSGi service by the main event loop runner, the Workbench in 3.8 and org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine in 4.2. PW
Just a heads up, M6 is the last milestone for API and we only have <3 weeks left (that has to include a couple of round trips). PW
(In reply to comment #18) > Just a heads up, M6 is the last milestone for API and we only have <3 weeks > left (that has to include a couple of round trips). > > PW Please retarget this bug for 4.3M1. I'll include a reference implementation of a monitoring plug-in that will report the UI delays. My plan is to get something in place before 4.2 is released, but I'll have a learning curve for the e4 implementation and won't be able to supply a high-quality patch before the 4.2M6 deadline.
deferred until Kepler PW
@Terry: Any news on the implementation? There are lots of complaints about Eclipse performance and a way to trace it would be useful.
(In reply to comment #21) > @Terry: Any news on the implementation? There are lots of complaints about > Eclipse performance and a way to trace it would be useful. No, I haven't had a chance to investigate how to do the performance monitoring in e4 (and I had planned to by this point). The way this quarter is shaping up I'm at least a few weeks away from looking at it, unfortunately. Inside Google we have deployed a feature patch to 3.8 based on the patch in this bug and are successfully using it to log thousands of UI freezes per day. Paul is right that the event loop monitoring in the current patch is not at the right level--we log dispatches to dialogs that run their own event loops and just need to assume that the UI is responsive in those cases. Even so we get a lot of very useful data. If anyone has time to adapt the proposed API for e4, I can very quickly provide a reference implementation of a monitoring plug-in that uses the API to gather stack traces and log them.
Another possible pattern for monitoring UI events on top of your instrumenting readAndDispatch() 1) add Display filters for the events that you are interested in. You'll get a call right before each event is dispatched to the widget listeners. 2) Create a new SWT.EventType and then add a send to org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(boolean) Event event = new Event(); event.data = lock.runnable; display.sendEvent(SWT.Exec, event); This will fire an event on every a/syncExec PW
(In reply to comment #23) > Another possible pattern for monitoring UI events on top of your > instrumenting readAndDispatch() > > 1) add Display filters for the events that you are interested in. You'll > get a call right before each event is dispatched to the widget listeners. > > 2) Create a new SWT.EventType and then add a send to > org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(boolean) > > Event event = new Event(); > event.data = lock.runnable; > display.sendEvent(SWT.Exec, event); > > This will fire an event on every a/syncExec > > PW This sounds good. Since Display filtering gives us a callback immediately before dispatching an event, we have two issues we need to solve: A) Knowing when processing of an event is finished. B) Knowing whether the next event we see is occurring at the same level as the previous one, or whether it is the result of a nested Display.readAndDispatch() call. To handle "A", for the resolution we need when capturing UI freezes, we can use the start of the next event as the end of the previous event. This means we will need to register our listener with Display.addFilter() for all event types and will need to add a new SWT.All event type. To handle "B", we will need to keep a depth counter in Display.readAndDispatch() and expose it through a public Display.getReadAndDispatchDepthCount() function. How does that sound?
Created attachment 232187 [details] Proposal #2: instrumenting events Here is our current proposal. We decided to avoid API changes in Display because it has many platform-dependent implementations. Instead, we instrumented 2 common pieces: the Synchronizer (for a/syncExec) and the EventTable (for system events). The instrumentation handles timing and collecting some basic information in real time and uses callbacks to inform clients of state changes. We've also included a sample plugin to demonstrate the API and to serve as a sample for others wanting to extend the functionality. The API is very simple: the Synchronizer exposes the registerWatchdog method that takes a set of callbacks and a duration threshold. The instrumentation handles timing events and calling the appropriate client callbacks. Multiple clients may be registered independently, each with its own threshold. All callbacks are made on the UI thread so clients don't need extra memory synchronization when sharing data between them. The implementation has been designed to have minimal impact when no events have exceeded any client duration threshold. When a long event does occur, performance is no longer a critical issue so long operations (e.g. taking stack traces and I/O) are acceptable. Similar to the rest of SWT, we've removed the costs of memory synchronization by requiring all actions to happen on the UI thread.
Any comments or feedback on our proposed patch?
(In reply to comment #26) > Any comments or feedback on our proposed patch? Hi Steve, I hope to review this by next Tuesday. PW
Hi Steve, that definitely seems reasonable. I'm sending it to SWT for them to review since that's where the code lives. The only comment I have is w.r.t. adding and removing the listeners. Is there a reason you allow the same listener instance multiple times? If not, I'd recommend making a second add a no-op, that's how most of our listener lists work. PW
Thanks Paul. I had thought about ignoring multiple registrations, but it leads to weird cases in the API. We want to allow the same watchdog to be registered with multiple thresholds, which would create a weird case for the user. For example, you could register the same handler at threshold T and again at T+dt to create a filter for events of duration in [T, T+dt) unless dt==0. While it doesn't make a lot of sense to allow multiple registrations at the same threshold, it seems like a weird case to ignore given that we allow multiple registrations at different thresholds. For the sake of consistency we could disallowing multiple registrations altogether. Subsequent registrations could either ignore the new instance (leaving the existing threshold unchanged) or update the threshold to the newly registered value, effectively unregistering the previous instance. Both of these seem consistent with existing API, but the latter (updating the threshold) seems better from a user perspective.
(In reply to comment #29) > Thanks Paul. > > I had thought about ignoring multiple registrations, but it leads to weird > cases in the API. We want to allow the same watchdog to be registered with > multiple thresholds, which would create a weird case for the user. For > example, you could register the same handler at threshold T and again at > T+dt to create a filter for events of duration in [T, T+dt) unless dt==0. It sounds like it's allowed for a reason, which addresses my concern. PW
Created attachment 232861 [details] Proposal #3 - Remove multiple registration Disallowing multiple registrations (even at different thresholds) doesn't cost any functionality since you can always just use 2 small helper objects with a reference to some shared master to do the same thing, so for the sake of consistency with the rest of the API, I removed it.
Created attachment 232862 [details] Diff of proposal #2 and #3
Any comments from the SWT team?
(In reply to comment #33) > Any comments from the SWT team? Sorry, we have not had a chance to review the patch yet. It should happen in the next couple of weeks.
Created attachment 233568 [details] Proposal #3 - Remove multiple registration (for real this time) Terry and I noticed that I had accidently re-uploaded proposal #2 as #3. The diff between 2 and 3 was correct, and now the #3 patch should be as well.
Created attachment 233569 [details] Proposal #3 - Remove multiple registration (for real this time)
Created attachment 233999 [details] Another small tweak I didn't realize that Integer.compare was new in Java 1.7, so I removed it.
Created attachment 234000 [details] Proposal #4 - Doesn't require Java 1.7 just for Integer.compare
Created attachment 234001 [details] Proposal #4 - Doesn't require Java 1.7 just for Integer.compare New, up-to-date patch with all changes so far
Any feedback?
Steve, could you upload this as a Gerrit review, if you are familiar with Gerrit this should not require a lot of work on your side, I hope.
All set to push to Gerrit, but waiting for my CLA to propagate.
https://git.eclipse.org/r/15260 Should I add anyone specific as a reviewer?
I did a first pass of the review and suggest some style changes. I suggest to add Bogdan Gheorghe and Silenio Quarti as reviewer once you reviewed the suggestions and acted upon them.
I've implemented all of your proposed changes and added both Bogdan and Silenio as reviewers. Thanks Lars.
Patch released. Thanks Steve!
I suggest to add this item to the Note&Newsworthy in M2.
(In reply to Bogdan Gheorghe from comment #46) > Patch released. Thanks Steve! This patch is causing a compile error (and consequent build failure) in Test_org_eclipse_swt_widgets_Display.java, looks like this class was not reworked after modifying the earlier versions of the patches.
Updated test to reflect new API.
Looks like you forgot to include the plugin.xml into build.properties. Fix: https://git.eclipse.org/r/22113
(In reply to Lars Vogel from comment #50) > Looks like you forgot to include the plugin.xml into build.properties. > > Fix: https://git.eclipse.org/r/22113 Thanks for the patch Lars, but this is apparently fixed via http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=897b92cc4a46c61a007f4f4d92e4bdd69c3ae557
Bug 441015 proposes a logical next step - out-of-the-box monitoring of UI delays in Platform UI.
.
*** Bug 79503 has been marked as a duplicate of this bug. ***