Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 332590

Summary: [TMF] Deadlock in the Events view
Product: z_Archived Reporter: Francois Chouinard <fchouinard>
Component: LinuxToolsAssignee: Francois Chouinard <fchouinard>
Status: CLOSED FIXED QA Contact: Francois Chouinard <fchouinard>
Severity: normal    
Priority: P3 CC: bernd.hufmann, jjohnstn
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Whiteboard:
Attachments:
Description Flags
Fix for the deadlock
none
Improved fix for the deadlock
fchouinard: iplog-
Optimize table refresh jjohnstn: iplog+

Description Francois Chouinard CLA 2010-12-14 18:37:40 EST
The regular SWT/JFace table can not possibly handle the amount of data in a typical LTTng trace, event with the SWT_VIRTUAL creation flag.

Therefore, we introduced a TmfVirtualTable that can handle an arbitrarily large number of rows (well, up to Integer.MAX_VALUE - or 2.14 billions - for now). The key was to just the actual table relatively small and to populate it on an as-needed basis.

An extra twist was that we don't go to the back-end (events provider) for just one event at a time: when a back-end round-trip is required, we fill a cache whose size is dimensioned so the user doesn't experience a delay that is too noticeable when we have a cache miss.

The problem with the current implementation is that the refreshing of the events cache - a possibly lengthy operation - happens on the UI thread. This not great but wouldn't be so bad if it weren't for a particular interaction with the CDT which can block on the UI when reading events from the GDB back-end.

So we have the following situation:
- The Events view locks the UI thread until its event cache is filled
- The CDT back-end won't return the next event until some tracing data is flushed to the Console
- The Console waits for the UI thread (locked by Events view) to become available

:: Deadlock
Comment 1 Francois Chouinard CLA 2010-12-14 18:52:01 EST
Created attachment 185191 [details]
Fix for the deadlock

The fix is simply to remove the event cache filling from the UI thread and have it performed in a Job.

Every time a row has to be updated, the table's handleEvent() is called. If the requested row is cached, then it is simply returned as before. If not, a call is made to populate the cache. There can be multiple simultaneous calls to this method (in fact, as many as calls to handleEvent() or about once per row displayed).

The cache filling method performs a check to see if the currently running Job (if any) will retrieve the requested row. If so, fine (just let the Job finish). Otherwise (i.e. the user scrolled beyond what will be stored in the cache when the Job completes), kill the Job and start a new one - anyway the fetched events won't be used.

The Job fills the cache and, when it completes, it simply refreshes the table. The UI will call handleEvent() and this time the cache will be properly populated.
Comment 2 Francois Chouinard CLA 2010-12-14 18:53:45 EST
Bernd, could you review? Thanks.
Comment 3 Bernd Hufmann CLA 2010-12-15 09:55:06 EST
(In reply to comment #2)
> Bernd, could you review? Thanks.

Hi Francois

Functionally, the way you addressed the deadlock issue is the right approach. However, there are a few things that needs to be look at. Here are my comments:

* Possible thread synchronization problems
  - There are 3 threads involved: GUI-thread, the job and the thread for data
    request.
  - These threads access the members fCacheStartIndex, fCacheEndIndex and fCache 
    of class TmfEventsTable.
  - The access to these members has to be thread-safe.
* Before executing "fTable.getDisplay().asyncExec(new Runnable() {}" you need 
  to check "if (!fTable.isDisposed()) {}" because the display might have been 
  disposed in the meantime.
* job.cancel() only marks the job as cancelled. It doesn't kill the job if it 
  is in state RUNNING. The job will still run to the end. You have to implement 
  the cancelling by checking monitor.isCanceled(). You could do this check in 
  handleData() method of the request and if it was cancelled call just cancel() 
  on the request. You could also check if the request was cancelled before 
  "fTable.getDisplay().asyncExec(new Runnable() {}" and return e.g. 
  Status.CANCEL_STATUS from the run() method in the cancel case.
* There is a problem when the job is in state WAITING and the populate method 
  is called subsequently. In this case fCacheStartIndex and fCacheEndIndex 
  haven't been set and the job is cancelled for a subsequent call. This is a 
  problem because this method is called once per row. Maybe it's better to set 
  fCacheStartIndex and fCacheEndIndex before creating the job.
* When using the end-button to jump to the end of the table, then the big window 
  of the HistogramView is not updated. I need to press the end-button twice or 
  need to click into the table.

Bernd
Comment 4 Francois Chouinard CLA 2010-12-15 14:31:27 EST
Created attachment 185256 [details]
Improved fix for the deadlock

This is an updated fix based on Bernd's comments. It turns out that there is really as possibility for a racing condition if we have a slow back-ends (like GDB Tracepoints) and a large cache.

A more serious problem occurs at refresh time: the previous patch kept sending the synchronization signal for the current event on the UI thread (again, not great) *after* the refresh job is created but *before* it completes. Therefore, a state timestamp data was used to synchronize the views.

This patch also fixes this issue by delaying the emission of the synch signal until the cache and the table rows have been updated.
Comment 5 Francois Chouinard CLA 2010-12-15 14:38:48 EST
Patch committed.
Comment 6 Bernd Hufmann CLA 2010-12-15 16:58:07 EST
Created attachment 185273 [details]
Optimize table refresh

The original fix lead to too many calls of method notifyUpdatedSelection() in the TmfVirtualTable. With this patch this method is called only if necessary.
Comment 7 Bernd Hufmann CLA 2010-12-15 17:08:21 EST
Comment on attachment 185273 [details]
Optimize table refresh

Patch committed.
Comment 8 Bernd Hufmann CLA 2010-12-15 17:09:10 EST
Comment on attachment 185273 [details]
Optimize table refresh

Patch committed.
Comment 9 Francois Chouinard CLA 2011-07-22 14:55:44 EDT
Delivered with 0.7