Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 330505

Summary: Potential indexing deadlock risk
Product: [Tools] PTP Reporter: John Liu <john_ws_liu>
Component: RDTAssignee: Chris Recoskie <recoskie>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: ptp-inbox, recoskie
Version: unspecified   
Target Milestone: 4.0.6   
Hardware: PC   
OS: Windows XP   
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=330838
Whiteboard:
Attachments:
Description Flags
Applied to rdt.core recoskie: iplog+

Description John Liu CLA 2010-11-17 13:58:48 EST
Build Identifier: 

I have ever experienced indexing deadlock in my testing, although it happens very rarely, but its consequence is very serious - the indexing freezes and RDp loses all of index related functions.

So I spent time to investigate it, I utilize break point to help me to produce a deadlock scenario, as the following steps:

1, Set break point at findCalledBy() function right after it acquires a read lock of the project index, run get call hierarchy, the process stops after it acquires a read lock of the project index.

2, Runs reindex, it will first do the index clearing up, which will need to acquire a write lock of the project index.

3, Reindex will not run until the first call hierarchy dstore command completes, this is good because it helps to keep a sequential command sequence, which prevent deadlock happens. 

4, But if the result of get call hierarchy needs to get its children's caller or callee, it will execute findCalledBy() again and this time, it will breaks the sequence of reindexing command and get call hierarchy command. As a result, we can produce a deadlock like this:

a    findCalledBy() acquires a read lock of the project index.
b    reindex clearing up step tries to acquire a write lock of the project index.The reindex process is waiting for write lock here....
c    findCalledBy() finishes the job and tries to release its read lock of the project index, but it won't be able to, so it is waiting here...    
          
See the deadlock happens here and no more index function is able to work properly anymore, the server side indexing process is hanging and even killing server won't help to kill them.

We can see the problem is in step c, in which findCalledBy() should be able to release the read lock. But it is strangely hanging there. 

After spending some time in debugging it, I find the problem is because CDT CIndex has a double synchronize control, Itself has synchronized lock acquiring and releasing function and its underneath fragment (PDOM) also synchronize an object to handle the sequence of acquiring and releasing locks. So this double synchronize control helps to create the above dealock, like this way:

   *  b step reindex calls acquiring write lock before c step findCalledBy() calls releasing read lock, so b step runs first before c step controlled by CIndex level.
   * but b step waits c step to release the lock to complete, which controlled by PDOM level.
   
I then notice that CDT always create a different CIndex to be used by PDOMManager for an index function, see IndexFactory class. Then it turns out that the solution for it is fairly simple, when RemoteIndexerManager gets an index for a scope, it should also create a new CIndex class wrapper for its PDOM fragment like CDT does, this will then breaks the synchronize control in CIndex level, since index in step b and step c will be two different objects(note only CIndex wrapper objects are different, their underneath PDOM fragments are still same). By this fix, the step c will be able to release the read lock and then step b will acquire the write lock after that.


Reproducible: Always
Comment 1 John Liu CLA 2010-11-17 14:12:45 EST
Created attachment 183331 [details]
Applied to rdt.core

A fix patch.
Comment 2 Chris Recoskie CLA 2010-11-19 12:08:34 EST
Sounds like a bug in CDT really... what we were doing ought to work.

In any case, the idea behind the patch seems sound, so I'll commit it.
Comment 3 Chris Recoskie CLA 2010-11-19 12:29:32 EST
Fixed on ptp_4_0 and HEAD
Comment 4 John Liu CLA 2010-11-19 14:27:00 EST
(In reply to comment #3)
> Fixed on ptp_4_0 and HEAD

Thanks, Chris. I believe it is a CDT bug too. I will open a defect agaist CDT indexer and see if they can fix it there for future release.
Comment 5 John Liu CLA 2010-11-22 13:48:00 EST
(In reply to comment #4)
> (In reply to comment #3)
> > Fixed on ptp_4_0 and HEAD
> Thanks, Chris. I believe it is a CDT bug too. I will open a defect agaist CDT
> indexer and see if they can fix it there for future release.

A CDT bugzilla is opened, https://bugs.eclipse.org/bugs/show_bug.cgi?id=330838