Community
Participate
Working Groups
* Have an open SSH Only connection with an expanded Sftp tree. * Switch to a different perspective. * Hibernate/Resume the computer few times, changing network connections (apparently this disconnects the SSH/Sftp connection). * Switch back to the RSE perspective, click somewhere in the tree. --> The whole IDE gets trapped in deadlock with attach thread dump. This is a critical issue since data in any open editors is lost at this point.
Created attachment 122331 [details] Thread dump of deadlocked Eclipse
Analyzing the Thread dump, there are several issues. 1. Worker-14 / SystemFetchOperation.java:264 Between SystemFetchOperation#run() and SystemFetchOperation#execute(), the "synchronized" lock on SshConnectorService must not be held while calling out to other code later (with Display.syncExec()). This is the recipe for deadlock! 2. main Thread: Locks on Display and SshConnectorService are now required in reverse order compared to Worker-14 -- this is causing the deadlock. - Why do we run into FileServiceSubSystem.getRemoteFileObject() when we should already know that we're disconnected? We shouldn't ever auto-reconnect in this code branch. This is exactly the situation that I have already described in bug 235494. So, here is the proof that this deadlock can in fact happen. I'd like to see this fixed by M5, since I personally lost some data due to this deadlock.
*** Bug 235494 has been marked as a duplicate of this bug. ***
Created attachment 122467 [details] alternative to synchronizing on the connector service Here's an approach that avoids synchronizing on the connector service. Instead we maintain a static list of connector services which are connecting. If a connector service is already connecting, then we sleep until it completes before continuing on. Martin, what are your thoughts on this approach?
Hm... the basic approach looks good, but there are a couple of issues: 1.) Whenever you add something to the _connectingConnectorServices, you must be 100% sure that you also remove it again in case an exception happens: try... ...finally (synchronized(_connectingConnectorServices) { _connectingConnectorServices.remove(cs); } 2.) Instead of Thread.sleep(500), better do synchronized(_connectingConnectorServices) { _connectingConnectorServices.wait(); } and, when you remove it, you do synchronized(_connectingConnectorServices) { _connectingConnectorServices.notifyAll(); } In order to avoid some typing work, and the synchornized statements, instead of the dumb List you should use a separate class that does all the synchronized/wait stuff for you (implementation would use your List): private static class ConnectorServicePool { public synchronized void add(IConnectorService cs) {} public synchronized void remove(IConnectorService cs) {} public synchronized boolean contains(IConnectorService cs) {} public synchronized void waitUntilNotContained(IConnectorService cs) {} } 3.) After waitUntilNotContained() you still need to check whether it's actually connected (because the connect could have been canceled due to an exception).
Created attachment 123309 [details] updated patch to use wait/notify in ConnectorServicePool This updated patch uses the suggested ConnectorServicePool class and wait/notify.
I've committed the change to cvs.
Created attachment 143901 [details] Additional patch to clean up Attached additional patch makes the fix safer by putting the try...finally construct right after adding into the COnnectorServicePool, so it is guaranteed that it gets removed under all circumstances. This is in line with the newer fix for bug 284018 in SubSystem.java.