| Summary: | RemoteToolsConnection doesn't reopen connection in certain situations | ||
|---|---|---|---|
| Product: | [Tools] PTP | Reporter: | Roland Schulz <roland> |
| Component: | Remote Tools | Assignee: | Greg Watson <g.watson> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | ||
| Version: | 4.0 | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Windows 7 | ||
| Whiteboard: | |||
|
Description
Roland Schulz
I agree that this needs to be fixed in remotetools. When FileTools uses sftp, it calls manager.connection.getSFTPChannel() which presumably should be the correct connection. When FileTools runs a command, KillableExecution eventually calls getExecutionManager().getConnection().createExecChannel(hasPTY) which should also be the correct connection. It seems to me that there is some problem with the ExecutionManager class not updating it's connection properly after it is closed and reopened. I'm not sure about changing the return value of query until we understand what these are used for. It looks like Connection.createRemoteExecutionManager() will create a new ExecutionManager as long as Connection.disconnect() is called first. (In reply to comment #3) > It looks like Connection.createRemoteExecutionManager() will create a new > ExecutionManager as long as Connection.disconnect() is called first. But Connection.disconnect() isn't called if the session fails. Should disconnect be called from isConnect if defaultSession.isConnected() returns false? Or who should call disconnect if the session fails? I guess it would be better to guarantee that disconnect is called than making the other code handle the case that the connection is not connected anymore but disconnect has never been called. I don't think there is any way to get notified that the connection has failed, so calling disconnect in isConnection if defaultSession.isConnected is false sounds reasonable. Users should always call isConnected before trying to get an execution manager. what about changing RemoteToolsConnection.open to:
if (fTargetControl.query() != ITargetStatus.RESUMED) {
monitor.beginTask(Messages.RemoteToolsConnection_open, 2);
try {
fTargetControl.kill(monitor);
fTargetControl.create(monitor);
} catch (CoreException e) {
throw new RemoteConnectionException(e);
}
}
If that works for you let me know and I'll commit it. I committed this so it would be in the next build. (In reply to comment #8) > I committed this so it would be in the next build. I don't see any problem with it. But it also doesn't solve the problem. I tested a few other things but they didn't work. I don't really understand how it is supposed to work. E.g. : - who is responsible to reopen the connection - why does FileTools call test and test just throws an exception if the connection is not open? Shouldn't their also be some reconnect? - What is this STARTED state Until this is more clear, I think I can't do much. One way to trigger the error: 1) Start eclipse 2) Start RM (e.g. PBS) 3) kill the sshd daemon OR only the shell on the server (can be done by: "pkill sshd", "pkill -9 sh") 4) try to restart the RM server (In reply to comment #9) > I tested a few other things but they didn't work. I don't really understand how > it is supposed to work. E.g. : > - who is responsible to reopen the connection > - why does FileTools call test and test just throws an exception if the > connection is not open? Shouldn't their also be some reconnect? > - What is this STARTED state I'm not sure as this code was written by someone else. > > Until this is more clear, I think I can't do much. > > One way to trigger the error: > 1) Start eclipse > 2) Start RM (e.g. PBS) > 3) kill the sshd daemon OR only the shell on the server > (can be done by: "pkill sshd", "pkill -9 sh") > 4) try to restart the RM server I'll do some more testing today. The reason it's not working seems to be that the call to kill throws an exception that is not handled. Looking into it... I've committed a new patch that fixes the problem when the sshd is killed. Please test as thoroughly as you can since this was a fairly large patch. There is a minor issue that the server failure is not reported until the next attempt to start the RM. I'm not sure why the ExcutionObserver is not catching this, but it's a minor issue that can be dealt with later. The patch changes the Connection class so that the connect and disconnect methods can be called multiple times. Previously it had to be discarded once disconnected if any of the parameters changed (since these were passed to constructors). Since various classes (ExecutionManager, etc.) interact with Connection, these had to be modified also. It is the responsibility of the caller to check that the connection is open, and if not, reopen it. Have you had a chance to test this? (In reply to comment #13) > Have you had a chance to test this? Yes. Works great! Both killing sshd and killing just the shell. |