
Bug 363076

Summary: Workspace unable to close when using RSE through VPN which terminates unexpectedly
Product: [Tools] Target Management
Reporter: Rob Stryker <stryker>
Component: RSE
Assignee: dsdp.tm.rse-inbox <tm.rse-inbox>
Status: NEW
QA Contact: Martin Oberhuber <mober.at+eclipse>
Severity: critical
Priority: P3
CC: anna.dushistova, vladimir.prus, ymnk
Version: 3.3
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard:
Bug Depends on: 362005
Bug Blocks:
Attachments: Stacktrace at shutdown (flags: none)

Description Rob Stryker CLA 2011-11-07 12:38:28 EST
When using some RSE commands inside a Job, and the VPN connection is lost abruptly and unexpectedly, the commands fail to error out within any reasonable amount of time.

Steps to reproduce:
1) Start a VPN connection in the background
2) Create a host entry for the remote Linux machine
3) Run a Job with code similar to the following (see the sketch after this list):
    getFileService().upload(file, fileDir, fileLastSegment, true, null, null, monitor);
4) Use the Job from step 3 to send a large file (4 MB)
5) Terminate the VPN connection via the terminal
6) Observe the frozen job
7) Cancel the job in the Progress view (no change in behaviour)
8) Attempt to close Eclipse (it cannot close: Eclipse has a frozen job)
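For reference, a minimal sketch of the kind of Job used in step 3 (illustrative only; getFileService(), the plug-in id, and all names here are placeholders around the IFileService.upload call shown above):

    import java.io.File;

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.core.runtime.IStatus;
    import org.eclipse.core.runtime.Status;
    import org.eclipse.core.runtime.jobs.Job;
    import org.eclipse.rse.services.files.IFileService;

    public class UploadJob extends Job {
        private final File localFile;    // e.g. a ~4 MB file to trigger the hang
        private final String remoteDir;  // remote parent directory
        private final String remoteName; // remote file name

        public UploadJob(File localFile, String remoteDir, String remoteName) {
            super("Upload " + remoteName);
            this.localFile = localFile;
            this.remoteDir = remoteDir;
            this.remoteName = remoteName;
        }

        protected IStatus run(IProgressMonitor monitor) {
            try {
                // Same call as in step 3: binary transfer, default encodings.
                getFileService().upload(localFile, remoteDir, remoteName, true, null, null, monitor);
                return Status.OK_STATUS;
            } catch (Exception e) {
                return new Status(IStatus.ERROR, "example.plugin", "Upload failed", e);
            }
        }

        private IFileService getFileService() {
            // Placeholder: obtain the IFileService from the connected RSE host/subsystem.
            throw new UnsupportedOperationException("wire up to your RSE host");
        }
    }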

Attempt 2 used different code in the Job: the Job launches a new thread to do the upload while the Job itself spins in a loop, waiting for the thread to finish or for the monitor to be canceled. A rough sketch of this pattern follows. In this attempt the Job does eventually end if Cancel is pressed in the Progress view; however, the behind-the-scenes thread keeps running, sometimes stuck in a loop, sometimes stuck "waiting" for a response from the server.
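Rough sketch (names are illustrative; doUpload() stands in for the upload call from the steps above):

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.core.runtime.IStatus;
    import org.eclipse.core.runtime.Status;
    import org.eclipse.core.runtime.jobs.Job;

    public abstract class WrappedUploadJob extends Job {

        public WrappedUploadJob() {
            super("Wrapped upload");
        }

        /** Performs the actual IFileService.upload(...) call. */
        protected abstract void doUpload(IProgressMonitor monitor);

        protected IStatus run(IProgressMonitor monitor) {
            Thread worker = new Thread(() -> doUpload(monitor), "BackgroundWrapperThreadExample");
            worker.setDaemon(true);
            worker.start();
            while (worker.isAlive() && !monitor.isCanceled()) {
                try {
                    worker.join(250); // poll instead of blocking indefinitely
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            // If canceled, the Job ends here, but the worker thread may remain
            // blocked inside JSch -- exactly the behaviour described above.
            return monitor.isCanceled() ? Status.CANCEL_STATUS : Status.OK_STATUS;
        }
    }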

Delays can last upwards of 20 minutes before the product successfully closes.
Comment 1 Anna Dushistova CLA 2011-11-07 13:02:06 EST
Are you using ssh/SFTP?
If so, this could be a similar issue to the one described in bug 362005.
Comment 2 Rob Stryker CLA 2011-11-07 13:02:49 EST
The typical freeze location is the following stack trace. The moment the VPN dies, the thread ends up here and remains stuck in a loop:

Thread [BackgroundWrapperThreadExample] (Suspended)	
	Object.wait(long) line: not available [native method]	
	Channel$MyPipedInputStream(PipedInputStream).read() line: 310	
	Channel$MyPipedInputStream(PipedInputStream).read(byte[], int, int) line: 361	
	ChannelSftp.fill(byte[], int, int) line: 2333	
	ChannelSftp.header(Buffer, ChannelSftp$Header) line: 2359	
	ChannelSftp.checkStatus(int[], ChannelSftp$Header) line: 1933	
	ChannelSftp._put(InputStream, String, SftpProgressMonitor, int) line: 566	
	ChannelSftp.put(String, String, SftpProgressMonitor, int) line: 388	
	SftpFileService.upload(File, String, String, boolean, String, String, IProgressMonitor) line: 870	
	RSERemotePublishHandler$1.run() line: 61	


After canceling the job (letting the job finish but leaving the background thread stuck in its loop), attempting to close the workspace also fails. The workspace APPEARS to close; however, the UI thread gets stuck at the following stack trace. Apparently, the attempt to close the connections fails.

Thread [main] (Suspended)	
	Session._write(Packet) line: 1215	
	Session.write(Packet) line: 1210	
	ChannelSftp(Channel).close() line: 484	
	ChannelSftp(Channel).disconnect() line: 528	
	ChannelSftp.disconnect() line: 2309	
	Channel.disconnect(Session) line: 511	
	Session.disconnect() line: 1564	
	SshConnectorService.sessionDisconnect() line: 214	
	SshConnectorService.internalDisconnect(IProgressMonitor) line: 241	
	AbstractConnectorService$2.run(IProgressMonitor) line: 528	
	AbstractConnectorService$SafeRunner.run(AbstractConnectorService$UnsafeRunnableWithProgress, long, IProgressMonitor) line: 444	
	SshConnectorService(AbstractConnectorService).disconnect(IProgressMonitor) line: 534	
	SftpFileSubSystemConfiguration(SubSystemConfiguration).disconnectAllSubSystems() line: 1489	
	RSEUIPlugin.disconnectAll(boolean) line: 534	
	RSEUIPlugin.stop(BundleContext) line: 506	



Attempting to disconnect the server via the UI (while the frozen thread still exists) also launches a job, which promptly hangs:

Thread [Worker-8] (Suspended)	
	Session._write(Packet) line: 1215	
	Session.write(Packet) line: 1210	
	ChannelSftp(Channel).close() line: 484	
	ChannelSftp(Channel).disconnect() line: 528	
	ChannelSftp.disconnect() line: 2309	
	Channel.disconnect(Session) line: 511	
	Session.disconnect() line: 1564	
	SshConnectorService.sessionDisconnect() line: 214	
	SshConnectorService.internalDisconnect(IProgressMonitor) line: 241	
	AbstractConnectorService$2.run(IProgressMonitor) line: 528	
	AbstractConnectorService$SafeRunner.run(AbstractConnectorService$UnsafeRunnableWithProgress, long, IProgressMonitor) line: 444	
	SshConnectorService(AbstractConnectorService).disconnect(IProgressMonitor) line: 534	
	FileServiceSubSystem(SubSystem).internalDisconnect(IProgressMonitor) line: 2839	
	SubSystem$DisconnectJob.performOperation(IProgressMonitor) line: 1817	
	SubSystem$DisconnectJob(SubSystem$SubSystemOperationJob).run(IProgressMonitor) line: 1453	
	Worker.run() line: 54	


I would say these hangs occur 90% of the time when transferring a large file. Sometimes, when transferring a small file, it will mercifully finish the file and the job / thread will end without hanging.

As a reference, another JIRA mentioning this is:  https://issues.jboss.org/browse/JBIDE-9384
Comment 3 Rob Stryker CLA 2011-11-07 13:09:29 EST
Hi Anna:

It certainly seems similar. I am indeed using SSH / SFTP. I was not able to replicate the issue by yanking my ethernet cable (turning off my networking), though I did not test that case extensively, just the VPN case. Furthermore, I did not seem to freeze at any of the stack trace locations mentioned in the linked bug.

I am sure the issues are related, but not identical. It seems several different parts of RSE are not robust and do not protect very well against a loss of connection. I'm aware that guarding against a loss of connection is not an easy task, so I am predominantly focusing here on the inability to cancel the job and properly exit the workbench.

In this bug, I think the RSE wrapping code needs to interface with JSch in some way to ensure that cancellation is possible, whether by notifying the waiting threads, interrupting them, adding a setCancel boolean on the channel / session to escape waiting loops, or any number of other possibilities. From my preliminary glances, it seems JSch does not have any real APIs to initiate cancellation and cleanup, and this seems to be the most dangerous part. A rough sketch of one possible approach follows.
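To make the idea concrete, here is a rough sketch of the kind of bridge meant above (illustrative only, not an existing RSE or JSch API; and, as the traces above show, Session.disconnect() can itself block on a dead link, which is exactly why a cancellation hook inside JSch would be needed):

    import org.eclipse.core.runtime.IProgressMonitor;

    import com.jcraft.jsch.Session;

    public final class CancellationWatchdog implements Runnable {
        private final IProgressMonitor monitor;
        private final Session session;

        public CancellationWatchdog(IProgressMonitor monitor, Session session) {
            this.monitor = monitor;
            this.session = session;
        }

        public void run() {
            while (session.isConnected()) {
                if (monitor.isCanceled()) {
                    // Caveat: on a dead VPN link this call can itself hang while
                    // trying to write the SSH disconnect packet (see the
                    // Session._write frames in the traces above).
                    session.disconnect();
                    return;
                }
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

Usage (illustrative): new Thread(new CancellationWatchdog(monitor, session), "rse-cancel-watchdog").start();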
Comment 4 Rob Stryker CLA 2011-11-07 13:17:23 EST
I will also note that despite having a keepalive of 300 and a timeout of 30, the connection remains in its loop for upwards of 10-20 minutes. Changing these values seems to have no effect whatsoever.
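For reference, the obvious way values like those would map onto JSch's Session API is something like the following (illustrative only; RSE sets these from its SSH preferences, and the wiring shown here is an assumption, not RSE code):

    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.JSchException;
    import com.jcraft.jsch.Session;

    public class SessionTimeouts {
        public static Session configure(String user, String host) throws JSchException {
            Session session = new JSch().getSession(user, host, 22);
            session.setServerAliveInterval(300 * 1000); // send a keepalive probe every 300 s
            session.setServerAliveCountMax(3);          // give up after 3 unanswered probes
            session.setTimeout(30 * 1000);              // socket (SO_TIMEOUT) read timeout: 30 s
            return session;
        }
    }

One likely reason such settings have no effect here is that SO_TIMEOUT applies only to blocking socket reads, while the hangs above are in a piped-stream wait inside Channel and in a native socket write.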
Comment 5 Vladimir Prus CLA 2011-11-07 13:47:16 EST
Rob,

I actually observed the same. In fact, if I try to connect to a target that is already disconnected, we can very well hang for 10-20 minutes as you reported, despite lower timeout settings. It seems that bug 362005 and this one are symptoms of the same underlying problem inside JCraft's JSch.
Comment 6 Atsuhiko Yamanaka CLA 2011-12-22 03:12:52 EST
(In reply to comment #2)
> The typical location of freeze is at the following stack trace the moment the
> VPN dies, and remains there, stuck in a loop:
> 
...

Hi, I'm the author of jsch.
Is it possible to attach the full stack trace to this bugzilla entry?
Comment 7 Rob Stryker CLA 2012-01-09 02:04:39 EST
Atsuhiko Yamanaka:

I pasted the following thread example in comment 2. This is when using RSE and JSch together. You can see the entry point is SftpFileService.upload(...). It then dives into JSch code and gets stuck in a loop inside Channel.

I hope this is useful for you. I am not sure how much more complete a stack trace you need, or whether I need to make an isolated use case / test case for you. Please let me know.



Thread [BackgroundWrapperThreadExample] (Suspended)    
    Object.wait(long) line: not available [native method]    
    Channel$MyPipedInputStream(PipedInputStream).read() line: 310    
    Channel$MyPipedInputStream(PipedInputStream).read(byte[], int, int) line: 361
    ChannelSftp.fill(byte[], int, int) line: 2333    
    ChannelSftp.header(Buffer, ChannelSftp$Header) line: 2359    
    ChannelSftp.checkStatus(int[], ChannelSftp$Header) line: 1933    
    ChannelSftp._put(InputStream, String, SftpProgressMonitor, int) line: 566   
    ChannelSftp.put(String, String, SftpProgressMonitor, int) line: 388    
    SftpFileService.upload(File, String, String, boolean, String, String, IProgressMonitor) line: 870
    RSERemotePublishHandler$1.run() line: 61
Comment 8 Atsuhiko Yamanaka CLA 2012-01-09 02:12:56 EST
I'm interested in the following trace,
  Thread [Worker-8] (Suspended)    
      Session._write(Packet) line: 1215    
      Session.write(Packet) line: 1210   
      ...
Could you paste or attach the full stack trace (including other threads) at that point?
Comment 9 Rob Stryker CLA 2012-01-11 03:55:37 EST
Created attachment 209301 [details]
Stacktrace at shutdown

To be clear, the use case to generate this stack was as follows. This is done using a mix of WTP server tools and RSE, but it's 100% reproducible in RSE alone.

Methodology:

0) A method which pops up a progress monitor dialog is the first entry point. 
1) Launch my own thread (named RSERemotePublishHandler), and inside that thread, use the following RSE command to send a 2mb file:
       getFileService().upload(file, remotePath.removeLastSegments(1).toString(), 
		remotePath.lastSegment(), true, null, null, monitor);

2) Observe the creation of my thread
3) Let the upload begin
4) SHUT DOWN my networking. Note that my thread RSERemotePublishHandler is now 100% blocked and frozen.
5) Cancel the progress monitor. The dialog associated with the task goes away; the main thread that was waiting for RSERemotePublishHandler completes, but RSERemotePublishHandler itself remains blocked.
6) Shut down Eclipse.

My thread, RSERemotePublishHandler, remains blocked for upwards of 20 minutes. The attempt to shut down Eclipse freezes. The shutdown fails, since OSGi is attempting to shut down RSE and the RSE plug-in is unable to shut down.
Comment 10 Rob Stryker CLA 2012-01-11 04:09:34 EST
To be clear, I've even tried having my entry point thread, which launches RSERemotePublishHandler, call interrupt() on the RSERemotePublishHandler thread when the monitor is canceled, but this has no effect. RSERemotePublishHandler remains 100% blocked at SocketOutputStream.socketWrite0(...).
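For what it's worth, this matches general Java behaviour: interrupt() does not wake a thread blocked in classic java.net socket I/O, whereas closing the socket from another thread does. A small standalone sketch (not RSE/JSch code) illustrating this with a blocking read, since the same applies to the native write in socketWrite0:

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class InterruptVsClose {
        public static void main(String[] args) throws Exception {
            try (ServerSocket server = new ServerSocket(0);
                 Socket client = new Socket("localhost", server.getLocalPort());
                 Socket accepted = server.accept()) { // accepted side never writes

                Thread reader = new Thread(() -> {
                    try {
                        client.getInputStream().read(); // blocks: peer never sends
                        System.out.println("read returned");
                    } catch (IOException e) {
                        System.out.println("read unblocked by: " + e);
                    }
                }, "BlockedReader");
                reader.start();

                Thread.sleep(1000);
                reader.interrupt(); // no effect on blocking java.net socket I/O
                Thread.sleep(1000);
                System.out.println("alive after interrupt(): " + reader.isAlive()); // true

                client.close();     // forces the blocked read to fail immediately
                reader.join(5000);
                System.out.println("alive after close(): " + reader.isAlive());     // false
            }
        }
    }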
Comment 11 Atsuhiko Yamanaka CLA 2012-01-11 11:24:04 EST
(In reply to comment #10)
> To be clear, I've even tried having my entry point thread, which launches
> RSERemotePublishHandler, to try to interrupt() the RSERemotePublishHandler
> thread when the monitor is canceled. But this has no result. 
> RSERemotePublishHandler remains 100% blocked at
> SocketOutputStream.socketWrite0(etc).

Thanks a lot.
I now understand where the problem is.
I'll think about how to resolve it.
Comment 12 Rob Stryker CLA 2012-01-11 12:20:37 EST
Wow!  Thanks for the response!  I'm glad you discovered the issue and it's on your todo list. This is great news :) 

Thanks for the hard work.
Comment 13 Rob Stryker CLA 2012-04-18 15:44:32 EDT
Hi Atsuhiko!

Any advice on where this is caused from, or how to fix it? Thanks!
Comment 14 Rob Stryker CLA 2013-04-30 05:59:18 EDT
Any updates on this year-old bug, Atsuhiko?
Comment 15 Martin Oberhuber CLA 2014-11-03 02:45:18 EST
Eclipse Platform is updating to JSch-0.1.51 for Luna SR2 (4.3.2):
https://bugs.eclipse.org/bugs/show_bug.cgi?id=447800

In this context it would be good to know whether the issue discussed here is still relevant. Eclipse 4.4 M3 already has JSch-0.1.51 integrated.