When some RSE commands are run inside a Job and the VPN connection is lost abruptly and unexpectedly, the commands hang instead of failing with an error within any reasonable amount of time.

Steps to reproduce:
1) Start a VPN connection in the background
2) Create a host for the remote Linux machine
3) Run a Job with code similar to the following (see the sketch below):
   getFileService().upload(file, fileDir, fileLastSegment, true, null, null, monitor);
4) Use the Job from step 3 to send a large file (4 MB)
5) Terminate your VPN connection via the terminal
6) Observe the frozen Job
7) Cancel the Job in the Progress view (no change in behaviour)
8) Attempt to close Eclipse (it cannot close; Eclipse has a frozen Job)

Attempt 2 used different code in the Job: the Job launches a new thread to do the upload while the Job itself spins, waiting for the thread to finish or for the monitor to be cancelled. In that attempt the Job does eventually end if Cancel is pressed in the Progress view, but the background thread sometimes remains stuck in a loop and sometimes remains stuck "waiting" for a response from the server. Delays can last upwards of 20 minutes before the product finally closes.
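For reference, a minimal sketch of what the Job in step 3 could look like. The class name, plugin ID, and the way the IFileService instance is obtained are placeholders, not the actual code in use:

import java.io.File;

import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.Job;
import org.eclipse.rse.services.files.IFileService;

/** Hypothetical reconstruction of the Job from step 3; names are stand-ins. */
public class UploadJobExample {

    /** fileService is assumed to come from the host's file service subsystem. */
    static Job createUploadJob(final IFileService fileService, final File file,
                               final String fileDir, final String fileLastSegment) {
        return new Job("Upload " + fileLastSegment) {
            @Override
            protected IStatus run(IProgressMonitor monitor) {
                try {
                    // Blocks indefinitely once the VPN drops mid-transfer (step 5);
                    // cancelling the Job in the Progress view (step 7) does not unblock it.
                    fileService.upload(file, fileDir, fileLastSegment, true, null, null, monitor);
                    return Status.OK_STATUS;
                } catch (Exception e) {
                    return new Status(IStatus.ERROR, "example.plugin.id", "Upload failed", e);
                }
            }
        };
    }
}

In attempt 2 the same upload call is simply moved into a separate thread that the Job polls.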
Are you using SSH/SFTP? In that case it could be an issue similar to the one described in bug 362005.
The typical location of the freeze is the following stack trace: the thread reaches it the moment the VPN dies and remains there, stuck in a loop:

Thread [BackgroundWrapperThreadExample] (Suspended)
    Object.wait(long) line: not available [native method]
    Channel$MyPipedInputStream(PipedInputStream).read() line: 310
    Channel$MyPipedInputStream(PipedInputStream).read(byte[], int, int) line: 361
    ChannelSftp.fill(byte[], int, int) line: 2333
    ChannelSftp.header(Buffer, ChannelSftp$Header) line: 2359
    ChannelSftp.checkStatus(int[], ChannelSftp$Header) line: 1933
    ChannelSftp._put(InputStream, String, SftpProgressMonitor, int) line: 566
    ChannelSftp.put(String, String, SftpProgressMonitor, int) line: 388
    SftpFileService.upload(File, String, String, boolean, String, String, IProgressMonitor) line: 870
    RSERemotePublishHandler$1.run() line: 61

After cancelling the job (letting the job finish, but leaving the background thread stuck in its loop), attempting to close the workspace also fails. The workspace APPEARS to close, but the UI thread gets stuck at the following stack trace; apparently the attempt to close the connections fails:

Thread [main] (Suspended)
    Session._write(Packet) line: 1215
    Session.write(Packet) line: 1210
    ChannelSftp(Channel).close() line: 484
    ChannelSftp(Channel).disconnect() line: 528
    ChannelSftp.disconnect() line: 2309
    Channel.disconnect(Session) line: 511
    Session.disconnect() line: 1564
    SshConnectorService.sessionDisconnect() line: 214
    SshConnectorService.internalDisconnect(IProgressMonitor) line: 241
    AbstractConnectorService$2.run(IProgressMonitor) line: 528
    AbstractConnectorService$SafeRunner.run(AbstractConnectorService$UnsafeRunnableWithProgress, long, IProgressMonitor) line: 444
    SshConnectorService(AbstractConnectorService).disconnect(IProgressMonitor) line: 534
    SftpFileSubSystemConfiguration(SubSystemConfiguration).disconnectAllSubSystems() line: 1489
    RSEUIPlugin.disconnectAll(boolean) line: 534
    RSEUIPlugin.stop(BundleContext) line: 506

Attempting to disconnect the server via the UI (while the frozen thread exists) also launches a job, which promptly hangs as well:

Thread [Worker-8] (Suspended)
    Session._write(Packet) line: 1215
    Session.write(Packet) line: 1210
    ChannelSftp(Channel).close() line: 484
    ChannelSftp(Channel).disconnect() line: 528
    ChannelSftp.disconnect() line: 2309
    Channel.disconnect(Session) line: 511
    Session.disconnect() line: 1564
    SshConnectorService.sessionDisconnect() line: 214
    SshConnectorService.internalDisconnect(IProgressMonitor) line: 241
    AbstractConnectorService$2.run(IProgressMonitor) line: 528
    AbstractConnectorService$SafeRunner.run(AbstractConnectorService$UnsafeRunnableWithProgress, long, IProgressMonitor) line: 444
    SshConnectorService(AbstractConnectorService).disconnect(IProgressMonitor) line: 534
    FileServiceSubSystem(SubSystem).internalDisconnect(IProgressMonitor) line: 2839
    SubSystem$DisconnectJob.performOperation(IProgressMonitor) line: 1817
    SubSystem$DisconnectJob(SubSystem$SubSystemOperationJob).run(IProgressMonitor) line: 1453
    Worker.run() line: 54

I would say these hangs occur 90% of the time when transferring a large file. Sometimes, when transferring a small file, it mercifully finishes the file and the job/thread ends without hanging. For reference, another JIRA issue mentioning this is: https://issues.jboss.org/browse/JBIDE-9384
Hi Anna: It certainly seems similar. I am indeed using SSH/SFTP. I was not able to replicate the issue by yanking my Ethernet cable (turning off my networking), though I did not test that case extensively, only the VPN case. Furthermore, I did not freeze at any of the stack trace locations mentioned in the linked bug. I am sure the issues are related, but not identical. It seems several different parts of RSE are not robust and do not protect very well against a loss of connection. I'm aware that guarding against a loss of connection is not an easy task, so here I am predominantly focusing on the inability to cancel the job and properly exit the workbench. For this bug, I think the RSE wrapping code needs to interface with JSch in some way to ensure that cancellation is possible, whether by notifying the waiting threads, interrupting them, providing a setCancel flag on the channel/session to escape the waiting loops, or any number of other possibilities. From my preliminary look at the code, JSch does not seem to have any real APIs to initiate cancellation and cleanup, and that seems to be the most dangerous part.
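To make the suggestion concrete, here is one possible shape for such a bridge. It assumes JSch's SftpProgressMonitor callback and Eclipse's IProgressMonitor; the class name is hypothetical and this is not how RSE is wired today. JSch stops a put/get when count() returns false, but that callback only fires while data is still flowing, so it would not help once the channel is already blocked after the VPN dies, which is exactly the gap described above.

import org.eclipse.core.runtime.IProgressMonitor;

import com.jcraft.jsch.SftpProgressMonitor;

/** Hypothetical bridge: abort the JSch transfer when the Eclipse monitor is cancelled. */
public class CancellableSftpMonitor implements SftpProgressMonitor {

    private final IProgressMonitor eclipseMonitor;

    public CancellableSftpMonitor(IProgressMonitor eclipseMonitor) {
        this.eclipseMonitor = eclipseMonitor;
    }

    public void init(int op, String src, String dest, long max) {
        eclipseMonitor.beginTask("Transferring " + src, IProgressMonitor.UNKNOWN);
    }

    public boolean count(long count) {
        eclipseMonitor.worked((int) count);
        return !eclipseMonitor.isCanceled(); // returning false tells JSch to stop the transfer
    }

    public void end() {
        eclipseMonitor.done();
    }
}

An instance would be passed as the SftpProgressMonitor argument of ChannelSftp.put(), the same overload visible in the stack trace in comment 2.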
I will also note that despite having a keepalive of 300 and a timeout of 30, the connection remains in its loop for upwards of 10-20 minutes. Changing these values seems to have no effect whatsoever.
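That would match the stack traces: the blocked frames are a PipedInputStream.read() inside the channel and a raw socketWrite0(), and neither appears to be bounded by the keepalive interval or the socket read timeout. For reference, a sketch of roughly what those two preferences correspond to at the JSch level; the host, user, and the seconds-to-milliseconds conversion are assumptions here, and how RSE actually applies its preference values to the session may differ:

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.Session;

/** Sketch only: host, user, and the unit conversion are assumptions. */
public class KeepaliveSettingsExample {
    static Session openSession() throws JSchException {
        Session session = new JSch().getSession("user", "remote.example.com", 22);
        session.setServerAliveInterval(300 * 1000); // "keepalive of 300" seconds, applied as ms
        session.setTimeout(30 * 1000);              // "timeout of 30": socket read timeout; does not bound a blocked write
        session.connect();
        return session;
    }
}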
Rob, I actually observed the same thing. In fact, if I try to connect to a target that is already disconnected, it can hang for the 10-20 minutes you reported, despite lower timeout settings. It seems that bug 362005 and this one are symptoms of the same underlying problem inside JCraft's JSch.
(In reply to comment #2)
> The typical location of the freeze is the following stack trace: the thread
> reaches it the moment the VPN dies and remains there, stuck in a loop:
> ...

Hi, I'm the author of JSch. Is it possible to attach the full stack trace to this Bugzilla entry?
Atsuhiko Yamanaka: I pasted the following thread example in comment 2. This is when using RSE and JSch together. You can see the entry point is SftpFileService.upload(etc). It then dives into JSch code and gets stuck in a loop inside Channel. I hope this is useful for you. I am not sure how much more complete a stack trace you need, or whether I need to make an isolated use case / test case for you. Please let me know.

Thread [BackgroundWrapperThreadExample] (Suspended)
    Object.wait(long) line: not available [native method]
    Channel$MyPipedInputStream(PipedInputStream).read() line: 310
    Channel$MyPipedInputStream(PipedInputStream).read(byte[], int, int) line: 361
    ChannelSftp.fill(byte[], int, int) line: 2333
    ChannelSftp.header(Buffer, ChannelSftp$Header) line: 2359
    ChannelSftp.checkStatus(int[], ChannelSftp$Header) line: 1933
    ChannelSftp._put(InputStream, String, SftpProgressMonitor, int) line: 566
    ChannelSftp.put(String, String, SftpProgressMonitor, int) line: 388
    SftpFileService.upload(File, String, String, boolean, String, String, IProgressMonitor) line: 870
    RSERemotePublishHandler$1.run() line: 61
I'm interested in the following trace:

Thread [Worker-8] (Suspended)
    Session._write(Packet) line: 1215
    Session.write(Packet) line: 1210
    ...

Could you paste or attach the full stack trace (including other threads) captured at that moment?
Created attachment 209301 [details]
Stacktrace at shutdown

To be clear, the use case that generates this stack is as follows. This is done using a mix of WTP Server Tools and RSE, but it is 100% reproducible with RSE alone.

Methodology:
0) A method which pops up a progress monitor dialog is the entry point.
1) Launch my own thread (named RSERemotePublishHandler), and inside that thread use the following RSE call to send a 2 MB file:
   getFileService().upload(file, remotePath.removeLastSegments(1).toString(), remotePath.lastSegment(), true, null, null, monitor);
2) Observe the creation of my thread.
3) Let the upload begin.
4) SHUT DOWN my networking. Note that my thread, RSERemotePublishHandler, is now 100% blocked and frozen.
5) Cancel the progress monitor. Whatever dialog is associated with the task goes away, and the main thread that was waiting for RSERemotePublishHandler completes, but RSERemotePublishHandler itself remains blocked.
6) Shut down Eclipse.

My thread, RSERemotePublishHandler, remains blocked for upwards of 20 minutes. The attempt to shut down Eclipse freezes: the shutdown fails because OSGi is attempting to shut down RSE and the RSE plugin is unable to shut down.
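For reference, a rough sketch of the shape of steps 0 through 5; the waitForUpload() method and the polling interval are stand-ins for my own code, not RSE API:

import org.eclipse.core.runtime.IProgressMonitor;

/** Hypothetical reconstruction of the entry point described in steps 0-5. */
public class PublishEntryPoint {

    static void waitForUpload(final Runnable uploadCall, IProgressMonitor monitor)
            throws InterruptedException {
        // Step 1: the upload (getFileService().upload(...)) runs inside this thread.
        Thread worker = new Thread(uploadCall, "RSERemotePublishHandler");
        worker.start();

        // Step 5: cancelling the dialog lets this loop (and the waiting main thread)
        // finish, but the worker itself stays blocked inside JSch for 10-20 minutes.
        while (worker.isAlive() && !monitor.isCanceled()) {
            Thread.sleep(100);
        }
    }
}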
To be clear, I've even tried having my entry-point thread, which launches RSERemotePublishHandler, call interrupt() on the RSERemotePublishHandler thread when the monitor is cancelled, but this has no effect. RSERemotePublishHandler remains 100% blocked in SocketOutputStream.socketWrite0(etc).
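This behaviour is reproducible entirely outside RSE and JSch. The small standalone program below (hypothetical example code) shows that a thread blocked in a plain blocking socket write ignores Thread.interrupt(), and only closing the socket from another thread unblocks it:

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class InterruptVsClose {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);          // accept but never read
        final Socket client = new Socket("localhost", server.getLocalPort());
        Socket accepted = server.accept();

        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    OutputStream out = client.getOutputStream();
                    byte[] chunk = new byte[64 * 1024];
                    while (true) {
                        out.write(chunk);                   // fills the buffers, then blocks in socketWrite0
                    }
                } catch (Exception e) {
                    System.out.println("writer unblocked: " + e);
                }
            }
        }, "writer");
        writer.start();

        Thread.sleep(2000);
        writer.interrupt();                                  // no effect: still blocked in the native write
        Thread.sleep(2000);
        System.out.println("alive after interrupt(): " + writer.isAlive()); // true

        client.close();                                      // closing the socket is what unblocks the write
        writer.join(5000);
        System.out.println("alive after close():     " + writer.isAlive()); // false
        accepted.close();
        server.close();
    }
}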
(In reply to comment #10)
> To be clear, I've even tried having my entry-point thread, which launches
> RSERemotePublishHandler, call interrupt() on the RSERemotePublishHandler
> thread when the monitor is cancelled, but this has no effect.
> RSERemotePublishHandler remains 100% blocked in
> SocketOutputStream.socketWrite0(etc).

Thanks a lot. I have understood where the problem is. I'll think about how to resolve it.
Wow! Thanks for the response! I'm glad you discovered the issue and it's on your todo list. This is great news :) Thanks for the hard work.
Hi Atsuhiko! Any advice on where this is caused from, or how to fix it? Thanks!
Any updates on this year-old bug, Atsuhiko?
Eclipse Platform is updating to JSch 0.1.51 for Luna SR2 (4.3.2): https://bugs.eclipse.org/bugs/show_bug.cgi?id=447800

In this context it would be good to know whether the issue discussed here is still relevant. Eclipse 4.4 M3 already has JSch 0.1.51 integrated.