Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 312863 - jgit hangs waiting for ssh error stream to be closed
Summary: jgit hangs waiting for ssh error stream to be closed
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 0.7.1   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 0.8.0   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-14 02:55 EDT by Dmitry Neverov CLA
Modified: 2010-05-19 14:44 EDT (History)
1 user (show)

See Also:


Attachments
possible fix (1.00 KB, patch)
2010-05-14 02:56 EDT, Dmitry Neverov CLA
no flags Details | Diff
simplified halt patch (1.25 KB, patch)
2010-05-17 05:24 EDT, Dmitry Neverov CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Neverov CLA 2010-05-14 02:55:48 EDT
Build Identifier: 

In close() method of SshFetchConnection and SshPushConnection errorThread.join() can wait forever if jsch will not close channel's error stream. Maybe join with timeout should be used. Path with possible fix is attached

Reproducible: Always

Steps to Reproduce:
1. try to fetch big objects (bigger than available heap size) from repository through ssh
2. get java.lang.OutOfMemoryException, but no one close ssh channel error stream
3. wait forever at errorThread.join()
Comment 1 Dmitry Neverov CLA 2010-05-14 02:56:40 EDT
Created attachment 168509 [details]
possible fix
Comment 2 Shawn Pearce CLA 2010-05-15 20:09:46 EDT
(In reply to comment #0)
> In close() method of SshFetchConnection and SshPushConnection
> errorThread.join() can wait forever if jsch will not close channel's error
> stream. Maybe join with timeout should be used. Path with possible fix is
> attached

Ick.  IIRC we have to wait for the errorThread because closing stdout causes JSch to immediately tear down the command channel, which means errors can get lost if they weren't yet consumed by the error thread.  In a case like this where the client side has aborted, there is no other way for JSch to know that we aren't going to continue.

What about an alternative patch that tells the errorThread we are done and
wish to terminate now?  We might be able to use a smaller timeout on the join, so we have less time to wait for the close to finish.  But the smaller timeout may cause us to miss an error message, so I'm wondering if we shouldn't also keep track of the connection state.  Once the remote side has started to send us data its less likely to fail, so any close attempt after that means we probably won't need to wait for stderr data.

http://egit.eclipse.org/r/708 has a counter-proposal patch based on your idea.
Comment 3 Dmitry Neverov CLA 2010-05-17 05:24:15 EDT
Created attachment 168700 [details]
simplified halt patch
Comment 4 Dmitry Neverov CLA 2010-05-17 05:37:48 EDT
Taking into account bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=313082, your patch in StreamCopyThread class can be simplified to attached one. Yes we can miss some error messages, but tracking state of connection will not help us, because jsch will never close error thread in case of exception in jgit code. 

I think if we made decision to close connection we have already read everything we want or we got exception and can throw it up to client code, so it is ok to miss some error messages. Anyway longer timeout won't solve this, because we can get errors after any timeout we choose.
Comment 5 Shawn Pearce CLA 2010-05-19 14:44:48 EDT
Committed as 2e961989e42b1fe7e8bd9eaa7a3d2e88a0d1d001