Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 317288 - jgit still hangs in StreamCopyThread
Summary: jgit still hangs in StreamCopyThread
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: unspecified   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 0.9.0   Edit
Assignee: Shawn Pearce CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-18 07:49 EDT by Dmitry Neverov CLA
Modified: 2010-06-30 13:49 EDT (History)
1 user (show)

See Also:


Attachments
possible execution with hanging (5.04 KB, text/plain)
2010-06-22 04:28 EDT, Dmitry Neverov CLA
no flags Details
fix (2.08 KB, patch)
2010-06-22 04:56 EDT, Dmitry Neverov CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Neverov CLA 2010-06-18 07:49:13 EDT
Build Identifier: 

It is possible that StreamCopyThread will not flush everything from
it's src to it's dst.

In most cases StreamCopyThread works like that:

in loop:
  n = src.read(buf);
  dst.write(buf, 0, n);

and when we want to flush, we interrupt() StreamCopyThread and it
flushes everything it wrote to dst.

The problem is that our interrupt() could interrupt reading. In this
case we will flush everything we wrote to dst, but not everything we
wrote to src.

Consider such execution of loop:

| line | code                                                                  |
|------+-----------------------------------------------------------------------|
|      |                                                                       |
| 126: | n = src.read(buf);    //read smth                                     |
| 135: | dst.write(buf, 0, n); //write it to dst                               |
|      |                                                                       |
| 126: | n = src.read(buf);    //<- interrupt() here                           |
| 128: | continue;                                                             |
|      |                                                                       |
| 118: | needFlush();          //returns true                                  |
| 119: | dst.flush();          //flush everything we have already wrote to dst |
|      |                                                                       |
| 126: | n = src.read(buf);    //read remain data                              |
| 135: | dst.write(buf, 0, n); //write remain data to dst                      |
|      |                                                                       |
| 126: | n = src.read(buf);    //block on reading                              |
|      | //never flush dst                                                     |



I don't know how to fix it... It seems that dst.flush() after
dst.write(buf, 0, n) at line 135 helps, but I don't know is it the
right solution.


Reproducible: Sometimes
Comment 1 Dmitry Neverov CLA 2010-06-22 04:28:28 EDT
Created attachment 172390 [details]
possible execution with hanging
Comment 2 Dmitry Neverov CLA 2010-06-22 04:29:37 EDT
I finally figure out execution which make jgit to hang, it is attached.
Comment 3 Dmitry Neverov CLA 2010-06-22 04:56:17 EDT
Created attachment 172396 [details]
fix

It seems that this patch fix the problem. 
But in this case we will make flush in StreamCopyThread only when there will be no data in src. Is it OK?
Comment 4 Shawn Pearce CLA 2010-06-22 18:54:43 EDT
(In reply to comment #3)
> It seems that this patch fix the problem. 

Yikes, this code is getting horrifically ugly.

> But in this case we will make flush in StreamCopyThread only when there will be
> no data in src. Is it OK?

Yes, this is OK.  The flush is here to ensure that the OutputStream, if buffered in any way, makes it to the final destination, because the thing that is creating the InputStream is about to block and wait for responses to be sent back.

Waiting to flush until src no longer has anything pending doesn't break that.  Any known use of this code should be avoiding calling flush() until there is no data immediately available in src anyway.
Comment 5 Dmitry Neverov CLA 2010-06-23 01:16:49 EDT
(In reply to comment #4)
> (In reply to comment #3)
> > It seems that this patch fix the problem. 
> 
> Yikes, this code is getting horrifically ugly.

I agree, but working ugly code is better than broken. Do you have any ideas how to make it better?
Comment 6 Shawn Pearce CLA 2010-06-23 18:38:31 EDT
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > It seems that this patch fix the problem. 
> > 
> > Yikes, this code is getting horrifically ugly.
> 
> I agree, but working ugly code is better than broken. Do you have any ideas how
> to make it better?

Not really.

Short of writing our own pipe implementation that knows how to carry the flush() on the write side into an event that the reader can consume and mimic, I have no idea how to do this.

In principle it seems like it should be easy to write our own pipe.  Take ArrayBlockingQueue, and have it store an Entry.  Make Entry abstract with two subclasses:  Data and Flush.  Have the Data subclass contain a small byte array, like 1024.  On the writer side, during write() you append into into the current Data until its full, then drop that Data onto the ArrayBlockingQueue.  New writes go into a new Data.  When a flush() occurs, you add the current Data, and then a Flush, onto the ArrayBlockingQueue.

On the reader side, you read either the Data or the Flush object off the queue, and react to it.

This is going to produce considerable garbage as the 1k Data packets move from the writer thread to the reader thread.  You could setup a reverse direction queue to allow the reader to return Data objects to the writer for reuse, and the writer just polls that return queue before it makes a new Data block.

It may vastly simplify the code, but we lose the interrupt protection that the base JDK pipe class provides.  I know they are doing some evil magic to keep track of the peer-thread and try to abort out of an operation if the peer thread is no longer alive.

I'm not sure its worth the effort to write our own queue here.
Comment 7 Dmitry Neverov CLA 2010-06-28 01:22:39 EDT
From my point of view it is easier to apply the patch than write something new and debug it. If someone would like to implement the queue she at least will have working solution.
Comment 8 Shawn Pearce CLA 2010-06-30 13:49:14 EDT
Patch applied as 44854741c573fa8d0743c0cc7dd73323245b4b5b