Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 561532

Summary: java.nio.BufferUnderFlowException (Take Four)
Product: [Modeling] EMF Reporter: Eike Stepper <stepper>
Component: cdo.net4jAssignee: Eike Stepper <stepper>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: P3    
Version: 4.10   
Target Milestone: ---   
Hardware: PC   
OS: Windows 7   
Whiteboard:
Bug Depends on: 262875, 295989, 299871    
Bug Blocks:    

Description Eike Stepper CLA 2020-03-28 04:04:27 EDT
Bug 262875, bug 295989, and bug 299871 were all about BufferUnderFlowExceptions in SignalProtocol.handleBuffer(). The stack traces look like this one:

java.nio.BufferUnderflowException
	at java.nio.Buffer.nextGetIndex(Buffer.java:506)
	at java.nio.DirectByteBuffer.getShort(DirectByteBuffer.java:590)
	at org.eclipse.net4j.signal.SignalProtocol.handleBuffer(SignalProtocol.java:252)
	at org.eclipse.spi.net4j.Channel.handleBufferFromMultiplexer(Channel.java:279)
	at org.eclipse.net4j.internal.tcp.TCPConnector.handleRead(TCPConnector.java:228)
	at org.eclipse.net4j.internal.tcp.TCPSelector.handleSelection(TCPSelector.java:265)
	at org.eclipse.net4j.internal.tcp.TCPSelector.run(TCPSelector.java:190)
	at java.lang.Thread.run(Thread.java:748)

	
The code in SignalProtocol.handleBuffer() looks like so:

        // Incoming indication
        signal = signals.get(-correlationID);
        if (signal == null)
        {
          short signalID = byteBuffer.getShort();  <--- BufferUnderflowException here!
          

So far David Bonneau's analysis in the description of bug 262875 gets very close to the point, but it focuses on the special case that the last buffer of the signal stream (the one with EOS=true) is empty. It appears that this happens in particular when the signal logic explicitly calls flush() on the signal stream after all write() calls are done. And this is in line with the observation that the BufferUnderflowException mostly occurs when CDO's large objects are transferred, i.e., in CommitTransactionRequest and LoadLobRequest. The reason is that these signals use IOUtil.copyBinary() or IOUtil.copyCharacter(), which execute a final flush() under the hood (this practice should be subject to further consideration, too).

But the problem is really broader as the following test signals demonstrate:

  @Override
  protected void requesting(ExtendedDataOutputStream out) throws Exception
  {
    // This is the only data PartialReadIndication will read.
    out.writeInt(4711);

    // This will cause the current buffer to be sent.
    out.flush();

    // This will be interpreted as the signalID of a new incoming indication by SignalProtocol.handleBuffer().
    out.writeShort(Short.MAX_VALUE);
  }

    @Override
  protected void indicating(ExtendedDataInputStream in) throws Exception
  {
    int data = in.readInt();

    // Do not read the remaining 2 bytes here that PartialReadRequest has sent!
  }
  
To conclude, the root cause of all these BufferUnderflowExceptions is that the indicating() method of the receiver finishes before all bytes are read from the input stream. It seems that an explicit flush() call must be involved on the transmitter side, but that is generally allowed. A final empty buffer can be involved, but that's not strictly required to produce a BufferUnderflowException on the receiver side.
Comment 1 Eike Stepper CLA 2020-03-28 04:08:56 EDT
(In reply to comment #0)
> A final empty buffer can be involved, but that's not strictly 
> required to produce a BufferUnderflowException on the receiver side.

That's not entirely correct. If the final buffer has at least 2 bytes available the 2 bytes are interpreted as the signal ID of a new incoming signal, which can lead to either an InvalidSignalIDException or, worse, to the creation of a wrong signal instance (which will most likely fail later).
Comment 2 Eike Stepper CLA 2020-03-28 04:16:53 EDT
My new solution to the problem involves a new bit in *each* signal buffer header to indicate whether the buffer is the first buffer of a signal or not. With this new bit the SignalProtocol can detect that the offending buffers really belong to a signal that has already been discarded because the signal logic stopped prematurely to read the entire signal stream. These buffers can then be silently discarded, instead of being interpreted as buffers for a new incoming signal.

I'll try to pack this new BOS (Begin Of Signal) bit into the existing correlation ID field, which is the only field that's present in all buffers of a signal stream, anyway...
Comment 3 Eike Stepper CLA 2020-03-29 03:11:09 EDT
The changes are committed:
https://git.eclipse.org/c/cdo/cdo.git/commit/?id=60308125e4705cddea6010f4a58165dfe4ed0491
Comment 4 Eike Stepper CLA 2020-12-11 10:24:45 EST
Closing.
Comment 5 Eike Stepper CLA 2020-12-11 10:34:56 EST
Closing.