Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 295989

Summary: java.nio.BufferUnderflowException
Product: [Modeling] EMF Reporter: Anders Forsell <aforsell1971>
Component: cdo.coreAssignee: Andre Dietisheim <adietish>
Status: CLOSED DUPLICATE QA Contact: Eike Stepper <stepper>
Severity: normal    
Priority: P3 CC: adietish, lindeman1966
Version: 2.0   
Target Milestone: ---   
Hardware: PC   
OS: Windows Vista   
Whiteboard:
Bug Depends on:    
Bug Blocks: 561532    

Description Anders Forsell CLA 2009-11-24 07:42:11 EST
I am getting BufferUnderflowException during a transaction.commit(). It happens randomly but I have only seen it when running on ONE processor (setting my dual core to only run one processor using "msconfig.exe"). I am running the latest from the CDO/Net4j 2.0 maintenance branch.

Any idea what could cause this?

---
Anders

!ENTRY org.eclipse.net4j.util 4 0 2009-11-24 13:29:41.230
!MESSAGE BufferUnderflowException
!STACK 0
java.nio.BufferUnderflowException
	at java.nio.Buffer.nextGetIndex(Unknown Source)
	at java.nio.DirectByteBuffer.getShort(Unknown Source)
	at org.eclipse.net4j.signal.SignalProtocol.handleBuffer(SignalProtocol.java:217)
	at org.eclipse.spi.net4j.Channel$ReceiverWork.run(Channel.java:302)
	at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:26)
	at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:1)
	at org.eclipse.net4j.util.concurrent.QueueWorker.work(QueueWorker.java:64)
	at org.eclipse.net4j.util.concurrent.Worker$WorkerThread.run(Worker.java:154)
	
!ENTRY org.eclipse.net4j 4 0 2009-11-24 13:29:51.256
!MESSAGE EOFException
!STACK 0
java.io.EOFException
	at java.io.DataInputStream.readBoolean(Unknown Source)
	at org.eclipse.net4j.util.io.ExtendedDataInput$Delegating.readBoolean(ExtendedDataInput.java:55)
	at org.eclipse.emf.internal.cdo.net4j.protocol.ChangeSubscriptionRequest.confirming(ChangeSubscriptionRequest.java:76)
	at org.eclipse.emf.internal.cdo.net4j.protocol.ChangeSubscriptionRequest.confirming(ChangeSubscriptionRequest.java:1)
	at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientRequest.confirming(CDOClientRequest.java:83)
	at org.eclipse.net4j.signal.RequestWithConfirmation.doExtendedInput(RequestWithConfirmation.java:123)
	at org.eclipse.net4j.signal.Signal.doInput(Signal.java:312)
	at org.eclipse.net4j.signal.RequestWithConfirmation.doExecute(RequestWithConfirmation.java:103)
	at org.eclipse.net4j.signal.SignalActor.execute(SignalActor.java:66)
	at org.eclipse.net4j.signal.Signal.runSync(Signal.java:239)
	at org.eclipse.net4j.signal.SignalProtocol.startSignal(SignalProtocol.java:423)
	at org.eclipse.net4j.signal.RequestWithConfirmation.doSend(RequestWithConfirmation.java:87)
	at org.eclipse.net4j.signal.RequestWithConfirmation.send(RequestWithConfirmation.java:73)
	at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientProtocol.send(CDOClientProtocol.java:286)
	at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientProtocol.changeSubscription(CDOClientProtocol.java:150)
	at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.request(CDOViewImpl.java:1816)
	at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.subscribe(CDOViewImpl.java:1887)
	at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.handleNewObjects(CDOViewImpl.java:1793)
	at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.committedTransaction(CDOViewImpl.java:1725)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOTransactionImpl.java:1471)
	at org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:72)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:626)
	at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:653)
	at com.comsol.server.internal.ServerManagerImpl.runJob(ServerManagerImpl.java:484)
	at com.comsol.server.internal.ServerManagerImpl.access$3(ServerManagerImpl.java:451)
	at com.comsol.server.internal.ServerManagerImpl$TransactionListener.runServerJob(ServerManagerImpl.java:572)
	at com.comsol.server.internal.ServerManagerImpl$TransactionListener.access$0(ServerManagerImpl.java:550)
	at com.comsol.server.internal.ServerManagerImpl$TransactionListener$1.run(ServerManagerImpl.java:541)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Comment 1 Anders Forsell CLA 2009-11-24 08:11:45 EST
Another exception:

Thread [pool-1-thread-1] (Suspended (exception EOFException))	
	owns: CDOViewImpl$ChangeSubscriptionManager$1  (id=583)	
	owns: Object  (id=584)	
	owns: JobDispatcherImpl  (id=585)	
	ExtendedDataInputStream(DataInputStream).readBoolean() line: not available	
	CDOClientRequest$2(ExtendedDataInput$Delegating).readBoolean() line: 55	
	ChangeSubscriptionRequest.confirming(CDODataInput) line: 76	
	ChangeSubscriptionRequest.confirming(CDODataInput) line: 1	
	ChangeSubscriptionRequest(CDOClientRequest<RESULT>).confirming(ExtendedDataInputStream) line: 83	
	ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).doExtendedInput(ExtendedDataInputStream) line: 123	
	ChangeSubscriptionRequest(Signal).doInput(BufferInputStream) line: 312	
	ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).doExecute(BufferInputStream, BufferOutputStream) line: 103	
	ChangeSubscriptionRequest(SignalActor).execute(BufferInputStream, BufferOutputStream) line: 66	
	ChangeSubscriptionRequest(Signal).runSync() line: 239	
	CDOClientProtocol(SignalProtocol<INFRA_STRUCTURE>).startSignal(SignalActor, long) line: 423	
	ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).doSend(long) line: 87	
	ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).send() line: 73	
	CDOClientProtocol.send(RequestWithConfirmation<RESULT>) line: 286	
	CDOClientProtocol.changeSubscription(int, List<CDOID>, boolean, boolean) line: 150	
	CDOViewImpl$ChangeSubscriptionManager.request(List<CDOID>, boolean, boolean) line: 1816	
	CDOViewImpl$ChangeSubscriptionManager.subscribe(CDOID, InternalCDOObject, int) line: 1887	
	CDOViewImpl$ChangeSubscriptionManager.handleNewObjects(Collection<CDOObject>) line: 1793	
	CDOViewImpl$ChangeSubscriptionManager.committedTransaction(CDOTransaction, CDOCommitContext) line: 1725	
	CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOSessionProtocol$CommitTransactionResult) line: 1471	
	CDOSingleTransactionStrategyImpl.commit(InternalCDOTransaction, IProgressMonitor) line: 72	
	CDOTransactionImpl.commit(IProgressMonitor) line: 626	
	CDOTransactionImpl.commit() line: 653
Comment 2 Anders Forsell CLA 2009-12-02 01:48:16 EST
And another one:

Caused by: org.eclipse.emf.cdo.common.util.TransportException: java.util.concurrent.TimeoutException: Timeout 
at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientProtocol.send(CDOClientProtocol.java:294) 
at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientProtocol.changeSubscription(CDOClientProtocol.java:150) 
at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.request(CDOViewImpl.java:1816) 
at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.subscribe(CDOViewImpl.java:1887) 
at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.handleNewObjects(CDOViewImpl.java:1793) 
at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.committedTransaction(CDOViewImpl.java:1725) 
at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOTransactionImpl.java:1471) 
at org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:72) 
at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:626) 
at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:653)
Comment 3 Andre Dietisheim CLA 2009-12-08 13:09:37 EST
Hi Anders

do you have any piece of code or usecase scheme you could provide that could help us in reproducing the bug?
Comment 4 Anders Forsell CLA 2009-12-10 03:55:00 EST
No, unfortunately I don't have a reproducible scenario. Have you run the Net4j test cases on a single core computer?
Comment 5 Andre Dietisheim CLA 2009-12-15 08:22:55 EST
(In reply to comment #4)
> No, unfortunately I don't have a reproducible scenario. Have you run the Net4j
> test cases on a single core computer?

Unfortunately I cannot run the tests on my machine. Eclipse gets completely exhausted in terms of power and memory. I'll have to catch up with this issue, too. Do the tests run without troubles on your machine?

I agreed with Eike on a code review. We suspect a concurrency issue in net4j. I ran across a doubt with the correlationID in the Signals (which is not synchronized).

@Eike: could you please comment on this?
Comment 6 Andre Dietisheim CLA 2009-12-15 08:27:09 EST
to be more precise:

I get an implementation error in ControlChannel#handleBuffer. I currently have no chance to get the trace nor proper details on it. My machine just roars up and eclipse is completely unresponsive.
Comment 7 Andre Dietisheim CLA 2009-12-18 04:18:40 EST
Discussing the implementation, we agreed that concurrency for the correlationID most probably is of no issue. We have another doubt on the QueueWorkerWorkSerializer though. I currently review it and most probably write tests for it.
Comment 8 Andre Dietisheim CLA 2009-12-29 10:44:20 EST
currently testing the QueueWorkerWorkSerializer. I added a conccurent test for it in QueueWorkerWorkSerializerTest (org.eclipse.net4j.tests). Everything seems normal until now. I'm proceeding to the hairy cases.
Comment 9 Andre Dietisheim CLA 2009-12-29 13:42:51 EST
still no success, no bug visible. Do you have any application logs that could give us further insight?
Comment 10 Eike Stepper CLA 2009-12-30 01:53:42 EST
Please enable Net4j tracing to show buffer usage infos in the logs. Refer to http://wiki.eclipse.org/FAQ_for_CDO_and_Net4j#How_can_I_enable_tracing.3F
Comment 11 Andre Dietisheim CLA 2009-12-30 08:45:45 EST
currently still having a closer look at the channel, where we suspect possible causes (that's what your stack traces suggest). 
I suspect concurrency issues in the send queue. I had a quick hack (inserted a set that tracks the threads that call the queue) to get a clear indication that the send queue is used in concurrent manner.

The send queue calls notification methods after additions and removals.

ex. SendQueue#offer

   @Override
    public boolean offer(IBuffer o)
    {
      super.offer(o);
      added();
      return true;
    }

The queue tracks its size with an atomic integer value that is manipulated in the notification method:

ex. SenQueue#added

   private void added()
    {
      int queueSize = size.incrementAndGet();
      IListener[] listeners = getListeners();
      if (listeners != null)
      {
        fireEvent(new SendQueueEventImpl(Type.ENQUEUED, queueSize), listeners);
      }
    }

IMHO the queue manipulation and size manipulation are not ensured to occur in sequence.

@Eike: comments?
Comment 12 Eike Stepper CLA 2009-12-30 09:04:15 EST
The size if the SendQueue is only used in the SendQueueEvents and from there it's not used by the core. It's mostly provided for potential O&M tooling that might want to print throuhput statistics. These do not need to be 100% accurate, though, in favour of better throughput characteristics without additional synchronization.
Comment 13 Andre Dietisheim CLA 2009-12-30 11:29:44 EST
The latest part I tracked in details was the whole NIO part with selectors, channels and buffers. I did not find any particular place that looks incorrectly synchronized so far. The big advice for NIO, that you should stick to 1 thread per selector is realized (see TCPSelector#run), too. I have to admit that I currently have no real clue where this synchronization issue could occur. Another indication, that net4j works without flaws in multithreaded environments, is that ChannelTest (runs concurrent threads) does succeed. 
I'd really suggest that we try to stick to your environment as close as possible and track the issues in your setup.
Comment 14 Andre Dietisheim CLA 2009-12-30 11:40:57 EST
by chance I checked my other bugs and discovered, that a similar issue was already reported: 

https://bugs.eclipse.org/bugs/show_bug.cgi?id=262875

I'd bet these beast are the same ;-)
Comment 15 Andre Dietisheim CLA 2010-01-15 17:54:21 EST
Hi Anders

I fixed the bug reported in https://bugs.eclipse.org/bugs/show_bug.cgi?id=262875.
The fix's available in 2.0 HEAD now. At least the BufferUnderrunException should be gone now. Could you please check if your bugs are gone and report it here.
Comment 16 Andre Dietisheim CLA 2010-01-16 05:10:45 EST

*** This bug has been marked as a duplicate of bug 262875 ***