| Summary: | java.nio.BufferUnderflowException | ||
|---|---|---|---|
| Product: | [Modeling] EMF | Reporter: | Anders Forsell <aforsell1971> |
| Component: | cdo.core | Assignee: | Andre Dietisheim <adietish> |
| Status: | CLOSED DUPLICATE | QA Contact: | Eike Stepper <stepper> |
| Severity: | normal | ||
| Priority: | P3 | CC: | adietish, lindeman1966 |
| Version: | 2.0 | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Windows Vista | ||
| Whiteboard: | |||
| Bug Depends on: | |||
| Bug Blocks: | 561532 | ||
|
Description
Anders Forsell
Another exception:

Thread [pool-1-thread-1] (Suspended (exception EOFException))
    owns: CDOViewImpl$ChangeSubscriptionManager$1 (id=583)
    owns: Object (id=584)
    owns: JobDispatcherImpl (id=585)
    ExtendedDataInputStream(DataInputStream).readBoolean() line: not available
    CDOClientRequest$2(ExtendedDataInput$Delegating).readBoolean() line: 55
    ChangeSubscriptionRequest.confirming(CDODataInput) line: 76
    ChangeSubscriptionRequest.confirming(CDODataInput) line: 1
    ChangeSubscriptionRequest(CDOClientRequest<RESULT>).confirming(ExtendedDataInputStream) line: 83
    ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).doExtendedInput(ExtendedDataInputStream) line: 123
    ChangeSubscriptionRequest(Signal).doInput(BufferInputStream) line: 312
    ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).doExecute(BufferInputStream, BufferOutputStream) line: 103
    ChangeSubscriptionRequest(SignalActor).execute(BufferInputStream, BufferOutputStream) line: 66
    ChangeSubscriptionRequest(Signal).runSync() line: 239
    CDOClientProtocol(SignalProtocol<INFRA_STRUCTURE>).startSignal(SignalActor, long) line: 423
    ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).doSend(long) line: 87
    ChangeSubscriptionRequest(RequestWithConfirmation<RESULT>).send() line: 73
    CDOClientProtocol.send(RequestWithConfirmation<RESULT>) line: 286
    CDOClientProtocol.changeSubscription(int, List<CDOID>, boolean, boolean) line: 150
    CDOViewImpl$ChangeSubscriptionManager.request(List<CDOID>, boolean, boolean) line: 1816
    CDOViewImpl$ChangeSubscriptionManager.subscribe(CDOID, InternalCDOObject, int) line: 1887
    CDOViewImpl$ChangeSubscriptionManager.handleNewObjects(Collection<CDOObject>) line: 1793
    CDOViewImpl$ChangeSubscriptionManager.committedTransaction(CDOTransaction, CDOCommitContext) line: 1725
    CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOSessionProtocol$CommitTransactionResult) line: 1471
    CDOSingleTransactionStrategyImpl.commit(InternalCDOTransaction, IProgressMonitor) line: 72
    CDOTransactionImpl.commit(IProgressMonitor) line: 626
    CDOTransactionImpl.commit() line: 653

And another one:

Caused by: org.eclipse.emf.cdo.common.util.TransportException: java.util.concurrent.TimeoutException: Timeout
    at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientProtocol.send(CDOClientProtocol.java:294)
    at org.eclipse.emf.internal.cdo.net4j.protocol.CDOClientProtocol.changeSubscription(CDOClientProtocol.java:150)
    at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.request(CDOViewImpl.java:1816)
    at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.subscribe(CDOViewImpl.java:1887)
    at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.handleNewObjects(CDOViewImpl.java:1793)
    at org.eclipse.emf.internal.cdo.view.CDOViewImpl$ChangeSubscriptionManager.committedTransaction(CDOViewImpl.java:1725)
    at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl$CDOCommitContextImpl.postCommit(CDOTransactionImpl.java:1471)
    at org.eclipse.emf.internal.cdo.transaction.CDOSingleTransactionStrategyImpl.commit(CDOSingleTransactionStrategyImpl.java:72)
    at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:626)
    at org.eclipse.emf.internal.cdo.transaction.CDOTransactionImpl.commit(CDOTransactionImpl.java:653)

Hi Anders do you have any piece of code or usecase scheme you could provide that could help us in reproducing the bug?

No, unfortunately I don't have a reproducible scenario. Have you run the Net4j test cases on a single core computer?

(In reply to comment #4)
> No, unfortunately I don't have a reproducible scenario. Have you run the Net4j
> test cases on a single core computer?

Unfortunately I cannot run the tests on my machine. Eclipse gets completely exhausted in terms of power and memory. I'll have to catch up with this issue, too. Do the tests run without troubles on your machine?

I agreed with Eike on a code review. We suspect a concurrency issue in net4j.
I ran across a doubt about the correlationID in the Signals (which is not synchronized). @Eike: could you please comment on this?

To be more precise: I get an implementation error in ControlChannel#handleBuffer. I currently have no chance to get the trace or proper details on it. My machine just roars up and Eclipse is completely unresponsive.

Discussing the implementation, we agreed that concurrency on the correlationID is most probably not an issue. We have another doubt about the QueueWorkerWorkSerializer, though. I am currently reviewing it and will most probably write tests for it.

Currently testing the QueueWorkerWorkSerializer. I added a concurrent test for it in QueueWorkerWorkSerializerTest (org.eclipse.net4j.tests). Everything seems normal so far. I'm proceeding to the hairy cases.

Still no success, no bug visible. Do you have any application logs that could give us further insight? Please enable Net4j tracing to show buffer usage info in the logs. Refer to http://wiki.eclipse.org/FAQ_for_CDO_and_Net4j#How_can_I_enable_tracing.3F

Currently still having a closer look at the channel, where we suspect possible causes (that's what your stack traces suggest).
I suspect concurrency issues in the send queue. I put in a quick hack (a set that tracks the threads that call the queue) to get a clear indication that the send queue is used concurrently.
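The hack itself is not shown in this report. A minimal sketch of what such a diagnostic could look like, assuming a wrapper around a plain `ConcurrentLinkedQueue`; `CallerTrackingQueue` and all of its methods are hypothetical names for illustration, not Net4j code:

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical diagnostic wrapper (not part of Net4j): remembers every
// thread that has called into the queue.
final class CallerTrackingQueue<E>
{
  private final Queue<E> delegate = new ConcurrentLinkedQueue<>();

  // Thread-safe set; Thread uses identity equality, so each thread counts once.
  private final Set<Thread> callers = ConcurrentHashMap.newKeySet();

  boolean offer(E element)
  {
    callers.add(Thread.currentThread());
    return delegate.offer(element);
  }

  E poll()
  {
    callers.add(Thread.currentThread());
    return delegate.poll();
  }

  // Number of distinct threads that have touched the queue so far.
  int callerCount()
  {
    return callers.size();
  }

  // True once at least two different threads have used the queue.
  boolean usedByMultipleThreads()
  {
    return callers.size() > 1;
  }
}
```

Seeing more than one caller only shows that several threads touch the queue over its lifetime; it does not by itself prove that two calls ever overlapped in time.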
The send queue calls notification methods after additions and removals.
ex. SendQueue#offer
    @Override
    public boolean offer(IBuffer o)
    {
      super.offer(o);
      added();
      return true;
    }
The queue tracks its size with an atomic integer value that is manipulated in the notification method:
ex. SendQueue#added
    private void added()
    {
      int queueSize = size.incrementAndGet();
      IListener[] listeners = getListeners();
      if (listeners != null)
      {
        fireEvent(new SendQueueEventImpl(Type.ENQUEUED, queueSize), listeners);
      }
    }
IMHO nothing ensures that the queue manipulation and the size manipulation occur together, without another thread interleaving between them.
@Eike: comments?
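To make the concern concrete, here is a simplified stand-in for the pattern quoted above. It assumes the queue is a `ConcurrentLinkedQueue` subclass with a separate `AtomicInteger` counter (the excerpt only shows `super.offer` and `size.incrementAndGet()`, so the base class is an assumption). `CountingQueue`, `reportedSize` and `fillConcurrently` are hypothetical names, and the listener notification is left out:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the quoted pattern; not the Net4j SendQueue.
final class CountingQueue<E> extends ConcurrentLinkedQueue<E>
{
  private static final long serialVersionUID = 1L;

  private final AtomicInteger size = new AtomicInteger();

  @Override
  public boolean offer(E element)
  {
    super.offer(element);
    // Another thread may poll() this element right here, before the counter
    // is incremented, so the counter can briefly lag or even dip below zero.
    size.incrementAndGet();
    return true;
  }

  @Override
  public E poll()
  {
    E element = super.poll();
    if (element != null)
    {
      size.decrementAndGet();
    }
    return element;
  }

  // Value of the separately tracked counter (what an event would report).
  int reportedSize()
  {
    return size.get();
  }

  // Offers perThread elements from each of the given number of threads
  // and returns the queue once all threads have finished.
  static CountingQueue<Integer> fillConcurrently(int threads, int perThread)
  {
    CountingQueue<Integer> queue = new CountingQueue<>();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++)
    {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++)
        {
          queue.offer(i);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers)
    {
      try
      {
        worker.join();
      }
      catch (InterruptedException ex)
      {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ex);
      }
    }
    return queue;
  }
}
```

In this sketch each individual step is thread-safe, so once all threads are done the counter and the queue agree again. Only the intermediate values observed while operations overlap can be off, which matters only if something relies on the counter being exact at every moment.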
The size of the SendQueue is only used in the SendQueueEvents, and from there it's not used by the core. It's mostly provided for potential O&M tooling that might want to print throughput statistics. These do not need to be 100% accurate, though, in favour of better throughput characteristics without additional synchronization.

The latest part I tracked in detail was the whole NIO part with selectors, channels and buffers. I did not find any particular place that looks incorrectly synchronized so far. The big advice for NIO, that you should stick to 1 thread per selector, is realized (see TCPSelector#run), too. I have to admit that I currently have no real clue where this synchronization issue could occur. Another indication that net4j works without flaws in multithreaded environments is that ChannelTest (which runs concurrent threads) does succeed. I'd really suggest that we try to stick to your environment as closely as possible and track the issues in your setup.

By chance I checked my other bugs and discovered that a similar issue was already reported: https://bugs.eclipse.org/bugs/show_bug.cgi?id=262875 I'd bet these beasts are the same ;-)

Hi Anders, I fixed the bug reported in https://bugs.eclipse.org/bugs/show_bug.cgi?id=262875. The fix is available in 2.0 HEAD now. At least the BufferUnderflowException should be gone now. Could you please check whether your bugs are gone and report it here?

*** This bug has been marked as a duplicate of bug 262875 *** |