Bug 364932 - CDO Server fails after losing connection with a client
Summary: CDO Server fails after losing connection with a client
Status: CLOSED WORKSFORME
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core
Version: 4.2
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-28 05:14 EST by Alena Repina CLA
Modified: 2012-11-01 11:21 EDT
CC List: 2 users

See Also:


Attachments
CDO server configuration (deleted)
2011-11-28 05:19 EST, Alena Repina CLA
CDO server configuration (deleted)
2011-11-28 13:18 EST, Alena Repina CLA
CDO server configuration (1.10 KB, text/xml)
2011-11-29 11:10 EST, Alena Repina CLA
Last 200 lines of CDO logs (31.84 KB, application/octet-stream)
2011-11-30 03:58 EST, Alena Repina CLA

Description Alena Repina CLA 2011-11-28 05:14:12 EST
Build Identifier: 

I have a CDO server on a Linux machine and a CDO client on Mac OS X. I regularly find that the CDO server has failed overnight. The tail of the CDO log contains the following message:
     Socket channel closed: java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722]
It says that the channel between the CDO server (10.120.68.101:2036) and the CDO client (10.253.26.231:58722) is closed, even though the message still shows the channel as connected. After that, the TCP connection is deactivated and the server fails.
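
A note on the wording of that message: the "connected" inside the brackets is part of the channel's own string form, which describes local state and typically does not change just because the peer closed its end. The sketch below is a stand-alone java.nio illustration, not Net4j code. PeerCloseDemo and its members are hypothetical names, and the idea that Net4j logs "Socket channel closed" when a read hits end-of-stream is an assumption, not something confirmed from its source.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; it only uses standard java.nio calls.
final class PeerCloseDemo {
  /** What the accepting side observes after the peer has closed its socket. */
  static final class Result {
    final int bytesRead;
    final boolean stillConnected;
    final String description;

    Result(int bytesRead, boolean stillConnected, String description) {
      this.bytesRead = bytesRead;
      this.stillConnected = stillConnected;
      this.description = description;
    }
  }

  static Result readAfterPeerClose() throws IOException {
    try (ServerSocketChannel acceptor = ServerSocketChannel.open()) {
      // Port 0 lets the OS pick a free port on the loopback interface.
      acceptor.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
      try (SocketChannel client = SocketChannel.open(acceptor.getLocalAddress());
          SocketChannel serverSide = acceptor.accept()) {
        client.close(); // the peer goes away
        // A blocking read reports end-of-stream as -1 once the peer has closed.
        int bytesRead = serverSide.read(ByteBuffer.allocate(16));
        return new Result(bytesRead, serverSide.isConnected(), serverSide.toString());
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Result r = readAfterPeerClose();
    System.out.println("read() returned: " + r.bytesRead);
    System.out.println("isConnected():   " + r.stillConnected);
    System.out.println("toString():      " + r.description);
  }
}
```

If that assumption holds, the log line would only mean that the client end of the connection went away, for example because an idle connection was dropped overnight.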

I attached the CDO server configuration; here is the tail of the CDO logs:

Thread-7 [debug] Ordering server operation INTEREST WRITE java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722] = true
TCPSelector [debug] Executing server operation INTEREST WRITE java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722] = true
TCPSelector [debug] Setting interest READ|WRITE (was read)
TCPSelector [debug] Writing java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722]
TCPSelector [debug.buffer] Writing 5 bytes (EOS)
00 00 00 65 01 
TCPSelector [debug.buffer] Retaining Buffer@50[RELEASED]
TCPSelector [debug] Ordering server operation INTEREST WRITE java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722] = false
TCPSelector [debug] Executing server operation INTEREST WRITE java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722] = false
TCPSelector [debug] Setting interest READ (was read|write)
TCPSelector [debug] Reading java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722]
TCPSelector [debug.buffer] Obtained Buffer@47[INITIAL]
TCPSelector [debug.buffer] Retaining Buffer@47[RELEASED]
TCPSelector [debug] Socket channel closed: java.nio.channels.SocketChannel[connected local=/10.120.68.101:2036 remote=/10.253.26.231:58722]
Thread-8 [debug.lifecycle] Deactivating TCPServerConnector[10.253.26.231:58,722]
Thread-8 [debug.lifecycle] Deactivating Channel[Control, SERVER]
Thread-8 [debug.lifecycle] Deactivating ChannelReceiveSerializer@41
Thread-8 [debug.connector] Setting state DISCONNECTED (was connected) for TCPServerConnector[null:0]
Thread-8 [debug.lifecycle] Deactivating Channel[1, SERVER, cdo]
Thread-8 [debug.lifecycle] Deactivating ChannelReceiveSerializer@45
Thread-8 [debug.lifecycle] Deactivating SignalProtocol[cdo]
Thread-8 [debug.lifecycle] Deactivating Session[2]
Thread-8 [debug.acceptor] Removed connector TCPServerConnector[null:0]
Connection-Keep-Alive-DBStore@5 [debug] DB connection keep-alive task activated

Reproducible: Always
Comment 1 Alena Repina CLA 2011-11-28 05:19:42 EST
Created attachment 207591 [details]
CDO server configuration
Comment 2 Alena Repina CLA 2011-11-28 13:18:17 EST
Created attachment 207617 [details]
CDO server configuration
Comment 3 Eike Stepper CLA 2011-11-29 02:09:36 EST
Caspar, you're the expert for reconnecting sessions. Do those include a HeartBeatProtocol channel?
Comment 4 Alena Repina CLA 2011-11-29 07:20:08 EST
Today the CDO server failed again, and this time I got an exception in the logs:
    java.nio.BufferUnderflowException
	at java.nio.Buffer.nextGetIndex(Buffer.java:480)
	at java.nio.DirectByteBuffer.getShort(DirectByteBuffer.java:529)
	at org.eclipse.net4j.signal.SignalProtocol.handleBuffer(SignalProtocol.java:194)
	at org.eclipse.spi.net4j.Channel$ReceiverWork.run(Channel.java:352)
	at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:26)
	at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:1)
	at org.eclipse.net4j.util.concurrent.QueueWorker.doWork(QueueWorker.java:81)
	at org.eclipse.net4j.util.concurrent.QueueWorker.work(QueueWorker.java:72)
	at org.eclipse.net4j.util.concurrent.Worker$WorkerThread.run(Worker.java:206)

Now I'm not sure that a bad connection between the CDO server and the CDO client can really cause the CDO server to fail. Nevertheless, I'm going to try the RecoverySession you mentioned.
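
For context on the trace above: its top frames show DirectByteBuffer.getShort() failing inside SignalProtocol.handleBuffer. Independent of Net4j, java.nio throws BufferUnderflowException whenever a relative getShort() is called with fewer than two bytes remaining. The sketch below only demonstrates that standard behavior. ShortReadDemo is a hypothetical name, and the sketch makes no claim about why the buffer was short in this case.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class; it does not reproduce Net4j's buffer handling.
final class ShortReadDemo {
  /** Returns true if getShort() underflows on a fresh buffer of the given size. */
  static boolean underflows(int size) {
    // Direct buffer, matching the DirectByteBuffer seen in the trace.
    ByteBuffer buffer = ByteBuffer.allocateDirect(size);
    try {
      buffer.getShort(); // a relative read that needs two remaining bytes
      return false;
    } catch (BufferUnderflowException ex) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("0 bytes underflows: " + underflows(0));
    System.out.println("1 byte underflows:  " + underflows(1));
    System.out.println("2 bytes underflows: " + underflows(2));
  }
}
```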
Comment 5 Eclipse Webmaster CLA 2011-11-29 09:23:31 EST
The content of attachment 207617 [details] has been deleted by
    Eclipse Webmaster <webmaster@eclipse.org>
who provided the following reason:

Requested by poster

The token used to delete this attachment was generated at 2011-11-29 09:23:18 EST.
Comment 6 Eclipse Webmaster CLA 2011-11-29 10:27:49 EST
The content of attachment 207591 [details] has been deleted by
    Eclipse Webmaster <webmaster@eclipse.org>
who provided the following reason:

Requested by poster

The token used to delete this attachment was generated at 2011-11-29 10:27:41 EST.
Comment 7 Alena Repina CLA 2011-11-29 11:10:54 EST
Created attachment 207660 [details]
CDO server configuration
Comment 8 Alena Repina CLA 2011-11-30 03:58:21 EST
Created attachment 207709 [details]
Last 200 lines of CDO logs

Today the CDO server failed again. The reconnecting session didn't help. This is the tail of the latest CDO log.
Comment 9 Caspar D. CLA 2011-12-01 00:24:03 EST
(In reply to comment #3)
> Caspar, you're the expert for reconnecting sessions. Do those include a
> HeartBeatProtocol channel?

Yes, optionally, see RecoveringCDOSessionImpl.createTCPConnector.

It is configured by the RecoveringCDOSessionConfiguration instance,
see #setHeartBeatEnabled(boolean) there.
Comment 10 Eike Stepper CLA 2011-12-09 03:18:51 EST
(In reply to comment #4)
> Today CDO server failed again and this time I've got an exception in logs:
>     java.nio.BufferUnderflowException

This *can* be a follow-up problem after your TCP connection has timed out.

> Now I'm not sure that bad connection between CDO server and CDO client really
> can cause CDO server fail.

No, that shouldn't make the server fail; only the client with the timed-out connection should fail, of course. What makes you think that the server fails?
Comment 11 Alena Repina CLA 2011-12-09 08:55:17 EST
I'm not sure what caused the server to fail. Usually I start the CDO server in the morning and find that the CDO server process is down the next morning. Every time, the log ends with something like this:
Connection-Keep-Alive-DBStore@5 [debug] DB connection keep-alive task activated
Unfortunately, I usually don't see any exceptions in the log.
Comment 12 Eike Stepper CLA 2011-12-09 08:57:19 EST
You can add a LifecycleEventAdapter via repository.addListener() and set a breakpoint in doDeactivate() to see who's causing the deactivation.
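
Since the failure happens overnight, a breakpoint may be hard to catch. A possible variation is to have the listener record a stack trace when deactivation starts, so the caller ends up in the log. The sketch below shows only that idea with stand-in types: DeactivationTraceDemo, DeactivationListener and Component are hypothetical and are not Net4j or CDO classes. With the real API the same body would go into a LifecycleEventAdapter registered via repository.addListener().

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch; every type here is a hypothetical stand-in.
final class DeactivationTraceDemo {
  /** Stand-in for a lifecycle listener. */
  interface DeactivationListener {
    void aboutToDeactivate(Object source);
  }

  /** Stand-in for a component that notifies listeners before it shuts down. */
  static final class Component {
    private final List<DeactivationListener> listeners = new ArrayList<>();

    void addListener(DeactivationListener listener) {
      listeners.add(listener);
    }

    void deactivate() {
      for (DeactivationListener listener : listeners) {
        listener.aboutToDeactivate(this);
      }
    }
  }

  /** Plays the role of the unknown code path that triggers the shutdown. */
  static void shutdownPath(Component component) {
    component.deactivate();
  }

  /** Returns the method names on the stack at the moment of deactivation. */
  static String captureCaller() {
    Component component = new Component();
    StringBuilder seen = new StringBuilder();
    component.addListener(source -> {
      // The listener runs on the deactivating thread, so its stack trace
      // contains the frames that led to the deactivation.
      for (StackTraceElement frame : new Throwable().getStackTrace()) {
        seen.append(frame.getMethodName()).append('\n');
      }
    });
    shutdownPath(component);
    return seen.toString();
  }

  public static void main(String[] args) {
    System.out.print(captureCaller());
  }
}
```

In a real server the frames would be written to the log instead of a StringBuilder.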
Comment 13 Eike Stepper CLA 2012-08-14 22:56:31 EDT
Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.
Comment 14 Eike Stepper CLA 2012-11-01 11:21:22 EDT
No activity or ping here for a year. Please reopen this bug if you feel a need.