Build Identifier: ECF 3.4

While testing DistributedEventAdmin with ECF generic as the underlying protocol, I found that the client randomly stops receiving events sent by the server. I enabled ECF tracing, and the following traces are logged on the server at the moment the client stops receiving events:

(TRACE)[02/11/11;08:25:21:359]CAUGHT null (org.eclipse.ecf.provider.comm.tcp.Client#handleException in thread=StringID[ecftcp://localhost:8600/server]:rcvr:8600:127.0.0.1:1041)
java.lang.ArrayIndexOutOfBoundsException
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Unknown Source)
	at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(Unknown Source)
	at java.io.ObjectOutputStream.writeNonProxyDesc(Unknown Source)
	at java.io.ObjectOutputStream.writeClassDesc(Unknown Source)
	at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
	at java.io.ObjectOutputStream.writeObject0(Unknown Source)
	at java.io.ObjectOutputStream.writeObject(Unknown Source)
	at org.eclipse.ecf.provider.comm.tcp.Client.send(Client.java:287)
	at org.eclipse.ecf.provider.comm.tcp.Client.handleRcv(Client.java:357)
	at org.eclipse.ecf.provider.comm.tcp.Client$4.run(Client.java:327)
	at java.lang.Thread.run(Unknown Source)
(TRACE)[02/11/11;08:25:21:359]removeMember(StringID[Malvs6dpxPKDAWO6C3KHBP9Zqic=]);container=StringID[ecftcp://localhost:8600/server];existingmembers=[Member[StringID[Malvs6dpxPKDAWO6C3KHBP9Zqic=];org.eclipse.ecf.provider.comm.tcp.Client@146b9de], Member[StringID[ecftcp://localhost:8600/server];null]]
(TRACE)[02/11/11;08:25:21:359]ENTERING org.eclipse.ecf.provider.remoteservice.generic.RegistrySharedObject#StringID[org.eclipse.ecf.remoteservice.IRemoteServiceContainerAdapter].p.handleEvent()
(TRACE)[02/11/11;08:25:21:359]ENTERING org.eclipse.ecf.remoteservice.eventadmin.DistributedEventAdmin#StringID[vsm].p.handleEvent()
(TRACE)[02/11/11;08:25:21:359]ENTERING org.eclipse.ecf.remoteservice.eventadmin.DistributedEventAdmin#StringID[vsm].p.handleEvent()
(TRACE)[02/11/11;08:25:21:359]ENTERING org.eclipse.ecf.provider.remoteservice.generic.RegistrySharedObject#StringID[org.eclipse.ecf.remoteservice.IRemoteServiceContainerAdapter].p.handleEvent()
(TRACE)[02/11/11;08:25:21:359]StringID[ecftcp://localhost:8600/server].disconnect()
(TRACE)[02/11/11;08:25:21:359]StringID[ecftcp://localhost:8600/server].RCVR TERMINATING
(TRACE)[02/11/11;08:25:21:359]StringID[ecftcp://localhost:8600/server].SENDER TERMINATING

I understand why the client stops receiving events: the server stops the sending and receiving threads for this client because of the thrown exception. Why the exception is thrown is less obvious, but I think it could be a race condition in org.eclipse.ecf.provider.comm.tcp.Client between the sending and receiving threads, because there is a path where both threads access the ObjectOutputStream instance (outputStream) at the same time:
- on the receiving thread, a PingMessage is handled (in handleRcv) and a PingResponseMessage is sent back, which results in outputStream.writeObject
- on the sending thread, events are sent to clients, which results in outputStream.writeObject, plus outputStream.reset every maxMsg count

Reproducible: Sometimes
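The "corrupt data in the stream" failure mode this report describes can be illustrated in isolation: once two unsynchronized writers interleave bytes on a single ObjectOutputStream, the reader hits a byte that is not a valid serialization type code and cannot resynchronize. The sketch below is hypothetical demo code (not ECF code, and not the race itself); it simulates an interleaved write by clobbering a single byte of an otherwise valid stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;

public class CorruptStreamDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject("hello");
        oos.flush();

        byte[] bytes = buf.toByteArray();
        // Simulate a torn/interleaved write: the 4-byte stream header is
        // followed by a type-code byte; overwrite it with an invalid value.
        bytes[4] = 0x01;

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            in.readObject();
        } catch (StreamCorruptedException e) {
            // The reader cannot recover once the stream framing is broken.
            System.out.println("StreamCorruptedException");
        }
    }
}
```

This prints "StreamCorruptedException". In the real bug the damage happens on the wire, so the exact exception seen by the peer varies, which matches the report's observation that the failures differ but always point at corrupt stream data.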
Thanks for the report.

(In reply to comment #0)
<stuff deleted>
> I understand why the client stops receiving events as the server stops the
> sending and receiving threads for this client because of the thrown exception.
> Why the exception is thrown is less obvious but I think it could be a race
> condition in org.eclipse.ecf.provider.comm.tcp.Client between the sending and
> receiving threads because there's a path where both threads access the
> ObjectOutputStream instance (outputStream) at the same time:
> - on receiving thread a PingMessage is handled (in handleRcv) and a
> PingResponseMessage is sent back which results in outputStream.writeObject
> - on sending thread events are sent to clients which results in
> outputStream.writeObject and outputStream.reset every maxMsg count

Do you have/can you get more information about the initial exception...and/or this code path that would lead to an IO exception on outputStream.writeObject (presumably)?

> Reproducible: Sometimes

If you could help identify the conditions that allow reproducibility (e.g. more frequent ping/keepalive), this will help us both address this bug. Thanks.
(In reply to comment #0)
<stuff deleted>
> - on receiving thread a PingMessage is handled (in handleRcv) and a
> PingResponseMessage is sent back which results in outputStream.writeObject
> - on sending thread events are sent to clients which results in
> outputStream.writeObject and outputStream.reset every maxMsg count

I'm looking at this now, so assigning to me.

I'm thinking that perhaps I may just remove the outputStream.reset() call...in case that call is causing an IOException. Do you have evidence that this is the case (i.e. that outputStream.reset() is throwing an IOException)? Thanks.
Created attachment 188856 [details]
proposed fix for Client.java

This attachment is Client.java, zipped. Client.java has been updated to remove the use of outputStream.reset(). This is a proposed fix for this bug.
Setting target milestone to ECF 3.5.
(In reply to comment #2)
> (In reply to comment #0)
> <stuff deleted>
> > - on receiving thread a PingMessage is handled (in handleRcv) and a
> > PingResponseMessage is sent back which results in outputStream.writeObject
> > - on sending thread events are sent to clients which results in
> > outputStream.writeObject and outputStream.reset every maxMsg count
>
> I'm looking at this now, so assigning to me.
> I'm thinking that perhaps I may just remove the outputStream.reset() call...in
> case that call is causing an IOException. Do you have evidence that this is
> the case? (i.e. outputStream.reset() is throwing an IOException?).
> Thanks.

Hi Scott,

I don't have real evidence that concurrent access to outputStream.reset() and outputStream.writeObject() is the cause; it's just a theory. I haven't read anywhere that ObjectOutputStream's methods are not thread safe, but looking at the source code it does not appear to be thread safe, so I suspect there can be problems when calling reset() and writeObject() concurrently.

About removing the reset() call: I wouldn't remove it, because of the following statement in the Java object serialization FAQ:

"Why is OutOfMemoryError thrown after writing a large number of objects into an ObjectOutputStream? The ObjectOutputStream maintains a table mapping objects written into the stream to a handle. The first time an object is written to a stream, its contents are written into the stream; subsequent writes of the object result in a handle to the object being written into the stream. This table maintains references to objects that might otherwise be unreachable by an application, thus resulting in an unexpected situation of running out of memory. A call to the ObjectOutputStream.reset() method resets the object/handle table to its initial state, allowing all previously written objects to be eligible for garbage collection."

Meanwhile I'll try to set up a test where I can reproduce the problem. I'll keep you posted.
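The FAQ behavior quoted above is easy to observe directly: a second write of the same instance emits only a few-byte back-reference into the handle table, while reset() clears that table so the next write serializes the object in full again (which is why dropping reset() risks both unbounded memory growth and stale handles). A small sketch, with a hypothetical demo class name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class ResetDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        String payload = new String(new char[1000]).replace('\0', 'x');

        oos.writeObject(payload);
        oos.flush();
        int afterFirst = buf.size();

        // Second write of the same instance: only a short handle reference
        // goes on the wire, and the stream keeps a reference to payload.
        oos.writeObject(payload);
        oos.flush();
        int afterSecond = buf.size();

        // reset() clears the object/handle table; the next write of the same
        // instance is serialized in full again.
        oos.reset();
        oos.writeObject(payload);
        oos.flush();
        int afterReset = buf.size();

        // The handle write is far smaller than the post-reset full write.
        System.out.println((afterSecond - afterFirst) < (afterReset - afterSecond));
    }
}
```

This prints "true": the second write costs a handful of bytes, while the write after reset() re-emits the whole ~1000-character string.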
(In reply to comment #5)
<stuff deleted>
> About removing the reset() method: I wouldn't remove it because of following
> statement in the FAQ of java object serialization:
<stuff deleted>
> Meanwhile I'll try to setup a test where I can reproduce the problem. Keep you
> posted.

Ok.

One other option that may work to deal with this issue as quickly as possible: put an internal lock around the accesses to the outputStream (both writeObject() and reset()), so that multithreaded access is serialized...without assuming that multithreaded access is supported by the ObjectOutputStream. This would be easy to do...and if you are willing, I'll do it and you can test locally (i.e. then try to reproduce the problem in your environment). Let me know if this sounds reasonable to you.
(In reply to comment #6)
> (In reply to comment #5)
> <stuff deleted>
<stuff deleted>
> One other option that may work to deal with this issue as quickly as possible:
> put an internal lock around the accesses to the outputStream (both
> writeObject() and reset), so that multithreaded access is serialized...without
> assuming that multithreaded access is supported by the ObjectOutputStream.
> This would be easy to do...and if you are willing I'll do it and you can test
> locally (i.e. then try to reproduce the problem in your environment). Let me
> know if this sounds reasonable to you.

Hi Scott,

I can reproduce the problem with the distributed EventAdmin example (org.eclipse.ecf.example.eventadmin.app) by making some modifications that increase the probability of occurrence; with these modifications it mostly happens within 5 minutes. The thrown exceptions are not always the same, but every time they are related to corrupt data in the stream.
I already tested the internal lock approach: I've put one lock around the reset() call and one around the writeObject() and flush() calls (in the send method). The internal lock approach looks fine; I cannot reproduce the problem with the same modified distributed EventAdmin example. I have attached the modified Client.java file.
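The locking scheme described above can be sketched as follows; the class and field names (LockedSender, streamLock) are illustrative, not the actual Client.java change. The key point is that one shared monitor guards both writeObject()/flush() and reset(), so a ping response written on the receiver thread can never interleave with an event send or a reset on the sender thread:

```java
import java.io.*;
import java.util.concurrent.*;

public class LockedSender {
    private final ObjectOutputStream outputStream;
    private final Object streamLock = new Object(); // guards ALL outputStream access

    LockedSender(OutputStream out) throws IOException {
        outputStream = new ObjectOutputStream(out);
    }

    // Every write, from either thread, goes through this method.
    void send(Serializable msg) throws IOException {
        synchronized (streamLock) {
            outputStream.writeObject(msg);
            outputStream.flush();
        }
    }

    // reset() takes the same lock, so it cannot run mid-writeObject().
    void resetStream() throws IOException {
        synchronized (streamLock) {
            outputStream.reset();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        LockedSender sender = new LockedSender(buf);

        // Two threads hammer the stream concurrently, with periodic resets.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> {
                try {
                    sender.send("msg-" + n);
                    if (n % 10 == 0) sender.resetStream();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // The stream stays well-formed: all 100 objects read back cleanly.
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        int count = 0;
        try {
            while (true) { in.readObject(); count++; }
        } catch (EOFException endOfStream) { /* done */ }
        System.out.println(count);
    }
}
```

With the lock in place the read-back count is always 100; removing the synchronized blocks reintroduces the possibility of interleaved writes and a corrupted stream, which is exactly the intermittent failure this bug describes.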
Created attachment 188999 [details] Patched Client.java with lock on outputStream
Thanks Franky for the bug report, diagnosis, and the patch. I've applied the new version of Client.java, and released to master: http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=a02a770dbc12e4f85396b816117237023f87494d Resolving as fixed. Again...many thanks to Franky...for using and contributing to ECF.