Community
Participate
Working Groups
Build Identifier: 4.0 The following exception can occur during raw replication: !ENTRY org.eclipse.net4j.util 4 0 2010-10-25 14:28:12.124 !MESSAGE java.util.concurrent.TimeoutException !STACK 0 org.eclipse.emf.cdo.common.util.TransportException: java.util.concurrent.TimeoutException at org.eclipse.emf.cdo.internal.net4j.protocol.CDOClientProtocol.send(CDOClientProtocol.java:368) at org.eclipse.emf.cdo.internal.net4j.protocol.CDOClientProtocol.replicateRepositoryRaw(CDOClientProtocol.java:311) at org.eclipse.emf.cdo.internal.server.syncing.RepositorySynchronizer$ReplicateRunnable.run(RepositorySynchronizer.java:440) at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:26) at org.eclipse.net4j.util.concurrent.QueueRunner.work(QueueRunner.java:1) at org.eclipse.net4j.util.concurrent.QueueWorker.work(QueueWorker.java:75) at org.eclipse.net4j.util.concurrent.Worker$WorkerThread.run(Worker.java:188) Caused by: java.util.concurrent.TimeoutException at org.eclipse.net4j.util.io.IOTimeoutException.createTimeoutException(IOTimeoutException.java:46) at org.eclipse.net4j.signal.Signal.runSync(Signal.java:244) at org.eclipse.net4j.signal.SignalProtocol.startSignal(SignalProtocol.java:445) at org.eclipse.net4j.signal.RequestWithConfirmation.doSend(RequestWithConfirmation.java:87) at org.eclipse.net4j.signal.RequestWithConfirmation.send(RequestWithConfirmation.java:73) at org.eclipse.emf.cdo.internal.net4j.protocol.CDOClientProtocol.send(CDOClientProtocol.java:360) ... 6 more Occurs only when big databases are replicated (around 300MB H2). Reproducible: Always Steps to Reproduce: 1. Do a lot of commits on huge lists (-> The DATATYPE_LIST db table gets very big) 2. Connect a client with raw replication enabled 3. -> Raw replication fails.
Hi Erwin, I noticed that the stack trace does not match the HEAD sources. Have you applied modifications or is it just not the hottest code? If read (response) timeouts happen due to large data volumes being collected in the server, shouldn't you just increase the timeout value? You would do that in your CDOSessionConfigurationFactory where you'd create a CDOSessionConfiguration that sets the larger timeout value at the end of openSession like this: ((org.eclipse.emf.cdo.net4j.CDOSession)session).options().getProtocol().setTimeout(millis); If you can confirm that this works for you I'd like to close this bug as WORKSFORME, but I'm waiting until you've tried it ;-)
It seems that setting the protocol timeout for a Net4j CDOSession becomes easier with bug 278181: sessionsConfiguration.setSignalTimeout(millis);
(In reply to comment #1) I'm working on HEAD but not latest. I though about increasing the session timeout. This would fix the issue for some time depending how big the timeout is set. The commit request (CommitTransactionRequest) is a RequestWithMonitoring which is (I think) the reason that the timeout of the session must not be increased when working with huge commits. The ReplicateRepositoryRawRequest should also extend the RequestWithMonitoring because it will always be bigger that the biggest commit ever made.
> The ReplicateRepositoryRawRequest should also extend the RequestWithMonitoring > because it will always be bigger that the biggest commit ever made. The idea is sound but looking at the code I have some doubts that it would reliably prevent the read timeout from happening on the client. The CommitTransactionRequest is fundamentally different because it carries the major payload on the way *to* the repo, whereas the RepositoryReplicationRequest does it on the way *back*. From this point of view the active monitoring in the repo does not seem to replace the more general read timeouts but rather implement an *additional* mechanism. Thinking...
BTW. you can see the problem in RequestWithConfirmation: @Override void doExecute(BufferInputStream in, BufferOutputStream out) throws Exception { doOutput(out); doInput(in); } A client that expects a confirmation starts immediately to read from the response stream after the last request byte has been sent. At that time also the protocol's read timeout starts to become effective.
My last two comments turned out to be wrong: The RequestWithMonitoring mechanism *does* already reset the normal read timout ;-)
Created attachment 181846 [details] patch Implementation with monitoring.
Created attachment 181960 [details] Patch Previous fix didn't work. -> Monitor is now started. -> Breaks the progress monitor on client side. Therefore not yet final fix.
Created attachment 182096 [details] Patch v3 I've consolidated the monitored signals in CDO.
(In reply to comment #9) > Created an attachment (id=182096) [details] > Patch v3 > > I've consolidated the monitored signals in CDO. Tested your patch. Works well except the progress on the repository synchronizer is broken. Always starts at 96%.
Opened bug 330227 for the problem mentioned in comment 10. Therefore this bug can be closed.
Thank you, Erwin! Committed to HEAD
Available in R20110608-1407