When a consuming product uses parallel agent initialization with 6 to 10 agents, failures are frequent (nearly every test execution). More detailed information needs to be gathered. It is not yet clear where the issue lies (AC, TPTP execution harness, or consuming product code). It appears that, typically, one or two of the execution runners fail when attempting to run a test with 6 or more agent machines. The symptom is: the Session JVM is created, deployment is completed, the agent environment is set, and the Data Processors are initialized. After 60 seconds, the InactiveProcessException is thrown. The InactiveProcessException generally means that the test runner either was unable to fully start or exited almost immediately (we couldn't find the process ID provided by the AC when it started the test runner). This defect will be updated once more detailed information is gathered.
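The InactiveProcessException symptom amounts to a launch watchdog: the harness expects the runner process reported by the AC to be present, and gives up after a fixed timeout (60 seconds in this report). The following is a minimal Java sketch of that idea only, assuming the harness polls for the runner's PID until a deadline. The names RunnerWatchdog, awaitRunner and processExists are hypothetical and are not taken from TPTP or AC source; the default liveness check uses the JDK's ProcessHandle (Java 9+), which can only see processes on the local machine, whereas a real agent's runner may live on a remote host.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.LongPredicate;

// Hypothetical sketch of a launch watchdog; not TPTP/AC code.
class RunnerWatchdog {

    /** Thrown when the runner process is never seen before the timeout expires. */
    static class InactiveProcessException extends Exception {
        InactiveProcessException(String message) { super(message); }
    }

    /**
     * Polls until the process with the given PID is reported alive, or the timeout expires.
     *
     * @param pid       process ID reported when the runner was launched (assumed input)
     * @param isAlive   liveness check for a PID (pluggable so remote checks could be used)
     * @param timeout   how long to wait before giving up (the report describes 60 s)
     * @param pollEvery delay between checks
     */
    static void awaitRunner(long pid, LongPredicate isAlive,
                            Duration timeout, Duration pollEvery)
            throws InactiveProcessException, InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (isAlive.test(pid)) {
                return; // runner found: the test can proceed
            }
            Thread.sleep(pollEvery.toMillis());
        }
        // A runner that failed to start, or exited almost immediately, ends up here.
        throw new InactiveProcessException(
                "Runner process " + pid + " not found within " + timeout.toMillis() + " ms");
    }

    /** Local-only liveness check using ProcessHandle (Java 9+). */
    static boolean processExists(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }
}
```

Making the liveness check a parameter keeps the timeout logic separate from how a PID is looked up, which matters when the process being watched runs on a different agent machine than the harness.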
Duwayne, please triage, assign a sizing, and provide a patch for I7.
Marking this as dependent on 229189. The failure rate I have seen when running multiple agents without parallel agent startup enabled is very high, and those failures resemble this defect's failure mode. I have concluded that this defect cannot be worked on effectively until 229189 is resolved.
Adding a sizing. This is only a guess; we do not know the extent of the issues.
After running many tests using 8 to 10 Windows-only agents, I am unable to distinguish parallel agent startup problems from the intermittent issues in the blocking defect 229189. Failure rates have varied widely, from 1 in every 3 to 5 runs to 1 in 15 or more executions (using the same agents and the same runtime workbench). It would be very difficult to determine whether parallel execution has a problem, or to verify that the feature works properly, until serial startup reliability reaches a more stable condition with consistent behavior. Many of the failures actually happen at some point after launch has completed, including after the agent reaches the "READY" state, or even later, such as communication errors that end in a hung state during log transfer after test execution has completed.
Adding dependency on 229189.
My understanding is that we should be able to make progress on this defect now. There are no known AC problems on Windows (229189), although some issues on Linux are still being addressed.
As a result of testing, we are not currently seeing failures in multi-agent runs that are specifically associated with the TPTP parallel agent initialization code changes. We have noticed failures that are thought to be related to the robustness of the consuming product's code. Therefore, given that we have reached the end of TPTP 4.5 changes, our team is marking this defect as "WORKSFORME".
Verified in 4.5i8 and closing.