Bug 229134 - Parallel agent startup errors, InactiveProcessException
Summary: Parallel agent startup errors, InactiveProcessException
Status: CLOSED WORKSFORME
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: TPTP
Version: unspecified
Hardware: PC Windows XP
Importance: P1 critical
Target Milestone: ---
Assignee: DuWayne Morris CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 229189
Blocks:
Reported: 2008-04-28 14:10 EDT by DuWayne Morris CLA
Modified: 2016-05-05 10:28 EDT (History)

See Also:


Description DuWayne Morris CLA 2008-04-28 14:10:47 EDT
When a consuming product uses parallel agent initialization with 6 to 10 agents, there are frequent failures (nearly every test execution).  More detailed information needs to be gathered.  It is not yet clear where the issue lies (the Agent Controller (AC), the TPTP execution harness, or the consuming product code).

It appears that typically, one or two of the execution runners will fail when attempting to run a test with 6 or more agent machines.  The symptom is:

Session JVM is created.
Deployment is completed.
Agent environment is set.
Data Processors are initialized.
After 60 seconds, the InactiveProcessException is thrown.

The InactiveProcessException generally means that the test runner either was unable to fully start or exited nearly immediately (we couldn't find the process ID provided by the AC when it started the test runner).
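
The timeout behavior described above can be illustrated with a minimal sketch. This is not the TPTP implementation: the class StartupWatchdog, the exception InactiveProcessError, and the method awaitActive are hypothetical names chosen for illustration, and the polling approach is an assumption about how such a check might work. It only shows the general pattern of waiting for a launched process to be found and giving up after a fixed timeout.

```java
import java.util.function.BooleanSupplier;

public class StartupWatchdog {

    /** Hypothetical stand-in for the harness exception; not the TPTP class. */
    public static class InactiveProcessError extends Exception {
        public InactiveProcessError(String message) {
            super(message);
        }
    }

    /**
     * Polls isActive until it returns true or timeoutMillis elapses.
     * Throws InactiveProcessError if the process never becomes active,
     * which covers both "never started" and "exited almost immediately".
     */
    public static void awaitActive(BooleanSupplier isActive,
                                   long timeoutMillis,
                                   long pollMillis)
            throws InactiveProcessError, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isActive.getAsBoolean()) {
            // Compare via subtraction so nanoTime wrap-around is handled.
            if (System.nanoTime() - deadline >= 0) {
                throw new InactiveProcessError(
                        "process not active after " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

On Java 9 or later, a caller holding a process ID in a variable pid could pass a check such as `() -> ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false)` with a timeout of 60000 ms to mirror the 60-second wait in the symptom list.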

This defect will be updated after more detailed information is gathered.
Comment 1 Paul Slauenwhite CLA 2008-04-29 08:33:24 EDT
DuWayne, please triage, assign a sizing, and provide a patch for I7.
Comment 2 DuWayne Morris CLA 2008-05-01 09:33:16 EDT
Marking this as dependent on 229189.  The failure rate I have seen when running multiple agents without parallel agent startup engaged is very high, and the failure mode is similar to this defect's.  I have concluded that this defect cannot be worked on effectively until 229189 is resolved.
Comment 3 DuWayne Morris CLA 2008-05-05 09:00:22 EDT
Adding a sizing.  This is only a guess; we do not know the extent of the issues.
Comment 4 DuWayne Morris CLA 2008-05-09 09:08:10 EDT
After running many tests using 8 to 10 Windows-only agents, I am unable to distinguish parallel agent startup problems from the intermittent issues in the blocking defect 229189.  Failure rates have varied widely, from 1 failure in every 3 to 5 runs to 1 failure in 15 or more runs (using the same agents and the same runtime workbench).  It would be very difficult to determine whether parallel execution has a problem, or to verify that the feature works properly, until serial startup is reliable enough to behave consistently.  Many of the failures actually happen at some point after launch is completed, including after the agent reaches the "READY" state, or even later, such as communication errors ending in a hung state during log transfer after the test execution is completed.
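
For readers unfamiliar with the distinction between parallel and serial startup, the sketch below shows one way to launch several startup tasks concurrently and record which ones failed. It is an illustration only: ParallelStartup and startAll are hypothetical names, each agent is modeled as a plain Callable, and nothing here reflects the actual TPTP execution harness or AC API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStartup {

    /**
     * Runs every startup task concurrently, one thread per agent, and
     * returns the indices of the tasks that threw an exception.
     */
    public static List<Integer> startAll(List<Callable<Void>> agents)
            throws InterruptedException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, agents.size()));
        try {
            // invokeAll blocks until all tasks finish and preserves order.
            List<Future<Void>> results = pool.invokeAll(agents);
            List<Integer> failed = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                try {
                    results.get(i).get();
                } catch (ExecutionException e) {
                    failed.add(i); // this agent's startup threw
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running the same task list through a plain loop gives a serial baseline, so failure counts over repeated runs can be compared between the two modes.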
Comment 5 Paul Slauenwhite CLA 2008-05-29 12:18:49 EDT
Adding dependency on 229189.
Comment 6 jkubasta CLA 2008-05-29 19:37:20 EDT
My understanding is that we should be able to make progress on this defect now. There are no known AC problems on Windows (229189). There are some issues on Linux still being addressed.
Comment 7 DuWayne Morris CLA 2008-06-05 10:03:49 EDT
As a result of testing, we are not currently seeing failures in multi-agent runs that are specifically associated with the TPTP parallel agent initialization code changes.  We have noticed failures that are thought to be related to the robustness of the consuming product code.  Therefore, given that we have reached the end of TPTP 4.5 changes, our team is marking this defect as "WORKSFORME".
Comment 8 DuWayne Morris CLA 2008-06-27 09:47:46 EDT
Verified in 4.5i8 and closing.