
Bug 173319

Summary: Agent Controller fails to start up due to shared memory TL error
Product: z_Archived
Component: TPTP
Status: CLOSED FIXED
Severity: critical
Priority: P1
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard: closed460
Reporter: Samson Wai <samwai>
Assignee: Igor Alelekov <igor.alelekov>
QA Contact:
CC: igor.alelekov, jkubasta, karla.callaghan, paulslau
Keywords: plan
Attachments:
Description | Flags
RASharedMemory patch | none
transport components (transportSupport and SharedMemTL) patch | none
modifed patch for transport components | none
patch for the SharedMemTL project file | none

Description Samson Wai CLA 2007-02-07 14:10:31 EST
This happens after I fixed bug 161220. When I try to start up the new AC on Linux/x86, I get the following error at the console:

Starting Agent Controller.
Error starting transport layers, Agent controller exiting.
See servicelog.log for error report.
ACServer failed to start.

The servicelog.log shows:

Unable to create shared memory: acbuffer.
An error was returned from TransportLayer(1003)::startTransportLayer errNum = -3
Error starting transport layers, Agent controller exiting.

The only way to make it work is to comment out the following block in the config file:

<TransportLayer loadlib="sharedMemTL" type="TPTP_SHAREDMEM">
	<Configuration>
		<MemName>acbuffer</MemName>
	</Configuration>
	<CommandExtractor>tptpCmdExtr</CommandExtractor>
</TransportLayer>

Then everything runs fine.
Comment 1 Samson Wai CLA 2007-02-07 14:13:44 EST
This should be set to 4.4i1 P1.
Comment 2 jkubasta CLA 2007-02-07 19:39:54 EST
Please set priority to P1.
Comment 3 jkubasta CLA 2007-02-07 19:45:03 EST
Please set priority to P1
Comment 4 Igor Alelekov CLA 2007-02-08 01:50:55 EST
Samson, could you repeat AC launching as superuser (root)?
Does the problem remain?
Comment 5 Samson Wai CLA 2007-02-08 10:47:23 EST
It starts only if I log on as root. Starting the AC should not require root, just like the ChkPass utility I used for verifying user passwords.
Comment 6 Igor Alelekov CLA 2007-02-08 10:57:41 EST
After the AC terminates, it leaves the shared memory block (acbuffer) in the system (Linux), with the user who launched it marked as the owner of that memory.
Another user then can't launch the AC because they lack permission to create acbuffer. The problem could be fixed by destroying the shared memory block on AC termination.
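To illustrate the mechanism described in this comment, here is a minimal sketch using System V shared memory. That the AC uses this particular API is an assumption, and `create_segment` and `destroy_segment` are hypothetical helper names, not functions from the TPTP code base. The point is only that a segment outlives the process that created it unless it is explicitly removed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper: create a new segment that only its owner may read or
 * write (mode 0600). Returns the segment id, or -1 with errno set.
 * If a segment with the same key was left behind by a different user, the
 * call fails: EEXIST because of IPC_EXCL, or EACCES if IPC_EXCL is dropped. */
int create_segment(key_t key, size_t size)
{
    assert(size > 0);
    return shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
}

/* Hypothetical helper: mark the segment for removal. The kernel destroys it
 * once no process is attached. Without this call the segment stays in the
 * system after the creating process exits. Returns 0 on success, -1 on error. */
int destroy_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Under these assumptions, a shutdown path would call `destroy_segment` with the id returned at startup, so the next user to launch the process can create the segment again.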
Comment 7 Samson Wai CLA 2007-02-08 12:39:28 EST
Set to P1.
Comment 8 Igor Alelekov CLA 2007-02-09 10:39:15 EST
Created attachment 58661 [details]
RASharedMemory patch
Comment 9 Igor Alelekov CLA 2007-02-09 10:41:29 EST
Created attachment 58662 [details]
transport components (transportSupport and SharedMemTL) patch
Comment 10 Igor Alelekov CLA 2007-02-09 10:47:42 EST
The patch releases acbuffer shared memory block on AC termination.
The patch is divided into two parts:
- RASharedMemory - affects RAC code
- transport - affects AC code

Samson, please review the RASharedMemory patch.
Joanna, who will review the second part (transport) of the patch?
Comment 11 Igor Alelekov CLA 2007-02-12 02:40:08 EST
Created attachment 58753 [details]
modifed patch for transport components

Modified SharedMemTL.dsp (windows project file) has been added to the patch.
Comment 12 jkubasta CLA 2007-03-01 12:31:58 EST
Igor, would you please ask Randy or Kevin to review the patch for the transport components?
Comment 13 Samson Wai CLA 2007-03-01 12:54:57 EST
Hi Igor, the RAC-side fix looks good to me. It seems that what you are trying to do is to expose the stop flusher routine from the RAC code base and use it in the AC code base. The RAC itself is not using this newly exposed function at all. Am I correct?

On the other hand can this fix handle the case where there was a crashed AC leaving the shared memory uncleared? It seems not... Will we be hit by the same problem then?
Comment 14 Igor Alelekov CLA 2007-03-02 02:43:45 EST
> Hi Igor, the RAC-side fix looks good to me. It seems that what you are trying
> to do is to expose the stop flusher routine from the RAC code base and use it
> in the AC code base. The RAC itself is not using this newly exposed function at
> all. Am I correct?

Yes.

> On the other hand can this fix handle the case where there was a crashed AC
> leaving the shared memory uncleared? It seems not... Will we be hit by the same
> problem then?

Currently, any normal AC shutdown leaves the acbuffer shared memory block uncleared. This prevents other users from launching the AC.
The patch fixes this issue.

As for a crashed AC, additional investigation is required.
Comment 15 Samson Wai CLA 2007-03-05 10:34:41 EST
Hi Igor. I would recommend addressing the crash scenario as well when fixing this bug. This will prevent users from hitting the same problem if they choose to run "kill -9" on the Agent Controller.
Comment 16 Igor Alelekov CLA 2007-03-05 10:58:10 EST
(In reply to comment #15)
> Hi Igor. I would recommend addressing the crash scenario as well when fixing
> this bug. This will prevent users from hitting the same problem if they choose
> to run "kill -9" on the Agent Controller.

Yes, it is important.
But it seems that the patch fixing the normal termination scenario could be applied now, and additional investigation of the crash scenario could be done a bit later.
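For the crash scenario discussed above, note that SIGKILL ("kill -9") cannot be caught, so no cleanup code in the killed process gets a chance to run. One possible approach, sketched here purely as an assumption and not as the fix adopted for this bug or its follow-up, is to reclaim a stale segment at startup when no process is attached to it. The sketch again assumes System V shared memory, and `reclaim_stale_segment` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical startup check for a segment left behind by a crashed process.
 * Returns  1 if a stale segment (no attached processes) was removed,
 *          0 if no segment with this key exists,
 *         -1 if the segment is still attached or an error occurred. */
int reclaim_stale_segment(key_t key)
{
    struct shmid_ds info;
    int shmid;

    assert(key != IPC_PRIVATE);      /* private segments cannot be looked up */

    shmid = shmget(key, 0, 0);       /* look up only, never create */
    if (shmid == -1)
        return errno == ENOENT ? 0 : -1;

    if (shmctl(shmid, IPC_STAT, &info) == -1)
        return -1;

    if (info.shm_nattch != 0)        /* some process is still attached */
        return -1;

    return shmctl(shmid, IPC_RMID, NULL) == 0 ? 1 : -1;
}
```

This only helps when the caller is allowed to remove the segment, that is, when it is the segment's owner or creator, or is privileged. A segment left behind by a different unprivileged user would still have to be cleaned up manually, for example with `ipcrm`.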
Comment 17 Igor Alelekov CLA 2007-03-13 07:19:06 EDT
*** Bug 175264 has been marked as a duplicate of this bug. ***
Comment 18 Samson Wai CLA 2007-03-13 10:22:48 EDT
The attached two patches have been committed to CVS. Igor please mark this one as closed.

Please also open a new bug to investigate whether we can handle the crash scenario for 4.4. If we cannot fix that, we will need to add a readme entry telling users what to clean up after an AC crash. Thanks.
Comment 19 Igor Alelekov CLA 2007-03-13 11:01:12 EDT
resolving as fixed, new bug #177153 is opened to investigate the AC crash scenario
Comment 20 Igor Alelekov CLA 2007-03-15 03:40:14 EDT
Created attachment 60903 [details]
patch for the SharedMemTL project file
Comment 21 Igor Alelekov CLA 2007-03-15 03:44:08 EDT
Hi Samson,
Please review the patch for the SharedMemTL project file.
It adds the necessary link to the Release configuration. The same link was already added to the Debug configuration by the previous patch.
Comment 22 Samson Wai CLA 2007-03-15 10:21:39 EDT
Igor. It is now checked in.
Comment 23 Igor Alelekov CLA 2007-03-15 10:24:53 EDT
Thank you.
Resolving as fixed.
Comment 24 Paul Slauenwhite CLA 2009-06-30 09:41:54 EDT
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since the originator of this enhancement/defect has an inactive Bugzilla account and it is considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.