
Bug 141909

Summary: [regression] Running a class in enabled mode in the workbench
Product: z_Archived Reporter: Navid Mehregani <nmehrega>
Component: TPTP Assignee: Igor Alelekov <igor.alelekov>
Status: CLOSED FIXED QA Contact:
Severity: major    
Priority: P1 CC: andrew.kaylor, duncan, haggarty, smith, vlegros
Version: unspecified Keywords: plan
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard: closed460
Attachments:
Description        Flags
Error Screenshot   none
Console Switch     none
patch              none

Description Navid Mehregani CLA 2006-05-15 18:19:01 EDT
Please follow the instructions below to reproduce the problem:

- Import StartStop.java into a java project in your workspace
- In its launch configuration specify the -XrunpiAgent:server=enabled profiling option
- Run the class *within* eclipse
- Attach to the profiling agent and start monitoring it
- Switch to the StartStop console.  Wait for 3 seconds and press enter
- In some cases this will generate the following error message:
"Profiler unable to establish connection with RAC, Possible fix: Add more memory or reduce dataChannelSize in serviceconfig.xml"
The error message is attached.

- In some cases the error message is not generated but the profiling output is incorrect.  It's missing the invocation for methodA().
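
StartStop.java itself is not reproduced in this report. For readers without it, a minimal sketch of a class with the behavior the steps above imply might look like the following. Only the class name and methodA() come from the report; the prompt text, the waitThenRun helper and the methodACalls counter are hypothetical and exist just to make the sketch self-contained.

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical stand-in for StartStop.java, inferred from the reproduction steps.
// The real test class may differ.
public class StartStop {
    // Counts calls so the behavior can be checked without a profiler (assumption).
    static int methodACalls = 0;

    // The method whose entry/exit events are reported missing from the profile.
    static void methodA() {
        methodACalls++;
        System.out.println("methodA invoked");
    }

    // Blocks until a line arrives on the given stream, then calls methodA().
    // This gives the user time to attach to the agent and start monitoring.
    static void waitThenRun(InputStream in) {
        Scanner scanner = new Scanner(in);
        System.out.println("Attach to the agent, start monitoring, then press Enter");
        if (scanner.hasNextLine()) {
            scanner.nextLine();
        }
        methodA();
    }

    public static void main(String[] args) {
        waitThenRun(System.in);
    }
}
```

With a class of this shape, a correct profile taken after attaching should contain one entry and one exit event for methodA().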

Strangely, this seems to work fine if you launch the Java app from a command prompt.  

This defect is causing 17 failures in the automation framework.  This problem seems to be a regression.  I tried this with TPTP 4.1 and it worked.
Comment 1 Navid Mehregani CLA 2006-05-15 18:20:20 EDT
Created attachment 41528 [details]
Error Screenshot
Comment 2 Navid Mehregani CLA 2006-05-15 18:23:31 EDT
Note that the error message also needs to be updated, since there doesn't seem to be a dataChannelSize in serviceconfig.xml for the new AC.
Comment 3 Bob Duncan CLA 2006-05-17 18:35:17 EDT
We are getting RC=-3 from ra_attachToShm when the piAgent first tries to connect to the Agent Controller (via BC). 
Comment 4 Bob Duncan CLA 2006-05-18 11:01:37 EDT
It seems to be failing in the shared memory code. A scan of the code suggests that the error is an OSS_ERR_NOTFOUND after either a CreateFileMapping() or OpenFileMapping() or shmget() call from ipcMemAttach in ossipcmemory.cpp.  The code seems to be substantially the same for both AC and RAC but the problem has so far only surfaced with the AC.

Navid can reproduce this problem solidly on his machine but it is intermittent on mine (once out of every several attempts). I do get alternative symptoms solidly:  either the missing MethodA as reported by Navid or often no data in the execution stats view at all.
Comment 5 Bob Duncan CLA 2006-05-18 11:48:32 EDT
Transferring to Platform.Collection as a shared memory problem.
Comment 6 Navid Mehregani CLA 2006-05-18 12:24:18 EDT
I originally overestimated the severity of this bug.  This problem is actually just causing the 'ClientConnectionResetTestAttach' test case to fail in the automation framework.  This seems to be a regression.  I didn't experience this problem before.  I'm reducing the severity to major.

Note that you can use the automated framework to reproduce this bug.  Just run the ClientConnectionResetTestAttach test case.
Comment 7 Navid Mehregani CLA 2006-05-18 14:16:11 EDT
I've tried this with the old RAC and it does NOT seem to suffer from this problem. It's only the new AC that's experiencing this.
Comment 8 Navid Mehregani CLA 2006-05-18 14:18:12 EDT
Note that when you attach to the profiling console, Eclipse automatically switches the console view to the console for the agent.  There is a button on the console to switch back to the console of StartStop.  Please see the attached image.
Comment 9 Navid Mehregani CLA 2006-05-18 14:20:38 EDT
Created attachment 41921 [details]
Console Switch
Comment 10 Navid Mehregani CLA 2006-05-18 16:00:41 EDT
This problem seems to only occur with Sun JDK 1.5.  I can't reproduce the problem on Sun 1.4, IBM 1.4/1.5.  Hendra has also reproduced this problem on his laptop.
Comment 11 Hendra Suwanda CLA 2006-05-18 16:34:09 EDT
I am not sure if this is useful information or not, but...

We observed that the shared memory was created by the AC, but it disappeared quickly.  The shared memory could have been deleted soon after it was created, so it seemed.
Comment 12 Karla Callaghan CLA 2006-05-18 19:08:59 EDT
Setting target and priority.
Comment 13 Kevin P O'Leary CLA 2006-05-19 17:32:24 EDT
I have gone through these steps and been successful each time.

It is possible that the work we have done with removing static buffers
has fixed this issue.
Comment 14 Navid Mehregani CLA 2006-05-21 15:30:29 EDT
I'm still able to reproduce this problem with the TPTP-4.2.0-200605190100 driver.    Note that I consistently get this problem when I run the 'ClientConnectionResetTestAttach' automated test case.  Can you try running this test case to see what happens?
Comment 15 Kevin P O'Leary CLA 2006-05-22 18:57:06 EDT
I have been testing using ClientConnectionResetTestAttach... and I am seeing
a failure. I believe this to be a regression from earlier runs I have done...
but I have not run this test for a while.
The test suite tries to verify that a method has been executed, and the trace dump does not look accurate (it doesn't match the expected results, which look correct to me).

Investigating further.
Comment 16 Kevin P O'Leary CLA 2006-05-24 19:39:43 EDT
There was an error with how BC was setting up shared memory.

This allowed a race condition: we started a flusher thread before the agent was ready to monitor, so the memory flusher exited (because nobody was attached), and we then destroyed the shared memory before the agent's attach had completed.
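
The race described in this comment can be modeled roughly as follows. This is an illustrative sketch in Java, not the BC/AC shared memory code and not the fix that was checked in; every class and method name here is hypothetical. It contrasts a flusher that destroys the segment as soon as it sees nobody attached with one that waits a bounded time for an attach first.

```java
// Illustrative model only; names are hypothetical and do not come from the TPTP sources.
public class SharedMemoryRace {

    // Stands in for a shared memory segment that an agent attaches to.
    static class Segment {
        private boolean attached = false;
        private boolean destroyed = false;

        // Agent side: attaching fails if the segment has already been destroyed.
        synchronized boolean attach() {
            if (destroyed) {
                return false;
            }
            attached = true;
            notifyAll(); // wake a flusher that is waiting for an attach
            return true;
        }

        // Eager flusher: gives up and destroys the segment if nobody is attached right now.
        synchronized void destroyIfUnattached() {
            if (!attached) {
                destroyed = true;
            }
        }

        // Patient flusher: waits up to timeoutMillis for an attach before destroying.
        synchronized void awaitAttachOrDestroy(long timeoutMillis) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (!attached) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    destroyed = true;
                    return;
                }
                try {
                    wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The flusher runs to completion before the agent attaches, so the attach fails.
    static boolean attachAfterEagerFlusher() {
        Segment seg = new Segment();
        Thread flusher = new Thread(seg::destroyIfUnattached);
        flusher.start();
        join(flusher); // force the losing interleaving: flusher finishes first
        return seg.attach();
    }

    // The flusher waits for the agent, so the attach succeeds in either thread order.
    static boolean attachAfterPatientFlusher() {
        Segment seg = new Segment();
        Thread flusher = new Thread(() -> seg.awaitAttachOrDestroy(5000));
        flusher.start();
        boolean ok = seg.attach();
        join(flusher);
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("eager flusher, attach ok: " + attachAfterEagerFlusher());
        System.out.println("patient flusher, attach ok: " + attachAfterPatientFlusher());
    }
}
```

In the real system the eager interleaving would only happen some of the time, depending on thread scheduling, which would be consistent with the problem being solid on one machine and intermittent on another.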
Comment 17 Navid Mehregani CLA 2006-05-26 11:36:47 EDT
Kevin, I'm still able to reproduce this problem with the TPTP-4.2.0-200605251528 driver and SUN JDK 1.5 specified in my local_config_file.xml.  It either gives me the shared memory problem or methodA can't be found.

I've also reproduced it on my desktop. If you're not observing the problem it could be due to the difference in the speed of our computers and how fast context switching is done.  
Comment 18 Karla Callaghan CLA 2006-05-26 12:30:54 EDT
Andy - please revisit this in Kevin's absence.
Comment 19 Navid Mehregani CLA 2006-05-30 14:24:00 EDT
I just tested this again and I can't seem to reproduce it.  For now, I'll mark this as fixed and reopen it if I come across the problem again.
Comment 20 Navid Mehregani CLA 2006-05-30 14:24:24 EDT
Verified.
Comment 21 Navid Mehregani CLA 2006-05-30 14:45:43 EDT
This problem is caused by a race condition.  In some cases it seems to work fine, but the problem still exists.  Kevin has been able to reproduce the problem on his machine so I'll reopen the bug again.
Comment 22 Karla Callaghan CLA 2006-05-31 12:02:06 EDT
Retargeting - If time allows, will try to determine if further refinement of the fix can be done in i4.
Comment 23 Karla Callaghan CLA 2006-06-06 12:33:02 EDT
The failures are rare enough that I don't think this issue needs resolving in 4.2 and we are out of time.  It is a candidate for 4.2.1, so setting the target to that for now.
Comment 24 Kevin P O'Leary CLA 2006-07-24 18:05:59 EDT
I believe this to be a timing issue. The AC is passing back all the profiled data that it is being sent, but the method counts are incorrect and "methodA" does not have a corresponding methodEntry and methodExit.

I have only been able to get this test to fail as part of the test framework... (Navid mentioned it is possible to time things to get it to fail using the gui)

I am retargeting this bug to 4.3 and will look into getting some piAgent and 
probekit assistance to debug the issue further.

Looking into old Bugzilla reports, I noticed that the RAC had some similar timing issues that were resolved; this might be a direction to pursue.
Comment 25 Karla Callaghan CLA 2006-09-11 12:45:43 EDT
Resetting priority to P2.  Unless resources free up to assist in looking at this issue, it will not be fixed in 4.3.
Comment 26 Karla Callaghan CLA 2006-10-24 13:24:54 EDT
Retargeting to 4.4 as 4.3 is closing down to all non-essential bug fixing.
Comment 27 Karla Callaghan CLA 2007-02-01 12:32:56 EST
Reassign owner and set priority to P1
Comment 28 Karla Callaghan CLA 2007-02-09 11:44:13 EST
Added effort estimate: 10 days
Comment 29 Igor Alelekov CLA 2007-03-16 10:16:11 EDT
Created attachment 61096 [details]
patch
Comment 30 Igor Alelekov CLA 2007-03-16 10:18:06 EDT
Andy, could you review the patch?
Comment 31 Andy Kaylor CLA 2007-03-23 17:18:03 EDT
Checked in Igor's fix
Comment 32 Navid Mehregani CLA 2007-03-23 18:16:33 EDT
Alan, can you please verify the fix for this with the automation framework?
Comment 33 Alan Haggarty CLA 2007-06-21 17:33:54 EDT
Using the 4.4.0GA candidate and Sun jdk 1.5.0_12 I get the error:
[ClientConnectionResetTestAttach] methodEntry of methodA was not found
about 1/3 of the time and not the shared memory error.

Note in 4.4 we are now using jre 1.4.2 to test piagent. I used 1.5
specifically to try this.

Comment 34 Igor Alelekov CLA 2007-06-21 17:45:11 EDT
Alan, are you using the same test case?
Comment 35 Alan Haggarty CLA 2007-06-21 18:38:16 EDT
Yes - ClientConnectionResetTestAttach
Comment 36 Paul Slauenwhite CLA 2009-06-30 12:05:10 EDT
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.