
Bug 187006

Summary: [CGProf] Crash on multi-core platforms when printing <methodDef> element
Product: z_Archived Reporter: Asaf Yaffe <asaf.yaffe>
Component: TPTP Assignee: Viacheslav <viacheslav.g.rybalov>
Status: CLOSED FIXED QA Contact:
Severity: critical    
Priority: P1 CC: analexee, guru.nagarajan, ivan.g.popov, stanislav.v.polevic
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard: closed460
Bug Depends on: 168531    
Bug Blocks: 190202    
Attachments:
Description Flags
Martini log file none
quick patch for the bug none

Description Asaf Yaffe CLA 2007-05-15 10:03:58 EDT
Driver: 4.4.0-200705080100A (TPTP 4.4.i3 Candidate Build, Patch A)
O/S: Windows 2003 Server
Platform: Dual Intel Xeon HT 3.2 GHz (4 virtual cores)
JVM: reproduced with Sun and JRockit 1.5 for Windows IA-32 (latest releases)

Crash in standalone Aggregated CGProf when using the org.eclipse.tptp.scenario.thread.PowerWorkload test scenario.

Command line for reproducing the error:
-cp <test framework dir>\bin 
-agentlib:JPIBootLoader=JPIAgent:server=standalone,file=trace.trcxml;CGProf  
org.eclipse.tptp.scenario.thread.PowerWorkload 10

Stack trace of the crashing thread:
strlen() line 66
Martini::JPIAgent::CPrint::FormatName(const char * 0xfeeefeee) line 258 + 9 bytes
Martini::JPIAgent::CPrintXML::printNewMethodElement(unsigned __int64 65875, Martini::MPI::SMethodInfo * 0x4382ade0) line 213 + 15 bytes
PrintMethodDefElement(Martini::JPIAgent::EC_Env * 0x41cdb660, unsigned __int64 65875, Martini::MPI::SMethodInfo * 0x4382ade0) line 282 + 30 bytes
Martini::JPIAgent::EC_Env::PrintMethodDefElement(unsigned __int64 65875, Martini::MPI::SMethodInfo * 0x4382ade0) line 118 + 26 bytes
Martini::CGProf::CNewMethodEvent::HandleEvent(Martini::MPI::SNewMethodEventData & {...}) line 64
Martini::JPI::CNewMethodEventDispatcher::Notify(Martini::JPI::SEmData * 0x43cfe2ac, Martini::MPI::IEventObserver * 0x41d49588, unsigned int 8) line 422 + 17 bytes
Martini::JPI::CEventDispatcher::NotifyObservers(Martini::JPI::SEmData * 0x43cfe2ac) line 248 + 30 bytes
Martini::JPI::CEventManager::NotifyMpiEvent(unsigned int 3, Martini::JPI::SEmData * 0x43cfe2ac) line 881
Martini::JPI::CEventManager::NewMethodEvent(unsigned __int64 65875, const JNIEnv_ * 0x438eaf74) line 1391
MethodEnterHandler(JNIEnv_ * 0x438eaf74, _jobject * 0x43cfe3b4, unsigned char 0, long 65875) line 1449

It seems that the content of the methodInfo pointer passed from the CGProf New Method Event handler is corrupted by another thread. An initial investigation suggests that the Martini GetMethodInfo API returns correct data.

Assigning to Slava for further investigation.
Comment 1 Viacheslav CLA 2007-05-16 10:55:37 EDT
During test execution, the following profiler assertion sometimes fails in debug mode:

[Error: assert "iRes == MRTE_RESULT_OK" failed. File: c:\work\tptp\4.3\jvmtiagent\baseprof\sources\profenv.cpp Line:277]

iRes in this case is -2147483648

This is the code:
    SClassInfo* classInfo = new SClassInfo;
    TResult iRes = m_pMpiApi->GetClassInfo(m_clientId, classId, 
        DR_JAVA_NATIVE_CLASS_NAME | DR_SOURCE_FILE_NAME, classInfo);
    LOG_ASSERT(iRes == MRTE_RESULT_OK);
Comment 2 Asaf Yaffe CLA 2007-05-17 02:31:40 EDT
(In reply to comment #1)
I saw this assertion failure once but was not able to reproduce it. I am not sure whether it is related to this bug.

Slava, can you please turn on Martini logging (level 5) and post a log file here if you reproduce this assertion again?

Thanks,
Asaf
Comment 3 Viacheslav CLA 2007-05-17 07:37:22 EDT
Created attachment 67644 [details]
Martini log file
Comment 4 Asaf Yaffe CLA 2007-05-29 04:44:16 EDT
I was able to reproduce the assertion. The GetClassInfo API is called with an invalid class id. The class id passed to the function seems like a method id (judging by its value, which is too high for a class id). This is another indication that the New Method event handler data is corrupted.
Comment 5 Viacheslav CLA 2007-05-31 08:42:51 EDT
The crash happens because Martini produces multiple NewMethod events (see bug 168531, "JVMTI CG profiler. Martini produces multiple NewMethod events"). In each event handler, the profiler tries to store the method data in its internal storage. If the storage already contains a structure for the given method id, the profiler deletes the previously stored structure and stores the new one. Meanwhile, the thread that created the deleted structure may still be using a reference to it. In the 'PowerWorkload 10' test case, the test creates 10 new threads almost simultaneously, and Martini invokes the NewMethodEvent handler for the 'run' method in all of them at nearly the same time.
There are two ways to fix the bug:
1. Fix bug 168531 (JVMTI CG profiler. Martini produces multiple NewMethod events).
2. Fix the logic that stores internal data in the profiler.
The two approaches can be applied together.

An easy way to fix the bug is to not delete stored structures, but this will cause memory leaks. The leaks will be eliminated once bug 168531 is fixed.
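As an illustration of option 2, here is a minimal sketch of one way to keep a replaced structure alive while another thread still uses it, without leaking it. It is not the attached patch and not the actual profiler code: SharedMethodStore, MethodRecord, and ReadAfterReplace are hypothetical names, and shared ownership via std::shared_ptr is an assumed technique chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the per-method structure kept by the profiler.
struct MethodRecord {
    std::string name;
};

// Hypothetical storage: replacing a record never frees it while a reader
// still holds a reference, and the record is freed once nobody uses it.
class SharedMethodStore {
public:
    // Stores a record for the id, replacing any previous one.
    void Put(std::uint64_t id, const std::string& name) {
        std::shared_ptr<const MethodRecord> rec =
            std::make_shared<MethodRecord>(MethodRecord{name});
        std::lock_guard<std::mutex> guard(m_lock);
        m_records[id] = rec;
    }

    // Returns a shared reference to the record, or an empty pointer.
    std::shared_ptr<const MethodRecord> Get(std::uint64_t id) {
        std::lock_guard<std::mutex> guard(m_lock);
        auto it = m_records.find(id);
        if (it == m_records.end()) {
            return std::shared_ptr<const MethodRecord>();
        }
        return it->second;
    }

private:
    std::mutex m_lock;
    std::map<std::uint64_t, std::shared_ptr<const MethodRecord>> m_records;
};

// Replays the sequence from this bug in one thread: a reader obtains the
// record, a duplicate event replaces it, and the reader then uses its copy.
std::string ReadAfterReplace() {
    SharedMethodStore store;
    store.Put(65875, "run (first event)");
    std::shared_ptr<const MethodRecord> held = store.Get(65875);
    store.Put(65875, "run (duplicate event)");  // replaces the stored record
    assert(held != store.Get(65875));           // the store now holds a new record
    return held->name;                          // the old record is still valid
}
```

With raw pointers and an explicit delete on replacement, the last line of ReadAfterReplace would read freed memory, which matches the crash in FormatName. The cost of this variant is a reference-count update on every lookup.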
Comment 6 Asaf Yaffe CLA 2007-05-31 09:12:06 EDT
Fixing bug 168531 at this stage is risky. It may introduce regressions in other profilers and event handlers (the New Method Event has multiple implementations for different JVMs and instrumentation scenarios).

Therefore, I think this bug should be fixed by modifying the data storage logic in CGProf. It is also advisable to verify that the same problem does not occur in other databases maintained by CGProf and other profilers.

Here's one possible way of fixing the New Method event handler:

1. Check whether the method already exists in the database: profenv->GetMethodData(data.methodId)

2. If it does not exist, get the information from Martini and store it in the database: profenv->AddNewMethodData(data.methodId)

This solution is thread-safe since both GetMethodData and AddNewMethodData are protected by the same critical section (m_pMethodSDataLockObject).
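
The two steps above can be sketched as follows. This is a minimal, self-contained illustration and not the CGProf code: MethodDb, MethodData, GetOrAdd, and SimulateDuplicateEvents are hypothetical names, std::mutex stands in for the critical section, and a counter stands in for the query to Martini. The sketch assumes the lookup and the insert happen within a single lock hold, so two threads cannot both decide that the method is missing.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-method record kept by the profiler.
struct MethodData {
    std::string name;
};

// Hypothetical method database; not the actual CGProf class.
class MethodDb {
public:
    // Returns the existing record, or creates one if the id is unknown.
    // Existing records are never replaced, so a duplicate New Method
    // event cannot free a record that another thread is still using.
    const MethodData* GetOrAdd(std::uint64_t methodId) {
        std::lock_guard<std::mutex> guard(m_lock);
        auto it = m_methods.find(methodId);
        if (it == m_methods.end()) {
            ++m_fetchCount;  // stands in for fetching the data from Martini
            std::unique_ptr<MethodData> rec(
                new MethodData{"method#" + std::to_string(methodId)});
            it = m_methods.emplace(methodId, std::move(rec)).first;
        }
        return it->second.get();
    }

    // Number of times a record had to be created.
    int FetchCount() {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_fetchCount;
    }

private:
    std::mutex m_lock;
    std::map<std::uint64_t, std::unique_ptr<MethodData>> m_methods;
    int m_fetchCount = 0;
};

// Simulates several threads receiving a New Method event for the same
// method id at nearly the same time. Returns how often a record was created.
int SimulateDuplicateEvents(int threadCount) {
    MethodDb db;
    std::vector<const MethodData*> seen(threadCount, nullptr);
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i) {
        workers.emplace_back([&db, &seen, i] { seen[i] = db.GetOrAdd(65875); });
    }
    for (auto& w : workers) {
        w.join();
    }
    for (const MethodData* p : seen) {
        assert(p == seen[0]);  // every thread observed the same record
    }
    return db.FetchCount();
}
```

Calling SimulateDuplicateEvents(10) mirrors the 'PowerWorkload 10' scenario: ten handlers race on one method id, but only the first one creates a record, and the rest reuse it.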

Comment 7 Viacheslav CLA 2007-05-31 09:19:19 EDT
Created attachment 69508 [details]
quick patch for the bug
Comment 8 Asaf Yaffe CLA 2007-05-31 10:24:13 EDT
*** Bug 190194 has been marked as a duplicate of this bug. ***
Comment 9 Viacheslav CLA 2007-06-01 10:04:33 EDT
The patch is checked in to CVS.
Comment 10 jkubasta CLA 2007-06-01 21:56:42 EDT
Resolving as fixed since the patch has been committed.
Comment 11 Paul Slauenwhite CLA 2009-06-30 13:25:46 EDT
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.
Comment 12 Paul Slauenwhite CLA 2009-06-30 13:58:19 EDT
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.