This Bugzilla instance is deprecated, and most Eclipse projects now use GitHub or Eclipse GitLab. Please see the deprecation plan for details.
Bug 259993 - em.find() hangs up in some situations on WebSphere 7.0.0.1
Summary: em.find() hangs up in some situations on WebSphere 7.0.0.1
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Eclipselink (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P2 blocker (vote)
Target Milestone: ---   Edit
Assignee: Michael OBrien CLA
QA Contact:
URL: http://wiki.eclipse.org/EclipseLink/B...
Whiteboard:
Keywords:
Depends on: 238842
Blocks: 261671 264768 326097
  Show dependency tree
 
Reported: 2009-01-05 15:17 EST by Yiping Zhao CLA
Modified: 2022-06-09 10:08 EDT (History)
5 users (show)

See Also:


Attachments
Alt#1 patch (for discussion and unblocking only) until Alt#3 is worked out (10.83 KB, patch)
2009-01-21 09:30 EST, Michael OBrien CLA
no flags Details | Diff
Hanging tests 7 + 1 for UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent (27.31 KB, text/plain)
2009-01-28 16:35 EST, Michael OBrien CLA
no flags Details
eclipselink core patch (pre code-review) (34.74 KB, patch)
2009-02-03 23:43 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink core patch (pre-code review) (6.74 KB, patch)
2009-02-09 19:19 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink jpa test (14.53 KB, patch)
2009-02-10 12:15 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink core patch (code review 2) (7.32 KB, patch)
2009-02-10 13:43 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink core patch (code review 3) (12.11 KB, patch)
2009-02-10 16:49 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink core patch (code review 3b) - mergeManager NPE check on was7 afterCompletion callback (12.35 KB, patch)
2009-02-11 15:08 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink core patch (code review 4) - 1 of 3 sets of lockThread kept in writeLockManager.acquire*() functions (13.76 KB, patch)
2009-02-11 16:41 EST, Michael OBrien CLA
no flags Details | Diff
eclipselink jpa test (extra tests and setup was missing table creation) (8.91 KB, patch)
2009-02-13 11:59 EST, Michael OBrien CLA
no flags Details | Diff
EclipseLink full trunk patch (off rev 3439) (34.47 KB, patch)
2009-02-17 11:42 EST, Michael OBrien CLA
no flags Details | Diff
EclipseLink full trunk patch (off rev 3439) (34.47 KB, patch)
2009-02-17 11:43 EST, Michael OBrien CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yiping Zhao CLA 2009-01-05 15:17:50 EST
It seems that the hung up appears when generated id is not in the testcase in which the em.find() resides. 

It hung up:

1. At "dt = em.find(DateTime.class, datetimeId);" in testNullifySqlDate() in org.eclipse.persistence.testing.tests.jpa.datetime.NullBindingJUnitTestCase.java. 

2. At "OneToOneVirtualAttributeHolder holder = createEntityManager().find(OneToOneVirtualAttributeHolder.class, id);" in testReadVirtualAttribute() in org.eclipse.persistence.testing.tests.jpa.relationships.VirtualAttributeTestSuite.java

But it doesn't hung up:

1. At "Customer newOwen = em.find(Customer.class, owenId);" in selfReferencingManyToManyCreateTest() in org.eclipse.persistence.testing.tests.jpa.relationships.UniAndBiDirectionalMappingTestSuite.java

Stacktrace in server log:

[1/5/09 9:25:53:531 EST] 00000004 TimeoutManage I   WTRN0006W: Transaction 0000011EA72DA6AB00000002000000029D3259B34B45596246888242073244077CB75B960000011EA72DA6AB00000002000000029D3259B34B45596246888242073244077CB75B9600000001 has timed out after 120 seconds.
[1/5/09 9:25:53:531 EST] 00000004 TimeoutManage I   WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[ORB.thread.pool : 0,5,main]. The stack trace of this thread when the timeout occurred was: 
	java.lang.Object.wait(Native Method)
	java.lang.Object.wait(Object.java:196)
	org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:146)
	org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:110)
	org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.checkCacheForObject(ExpressionQueryMechanism.java:730)
	org.eclipse.persistence.queries.ReadObjectQuery.checkEarlyReturnImpl(ReadObjectQuery.java:230)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.checkEarlyReturn(ObjectLevelReadQuery.java:628)
	org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:599)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:892)
	org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:398)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:956)
	org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2651)
	org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1181)
	org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1165)
	org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1125)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.executeQuery(EntityManagerImpl.java:562)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.findInternal(EntityManagerImpl.java:514)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:429)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:337)
	com.ibm.ws.jpa.management.JPATxEmInvocation.find(JPATxEmInvocation.java:211)
	com.ibm.ws.jpa.management.JPAEntityManager.find(JPAEntityManager.java:175)
	org.eclipse.persistence.testing.tests.jpa.datetime.NullBindingJUnitTestCase.testNullifySqlDate(NullBindingJUnitTestCase.java:101)
	sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45)
	sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	java.lang.reflect.Method.invoke(Method.java:599)
	junit.framework.TestCase.runTest(TestCase.java:168)
	junit.framework.TestCase.runBare(TestCase.java:134)
	org.eclipse.persistence.testing.framework.junit.JUnitTestCase.runBareServer(JUnitTestCase.java:397)
	org.eclipse.persistence.testing.framework.server.TestRunnerBean.runTest(TestRunnerBean.java:87)
	org.eclipse.persistence.testing.framework.server.EJSRemote0SLTestRunner_ed643271.runTest(EJSRemote0SLTestRunner_ed643271.java)
	org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie.runTest(_EJSRemote0SLTestRunner_ed643271_Tie.java)
	org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie._invoke(_EJSRemote0SLTestRunner_ed643271_Tie.java)
	com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:622)
	com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:475)
	com.ibm.rmi.iiop.ORB.process(ORB.java:504)
	com.ibm.CORBA.iiop.ORB.process(ORB.java:1571)
	com.ibm.rmi.iiop.Connection.respondTo(Connection.java:2771)
	com.ibm.rmi.iiop.Connection.doWork(Connection.java:2640)
	com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:63)
	com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
	com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1527)
Comment 1 Yiping Zhao CLA 2009-01-06 14:26:47 EST
Since the bug stops me running all applicable tests on WebSphere, so I change the pripority from 3 to 2.
Comment 2 Yiping Zhao CLA 2009-01-06 15:22:52 EST
change the severity and pripority according to Tom's email.
Comment 3 Michael OBrien CLA 2009-01-13 10:30:42 EST
Michael O'Brien starting reproduction and fix on 20090113
Comment 4 Michael OBrien CLA 2009-01-15 17:29:20 EST
1) Reproduced the "hanging" behavior.  The test suite is still stuck after 60 min.  Instead of figuring out how to run WebSphere in debug mode....

java -Xdebug -Xnoagent -Djava.compiler=none -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005 -jar

....so I can attach to a debug port remotely from Eclipse, I will transition my WebLogic EAR to run this behavior from Eclipse so I can debug with/without the WAS 6.1 server plugin.

Also, the IBM Jconsole implementation is unable to attach the the WAS process as I normally do with WebLogic.

2) We also need another shared library created for jpa\plugins\javax.persistence_2.0_preview.jar in the same manner as the one created for eclipselink.jar below

configWebSphere.py
library = AdminConfig.create('Library', n1, [['name', 'eclipselinkjpa'],
['classPath', '%%eclipselinkLibDir%%/eclipselink.jar']])

3) I entered a separate bug so that we document the process of getting the tests running on WebSphere on Wiki.
http://bugs.eclipse.org/261235

websphere.properties
test.properties
jndi.properties
configWebSphere.py

The last logs I see in WebSphere is the expected JDBC table creates...

[1/15/09 16:23:49:009 EST] 00000021 SystemOut     O [EL Warning]: 2009.01.15 16:23:49.009--ServerSession(1754556564)--Exception [EclipseLink-4002] (Eclipse Persistence Services - 1.1.0 (Build 20090113)): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: ORA-00955: name is already used by an existing object
Error Code: 955
Call: CREATE TABLE CMP3_ENTITYD_SEQ (SEQ_NAME VARCHAR2(50) NOT NULL, SEQ_COUNT NUMBER(38) NULL, PRIMARY KEY (SEQ_NAME))
Query: DataModifyQuery(sql="CREATE TABLE CMP3_ENTITYD_SEQ (SEQ_NAME VARCHAR2(50) NOT NULL, SEQ_COUNT NUMBER(38) NULL, PRIMARY KEY (SEQ_NAME))")
Comment 5 Michael OBrien CLA 2009-01-15 17:48:46 EST
I restarted to force finalize() and this is when the timeout message appeared.
A little different message than "TimeoutManage" but in the same place in UnitOfWorkIdentityMapAccessor.java:146

It looks like we are experiencing a deadlock in cacheKey.getMutex().wait(5) // milliseconds

protected Object getAndCloneCacheKeyFromParent(Vector primaryKey, Class theClass, boolean shouldReturnInvalidatedObjects, ClassDescriptor descriptor) {
...
        if ((cacheKey != null) && (shouldReturnInvalidatedObjects || !descriptor.getCacheInvalidationPolicy().isInvalidated(cacheKey))) {
            synchronized (cacheKey.getMutex()) {
                //if the object in the cachekey is null but the key is acquired then
                //someone must be rebuilding it or creating a new one.  Sleep until
                // it's finished. A plain wait here would be more efficient but we may not
                // get notified for quite some time (ie deadlock) if the other thread
                //is building the object.  Must wait and not sleep in order for the monitor to be released
                objectFromCache = cacheKey.getObject();
                try {
                    while (cacheKey.isAcquired() && (objectFromCache == null)) {
146---->                 cacheKey.getMutex().wait(5);
                    }
                } catch (InterruptedException ex) {
                }
                if (objectFromCache == null) {
                    return null;
                }
            }
        } else {
            return null;
        }



------stacktrace-----------
Error Code: 955
Call: CREATE TABLE CMP3_ENTITYD_SEQ (SEQ_NAME VARCHAR2(50) NOT NULL, SEQ_COUNT NUMBER(38) NULL, PRIMARY KEY (SEQ_NAME))
Query: DataModifyQuery(sql="CREATE TABLE CMP3_ENTITYD_SEQ (SEQ_NAME VARCHAR2(50) NOT NULL, SEQ_COUNT NUMBER(38) NULL, PRIMARY KEY (SEQ_NAME))")
[1/15/09 16:36:15:781 EST] 00000016 ThreadMonitor W   WSVR0605W: Thread "ORB.thread.pool : 0" (00000021) has been active for 746725 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:196)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:146)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:110)
    at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.checkCacheForObject(ExpressionQueryMechanism.java:730)
    at org.eclipse.persistence.queries.ReadObjectQuery.checkEarlyReturnImpl(ReadObjectQuery.java:229)
    at org.eclipse.persistence.queries.ObjectLevelReadQuery.checkEarlyReturn(ObjectLevelReadQuery.java:627)
    at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:613)
    at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:891)
    at org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:397)
    at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:954)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2651)
    at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1181)
    at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1165)
    at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1125)
    at org.eclipse.persistence.internal.jpa.EntityManagerImpl.executeQuery(EntityManagerImpl.java:571)
    at org.eclipse.persistence.internal.jpa.EntityManagerImpl.findInternal(EntityManagerImpl.java:516)
    at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:431)
    at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:339)
    at com.ibm.ws.jpa.management.JPATxEmInvocation.find(JPATxEmInvocation.java:211)
    at com.ibm.ws.jpa.management.JPAEntityManager.find(JPAEntityManager.java:175)
    at org.eclipse.persistence.testing.tests.jpa.relationships.VirtualAttributeTestSuite.testReadVirtualAttribute(VirtualAttributeTestSuite.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:599)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at org.eclipse.persistence.testing.framework.junit.JUnitTestCase.runBareServer(JUnitTestCase.java:391)
    at org.eclipse.persistence.testing.framework.server.TestRunnerBean.runTest(TestRunnerBean.java:87)
    at org.eclipse.persistence.testing.framework.server.EJSRemote0SLTestRunner_ed643271.runTest(EJSRemote0SLTestRunner_ed643271.java)
    at org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie.runTest(_EJSRemote0SLTestRunner_ed643271_Tie.java)
    at org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie._invoke(_EJSRemote0SLTestRunner_ed643271_Tie.java)
    at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:622)
    at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:475)
    at com.ibm.rmi.iiop.ORB.process(ORB.java:504)
    at com.ibm.CORBA.iiop.ORB.process(ORB.java:1571)
    at com.ibm.rmi.iiop.Connection.respondTo(Connection.java:2771)
    at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2640)
    at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:63)
    at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1527)
[1/15/09 17:36:22:877 EST] 00000019 RoleBasedAuth A   SECJ0305I: The role-based authorization check failed for admin-authz operation Server:getProcessType.  The user UNAUTHENTICATED (unique ID: unauthenticated) was not granted any of the following required roles: deployer, operator, configurator, monitor, administrator, adminsecuritymanager, auditor.
Comment 6 Michael OBrien CLA 2009-01-16 09:45:00 EST
Just out of curiosity I left the treads running on the server - they never returned in 11 hours. 

[1/15/09 17:48:15:936 EST] 0000000e ThreadMonitor W   WSVR0605W: Thread "ORB.thread.pool : 1" (00000022) has been active for 670586 milliseconds and may be hung.  There is/are 2 thread(s) in total in the server that may be hung.

Q) Do we see this in WebSphere 6.1?
Comment 7 Michael OBrien CLA 2009-01-19 23:01:36 EST
The alt #1 workaround allows the tests to continue and emit the base server exception below:

    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Source (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  minor code: 896  completed: No)


Also, there are (2) two nested deadlocks in our code.
The second deadlock occurs periodically after the first has been fixed during testModifyItem() in ConcurrencyManager.releasesDeferredLock()

1/19/09 16:17:33:719 EST] 00000019 TimeoutManage I   WTRN0006W: Transaction 0000011EF0BF93D7000000010000000F3FC544039A0DB96F61B909BB432079E1295E672A0000011EF0BF93D7000000010000000F3FC544039A0DB96F61B909BB432079E1295E672A00000001 has timed out after 120 seconds.
[1/19/09 16:17:33:719 EST] 00000019 TimeoutManage I   WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[ORB.thread.pool : 0,5,main]. The stack trace of this thread when the timeout occurred was: 
	java.lang.Thread.sleep(Native Method)
	java.lang.Thread.sleep(Thread.java:850)
	org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseDeferredLock(ConcurrencyManager.java:429)
	org.eclipse.persistence.internal.helper.ConcurrencyManager.checkDeferredLock(ConcurrencyManager.java:193)
	org.eclipse.persistence.internal.identitymaps.CacheKey.checkDeferredLock(CacheKey.java:176)
	org.eclipse.persistence.internal.identitymaps.IdentityMapManager.getFromIdentityMapWithDeferredLock(IdentityMapManager.java:686)
	org.eclipse.persistence.internal.sessions.IdentityMapAccessor.getFromIdentityMapWithDeferredLock(IdentityMapAccessor.java:424)
	org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.checkCacheForObject(ExpressionQueryMechanism.java:728)
	org.eclipse.persistence.queries.ReadObjectQuery.checkEarlyReturnImpl(ReadObjectQuery.java:229)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.checkEarlyReturn(ObjectLevelReadQuery.java:620)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.checkEarlyReturn(ObjectLevelReadQuery.java:644)
	org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:613)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:891)
	org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:397)
	org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:954)
	org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2651)
	org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1181)
	org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1165)
	org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1125)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.executeQuery(EntityManagerImpl.java:571)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.findInternal(EntityManagerImpl.java:516)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:431)
	org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:339)
	com.ibm.ws.jpa.management.JPATxEmInvocation.find(JPATxEmInvocation.java:211)
	com.ibm.ws.jpa.management.JPAEntityManager.find(JPAEntityManager.java:175)
	org.eclipse.persistence.testing.tests.jpa.relationships.RelationshipModelJUnitTestSuite.testModifyItem(RelationshipModelJUnitTestSuite.java:119)
	sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45)
	sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	java.lang.reflect.Method.invoke(Method.java:599)
	junit.framework.TestCase.runTest(TestCase.java:154)
	junit.framework.TestCase.runBare(TestCase.java:127)
	org.eclipse.persistence.testing.framework.junit.JUnitTestCase.runBareServer(JUnitTestCase.java:391)
	org.eclipse.persistence.testing.framework.server.TestRunnerBean.runTest(TestRunnerBean.java:87)
	org.eclipse.persistence.testing.framework.server.EJSRemote0SLTestRunner_ed643271.runTest(EJSRemote0SLTestRunner_ed643271.java)
	org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie.runTest(_EJSRemote0SLTestRunner_ed643271_Tie.java)
	org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie._invoke(_EJSRemote0SLTestRunner_ed643271_Tie.java)
	com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:622)
 

        while (true) {
            // 2612538 - the default size of Map (32) is appropriate
            Map recursiveSet = new IdentityHashMap();
            if (isBuildObjectOnThreadComplete(currentThread, recursiveSet)) {// Thread job done.
                lockManager.releaseActiveLocksOnThread();
                removeDeferredLockManager(currentThread);
                AbstractSessionLog.getLog().log(SessionLog.FINER, "deferred_locks_released", currentThread.getName());
                return;
            } else {// Not done yet, wait and check again.
                try {
---->                Thread.sleep(1);
                } catch (InterruptedException ignoreAndContinue) {
                }
            }
        }
Comment 8 Michael OBrien CLA 2009-01-21 09:00:12 EST
How to remote debug EclipseLink on WebSphere 7 using Eclipse (without an Eclipse Server Plugin for WebSphere)

http://wiki.eclipse.org/EclipseLink/Examples/JPA/WebSphere7_Web_Tutorial#Remote_Debugging_EclipseLink_in_WebSphere_7.0_from_Eclipse_3.4

Comment 9 Michael OBrien CLA 2009-01-21 09:30:23 EST
Created attachment 123234 [details]
Alt#1 patch (for discussion and unblocking only) until Alt#3 is worked out

This patch is for internal discussion only and will not be checked in until a full fix addressing the deadlock issue is discussed in alt #3

http://wiki.eclipse.org/EclipseLink/Bugs/259993#Alternative_1:_Short_Fix:_Do_Not_Wait_Indefinitely
Comment 10 Michael OBrien CLA 2009-01-22 09:27:59 EST
This issue stays as Target Milestone=1.1 until we determine the severity of the underlying issue - starting with this section.

http://wiki.eclipse.org/EclipseLink/Bugs/259993#The_activeThread_on_release.28.29_is_different_than_the_one_on_acquire.28.29
Comment 11 Michael OBrien CLA 2009-01-22 14:06:31 EST
As of an update from 3202 to 3266 I am unable to reproduce the deadlock in this bug anymore.  Both breakpoints are no longer hit.  If I debug the acquire() and release() code - it works fine now.

The change for bug# 261910 affected the same areas around readObjectQuiery() that were causing a deadlock that this bug# 259993 was experiencing.

It looks like rev 3266 (20090122:0934) may have fixed this deadlock that occurs on a readQuery inside an em.find() after a recent persist/merge of the entity being referenced.
I am unable to reproduce the deadlock that I could debug just before an SVN update to 3266 (I was previously at rev 3202) 

>20090122:1048: merging with deadlock during readlock during merge 261910
http://fisheye2.atlassian.com/changelog/eclipselink/?cs=3266

I am retesting all configurations, but the current configuration that works is
- compile test and eclipselink.jar with the IBM J9 jvm in WebSphere 7
- run server in debug mode
- run tests with/without eclipse debugger attached

Previously before 3266 I would hit the following breakpoint in an infinite loop until I modified depth --> 0 in the Eclipse debugger.

                  while (cacheKey.isAcquired() && (objectFromCache == null)) {
                        cacheKey.getMutex().wait(MUTEX_STILL_ACQUIRED_WAIT_MILLISECONDS_TIME);
                    }


	mutex	ConcurrencyManager  (id=287)	
		activeThread	ThreadPool$Worker  (id=133)	
		depth	1	
		lockedByMergeManager	true	
		numberOfReaders	0	
		numberOfWritersWaiting	0	
		ownerCacheKey	HardCacheWeakIdentityMap$ReferenceCacheKey  (id=288)	


-------------test results---------------------
>prereq:
   - I installed as a service - but it is still scriptable with server-start/stop
   - I installed with no admin security (no more popups)
   - I went into the admin console and selected the debug flag - the server runs a bit slower and can be attached to by a debug client
   - also I compiled the test source and eclipselink.jar using the J9 jvm from websphere
     I will try with regular SUN JDK 1.6.0_07 compiled code


>With Eclipse Debugger attached - using F8 step through 

server-run-test:
     [echo] Running test eclipselink-relationships-model on websphere
Warning: Reference compile.server.path has not been set at runtime, but was found during
build file parsing, attempting to resolve. Future versions of Ant may support
 referencing ids defined in non-executed targets.
Warning: Reference compile.path has not been set at runtime, but was found during
build file parsing, attempting to resolve. Future versions of Ant may support
 referencing ids defined in non-executed targets.
    [junit] Running org.eclipse.persistence.testing.tests.jpa.relationships.ServerTestSuite
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 153.484 sec
    [junit] Test org.eclipse.persistence.testing.tests.jpa.relationships.ServerTestSuite FAILED

Total time: 6 minutes 17 seconds


>With no debugger attached - only WebSphere server running in debug mode
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 18.095 sec

>stop server
>restart server

>now I hang again on the following (I did a build but I am turning off auto-compilation in eclipse and rebuilding)
[1/22/09 12:06:30:643 EST] 00000019 SystemOut     O [EL Finest]: 2009.01.22 12:06:30.643--UnitOfWork(970209748)--Thread(Thread[ORB.thread.pool : 0,5,main])--Execute query ReadObjectQuery(referenceClass=OneToOneVirtualAttributeHolder sql="SELECT O_O_VIRTUALID, VIRTUAL_ID FROM O_O_VIRTUAL WHERE (O_O_VIRTUALID = ?)")

>attach debugger - no break points hit

>rerun test with debugger attached - working

>rerun test with debugger attached but all breakpoints disabled except at the 2 deadlock while loops - no bp hit - working

>rerun test with debugger above 2nd time to verify theory of every second run hangs - working
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 18.673 sec

>rerun test with debugger detached again - working
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 18.283 sec

>stop server
>restart server
>rerun test without debugger detached - working
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 18.174 sec
>rerun test without debugger detached - working
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 13.9 sec

>stop server
>restart server
>rerun test without debugger detached - working
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 17.102 sec

>recompile test/eclipselink.jar using the SUN JDK 1.6.0_07 instaed of the IBM J9 1.6.0 JVM
- pending

I have asked Yiping to reverify with 3266.





Comment 12 Michael OBrien CLA 2009-01-22 15:12:38 EST
Correction I was previously failing with svn rev 3220 (16 Jan 2009), 3266 is working fine.
>stop server
>recompile test/eclipselink.jar using the SUN JDK 1.6.0_07 instead of the IBM J9 1.6.0 JVM (we still run the server on J9 though)
>restart server
>rerun test (server in debug mode) without debugger detached - working
    [junit] Tests run: 13, Failures: 0, Errors: 2, Time elapsed: 28.303 sec
>stop server
>We should reverify that we are failing with the current test code using an old eclipselink.jar
take old jar from 3220 (14 Jan 2009) and retest - expect to hang with a deadlock
>restart server
>rerun test (server in debug mode) without debugger detached - hanging = OK ---> as expected on an old 3220 label
    [junit] Running org.eclipse.persistence.testing.tests.jpa.relationships.ServerTestSuite
Terminate batch job (Y/N)? y

>stop server
>restart server to recover deadlocked threads
>stop server
Comment 13 Michael OBrien CLA 2009-01-22 15:47:44 EST
After Yiping tested - it is verified that we no longer get the first deadlock 
in 
http://wiki.eclipse.org/EclipseLink/Bugs/259993#Timeout_during_UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent.28.29

but we still get the 2nd deadlock (that i intermittently saw only once) in 

http://wiki.eclipse.org/EclipseLink/Bugs/259993#Timeout_during_ConcurrencyManager.releaseDeferredLock.28.29

when running 
ant server-test-fieldaccess-advanced


I see the following new finer log message from 3226 at the end which allows us to continue and loop in ConcurrencyManager

[1/22/09 15:35:44:016 EST] 0000001e SystemOut     O [EL Finer]: 2009.01.22 15:35:44.016--UnitOfWork(1534811003)--Thread(Thread[ORB.thread.pool : 0,5,main])--begin unit of work commit
[1/22/09 15:35:44:031 EST] 0000001e SystemOut     O [EL Finer]: 2009.01.22 15:35:44.031--UnitOfWork(1534811003)--Thread(Thread[ORB.thread.pool : 0,5,main])--initialize identitymaps
[1/22/09 15:35:44:031 EST] 0000001e SystemOut     O [EL Finer]: 2009.01.22 15:35:44.031--UnitOfWork(1534811003)--Thread(Thread[ORB.thread.pool : 0,5,main])--TX afterCompletion callback, status=COMMITTED
[1/22/09 15:35:44:031 EST] 0000001e SystemOut     O [EL Finer]: 2009.01.22 15:35:44.031--ServerSession(282857692)--Thread(Thread[ORB.thread.pool : 0,5,main])--client acquired
[1/22/09 15:35:44:031 EST] 0000001e SystemOut     O [EL Finest]: 2009.01.22 15:35:44.031--UnitOfWork(259264372)--Thread(Thread[ORB.thread.pool : 0,5,main])--Execute query ReadAllQuery(referenceClass=Employee sql="SELECT t1.EMP_ID, t2.EMP_ID, t1.GENDER, t1.L_NAME, t1.PAY_SCALE, t1.ROOM_NUM, t2.SALARY, t1.STATUS, t1.VERSION, t1.F_NAME, t1.START_DATE, t1.END_DATE, t1.FORMER_COMPANY, t1.FORMER_START_DATE, t1.FORMER_END_DATE, t1.DEPT_ID, t1.ADDR_ID, t1.MANAGER_EMP_ID, t0.ID, t0.NAME FROM CMP3_FA_DEPT t0, CMP3_FA_SALARY t2, CMP3_FA_EMPLOYEE t1 WHERE ((t2.EMP_ID = t1.EMP_ID) AND (t0.ID (+) = t1.DEPT_ID))")
[1/22/09 15:35:44:031 EST] 0000001e SystemOut     O [EL Finest]: 2009.01.22 15:35:44.031--ServerSession(282857692)--Thread(Thread[ORB.thread.pool : 0,5,main])--reconnecting to external connection pool
[1/22/09 15:35:44:031 EST] 0000001e SystemOut     O [EL Fine]: 2009.01.22 15:35:44.031--ServerSession(282857692)--Connection(1384141440)--Thread(Thread[ORB.thread.pool : 0,5,main])--SELECT t1.EMP_ID, t2.EMP_ID, t1.GENDER, t1.L_NAME, t1.PAY_SCALE, t1.ROOM_NUM, t2.SALARY, t1.STATUS, t1.VERSION, t1.F_NAME, t1.START_DATE, t1.END_DATE, t1.FORMER_COMPANY, t1.FORMER_START_DATE, t1.FORMER_END_DATE, t1.DEPT_ID, t1.ADDR_ID, t1.MANAGER_EMP_ID, t0.ID, t0.NAME FROM CMP3_FA_DEPT t0, CMP3_FA_SALARY t2, CMP3_FA_EMPLOYEE t1 WHERE ((t2.EMP_ID = t1.EMP_ID) AND (t0.ID (+) = t1.DEPT_ID))
[1/22/09 15:35:44:047 EST] 0000001e SystemOut     O [EL Finest]: 2009.01.22 15:35:44.047--UnitOfWork(259264372)--Thread(Thread[ORB.thread.pool : 0,5,main])--Register the existing object Employee: 3834
--->[1/22/09 15:35:44:047 EST] 0000001e SystemOut     O [EL Finer]: 2009.01.22 15:35:44.047--Thread(Thread[ORB.thread.pool : 0,5,main])--Thread "ORB.thread.pool : 0" has acquired a deferred lock on object : null in order to avoid deadlock.


server-run-test:
     [echo] Running test eclipselink-advanced-field-access-model on websphere
Warning: Reference compile.server.path has not been set at runtime, but was found during
build file parsing, attempting to resolve. Future versions of Ant may support
 referencing ids defined in non-executed targets.
Warning: Reference compile.path has not been set at runtime, but was found during
build file parsing, attempting to resolve. Future versions of Ant may support
 referencing ids defined in non-executed targets.
    [junit] Running org.eclipse.persistence.testing.tests.jpa.fieldaccess.advanced.ServerTestSuite
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit]     >> SERVER (id=4773e3aa, host=mobrien-pc2.st-mdc.ca.oracle.com) TRACE START:
    [junit]     >>    javax.ejb.EJBTransactionRolledbackException
    [junit]     >>       at com.ibm.ejs.csi.BeanManaged.postInvoke(BeanManaged.java:329)
    [junit]     >>       at com.ibm.ejs.csi.TransactionControlImpl.postInvoke(TransactionControlImpl.java:561)
    [junit]     >>       at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:4512)
    [junit]     >>       at org.eclipse.persistence.testing.framework.server.EJSRemote0SLTestRunner_ed643271.runTest(EJS
Remote0SLTestRunner_ed643271.java)
    [junit]     >>       at org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie.runTes
t(_EJSRemote0SLTestRunner_ed643271_Tie.java)
    [junit]     >>       at org.eclipse.persistence.testing.framework.server._EJSRemote0SLTestRunner_ed643271_Tie._invok
e(_EJSRemote0SLTestRunner_ed643271_Tie.java)
    [junit]     >>       at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:622)
    [junit]     >>       at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:475)
    [junit]     >>       at com.ibm.rmi.iiop.ORB.process(ORB.java:504)
    [junit]     >>       at com.ibm.CORBA.iiop.ORB.process(ORB.java:1571)
    [junit]     >>       at com.ibm.rmi.iiop.Connection.respondTo(Connection.java:2771)
    [junit]     >>       at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2640)
    [junit]     >>       at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:63)
    [junit]     >>       at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
    [junit]     >>       at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1527)
    [junit]     >> SERVER (id=4773e3aa, host=mobrien-pc2.st-mdc.ca.oracle.com) TRACE END.
    [junit] )
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit]     org.omg.CORBA.MARSHAL: Unable to read value from underlying bridge : Mismatched serialization UIDs : Sou
rce (RepId RMI:junit.framework.AssertionFailedError:2E921E9DEB124A7F:0000000000000001) = 0000000000000001 whereas Target
 (RepId RMI:junit.framework.AssertionFailedError:048A3675A6FD5463:A02206A4C2A2FD62) = A02206A4C2A2FD62  vmcid: IBM  mino
r code: 896  completed: No)
    [junit] Exception Description: Cannot persist detached object [Employee: 3828].
    [junit] Class> org.eclipse.persistence.testing.models.jpa.fieldaccess.advanced.Employee Primary Key> [3828])
-->deadlock here



Summarizing: For some models that failed on both deadlocks - they fully pass, some other models only fail on the 2nd deadlock - so we have progress

Before 3226
-----------
Hangs
ant server-test-relationships
ant server-test-datetime

?
ant server-test-fieldaccess-advanced

After 3226
-----------
Hangs
ant server-test-fieldaccess-advanced

OK
ant server-test-relationships
ant server-test-datetime
Comment 14 Michael OBrien CLA 2009-01-22 16:46:04 EST
>3rd time through internalTestUpdateAllProjects

Daemon Thread [ORB.thread.pool : 1] (Suspended (breakpoint at line 434 in ConcurrencyManager))	
	ConcurrencyManager.releaseDeferredLock() line: 434	
	HardCacheWeakIdentityMap$ReferenceCacheKey(CacheKey).releaseDeferredLock() line: 413	
	ObjectBuilder.buildObject(ObjectBuildingQuery, AbstractRecord, AbstractSession, Vector, ClassDescriptor, JoinedAttributeManager) line: 769	
	ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuildingQuery, AbstractRecord, UnitOfWorkImpl, Vector, ClassDescriptor, JoinedAttributeManager) line: 574	
	ObjectBuilder.buildObjectInUnitOfWork(ObjectBuildingQuery, JoinedAttributeManager, AbstractRecord, UnitOfWorkImpl, Vector, ClassDescriptor) line: 544	
	ObjectBuilder.buildObject(ObjectBuildingQuery, AbstractRecord, JoinedAttributeManager) line: 485	
	ObjectBuilder.buildObject(ObjectLevelReadQuery, AbstractRecord) line: 437	
	ReadAllQuery(ObjectLevelReadQuery).buildObject(AbstractRecord) line: 571	
	ReadAllQuery.registerResultInUnitOfWork(Object, UnitOfWorkImpl, AbstractRecord, boolean) line: 903	
	ReadAllQuery.executeObjectLevelReadQuery() line: 489	
	ReadAllQuery(ObjectLevelReadQuery).executeDatabaseQuery() line: 930	
	ReadAllQuery(DatabaseQuery).execute(AbstractSession, AbstractRecord) line: 664	
	ReadAllQuery(ObjectLevelReadQuery).execute(AbstractSession, AbstractRecord) line: 891	
	ReadAllQuery.execute(AbstractSession, AbstractRecord) line: 457	
	ReadAllQuery(ObjectLevelReadQuery).executeInUnitOfWork(UnitOfWorkImpl, AbstractRecord) line: 954	
	RepeatableWriteUnitOfWork(UnitOfWorkImpl).internalExecuteQuery(DatabaseQuery, AbstractRecord) line: 2664	
	RepeatableWriteUnitOfWork(AbstractSession).executeQuery(DatabaseQuery, AbstractRecord, int) line: 1181	
	RepeatableWriteUnitOfWork(AbstractSession).executeQuery(DatabaseQuery, AbstractRecord) line: 1165	
	RepeatableWriteUnitOfWork(AbstractSession).executeQuery(DatabaseQuery, List) line: 1139	
	EJBQueryImpl.executeReadQuery() line: 392	
	EJBQueryImpl.getResultList() line: 592	
	EntityManagerJUnitTestSuite.internalTestUpdateAllProjects(Class) line: 3742	
	EntityManagerJUnitTestSuite.testUpdateAllSmallProjects() line: 3720	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 45	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 37	
	Method.invoke(Object, Object...) line: 599	
	EntityManagerJUnitTestSuite(TestCase).runTest() line: 154	
	EntityManagerJUnitTestSuite(TestCase).runBare() line: 127	
	EntityManagerJUnitTestSuite(JUnitTestCase).runBareServer() line: 391	
	TestRunnerBean.runTest(String, String, Properties) line: 87	
	EJSRemote0SLTestRunner_ed643271.runTest(String, String, Properties) line: not available	
	_EJSRemote0SLTestRunner_ed643271_Tie.runTest(InputStream, ResponseHandler) line: not available	
	_EJSRemote0SLTestRunner_ed643271_Tie._invoke(String, InputStream, ResponseHandler) line: not available	
	WSExtendedServerDelegate(ServerDelegate).dispatchInvokeHandler(InterceptorManager, ServerRequestInfoImpl, String, Object, ServerRequestReader) line: 622	
	WSExtendedServerDelegate(ServerDelegate).dispatch(ServerRequest) line: 475	
	ORB(ORB).process(ServerRequest) line: 504	
	ORB.process(ServerRequest) line: 1571	
	Connection.respondTo(RequestHandler, IIOPReader) line: 2771	
	Connection.doWork(RequestHandler, IIOPReader) line: 2640	
	WorkUnitImpl.doWork() line: 63	
	PooledThread.run() line: 118	
	ThreadPool$Worker.run() line: 1527	


this	ConcurrencyManager  (id=169)	
	activeThread	ThreadPool$Worker  (id=96)	
	depth	1	
	lockedByMergeManager	false	
	numberOfReaders	0	
	numberOfWritersWaiting	0	
	ownerCacheKey	HardCacheWeakIdentityMap$ReferenceCacheKey  (id=170)	
recursiveSet	IdentityHashMap<K,V>  (id=185)	
currentThread	ThreadPool$Worker  (id=162)	
lockManager	DeferredLockManager  (id=187)	
	activeLocks	Vector<E>  (id=192)	
	deferredLocks	Vector<E>  (id=200)	
	isThreadComplete	true	
	threadDepth	1	
depth	1	
iterations	0	


[1/22/09 16:17:10:366 EST] 0000001b SystemOut     O [EL Finer]: 2009.01.22 16:17:10.366--Thread(Thread[ORB.thread.pool : 1,5,main])--Thread "ORB.thread.pool : 1" has acquired a deferred lock on object : org.eclipse.persistence.testing.models.jpa.fieldaccess.advanced.LargeProject@1da81da8 in order to avoid deadlock.


ConcurrencyManager 
public static boolean isBuildObjectOnThreadComplete(Thread thread, Map recursiveSet) {
                // the active thread may be set to null at anypoint
                // if added for CR 2330 
                if (activeThread != null) {
                    DeferredLockManager currentLockManager = getDeferredLockManager(activeThread);
                    if (currentLockManager == null) {
-->                     return false;



public void releaseDeferredLock() throws ConcurrencyException {
        while (true) {
            // 2612538 - the default size of Map (32) is appropriate
            Map recursiveSet = new IdentityHashMap();
            if (isBuildObjectOnThreadComplete(currentThread, recursiveSet)) {// Thread job done.
                lockManager.releaseActiveLocksOnThread();
                removeDeferredLockManager(currentThread);
                AbstractSessionLog.getLog().log(SessionLog.FINER, "deferred_locks_released", currentThread.getName());
                return;
            } else {// Not done yet, wait and check again.
                    try {
-->                     Thread.sleep(DEFERRED_LOCK_SLEEP_MILLISECONDS_TIME);
                        AbstractSessionLog.getLog().log(SessionLog.FINEST, "timeout_waiting_for_deferred_locks_release", iterations++ );//currentThread.getName());                        
                    } catch (InterruptedException ignoreAndContinue) {
                        ignoreAndContinue.printStackTrace();
                    }
            }
        }


Again, same issue here except that we are no longer lockedByMergeManager=true, but our depth is still > 0 so isAcquired() will return true.
If while we are debugging we set depth=0 - we exit the loop

we get
    [junit] Exception Description: Max number of attempts to lock object: org.eclipse.persistence.testing.models.jpa.fie
ldaccess.advanced.LargeProject@1da81da8 exceded.  Failed to clone the object.)

2) However late we get the original first deadlock on UnitOfWorkItentityMapAccessor for a getReference call (which is more restrictive than a find())
    public void testSetFieldForPropertyAccessWithRefresh() {
...
 
        employee = em.getReference(Employee.class, employee.getId());


We need the same type of fix that does not lock the mutex during a read in getReference()

mutex	ConcurrencyManager  (id=593)	
	activeThread	ThreadPool$Worker  (id=96)	
	depth	1	
	lockedByMergeManager	true	
	numberOfReaders	0	
	numberOfWritersWaiting	0	
	ownerCacheKey	HardCacheWeakIdentityMap$ReferenceCacheKey  (id=569)	


Daemon Thread [ORB.thread.pool : 0] (Suspended (breakpoint at line 154 in UnitOfWorkIdentityMapAccessor))	
	UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(Vector, Class, boolean, ClassDescriptor) line: 154	
	UnitOfWorkIdentityMapAccessor.getFromIdentityMap(Vector, Class, boolean, ClassDescriptor, boolean) line: 116	
	UnitOfWorkIdentityMapAccessor(IdentityMapAccessor).getFromIdentityMap(Vector, Class, boolean) line: 327	
	UnitOfWorkIdentityMapAccessor(IdentityMapAccessor).getFromIdentityMap(Vector, Class) line: 310	
	RepeatableWriteUnitOfWork(UnitOfWorkImpl).getReference(Class, Object) line: 5375	
	EntityManagerImpl.getReference(Class<T>, Object) line: 973	
	JPATxEmInvocation.getReference(Class<T>, Object) line: 262	
	JPATxEntityManager(JPAEntityManager).getReference(Class<T>, Object) line: 188	
	EntityManagerJUnitTestSuite.testSetFieldForPropertyAccessWithRefresh() line: 3346	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 45	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 37	
	Method.invoke(Object, Object...) line: 599	
	EntityManagerJUnitTestSuite(TestCase).runTest() line: 154	
	EntityManagerJUnitTestSuite(TestCase).runBare() line: 127	
	EntityManagerJUnitTestSuite(JUnitTestCase).runBareServer() line: 391	
	TestRunnerBean.runTest(String, String, Properties) line: 87	
	EJSRemote0SLTestRunner_ed643271.runTest(String, String, Properties) line: not available	
	_EJSRemote0SLTestRunner_ed643271_Tie.runTest(InputStream, ResponseHandler) line: not available	
	_EJSRemote0SLTestRunner_ed643271_Tie._invoke(String, InputStream, ResponseHandler) line: not available	
	WSExtendedServerDelegate(ServerDelegate).dispatchInvokeHandler(InterceptorManager, ServerRequestInfoImpl, String, Object, ServerRequestReader) line: 622	
	WSExtendedServerDelegate(ServerDelegate).dispatch(ServerRequest) line: 475	
	ORB(ORB).process(ServerRequest) line: 504	
	ORB.process(ServerRequest) line: 1571	
	Connection.respondTo(RequestHandler, IIOPReader) line: 2771	
	Connection.doWork(RequestHandler, IIOPReader) line: 2640	
	WorkUnitImpl.doWork() line: 63	
	PooledThread.run() line: 118	
	ThreadPool$Worker.run() line: 1527	

Comment 15 Michael OBrien CLA 2009-01-23 09:58:15 EST
There are 2 deadlocks here
1) readObjectQuery without merge changes
2) readObjectQuery with merge changes

The deadlock in (1) has been fixed in bug# 261910 in rev 3266 yesterday 20090122 and was part of bug# 259993.
http://bugs.eclipse.org/261910

The deadlock in (2) is being fixed as the 2nd part of 259993 - in progress
see:
http://bugs.eclipse.org/259993
http://wiki.eclipse.org/EclipseLink/Bugs/259993

See possibly related new GlassFish bug# 262157

Comment 16 Michael OBrien CLA 2009-01-28 16:29:41 EST
Rev 3322 : Retesting with http://fisheye2.atlassian.com/changelog/eclipselink/?cs=3322
The fix in ref 3322 has reversed test results. Where before we were only hanging in ConcurrencyManager, we now only hang in UnitOfWorkIdentityMapAccessor. 

ant server-test-fieldaccess-advanced
We only hang on 1 test instead of 2 - in UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(). 
We no longer hang on the secondary timeout in ConcurrencyManager.releaseDeferredLock() 

ant server-test-relationships
We hang on 7 tests - all in the same place in UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(). 
We no longer hang on ConcurrencyManager.releaseDeferredLock() 

Summary:
No longer occurs
http://wiki.eclipse.org/EclipseLink/Bugs/259993#Timeout_during_ConcurrencyManager.releaseDeferredLock.28.29

Still occurring - depth on single thread remains == 1
http://wiki.eclipse.org/EclipseLink/Bugs/259993#Timeout_during_UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent.28.29
Comment 17 Michael OBrien CLA 2009-01-28 16:35:30 EST
Created attachment 124091 [details]
Hanging tests 7 + 1 for UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent

The following stack and variable trace is for each of the 7 (server-test-fieldaccess-advanced) and 1 (server-test-relationships) remaining hung test threads.
- we no longer see 2 threads causing issues with ConcurrencyManager.releaseDeferredLock()
- when debugging the tests, we exit the hung while loop by manually changing mutex.depth from 1 to 0 so we can continue to the next test.
Comment 18 Michael OBrien CLA 2009-01-28 18:00:58 EST
New Deadlock #3
I had the following suite commented - when it is enabled we now get a new 3rd type of hang waiting indefinitely for a notify() or notifyAll() on the current thread during a find/ReadObjectQuery

        suite.addTest(VirtualAttributeTestSuite.suite());

Daemon Thread [ORB.thread.pool : 1] (Running)	
Daemon Thread [ORB.thread.pool : 0] (Suspended)	
	Object.wait(long, int) line: not available [native method]	
	ConcurrencyManager(Object).wait() line: 167	
	ConcurrencyManager.acquireReadLock() line: 242 [local variables unavailable]	
	ConcurrencyManager.checkReadLock() line: 230	
	HardCacheWeakIdentityMap$ReferenceCacheKey(CacheKey).checkReadLock() line: 180	
	IdentityMapManager.getFromIdentityMap(Vector, Class, boolean, ClassDescriptor) line: 610	
	IdentityMapAccessor.getFromIdentityMap(Vector, Class, boolean, ClassDescriptor) line: 345	
	ExpressionQueryMechanism.checkCacheForObject(AbstractRecord, AbstractSession) line: 730	
	ReadObjectQuery.checkEarlyReturnImpl(AbstractSession, AbstractRecord) line: 229	
	ReadObjectQuery(ObjectLevelReadQuery).checkEarlyReturn(AbstractSession, AbstractRecord) line: 620	
	ReadObjectQuery(ObjectLevelReadQuery).checkEarlyReturn(AbstractSession, AbstractRecord) line: 644	
	ReadObjectQuery(DatabaseQuery).execute(AbstractSession, AbstractRecord) line: 613	
	ReadObjectQuery(ObjectLevelReadQuery).execute(AbstractSession, AbstractRecord) line: 891	
	ReadObjectQuery.execute(AbstractSession, AbstractRecord) line: 397	
	ReadObjectQuery(ObjectLevelReadQuery).executeInUnitOfWork(UnitOfWorkImpl, AbstractRecord) line: 954	
	RepeatableWriteUnitOfWork(UnitOfWorkImpl).internalExecuteQuery(DatabaseQuery, AbstractRecord) line: 2690	
	RepeatableWriteUnitOfWork(AbstractSession).executeQuery(DatabaseQuery, AbstractRecord, int) line: 1181	
	RepeatableWriteUnitOfWork(AbstractSession).executeQuery(DatabaseQuery, AbstractRecord) line: 1165	
	RepeatableWriteUnitOfWork(AbstractSession).executeQuery(DatabaseQuery) line: 1125	
	EntityManagerImpl.executeQuery(ReadObjectQuery, LockModeType, UnitOfWork) line: 626	
	EntityManagerImpl.findInternal(ClassDescriptor, AbstractSession, Object, LockModeType, Map) line: 570	
	EntityManagerImpl.find(Class<T>, Object, LockModeType, Map) line: 474	
	EntityManagerImpl.find(Class<T>, Object) line: 374	
	JPATxEmInvocation.find(Class<T>, Object) line: 211	
	JPATxEntityManager(JPAEntityManager).find(Class<T>, Object) line: 175	
	VirtualAttributeTestSuite.testReadVirtualAttribute() line: 79	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 45	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 37	
	Method.invoke(Object, Object...) line: 599	
	VirtualAttributeTestSuite(TestCase).runTest() line: 154	
	VirtualAttributeTestSuite(TestCase).runBare() line: 127	
	VirtualAttributeTestSuite(JUnitTestCase).runBareServer() line: 396	
	TestRunnerBean.runTest(String, String, Properties) line: 87	
	EJSRemote0SLTestRunner_ed643271.runTest(String, String, Properties) line: not available	
	_EJSRemote0SLTestRunner_ed643271_Tie.runTest(InputStream, ResponseHandler) line: not available	
	_EJSRemote0SLTestRunner_ed643271_Tie._invoke(String, InputStream, ResponseHandler) line: not available	
	WSExtendedServerDelegate(ServerDelegate).dispatchInvokeHandler(InterceptorManager, ServerRequestInfoImpl, String, Object, ServerRequestReader) line: 622	
	WSExtendedServerDelegate(ServerDelegate).dispatch(ServerRequest) line: 475	
	ORB(ORB).process(ServerRequest) line: 504	
	ORB.process(ServerRequest) line: 1571	
	Connection.respondTo(RequestHandler, IIOPReader) line: 2771	
	Connection.doWork(RequestHandler, IIOPReader) line: 2640	
	WorkUnitImpl.doWork() line: 63	
	PooledThread.run() line: 118	
	ThreadPool$Worker.run() line: 1527	


    public synchronized void acquireReadLock() throws ConcurrencyException {
        // Cannot check for starving writers as will lead to deadlocks.
        while ((this.activeThread != null) && (this.activeThread != Thread.currentThread())) {
            try {
-->holding         wait();
            } catch (InterruptedException exception) {
                throw ConcurrencyException.waitWasInterrupted(exception.getMessage());
            }
        }
        this.numberOfReaders++;
    }
Comment 19 Michael OBrien CLA 2009-01-29 12:18:41 EST
After several debug sessions here are some new facts:
- we intermittently fail with 2 types of deadlocks
- we always fail with deadlock type 1
- working around the deadlock using any type of workaround like the alt# 1 patch - causes a 3rd or 4th type of deadlock on a wait() for another thread later.

1) "Intermittently" there are 2 threads running, on occassion we only see a single thread and only see the 8 failures mentioned above.

- only UnitOfWorkIdentityMapAccessor deadlocks
Daemon Thread [RT=0:P=942386:O=0:WSTCPTransportConnection[addr=10.156.52.98,port=1511,local=2813]] (Running)	
Daemon Thread [ORB.thread.pool : 0] (Running)	

- ConcurrencyManager deadlocks
Daemon Thread [RT=0:P=942386:O=0:WSTCPTransportConnection[addr=10.156.52.98,port=1511,local=2813]] (Running)	
Daemon Thread [ORB.thread.pool : 0] (Running)	
Daemon Thread [RT=1:P=942386:O=0:WSTCPTransportConnection[addr=10.156.52.98,port=1512,local=9104]] (Running)	
Daemon Thread [ORB.thread.pool : 1] (Running)	

2) When a deadlock occurs on ConcurrencyManager, this is due to to the fact that the currentThread != the activeThread.
This agrees with our original theory that the acquire and release threads are different and that we are not checking the thread on release - like we are on acquire.

[1/29/09 10:56:05:127 EST] 00000025 SystemOut     O [EL Finer]: 2009-01-29 10:56:05.127--Thread(Thread[ORB.thread.pool : 0,5,main])--Thread "ORB.thread.pool : 0" has acquired a deferred lock on object : null in order to avoid deadlock.
[1/29/09 10:59:30:006 EST] 00000025 SystemOut     O __ConcurrencyManager.releaseDeferredLock() deadlock: currentThread: Thread[ORB.thread.pool : 0,5,main] activeThread: Thread[ORB.thread.pool : 1,5,main]


3) A 3rd or 4th type of deadlock on a wait() occurs as an affect of breaking out of the deadlock in 1 or 2 above because the thread is still not released - it will not occur under normal functioning.
In the javadoc for wait() in ConcurrencyManager.acquireDeferredLock():190 we state that "If another thread is also waiting on the acquire then a deadlock could occur" - this is what is happening here.

Two deadlocked threads (thread 0 isHung==true
---------------------------------------------
Daemon Thread [RT=0:P=942386:O=0:WSTCPTransportConnection[addr=10.156.52.98,port=1511,local=2813]] (Running)	
Daemon Thread [ORB.thread.pool : 0] (Suspended)	
	Object.wait(long, int) line: not available [native method]	
	ConcurrencyManager(Object).wait() line: 167	
	ConcurrencyManager.acquire(boolean) line: 89	
	ConcurrencyManager.acquire() line: 75 [local variables unavailable]	
	ConcurrencyManager.acquireDeferredLock() line: 198	
	EntityManagerJUnitTestSuite.testUpdateAllProjectsWithNullTeamLeader() line: 3913 [local variables unavailable]	
Daemon Thread [RT=1:P=942386:O=0:WSTCPTransportConnection[addr=10.156.52.98,port=1512,local=9104]] (Running)	
Daemon Thread [ORB.thread.pool : 1] (Suspended)	
	Object.wait(long, int) line: not available [native method]	

this	ThreadPool  (id=313)	
	activeThreads	1	
	maximumPoolSize_	50	
	minimumPoolSize_	10	
	name	"ORB.thread.pool" (id=329)	
	poolSize_	2	
	threads_	HashMap<K,V>  (id=333)	
		elementCount	2	
		elementData	HashMap$Entry<K,V>[16]  (id=357)	
			[1]	HashMap$Entry<K,V>  (id=363)	
				key	ThreadPool$Worker  (id=166)	
					isHung	false	
					name	"ORB.thread.pool : 1" (id=316)	
					threadNumber	"00000026" (id=319)	
					uniqueId	102	
			[8]	HashMap$Entry<K,V>  (id=367)	
				key	ThreadPool$Worker  (id=139)	
					isHung	true	
					lock	Thread$ThreadLock  (id=251)	
					name	"ORB.thread.pool : 0" (id=253)	
					threadNumber	"00000025" (id=376)	
					uniqueId	100	



- Currently debugging earlier in AbstractSynchronizationLister.afterCompletion()
Comment 20 Michael OBrien CLA 2009-02-03 23:43:02 EST
Created attachment 124639 [details]
eclipselink core patch (pre code-review)

There are 3 places where we have thread wait issues, here are the fixes in core.
I aligned these changes with exiting thread checks and wait timeouts already in the code.

We are seeing this thread wait behavior when we run get two different server threads for writing and reading.
1) A find after changes were already committed
Reproduction: ant server-test-relationships

Where:
UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent()

Fix:
Same fix as CacheKey.waitForObject() for a wait(10) call
http://fisheye2.atlassian.com/browse/eclipselink/trunk/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/identitymaps/CacheKey.java?r1=1817&r2=3322
We use CacheKey.MAX_WAIT_TRIES on the same wait(5) call

Effect:
throw same ConcurrencyException.maxTriesLockOnBuildObjectExceded


2) A find or getReference that causes pending changes to commit
Reproduction: ant server-test-relationships
Reproduction: ant server-test-fieldaccess-advanced

Where:
ConcurrencyManager.releaseDeferredLock()

Fix:
In WriteLockManager.acquireLocksForClone()
We use the same fix as WriteLockManager.acquireRequiredLocks() where we also verify the threads are different before a wait()
&& (activeCacheKey.getMutex().getActiveThread() != Thread.currentThread())

also
We preemptively check for different current/active threads and switch them on the after AbstractSynchronizationListener.afterCompletion() callback.
and log a warning

Effect:
- no more thread wait issue
- threads are switched and new activeThread is released
- warning emitted when threads were different on afterCompletion callback



3) A find that acquires a readLock with no pending changes
Same issue as
https://bugs.eclipse.org/bugs/show_bug.cgi?id=261910#c2

Reproduction: ant server-test-relationships

Where:
ConcurrencyManager.acquireReadLock() line: 243 

Fix:
Same fix as WriteLockManager.acquireLocksForClone() using MAXTRIES 
Except we throw a different maxTriesAcquireReadLockExceded a ConcurrencyException.maxTriesLockOnCloneExceded

Effect:
- we exit by throwing a maxTriesAcquireReadLockExceded

Other options:
-------------
I am only using option 2 below - so there is no configuration changes requires of developers.

2) switch activeThread to currentThread
                    this.activeThread = currentThread;
                    this.release();


3) For deferred lock deadlock - disable caching using the persistence.xml property "eclipselink.cache.shared.default"="false"

4) Configure cache isolation using
	James
    /** Reads and unit of work merges can occur concurrently. */
	getServerSession().getLogin().setCacheTransactionIsolation(DatabaseLogin.CONCURRENT_READ_WRITE);
	Gordon
    /** Writes to the cache (merge, object build/refresh will be synchronized
     * as will cache access (cloning) based on when access is required.
     */
	getServerSession().getLogin().setCacheTransactionIsolation(DatabaseLogin.SYNCRONIZED_OBJECT_LEVEL_READ_WRITE);

5) Disable deferred locks - only for debugging the cause - DeferredLockManager.SHOULD_USE_DEFERRED_LOCKS = false;
>result = 
6) Use lazy relationships with no join fetching (that requires deferred locks) - use batch reading
- but one of the tests only involves a single entity with an embeddable


Testing:
-----------------
LRG: before fix#3 
    [junit] Tests run: 6647, Failures: 0, Errors: 8, Time elapsed: 3,514.685 sec
Total time: 58 minutes 50 seconds
 - server test hung only on issue #3
 - I will retest the lrg tomorrow as it takes an hour.

After fix #3: we have no more thread wait issues - we used to fail with 8 errors here.
    [junit] Tests run: 30, Failures: 0, Errors: 4, Time elapsed: 296.177 sec
    [junit] Test org.eclipse.persistence.testing.tests.jpa.relationships.ServerTestSuite FAILED

This suite is unchanged (but without any wait loops)
    [junit] Tests run: 203, Failures: 0, Errors: 28, Time elapsed: 553.018 sec
    [junit] Test org.eclipse.persistence.testing.tests.jpa.fieldaccess.advanced.ServerTestSuite FAILED

Note: the bug wiki still needs to be updated with this patch's results
Comment 21 Michael OBrien CLA 2009-02-09 19:10:39 EST
Cause found while debugging with Gordon and Peter.
How? added extra breakpoints on all (non-setLifecycle()) direct modifier of uow.lifecycle to pick up callbacks from the server during the previous commit before a find()

1) We acquire locks in beforeCompletion()
2) During a commit WebSphere 7 is attempting to clearForClose the entityManager before the transaction has completed.

WebSphere JUnit jpa.test [Remote Java Application]	
	IBM J9 VM[localhost:7777]	
		Daemon Thread [ORB.thread.pool : 1] (Suspended (breakpoint at line 5339 in UnitOfWorkImpl))	
			RepeatableWriteUnitOfWork(UnitOfWorkImpl).clearForClose(boolean) line: 5339	
			RepeatableWriteUnitOfWork.clearForClose(boolean) line: 126	
			EntityManagerImpl.clear() line: 220	
			JPAEMPool.putEntityManager(EntityManager) line: 147	
			JPATxEntityManager.closeTxEntityManager(EntityManager, boolean) line: 299	
			JPATxEmInvocation.afterCompletion(int) line: 87	
			ComponentContextSynchronizationWrapper.afterCompletion(int) line: 83	
			RegisteredSyncs(RegisteredSyncs).coreDistributeAfter(int) line: 357	
			RegisteredSyncs.distributeAfter(int) line: 317	
			TransactionImpl(TransactionImpl).distributeAfter(int) line: 2933	
			TransactionImpl(TransactionImpl).postCompletion(int, boolean, boolean) line: 2892	
			TransactionImpl(TransactionImpl).postCompletion(int) line: 2805	
			TransactionImpl(TransactionImpl).commitXAResources() line: 1775	
			TransactionImpl.stage1CommitProcessing() line: 497	
			TransactionImpl(TransactionImpl).processCommit() line: 978	
			TransactionImpl(TransactionImpl).commit() line: 913	
			TranManagerImpl.commit() line: 369	
			TranManagerSet(TranManagerSet).commit() line: 161	
			UserTransactionImpl.commit() line: 293	
			UserTransactionWrapper.commit() line: 305	
			JEEPlatform.commitTransaction() line: 74	
			VirtualAttributeTestSuite(JUnitTestCase).commitTransaction(EntityManager) line: 209	
			VirtualAttributeTestSuite.testInsertVirtualAttribute() line: 65	


3) consequently, the state of the unitOfWork is modified from 4 (MergePending) back to 0 (Birth).
4) During the commit before in the AbstractSynchronizationListener.afterCompletion() callback - the uow.isMergePending() is now 0 instead of 4, so no mergeClonesAfterCompletion() happens.
   - this results no release of the lock acquired before the em was cleared in mid commit.

Fix:
   Ignore/defer any request to uow.clear() in uow.clearForClose() if the state/lifecyle of the unitOfWork is *Pending or one of (1,2 or 4), the clear() will be done later on release().
   We no longer have issues with a find that has issues with a non-released previous commit.

Tested: The following suites are OK now
ant server-test-relationships
    [junit] Tests run: 30, Failures: 0, Errors: 0, Time elapsed: 39.021 sec
ant server-test-fieldaccess-advanced
    [junit] Tests run: 203, Failures: 0, Errors: 2, Time elapsed: 77.011 sec

Code-review, Java SE tests and full regression tests are in progress.
Comment 22 Michael OBrien CLA 2009-02-09 19:19:26 EST
Created attachment 125187 [details]
eclipselink core patch (pre-code review)
Comment 23 Michael OBrien CLA 2009-02-10 12:15:47 EST
Created attachment 125268 [details]
eclipselink jpa test 

5 of 6 tests complete (postPersist callback is pending)
Comment 24 Michael OBrien CLA 2009-02-10 13:43:07 EST
Created attachment 125287 [details]
eclipselink core patch (code review 2)

    1) UnitOfWorkImpl.mergeClonesAfterCompletion() change moved to MergeManager.mergeChanges()

    2) UnitOfWorkImpl.clearForClose() unchanged from previous patch
Comment 25 Michael OBrien CLA 2009-02-10 16:49:50 EST
Created attachment 125318 [details]
eclipselink core patch (code review 3)

Review 3:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=259993


1) UnitOfWorkImpl.mergeClonesAfterCompletion()
    - I moved the different thread code block back to UnitOfWorkImpl from MergeManager
       - I assume that the activeThread != null
    After James/Gordon's suggestion did the following MergeManager change
     - cache the currentThread as MergeManager.lockThread on the 3 acquire calls in WriteLockManager
     - if this thread is different in uow.mergeClonesAfterCompletion - change the activeThread on all lockManagers

2) UnitOfWorkImpl.clearForClose()
    - Andrei's suggestion that we still clear the identityMaps has been added as a "Partial clear()" when the clearForClose flag is true
      I split out the part of clear() that clears the identityMaps and run this from clear() as well as clearForClose() in our *Pending state case.

Test results:
-------------
With the 2nd last patch
https://bugs.eclipse.org/bugs/attachment.cgi?id=125187&action=diff

ant server-test-relationships
    [junit] Tests run: 30, Failures: 0, Errors: 0, Time elapsed: 39.021 sec
ant server-test-fieldaccess-advanced
    [junit] Tests run: 203, Failures: 0, Errors: 2, Time elapsed: 77.011 sec

With the current patch (we get 1 extra error in test-fieldaccess-advanced)
ant server-test-relationships
    [junit] Tests run: 30, Failures: 0, Errors: 0, Time elapsed: 44.788 sec
ant server-test-fieldaccess-advanced
    [junit] Tests run: 203, Failures: 0, Errors: 3, Time elapsed: 89.341 sec

    thank you
    /michael
Comment 26 Michael OBrien CLA 2009-02-11 15:08:12 EST
Created attachment 125441 [details]
eclipselink core patch (code review 3b) - mergeManager NPE check on was7 afterCompletion callback

This patch is the same as the previous version except for a null check in uow.mergeClonesAfterCompletion()
>The null pointer exception was on a WAS7 callback to afterCompletion where the mergeManager was null - I added back the !null check.
WebSphere 7 Tests now match the results we have for WebSphere 6.1

    [junit] Running org.eclipse.persistence.testing.tests.jpa.relationships.ServerTestSuite
    [junit] Tests run: 30, Failures: 0, Errors: 0, Time elapsed: 97.437 sec

and
    [junit] Running org.eclipse.persistence.testing.tests.jpa.fieldaccess.advanced.ServerTestSuite
    [junit] Tests run: 203, Failures: 0, Errors: 2, Time elapsed: 164.829 sec
Comment 27 Michael OBrien CLA 2009-02-11 16:41:37 EST
Created attachment 125455 [details]
eclipselink core patch (code review 4) - 1 of 3 sets of lockThread kept in writeLockManager.acquire*() functions 

Code Review: Gordon/Andrei
1) We only require the lockThread set on acquires that are involved in inserts/updates and not reads.

Keep:
public void acquireRequiredLocks(MergeManager mergeManager, UnitOfWorkChangeSet changeSet)

Remove:
public Map acquireLocksForClone(Object objectForClone, ClassDescriptor descriptor, CacheKey cacheKey, AbstractSession session, UnitOfWorkImpl unitOfWork)
public CacheKey acquireLockAndRelatedLocks(Object objectForClone, Map lockedObjects, Map refreshedObjects, CacheKey cacheKey, ClassDescriptor descriptor, AbstractSession session, UnitOfWorkImpl unitOfWork)

2) Of the 3 lifecycle states that require us to do a partial clear() that involves only the cache - the CommitTransactionPending==2 state may not be required.  However, we will never reach lifecycle state 2 because the entityManager.clear(true) call in the first part of the if is the only part that calls clearForClose(true).  If we are in a transaction then the clearForClose() code will not be called.  We are ok with/without the check on lifecyle state 2==CommitTransactionPending.
The other two CommitPending and MergePending are still required

Testing: ok
    [junit] Running org.eclipse.persistence.testing.tests.jpa.relationships.ServerTestSuite
    [junit] Tests run: 30, Failures: 0, Errors: 0, Time elapsed: 98.124 sec
    [junit] Running org.eclipse.persistence.testing.tests.jpa.fieldaccess.advanced.ServerTestSuite      [junit] Tests run: 203, Failures: 0, Errors: 2, Time elapsed: 162.483 sec
Comment 28 Michael OBrien CLA 2009-02-11 20:38:06 EST
Code review #4 complete - ready for checkin as soon as I verify JPA/Core LRG results
- these tests are in progress today and tomorrow


JPA-LRG tests results:
I am not getting 0,0 results for the non-server SE tests - triaging
part1 = clearForClose() clear only cache when in state *Pending
part2 = switch activeThread during mergeClonesAfterCompletion() to acquired thread saved on MergeManager 

Triaging: IBM J9 JVM test combinations

            |clean | w/part1 | w/part2 | all changes
------------+---------------------------------------
branch 1.1  |?pend | 36,152  | 36,152  | 36,153
trunk       |8,18  | 8,17    | ?pend   | 8,17


branch 1.1 compiled and run under IBM J9 (with last patch)
----------------------------------------------------------
>ant test-jpa
    [junit] Tests run: 1064, Failures: 36, Errors: 156, Time: 1,871.345 sec
>after refresh to rev 3439 without recompile
    [junit] Tests run: 1064, Failures: 36, Errors: 153, Time: 1,478.587 sec
>after switch to local db, and log back to info
>after compile under sun jdk
    [junit] Tests run: 1064, Failures: 36, Errors: 157, Time: 730.414 sec
>comment out part 2 of fix (thread switch)
>recompile under ibm j9
    [junit] Tests run: 1064, Failures: 36, Errors: 152, Time: 940.399 sec
>comment out part 1 of fix (clearForClose) only, reenable part 2
    [junit] Tests run: 1064, Failures: 36, Errors: 152, Time: 896.875 sec
>comment out everything
..pending - expect the same 36,152

trunk 2.0 clean 20090211: compiled and run under IBM J9 (rev 3439 (no modifications))
---------------------------------------------------------------------------------
>ant test-jpa
    [junit] Tests run: 1087, Failures: 8, Errors: 18, Time: 1,829.168 sec
>after switch to local db, and log back to info
>after compile under sun jdk - new issues with jaxp compiler path
>switch back to ibm - recompile
    [junit] Tests run: 1087, Failures: 9, Errors: 18, Time: 964.189 sec
>adding part 1 of fix (clearForClose) only
>recompile under ibm j9
    [junit] Tests run: 1087, Failures: 8, Errors: 17, Time: 1,006.544 sec
>add only part 2 of fix (thread switch)
..pending - expect the same 8,17
>add full fix to trunk
    [junit] Tests run: 1087, Failures: 8, Errors: 17, Time: 978.588 sec


summary:
--------
an aside-->switching the database from a hosted server to a local (a min dual-core) or in my case 2nd pc on the same subnet - cuts transaction times in half - the tests run in 15 minn instead of 30

From the results it looks like there is no new failures on the [trunk] branch with these changes.
The 1.1 branch is having issues with numbers that are not related to these 2 changes
- testing core LRG tomorrow
Comment 29 Michael OBrien CLA 2009-02-12 13:38:17 EST
New tests numbers - JPA/Core LRG ok

 1.1 (with local db clean)        
    New tests in jpa/advanced/concurrency/LifecycleJUnitTest
    JPA LRG
       Without fix (- expected errors on new LifecycleJUnitTest suite if fix is disabled)
         [junit] Tests run: 1071, Failures: 0, Errors: 3, Time elapsed: 896.499 sec
       With fix
         [junit] Tests run: 1071, Failures: 0, Errors: 3, Time elapsed: 935.753 sec
    Core LRG
       With/Without fix
         [junit] Tests run: 6647, Failures: 0, Errors: 0, Time elapsed: 1,523.895 sec

Comment 30 Michael OBrien CLA 2009-02-12 15:39:46 EST
>ant server-test-lrg
WebSphere 7.0

Total time: 35 minutes 11 seconds
1045 0 28 97.32% 508.332 
Name Tests Errors Failures Time(s) Time Stamp Host 
org.eclipse.persistence.testing.tests.jpa.advanced 248 6 0 94.782 2009-02-12T19:31:08 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.datatypes 17 0 0 4.399 2009-02-12T19:32:41 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.datetime 7 0 0 2.943 2009-02-12T19:34:03 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.ddlgeneration 9 1 0 6.633 2009-02-12T19:35:26 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.fieldaccess.advanced 203 2 0 95.392 2009-02-12T19:36:56 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.fieldaccess.relationships 16 0 0 12.135 2009-02-12T19:40:01 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.inheritance 41 1 0 35.441 2009-02-12T19:41:37 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.inherited 51 1 0 12.866 2009-02-12T19:43:37 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.jpaadvancedproperties 10 10 0 2.628 2009-02-12T19:45:13 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.jpql 273 0 0 145.442 2009-02-12T19:46:44 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.lob 5 1 0 2.145 2009-02-12T19:50:38 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.relationships 30 0 0 49.495 2009-02-12T19:52:07 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.sessionbean 5 2 0 8.675 2009-02-12T19:54:27 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.timestamptz 5 0 0 1.758 2009-02-12T20:01:47 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.validation 3 0 0 3.767 2009-02-12T19:56:01 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.xml 108 4 0 23.578 2009-02-12T19:59:38 mfobrien-PC3 
org.eclipse.persistence.testing.tests.jpa.xml.merge.inherited 14 0 0 6.253 2009-02-12T19:57:46 mfobrien-PC3 

>results: we have the same # of total errors+failures (28 instead of 27) but they are all errors in WAS 7.0
All but one of these failure to error transitions are due to new serialization requirements on WAS...
WebSphere 6.1 (for comparison)

http://stame02.us.oracle.com/eclipselink/1.1.0-release/2009-02-10/server-test-lrg-websphere-6.1.html
1045 6 21 97.42% 544.358 
Name Tests Errors Failures Time(s) Time Stamp Host 
org.eclipse.persistence.testing.tests.jpa.advanced 248 3 2 79.121 2009-02-10T20:14:20 stame10 
org.eclipse.persistence.testing.tests.jpa.datatypes 17 0 0 5.182 2009-02-10T20:15:57 stame10 
org.eclipse.persistence.testing.tests.jpa.datetime 7 0 0 3.400 2009-02-10T20:17:25 stame10 
org.eclipse.persistence.testing.tests.jpa.ddlgeneration 9 1 0 8.118 2009-02-10T20:19:02 stame10 
org.eclipse.persistence.testing.tests.jpa.fieldaccess.advanced 203 0 2 104.429 2009-02-10T20:20:38 stame10 
org.eclipse.persistence.testing.tests.jpa.fieldaccess.relationships 16 0 0 15.935 2009-02-10T20:23:48 stame10 
org.eclipse.persistence.testing.tests.jpa.inheritance 41 0 1 41.463 2009-02-10T20:25:34 stame10 
org.eclipse.persistence.testing.tests.jpa.inherited 51 0 1 22.055 2009-02-10T20:27:41 stame10 
org.eclipse.persistence.testing.tests.jpa.jpaadvancedproperties 10 10 0 5.170 2009-02-10T20:29:32 stame10 
org.eclipse.persistence.testing.tests.jpa.jpql 273 0 0 139.433 2009-02-10T20:31:06 stame10 
org.eclipse.persistence.testing.tests.jpa.lob 5 1 0 4.359 2009-02-10T20:34:50 stame10 
org.eclipse.persistence.testing.tests.jpa.relationships 30 0 0 48.656 2009-02-10T20:36:19 stame10 
org.eclipse.persistence.testing.tests.jpa.sessionbean 5 2 0 13.413 2009-02-10T20:38:33 stame10 
org.eclipse.persistence.testing.tests.jpa.timestamptz 5 0 0 3.860 2009-02-10T20:45:40 stame10 
org.eclipse.persistence.testing.tests.jpa.validation 3 0 0 6.590 2009-02-10T20:40:14 stame10 
org.eclipse.persistence.testing.tests.jpa.xml 108 4 0 33.863 2009-02-10T20:43:42 stame10 
org.eclipse.persistence.testing.tests.jpa.xml.merge.inherited 14 0 0 9.311 2009-02-10T20:41:52 stame10 
Comment 31 Michael OBrien CLA 2009-02-13 11:59:27 EST
Created attachment 125662 [details]
eclipselink jpa test (extra tests and setup was missing table creation)

Added 2 extra SE unit tests that test commit with/without a clear after.
I was missing the schema generation in the setup - the first run would fail on missing tables - thank you Guy - setup() change reviewed by Tom.

All 5 tests run fine after running setup() to clean the db.
Comment 32 Michael OBrien CLA 2009-02-13 19:27:13 EST
>EclipseLink 1.1 branch fix checked into SVN 3447
http://fisheye2.atlassian.com/changelog/eclipselink/?cs=3447

>With test case additions and changes in...

http://fisheye2.atlassian.com/changelog/eclipselink/?cs=3455
http://fisheye2.atlassian.com/changelog/eclipselink/?cs=3461

Trunk commit is pending for Tuesday after the 1.1 branch full regression tests are run on all application servers.
A review mail to the dev group for trunk will be sent out before commit.

The wiki page will be updated to reflect final changes after the servers come back on line next tue.

Comment 33 Michael OBrien CLA 2009-02-17 11:42:56 EST
Created attachment 125910 [details]
EclipseLink full trunk patch (off rev 3439)
Comment 34 Michael OBrien CLA 2009-02-17 11:43:01 EST
Created attachment 125911 [details]
EclipseLink full trunk patch (off rev 3439)
Comment 35 Michael OBrien CLA 2009-02-17 11:46:37 EST
Comment on attachment 125910 [details]
EclipseLink full trunk patch (off rev 3439)

double submitted patch
Comment 36 Michael OBrien CLA 2009-02-17 19:15:06 EST
Patch checked into Trunk in rev 3472
http://fisheye2.atlassian.com/changelog/eclipselink/?cs=3472
Comment 37 Michael OBrien CLA 2009-02-18 09:25:04 EST
The WebSphere 6.1 server tests are reduced from 6 failures, 28 errors to 6 failures, 27 errors with this transaction.
org.eclipse.persistence.testing.tests.jpa.advanced | EntityManagerJUnitTestSuite | testGetReferenceUsedInUpdate passes now.
Comment 38 Gordon Yorke CLA 2009-04-13 10:27:00 EDT
This bug still exists in the case where the transaction rolls back.
UnitOfWorkImpl.afterExternalTransactionRollback() does not include the code to update the lock owning threads but it does attempt to release locks.

The lock owner updating code should be abstracted into a separate thread.  Support to switch deferred locks is needed as well.  It could easily be added to this bug or an additional bug could be opened.
Comment 39 Michael OBrien CLA 2009-04-13 10:46:27 EDT
I think we need a new bug - as 259993 is fixed as far as WebSphere 7 is concerned regarding a prematurely cleared entitymanager.
    The 259993 bug never really had anything to do with mismatched lock threads - and it occurs on WebLogic.

The 2nd part of the fix for 259993 that we put in as an extra fix along with the base 259993 clearForClose() fix for WAS, does not have a test case and we were not able to reproduce it - so we currently have no way to test any change until we create a unit server test on WebLogic.
   
    1) The change for WebSphere in clearForClose() is reproducable and has server tests.
    2) The change for mismatched threads in mergeClonesAfterCompletion() needs a reproduction test case

    If we reopen the bug I think we should remove the 2nd part of the original fix that is unrelated to this bug - restest on WebSphere 7 , close the bug and reopen a bug specifically for this thread locking issue.
Comment 40 Gordon Yorke CLA 2009-04-13 11:13:44 EDT
This bug has two different fixes one of which does not need to be updated.  A new bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=272022 has been created to isolated the required additional updates.
Comment 41 Michael OBrien CLA 2010-06-15 15:36:44 EDT
>See additional workaround for the fact that WebSphere 7 caches the EntityManager Impls and uses clear() instead of close() on them
in bug# 316531
Comment 42 Eclipse Webmaster CLA 2022-06-09 10:08:10 EDT
The Eclipselink project has moved to Github: https://github.com/eclipse-ee4j/eclipselink