Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 109898 - Intermittent failures in IJobManagerTests relating to job families
Summary: Intermittent failures in IJobManagerTests relating to job families
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 3.5   Edit
Hardware: PC Windows 2000
: P3 normal (vote)
Target Milestone: 3.5 M3   Edit
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-19 11:53 EDT by John Arthorne CLA
Modified: 2008-10-16 14:08 EDT (History)
2 users (show)

See Also:


Attachments
The one-liner fix for the problem (766 bytes, patch)
2008-10-16 09:24 EDT, Danail Nachev CLA
john.arthorne: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Arthorne CLA 2005-09-19 11:53:19 EDT
Test failure in N20050918-0010, Mac OSX:

testJobFamilyFind	Failure	Timeout waiting for job to start

junit.framework.AssertionFailedError: Timeout waiting for job to start
at
org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForStart(IJobManagerTest.java:1644)
at
org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyFind(IJobManagerTest.java:458)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:330)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:204)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at
org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:226)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:377)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:164)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:338)
at org.eclipse.core.launcher.Main.basicRun(Main.java:282)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at org.eclipse.core.launcher.Main.main(Main.java:952)
Comment 1 John Arthorne CLA 2005-10-07 15:27:42 EDT
Test failure in N20051007 MacOSX:

Timeout waiting for job to start

junit.framework.AssertionFailedError: Timeout waiting for job to start
at
org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForStart(IJobManagerTest.java:1644)
at
org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyCancel(IJobManagerTest.java:326)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:330)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:204)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at
org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:226)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:377)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:164)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:338)
at org.eclipse.core.launcher.Main.basicRun(Main.java:282)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at org.eclipse.core.launcher.Main.main(Main.java:952)
Comment 2 John Arthorne CLA 2005-11-01 09:35:44 EST
Another Mac failure on I20051101-0010 (note this is an error, not a test
assertion failure:

testJobFamilyNULL	Error	assertion failed:

org.eclipse.core.internal.runtime.AssertionFailedException: assertion failed:
at org.eclipse.core.internal.runtime.Assert.isTrue(Assert.java:107)
at org.eclipse.core.internal.runtime.Assert.isTrue(Assert.java:93)
at org.eclipse.core.internal.jobs.JobQueue.enqueue(JobQueue.java:73)
at org.eclipse.core.internal.jobs.JobManager.changeState(JobManager.java:279)
at org.eclipse.core.internal.jobs.JobManager.sleep(JobManager.java:978)
at org.eclipse.core.internal.jobs.JobManager.sleep(JobManager.java:990)
at
org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyNULL(IJobManagerTest.java:907)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:330)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:204)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at
org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:226)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:376)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:165)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:338)
at org.eclipse.core.launcher.Main.basicRun(Main.java:282)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at org.eclipse.core.launcher.Main.main(Main.java:952)
Comment 3 John Arthorne CLA 2005-11-15 16:21:32 EST
I have found a place in JobManager that could be responsible for this failure. 
In JobManager.endJob, the completed job is detached from the linked list of
blocked jobs by calling job.setPrevious(null). However, since the list is doubly
linked, the first blocked job to be removed will relink the finished job to the
next blocked job in the list.  Thus the completed job ends up having a dangling
"previous" pointer.  I have released a fix for this and we will see if it fixes
this test failure.
Comment 4 John Arthorne CLA 2005-11-21 17:52:09 EST
Note to self: This hasn't been released yet because of the runtime refactoring.
Comment 5 John Arthorne CLA 2005-12-12 09:53:36 EST
Failed again in N20051209:

testJobFamilyCancel	Failure	Timeout waiting for job to start. Job: TestSecondFamily(40), state: 2

Console output:

     [java] !ENTRY org.eclipse.core.jobs 4 1 2005-12-09 02:18:14.484
     [java] !MESSAGE Worker thread ended job: testBeginRuleNoEnd(24), but still holds rule: ThreadJob(testBeginRuleNoEnd(24),[PathRule(testBeginRuleNoEnd),])
     [java] **** BEGIN DUMP JOB MANAGER INFORMATION ****
     [java] TestSecondFamily(42) state: RUNNING
     [java] TestSecondFamily(44) state: WAITING
     [java] TestSecondFamily(46) state: WAITING
     [java] TestSecondFamily(48) state: WAITING
     [java] TestSecondFamily(50) state: WAITING
     [java] TestSecondFamily(52) state: WAITING
     [java] TestSecondFamily(54) state: WAITING
     [java] TestSecondFamily(56) state: WAITING
     [java] TestSecondFamily(58) state: WAITING
     [java] Compacting memory(10) state: SLEEPING
     [java] **** END DUMP JOB MANAGER INFORMATION ****
Comment 6 John Arthorne CLA 2006-02-08 09:40:21 EST
Failed in I20060207 on Linux.  I have disabled the test on non-windows platforms for now because after significant investigation I was unable to find the cause of the failure.  Running the test independently, it fails roughly once in a hundred executions.  On the test machine, it seems to happen more frequently.

testJobFamilyFind	Failure	Timeout waiting for job to start. Job: TestThirdFamily(122), state: 2

junit.framework.AssertionFailedError: Timeout waiting for job to start. Job: TestThirdFamily(122), state: 2
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForStart(IJobManagerTest.java:1657)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyFind(IJobManagerTest.java:457)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:330)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:204)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:109)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:92)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:68)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:374)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:169)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:338)
at org.eclipse.core.launcher.Main.basicRun(Main.java:282)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at org.eclipse.core.launcher.Main.main(Main.java:952)
Comment 7 John Arthorne CLA 2006-03-08 11:26:47 EST
This is possibly a duplicate of bug 129551.  I will reenable the tests in the nightly builds to confirm.
Comment 8 Danail Nachev CLA 2007-07-11 05:02:55 EDT
I wonder what is the status of this bug? There is still trace of this bug # in the IJobManagerTest and some of the tests are still executed only on Windows. One such test has failed in the exact same way as described.
Comment 9 John Arthorne CLA 2007-07-11 17:05:23 EDT
Danail, what do you mean by "One such test has failed..." Failed on Windows, or on other platforms? What build?

There have been some bug fixes in jobs in 3.3 that might have solved this problem. When I have some time I may re-enable these tests again in the nightly build and see how it goes.
Comment 10 Danail Nachev CLA 2007-07-12 02:25:34 EDT
The test failed on Windows machine. The code under test is 3.2.2. However, that doesn't mean anything, so we can ignore this:)

I wondered whether these bugs are fixed in 3.3. I expected to see that they are reenabled, but they weren't.

I will keep you informed with any progress I make.
Comment 11 John Arthorne CLA 2007-11-01 10:18:16 EDT
I20071031-0800, win32:

testJobFamilyCancel	Failure	Timeout waiting for job to start. Job: TestSecondFamily(43), state: 2

junit.framework.AssertionFailedError: Timeout waiting for job to start. Job: TestSecondFamily(43), state: 2
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForRunCount(IJobManagerTest.java:1682)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForStart(IJobManagerTest.java:1692)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyCancel(IJobManagerTest.java:326)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:354)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:206)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethod(EclipseAppContainer.java:570)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:195)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:106)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:76)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:362)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:175)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:515)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:455)
at org.eclipse.equinox.launcher.Main.run(Main.java:1193)
at org.eclipse.equinox.launcher.Main.main(Main.java:1169)
at org.eclipse.core.launcher.Main.main(Main.java:30)
Comment 12 John Arthorne CLA 2007-11-01 10:20:10 EDT
I have released some changes to testJobFamilyCancel so that we still get some useful testing of this API, while hopefully avoiding the intermittent failure. I have re-enabled the test on all platforms.
Comment 13 John Arthorne CLA 2008-02-18 22:22:49 EST
Timeout waiting for job to start. Job: TestThirdFamily(73), state: 2

junit.framework.AssertionFailedError: Timeout waiting for job to start. Job: TestThirdFamily(73), state: 2
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForRunCount(IJobManagerTest.java:1680)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.waitForStart(IJobManagerTest.java:1690)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyFind(IJobManagerTest.java:459)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:354)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:206)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethod(EclipseAppContainer.java:547)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:195)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:106)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:76)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:362)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:175)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
at org.eclipse.equinox.launcher.Main.main(Main.java:1212)
at org.eclipse.core.launcher.Main.main(Main.java:30)
Comment 14 John Arthorne CLA 2008-02-20 13:53:47 EST
testJobFamilyWakeUp	Failure	5.0: expected state: NONE actual state: RUNNING

junit.framework.AssertionFailedError: 5.0: expected state: NONE actual state: RUNNING
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.assertState(IJobManagerTest.java:87)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyWakeUp(IJobManagerTest.java:1126)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:354)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:206)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethod(EclipseAppContainer.java:547)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:195)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:107)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:77)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:362)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:175)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
at org.eclipse.equinox.launcher.Main.main(Main.java:1212)
at org.eclipse.core.launcher.Main.main(Main.java:30)
Comment 15 John Arthorne CLA 2008-02-20 13:58:42 EST
I have disabled the above two failing tests for now. This is likely caused by bug 211799.
Comment 16 John Arthorne CLA 2008-02-21 18:54:08 EST
Now that bug 211799 is fixed, I will start re-enabling these tests to see if they fail again.
Comment 17 John Arthorne CLA 2008-02-29 14:51:35 EST
I have re-enabled all tests on all platforms. We'll see what happens...
Comment 18 John Arthorne CLA 2008-03-07 16:38:40 EST
So far so good. Marking fixed!
Comment 19 John Arthorne CLA 2008-03-10 01:15:30 EDT
From N20080308:

5.0: expected state: NONE actual state: RUNNING

junit.framework.AssertionFailedError: 5.0: expected state: NONE actual state: RUNNING
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.assertState(IJobManagerTest.java:87)
at org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyWakeUp(IJobManagerTest.java:1122)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:354)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:206)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:35)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:31)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethod(EclipseAppContainer.java:547)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:195)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:362)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:175)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
at org.eclipse.equinox.launcher.Main.main(Main.java:1212)
at org.eclipse.core.launcher.Main.main(Main.java:30)
Comment 20 Danail Nachev CLA 2008-10-13 11:58:00 EDT
(In reply to comment #19)
> From N20080308:
> 
> 5.0: expected state: NONE actual state: RUNNING
> 
> junit.framework.AssertionFailedError: 5.0: expected state: NONE actual state:
> RUNNING
> at
> org.eclipse.core.tests.runtime.jobs.IJobManagerTest.assertState(IJobManagerTest.java:87)
> at
> org.eclipse.core.tests.runtime.jobs.IJobManagerTest.testJobFamilyWakeUp(IJobManagerTest.java:1122)


I investigated this and I think that the problem lies in JobManager.cancel() method and JobManager.startJob().

In JobManager.startJob(), after the job is determined to be canceled before it is run, the state of the job is updated outside the synchronization block (in endJob() method). Meanwhile, the test has called cancel(), which returns true, because the state is ABOUT_TO_RUN. The test checks immediately after returning from cancel(), whether the state is NONE, but there is a chance that this check is done after job is determined to be ended, but before the state of the job is updated.

I'm not sure what is the proper fix here:
1) change the cancel() method to return false in ABOUT_TO_RUN case
2) change the startJob() method to update the state of the job in synchronized block
3) fix the test to expect this situation (properly not very good idea:)
Comment 21 Danail Nachev CLA 2008-10-15 03:32:54 EDT
I can provide patch for this bug, but I need to know what is the preferred way of fixing this.
Comment 22 John Arthorne CLA 2008-10-15 09:25:34 EDT
I'm leaning towards solution 1), which is also likely the safest fix. The ABOUT_TO_RUN state is an internal state that clients don't know about, but essentially this means the job has been removed from the wait queue, and there is a worker thread available that is about to run the job. I think it's fair (and safe) to return false in this case, which is the result the client would get about 1ms later when the thread re-enters the sync block to enter the running state.

This is a great find... this failure has been a mystery to me for three years and I worried it represented a deeper synchronization problem that I hadn't tracked down.
Comment 23 Danail Nachev CLA 2008-10-16 09:24:03 EDT
Created attachment 115249 [details]
The one-liner fix for the problem

Here is the one-liner fix for the problem as you suggested. Taking the other road is rather problematic.

I have tested the fix and it hasn't failed for 500 passes, whereas before it failed once in 100 passes.
Comment 24 John Arthorne CLA 2008-10-16 10:34:49 EDT
Thanks Danail, I'll get this released.
Comment 25 John Arthorne CLA 2008-10-16 14:08:30 EDT
Fix released and test re-enabled. Thanks Danail!