Bug 211799

Summary: Jobs with identical priority, delay and conflicting scheduling rules can run in a different order than they were scheduled
Product: [Eclipse Project] Platform Reporter: David Cummings <dcummin>
Component: Runtime Assignee: John Arthorne <john.arthorne>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: john.arthorne, philippe_mulet, slubicki, Tod_Creasey
Version: 3.3.1   
Target Milestone: 3.4 M6   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description                                   Flags
Example plug-in that demonstrates the bug     none
Work in progress                              none
Updated patch                                 none

Description David Cummings CLA 2007-12-03 13:38:22 EST
Created attachment 84345 [details]
Example plug-in that demonstrates the bug

Build ID: M20071023-1652

The Job#schedule(long) method clearly states in its Javadoc that jobs with identical priority, delay and conflicting scheduling rules are guaranteed to run in the order that they were scheduled.

I have noticed behaviour where this is not always the case.  I have attached an example plug-in which reproduces the ordering problem.  The example schedules a series of Jobs with identical priority, delay and scheduling rules and determines if they executed in the order in which they were scheduled.

To run the example, import the example plug-in and launch a runtime.  From the Sample Menu choose Sample Command.

I've done some initial debugging to identify the issue and this is what I've found in the JobManager#nextJob() call:
1) Typically the scheduled jobs from the example all reside in the waiting queue in their proper order.  None of the scheduled jobs from the example are found in the sleeping queue.  As long as they remain in the waiting queue and are executed from there, the ordering is maintained.
2) When the ordering does get rearranged, and a job is run out of order, some of the jobs are still in the waiting queue and the rest are sitting in the sleeping queue.

Seems like the ordering gets rearranged because only a portion of the jobs are put to sleep and the ones that remain in the waiting queue get run first.
Comment 1 Tod Creasey CLA 2008-01-04 13:34:18 EST
John this is the bug I was talking about
Comment 2 John Arthorne CLA 2008-01-08 10:19:51 EST
Created attachment 86396 [details]
Work in progress

My theory is:
 - multiple conflicting jobs are scheduled and added to the wait queue
 - the first job starts
 - Some jobs from the wait queue are removed and added to the blocked list
 - The first job ends
 - The blocked jobs are re-inserted at the end of the wait queue
 - now the jobs at the end of the queue have lost their order relative to the jobs in the wait queue that were never blocked
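
The theory above can be illustrated with a small standalone simulation. This is only a sketch in plain Java, not JobManager code: the class name, the integer job ids, and the choice of which jobs get blocked are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BlockedRequeueDemo {
    /**
     * Simulates the sequence from the theory: jobs 1..jobCount are queued in
     * order, job 1 starts, the next blockedCount jobs are moved to a blocked
     * list, and when job 1 ends the blocked jobs are appended to the END of
     * the wait queue. Returns the resulting run order.
     */
    static List<Integer> runOrder(int jobCount, int blockedCount) {
        Deque<Integer> waiting = new ArrayDeque<>();
        for (int id = 1; id <= jobCount; id++) {
            waiting.addLast(id);
        }

        List<Integer> order = new ArrayList<>();
        // The first job starts.
        order.add(waiting.removeFirst());

        // Some jobs are removed from the wait queue and added to the blocked list.
        List<Integer> blocked = new ArrayList<>();
        for (int i = 0; i < blockedCount && !waiting.isEmpty(); i++) {
            blocked.add(waiting.removeFirst());
        }

        // The first job ends: blocked jobs are re-inserted at the end of the queue.
        for (Integer id : blocked) {
            waiting.addLast(id);
        }

        // The remaining jobs run in queue order.
        while (!waiting.isEmpty()) {
            order.add(waiting.removeFirst());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runOrder(5, 2)); // prints [1, 4, 5, 2, 3]
    }
}
```

With five jobs of which two are blocked, jobs 4 and 5 overtake jobs 2 and 3, which matches the symptom in the description: the jobs that never left the waiting queue run first.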
Comment 3 Philipe Mulet CLA 2008-01-16 09:12:56 EST
Is this a regression over 3.3, or just an issue that has been there since day 1?
Comment 4 John Arthorne CLA 2008-01-16 10:01:16 EST
I believe this problem has existed since day 1 (Eclipse 3.0). I very much want to fix this, but I'm currently not optimistic for 3.3.2. For this kind of change I would want to have I-builds containing the fix for a few months to gain confidence in the fix before putting it in a release. There is a danger of introducing worse regressions than the current problem. I haven't come up with a working fix yet.
Comment 5 Philipe Mulet CLA 2008-01-16 12:32:54 EST
Would there be some workaround? For example, could jobs that care about ordering introduce some synchronization to ensure it?
Comment 6 John Arthorne CLA 2008-02-21 18:35:17 EST
Created attachment 90427 [details]
Updated patch

Fix one remaining bug from previous patch
Comment 7 John Arthorne CLA 2008-02-21 18:39:25 EST
The remaining bug was in JobManager#doSchedule:

//if it's a decoration job, don't run it right now if the system is busy
if (job.getPriority() == Job.DECORATE) {
	long minDelay = running.size() * 100;
	delay = Math.max(delay, minDelay);
}

This adds an extra delay on DECORATE priority jobs, to help avoid throttling the system when it's busy. The problem is, if you schedule 500 DECORATE jobs in a row, as in Dave's attached plugin, the effective delay of each job shifts depending on the current number of running jobs. Example:

Job 1 is scheduled
Job 1 starts
Job 2 is scheduled - since running.size() == 1, it gets delayed by 100ms
Job 1 ends
Job 3 is scheduled - since running.size() == 0, it has no delay

-> Now Job 3 runs instead of Job 2.

My fix is to avoid this reordering optimization for jobs that have scheduling rules. This is the only way to maintain the invariant promised in the API (jobs with identical priority, delay, and conflicting rules will run in the order they were scheduled).
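
The Job 1/2/3 example and the effect of the fix can be reproduced with a small standalone model. This is a sketch in plain Java, not the released patch: the class and method names, the `exemptRules` flag, and the scheduling times of 0, 10, and 20 ms are assumptions for illustration. Only the `running.size() * 100` rule is taken from the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DecorateDelayDemo {
    /** Hypothetical scheduling request: a job id and the time it becomes eligible to run. */
    static final class Request {
        final int id;
        final long wakeAt;

        Request(int id, long wakeAt) {
            this.id = id;
            this.wakeAt = wakeAt;
        }
    }

    /**
     * Mirrors the quoted DECORATE rule (minDelay = running jobs * 100 ms).
     * When exemptRules is true, jobs that have a scheduling rule skip the
     * extra delay; this is an assumed shape for the fix described above.
     */
    static long effectiveDelay(long delay, int runningCount, boolean hasRule, boolean exemptRules) {
        if (exemptRules && hasRule) {
            return delay;
        }
        long minDelay = runningCount * 100L;
        return Math.max(delay, minDelay);
    }

    /** Returns the order in which three DECORATE jobs with conflicting rules become runnable. */
    static List<Integer> startOrder(boolean exemptRules) {
        long[] scheduledAt = {0, 10, 20}; // assumed times (ms) at which jobs 1..3 are scheduled
        int[] runningCount = {0, 1, 0};   // running.size() at each schedule call, as in the example

        List<Request> requests = new ArrayList<>();
        for (int i = 0; i < scheduledAt.length; i++) {
            long delay = effectiveDelay(0, runningCount[i], true, exemptRules);
            requests.add(new Request(i + 1, scheduledAt[i] + delay));
        }

        // Stable sort: jobs with equal wake times keep their scheduling order.
        requests.sort(Comparator.comparingLong((Request r) -> r.wakeAt));

        List<Integer> order = new ArrayList<>();
        for (Request r : requests) {
            order.add(r.id);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(startOrder(false)); // prints [1, 3, 2]
        System.out.println(startOrder(true));  // prints [1, 2, 3]
    }
}
```

In this model Job 2 becomes eligible at 110 ms while Job 3 becomes eligible at 20 ms, so Job 3 overtakes it. Exempting jobs that have rules from the extra delay restores the scheduling order.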
Comment 8 John Arthorne CLA 2008-02-21 18:53:03 EST
Fix and regression test released in 3.4 stream. I believe this is also the cause of many of our intermittent job test failures, so I will be re-enabling those tests once this fix is out.

I still believe this fix is too risky for 3.3.2. This is not a trivial change, and I don't think there is time for adequate testing to give me full confidence in the fix.