
Bug 319792

Summary: Performance regression in DependencyGraphImpl.waitForAllUpdates()
Product: [WebTools] WTP Java EE Tools
Reporter: Jason Sholl <jsholl>
Component: jst.j2ee
Assignee: Jason Sholl <jsholl>
Status: RESOLVED FIXED
QA Contact: Chuck Bridgham <cbridgha>
Severity: normal
Priority: P3
CC: alecross, david_williams
Version: 3.2
Flags: david_williams: pmc_approved+
jsholl: pmc_approved? (raghunathan.srinivasan)
jsholl: pmc_approved? (naci.dai)
deboer: pmc_approved+
jsholl: pmc_approved? (neil.hauge)
jsholl: pmc_approved? (kaloyan)
cbridgha: review+
Target Milestone: 3.2.1   
Hardware: PC   
OS: Windows Server 2003   
Whiteboard: PMC_approved
Attachments:
patch for 3.2.1 (Flags: none)

Description Jason Sholl CLA 2010-07-13 17:07:17 EDT
An adopter product has found a performance regression in DependencyGraphImpl.waitForAllUpdates() in some performance tests which involve opening existing workspaces.  This regression was introduced by bug 317120.  In general the new code is much faster, but in this one particular scenario it is not.  The bug only affects single-core machines and is very difficult to reproduce.

The while loop inside waitForAllUpdates() was designed with the assumption that the call to graphUpdateJob.schedule() would immediately schedule the job (so that the loop exit condition, isUpdateNecessary(), would then return false and the loop would exit), but this does not appear to be the case.  Instead, the loop spins extremely tightly and completely hogs the processor with millions of wasted calls to jobILock.acquire() and jobILock.release().  These calls are expensive and account for minutes of wasted time.
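
For illustration only (this is not the actual DependencyGraphImpl source; the names below are plain-Java stand-ins for graphUpdateJob, jobILock and isUpdateNecessary()), the problematic loop shape looks roughly like this:

import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the loop described above: the exit condition
// assumes the scheduled background job runs promptly, but on a single-core
// machine the job may never get CPU time, so the loop spins and burns
// cycles in lock acquire/release round-trips.
public class BusyWaitSketch {

    private final ReentrantLock jobLock = new ReentrantLock(); // stand-in for jobILock
    private volatile boolean updateNecessary = true;           // stand-in for isUpdateNecessary()

    private void scheduleUpdateJob() {
        // Stand-in for graphUpdateJob.schedule(); scheduling does not mean
        // the job starts running immediately.
    }

    public void waitForAllUpdates() {
        while (updateNecessary) {      // never flips, because the job never runs
            scheduleUpdateJob();
            jobLock.lock();            // millions of expensive acquire/release calls
            try {
                // check or flush pending updates
            } finally {
                jobLock.unlock();
            }
        }
    }
}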

The fix is to force a wait after the job is scheduled (to ensure it has time to be scheduled) with a call to graphUpdateJob.waitForRun(500).  The implementation polls in 50ms waits, up to a total of 500ms.  There is some local synchronization code to ensure updates are not missed.  While a polling mechanism is not ideal, no cleaner solution presents itself.
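
A minimal sketch of the polling idea as described above (the name waitForRun and the 50ms/500ms constants follow this comment; this is an illustration, not the actual patch):

import java.util.concurrent.TimeUnit;

// Sketch of the waitForRun(500) idea: after scheduling, wait in 50ms slices
// for up to 500ms for the job to actually start running, using a shared flag
// plus wait/notify so a job start is not missed.
public class WaitForRunSketch {

    private final Object runLock = new Object();
    private boolean jobRunning = false;

    // Called by the (hypothetical) job when it actually begins executing.
    public void markJobRunning() {
        synchronized (runLock) {
            jobRunning = true;
            runLock.notifyAll();
        }
    }

    // Poll in 50ms steps for up to maxWaitMs; returns true if the job was
    // observed running before the timeout expired.
    public boolean waitForRun(long maxWaitMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        synchronized (runLock) {
            while (!jobRunning && System.nanoTime() < deadline) {
                runLock.wait(50); // 50ms polling step
            }
            return jobRunning;
        }
    }
}

In the loop shape shown in the previous sketch, the call site would then schedule the job and call waitForRun(500) before re-checking the exit condition, instead of looping back immediately.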

This fix has been extensively tested by our performance team for over a week, has run through all JUnit tests, and has also been tested locally for over a week.
Comment 1 Jason Sholl CLA 2010-07-13 17:09:38 EDT
Created attachment 174235 [details]
patch for 3.2.1
Comment 2 Chuck Bridgham CLA 2010-07-14 10:10:22 EDT
approved
Comment 3 Jason Sholl CLA 2010-07-14 10:34:09 EDT
    * Explain why you believe this is a stop-ship defect. Or, if it is a "hotbug" (requested by an adopter) please document it as such. 

Severe performance regression in some scenarios when starting an existing workspace.

    * Is there a work-around? If so, why do you believe the work-around is insufficient? 

No

    * How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added? 

JUnit tests, ad hoc testing, and the specific performance tests that found this regression.

    * Give a brief technical overview. Who has reviewed this fix? 

Force a wait to avoid an extremely tight loop which prevents the job from ever being scheduled.

    * What is the risk associated with this fix? 

Minimal
Comment 4 David Williams CLA 2010-07-14 11:47:30 EDT
I'll go ahead and mark this as 'approved', but can you still describe how, and by whom, this fix was reviewed? It seems a bit of an odd fix to me ... I can't quite put my finger on why ... and I just want to be sure some performance experts have reviewed it. Has someone like Gary or Min reviewed and approved? Thanks.
Comment 5 Jason Sholl CLA 2010-07-14 12:11:14 EDT
Chuck and Min have both reviewed the fix.

Code checked into HEAD for WTP 3.2.1.