Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 316988

Summary: [xpath2] [performance] Removing duplicates and putting into document order takes O(n^2)
Product: [WebTools] WTP Source Editing Reporter: Jesper Moller <jesper>
Component: wst.xpathAssignee: Jesper Moller <jesper>
Status: RESOLVED FIXED QA Contact: Jesper Moller <jesper>
Severity: normal    
Priority: P3 CC: david_williams, d_a_carver, raghunathan.srinivasan, thatnitind
Version: 3.2Flags: david_williams: pmc_approved+
raghunathan.srinivasan: pmc_approved+
jesper: pmc_approved? (naci.dai)
jesper: pmc_approved? (deboer)
jesper: pmc_approved? (neil.hauge)
jesper: pmc_approved? (kaloyan)
d_a_carver: review+
Target Milestone: 3.2.3   
Hardware: All   
OS: All   
Whiteboard: PMC_approved
Bug Depends on: 316986    
Bug Blocks: 316990    
Attachments:
Description Flags
Patch for this issue
none
Better test for this issue - against 3.2.x codebase
jesper: review?
Better performance fix for dealing with big ResultSequences - against 3.2.x codebase jesper: review?

Description Jesper Moller CLA 2010-06-15 19:28:57 EDT
As hinted in the review comments around the PsychoPath code base, the performance of long result sequences suffer quite a bit.

This patch ensures that these operations are no worse than O(n log n) by using mergesort, and the performance can be improved further by tracking which collections are indeed already sorted and dupe-free.
Comment 1 Jesper Moller CLA 2010-06-15 20:05:36 EDT
Created attachment 172008 [details]
Patch for this issue

Not the last word in XPath2 performance, but a lot better than what's there.
Use this as a starting point for the real fix, note that the copyright notices are not updated in this patch.
Comment 2 Jesper Moller CLA 2010-06-24 00:54:20 EDT
Copyrights aside, I'd like someone to review this patch for 3.2.1 as well.
But I can't add Mukul as reviewer?
Comment 3 Jesper Moller CLA 2010-08-17 05:16:20 EDT
Ping - David could you review thsi when you get the chance?
Comment 4 David Carver CLA 2010-08-26 13:33:23 EDT
With the following caveats:

1. since it is 3.2.2, it needs to run under 1.4 jdk.
2. Remove in the code the old comments that are still hanging around with:

//XXX lame

I thought we had gotten most of those out but looks like a few are still hanging about.


For the PsychoPath 2.0 version to go into head, I would suggest where possible converting any for loop to a for each syntax, to eliminate the extra code required for advancing the iterator and casting to the appropriate instance type.
Comment 5 Jesper Moller CLA 2011-01-18 17:46:55 EST
Created attachment 187059 [details]
Better test for this issue - against 3.2.x codebase

This patch adds a performance test for XPath2 operations on wide, deep or plain big trees.
Comment 6 Jesper Moller CLA 2011-01-18 17:50:43 EST
Created attachment 187061 [details]
Better performance fix for dealing with big ResultSequences - against 3.2.x codebase

Better fix for this issue, intended for the branch.

The fix is for Java 1.4, no generic stuff, and this fix is careful to not change public behaviour.
Comment 7 Jesper Moller CLA 2011-01-18 18:07:28 EST
Requesting PMC review for this fix for 3.2.3 (Helios SR2)

* Explain why you believe this is a stop-ship defect.
 - The XPath processor presently chokes on any "big" XML processing, due to some n^2 result traversals
 - This makes it unsuitable for real work with e.g. Xerces XML Schema 1.1 validation (adopter)

* Workaround: No, this is due to inefficient internal methods from the original XPath2 code base. There is no workaround

* How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added?
 - The added test case shows that the performance on big results (up to ~ 1 million elements) shows n log n performance were it was much worse before.
 - The usual battery of 8314 compliance and regression tests run without fail.

* Give a brief technical overview.
The old method of sorting and removing duplicates used two separate passes, one which used a nested loop iterating over the entire ResultSequence, and then afterwards sorts them.
The new method inserts the sequence into a TreeSet and then flattens that tree into a list.
Some of the axis iteration methods are also improved to add less overhead.

* Who has reviewed this fix?
 - David Carver reviewed this initially.

* What is the risk associated with this fix?
 - Low: This is an internal operation fix only, and has extensive test coverage due to the big XPath2 compliance test suite from W3C.
Comment 8 David Williams CLA 2011-01-18 18:20:48 EST
Excellent, thanks!
Comment 9 David Carver CLA 2011-01-27 23:38:48 EST
Jesper has this made it in yet?

Does the current head need the same patch?
Comment 10 David Carver CLA 2011-01-28 00:17:13 EST
Moving to the XPath component out of XSL.
Comment 11 Jesper Moller CLA 2011-02-03 04:03:33 EST
(In reply to comment #9)
> Jesper has this made it in yet?
> 
> Does the current head need the same patch?

Yes, it's tagged and mapped in the 3.2.x branch, and yes, HEAD needs a similar
patch, but the same patch would not apply cleanly due to generics differences.

I'll create an issue for this.
Comment 12 Nitin Dahyabhai CLA 2011-02-03 14:11:54 EST
(In reply to comment #11)
> Yes, it's tagged and mapped in the 3.2.x branch

Resolving.