Community
Participate
Working Groups
As hinted in the review comments around the PsychoPath code base, the performance of long result sequences suffer quite a bit. This patch ensures that these operations are no worse than O(n log n) by using mergesort, and the performance can be improved further by tracking which collections are indeed already sorted and dupe-free.
Created attachment 172008 [details] Patch for this issue Not the last word in XPath2 performance, but a lot better than what's there. Use this as a starting point for the real fix, note that the copyright notices are not updated in this patch.
Copyrights aside, I'd like someone to review this patch for 3.2.1 as well. But I can't add Mukul as reviewer?
Ping - David could you review thsi when you get the chance?
With the following caveats: 1. since it is 3.2.2, it needs to run under 1.4 jdk. 2. Remove in the code the old comments that are still hanging around with: //XXX lame I thought we had gotten most of those out but looks like a few are still hanging about. For the PsychoPath 2.0 version to go into head, I would suggest where possible converting any for loop to a for each syntax, to eliminate the extra code required for advancing the iterator and casting to the appropriate instance type.
Created attachment 187059 [details] Better test for this issue - against 3.2.x codebase This patch adds a performance test for XPath2 operations on wide, deep or plain big trees.
Created attachment 187061 [details] Better performance fix for dealing with big ResultSequences - against 3.2.x codebase Better fix for this issue, intended for the branch. The fix is for Java 1.4, no generic stuff, and this fix is careful to not change public behaviour.
Requesting PMC review for this fix for 3.2.3 (Helios SR2) * Explain why you believe this is a stop-ship defect. - The XPath processor presently chokes on any "big" XML processing, due to some n^2 result traversals - This makes it unsuitable for real work with e.g. Xerces XML Schema 1.1 validation (adopter) * Workaround: No, this is due to inefficient internal methods from the original XPath2 code base. There is no workaround * How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added? - The added test case shows that the performance on big results (up to ~ 1 million elements) shows n log n performance were it was much worse before. - The usual battery of 8314 compliance and regression tests run without fail. * Give a brief technical overview. The old method of sorting and removing duplicates used two separate passes, one which used a nested loop iterating over the entire ResultSequence, and then afterwards sorts them. The new method inserts the sequence into a TreeSet and then flattens that tree into a list. Some of the axis iteration methods are also improved to add less overhead. * Who has reviewed this fix? - David Carver reviewed this initially. * What is the risk associated with this fix? - Low: This is an internal operation fix only, and has extensive test coverage due to the big XPath2 compliance test suite from W3C.
Excellent, thanks!
Jesper has this made it in yet? Does the current head need the same patch?
Moving to the XPath component out of XSL.
(In reply to comment #9) > Jesper has this made it in yet? > > Does the current head need the same patch? Yes, it's tagged and mapped in the 3.2.x branch, and yes, HEAD needs a similar patch, but the same patch would not apply cleanly due to generics differences. I'll create an issue for this.
(In reply to comment #11) > Yes, it's tagged and mapped in the 3.2.x branch Resolving.