Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 364627 - Inefficient JP QL -> SQL code generation for correlated exist joins
Summary: Inefficient JP QL -> SQL code generation for correlated exist joins
Status: NEW
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Eclipselink (show other bugs)
Version: unspecified   Edit
Hardware: PC Linux
: P2 major with 2 votes (vote)
Target Milestone: ---   Edit
Assignee: Nobody - feel free to take it CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-23 12:09 EST by APB CLA
Modified: 2022-06-09 10:30 EDT (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description APB CLA 2011-11-23 12:09:13 EST
Build Identifier: 2.0.3.v201010191057


The following named JP QL query:

  @NamedQuery(name = "GAFAssertionTerm.TOUWithFunction", query =
  "SELECT DISTINCT a FROM GAFAssertionTerm a, FormulaTerm ft "
  + "WHERE (a.formula = ft) AND (ft.arg0 = :arg0) AND (ft.formulaType = :gaf) AND "
  + "EXISTS (SELECT fe FROM FormulaEntries fe WHERE (ft.arg2 = fe.formulaItemId) AND "
  + "(fe.arg0Term = :function))"), 

Generates the following extremely inefficient SQL code:

SELECT DISTINCT t0.TERM_ID AS a1, t0.ASSERTED_ARGUMENT AS a2, t0.ASSERTED_TIMESTAMP AS a3, t0.DIRECTION AS a4, t0.STRENGTH AS a5, t0.TRUTH AS a6, t0.ASSERTED_BY AS a7, t0.ASSERTED_REASON AS a8, t0.FORMULA_ID AS a9, t0.MICROTHEORY_ID AS a10 
FROM KB.GAF_ASSERTION_TERM t0, KB.FORMULA_TERM t1 
WHERE ((((t0.FORMULA_ID = t1.TERM_ID) AND (t1.ARG0 = 1407374883554030)) AND (t1.FORMULA_TYPE = 1)) AND 
EXISTS (SELECT 1 
FROM KB.ITEM t4, KB.FORMULA_TERM t3, KB.FORMULA_ENTRIES t2 
WHERE ((((t3.ARG2 = t4.ITEM_ID) AND (t2.ARG_0_TERM = 1407374883556273)) AND (t1.TERM_ID = t3.TERM_ID)) AND (t4.ITEM_ID = t2.ITEM_ID))) ) 
OFFSET 0 ROWS FETCH NEXT 2048 ROWS ONLY

When I believe it should be generating the following, vastly most efficient code:

SELECT DISTINCT t0.TERM_ID AS a1, t0.ASSERTED_ARGUMENT AS a2, t0.ASSERTED_TIMESTAMP AS a3, t0.DIRECTION AS a4, t0.STRENGTH AS a5, t0.TRUTH AS a6, t0.ASSERTED_BY AS a7, t0.ASSERTED_REASON AS a8, t0.FORMULA_ID AS a9, t0.MICROTHEORY_ID AS a10 
FROM KB.GAF_ASSERTION_TERM t0, KB.FORMULA_TERM t1 
WHERE ((((t0.FORMULA_ID = t1.TERM_ID) AND (t1.ARG0 = 1407374883554030)) AND (t1.FORMULA_TYPE = 1)) AND 
EXISTS (SELECT 1 
FROM KB.FORMULA_ENTRIES t2 
WHERE ((t1.ARG2 = t2.ITEM_ID) AND (t2.ARG_0_TERM = 1407374883556273))))
OFFSET 0 ROWS FETCH NEXT 2048 ROWS ONLY

Note that the inefficient version is adding redundant joins to KB.ITEM t4 and KB.FORMULA_ENTRIES t2.


Reproducible: Always

Steps to Reproduce:
1.See details
2.
3.
Comment 1 APB CLA 2011-11-23 15:54:33 EST
I just tried this with eclipselink 2.3.1 and hibernate  3.6.8. The issue looks better but not fully resolved:

SELECT DISTINCT t0.TERM_ID AS a1, t0.ASSERTED_ARGUMENT AS a2, t0.ASSERTED_TIMESTAMP AS a3, t0.DIRECTION AS a4, t0.STRENGTH AS a5, t0.TRUTH AS a6, t0.ASSERTED_BY AS a7, t0.ASSERTED_REASON AS a8, t0.FORMULA_ID AS a9, t0.MICROTHEORY_ID AS a10 
FROM KB.GAF_ASSERTION_TERM t0, KB.FORMULA_TERM t1 
WHERE ((((t0.FORMULA_ID = t1.TERM_ID) AND (t1.ARG0 = 1407374883554030)) AND (t1.FORMULA_TYPE = 1)) AND 
EXISTS (SELECT 1 
FROM KB.ITEM t3, KB.FORMULA_ENTRIES t2 
WHERE (((t1.ARG2 = t3.ITEM_ID) AND (t2.ARG_0_TERM = 1407374883557531)) AND (t3.ITEM_ID = t2.ITEM_ID))) ) OFFSET 0 ROWS FETCH NEXT 2048 ROWS ONLY


Only 1 superfluous join is added now KB.ITEM t3 instead of 2.
Comment 2 APB CLA 2011-11-23 17:38:55 EST
This is important because many of my queries are in this form and the extra join makes the queries run about 5 orders or magnitude slower that necessary.

Just using simple transitivity reasoning, it should be obvious that:

(t1.ARG2 = t3.ITEM_ID) AND (t3.ITEM_ID = t2.ITEM_ID)

Can be simplified to (t1.ARG2 = t2.ITEM_ID) and the extra join on t3 dropped.
Comment 3 APB CLA 2011-11-28 11:34:24 EST
(In reply to comment #2)
> This is important because many of my queries are in this form and the extra
> join makes the queries run about 5 orders or magnitude slower than necessary.
> 
> Just using simple transitivity reasoning, it should be obvious that:
> 
> (t1.ARG2 = t3.ITEM_ID) AND (t3.ITEM_ID = t2.ITEM_ID)
> 
> Can be simplified to (t1.ARG2 = t2.ITEM_ID) and the extra join on t3 dropped.

The above reasoning is wrong, however, the extra join on Item is still not needed and caused massive slowdowns.

For instance:
  t1.ARG2 = 0
  t2.ITEM_ID = 0
  t3.ITEM_ID = 1

(t1.ARG2 = t3.ITEM_ID) AND (t3.ITEM_ID = t2.ITEM_ID) evaluates to FALSE,
whereas (t1.ARG2 = t2.ITEM_ID) evaluates to TRUE, so the two expressions
don't seem to be equivalent.
Comment 4 Tom Ware CLA 2011-12-02 10:24:54 EST
Setting target and priority.  See the following page for the meanings of these fields:

http://wiki.eclipse.org/EclipseLink/Development/Bugs/Guidelines

Community: Please vote for this bug if it is important to you.  Votes are one of the main criteria we use to determine which bugs to fix next.
Comment 5 Maarten Cleemput CLA 2015-07-01 10:35:41 EDT
Double entry: see also bug with id 298494
Comment 6 Eclipse Webmaster CLA 2022-06-09 10:30:19 EDT
The Eclipselink project has moved to Github: https://github.com/eclipse-ee4j/eclipselink