Community
Participate
Working Groups
-- FEATURE REQUEST / NOT A BUG -- Currently EclipseLink makes a difference in the way JPA QL INNER JOINS and JPA QL OUTER JOINS are transformed into SQL. While an INNER JOIN is producing SQL like "FROM A t0, B t1 WHERE (t0.fk = t1.pk)", a typical OUTER JOIN is producing SQL like "FROM A t0 LEFT JOIN B t1 ON (t0.fk = t1.pk)". One could ask: What is the difference? There is a huge difference, and it is a problematic one! Actually, from the view of the core idea of the SQL syntax, only the OUTER JOINS are producing correct SQL, while the INNER JOINS are producing SQL that just ends up in the correct result (i. e. the incorrect SQL is producing the same correct result as the correct SQL would do itself). That means, if you just look at the data that comes out at the end, then the INNER JOINs currently produce a correct result. But when looking at the way the DBMS is asked to find the result by the actual SQL, there is a huge difference. And this difference can lead to major performance problems. In fact, "A t0, B t1 WHERE (t0.fk = t1.pk)" is NOT an INNER JOIN at all. It is a CROSS JOIN with result filtering returning the same result as an INNER JOIN would do. If we imagine a simple DBMS that has no optimizer (ok, most DBMS todays have very clever optimizers) then the difference is this: For simplicity of this sample, imagine that table A has 1000 rows, and B has 1000 rows. Each A points to exactly one B. Obviously a INNER JOIN between A and B will result in 1000 rows containing data of one A and B rows each. The performance obviously should be as fast as possible. A CROSS JOIN (carthesian product) is taking each row of A and combines it with each row from B, resulting in one million of combined rows (1 000 x 1 000 = 1 000 000 combinations). AFTER that is done, each such row combination is checked against the WHERE clause (i. e. one million rows get checked whether they match the constraint t0.fk = t1.pk). For 1000 rows this obviously is true (those are combinations of A and B that really 'belong together') while for the other 900 000 rows this obviously is wrong (because that are combinations of A and B that do not 'belong together'). So we scanned 900 000 row combinations for nothing but throwing them away afterwards. We assume that this complete operation needs 1 Mio. time units, of which just one thousand are used for a useful result. A INNER JOIN instead is working much more clever: For each row of A the primary key index of table B is scanned. The key index is organized as a B Tree so it is quite 'cheap' to find exactly the corresponding row in B that belongs to exactly this row of A. So to find that combinations of A and B that 'belong together', we assume that we need 1 000 lookups in A (full sequential table scan) and 1 000 lookups in the primarky key index of B, in sum 2 000 lookups. Let's assume that this needs two thousand time units. No more filtering is needed, the result is directly 'ready to use'. So a CROSS JOIN with filtering produces the same correct result than a INNER JOIN without any filtering would do. But for a much higher price. There are nice tools, like Sybase's "Query Optimizer Graph View" that can be used to run this test on your own (but be sure to switch off the optimizer before running such a test -- Sybase for example will rewrite CROSS JOINS automatically into INNER JOINS if possible, so you wouldn't see the difference). Even worse is the situation if more than two tables are joined, or when one table has much less rows than the others. Example: A contains ten rows, B contains one hundred rows and C contains one thousand rows. A CROSS JOIN will need approximately 10 x 100 x 1 000 = one million time units. A INNER JOIN will need just 10 + 10 + 10 = 30 time units. Why just ten? Because, as I said, only one table (here: A, since it is the shortest one) gets scanned sequentially, while all the others (here: B and C, since those have more rows than A) will get accessed by the primary key index (for each of the ten rows in A there is only one index lookup for the corresponding B row, and for this single row there is only one index lookup to find the corresponding C row: In sum, for each single A row you have just two additional rows to get). 30 time units, compared to one million time unit, that's a big difference, isn't it? As I said, modern DBMS often (but not necessarily) has a clever optimizer that can recognize CROSS JOINS with filtering and rewrite them into INNER JOINS. But not all DBMS do have this feature, also the feature is dependend on the actual SQL and database load. And, in some DBMS the optimizer goal is configurable and / or the optimizer can be disabled. So to guarantee that "best possible" execution plans will apply, "best possible" SQL should be produced by EclipseLink (obviously providing a perfect SQL will result in perfect execution, while providing a non-perfect SQL will just possibly result in perfect execution). That means, wherever possible, CROSS JOIN with filtering should be replaced by "real" INNER JOINS. Even if there currently is no supported DBMS to be found that doesn't optimize cleverly, this is no excuse to produce potentially slowly executed clauses. EclipseLink should be optimized to always create the best possible SQL.
Support has been added in 2.4 to generate INNER joins in the FROM clause. It is not enabled by default, but can be enabled in the platform by setting, platform.setPrintInnerJoinInWhereClause(false)
Created attachment 212223 [details] adds support for configuring inner joins in platform
SVN commit trunk: Bug#246356 inner joins https://bugs.eclipse.org/bugs/show_bug.cgi?id=246356 Adds support to configure INNER joins to be printed in the FROM clause similar to OUTER joins. Also fixes a performance regression. Code review: Andrei Changes: - Fix to ClassDescriptor multiple table initialization to ensure join expression is built for all tables in descriptor. (otherwise join is missed when outer/inner joining child) - Added support to Expressions to add INNER joins to FROM clause if configured to print INNER joins in the FROM clause. - Added printInnerJoinInWhereClause property to DatabasePlatform to allow inner join printing to be configurable. - Fixed inheritance views not working with outer/inner joins. - Fixed AggregateObjectMapping to only build wrapped CacheKey when required for protected cache. - Fixed ObjectLevelReadQuery equal to avoid properties check on cache query, as already checked elsewhere, and causes cache missed with prepared query. - Some cleanup.
This works for most joins, but there were some test failures enabling this. Seems to be some issues with subselects. Also, multiple table join will not be included in the from clause.
Parts of this have been addressed. Leaving open but deferring the rest.
Would be nice if this feature could be finished in the next release. :-)
The Eclipselink project has moved to Github: https://github.com/eclipse-ee4j/eclipselink