| Summary: | using inner joins for dotNode in JPQL outside the where clause filters out null results | ||||||
|---|---|---|---|---|---|---|---|
| Product: | z_Archived | Reporter: | Markus KARG <markus.karg> | ||||
| Component: | Eclipselink | Assignee: | Chris Delahunt <christopher.delahunt> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | |||||
| Severity: | critical | ||||||
| Priority: | P2 | CC: | adrian.goerler, bht237, christopher.delahunt, jandam, kwesi, tom.ware | ||||
| Version: | unspecified | ||||||
| Target Milestone: | --- | ||||||
| Hardware: | PC | ||||||
| OS: | Windows XP | ||||||
| Whiteboard: | |||||||
| Attachments: |
|
||||||
*** Bug 245655 has been marked as a duplicate of this bug. *** *** Bug 245657 has been marked as a duplicate of this bug. *** As mentioned, the states that "Path expression navigability is composed using “inner join” semantics". 'A.B' is a path expression, in this case a single_valued_association_path_expression as A has a OneToOne relation to B. So the expression navigates to B using Inner Join semantics. My interpretation of the next statement is that it is attempting to explain inner joins by giving a specific example and what it entails:
"That is, if the value of a non-terminal association-field in the path expression is null, the path is considered to have no value, and does
not participate in the determination of the result"
This is a simplistic description of what inner join does, but does not seem to attempt all possible effects of inner join.
In the JPQL string "SELECT COUNT(a.b) FROM A a", you are counting B objects. To me, if B doesn't exist, it cant be counted. So, in the event an A references null, I would not expect that to be in the result.
Since OneToOne relationships can be bidirectional, there is no way to be sure that the foreign key for this relationship is in the A table. For instance, if table B has a foreign key to table A, then "SELECT COUNT(a.b) FROM A a" absolutely requires a join.
If a user wants the SQL "SELECT COUNT(t0.BID) FROM A t0" then the "BID" column used in the relation needs to have a basic mapping. For instance:
@Entity
Class A {
..
@OneToOne
@JoinColumn(name="BID")
B b;
@Basic
@Column(name="BID", insertable=false, updatable=false)
int bid; //what ever type the foreign key to B is
..
}
The JPQL would then be "SELECT COUNT(a.bid) FROM A a".
JPA spec leader Linda DeMichiel confirmed that I am right and this is a valid bug report. Chris, please see the the email excerpt that I forwarded to you. The problem lies when path expressions are used out side of the where clause, as they are not intended to limit the results that are to be returned. EclipseLink's use of inner joins for path expressions found outside the where clause (ie select statements and group by) filters results counter to what the user may have intended. For instance Select a, count(a.b) From A a would be intended to return all A objects and the number of referenced Bs. Because EclipseLink uses inner joins which will filter out nulls, the user will instead get results as if they issued the following JPQL: Select a, count(a.b) From A a Where count(a.b)>0 Outer joins should be used for path expressions that are not included in the where clause. This bug fix did not make the cut off for 2.0.0. We are deferring the bugs to Future where we can properly sort them all together based on community votes and severity. We will then assign them accordingly to future patch sets and releases. Changing the priority of the bugs that have been recently triaged to future. Targetting them to P2 will differentiate them from the P3s that have been triaged into future earlier. taking ownership and targeting for 2.1 as of all unassigned bugs, this bug has the most votes. Currently looking into test cases and debugging the code. Created attachment 160820 [details]
proposed patch
EclipseLink already used an outer join when selecting on a relationship, ie:
"Select a.b from a". Unfortunately, it would not use an outer join for nested relationships such as "Select a.b.c from a". This patch expands it so that outer joins are used within group by, having and Selects - unless the same join is specified in the where clause which would force it to use an inner join.
Updated the summary to reflect the problem and the uploaded patch. Patch has been checked in At revision: 6714 Apart from the fact that I am surprised by Markus' remark that "SELECT COUNT(t0.BID) FROM A t0" should return 2 in the given example (since the SQL standard requires the database to eliminate null values before applying the COUNT function), I have difficulties to see how Chris' statement "The problem lies when path expressions are used out side of the where clause, as they are not intended to limit the results that are to be returned." goes in line with the quoted paragraph of section 4.4.4 of the JPA 1.0 (and 2.0) standard. To my understanding the wording of the spec says that inner join semantics applies to the non-terminal part of a path expression and so-to-say outer join semantics to the terminal association *irrespectively* where the path expression occurs (at least I can't find any restriction to having or where clauses), such that select a.b.c from A a would return the nulls corresponding to the c's (i.e. where a.b is not null and b.c is), but not the ones corresponding to to the b's (i.e. where a.b is null) since a.b.c's with a.b being null per spec do not "participate in the determination of the result". (In reply to comment #11) > Apart from the fact that I am surprised by Markus' remark that > "SELECT COUNT(t0.BID) FROM A t0" > should return 2 in the given example (since the SQL standard requires the > database to eliminate null values before applying the COUNT function), Can you please publish an URL supporting this thesis? > I have > difficulties to see how Chris' statement > "The problem lies when path expressions are used out side of the where clause, > as they are not intended to limit the results that are to be returned." > goes in line with the quoted paragraph of section 4.4.4 of the JPA 1.0 (and > 2.0) standard. To my understanding the wording of the spec says that inner join > semantics applies to the non-terminal part of a path expression and so-to-say > outer join semantics to the terminal association *irrespectively* where the > path expression occurs (at least I can't find any restriction to having or > where clauses), such that > select a.b.c from A a > would return the nulls corresponding to the c's (i.e. where a.b is not null and > b.c is), but not the ones corresponding to to the b's (i.e. where a.b is null) > since a.b.c's with a.b being null per spec do not "participate in the > determination of the result". JPA 1.0 describes the Query Language in Chapter 4. Chapter 4 splits the definition of the Query Language into Chapters 4.4 (FROM Clause), Chapter 4.5 (WHERE Clause), Chapter 4.7 (GROUP and HAVING Clauses), Chapter 4.8 (SELECT Clause). As Spec Lead Linda DeMichiel told us in September 2008, whatever told in 4.4.4 is only valid inside of 4.4, but has no impact on other chapters, especially not on Chapter 4.8. That means: As the constraint to use INNER joins is told in 4.4.4 (i. e. 4.4 FROM Clause), it is only valid for the FROM Clause, but NOT for the SELECT clause (which is in 4.8). It is invalid to assume that something told in 4.4.4 also is valid for 4.8. The specification by intention declares all rules in different sections, since they are not to be applied to other sections. >> Apart from the fact that I am surprised by Markus' remark that >> "SELECT COUNT(t0.BID) FROM A t0" >> should return 2 in the given example (since the SQL standard requires the >> database to eliminate null values before applying the COUNT function), > Can you please publish an URL supporting this thesis? I don't have a URL, but I may quote from SQL2003 Foundation section 10.9 General Rules "4) If <general set function> is specified, then: a) Let TX be the single-column table that is the result of applying the <value expression> to each row of T1 and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning null value eliminated in set function. b) Case: i) If DISTINCT is specified, then let TXA be the result of eliminating redundant duplicate values from TX, using the comparison rules specified in Subclause 8.2, <comparison predicate> , to identify the redundant duplicate values. ii) Otherwise, let TXA be TX. c) Let N be the cardinality of TXA. d) Case: i) If COUNT is specified, then the result is N." > JPA 1.0 describes the Query Language in Chapter 4. Chapter 4 splits the > definition of the Query Language into Chapters 4.4 (FROM Clause), Chapter 4.5 > (WHERE Clause), Chapter 4.7 (GROUP and HAVING Clauses), Chapter 4.8 (SELECT > Clause). As Spec Lead Linda DeMichiel told us in September 2008, whatever told > in 4.4.4 is only valid inside of 4.4, but has no impact on other chapters, > specially not on Chapter 4.8. That means: As the constraint to use INNER joins > is told in 4.4.4 (i. e. 4.4 FROM Clause), it is only valid for the FROM > Clause, but NOT for the SELECT clause (which is in 4.8). It is invalid to > assume that something told in 4.4.4 also is valid for 4.8. The specification > by intention declares all rules in different sections, since they are not to > be applied to other sections. In that case, inner join semantics should not be applied inside WHERE or HAVING conditions either since 4.4.4 is neither part of 4.5, 4.6 nor 4.7. If we look more careful into those sections, however, (like e.g. 4.6.3) we see that they built upon the definitions given section 4.4.4. Therfore I dare to doubt the above interpretation of the spec. "Select count(a.b) from A a" will only return a count of 1 in the example given. What I believed was meant (from other channels than the bug description) was that "Select a, count(a.b) from A a" is expected to return 2 rows - one with a count of 1 and the other with a count of 0, but because of the inner join being used, is not. This expands further to things like: "Select a, count(a.b.c) from A a" etc etc. The count obviously should be 1 or 0, but the question here is, if A has a null B, should it be returned? If As with Bs without Cs should be returned, why not As without Bs. The intent of a user is likely to have been to get all As, not just As with Bs. Count is easy to pick on, but "Select a.b.c from A a" isn't. I am currently rolling back the patch as it fails EclipseLink tests, so the issue as described still exists. Coming from an inner join semantics for non-terminal fields of a path expression, I would expect
(I) "Select a.b.c from A a"
to return all c's to which an a and a b exist such that c = a.b.c, including null if b.c is null for the respective a. a.b with b being null would therefore not contribute.
(II) "Select count(a.b.c) from A a"
seems easy since an empty score yields 0 (if all a.b's are null). Now
(III) "Select a, a.b.c from A a"
to my understanding has no other chance than to return only those a's where a.b is not null to be consistent with the above. Consequently, one could argue that
(IV) "Select a, count(a.b.c) from A a group by a"
should only return a's with a.b not being null and that rather something like
(V) "Select a, (select count(aa.b.c) from A aa where aa = a) from A a
group by a"
would return all a's and in particular a count of 0 for those a's with a.b being null. (V) is currently (JPA 2.0) expressible through the criteria API (though it needs not to be supported).
The TCk for JPA 1.0 enforces that inner joins be used so section 4.4.4, whether intended or not, must apply outside the FROM clause. The wording of section 4.4.4 is such that it is left open for interpretation for what should be used for the terminal relationship. It states: "Path expression navigability is composed using “inner join” semantics." and then in the next sentence explains what inner join means to a non-terminal relationship. EclipseLink has chosen to look at JPQL as seeming to treat entities as tables are in SQL. Users should be expecting that a.b.c selects from the C table, which would require a join with section 4.4.4 indicating inner joins are the default. After investigating multiple potential solutions, there is no way to implement this feature request without breaking backward compatibility. As this is not a spec violation, I am closing this request as not feasible to fix. There is a related request for enhancement at https://bugs.eclipse.org/bugs/show_bug.cgi?id=312146 "Improve Support for ON Clause" With it, one can add conditions to the outer join. Also, Tom Ware has managed to get this feature on an initial list for consideration for the next JPA specification. Please vote for the enhancement! The Eclipselink project has moved to Github: https://github.com/eclipse-ee4j/eclipselink |
TopLink and EclipseLink both are incorrect in the handling of navigation expressions (a.k.a 'the DOT operator'). Obviously, when writing a navigation expression like "a.b.c", then you cannot navigate from a to b or b to c when a or b is NULL, while it is not a problem if c would NULL -- unless you want to go on and not just USE c for some purpose (like getting used in an aggregate, ordering or grouping) but actually NAVIGATE OVER it (what means: another dot is FOLLOWING c). To point out this "USE or NAVIGATE" difference, chapter 4.4.4 of "EJB 3.0 Persistence Specification" (a.k.a "JPA 1.0") clarifies this with the following clear words: "Path expression NAVIGABILITY is composed using “inner join” semantics. That is, if the value of a NON-TERMINAL association-field in the path expression is null, the path is considered to have no value, and does not participate in the determination of the result." (noticed the very essential words: 'NAVIGABILITY' and 'NON-TERMINAL'?). This clearly points out that it makes a huge difference between NAVIGATION access and NON-NAVIGATION access of fields, where NAVIGATION access is defined as "NOT beeing used as the TERMINAL field" -- so BEEING the TERMINAL field obviously is NOT NAVIGATION access and is NOT part of chapter 4.4.4! Or in other words: INNER JOINs are only to be used in this sole case of the field beeing FOLLOWED by a DOT -- but NOT in any other case (event NOT NECESSARILY in the case of beeing PREFIXED by a dot)! Unfortunately neither TopLink not EclipseLink take care of this fine but essential difference of "SOME usage" and especially "NAVIGATIONAL usage": These products ALWAYS create inner joins, despite the type of using (i. e. without taking care of whether actually a dot is following), leading to several bugs (at least to the EclipseLink bug reports #245655 and #245657). Example: @Entity public class A { @Id private int aid; @ManyToOne private B b; } @Entity public class B { @Id private int bid; } JPA-QL: "SELECT COUNT(a.b) FROM A a" Obviously this query does NOT navigate over field 'b', since 'b' is NOT NON-TERMINAL; it just USES field 'b'. So this is NOT what chapter 4.4.4 talks about. So NO JOIN must be used at all. Imagine that Table A contains two rows. The first row pointing to some row in B. The second row having field 'b' set to NULL. The NON-TERMINAL part is NOT NULL while only the TERMINAL part is NULL. Clearly the INNER JOIN problem described in chapter 4.4.4 does NOT apply here since NO navigation BY A NULL POINTER is done! No JOIN is to be used! The average user would expect to see this SQL to be created: Excepted SQL: "SELECT COUNT(t0.BID) FROM A t0" with this expected result: "2". But what actually happens is: Actual SQL: "SELECT COUNT(t0.BID) FROM B t0, A t1 WHERE (t0.BID = t1.B_BID)" With the surprising result: "1". (In fact it is not really surprising, since 'NULL = NULL' always is FALSE). From my point of view, the cause of this bug is a misunderstanding of the above spec excerpt: While it is obvious to some of us that INNER JOIN must only be used in case of NON-TERMINAL fields, others seem to think that it is an advice to use INNER JOIN for ANY use of fields as long as a dot is in front of. But this is wrong. Correct is: It is not about whether having a dot IN FRONT OF, it is solely a question of having a dot AFTER it! 'a' has a dot after it, yes, but 'a' is not an association field at all -- and chapter 4.4.4 ONLY talks about association fields! Got the point? The solutions is quite obvious: TopLink and EclipseLink must learn the difference between having a dot BEFORE or AFTER a field. Nobody cares about having a dot BEFORE an association field, or having a dot AFTER a anything besides association fields! In those places NO JOIN is needed. As long as there is NO dot after an association field (i. e. as long as the field is TERMINAL or it is NOT an association field), no JOIN is needed. (I know I repeat myself -- just to make you really understand). ONLY when there is a dot AFTER the field (i. e. the field is NON-TERMINAL) and ONLY the fields is an association field, an INNER JOIN must be used. (I think you got it now.) The difference gets clear as soon as a more complex example is used: While currently "SELECT COUNT(a.b.c) FROM A a" will result in "...FROM C t0, B t1, A t0 WHERE (t0.CID = t1.C_CID) AND (t1.BID = t0.B_BID)" (what is wrong and leads to several bugs like omitting NULL in COUNT or GROUP BY), in future the correct SQL "...FROM A t1 JOIN B t0 ON (t1.B_BID = t0.BID)" will be used -- as you can see, in the 'correct' SQL, table C is NOT JOINED at all -- since it is unnecessary unless it will be used for further NAVIGATION AWAY from C. I hope that my explanation was not too hard to understand and reading the excerpt from chapter 4.4.4 helps to see the problem. From my point of view, this is a very severe violation of the JPA 1.0 specification, since omitting NULL (a result of the unnecessary joining) can lead to lots of hidden and hard to detect problems. In fact, it took me a complete business day to drill down my own business problems to this single, obvious misunderstanding of the spec. It would be great if you could fix this ASAP, to prevent further worries and follow-up problems.