Community
Participate
Working Groups
I am reviewing the patch for bug 247564 deep enough to allow me to design some scenarios for white box testing. I'll post comments in batches so progress be made even as the review is still happening. Here is the first batch: (1) org.eclipse.jdt.core.tests.compiler/src/org/eclipse/jdt/core/tests/compiler/regression/BatchCompilerTest test311_warn_options() references the wrong bug id. (2) org.eclipse.jdt.core/compiler/org/eclipse/jdt/internal/compiler/ast/Block.java There are no changes in this file ?? This needn't have been released. (3) org.eclipse.jdt.compiler.tool/src/org/eclipse/jdt/internal/compiler/tool/Options.java org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/messages.properties org.eclipse.jdt.internal.compiler.lookup.VariableBinding (at least) These files are missing copyright change. (4) org.eclipse.jdt.internal.compiler.ast.Expression.java I think Expression.variableBinding(Scope) should be renamed to be getVariableBindingForNullAnalysis(Scope) (or some similar name.) The way the current name sounds, it has the feel of a general API, but it doesn't work that way. While javadoc can be improved to capture the fact that null can be returned by the method even where valid field binding exists, changing the method name is a better way. (5) resetNullInfoForFields should be done not only for MessageSends but also for other code artifacts executions. For example, in the following, there is an incorrect warning: public class X { static Object lastObject; public static void main(String [] args) { if (lastObject == null) { new X(); lastObject.toString(); } } static void foo() { } X() { lastObject = this; } } (6) Looking at SingleNameReference.variableBinding and QualifiedNameReference.variableBinding Should we cache the compiler options somehere ? We know it is expensive to look this up from deely nested context and to do this for every SNR and QNR in the program is hugely wasteful ?
(In reply to comment #0) > (5) resetNullInfoForFields should be done not only for > MessageSends but also for other code artifacts executions. > For example, in the following, there is an incorrect warning: The typical suspects would be MessageSend AllocationExpression QualifiedAllocationExpression ExplicitConstructorCall Am I missing any?
(In reply to comment #1) > The typical suspects would be > MessageSend > AllocationExpression > QualifiedAllocationExpression > ExplicitConstructorCall > > Am I missing any? That should be it. For ECC, since the call must be the first statement it should be a nop really. For QAE, we need to see if we want to handle anonymous types specially.
(In reply to comment #2) > (In reply to comment #1) > > > The typical suspects would be > > MessageSend > > AllocationExpression > > QualifiedAllocationExpression > > ExplicitConstructorCall > > > > Am I missing any? > > That should be it. For ECC, since the call must be the first statement > it should be a nop really. Take that back. Any nullness discovered during evaluation of argument expressions must be discarded so it is a not a nop. Likewise QAE with anonymous types would also call for throwing out nullness information for fields, as they involve a super class constructor call.
(7) We need to make sure that right expectations are set by documenting that it is only the receiver object whose fields get analyzed (add the other nuances here). I myself was surprised by this during my testing. Once you know the details of the implementation you can see why that is the case, but for an end user it will be a surprise too. (8) In the snippet below, why won't we not warn for the super line ? class Y { Object x; } public class X extends Y { public void main(String[] args) { if (this.x == null) { this.x.toString(); // we warn here } if (super.x == null) { super.x.toString(); // no warning here } if (x == null) { x.toString(); // warning here. } } }
I have completed the review and here are the next batch of comments. I'll continue to test it a bit more and post more comments if I have them. At the moment, this is it: (9) org.eclipse.jdt.core/compiler/org/eclipse/jdt/internal/compiler/flow/UnconditionalFlowInfo.java Various copy methods are doing shallow copy of extraConstantFieldMask. Is this intentional ? At a minimum, there should be a comment that says why it is OK. (10) In diamond shaped hierarchies, or otherwise when super interfaces are specified redundantly we could be double counting interface fields. I don't think it is a problem though. (11) I think this program produces very confusing messages: public class X { static final Object o = null; void foo() { if (o.toString() == "") {} if (o == null) {} if (o != null) {} } } I understand why it is happening, but we should certainly do better by perhaps not updating nullness information for static final fields on the go when possible. The "dereferenced here, so later it should be treated as non null" approach is problematic for static finals. (I agree the problem will still be there for the general case.) (12) Why does this code elicit only one warning ? import org.eclipse.jdt.annotation.Nullable; public class X { Object o; void foo(@Nullable Object p) { p.toString(); // warning here o = p; o.toString(); // no warning here } }
(In reply to comment #4) > (7) We need to make sure that right expectations are set by documenting > that it is only the receiver object whose fields get analyzed (add > the other nuances here). I myself was surprised by this during my testing. > Once you know the details of the implementation you can see why that > is the case, but for an end user it will be a surprise too. Thinking more about this, it sounds like a serious design flaw to me that we analyze only the current object's (i.e the receiver object's) fields for nullness. We should have considered a design that was flexible and powerful enough to have allowed us to analyze arbitrary usages of fields from any class. Right now the nullness information is tracked for all fields of current object (at least space is allocated to track) - this is wasteful. Given the program below: public class X { Object field0, field1, field2, field3, field4, field5, field6, field7, field8, field9, field10, field11, field12, field13, field14, field15, field16, field17, field18, field19, field20, field21, field22, field23, field24, field25, field26, field27, field28, field29, field30, field31, field32, field33, field34, field35, field36, field37, field38, field39, field40, field41, field42, field43, field44, field45, field46, field47, field48, field49, field50, field51, field52, field53, field54, field55, field56, field57, field58, field59, field60, field61, field62, field63, field64, field65, field66, field67, field68, field69, field70, field71, field72, field73, field74, field75, field76, field77, field78, field79, field80, field81, field82, field83, field84, field85, field86, field87, field88, field89, field90, field91, field92, field93, field94, field95, field96, field97, field98, field99; static final Object o; static final Object o1; static { o = null; o1 = new Object(); } void foo() { if (o.toString() == "") {} if (o == null) {} if (o != null) {} if (o1 == null) {} if (o1 != null) {} } } my understanding is that we reserve space for tracking all the fields of the receiver object, but only o and o1 need tracking at all. On the contrary given: class X { Object field; void foo(X x) { if (x.field == null) { // some code here. } x.field.toString() } } it would have been very useful to track the fields of the reference `X x' but we don't. I think this would have been possible, if we tracked the set of all "root references" in a method, assigned unique reference ids to them (at the method level) and then further as and when we encountered reference fields of these already id'd references, assigned unique reference ids to the fields and so on ... For example given: class X { // Gazillion fields here. Object f; void foo(X x1, X x2, Y y) { if (x1.f == null) { // some code ... } if (x2.f != null) { // some code } if (y.yfield == null) { // some code } if (this.f == null) { // some code ... } } } in the method foo, the root references are x1, x2, y and `this'. If we assign reference ids in the order of encountering references then x1 --> 1 x2 --> 2 y --> 3 x1.f --> 4 x2.f --> 5 y.yfield --> 6 this --> 7 this.f --> 8 So we would have to track the nullness of 8 references and we don't care about the gazillion other fields of X that are not referenced in this method. Of course we would have to extend the scheme for static objects, and also make this scheme play well with local and arguments, but I think it can be done. For analogy see that nullness tracking works for all arguments and all local variables of *any* reference types. It seems like the current implementation is seriously self-limiting. I don't see a reason why we should not diagnose arbitrary fields of arbitrary types. From a solution usability point of view, how often do we discover an NPE when the null reference strictly originates in the current object ?
(In reply to comment #6) > (In reply to comment #4) > > (7) We need to make sure that right expectations are set by documenting > > that it is only the receiver object whose fields get analyzed (add > > the other nuances here). I myself was surprised by this during my testing. > > Once you know the details of the implementation you can see why that > > is the case, but for an end user it will be a surprise too. > > Thinking more about this, it sounds like a serious design flaw to me > that we analyze only the current object's (i.e the receiver object's) > fields for nullness. Basically so there is clarity, against the example below, it is very surprising that there is no warning in the line so annotated: public class X { Object field; void goo(X x) { if (this.field == null) { this.field.toString(); // potential NPE warning here. } if (x.field == null) { x.field.toString(); // No warning here. } } }
One other observation. The current implementation basically adopts the ostrich approach when it comes to aliases. This may be OK since we only report potential problems/warnings. For example, the warning below is most likely bogus: // --- public class X { Object field; void goo(X x) { X that = this; if (this.field == null) { that.field = new Object(); this.field.toString(); // potential NPE warning here. } } } // --- We don't have any infrastructure for alias analysis and so this will have to be documented as a known limitation.
(In reply to comment #7) > (In reply to comment #6) > > (In reply to comment #4) > > > (7) We need to make sure that right expectations are set by documenting > > > that it is only the receiver object whose fields get analyzed (add > > > the other nuances here). I myself was surprised by this during my testing. > > > Once you know the details of the implementation you can see why that > > > is the case, but for an end user it will be a surprise too. > > > > Thinking more about this, it sounds like a serious design flaw to me > > that we analyze only the current object's (i.e the receiver object's) > > fields for nullness. > > Basically so there is clarity, against the example below, it is > very surprising that there is no warning in the line so annotated: > > public class X { > Object field; > void goo(X x) { > if (this.field == null) { > this.field.toString(); // potential NPE warning here. > } > if (x.field == null) { > x.field.toString(); // No warning here. > } > } > } I understand your surprise but I don't think we're able to significantly improve in this situation. A technical limitation: we already had to jump through hoops to define a safe mapping from variables to slots in FlowInfo when only considering fields reachable via some kind of 'this' plus locals from all lexically enclosing scopes. Linearizing fields from arbitrary classes into this mapping would be - difficult - to say the least. Conceptually, we're facing a similar limitation as also for the resource leak warnings: the analysis knows nothing about instances. We have no means to see that x.field and x.field (sic) are the same field, because we don't know if both references 'x' refer to the same instance. We could obviously start by making 'x' final and say the analysis could start from that. But then the next user will assign the final 'x' to a final 'y' and will expect that the analysis knows they're both the same instance (see also your point about aliases). The analysis knows nothing about instances - except: 'this'. It so happens that in Object Teams we have a notion of dependent types, where the type system needs to know that two instances are identical. In this context we actually have an analysis where assignments between finals are analysed, but that by itself was a big project, and I doubt, that solving field analysis just across final references would help for many situations. I see the big advantage in bug 247564 in the fact that we now have a strong infrastructure for including fields into the analysis. Unfortunately, for regular fields (neither final nor annotated) this alone won't buy you much, but with 'final' or null annotations this becomes a really powerful tool. So I strongly agree if you say we need to communicate what we can/cannot detect. In this message I would add the advice to make code better analyzable by using as much as reasonably possible: - final - null annotations - locals(!) OTOH, I know enough people who'd love to see ownership and/or alias control annotations in Java :) With that kind of information we could say a lot more about instances... these concepts and systems exist, but ...
Sorry for commenting out-of-order, I didn't fully read comment 6 before I started answering. (In reply to comment #6) > I think this would have been possible, if we tracked the set of all "root > references" in a method, assigned unique reference ids to them (at the > method level) and then further as and when we encountered reference fields > of these already id'd references, assigned unique reference ids to the fields > and so on ... I think the idea in the current implementation is that lookup in a flowInfo should use an int id that can be directly found in the variable's binding (for fields we only add an offset found in the declaringClass binding). By contrast, if we use per-method ids, these cannot easily be stored in the bindings which are shared across all of the compilation. To me it seems we would need to add another indirection (every problem can be solved by introducing another indirection, right?): a per-method mapping of variables to ids. Well, per method is not strictly true: the method must also "inherit" all id-assignments from any enclosing scopes. AFAICS this mapping would need to be implemented using some kind of hash table. I believe the current implementation took great care to avoid any hash tables - for performance reasons, because every lookup in any flowinfo would be penalized by this indirection. And my impression is that lookup in a flowinfo happens very often during analysis. Since I wasn't "here" during the initial implementation I can only try to interpret the rationale behind this implementation and try to play along those lines. Given all the complexity of following values across assignments (on some branches) and aliasing and concurrency, I doubt that possible improvements in this direction will ever stand in a good proportion to the efforts. Much more can be much easier achieved with annotations :) > [...] > For analogy see that nullness tracking works for all arguments and > all local variables of *any* reference types. I didn't understand this point.
(In reply to comment #9) > I understand your surprise but I don't think we're able to significantly > improve in this situation. > > A technical limitation: we already had to jump through hoops to define a safe > mapping from variables to slots in FlowInfo when only considering fields > reachable via some kind of 'this' plus locals from all lexically enclosing > scopes. Linearizing fields from arbitrary classes into this mapping would be - > difficult - to say the least. This is what the id assignment using root references accomplishes. I would like to see where that discussion goes - we _may_ end up with a much simpler scheme. > Conceptually, we're facing a similar limitation as also for the resource leak > warnings: the analysis knows nothing about instances. We have no means to see > that x.field and x.field (sic) are the same field, because we don't know if > both references 'x' refer to the same instance. If we don't see an intervening assignment to x or a method call, why won't we assume x.field is same x.field ? (ignoring the case that x itself could be y.x and could be modified in another thread) > We could obviously start by > making 'x' final and say the analysis could start from that. But then the next > user will assign the final 'x' to a final 'y' and will expect that the analysis > knows they're both the same instance (see also your point about aliases). [...] > It so happens that in Object Teams we have a notion of dependent types, where > the type system needs to know that two instances are identical. In this context > we actually have an analysis where assignments between finals are analysed, but > that by itself was a big project, and I doubt, that solving field analysis just > across final references would help for many situations. No, I am not suggesting that we address the aliasing issue. (In reply to comment #10) > I think the idea in the current implementation is that lookup in a flowInfo > should use an int id that can be directly found in the variable's binding (for > fields we only add an offset found in the declaringClass binding). Co-incidentally, as I was reading this passage, I thought of the maxim "every problem can be solved by ..." only to scroll down and see that you have used it yourself :) > To me it seems we would need to add another indirection (every problem can be > solved by introducing another indirection, right?): a per-method mapping of > variables to ids. Well, per method is not strictly true: the method must also > "inherit" all id-assignments from any enclosing scopes. Do you have a short code snippet example that shows where this is required ? Given that a method should not not carry over any initialization status from other methods, it looks like you are talking about some initialization block related nullness info may be ? I am not clear. So a code example and a reference to JDT/Core class that handle this part in the current implementation would help. >AFAICS this mapping > would need to be implemented using some kind of hash table. I believe the > current implementation took great care to avoid any hash tables - for > performance reasons, because every lookup in any flowinfo would be penalized by > this indirection. And my impression is that lookup in a flowinfo happens very > often during analysis. Yes, I was thinking of a hash table, but interposing some cache lines between an interested party and the table would/should address performance concern by taking advantage of locality of references. In any case, we can't on account of anticipated performance worries settle for limitations. (not saying you are suggesting it) > Since I wasn't "here" during the initial implementation I can only try to > interpret the rationale behind this implementation and try to play along those > lines. > > Given all the complexity of following values across assignments (on some > branches) and aliasing and concurrency, I doubt that possible improvements in > this direction will ever stand in a good proportion to the efforts. I think we should explore the id assignment via root references approach. I think ~90% of implementation (ok, a bit handwaving there) is good and can readily be carried out to the alternate approach. > > [...] > > > For analogy see that nullness tracking works for all arguments and > > all local variables of *any* reference types. > > I didn't understand this point. A tautological observation that a method foo of type X gets its locals analyzed even though they may be of type Y or Z.
(In reply to comment #11) > (In reply to comment #9) > Well, per method is not strictly true: the method must also > > "inherit" all id-assignments from any enclosing scopes. > > Do you have a short code snippet example that shows where this is > required ? Given that a method should not not carry over any initialization > status from other methods, I guess you mean local and/or anonymous class methods referencing names from enclosing methods. I think this can be made to work in a straightforward way.
(In reply to comment #11) > > Conceptually, we're facing a similar limitation as also for the resource leak > > warnings: the analysis knows nothing about instances. We have no means to see > > that x.field and x.field (sic) are the same field, because we don't know if > > both references 'x' refer to the same instance. > > If we don't see an intervening assignment to x or a method call, why > won't we assume x.field is same x.field ? Currently, the analysis doesn't have a notion of "no intervening assignment to x between to expressions". That's requires an entirely new analysis. Secondly, I'm afraid we might end up with a set of pre-conditions for this kind of analysis that will never be met by the majority of code. > (ignoring the case that x > itself could be y.x and could be modified in another thread) Hm, do you see any reasons why this ostrich is fit for living? :) Looking deeper, already for 'this.f' references and even in single thread scenarios we're on unsafe grounds: if (this.f == null) { y.f = z; this.f.op(); // looks like NPE but if this == y and z != null its not. } Do you want to extend this to x.f and y.f, or should we invalidate all field information on every field assignment? (Note that the analysis doesn't see the name "f" to draw a connection between x.f and y.f). > [...] > No, I am not suggesting that we address the aliasing issue. I agree, but then any analysis that follows any chains of references will be *very* vague (should I say: guess work?) Locals are the only variables owned by the current method. Including 'this' based references is a compromise that already has some weakness. I wouldn't expect any useful results from looking deeper into the object graph. > > [...] Well, per method is not strictly true: the method must also > > "inherit" all id-assignments from any enclosing scopes. > > Do you have a short code snippet example that shows where this is > required ? Given that a method should not not carry over any initialization > status from other methods, it looks like you are talking about some > initialization block related nullness info may be ? I am not clear. > So a code example and a reference to JDT/Core class that handle this > part in the current implementation would help. As for a code reference: please see any location where a flowInfo is passed from a type into its methods (carrying information from initializers), from a method into local types (carrying information also about local variables accessible to the local class) and from there to methods of the local class and so on. > Yes, I was thinking of a hash table, but interposing some cache lines > between an interested party and the table would/should address performance > concern by taking advantage of locality of references. The most efficient solution I could see is: Continue storing ids in variable bindinds, but strictly separate analysis ids from code gen ids. Initialize all ids to -1, to enable on-demand assignment of a fresh id for the current context. Maintain a list of all variables for which the current method has assigned an id. At the end of analyzing a method reset all ids of all referenced variables to -1. That should actually perform better then any hash table approach, no? Still I see two issues: id assignment happens more frequently (per method) than now (per type), if anything in this management is expensive, we'll pay. Secondly, it will break the assumption that we have a clear boarder line between fields and locals, which is used in some flow info operations that split field related info from local related info. Solvable, for sure, by introducing yet another mask in flowInfo. > > Since I wasn't "here" during the initial implementation I can only try to > > interpret the rationale behind this implementation and try to play along those > > lines. > > > > Given all the complexity of following values across assignments (on some > > branches) and aliasing and concurrency, I doubt that possible improvements in > > this direction will ever stand in a good proportion to the efforts. > > I think we should explore the id assignment via root references > approach. I think ~90% of implementation (ok, a bit handwaving there) > is good and can readily be carried out to the alternate approach. Re-writing 10% of the implementation sounds like a huge project to me. My main point is: given the inherent weakness of any analysis into the object graph I don't see the efforts balanced by correspondingly large benefits.
(In reply to comment #13) > (In reply to comment #11) > > > Conceptually, we're facing a similar limitation as also for the resource > > If we don't see an intervening assignment to x or a method call, why > > won't we assume x.field is same x.field ? > > Currently, the analysis doesn't have a notion of "no intervening assignment to > x between to expressions". That's requires an entirely new analysis. Not so. We already track every single assignment and update null info. Otherwise the flow analysis won't work at all whether it be for locals or for fields of the current object. I am not clear why you think entirely new analysis is needed. Also see below. > > (ignoring the case that x > > itself could be y.x and could be modified in another thread) > > Hm, do you see any reasons why this ostrich is fit for living? :) I was merely calling out certain inherent assumptions in the current approach, which will also carry over to any alternate design. > Looking deeper, already for 'this.f' references and even in single thread > scenarios we're on unsafe grounds: > if (this.f == null) { > y.f = z; > this.f.op(); // looks like NPE but if this == y and z != null its not. > } > Do you want to extend this to x.f and y.f, or should we invalidate all field > information on every field assignment? (Note that the analysis doesn't see the > name "f" to draw a connection between x.f and y.f). This is the same issue pointed out in comment#8, No I don't want to (a) invalidate field information on every field assignment (b) Nor do I want to come up with any alias analysis framework. As comment#8 says, we should simply document this away. > > [...] > > No, I am not suggesting that we address the aliasing issue. > > I agree, but then any analysis that follows any chains of references will be > *very* vague (should I say: guess work?) No, not at all. The static analysis frame work has two inherent limitations: (a) it is unaware of aliases (b) it is unaware of what other threads could be doing. This is equally true of the current design and of the alternate design I am suggesting. In both cases and for both designs the suggested remedy is the same: document the limitation and also be as conservative as possible in diagnostics. Over and above that, I don't see any vagueness or guesswork here. So if we see a x.y.z = null; // some code x.y.z.toString() AND if the block `// some code' does not contain any method calls AND does not have any assignment to x or to x.y or to x.y.z then it is not vague or guesswork to issue a potential NPE at the toString call. Sure, there could be aliases and other threads that make the warning wrong, but it is reasonable to warn of a potential NPE. > Locals are the only variables owned by the current method. Including 'this' > based references is a compromise that already has some weakness. I wouldn't > expect any useful results from looking deeper into the object graph. Why ? If you would get two NPE warnings in public class X { Object field; void goo(X x) { if (this.field == null) { this.field.toString(); // potential NPE warning here. } if (x.field == null) { x.field.toString(); // Potential NPE warning here. } } } why is not useful ? On the contrary I would argue that by imposing the current limitation around analyzing only the receiver's fields we have seriously impaired the usefulness of this feature. > That should actually perform better then any hash table approach, no? Perhaps, I am sure these implementation details could be worked out. > Still I see two issues: id assignment happens more frequently (per method) than > now (per type), if anything in this management is expensive, we'll pay. > Secondly, it will break the assumption that we have a clear boarder line > between fields and locals, which is used in some flow info operations that > split field related info from local related info. I don't think this assumption needs to be broken at all. It is an implementation issue that could be worked on. > Re-writing 10% of the implementation sounds like a huge project to me. [...] > My main point is: given the inherent weakness of any analysis into the object > graph I don't see the efforts balanced by correspondingly large benefits. My main concern is around the surprise factor: saying we support null analysis of fields is way more general and taller claim than what we actually do which is to track the fields of a single object per method out of the multitudes of objects out there. Let us continue the discussion. I think it can be addressed and the 10% is the first 10% and not the last 10% (you know the joke about how the first 90% of the project takes 90% of the time and the last 10% also takes 90% of the time :))
Let me break this down in smaller pieces :) (In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #11) > > > > Conceptually, we're facing a similar limitation as also for the resource > > > > If we don't see an intervening assignment to x or a method call, why > > > won't we assume x.field is same x.field ? > > > > Currently, the analysis doesn't have a notion of "no intervening assignment to > > x between to expressions". That's requires an entirely new analysis. > > Not so. We already track every single assignment and update null > info. Otherwise the flow analysis won't work at all whether it be > for locals or for fields of the current object. I am not clear why you > think entirely new analysis is needed. Also see below. Example: (1) X x1 = new X(true), x2 = new X(false); (2) if (x1.field != null) { (3) x1 = x2; (4) System.out.println(x1.field); (5) } From the POV of both our analyses line 3 is a nop because it neither changes the assignedness nor the null status of x1. The flowInfo will remain unchanged across this assignment. However, line 3 breaks the analysis for line 4, because there is one relevant piece of information missing: *which* X instance is bound to x1 and is it the same instance in lines 2 and 4? To make things worse: (1) X x1 = new X(true), x2 = new X(false); (2) if (x1.field == null) return; (3) if (flag1) (4) x1 = x2; (5) System.out.println(x1.field); Now line 5 needs to account for the fact that x1 may *perhaps* still be the same instance. And for all these you'll need new points of reference ("since xyz"). Assignedness analysis checks whether a variable has *ever* been assigned. For indirect field references we need to multiple this for every potential xzy in "since xyz", to answer questions like: has x1 been assigned since line 2? Two fundamental points missing: - no concept of instances - analysis accumulates absolute state, cannot make statements about relation between the states in two different locations. Do you have a magic wand up your sleeve? I don't.
(In reply to comment #15) > Example: > (1) X x1 = new X(true), x2 = new X(false); > (2) if (x1.field != null) { > (3) x1 = x2; > (4) System.out.println(x1.field); > (5) } > From the POV of both our analyses line 3 is a nop because it neither changes > the assignedness nor the null status of x1. The flowInfo will remain unchanged > across this assignment. It should NOT remain unchanged across this assignment. This is the same observation I made earlier in comment#14, to excerpt from there: //------ begin excerpt So if we see a x.y.z = null; // some code x.y.z.toString() AND if the block `// some code' does not contain any method calls AND does not have any assignment to x or to x.y or to x.y.z then it is not vague or guesswork to issue a potential NPE at the toString call. //-- end excerpt But where we do see an intervening assignment as in your current example, the flow info should not remain unchanged (and we should not issue a potential NPE.) You are looking at what code is doing *today* to come to your conclusion i.e per the code today, the flowinfo will remain unchanged and so we will issue a wrong warning. This is true, but the code once null support for fields is in place should adapt and not merely continue to do what it does today. To wit, at 3.8 M4 time, for the purposes of null analysis a reference object is simply an opaque handle/token. Support for null analysis for fields means a reference object cannot remain opaque and we need to crack open the shell, peer inside and track the state of affairs inside. This calls for new rules to be defined and new behavior to be implemented for how we initialize, propagate, merge, reset and update nullness information across assignments and control flow constructs. For the assignment you cite i.e x1 = x2; the right behavior is to (a) toss out everything we know about the nullness of x1 *and* all its tracked (id'd) member references (transitively) and carry over the information from RHS i.e if RHS happens to be a tracked (id'd) reference, carry over the nullness of all its tracked (id'd) member references (transitively)(making allowance for upcasting) New code ? yes, new behavior ? yes, entirely new analysis ? No. > To make things worse: [...] > Now line 5 needs to account for the fact that x1 may *perhaps* still be the > same instance. This case is already covered by the merger rules for flow info for control flow constructs. > And for all these you'll need new points of reference ("since xyz"). > Assignedness analysis checks whether a variable has *ever* been assigned. For > indirect field references we need to multiple this for every potential xzy in > "since xyz", to answer questions like: has x1 been assigned since line 2? I don't understand this point well enough to say whether this is already covered by the explanation above. Perhaps if could clarify, we can discuss further. > Two fundamental points missing: > - no concept of instances This is covered. > - analysis accumulates absolute state, cannot make statements about relation > between the states in two different locations. Again this point is not clear to me to say whether it is addressed by the explanation. > Do you have a magic wand up your sleeve? I don't. I don't have a magic wand, but nothing we have discussed so far seems to demand any magic or any extensive new analysis. I see two valid concerns articulated so far: - Doing this means more work, unplanned for so far: true - State tracking via root references and their member references and the new rules for propagation may have performance implications: again true, but I think it can be managed such that there is zero penalty when the option if off and only within reason minimal penalty when it is on. Looking at the broader null analysis topic, I see a few impediments to its wide adoption (given a reasonably bug free implementation) - Perceived or real high set up/initial investment costs. - Overly aggressive analysis resulting in numerous spurious false positive diagnostics. - Under aggressive self limited analysis that misses reporting obvious cases. IMHO, the subject at hand triggers the third impediment. See that even for the current object, we track only `this.field` and not `this.field.field`. Working within the same set of caveats stipulated and true for current work viz: - We don't know anything about aliases - We don't know anything about threads - Absence of warning does not imply absence of NPE. I think we can significantly raise the bar here.
Just so there is clarity on this point: I do view the current solution to bug 247564 as being correct and internally consistent. It is not broken in any sense of the term. For what I feel is a lost opportunity to have delivered a more general solution, first and foremost and for the most part I hold myself responsible :) The earliest implementation was posted as a patch eons ago on 2010-12-20. I had marked myself as a secondary reviewer on 2011-12-14, but only got to start the review on 2012-01-24 :( My excuse is being overworked and not having any time to spare. Shoulda woulda coulda asked for a design review checkpoint of course. Anyways the point has already been that 90% of the implementation would be common between the current solution and a more general one if we choose to pursue that. Let us continue the discussion to see how best we could handle this.
(In reply to comment #16) > I think we can significantly raise the bar here. That's why I'm working on null annotations :) (longer answer to follow later)
(13) We seem to have not followed up on the suggestion from https://bugs.eclipse.org/bugs/show_bug.cgi?id=247564#c2 for making stronger warnings in the case of nonvolatile fields. public class X { Object val; void foo() { if (val == null) { int otherVal; for (int i=0; i<1000000; i++) otherVal = 1234 / i * 4234; if (val.hashCode() == 0) // Why should this be a potential NPE ? System.out.println(""); } } } i.e there are no intervening assignment and also no method call. I think this feature will be more useful if we start with stronger assertions for non volatile fields and downgrade them to potential warnings upon encountering a function call (or where we see an assignment by suitably altering the nullness status) This should also improve Ayush's situation described in https://bugs.eclipse.org/bugs/show_bug.cgi?id=247564#c68, which is going to be all too common as OOP style encourage numerous method calls.
(In reply to comment #18) > (In reply to comment #16) > > I think we can significantly raise the bar here. > > That's why I'm working on null annotations :) That is very welcome Stephan. Null annotations support calls for a dedicated user who is willing to invest a certain amount of time and energy to painstakingly annotate elements and/or start with a broad brush in the form of defaults and work through all the tweaks that would be called for. Given the initial investments and ongoing discipline and rigor, the motivated user could expect to reap handsome dividends that could be more than commensurate. Contrariwise, bug 247564 is about flipping on a switch and spotting a bunch of bugs and therefore embodies an altogether different kind of user experience. The situation described by Ayush in bug# 247564 comment# 68 may well prove to be typical, but it doesn't have to be: I see three possible ways to improve the state of affairs: -- Widen the scope of analysis to include "alien" object references -- Experiment with starting with stronger assertions for non-volatile fields and diluting them only by a notch when needed. -- Per https://bugs.eclipse.org/bugs/show_bug.cgi?id=247564#c67 consider providing a way for users to annotate their methods as being "pure" or "side effect free". To have to toss out what we gleaned about the nullness of a field on account of a getter method being called is insane, but we cannot not do it unless we have some guarantees. Of course this last part is no longer about flipping a switch. Now that these same points have been made and remade ad nauseam :), I promise to limit future discussions to any follow ups on the technical merits of the alternate proposal either from a purely academic p.o.v or as candidate proposal for implementation at some point in future. For completeness it should be mentioned that the earliest patch posted on 2010-12-20 as called out in comment# 17, simply carried forward from where the exercise in archeology documented in https://bugs.eclipse.org/bugs/show_bug.cgi?id=247564#c3, found things to be. OIOW, the code base had rudiments of the current design even before work on bug 247564 started and so it may have appeared "natural" to pick up from there and run with it.
From an academic pov I agree that we can work out a possible solution, however hard it may be. Though practically I agree more with Stephan. I do not yet see a clear cut solution yet to problems that Stephan has pointed out already. I can imagine moving the type-specific bit stream to be object-specific, but that will take too much memory. The "root reference" scheme looks theoretically straightforward, but will not be so when we actually sit and implement it. Even if we do, assuming that assignments and method calls render the null info doubtful will make sure that we have more new "pot. NPE" warnings than definite NPE ones. Consider the following points - Bug 247564 was supposed to be a very basic (and not aggressive) extension of null analysis to fields and as null annotations work took off, it was supposed to create the underlying infrastructure to support annotations for fields, which it has successfully done. - Fields accessed directly via an object is not a good OOP practice. I would imagine most accesses to take place via getters, and most modifications to take place via helper methods. So, I don't see a sweeping advantage of including alien object references. - As said above, the no. of new pot. NPE warnings will far exceed that of new definite NPE warnings. IMHO, people even now, apart from power users, are not too inclined to turn on the "pot. NPE" warnings, unless they use null annotations. And if they use null annotations anyway, our effort here won't be too useful. - This effort will take more than 1 milestone atleast. Given that we're already into the later half of 3.8, I would give null annotations higher priority than this overhaul of null analysis, because I see real enthusiasm about null annotations out there. The excitement in the community about the fact that they can just use findbugs annotations and yet use JDT for analysis far exceeds the excitement they have for this feature. Ofcourse, I'm ok to get back to this in 3.9 timeframe.
(In reply to comment #21) > From an academic pov I agree that we can work out a possible solution, however > hard it may be. Though practically I agree more with Stephan. I do not yet see > a clear cut solution yet to problems that Stephan has pointed out already. It is my understanding that every technical objection raised so far has been addressed and/or explained away. I am not aware of any open issue other than (a) aliases (b) threads (c) absence of warning does not mean absence of NPE - all of which are true now too. > I > can imagine moving the type-specific bit stream to be object-specific, but that > will take too much memory. We will not track a single reference more than necessary, as opposed to the present situation where we track all fields of `this` irrespective of whether they are used or not. The memory requirement thus will be the intrinsic minimum required. >The "root reference" scheme looks theoretically > straightforward, but will not be so when we actually sit and implement it. That is a rather sweeping statement :) without suitable backing arguments at the moment. > - Fields accessed directly via an object is not a good OOP practice. True, but we can't just design tools for good programming practices. In fact it is `the not so good yet prevalent programming practices` that lets one down and call for better tool support. > - This effort will take more than 1 milestone atleast. Given that we're already > into the later half of 3.8, I would give null annotations higher priority than > this overhaul of null analysis, because I see real enthusiasm about null > annotations out there. The excitement in the community about the fact that they > can just use findbugs annotations and yet use JDT for analysis far exceeds the > excitement they have for this feature. Ofcourse, I'm ok to get back to this in > 3.9 timeframe. I am fine with this being spawned off as a separate bug and investigated in 3.9 time frame. Thanks.
(In reply to comment #21) > - Bug 247564 was supposed to be a very basic (and not aggressive) extension of > null analysis to fields and as null annotations work took off, it was supposed > to create the underlying infrastructure to support annotations for fields, > which it has successfully done. OK, my primary worry about is JDT appearing to promise more than it can deliver. Right now the label on the UI check box reads: "Include fields in null analysis". This really should have been "Include fields of `this` in null analysis" or "include fields of current object in null analysis" or some variant thereof. I don't know if it makes sense to change it now in the light of the imminent null annotations support. Stephan, what do you think ? > - Fields accessed directly via an object is not a good OOP practice. I would > imagine most accesses to take place via getters, and most modifications to take > place via helper methods. So, I don't see a sweeping advantage of including > alien object references. This ignores collaboration among classes in the same package to implement some functionality via default access. > - As said above, the no. of new pot. NPE warnings will far exceed that of new > definite NPE warnings. There is no evidence to show that signal to noise ratio would be any different. That being said, I agree that users may not always view things that way: the denominator in and of itself counts too. So this point is has some validity, but automatic application of this premise would scale down the scope and extent of what we do in any situation. (In reply to comment #22) > (In reply to comment #21) > > I > > can imagine moving the type-specific bit stream to be object-specific, but that > > will take too much memory. > > We will not track a single reference more than necessary, as opposed to the > present situation where we track all fields of `this` irrespective of whether > they are used or not. The memory requirement thus will be the intrinsic > minimum required. Also most of the state tracked will be on per method body basis and can be disposed of immediately after analysis of each method.
I scanned through the review comments and here is a way to take it forward: - Retain the current bug for (1), (2), (3), (4), (5) (7) - documentation clarification part (8) (9) (10) (11) (12) We will target 3.8 M6 for these. Some of these may not involve a code change. - Open a fresh follow up bug for (6) with target set to 3.8 M6. - Open separate follow up bugs for (13) and (7) (code change part see test case in comment# 7), with references to here. Set white board to "3.9 candidate" - Follow ups in respective bugs.
(In reply to comment #24) > - Open a fresh follow up bug for (6) with target set to 3.8 M6. bug 370182. > - Open separate follow up bugs for (13) and (7) (code change part > see test case in comment# 7), with references to here. > Set white board to "3.9 candidate" For 7 - bug 370185 for 13 - bug 370183
(14) Found a small bug while testing further public class X extends Y { static final Object a = null; public void main(String[] args) { int i = 0; loop: while (true) { if (i == 10) break loop; i++; } a.toString(); // see here } } Here, the warning we get is a pot. NPE and not an NPE as should be expected. This is because in LaballedStatement.analyseCode(), we add the initializations from main flow info into the merged info after wiping out the field's null info. This means that in org.eclipse.jdt.internal.compiler.flow.FlowInfo.unconditionalFieldLessCopy(), we should again read the constantFieldsMask and prevent wiping out null info of constant fields. Phew.
(In reply to comment #26) > (14) Found a small bug while testing further The problem may actually be in org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.addPotentialNullInfoFrom(UnconditionalFlowInfo), where currently the definite null info for constant fields also gets converted to a weaker info.
(In reply to comment #5) > (11) I think this program produces very confusing messages: > public class X { > static final Object o = null; > void foo() { > if (o.toString() == "") {} > if (o == null) {} > if (o != null) {} > } > } I'm not sure about this. The current behaviour even for local variables is similar. Why is either more/less confusing than the other? See public class X { void foo() { Object o = null; if (o.toString() == "") {} if (o == null) {} if (o != null) {} } } And this is true as well - o can never actually be null at the second if statement. Confusing - maybe at first glance. Correct - Always!
(In reply to comment #28) > (In reply to comment #5) > > (11) I think this program produces very confusing messages: > > public class X { > > static final Object o = null; > > void foo() { > > if (o.toString() == "") {} > > if (o == null) {} > > if (o != null) {} > > } > > } > > I'm not sure about this. The current behaviour even for local variables is > similar. Why is either more/less confusing than the other? I agree. Both are equally confusing :) > See > public class X { > void foo() { > Object o = null; > if (o.toString() == "") {} > if (o == null) {} > if (o != null) {} > } > } > > And this is true as well - o can never actually be null at the second if > statement. Not so. The second if statement is virtual world which can't be stepped into. So making an assertion "o can never actually be null" about a final variable which is patently initialized to null in the real world is questionable. > Confusing - maybe at first glance. Correct - Always! Akin to the Schrödinger's cat being neither alive nor dead, this message is neither correct nor incorrect. Which is why it is confusing. I am fine with this particular item being closed without any change. I agree a user will quickly figure out the rationale and the initial confusion will fade away.
(In reply to comment #29) > I am fine with this particular item being closed without any change. I > agree a user will quickly figure out the rationale and the initial > confusion will fade away. Ok, I'll leave it as it is for now. (15) Yet another case we haven;t handled yet package p; public class Snippet { String test; public void foo(String[] args) { test = null; test += "... some text"; test.toString(); // Incorrect pot. NPE warning } } The fix should be trival, in CompoundAssignment.
Created attachment 210441 [details] proposed fix v1.0 + regression tests Fixed all points relevant to this bug except (10) We might pay more in terms of performance by determining which interface we've already counted than we're paying in terms on memory now, so leaving this alone. (11) This is consistent with the current story for locals, so leaving this as is (12) This actually works as expected. p.toString() called in the first line makes the status of p as non null for later statements. Some other points - (9) although this will not have caused any visible side effects, since field id's are guaranteed to be unique for every type enclosed in the main type, it may cause extra memory allotted for an inner type to persist outside as well. So, changed to a deep copy. (14) This is fixed by changing the unconditionalFieldLessCopy(..) method to not discard the field info as it was doing, but to instead call resetNullInfoForFields(..), which now knows how to deal with fields. Srikanth, although everything else is trivial, I'd like your +1 for the fix for (14). All tests pass
(In reply to comment #0) > (5) resetNullInfoForFields should be done not only for > MessageSends but also for other code artifacts executions. > For example, in the following, there is an incorrect warning: With the latest patch, I still see a warning in: public class X { static Object lastObject; static X x; public static void main(String [] args) { if (lastObject == null) { x.new Inner(); lastObject.toString(); } } class Inner { Inner (){ lastObject = this; } } } I think QAE needs fixing too.
(In reply to comment #31) > Created attachment 210441 [details] > proposed fix v1.0 + regression tests Overall changes look good. > Srikanth, although everything else is trivial, I'd like your +1 for the fix for > (14). If you need a scrutiny of the changes in UnconditionalFlowInfo.java and LoopingFlowContext.java, Stephan would be a better person. So, the only issues yet to be addressed are : (7) - documentation part. The disclaimers should be documented. and (In reply to comment #23) > Right now the label on the UI check box reads: "Include fields > in null analysis". This really should have been "Include fields of `this` > in null analysis" or "include fields of current object in null analysis" > or some variant thereof. > > I don't know if it makes sense to change it now in the light of the > imminent null annotations support. Stephan, what do you think ? Once we hear from Stephan on this, we will know whether the UI label should change or not.
(In reply to comment #33) > (7) - documentation part. The disclaimers should be documented. I'd like to wait till bug 331649 to change the doc. We can then comprehensively state what works and what not. I will file a separate bug for this. > Once we hear from Stephan on this, we will know whether the UI label > should change or not. AFAICS in bug 331649, we do report warning for alien objects' fields too. So the UI looks ok.
(In reply to comment #34) > (In reply to comment #33) > > (7) - documentation part. The disclaimers should be documented. > I'd like to wait till bug 331649 to change the doc. We can then comprehensively > state what works and what not. I will file a separate bug for this. Bug 370552
(In reply to comment #34) > (In reply to comment #33) > > (7) - documentation part. The disclaimers should be documented. > I'd like to wait till bug 331649 to change the doc. We can then comprehensively > state what works and what not. I will file a separate bug for this. > > > Once we hear from Stephan on this, we will know whether the UI label > > should change or not. > AFAICS in bug 331649, we do report warning for alien objects' fields too. So > the UI looks ok. Indeed, annotated fields are considered regardless their owner. I'll give some more explanation on that in bug 331649.
(In reply to comment #36) > Indeed, annotated fields are considered regardless their owner. Stephan, can you also quickly check the fix for point (14) as mentioned in comment 31? Thanks!
As we have reopened https://bugs.eclipse.org/bugs/show_bug.cgi?id=247564 and backed out the implementation from there, the current bug is not relevant anymore.
Verified for 3.8M6