Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 328358

Summary: [DOM] The DOM bindings could be resolved lazily
Product: [Eclipse Project] JDT Reporter: Satyam Kandula <satyam.kandula>
Component: CoreAssignee: JDT-Core-Inbox <jdt-core-inbox>
Status: RESOLVED WONTFIX QA Contact:
Severity: normal    
Priority: P3 CC: amj87.iitr, daniel_megert, eclipse, markus.kell.r, Olivier_Thomann, srikanth_sankaran
Version: 3.6   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 301894    
Attachments:
Description Flags
A patch
none
Another patch
none
Another patch
none
Proposed patch
none
Proposed fix
none
Revised patch
none
Revised patch none

Description Satyam Kandula CLA 2010-10-21 10:32:05 EDT
Whenever a DOM/AST client requests ASTs with bindings, JDT/Core resolves all the bindings and keep them in memory. This occupy lots of memory. Hence, it should be lazily resolved. 

This is actually a part of bug 301894.
Comment 1 Satyam Kandula CLA 2010-10-21 10:55:26 EDT
Created attachment 181403 [details]
A patch

I plan to release a fix for 3.7M3. This is not the exact patch I will like to release. 

These are some TODO's that I will put in the patch
1. One test still fails (actually it passes but I see some exception in the console). I will fix this. 
2. SourceTypeBinding#isTypeSame() needs to be implemented completely.
3. This code gets into action based on a flag. As of now, I plan to turn this flag on only if IGNORE_METHOD_BODIES is used. As I know APT is the main client which uses this flag. I am sure that bindings will fail for method bodies. However, surprisingly even JDT/Core tests doesn't catch them. 
4. Add some tests -- I will not be able to be comprehensive but will add some. 

What are your comments?
Comment 2 Olivier Thomann CLA 2010-10-21 11:27:21 EDT
This should target M4. This area is really sensitive and making such a big change a week before M3 is too late.
Comment 3 Satyam Kandula CLA 2010-10-21 11:30:55 EDT
(In reply to comment #2)
> This should target M4. This area is really sensitive and making such a big
> change a week before M3 is too late.
I am fine with this. At the same time, I was wondering if we can put this under some JVM System variable, so that some APT clients can try out. What do you think?
Comment 4 Satyam Kandula CLA 2010-10-30 13:03:14 EDT
Created attachment 182103 [details]
Another patch

A better patch and yes there are still some more things left. Added a test in ASTConverterBindingsTest() to compare the bindings between the lazy resolution and the normal mode. They are still some failures but it is better.
Comment 5 Olivier Thomann CLA 2010-11-09 11:54:27 EST
(In reply to comment #3)
> I am fine with this. At the same time, I was wondering if we can put this under
> some JVM System variable, so that some APT clients can try out. What do you
> think?
As a temporary option, yes this is doable if this helps to get this fixed.
Comment 6 Satyam Kandula CLA 2010-11-15 11:30:31 EST
Created attachment 183137 [details]
Another patch

Improved patch - The newly added test still fails for some types. Other JDT/Core and JDT/UI tests run good. Some more cleanup pending..
Comment 7 Olivier Thomann CLA 2010-11-16 11:39:32 EST
(In reply to comment #6)
> Improved patch - The newly added test still fails for some types. Other
> JDT/Core and JDT/UI tests run good. Some more cleanup pending..
I don't see why org.eclipse.jdt.core.dom.AnnotationBinding is now public.

Beside this the patch is pretty big. We still don't have a complete separation between compiler bindings and dom bindings.

I was hoping that we could completely separate both by using the compiler'S binding key as the link from DOM bindings to compiler bindings. The binding environment would be used to create compiler bindings using the binding key.

Maybe the actual patch improves things well enough so that we don't need to go further for now.

When all remaining issues have been fixed with the current patch, it would be interesting to check how much this improves the memory usage for the APT case and also check how much slower this would be vs the actual implementation.

Please provide numbers before and after the patch in the bug report itself.

Thanks.
Comment 8 Satyam Kandula CLA 2010-11-16 12:21:06 EST
(In reply to comment #7)
> I don't see why org.eclipse.jdt.core.dom.AnnotationBinding is now public.
This is no longer required. As I kept the newly added files in a separate folder, I had to do it earlier but yes it is not longer necessary. 

> Beside this the patch is pretty big. We still don't have a complete separation
> between compiler bindings and dom bindings.
I actually planned or thought this as part of phase 3 as mentioned in bug 301894 comment 11. However, to fix the issue with ITypeBinding#isAssignmentCompatible(), I started to keep around the compiler bindings around. The Variable bindings are not be cached though and yes they could be seperated. I was mostly focusing on getting the compiler AST nodes off memory.

> 
> I was hoping that we could completely separate both by using the compiler'S
> binding key as the link from DOM bindings to compiler bindings. The binding
> environment would be used to create compiler bindings using the binding key.

I probably don't understand what you mean by binding environment. I thought it is a collection of already processed compiler bindings. 

> Maybe the actual patch improves things well enough so that we don't need to go
> further for now.
> When all remaining issues have been fixed with the current patch, it would be
> interesting to check how much this improves the memory usage for the APT case
> and also check how much slower this would be vs the actual implementation.

> Please provide numbers before and after the patch in the bug report itself.
Sure I will get you the numbers 

> Thanks.
Comment 9 Olivier Thomann CLA 2010-11-19 17:15:34 EST
(In reply to comment #8)
> I probably don't understand what you mean by binding environment. I thought it
> is a collection of already processed compiler bindings.
What I meant by binding environment is a way to create compiler bindings on demand. So if the binding was already created in that environment, it would be returned, but if the binding is not available, it is created, cached and then returned.
The binding environment can be flushed to release memory.

Doing so, it gives a good control to the user over the memory used.
Comment 10 Satyam Kandula CLA 2010-11-29 08:25:12 EST
Created attachment 184034 [details]
Proposed patch

I have not included the tests in this patch. I am cleaning up the tests. Once done, I will add them too. 

The main change is to avoid keeping the DOM-Compiler nodes in the memory and then not resolving them when they are created. All the other changes are essentially to make things work even otherwise. 

And here are the other changes...
Whenever the mappings are required, the file is parsed to find the corresponding compiler node at the given DOM location. I have used the visitor to do this. The parsed and resolved file is kept in the cache. As of now, the size of the cache is only one. 
As there were occasions when the compiler type bindings are being compared using ==, the LookupEnvironment is being reused while reparsing the file. This required making sure that the typebindings are not created again but the other bindings are. 

We do want to return the same DOM binding object whenever requested for a binding of a DOM node. To retain this behavior, we have to keep this bindings in a table. We used to use store this bindings w.r.t compiler bindings and binding keys. However, primarily we used to use the compiler bindings as the key. In this changes, I have refactored the code to look at the key mappings first when lazy bindings are used.

CompilationUnit#findDeclaringNode(binding) returns the declaring node for the given bindings. When the resolution used to happen during the creation of the DOM nodes, the mappings from binding to DOM nodes were also kept in the table. However, in this approach, whenever the bindings are created, we may not have the DOM nodes. Hence, I had to add some code based on combination of BindingKeyParser and DOM/Visitor for getting the appropriate DOM node for a given binding. The tables are still searched for before this code is used. 

I have done some decent testing for all this. However the tests are not yet in a shape to be added to the suite. I will clean them up and add it. 

Please look at the patch and give me your comments.
Comment 11 Olivier Thomann CLA 2010-11-29 14:21:54 EST
A few comment:
- I changed some indentation issues. I might have missed a few.
- I would like to bits used in TagBits to be as low as possible. I think bits 21 and 22 are free for types. If this is true, they should be used for the values of:
long SourceLevelMoreThan15 = ASTNode.Bit54L;
long RebuildBindings = ASTNode.Bit55L;

Patch looks good even if I might not go as far as I thought. DOM type bindings still point to compiler type bindings.

Since this is really a big patch, I would like more than one week of testing before a milestone. So I am setting M5 as the milestone for this bug, but I would like the code to be released as soon as possible after M4 is declared.

Srikanth, please go through all the changes with Satyam. We need a closer second look on this one as this is quite big. Note all changes in this patch seems to be related to this issue.
I am referring to:
- changes in org.eclipse.jdt.internal.compiler.ast.Javadoc
- changes in org.eclipse.jdt.core.dom.Javadoc (this clearly frees some memory, but it is not directly related to the bindings being lazily initialized)

I am fine to keep these changes in this patch as they clearly refer to the improvements for memory consumption during APT processing.

I'll post a new patch that is only fixing some indentation.
Comment 12 Olivier Thomann CLA 2010-11-29 14:22:19 EST
Created attachment 184068 [details]
Proposed fix
Comment 13 Srikanth Sankaran CLA 2010-11-29 22:37:25 EST
(In reply to comment #11)

[...]

> Since this is really a big patch, I would like more than one week of testing
> before a milestone. So I am setting M5 as the milestone for this bug, but I
> would like the code to be released as soon as possible after M4 is declared.
> 
> Srikanth, please go through all the changes with Satyam. We need a closer
> second look on this one as this is quite big. 

Yes, As soon as I have https://bugs.eclipse.org/bugs/show_bug.cgi?id=322817
reasonably cooked and ready to be put in the back burner, I'll work with
Satyam to get this reviewed. I agree with the proposed time frame for release
and will facilitate from my end -- Thanks.
Comment 14 Srikanth Sankaran CLA 2010-12-20 10:17:57 EST
Created attachment 185549 [details]
Revised patch

I have completed background study/research and can look at
this patch now. The revised patch contains an initial set of
clean ups. These changes are not tested - Satyam, study these
and holler if you don't agree with any of the changes. 

    - Renamed org.eclipse.jdt.internal.compiler.Compiler.bindingsMayPresent
to be bindingsMayExist instead.
    - Similarly changed all other fields, parameters, locals in all the places
this value gets tunnelled to.
    - In the interest of minimizing changes I have changed calls to LookupEnvironment.buildTypeBindings(CompilationUnitDeclaration,AccessRestriction, false) to LookupEnvironment.buildTypeBindings(CompilationUnitDeclaration, AccessRestriction)
    - I got rid of the two methods org.eclipse.jdt.internal.compiler.ast.CompilationUnitDeclaration.cleanUpAllMethodBodies() and org.eclipse.jdt.internal.compiler.ast.CompilationUnitDeclaration.cleanUpAllMethodBodies(TypeDeclaration).
       By minimally modifying the existing cleanUp methods, I could achieve
the same thing.
    - Fixed a bunch of indentation issues. Cleaned up several noise diffs
(empty lines added, deleted, nothing but white space changes in a line etc)

Two questions:

    - I see several changes in SourceTypeBinding where a line of the form

	boolean isSource15 = this.scope.compilerOptions().sourceLevel >= ClassFileConstants.JDK1_5;

is changed to

     	boolean isSource15 = ((this.tagBits & TagBits.SourceLevelMoreThan15) != 0);

Are these changes integral to the fix ? If not I would get rid of them.

    - I also think the tag bit is misnamed. As you can see in the snippet
below, it is confusing to say more than 15 and check for >= JDK1_5 

    if (this.scope.compilerOptions().sourceLevel >= ClassFileConstants.JDK1_5) {
		this.tagBits |= TagBits.SourceLevelMoreThan15;
    }

    - What is the need for this tag bit ? Is it that scope is nulled out
at some point and so the old method of consulting the source level is not
available any more ?

    - I think we should leave the existing constructor of ASTConverter alone
and introduce a new one as needed. The old one can be made to use the new
one. This way we can get rid of the diffs in AST.java & ASTParser.java

Still scratching the surface, will continue on this.
Comment 15 Satyam Kandula CLA 2010-12-21 01:33:48 EST
(In reply to comment #14)
Srikanth, Thanks for the revised patch and the comments. I will look at the changes in the code and get back to you. I agree with the changes as per the description. 

>     - What is the need for this tag bit ? Is it that scope is nulled out
> at some point and so the old method of consulting the source level is not
> available any more ?
Yes, this scopes are nulled out and hence this is needed and yes, the name should be better.
Comment 16 Srikanth Sankaran CLA 2010-12-21 22:13:19 EST
Created attachment 185687 [details]
Revised patch

(In reply to comment #14)

>     - I think we should leave the existing constructor of ASTConverter alone
> and introduce a new one as needed. The old one can be made to use the new
> one. This way we can get rid of the diffs in AST.java & ASTParser.java

This is taken care of in the latest patch.

Notwithstanding the developments documented in bug 301894 comment 16
I will continue to review this patch. I would prefer we release this
only when we have a full solution though so as to allow ourselves appropriate
room for backtracking and considering an alternate fix.
Comment 17 Srikanth Sankaran CLA 2010-12-23 06:48:27 EST
(In reply to comment #16)
> Created an attachment (id=185687) [details]
> Revised patch

I have gone through the changes and they look good as far
as I can tell (need to caveat about not being an expert in this
area though)

The earlier review comment around lowered numbered bits for the
new Tag bits is still not implemented.

The changes around getAnnotationInstance() method usage needs to
be clearly documented with examples - As it stands it is unclear
what these changes do.

I would be interested in the outcome of the discussion initiated
in bug 301894 comment 16. If the affected product uses only plugin
based processors and APT doesn't support batch processing for plugin
based processors, then perhaps a reasonable solution is to expose
a full build to APT as a collection of incremental builds. 

If such a targeted solution can be devised and can unblock the customer 
concerned and the solution is small, well contained, etc, it may be
a better approach than attempting to solve the problem in all its
generality using a relatively much more complex solution. That is
hindsight 20/20 talking of course.

Part of this sentiment admittedly comes from where we are in 3.7
and how far/long it has taken to come up with the improvement (
which despite being very significant fails to qualify as a "solution")  
which can be seen a measure of intrinsic complexity of the proposed
approach - The changes are so pervasive that to shake out all issues
and stabilize this may take a while - that does induce a bit of
nervousness.
Comment 18 Walter Harley CLA 2010-12-28 00:04:26 EST
(In reply to comment #17)
> I would be interested in the outcome of the discussion initiated
> in bug 301894 comment 16. If the affected product uses only plugin
> based processors and APT doesn't support batch processing for plugin
> based processors, then perhaps a reasonable solution is to expose
> a full build to APT as a collection of incremental builds. 

It would be better to disable APT (ie put up an error or warning saying that due to the size of the project there is not enough memory to run APT) than to present partial, which is to say incorrect, information via the APT API.

It's admittedly a subtle problem.  Many processors will not request information on anything other than the types that are annotated with the annotations they process; in these cases, presenting a series of incremental images will be fine.  Some processors will request information about types that are not annotated (e.g. all types in a given package), or about all types annotated with an annotation that they do not claim; these processors will do the wrong thing if presented with partial information.  As a trivial but horrid example, consider a processor whose job is to output a text file containing the names and members of all types contained in every package whose package-info.java file contains a particular annotation.  There is in general no way to satisfy these processors without parsing those types and saving at least the information needed to answer questions about members, annotations, and hierarchy.  There is also no way to tell one kind of processor from the other.

APT also does not support batch processing for Java 6 processors, regardless of whether or not they are wrapped in a plugin.

I think the challenge is to determine how to fail gracefully rather than to provide partial (thus false) information.  The nature of the APT API is that it does not scale to large projects.  We can work to push the boundary - as Satyam has been doing - but there will always be a boundary.  That said, I am not sure how; wrapping parsing inside a catch (OutOfMemoryError e) {} is not likely the answer.

As I understand it, Satyam's changes are about lazily evaluating the ASTs, but they do not do anything to dereference them when not needed.  Given that even most processors that try to get information on a large number of types move through the results serially, is there anything we can do to throw away the ASTs in between APT API calls, and re-create them on the next call if needed?  This would slow down performance but greatly expand the effective amount of memory in most realistic cases.  Perhaps it is too complex, though.
Comment 19 Satyam Kandula CLA 2011-01-03 00:36:05 EST
(In reply to comment #18)
> It would be better to disable APT (ie put up an error or warning saying that
> due to the size of the project there is not enough memory to run APT) than to
> present partial, which is to say incorrect, information via the APT API.

Are you telling that it is incorrect to give only a subset of files as files being processed? Yes, if the processors asks for all the details it could potentially run out of memory, but this solution should work with many processors. 

> APT also does not support batch processing for Java 6 processors, regardless > of whether or not they are wrapped in a plugin.
I do understand this. As of now, with Java 6 processors, the compiler calls APT with only 2000 files at once. This is what I am telling that we could do with Java 5 processors also if they are only file based processors. 

> As I understand it, Satyam's changes are about lazily evaluating the ASTs, but
> they do not do anything to dereference them when not needed.  Given that even
> most processors that try to get information on a large number of types move
> through the results serially, is there anything we can do to throw away the
> ASTs in between APT API calls, and re-create them on the next call if needed? 
> This would slow down performance but greatly expand the effective amount of
> memory in most realistic cases.  Perhaps it is too complex, though.
This should be possible if there are no references to the given AST -- and I am trying out this possibility!
Comment 20 Walter Harley CLA 2011-01-08 19:51:25 EST
(In reply to comment #19)
> Are you telling that it is incorrect to give only a subset of files as files
> being processed? 

Methods like AnnotationProcessingEnvironment.getTypesAnnotatedWith() are not well specified in Java 5.  The method is documented just to return "the types included in this round" - in batch compilation (javac) this means everything, but in an IDE it is ambiguous.  The "batch mode" flag on a processor is a workaround for this; file-mode processors (and plugin-based processors) by definition should be able to cope with incomplete sets.

The most important thing is that the type navigation methods - like TypeDeclaration.getSuperinterfaces() - must return complete information. Failing to get that right would result in unpredictable, and generally ugly, failures.  

For types that are recovered, it is preferable to be consistent about whether AST-level information is available - that is, it's bad to randomly show Java Model level versus AST level info from build to build.  For example, people expect Declaration.getSourcePosition() to return a non-null value whenever source code for the declaration exists.  There is probably a bit of leeway here because it is already a bit subtle.

Java 6 processors are more like well-behaved Java 5 processors: it is well documented that not all types will necessarily be recompiled, and therefore that they won't all be presented in a given round.  But in either case we need the typesystem navigation to be complete.
Comment 21 Olivier Thomann CLA 2011-01-19 15:37:31 EST
Since the current solution for APT memory issue doesn't touch the DOM bindings themselves. This is now out of the plate for 3.7.
The change would be too big to be released now for 3.7.
Comment 22 Eclipse Genie CLA 2019-02-28 05:01:44 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 23 Dani Megert CLA 2019-02-28 05:05:50 EST
No one is working on this. We can reopen again later if needed.