| Summary: | [compiler][null][implementation][external] Performance measurements regarding external annotations | | |
|---|---|---|---|
| Product: | [Eclipse Project] JDT | Reporter: | Stephan Herrmann <stephan.herrmann> |
| Component: | Core | Assignee: | Stephan Herrmann <stephan.herrmann> |
| Status: | VERIFIED FIXED | QA Contact: | |
| Severity: | enhancement | | |
| Priority: | P3 | CC: | jal, jarthana, lieven.lemiengre, manoj.palat, srikanth_sankaran, timo.kinnunen |
| Version: | 4.4 | Keywords: | helpwanted |
| Target Milestone: | 4.5 M6 | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 331651 | | |
Description
Stephan Herrmann
(In reply to comment #0)
> I'm slightly concerned about the difference between declaration annotations and
> type annotations: I'd like to base our decisions on data about type annotations
> (because those can be bigger in volume and are more complex to process), but I
> don't know if we can get a body of type annotations from any existing tool.
> Anybody?

I whipped up a quick project at https://github.com/Overruler/NullableExtract to read .class files and output the source code for any methods which have some @Nullable-like type annotations. I used it on the annotated JDK 8 library (jdk8.jar) which comes with the Checker Framework distribution. It generated 156 Java class files over 8 java.* packages which have at least one method with a @Nullable annotation. It's probably missing some annotations with generics and inner classes, but for proof-of-concept testing of the stub format (fully valid Java code, how nice!) and for other testing purposes ONLY it's something, at least.

The jdk8.jar (full path: \checker-framework-1.8.3\checker\dist\jdk8.jar) is licensed the same as the libraries from the JDK, so using it directly as a source of type annotations for tests might be OK in general, but I'm not a lawyer. Otherwise, using Daikon from http://plse.cs.washington.edu/daikon/download/doc/daikon.html#AnnotateNullable to generate @Nullable annotations from source code could be an option too, but I haven't used that myself and don't know how well it works.

I can already read KAnnotator files in my POC code, so once the Eclipse format is defined I can create the conversion for that. I also tested my implementation with KAnnotator's output on the JDK's main jar (rt.jar). I can also jump in to write the testing code (perhaps with some help, as Eclipse's tests are a new area for me ;-)

All right, where are we on this bug? Has any of you guys succeeded in massively annotating, say, the JRE?
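As a scaled-down illustration of what such an extraction does: on Java 8+ a type-use annotation can also be located via core reflection, without parsing .class files the way NullableExtract does. The sketch below is hypothetical (the `Nullable` annotation and `Sample` class are made up for the example, and real tools look for annotations such as the Checker Framework's `org.checkerframework.checker.nullness.qual.Nullable`):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class NullableScan {
    // Hypothetical @Nullable-like type annotation (JSR 308 type-use target).
    @Target(ElementType.TYPE_USE)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Nullable {}

    // Sample class standing in for an annotated library type.
    static class Sample {
        @Nullable String find(String key) { return null; }
        int size() { return 0; }
    }

    /** Returns the names of declared methods whose return type carries @Nullable. */
    static List<String> methodsWithNullableReturn(Class<?> c) {
        List<String> result = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            // getAnnotatedReturnType() exposes type annotations on the return type.
            if (m.getAnnotatedReturnType().isAnnotationPresent(Nullable.class))
                result.add(m.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(methodsWithNullableReturn(Sample.class)); // [find]
    }
}
```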
At this point I wouldn't care about 100% semantic correctness, but the annotations should "feel" real. I say "massively" because a dozen or so annotations certainly doesn't support a realistic performance experiment. I'm afraid we may have to settle for the old subset of declaration annotations, or has anyone seen a tool to infer type annotations? I'd love to stress-test the compiler with *lots* of external annotations! :)

Once we have one version of an annotated JRE, we should share it and convert from what we have into the other formats we want to investigate. @Frits: do you want to hack your reader to dump annotations in the format of bug 440474 comment 18? Note that I pushed some updates today. Alternatively, if you show me your internal representation, I'd be happy to contribute the dumping part.

Regarding Daikon: the downside of this tool is that you need to execute an application to create traces as input to the tool. I'm not aware of an application that exercises all (or large areas) of the JRE :( I also gave a quick try to Nit [http://nit.gforge.inria.fr/], but make failed with an incompatibility in OCaml libraries - no easy fix. Finally, JastAdd-NonNullInference is almost installable (you need to fetch some additional tools, the install guide was written for people with case-insensitive file systems, etc. - eventually it came down to a plain compile error in the tool). I even got the tool to run, but the result was: no output plus a few crashes. IOW: we're still accepting bets :)

Hi Stephan, best wishes for 2015 ;-) It has been a while, so I will need to do some catching up. The only tool I can get to run so far is "kannotator", https://github.com/jetbrains/kannotator. This is supposed to infer both nullable and nonnull annotations. But playing with it I found out that it only seems to infer @Nonnull, and it has trouble with that too: it annotates Map, which is clearly impossible, and it annotates HashMap clearly incorrectly. So the tool itself seems flawed.
I reported a few bugs, and apparently JetBrains has deprecated the tool in favor of something else. But it should work for performance testing at least.

I cannot easily see from your grammar where the changes are since the last time we talked. I did send you zipped annotation files and can do that again if you want; I can fix whatever is wrong with them if you can tell me ;-)

I had started to investigate tools other than KAnnotator for inferring nullities, but I too found out that anything I could find was either not very useful (Daikon - if I have to run a tool to detect whether values are null, I might as well wait for the NPE) or very old and/or unmaintained. The only thing I found that was more or less active was/is JetBrains' work. They have added something new to their new IDE, which is partly open source too, so I will look there... I fear that's more important than the annotation format, because I do not realistically see the JDK being annotated by human effort. Anyway... I will get you the data provided you can point out what needs to change... Ok?

(In reply to Frits Jalvingh from comment #4)
> Hi Stephan, best wishes for 2015 ;-) It has been a while, so I will need to
> do some catching up.

Welcome back :) (also I was distracted by other issues ..)

> The only tool I can get to run so far is "kannotator",
> https://github.com/jetbrains/kannotator. This is supposed to infer both
> nullable and nonnull annotations. But playing with it I found out that it
> only seems to infer @Nonnull, and it has trouble with that too: it annotates
> Map, which is clearly impossible, and it annotates HashMap clearly
> incorrect. So the tool itself seems flawed. I reported a few bugs and
> apparently Jetbrains has deprecated the tool for something else.

That's a pity! Why would they abandon such a tool?

> I cannot easily see from your grammar where the changes are since the last
> time we talked. I did send you zipped annotation files and can do that again
> if you want; I can fix whatever is wrong with them if you can tell me ;-)

Meanwhile I found the files you previously sent me, and apparently they *are* already "conforming" to the updated format :)

Old: class <K:V:>java/lang/Map
New: class java/lang/Map <K:V:>

By separating the class's type parameters onto a separate line (plus an optional annotated version of the same), it's easier to handle type parameters as optional. So, your files don't have any type parameters, the compiler doesn't require them, and all is nice. Plus: I'm not sure the third line per method was optional last time we talked. But it seems you completely omit any methods without annotations, right?

> I had started to do investigation into other tools than KAnnotator

OK, to coordinate our efforts: I will only try to contact the authors of JastAdd-NonNullInference, some of whom I actually know, to check if they can help regarding the problems reported in comment 3.

> But- it should work for performance testing at least.

Yes: the files you sent via email can be used for performance measurements. I'm still practising how to get some statistically valid results - deviation between test runs is pretty high.

What would be a good experiment? Perhaps this:
- use unzipped sources of JRE 8 - it contains these toplevels: com java javax launcher org
- compile only "com javax launcher org"
- read "java/*" from jar+eea

This should be a realistic mix of compiling lots of stuff plus reading lots of library classes (plus their annotations). I should probably instrument the compiler to print the number of annotation files read, or the like. Any other suggestions?

BTW: by performing these experiments on Linux I hope that the access to many files will not spoil the measurement. I agree that we should also be able to read just one big zip.

> > The only tool I can get to run so far is "kannotator"....
> That's a pity! Why would they abandon such a tool?
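To make the format change concrete: the new layout puts the binary class name first and the optional type-parameter signature after it, which a reader can split off trivially. The parser below is a simplified sketch of the one-line form shown above (the full grammar lives in bug 440474 and the JDT implementation; this is not that code):

```java
public class EeaClassLine {
    /**
     * Splits a line like "class java/lang/Map <K:V:>" into
     * { binary class name, type-parameter signature or null }.
     */
    static String[] parseClassLine(String line) {
        if (!line.startsWith("class "))
            throw new IllegalArgumentException("not a class line: " + line);
        String rest = line.substring("class ".length()).trim();
        int lt = rest.indexOf(" <");
        if (lt < 0)
            return new String[] { rest, null };          // no type parameters
        return new String[] { rest.substring(0, lt),     // binary class name
                              rest.substring(lt + 1) };  // e.g. "<K:V:>"
    }

    public static void main(String[] args) {
        String[] parsed = parseClassLine("class java/lang/Map <K:V:>");
        System.out.println(parsed[0] + " / " + parsed[1]); // java/lang/Map / <K:V:>
    }
}
```

Because the type parameters are optional and trail the name, files without them (like the ones discussed above) parse without any special-casing.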
Its main use was for their new language "Kotlin", which has nullable in its type system and has hard checks on it. They used KAnnotator to handle interop with Java, because otherwise every Java call needed to be checked (in Kotlin those checks are not optional as in Java). But they have now made the decision to more or less skip checking declared nullity in Kotlin when the code targeted is Java. A design blunder of epic proportions as far as I'm concerned - 100% of the code out here is Java, not Kotlin, so anything talking with libraries has no use for that hard typing... Anyway, because of this they do not "need" KAnnotator anymore. At the same time they have made something else that is integrated in their IDE and runs what appears to be a simpler inference mechanism in the background. I will look at that for a bit, and at the theoretical foundations for this work.

> Plus: I'm not sure the third line per method was optional, last time we
> talked. But it seems you completely omit any methods without annotations,
> right?

Yes, I only emit stuff that has changes/annotations, nothing else. There seems to be no need since we "adjust" the type, so adding useless info just makes parsing slower..

> > I had started to do investigation into other tools than KAnnotator
>
> OK, to coordinate our efforts: I will only try to contact the authors of
> JastAdd-NonNullInference, some of whom I actually know, to check if they can
> help regarding the problems reported in comment 3.

Check.

> > But- it should work for performance testing at least.
>
> Yes: the files you sent via email can be used for performance measurements.
> I'm still practising how to get some statistically valid results - deviation
> between test runs is pretty high.

That's actually odd.... What platform do you develop on?

> What would be a good experiment?
> Perhaps this:
> - use unzipped sources of JRE 8 - it contains these toplevels: com java javax launcher org
> - compile only "com javax launcher org"
> - read "java/*" from jar+eea

It's worth a try; I have no way of predicting how many references are made from the other packages to java/*. If we see too little difference between compiling with and without external annotations, we can try something else. If we have something easy to use I can try compilation on other code bases too. Running a full compile on my work's code base takes more than a minute, so we should see something there ;-)

> BTW: by performing these experiments on Linux I hope that the access to many
> files will not spoil the measurement. I agree that we should also be able to
> read just one big zip.

Running on Linux means the small files will not be a problem. But we may need to differentiate between "hot" and "cold" testing, because the first time reading those files can be slow; after that they are in memory. Do you expect the biggest performance problems (or the loss in performance compared to the compiler's performance without externals) to be in reading the annotations, or are there other considerations inside the compiler too? If the biggest concern is the files, we can of course just run the reader code.

Some first data, finally. Partial compilation of JRE8 as mentioned above.

3 experiments:
(A) no null analysis
(B) annotation based null analysis enabled but no annotations
(C) annotation based null analysis enabled and .eea files read

Each experiment executed 10 times. Time measured using the unix command 'time': real, user, system.
Machine: Quad Core 3.1GHz, SSD, plenty of RAM. OS: Linux (Kubuntu)

Results:

|  | real | user | system |
|---|---|---|---|
| (A) Average times: | 22.82 | 64.33 | 2.55 |
| Aver. deviation: | 1.81 | 1.92 | 0.12 |
| (B) Average times: | 22.50 | 66.27 | 2.64 |
| Aver. deviation: | 1.76 | 2.28 | 0.07 |
| Penalty wrt (A): | -1.40% | 3.00% | 3.58% |
| (C) Average times: | 25.18 | 72.33 | 3.09 |
| Aver. deviation: | 2.05 | 2.40 | 0.15 |
| Penalty wrt (A): | 11.34% | 12.56% | 21.94% |

During (C) we were reading annotations for 634 classes and 4437 methods. Later I found that also excluding javax from the compilation gives a better ratio, reading annotations for 843 classes and 5707 methods (while at the same time performing less compilation).

Aside from the pure business of reading external annotations, the actual null analysis has a significant impact. Not when there are no annotations - here penalties are in the range of statistical deviation. However, with external null annotations we are reporting 14501 additional warnings, which gives a pretty good hint that the analysis *is* performing significant work here.

A 12.56% overall penalty is significant. While these are useful, realistic experiments, I'll quickly hack a test driver that only reads all the class files & annotations, without performing actual compilation. I think the interesting part is the full stretch from reading a file up to and including synthesizing the annotations on BinaryTypeBinding and MethodBinding.

That doesn't sound bad at all, considering there is a real amount of extra work to do that produces valuable extra code quality. Congratulations! Perhaps, instead of hacking a loader of the data per se, you can use a profiler? This seems like a task for that before spending a lot of work on a perhaps non-problematic part.

Much better: simply compile a class that declares variables of all types from java/** and javax/** (3701 types producing 3701 fields). I distributed these variables into separate methods with 10 variables per method. Compiling this class triggers reading annotations for 2034 classes and 11013 methods. Compile errors: 503 (invisible classes).

Measurements:

|  | real | user | system |
|---|---|---|---|
| Without external annotations, average time: | 1.49 | 4.44 | 0.14 |
| Aver. deviation: | 0.05 | 0.07 | 0.02 |
| With external annotations, average time: | 2.00 | 5.90 | 0.28 |
| Aver. deviation: | 0.08 | 0.19 | 0.04 |
| Penalty: | 34.16% | 32.83% | 102.16% |

Now we have a nicely low average deviation, and we can clearly see the penalty incurred by reading and evaluating the external annotations without much effort going into downstream compilation phases.

Next I determined the minimum required heap: 19MB, and repeated the experiment with -Xmx19m (at 18m it already throws OOME (GC overhead limit exceeded)). Results:

|  | real | user | system |
|---|---|---|---|
| Without external annotations, average time: | 2.46 | 7.81 | 0.18 |
| Aver. deviation: | 0.11 | 0.36 | 0.03 |
| With external annotations, average time: | 3.91 | 12.57 | 0.32 |
| Aver. deviation: | 0.26 | 0.91 | 0.03 |
| Penalty: | 58.65% | 60.91% | 78.77% |

Ergo: under heavy load (having to perform significant gc), the penalty gets worse.

I wonder how much of this penalty is due to the reading/evaluating part vs. the creation of actual annotated types. Or more interestingly: will measurements differ when using a different file format? Anyone ready to hack an ExternalAnnotationProvider for another format? :) FWIW: my version has 500 lines of code, but those are somewhat complex lines, given we want to deal with all of JSR 308 (the current impl. is >80%, <100% complete in this regard).

Thinking more about the experiment, we see that the lack of true JSR 308 annotations means we are only challenging two methods of the TypeAnnotationWalker: toMethodParameter(int) and toMethodReturn(). On the one hand this means hacking another ExternalAnnotationsProvider supporting just this subset (just for the experiment) should be easy. OTOH, I start to doubt that we will actually see a performance difference at this narrow slice of behavior. So, who's next? :) Hint: for other formats, the hard part might be ExternalAnnotationProvider.forMethod(): how do you associate an annotation entry with a method (of which we only have selector and binary signature)?
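On that last question: with only the selector and binary signature available, one straightforward association scheme is to key a lookup table by the concatenation of the two, exactly as they appear in the annotation file. This is a hypothetical sketch, not the JDT implementation; the annotated-signature string in the example is illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

public class MethodAnnotationIndex {
    // Maps "selector + binary signature" to the annotated signature line.
    private final Map<String, String> entries = new HashMap<>();

    void put(String selector, String signature, String annotatedSignature) {
        entries.put(selector + signature, annotatedSignature);
    }

    /** Mirrors the shape of an ExternalAnnotationProvider.forMethod(...) lookup. */
    String forMethod(String selector, String signature) {
        return entries.get(selector + signature);
    }

    public static void main(String[] args) {
        MethodAnnotationIndex index = new MethodAnnotationIndex();
        index.put("get", "(Ljava/lang/Object;)Ljava/lang/Object;",
                         "(Ljava/lang/Object;)L1java/lang/Object;");
        System.out.println(index.forMethod("get",
                "(Ljava/lang/Object;)Ljava/lang/Object;"));
    }
}
```

The catch for other formats is producing that key at all: a format that stores source-level types (as the stub format does) would first have to reconstruct the binary signature, including erasure, before any such lookup can work.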
Wonders of performance work: I counted the number of instances of annotation walkers created, saw more than 42000 instances, started to optimize, and observed a performance regression! - because I had left a sysout in the compiler :)

After cleanup, reducing the number of allocations brings this result:

|  | real | user | system |
|---|---|---|---|
| Normal load, with external annotations, average time: | 1.98 | 5.77 | 0.24 |
| Aver. deviation: | 0.09 | 0.26 | 0.05 |
| Penalty: | 33.09% | 29.86% | 73.38% |
| Tight memory (19m heap), with external annotations, average time: | 3.56 | 11.47 | 0.27 |
| Aver. deviation: | 0.26 | 0.84 | 0.04 |
| Penalty: | 44.44% | 46.88% | 48.04% |

With plenty of heap space there's a small gain (1 or 2%), which gets more significant when operating near the limit of OOME - lots of gc activity: cutting down from a 60% penalty to a 46% penalty isn't too bad :) I've pushed this (simple) optimization to the branch. More like that _might_ still be possible in various branches of the walk - to be remembered if we run into performance trouble with lots of JSR 308 annotations.

Meanwhile I implemented simple support for zip files as an alternative and repeated the experiments from comment 9 f.:

|  | real | user | system |
|---|---|---|---|
| Normal load, with zipped external annotations, average time: | 1.85 | 5.63 | 0.15 |
| Aver. deviation: | 0.06 | 0.19 | 0.03 |
| Penalty: | 18.88% | 19.41% | 27.97% |
| Tight memory (19m heap), with zipped external annotations, average time: | 4.22 | 14.07 | 0.21 |
| Aver. deviation: | 0.22 | 0.78 | 0.02 |
| Penalty: | 37.88% | 40.88% | 23.35% |

(Penalty computed relative to today's reference of plain compilation without null annotations; absolute numbers vary slightly between previous measurements and today's.)

=> At normal load, using a zip file, even on Linux with an SSD, cuts down the penalty by more than 1/3 (compared to the penalty in comment 10).
=> With tight memory the improvement is still noticeable, but less so.

I've pushed the changes for zip-file support to the branch.
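The zip variant can be sketched with plain java.util.zip. This self-contained example builds a small archive in memory and reads one .eea entry back; the entry name and content are made up for illustration, and the real provider reads from an archive on the build path rather than from memory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class EeaZipDemo {
    /** Returns the content of the named entry, or null if absent. */
    static String readEntry(byte[] zipBytes, String name) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e; (e = in.getNextEntry()) != null; ) {
                if (e.getName().equals(name)) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    for (int n; (n = in.read(chunk)) > 0; )
                        buf.write(chunk, 0, n);
                    return buf.toString("UTF-8");
                }
            }
        }
        return null;
    }

    /** Builds a one-entry zip in memory, standing in for an annotation archive. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("java/util/Map.eea"));
            out.write("class java/util/Map\n".getBytes("UTF-8"));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readEntry(sampleZip(), "java/util/Map.eea"));
    }
}
```

This also hints at why the zip wins at normal load: one open file and sequential reads, instead of an open/read/close cycle per annotated class.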
@Frits: heads-up: I had to change some internal signatures, notably ExternalAnnotationProvider is now constructed with an InputStream ready for reading. Hope this doesn't interfere with your efforts to implement an alternate provider.

Summing up:
We know how to stress-test the implementation: simply force loading of thousands of classes which all have external annotations attached, without performing any interesting compilation.
We have some rough figures for the worst-case performance penalty incurred by external annotations in various use cases:
Most influential factors:
- accessing individual files vs. entries in a zip file
- plenty of RAM vs. tight RAM forcing frequent GC runs
Worst-case penalties:

|  | plenty of RAM | tight RAM |
|---|---|---|
| Files | 30% | 45% |
| Zip | 20% | 40% |
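The percentages reported throughout this bug can be reproduced from the raw timings; the arithmetic (mean, average absolute deviation, and penalty relative to a baseline) can be sketched as below. The sample numbers are made up for the example, not taken from the measurements above:

```java
public class PenaltyStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Average absolute deviation from the mean ("Aver. deviation" above). */
    static double averageDeviation(double[] xs) {
        double m = mean(xs), sum = 0;
        for (double x : xs) sum += Math.abs(x - m);
        return sum / xs.length;
    }

    /** Penalty of a measurement relative to a baseline, in percent. */
    static double penaltyPercent(double measured, double baseline) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double[] runs = { 12.0, 10.0, 11.0 };
        System.out.println(mean(runs));                 // 11.0
        System.out.println(averageDeviation(runs));     // ~0.667
        System.out.println(penaltyPercent(12.0, 10.0)); // 20.0
    }
}
```

The average deviation, rather than the standard deviation, matches the "Aver. deviation" rows in the measurements; it is less sensitive to a single outlier run.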
My interpretation is that file I/O and GC are setting a lower bound on the required effort.
Other than that there's little we can learn from these "absolute" figures, since we didn't have any alternative implementations to compare with.
Verified for 4.5 M6 with build I20150317-2000