| Summary: | [performance] improve getInputStreamAsCharArray | ||
|---|---|---|---|
| Product: | [Eclipse Project] JDT | Reporter: | Jörg Kubitz <jkubitz-eclipse> |
| Component: | Core | Assignee: | Jörg Kubitz <jkubitz-eclipse> |
| Status: | VERIFIED FIXED | QA Contact: | |
| Severity: | enhancement | ||
| Priority: | P3 | CC: | jarthana, manoj.palat |
| Version: | 4.20 | ||
| Target Milestone: | 4.21 M1 | ||
| Hardware: | All | ||
| OS: | All | ||
| See Also: | https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/178470 https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=680081fceaccc04731bfd58151734b054356a6d1 | ||
| Whiteboard: | |||
| Bug Depends on: | 573239 | ||
| Bug Blocks: | |||
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/178470

Gerrit change https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/178470 was merged to [master]. Commit: http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=680081fceaccc04731bfd58151734b054356a6d1

Thanks Joerg for the fix; thanks Sravan for the review.

Verified for 4.21 M1.
getInputStreamAsCharArray is mainly used for reading .java files. Those files are typically small (~30kB). Therefore I suggest an implementation without streaming. I tested with the most used decodings and different file sizes.

Measured on Windows 10, OpenJDK 64-Bit Server VM, 15.0.2+7

Measurements in us/op:

| (decoding) | (fileName) | eclipse | replacement |
|---|---|---|---|
| ISO-8859-1 | ISO-8859-1/100b.txt | 66,286 | +27,919 |
| ISO-8859-1 | ISO-8859-1/1k.txt | 58,389 | +29,639 |
| ISO-8859-1 | ISO-8859-1/10k.txt | 63,795 | +38,053 |
| ISO-8859-1 | ISO-8859-1/Util.java | 129,951 | +123,083 |
| ISO-8859-1 | ISO-8859-1/100k.txt | 186,189 | +147,224 |
| ISO-8859-1 | ISO-8859-1/1MB.txt | 1438,149 | +1280,615 |
| ISO-8859-1 | ISO-8859-1/10MB.txt | 23749,335 | +18320,580 |

//without BOM:

| (decoding) | (fileName) | eclipse | replacement |
|---|---|---|---|
| UTF-8 | 100b.txt | 60,636 | +33,622 |
| UTF-8 | 1k.txt | 55,819 | +32,100 |
| UTF-8 | 10k.txt | 76,664 | +47,570 |
| UTF-8 | Util.java | 117,312 | +111,992 |
| UTF-8 | 100k.txt | +273,896 | 300,274 |
| UTF-8 | 1MB.txt | 2330,372 | +2117,897 |
| UTF-8 | 10MB.txt | 56600,400 | +54329,798 |

//with BOM:

| (decoding) | (fileName) | eclipse | replacement |
|---|---|---|---|
| UTF-8 | UTF-8/100b.txt | 53,718 | +29,147 |
| UTF-8 | UTF-8/1k.txt | 57,248 | +42,134 |
| UTF-8 | UTF-8/10k.txt | 83,892 | +48,226 |
| UTF-8 | UTF-8/Util.java | 151,298 | +83,403 |
| UTF-8 | UTF-8/100k.txt | 310,702 | +291,605 |
| UTF-8 | UTF-8/1MB.txt | 2458,625 | +2118,515 |
| UTF-8 | UTF-8/10MB.txt | +39629,080 | 48314,118 ±2833 |

The new implementation is up to 2x faster for small files. For large files the conversion is mainly limited by RAM speed due to cache misses, which leads to little or no speedup on large files. There have been rare cases where the new implementation was slightly slower on big files, but since this is not the typical use case we can ignore that.
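
To illustrate what "an implementation without streaming" could look like, here is a minimal sketch. It is not the code from the Gerrit change; the class name `NonStreamingReader` and both method names are hypothetical. The idea is to load the whole file into a byte array and decode it in a single pass instead of pulling chunks through a Reader. It assumes Java 9+ for `InputStream.readAllBytes()` and does not strip a leading BOM, which a real implementation would probably need to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not the JDT Core implementation).
class NonStreamingReader {

    /** Decodes an already loaded byte array in one pass, without a Reader loop. */
    static char[] decode(byte[] bytes, Charset charset) {
        // Charset.decode replaces malformed input with the charset's replacement character.
        CharBuffer decoded = charset.decode(ByteBuffer.wrap(bytes));
        char[] chars = new char[decoded.remaining()];
        decoded.get(chars); // copy exactly the decoded characters
        return chars;
    }

    /** Reads the whole stream into memory first, then decodes it. */
    static char[] readAsCharArray(InputStream in, Charset charset) throws IOException {
        return decode(in.readAllBytes(), charset); // readAllBytes requires Java 9+
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "class A {}".getBytes(StandardCharsets.UTF_8);
        char[] chars = readAsCharArray(new ByteArrayInputStream(source), StandardCharsets.UTF_8);
        System.out.println(new String(chars)); // prints "class A {}"
    }
}
```

Holding the whole file in memory is reasonable for typical .java sources of around 30kB; for very large inputs the extra byte array is the trade-off.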