
Bug 572372

Summary: [performance] improve getInputStreamAsCharArray
Product: [Eclipse Project] JDT
Reporter: Jörg Kubitz <jkubitz-eclipse>
Component: Core
Assignee: Jörg Kubitz <jkubitz-eclipse>
Status: VERIFIED FIXED
Severity: enhancement
Priority: P3
CC: jarthana, manoj.palat
Version: 4.20
Target Milestone: 4.21 M1
Hardware: All
OS: All
See Also: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/178470
          https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=680081fceaccc04731bfd58151734b054356a6d1
Bug Depends on: 573239

Description Jörg Kubitz CLA 2021-03-28 13:15:55 EDT
getInputStreamAsCharArray is mainly used for reading .java files.
Those files are typically small (~30 kB). Therefore I suggest an implementation without streaming.
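A minimal sketch contrasting the two approaches, assuming the whole file comfortably fits in memory. The class and method names (`ReadSketch`, `readStreaming`, `readAllAtOnce`) are hypothetical and are not the JDT `Util` API; the streaming variant is a generic `Reader` loop, not the actual previous JDT code, while the non-streaming variant reads every byte first and decodes them in a single call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReadSketch {

    // Streaming (hypothetical): decode chunk by chunk through a Reader,
    // doubling the result array whenever it fills up. The caller closes the stream.
    static char[] readStreaming(InputStream in, Charset cs) throws IOException {
        Reader reader = new InputStreamReader(in, cs);
        char[] result = new char[8192];
        int length = 0;
        int n;
        while ((n = reader.read(result, length, result.length - length)) != -1) {
            length += n;
            if (length == result.length) {
                result = Arrays.copyOf(result, result.length * 2);
            }
        }
        return Arrays.copyOf(result, length);
    }

    // Non-streaming (hypothetical): read all bytes first (Java 9+),
    // then decode them with one Charset.decode call. The caller closes the stream.
    static char[] readAllAtOnce(InputStream in, Charset cs) throws IOException {
        byte[] bytes = in.readAllBytes();
        CharBuffer chars = cs.decode(ByteBuffer.wrap(bytes));
        char[] result = new char[chars.remaining()];
        chars.get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "class A { String s = \"\u00e4\u00f6\u00fc\"; }".getBytes(StandardCharsets.UTF_8);
        char[] a = readStreaming(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        char[] b = readAllAtOnce(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both variants produce the same characters; the non-streaming one avoids repeated buffer refills and array growth, which is where a gain on small files could plausibly come from.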

I tested with the most commonly used encodings and different file sizes.
Measured on Windows 10, OpenJDK 64-Bit Server VM, 15.0.2+7.
Results in µs/op (lower is better); a leading "+" marks the faster implementation in each row:

Encoding    File                  Eclipse (µs/op)  Replacement (µs/op)
ISO-8859-1  ISO-8859-1/100b.txt            66.286              +27.919
ISO-8859-1  ISO-8859-1/1k.txt              58.389              +29.639
ISO-8859-1  ISO-8859-1/10k.txt             63.795              +38.053
ISO-8859-1  ISO-8859-1/Util.java          129.951             +123.083
ISO-8859-1  ISO-8859-1/100k.txt           186.189             +147.224
ISO-8859-1  ISO-8859-1/1MB.txt           1438.149            +1280.615
ISO-8859-1  ISO-8859-1/10MB.txt         23749.335           +18320.580
// UTF-8 without BOM:
UTF-8       100b.txt                       60.636              +33.622
UTF-8       1k.txt                         55.819              +32.100
UTF-8       10k.txt                        76.664              +47.570
UTF-8       Util.java                     117.312             +111.992
UTF-8       100k.txt                     +273.896              300.274
UTF-8       1MB.txt                      2330.372            +2117.897
UTF-8       10MB.txt                    56600.400           +54329.798
// UTF-8 with BOM:
UTF-8       UTF-8/100b.txt                 53.718              +29.147
UTF-8       UTF-8/1k.txt                   57.248              +42.134
UTF-8       UTF-8/10k.txt                  83.892              +48.226
UTF-8       UTF-8/Util.java               151.298              +83.403
UTF-8       UTF-8/100k.txt                310.702             +291.605
UTF-8       UTF-8/1MB.txt                2458.625            +2118.515
UTF-8       UTF-8/10MB.txt             +39629.080            48314.118 ±2833
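The benchmark distinguishes UTF-8 files with and without a byte order mark. Java's UTF-8 decoder does not drop a leading BOM (it decodes to U+FEFF), so a reader that wants it gone has to skip it explicitly. A minimal sketch, assuming the BOM should be stripped before decoding; `BomSketch` and `decodeSkippingUtf8Bom` are hypothetical names, and where exactly JDT handles the BOM is not stated here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSketch {

    // Hypothetical helper: decodes the bytes with the given charset and,
    // only for UTF-8, skips a leading BOM (bytes EF BB BF).
    static char[] decodeSkippingUtf8Bom(byte[] bytes, Charset cs) {
        int offset = 0;
        if (cs.equals(StandardCharsets.UTF_8)
                && bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            offset = 3;
        }
        return new String(bytes, offset, bytes.length - offset, cs).toCharArray();
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b'};
        byte[] withoutBom = {'a', 'b'};
        System.out.println(new String(decodeSkippingUtf8Bom(withBom, StandardCharsets.UTF_8)));    // ab
        System.out.println(new String(decodeSkippingUtf8Bom(withoutBom, StandardCharsets.UTF_8))); // ab
    }
}
```

For other encodings such as ISO-8859-1 the same three bytes are ordinary characters, so the check is restricted to UTF-8.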

The new implementation is up to about 2x faster for small files. For large files the conversion is mainly limited by RAM speed due to cache misses, which leads to little or no speedup.
There were rare cases where the new implementation was slightly slower on big files, but since this is not the typical use case, we can ignore them.
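The speedup of a row is simply the Eclipse time divided by the replacement time. A small sketch computing it for a few rows copied from the benchmark table (values in µs/op); the class `Speedup` exists only for illustration.

```java
import java.util.Locale;

public class Speedup {

    // Speedup of the replacement over the Eclipse implementation:
    // a value > 1 means the replacement is faster.
    static double speedup(double eclipseMicros, double replacementMicros) {
        return eclipseMicros / replacementMicros;
    }

    public static void main(String[] args) {
        // Rows copied from the benchmark table (µs/op).
        String[] rows = {"ISO-8859-1 100b.txt", "ISO-8859-1 10k.txt", "ISO-8859-1 10MB.txt", "UTF-8 100k.txt"};
        double[] eclipse = {66.286, 63.795, 23749.335, 273.896};
        double[] replacement = {27.919, 38.053, 18320.580, 300.274};
        for (int i = 0; i < rows.length; i++) {
            // Locale.ROOT forces "." as the decimal separator regardless of the JVM's default locale.
            System.out.printf(Locale.ROOT, "%-20s %.2fx%n", rows[i], speedup(eclipse[i], replacement[i]));
        }
    }
}
```

With these inputs it prints speedups of about 2.37x, 1.68x, 1.30x and 0.91x; the last row is one of the cases where the old implementation was faster.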
Comment 1 Eclipse Genie CLA 2021-03-28 13:28:04 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/178470
Comment 3 Manoj N Palat CLA 2021-06-17 09:39:18 EDT
Thanks Joerg for the fix; thanks Sravan for the review.
Comment 4 Jay Arthanareeswaran CLA 2021-07-08 01:28:52 EDT
Verified for 4.21 M1.