Currently, the HTMLResourceEncodingDetector does a ready() check before reading a byte. For sufficiently large files, or when the detector has to run over several files, this can make the checkheuristics() method take a considerable amount of time. I don't think bailing out when a resource isn't ready buys us anything, and it may cause more trouble, since we then can't check the heuristics to determine the encoding. In my tests, simply dropping the ready() check gave a performance gain of nearly 82%.
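To illustrate the bail-out concern, here is a minimal sketch. It is not the detector's actual code: the class, the NeverReadyReader helper, and the BAILED sentinel are all hypothetical names made up for this example. It only shows that a ready() guard can skip a read even though read() would have returned data.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch; none of these names come from HTMLResourceEncodingDetector.
final class ReadyCheckSketch {
    // Made-up sentinel meaning "gave up because ready() was false".
    static final int BAILED = -2;

    // Test double whose ready() always reports false, as a reader may
    // legitimately do when it cannot guarantee that read() will not block.
    static final class NeverReadyReader extends StringReader {
        NeverReadyReader(String content) {
            super(content);
        }

        @Override
        public boolean ready() {
            return false;
        }
    }

    // Guarded read: bails out without looking at the data if ready() is false.
    static int readGuarded(Reader reader) throws IOException {
        if (!reader.ready()) {
            return BAILED;
        }
        return reader.read();
    }

    // Direct read: may block, but returns the next char (or -1 at end of stream).
    static int readDirect(Reader reader) throws IOException {
        return reader.read();
    }
}
```

With the guard, the heuristics never see the first character of the content; the direct read does.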
Created attachment 186446: patch
This isn't new, but what if fReader.read() returns -1?
Created attachment 186668: patch with end-of-stream check
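For readers without the attachment, an end-of-stream check along these lines might look like the sketch below. This is an assumption about the general shape, not the contents of the patch; EndOfStreamSketch, readPrefix, and the limit parameter are hypothetical. Without the -1 test, casting the result to char would yield '\uFFFF' and the heuristics would process a bogus character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch; not the code from the attached patch.
final class EndOfStreamSketch {
    // Reads at most `limit` chars, stopping early when read() reports
    // end of stream (-1). No ready() check is made before each read.
    static String readPrefix(Reader reader, int limit) throws IOException {
        StringBuilder prefix = new StringBuilder();
        while (prefix.length() < limit) {
            int c = reader.read();
            if (c == -1) {
                break; // end of stream: nothing more to inspect
            }
            prefix.append((char) c);
        }
        return prefix.toString();
    }
}
```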
Code checked in. Thanks.