| Summary: | Gzip-encoded site delivers no content | | |
|---|---|---|---|
| Product: | z_Archived | Reporter: | nils.thieme |
| Component: | Smila | Assignee: | Project Inbox <smila.irms-inbox> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | enhancement | | |
| Priority: | P3 | CC: | andreas.schank, daniel.stucky |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
Hello, if a website that is gzip-compressed is crawled with the SMILA web crawler, no content is received. This is due to a bug in the GZIPUtils class, method "unzipBestEffort(byte[], int)". The important line is 98:

```java
if ((written + size) > sizeLimit) {
    outStream.write(buf, 0, sizeLimit - written);
    ...
}
```

"sizeLimit" is set to 0 and "written" is also 0, so zero bytes will be written. We crawled the single site www.wanderkompass.de. The "sizeLimit" comes from a static property file (not the web.xml), which is strange because all properties of the crawler should be configurable via the web.xml file.

We have also noticed that we set the property "MaxLengthBytes" in the web.xml file to 0, assuming that this means there is no restriction on the size. It would be nice if the actual behaviour could be changed to match that expectation.

Hi Nils, I added a check whether sizeLimit is > 0 to GZIPUtils.unzipBestEffort(byte[] in, int sizeLimit). Could you please check if this solves your problem? Thanks, Daniel

The Connectivity framework was replaced by the new Importing framework, which uses java.util.zip.GZIPInputStream directly. So IMHO, this bug entry is no longer relevant. Bye, Andreas
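To make the fix concrete, here is a minimal sketch of what a guarded unzipBestEffort could look like. The class name, buffer size, and error handling are illustrative assumptions, not the actual SMILA source; only the sizeLimit > 0 check mirrors the change Daniel describes, so that a configured limit of 0 means "unlimited" rather than "write nothing":

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public final class GzipSketch {

  private static final int BUF_SIZE = 4096;

  // Unzips as much of the gzipped input as possible ("best effort"):
  // truncated or corrupt trailing data still yields the bytes read so far.
  public static byte[] unzipBestEffort(byte[] in, int sizeLimit) {
    ByteArrayOutputStream outStream = new ByteArrayOutputStream(BUF_SIZE);
    try {
      GZIPInputStream inStream = new GZIPInputStream(new ByteArrayInputStream(in));
      byte[] buf = new byte[BUF_SIZE];
      int written = 0;
      int size;
      while ((size = inStream.read(buf)) != -1) {
        // The added check: enforce the limit only when it is positive.
        // Previously, sizeLimit == 0 caused zero bytes to be written.
        if (sizeLimit > 0 && written + size > sizeLimit) {
          outStream.write(buf, 0, sizeLimit - written);
          break;
        }
        outStream.write(buf, 0, size);
        written += size;
      }
    } catch (IOException e) {
      // Best effort: ignore the error and keep whatever was decompressed.
    }
    return outStream.toByteArray();
  }
}
```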
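For reference, reading a gzip-compressed payload with java.util.zip.GZIPInputStream directly, as Andreas says the new Importing framework does, looks roughly like this; the helper name and the byte-array input are illustrative, not taken from the SMILA sources:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public final class GunzipExample {

  // Fully decompresses a gzip byte array. Errors propagate to the caller
  // instead of being swallowed, unlike the best-effort variant above.
  public static byte[] gunzip(byte[] compressed) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }
}
```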