On September 16th, at 12:52:59, orionhub.org started reporting problems with running out of file handles. This was after about five weeks of continuous uptime, so it looks like we have a slow leak somewhere. I will attach a log file with some sample stack traces. The server generated 75MB of error logs over the three days before the problem was noticed, but the entries all appear roughly the same.
Created attachment 203621 [details] Partial log
This one is particularly interesting: 2011-09-19 14:07:25.299 [Finalizer thread] ERROR o.apache.solr.update.SolrIndexWriter - SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
This happened again on orion.eclipse.org, but this time after only five days of uptime. I'll investigate.
Created attachment 204177 [details] Handle log file

This log prints total file handle usage of the orion.eclipse.org server, using lsof. The first column is the date/time, the second column is total handles used by the process, and the third column is the handles used by solr/lucene.

It's pretty clear there is a slow but steady leak of handles by the search indexer. The search indexer went from 90 to 312 handles used, with lots of ups and downs along the way. Since all our Orion interaction with search happens via HTTP requests to the solr server, I don't think there is anything in our code that could cause this. From reading the solr 1.4.1 release notes, it seems there were a number of known file handle leaks in solr 1.4.0 that are likely behind this:

https://issues.apache.org/jira/browse/SOLR-1744
https://issues.apache.org/jira/browse/SOLR-1745
https://issues.apache.org/jira/browse/SOLR-1746
https://issues.apache.org/jira/browse/SOLR-1747
https://issues.apache.org/jira/browse/SOLR-1748
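For anyone wanting to collect a similar three-column handle log, here is a minimal sketch in Python. It is not the script that produced the attachment. It assumes lsof is invoked as `lsof -p <pid> -Fn` (field output, one `n`-prefixed line per open file) and that search index files can be recognized by "solr" or "lucene" appearing in their path. The function names and the path pattern are hypothetical.

```python
import re
import subprocess
from datetime import datetime

# Assumption: index files have "solr" or "lucene" somewhere in their path.
SEARCH_PATTERN = re.compile(r"solr|lucene", re.IGNORECASE)


def count_handles(lsof_field_output, pattern=SEARCH_PATTERN):
    """Return (total, search) counts from `lsof -p <pid> -Fn` output.

    In field output mode each open file contributes one line that starts
    with "n" followed by the file name; other lines (p..., f...) are skipped.
    Note that lsof also lists entries such as cwd and memory-mapped files,
    so the total is an upper bound on real descriptors.
    """
    names = [line[1:] for line in lsof_field_output.splitlines()
             if line.startswith("n")]
    search = sum(1 for name in names if pattern.search(name))
    return len(names), search


def format_log_line(timestamp, total, search):
    """One tab-separated log line: date/time, total handles, search handles."""
    return f"{timestamp:%Y-%m-%d %H:%M:%S}\t{total}\t{search}"


def sample(pid):
    """Run lsof against a process id and return one formatted log line."""
    result = subprocess.run(["lsof", "-p", str(pid), "-Fn"],
                            capture_output=True, text=True, check=False)
    total, search = count_handles(result.stdout)
    return format_log_line(datetime.now(), total, search)
```

Calling `sample(pid)` from cron or a simple loop and appending the result to a file would give a log with the same column layout as the attachment.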
Chart of handle usage over time for orion.eclipse.org <img src="https://docs.google.com/spreadsheet/oimg?key=0ArqzCH6xAv4wdHJJaFE5aExnTnpOUkJwOURUS013eGc&oid=2&zx=t17pjonxtg8e" />
Here is an updated chart of handle growth on January 26, 2012, running the candidate build for Orion 0.4 M2: https://docs.google.com/spreadsheet/ccc?key=0ArqzCH6xAv4wdGwwTnM3M1h4SDRCTWMxckVmTUN5Zmc#gid=1

The non-search file handles spike up and down but overall stay roughly flat. There are handle counts after 5pm that are lower than they were at 8:50am. The search handles, on the other hand, inch steadily upwards all day.
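One way to put a number on "roughly flat" versus "inching steadily upwards" is to fit a straight line to each series and compare slopes. The sketch below is only an illustration of that idea, not how the chart was analyzed. It assumes each sample is an (hours since start, handle count) pair, and the function name is hypothetical.

```python
def handle_growth_rate(samples):
    """Least-squares slope of (hours, handle_count) pairs, in handles per hour.

    A slope near zero suggests noisy but flat usage; a persistently
    positive slope across the day suggests a leak.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

A single slope hides the spikes, so it is best read alongside the raw chart rather than instead of it.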
This was resolved by moving to Solr/Lucene 3.5. I haven't seen signs of a leak since upgrading Lucene.