Running the latest Orion build in the console, I am getting the following periodic error:

!ENTRY org.eclipse.orion.server.core.search 4 0 2011-05-13 16:09:28.672
!MESSAGE Error during search indexing
!STACK 0
org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
	at org.eclipse.orion.internal.server.search.Indexer.indexProject(Indexer.java:141)
	at org.eclipse.orion.internal.server.search.Indexer.run(Indexer.java:201)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
	... 5 more
Caused by: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	... 5 more
Caused by: javax.xml.stream.XMLStreamException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at com.ibm.xml.xlxp.api.stax.msg.StAXMessageProvider.throwWrappedXMLStreamException(StAXMessageProvider.java:73)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.produceFatalErrorEvent(XMLStreamReaderImpl.java:2103)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.reportFatalError(XMLStreamReaderImpl.java:2109)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.reportFatalError(DocumentEntityScanner.java:479)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.scanContentBuffered(DocumentEntityScanner.java:1897)
	at com.ibm.xml.xlxp.api.util.SimpleScannerHelper.scanContentBuffered(SimpleScannerHelper.java:1233)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.stateBufferedContent(DocumentEntityScanner.java:602)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.produceEvent(DocumentEntityScanner.java:650)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.getNextScannerEvent(XMLStreamReaderImpl.java:1645)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:536)
	at com.ibm.xml.xlxp.api.stax.XMLInputFactoryImpl$XMLStreamReaderProxy.next(XMLInputFactoryImpl.java:180)
	at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:273)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	... 9 more
Is this on a brand new empty workspace, or after you have content in there?
This does not happen on a fresh install with a clean workspace, and I have not been able to recreate a workspace that produces these errors. I still have the workspace where the error is occurring. It did contain content, but that content has somehow disappeared and my navigator is now empty. At one point it contained filesystem links to the org.eclipse.orion.client/bundles repository, as well as the contents of requirejs-0.24.0 and the results of a local test build, which would have contained binary files.
Inspecting in a debugger the stream that I believe Solr is trying to read shows:
req : SolrRequestParsers$1 -> streams -> ContentStreamBase$StringStream -> str :
<add><doc boost="1.0"><field name="Id">file:/eclipse/git/web/org.eclipse.orion.server/releng/org.eclipse.orion.releng/build/N201105121647/repo/binary/org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Name">org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Length">120729</field><field name="Directory">false</field><field name="LastModified">1305233511000</field><field name="Location">/file/C/releng/org.eclipse.orion.releng/build/N201105121647/repo/binary/org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Text">PK#3;#4;#20;#0;#8;#0;#8;#0;= ..... <binary content>
The file "org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1" is a binary zip archive.
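The trace fails because the document sent to Solr's XML update handler contains characters that XML 1.0 cannot carry at all, such as the U+FFFE reported above. XML 1.0 only allows the code points in its Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]), which also rules out most C0 control characters. A minimal sketch, assuming one wanted to detect or strip such characters before indexing; `XmlCharFilter` and its methods are hypothetical and are not what Orion does:

```java
/**
 * Hypothetical helper (not part of Orion or Solr): checks text against the
 * XML 1.0 "Char" production before it is handed to an XML-based indexer.
 */
final class XmlCharFilter {

    // XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // Unpaired surrogates (0xD800-0xDFFF) fall outside every range and are rejected.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the first code point XML 1.0 cannot carry, or -1 if the text is clean. */
    static int firstInvalidCodePoint(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return cp;
            }
            i += Character.charCount(cp); // step over both halves of a surrogate pair
        }
        return -1;
    }

    /** Drops every code point that XML 1.0 cannot carry. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().filter(XmlCharFilter::isValidXmlChar).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "PK\u0003\u0004\uFFFE"; // illustrative bytes, not the real file content
        System.out.printf("first invalid: 0x%x%n", firstInvalidCodePoint(sample)); // first invalid: 0x3
        System.out.println(stripInvalidXmlChars(sample)); // PK
    }
}
```

Stripping only makes the request well-formed; for a zip archive the "text" that remains is still meaningless, so skipping binary files is the more useful fix.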
I should add "no file extension" to the list of file extensions excluded by the indexer. Our editor can't open files with no extension anyway.
http://git.eclipse.org/c/e4/org.eclipse.orion.server.git/commit/?id=683ca47be7762c6446d5e2121247537c5a7d8fa2
(In reply to comment #4)
> I should add "no file extension" to the list of file extensions excluded by the
> indexer. Our editor can't open files with no extension anyway.

Is this file really no extension? Or does it have extension "v20110505-7P7NFUFFLWUl76mam1" ?
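To illustrate the question in comment #6: if an extension is derived from the text after the last '.', as many file-name utilities do, this file's name does have an extension (the build qualifier), so a "no file extension" rule would not exclude it. A minimal sketch assuming Java 9+ (for `Set.of`); `ExtensionFilter` and its exclusion list are hypothetical, since the indexer's real list is not shown in this bug:

```java
import java.util.Set;

/**
 * Hypothetical sketch (not Orion's indexer code): derives a file "extension"
 * from the text after the last '.', and treats "no extension" as excluded.
 */
final class ExtensionFilter {

    // Illustrative exclusion list only; "" stands for "no file extension".
    private static final Set<String> EXCLUDED = Set.of("", "zip", "jar", "class", "exe", "png", "gif");

    /** Text after the last '.', or "" if the name contains no '.'. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    static boolean isExcluded(String fileName) {
        return EXCLUDED.contains(extension(fileName));
    }

    public static void main(String[] args) {
        String name = "org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1";
        System.out.println(extension(name));      // v20110505-7P7NFUFFLWUl76mam1
        System.out.println(isExcluded(name));     // false: the qualifier counts as an extension
        System.out.println(isExcluded("README")); // true: no '.' at all
    }
}
```

Any purely name-based rule has this weakness with Eclipse bundle names, whose version qualifiers follow a '.'.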
(In reply to comment #6)
> Is this file really no extension? Or does it have extension
> "v20110505-7P7NFUFFLWUl76mam1" ?

True. I'm inclined to leave it, though. We log an exception if the indexer fails to index something, but there are no other side effects (we just skip the single file and go on to the next one). In the long run we need some kind of content-type analysis to auto-detect binary vs. text.
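The content-type analysis mentioned above could start from a simple heuristic. The sketch below, assuming Java 11+ (for `InputStream.readNBytes`), treats a file as binary if it starts with the ZIP local-file-header signature `PK\3\4` (as the archive in this bug does) or if its first 8000 bytes contain a NUL byte, which is similar to the check Git uses. `BinarySniffer` and its methods are hypothetical, not Orion API, and the heuristic is imperfect: UTF-16 text contains NUL bytes and would be flagged as binary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical binary-vs-text sniffer (not Orion API); a heuristic, not a guarantee. */
final class BinarySniffer {

    // How many leading bytes to inspect.
    private static final int SNIFF_LENGTH = 8000;

    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** True if the given leading bytes look like binary content. */
    static boolean looksBinary(byte[] head) {
        if (startsWith(head, ZIP_MAGIC)) {
            return true;
        }
        int limit = Math.min(head.length, SNIFF_LENGTH);
        for (int i = 0; i < limit; i++) {
            if (head[i] == 0) {
                return true; // NUL bytes essentially never occur in ASCII or UTF-8 source text
            }
        }
        return false;
    }

    /** Reads at most SNIFF_LENGTH bytes from the file and applies the heuristic. */
    static boolean looksBinary(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksBinary(in.readNBytes(SNIFF_LENGTH));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHead = {'P', 'K', 3, 4, 20, 0, 8, 0};
        byte[] script = "define(['require'], function (require) {});\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksBinary(zipHead)); // true
        System.out.println(looksBinary(script));  // false
    }
}
```

An indexer could call `looksBinary(path)` before reading a file as text and skip it when the result is true, which avoids relying on the file name at all.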