Bug 345782

Summary: [server] Periodic SolrException: An invalid XML character (Unicode: 0xfffe) in console
Product: [ECD] Orion
Reporter: Andrew Niefer <aniefer>
Component: Client
Assignee: John Arthorne <john.arthorne>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: P3    
Version: 0.2   
Target Milestone: 0.2   
Hardware: PC   
OS: Linux   
Whiteboard:

Description Andrew Niefer CLA 2011-05-13 16:11:49 EDT
Running the latest Orion build in the console, I am getting the following periodic error:
!ENTRY org.eclipse.orion.server.core.search 4 0 2011-05-13 16:09:28.672
!MESSAGE Error during search indexing
!STACK 0
org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
	at org.eclipse.orion.internal.server.search.Indexer.indexProject(Indexer.java:141)
	at org.eclipse.orion.internal.server.search.Indexer.run(Indexer.java:201)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
	... 5 more
Caused by: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	... 5 more
Caused by: javax.xml.stream.XMLStreamException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at com.ibm.xml.xlxp.api.stax.msg.StAXMessageProvider.throwWrappedXMLStreamException(StAXMessageProvider.java:73)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.produceFatalErrorEvent(XMLStreamReaderImpl.java:2103)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.reportFatalError(XMLStreamReaderImpl.java:2109)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.reportFatalError(DocumentEntityScanner.java:479)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.scanContentBuffered(DocumentEntityScanner.java:1897)
	at com.ibm.xml.xlxp.api.util.SimpleScannerHelper.scanContentBuffered(SimpleScannerHelper.java:1233)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.stateBufferedContent(DocumentEntityScanner.java:602)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.produceEvent(DocumentEntityScanner.java:650)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.getNextScannerEvent(XMLStreamReaderImpl.java:1645)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:536)
	at com.ibm.xml.xlxp.api.stax.XMLInputFactoryImpl$XMLStreamReaderProxy.next(XMLInputFactoryImpl.java:180)
	at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:273)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	... 9 more
Comment 1 John Arthorne CLA 2011-05-13 17:01:01 EDT
Is this on a brand new empty workspace, or after you have content in there?
Comment 2 Andrew Niefer CLA 2011-05-13 17:25:41 EDT
This is not happening on a fresh install with a clean workspace. I have not been able to recreate a workspace that produces these errors.

I still have the workspace where the error is occurring. It did have content, but that content has somehow all disappeared and my navigator is empty. At one point it contained links on the filesystem to the org.eclipse.orion.client/bundles repository, the contents of requirejs-0.24.0, and the results of a local test build, which would have contained binary files.
Comment 3 Andrew Niefer CLA 2011-05-13 17:40:22 EDT
A debugger on the stream that I believe Solr is trying to read shows:

req : SolrRequestParsers$1
  -> streams
       -> ContentStreamBase$StringStream
           -> str :
<add><doc boost="1.0"><field name="Id">file:/eclipse/git/web/org.eclipse.orion.server/releng/org.eclipse.orion.releng/build/N201105121647/repo/binary/org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Name">org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Length">120729</field><field name="Directory">false</field><field name="LastModified">1305233511000</field><field name="Location">/file/C/releng/org.eclipse.orion.releng/build/N201105121647/repo/binary/org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Text">PK#3;#4;#20;#0;#8;#0;#8;#0;= ..... <binary content>


This file "org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1" is a binary zip archive.
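The parser rejects the document because XML 1.0 only allows certain code points in element content (the spec's "Char" production), and U+FFFE falls outside it; raw bytes from a zip archive can easily map to such characters. The following is a minimal sketch, assuming an indexer wanted to detect or drop such characters before handing text to Solr. `XmlCharFilter`, `firstInvalidCodePoint` and `stripInvalidXmlChars` are hypothetical names, not Orion or Solr API.

```java
// Sketch only: XmlCharFilter and its methods are hypothetical helpers, not Orion or Solr API.
class XmlCharFilter {
    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the first code point that XML 1.0 forbids, or -1 if there is none. */
    static int firstInvalidCodePoint(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return cp;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        return -1;
    }

    /** Returns a copy of s with every code point that XML 1.0 forbids removed. */
    static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(XmlCharFilter::isValidXmlChar).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "abc\uFFFEdef"; // contains U+FFFE, the character named in the exception
        System.out.printf("first invalid: 0x%x%n", firstInvalidCodePoint(text)); // first invalid: 0xfffe
        System.out.println(stripInvalidXmlChars(text)); // abcdef
    }
}
```

Stripping would only make the add request parse; the archive would still be indexed as meaningless text, so for binary files skipping them entirely is the more sensible reaction to a non-negative `firstInvalidCodePoint` result.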
Comment 4 John Arthorne CLA 2011-05-16 10:28:10 EDT
I should add "no file extension" to the list of file extensions excluded by the indexer. Our editor can't open files with no extension anyway.
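The exclusion idea can be sketched as below, assuming the extension is simply the text after the last '.' and that an empty string stands for "no extension". The class name and the contents of `EXCLUDED` are hypothetical; the indexer's real exclusion list is not shown in this bug.

```java
import java.util.Set;

// Sketch only: ExtensionFilter and EXCLUDED are hypothetical, not the Orion indexer's code.
class ExtensionFilter {
    // "" represents a file name with no extension at all.
    static final Set<String> EXCLUDED = Set.of("", "class", "jar", "zip", "png", "gif", "jpg");

    /** Returns the text after the last '.', or "" when the name contains no dot. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    static boolean shouldIndex(String fileName) {
        return !EXCLUDED.contains(extensionOf(fileName));
    }

    public static void main(String[] args) {
        String name = "org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1";
        System.out.println(extensionOf(name));     // v20110505-7P7NFUFFLWUl76mam1
        System.out.println(shouldIndex(name));     // true
        System.out.println(shouldIndex("README")); // false
    }
}
```

Under this "last dot" definition the zip archive from comment 3 does have an extension, so a "no extension" rule on its own would not have skipped it.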
Comment 6 Andrew Niefer CLA 2011-05-16 15:10:28 EDT
(In reply to comment #4)
> I should add "no file extension" to the list of file extensions excluded by the
> indexer. Our editor can't open files with no extension anyway.

Does this file really have no extension? Or does it have the extension "v20110505-7P7NFUFFLWUl76mam1"?
Comment 7 John Arthorne CLA 2011-05-16 15:18:40 EDT
(In reply to comment #6)
> Does this file really have no extension? Or does it have the extension
> "v20110505-7P7NFUFFLWUl76mam1"?

True. I'm inclined to leave it, though. We log an exception if the indexer fails to index something, but there are no other side effects (we just skip that single file and go on to the next one). In the long run we need some kind of content-type analysis to auto-detect binary vs. text.
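One common content-sniffing heuristic, shown here only as a minimal sketch and not as what Orion implemented, is to treat a file as binary if its first few kilobytes contain a NUL byte. `BinarySniffer` and the 8000-byte `SAMPLE_SIZE` are hypothetical choices. The zip header visible in comment 3 (PK#3;#4;#20;#0;...) already contains a NUL within its first six bytes, so this check would have skipped that file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: a NUL-byte heuristic; BinarySniffer and SAMPLE_SIZE are hypothetical.
class BinarySniffer {
    static final int SAMPLE_SIZE = 8000; // number of leading bytes to inspect

    /** Treats the sample as binary if any of its first length bytes is NUL. */
    static boolean looksBinary(byte[] sample, int length) {
        for (int i = 0; i < length; i++) {
            if (sample[i] == 0) {
                return true;
            }
        }
        return false;
    }

    static boolean looksBinary(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[SAMPLE_SIZE];
            int n = in.readNBytes(buf, 0, buf.length); // returns 0, not -1, for an empty file
            return looksBinary(buf, n);
        }
    }

    public static void main(String[] args) throws IOException {
        Path text = Files.createTempFile("sniff", ".txt");
        Files.writeString(text, "plain text\n");
        // First bytes of the zip local file header, as seen in the debugger output.
        Path zip = Files.createTempFile("sniff", "");
        Files.write(zip, new byte[] {'P', 'K', 3, 4, 20, 0, 8, 0, 8, 0});
        System.out.println(looksBinary(text)); // false
        System.out.println(looksBinary(zip));  // true
        Files.delete(text);
        Files.delete(zip);
    }
}
```

The heuristic is cheap but imperfect: UTF-16 encoded text contains NUL bytes and would be misclassified as binary, so a real implementation might combine it with encoding detection or an explicit content-type registry.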