Bug 345782 - [server] Periodic SolrException: An invalid XML character (Unicode: 0xfffe) in console
Status: RESOLVED FIXED
Alias: None
Product: Orion
Classification: ECD
Component: Client
Version: 0.2
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 0.2
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-13 16:11 EDT by Andrew Niefer CLA
Modified: 2011-09-01 11:42 EDT
Description Andrew Niefer CLA 2011-05-13 16:11:49 EDT
Running the latest Orion build in the console, I am getting the following periodic error:



!ENTRY org.eclipse.orion.server.core.search 4 0 2011-05-13 16:09:28.672
!MESSAGE Error during search indexing
!STACK 0
org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
	at org.eclipse.orion.internal.server.search.Indexer.indexProject(Indexer.java:141)
	at org.eclipse.orion.internal.server.search.Indexer.run(Indexer.java:201)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
	... 5 more
Caused by: org.apache.solr.common.SolrException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	... 5 more
Caused by: javax.xml.stream.XMLStreamException: An invalid XML character (Unicode: 0xfffe) was found in the element content of the document.
	at com.ibm.xml.xlxp.api.stax.msg.StAXMessageProvider.throwWrappedXMLStreamException(StAXMessageProvider.java:73)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.produceFatalErrorEvent(XMLStreamReaderImpl.java:2103)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.reportFatalError(XMLStreamReaderImpl.java:2109)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.reportFatalError(DocumentEntityScanner.java:479)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.scanContentBuffered(DocumentEntityScanner.java:1897)
	at com.ibm.xml.xlxp.api.util.SimpleScannerHelper.scanContentBuffered(SimpleScannerHelper.java:1233)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.stateBufferedContent(DocumentEntityScanner.java:602)
	at com.ibm.xml.xlxp.scan.DocumentEntityScanner.produceEvent(DocumentEntityScanner.java:650)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.getNextScannerEvent(XMLStreamReaderImpl.java:1645)
	at com.ibm.xml.xlxp.api.stax.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:536)
	at com.ibm.xml.xlxp.api.stax.XMLInputFactoryImpl$XMLStreamReaderProxy.next(XMLInputFactoryImpl.java:180)
	at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:273)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	... 9 more
Comment 1 John Arthorne CLA 2011-05-13 17:01:01 EDT
Is this on a brand new empty workspace, or after you have content in there?
Comment 2 Andrew Niefer CLA 2011-05-13 17:25:41 EDT
This is not happening on a fresh install with clean workspace.  I have not been able to recreate a workspace that produces these errors.

I still have the workspace where the error is occurring. It did have content, but that has somehow all disappeared and my navigator is empty. At one point it contained links on the filesystem to the org.eclipse.orion.client/bundles repository, as well as the contents of requirejs-0.24.0 and the results of a local test build, which would have contained binary files.
Comment 3 Andrew Niefer CLA 2011-05-13 17:40:22 EDT
In a debugger, the stream that I believe Solr is trying to read shows:

req : SolrRequestParsers$1
  -> streams
       -> ContentStreamBase$StringStream
           -> str :
<add><doc boost="1.0"><field name="Id">file:/eclipse/git/web/org.eclipse.orion.server/releng/org.eclipse.orion.releng/build/N201105121647/repo/binary/org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Name">org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Length">120729</field><field name="Directory">false</field><field name="LastModified">1305233511000</field><field name="Location">/file/C/releng/org.eclipse.orion.releng/build/N201105121647/repo/binary/org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1</field><field name="Text">PK#3;#4;#20;#0;#8;#0;#8;#0;= ..... <binary content>


This file "org.eclipse.equinox.executable_root.win32.win32.ia64_3.5.0.v20110505-7P7NFUFFLWUl76mam1" is a binary zip archive.
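
For reference, a minimal sketch of why binary content trips the parser: XML 1.0 only permits the code points in its Char production, and 0xFFFE is not one of them. The class and method names below (XmlText, isValidXmlChar, firstInvalidXmlChar) are hypothetical helpers for illustration only; they are not part of Orion or Solr.

```java
// Hypothetical helper for illustration; not part of Orion or Solr.
final class XmlText {

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index (in UTF-16 chars) of the first code point that
     * XML 1.0 forbids, or -1 if the whole string is acceptable.
     */
    static int firstInvalidXmlChar(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A check like this could be run on file content before it is handed to the indexer, so that a file is skipped quietly rather than failing inside the XML loader. Whether that is preferable to the logged exception is a separate question.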
Comment 4 John Arthorne CLA 2011-05-16 10:28:10 EDT
I should add "no file extension" to the list of file extensions excluded by the indexer. Our editor can't open files with no extension anyway.
Comment 6 Andrew Niefer CLA 2011-05-16 15:10:28 EDT
(In reply to comment #4)
> I should add "no file extension" to the list of file extensions excluded by the
> indexer. Our editor can't open files with no extension anyway.

Is this file really no extension?  Or does it have extension "v20110505-7P7NFUFFLWUl76mam1" ?
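
To make the question concrete, here is a small sketch that assumes the extension is taken to be the text after the last '.' in the file name. That rule is an assumption for illustration; the indexer's actual exclusion logic is not shown in this bug. FileNames and extensionOf are hypothetical names.

```java
// Hypothetical helper for illustration; the indexer's real logic may differ.
final class FileNames {

    /** Returns the text after the last '.', or "" if the name has no dot or ends with one. */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }
}
```

Under that rule, the file name above yields the extension "v20110505-7P7NFUFFLWUl76mam1", so a "no file extension" exclusion would not catch it.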
Comment 7 John Arthorne CLA 2011-05-16 15:18:40 EDT
(In reply to comment #6)
> Is this file really no extension?  Or does it have extension
> "v20110505-7P7NFUFFLWUl76mam1" ?

True. I'm inclined to leave it though. We log an exception if the indexer fails to index something, but there are no other side-effects (we just skip that single file and go on to the next one). In the long run we need some kind of content-type analysis to auto-detect binary vs text.
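
As a rough illustration of what binary-vs-text detection could look like, the sketch below uses a common heuristic: treat a file as binary if a NUL byte appears in its first few kilobytes. This is a simple stand-in, not real content-type analysis, and it would misclassify UTF-16 text. BinarySniffer, looksBinary, and the 8192-byte sample size are all hypothetical choices, not anything from Orion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical heuristic for illustration; not Orion's implementation.
final class BinarySniffer {

    // Arbitrary sample size chosen for this sketch.
    private static final int SAMPLE_SIZE = 8192;

    /** Returns true if any of the first 'length' bytes of the sample is NUL. */
    static boolean looksBinary(byte[] sample, int length) {
        for (int i = 0; i < length; i++) {
            if (sample[i] == 0) {
                return true;
            }
        }
        return false;
    }

    /** Reads up to SAMPLE_SIZE bytes from the start of the file and applies the NUL-byte check. */
    static boolean looksBinary(Path file) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            // A single read may return fewer bytes than requested, or -1 for an empty file.
            int n = Math.max(in.read(buf), 0);
            return looksBinary(buf, n);
        }
    }
}
```

The zip content shown in comment 3 begins with the bytes P, K, 3, 4, 20, 0, so this check would flag it within the first six bytes.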