
Bug 357802

Summary: Web Crawler: depends on fixed "Url" attribute mapping (NullPointerException)
Product: z_Archived
Component: Smila
Status: CLOSED FIXED
Severity: normal
Priority: P3
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Windows 7
Whiteboard:
Reporter: Nadine Ausländer <nadine.auslaender>
Assignee: Juergen Schumacher <juergen.schumacher>
QA Contact:
CC: igor.novakovic

Description Nadine Ausländer CLA 2011-09-15 09:47:07 EDT
The web crawler throws a NullPointerException when the attribute mapping in the data source configuration does not contain the fixed mapping of attribute "Url" to field attribute "Url".

To reproduce, I changed the mapping in the default data source "web.xml" (see "configuration\org.eclipse.smila.connectivity.framework")

from:

<DataSourceConnectionConfig ...>
  <DataSourceID>web</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.crawler.web</SchemaID>
  <Attributes>
    <Attribute Type="String" Name="Url" KeyAttribute="true">
      <FieldAttribute>Url</FieldAttribute>
    </Attribute>
    ...
  </Attributes>
</DataSourceConnectionConfig>


to:

<DataSourceConnectionConfig ...>
  <DataSourceID>web</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.crawler.web</SchemaID>
  <Attributes>
    <Attribute Type="String" Name="MyUrl" KeyAttribute="true">
      <FieldAttribute>Url</FieldAttribute>
    </Attribute>
    ...
  </Attributes>
</DataSourceConnectionConfig>


Error message is:

errorBuffer: "[--- 2011-09-15 15:34:11.587 ---
org.eclipse.smila.connectivity.framework.CrawlerException: java.lang.NullPointerException
  at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.getMetadata(WebCrawler.java:353)
  at org.eclipse.smila.connectivity.framework.util.internal.DataReferenceImpl.getRecord(DataReferenceImpl.java:126)
  at org.eclipse.smila.connectivity.framework.impl.CrawlThread.updateDataReference(CrawlThread.java:389)
  at org.eclipse.smila.connectivity.framework.impl.CrawlThread.processDataReference(CrawlThread.java:342)
  at org.eclipse.smila.connectivity.framework.impl.CrawlThread.processDataReferences(CrawlThread.java:308)
  at org.eclipse.smila.connectivity.framework.impl.CrawlThread.run(CrawlThread.java:194)
Caused by: java.lang.NullPointerException
  at org.apache.commons.codec.digest.DigestUtils.md5(DigestUtils.java:86)
  at org.apache.commons.codec.digest.DigestUtils.md5Hex(DigestUtils.java:108)
  at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.getRecord(WebCrawler.java:575)
  at org.eclipse.smila.connectivity.framework.crawler.web.WebCrawler.getMetadata(WebCrawler.java:351)
  ... 5 more
, --- 2011-09-15 15:34:12.041 ---
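
The trace shows a null value reaching DigestUtils.md5Hex from WebCrawler.getRecord. The following is a minimal Java sketch of how that could happen, under the assumption (not confirmed against the SMILA source) that the crawler looks up the URL value under the hard-coded attribute name "Url". The class and method names are hypothetical, and java.security.MessageDigest stands in for commons-codec's DigestUtils.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the actual WebCrawler code.
public class FixedNameLookupSketch {

    // Returns the MD5 hash of the value as lowercase hex.
    // A null value causes a NullPointerException at value.getBytes(...).
    public static String md5Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed faulty behaviour: the value is always read under the fixed name "Url".
    public static String idByFixedName(Map<String, String> valuesByAttributeName) {
        return md5Hex(valuesByAttributeName.get("Url"));
    }

    // Alternative: read the value under whichever attribute name the config maps to the Url field.
    public static String idByMappedName(Map<String, String> valuesByAttributeName, String mappedName) {
        String url = valuesByAttributeName.get(mappedName);
        if (url == null) {
            throw new IllegalStateException("No value for attribute '" + mappedName + "'");
        }
        return md5Hex(url);
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("MyUrl", "http://example.org/"); // attribute renamed as in this report
        System.out.println(idByMappedName(values, "MyUrl"));
        try {
            idByFixedName(values);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the fixed-name lookup");
        }
    }
}
```

If the real cause matches this assumption, resolving the attribute name from the configured mapping (or failing with a descriptive message) would avoid the bare NullPointerException.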

I used the following config as a workaround:

<DataSourceConnectionConfig ...>
  <DataSourceID>web</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.crawler.web</SchemaID>
  <Attributes>
    <Attribute Type="String" Name="Url" KeyAttribute="true">
      <FieldAttribute>Url</FieldAttribute>
    </Attribute>
    <Attribute Type="String" Name="MyUrl" KeyAttribute="true">
      <FieldAttribute>Url</FieldAttribute>
    </Attribute>
    ...
  </Attributes>
</DataSourceConnectionConfig>
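
On builds without the fix, a data source config can be checked up front for the mapping the crawler expects. This is a rough Java sketch that relies only on the element and attribute names visible in the snippets above (Attribute, Name, FieldAttribute). The class and method names are hypothetical, and real config files may contain additional attributes or namespaces that this check ignores.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of SMILA.
public class UrlMappingCheck {

    // Returns true if some <Attribute Name="Url"> contains a <FieldAttribute> with text "Url".
    public static boolean hasFixedUrlMapping(String configXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Config is not well-formed XML", e);
        }
        NodeList attributes = doc.getElementsByTagName("Attribute");
        for (int i = 0; i < attributes.getLength(); i++) {
            Element attribute = (Element) attributes.item(i);
            if (!"Url".equals(attribute.getAttribute("Name"))) {
                continue; // only the attribute literally named "Url" counts
            }
            NodeList fields = attribute.getElementsByTagName("FieldAttribute");
            for (int j = 0; j < fields.getLength(); j++) {
                if ("Url".equals(fields.item(j).getTextContent().trim())) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String renamed =
            "<DataSourceConnectionConfig><Attributes>"
            + "<Attribute Type=\"String\" Name=\"MyUrl\" KeyAttribute=\"true\">"
            + "<FieldAttribute>Url</FieldAttribute></Attribute>"
            + "</Attributes></DataSourceConnectionConfig>";
        System.out.println(hasFixedUrlMapping(renamed));
    }
}
```

The check returns false for the renamed mapping from this report and true for the workaround config, because the latter still contains an attribute named "Url".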
Comment 1 Juergen Schumacher CLA 2011-09-21 07:20:52 EDT
Fixed in rev. 1683.
Comment 2 Andreas Weber CLA 2013-04-15 11:50:07 EDT
Closing this