
Bug 325449

Summary: JDBC Crawler: grouping causes "record not in cache" crawler exception
Product: z_Archived
Reporter: arne anka <eclipse-bugs>
Component: Smila
Assignee: Andreas Weber <Andreas.Weber>
Status: CLOSED WONTFIX
QA Contact:
Severity: normal
Priority: P3
CC: igor.novakovic, svoigt.brox
Version: unspecified
Keywords: helpwanted
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard:

Description arne anka CLA 2010-09-16 08:18:43 EDT
Build Identifier: 

I got my JDBC crawler working without grouping, and processing about 11000 entries in one go works.
Enabling grouping for the very same query makes SMILA stop after 3 or 4 groups and print:

 2010-09-16 13:47:08,848 ERROR [Thread-11                                    ]  impl.CrawlThread                              
	- Error while processing record with Id jdbc of dataSourceId 
	 org.eclipse.smila.connectivity.framework.CrawlerException: The requested record with id [src:jdbc|key:<FOO>] was not found in the Crawler's cache
        at org.eclipse.smila.connectivity.framework.crawler.jdbc.JdbcCrawler.getMObject(JdbcCrawler.java:1118)
        at org.eclipse.smila.connectivity.framework.util.internal.DataReferenceImpl.getRecord(DataReferenceImpl.java:100)
        at org.eclipse.smila.connectivity.framework.impl.CrawlThread.processDataReferences(CrawlThread.java:352)
        at org.eclipse.smila.connectivity.framework.impl.CrawlThread.run(CrawlThread.java:235)

The crawler does not exit, but waits forever until I kill it.


Reproducible: Always
Comment 1 arne anka CLA 2010-09-17 04:17:51 EDT
Not sure if it is limited to grouping.
At least, it stops after a while and never resumes (after 15 hrs it was still in "Crawl States... {jdbc=Running}") -- which seems to be caused by waiting at the synchronized block for the lock:

JdbcCrawler.getNext() {
...
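      // the crawl thread seems to hang here, waiting for the _openedMonitor lock: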
      synchronized (_openedMonitor) {
            final List<DataReference> tempList = new ArrayList<DataReference>();
            tempList.add(dataRef);
            final int size = _internalQueue.drainTo(tempList, MAX_QUEUE_SIZE - 1);
            _performanceCounters.incrementBy(POC_DATA_REFS_RETRIEVED_BY_CLIENT, size + 1);
            return tempList.toArray(new DataReference[size + 1]);
      }
}

whereas the "record not in cache" error seems to be caused by deleting the entry just before it is processed:
before each exception, dispose() is called on the very same id, which indicates yet another threading bug.
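
For illustration, a minimal sketch of the kind of race this suggests (class and method names are invented, not the actual SMILA code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (invented names, not the actual SMILA code) of the
// suspected race: dispose() removes the record from the cache while the
// crawl thread still holds a reference to the same id, so the later
// lookup fails with "record not in cache".
public class CacheRaceSketch {
  private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

  void dispose(String id) {
    cache.remove(id); // thread A drops the entry first ...
  }

  Object getRecord(String id) {
    Object record = cache.get(id); // ... then thread B misses it
    if (record == null) {
      throw new IllegalStateException(
          "The requested record with id [" + id + "] was not found in the cache");
    }
    return record;
  }
}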

I set the importance to "blocker", since the JDBC crawler is unusable.
Comment 2 arne anka CLA 2010-10-07 03:43:52 EDT
Feels like I am talking to myself here, even with a severity of "blocker".
SMILA's dead, is it?

Anyway.
New findings:
- It seems to boil down to a database request not coming back. The table I crawl is rather large (1.4+ million rows) and the query has several left joins.
- Due to the usual poor documentation I cannot figure out where to set a reasonable timeout, but nevertheless 16h _should_ trigger a timeout and an exception instead of idly sitting around waiting for kingdom come. (See the sketch after this list.)
- After limiting the threads to 1 (no thanks to that feature not being documented anywhere), limiting the connections to exactly one in code (not sure how much effect any config file has), replacing those, imo, rather inefficient BETWEEN ... AND ... groupings with simple LIMIT ... OFFSET ... paging (also sketched below), and removing all synchronized(...) blocks, the crawler went over 1 million rows in 16h without getting stuck.
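
For reference, a minimal sketch of those two changes using plain JDBC (the connection URL, table and column names are invented; Statement.setQueryTimeout() is standard JDBC, and the LIMIT/OFFSET placeholders assume a MySQL-style dialect):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PagedCrawlSketch {
  public static void main(String[] args) throws Exception {
    try (Connection con =
        DriverManager.getConnection("jdbc:mysql://localhost/db", "user", "pw")) {
      final int pageSize = 1000;
      for (long offset = 0;; offset += pageSize) {
        try (PreparedStatement ps = con.prepareStatement(
            "SELECT id, payload FROM docs ORDER BY id LIMIT ? OFFSET ?")) {
          // Standard JDBC: a query that hangs now fails with an SQLException
          // after 60 seconds instead of blocking the crawl thread forever.
          ps.setQueryTimeout(60);
          ps.setInt(1, pageSize);
          ps.setLong(2, offset);
          int rows = 0;
          try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
              rows++;
              // hand rs.getString("id") / rs.getString("payload") to the crawler
            }
          }
          if (rows < pageSize) {
            break; // last (partial) page reached
          }
        }
      }
    }
  }
}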

Imo, there are severe design flaws in the current JDBC crawler, which is rather complex and not very legible.
Comment 3 Sebastian Voigt CLA 2010-10-25 04:16:18 EDT
Thanks for your detailed bug report.

Smila is not dead :),
but we are also working a lot on customer projects (around Smila), had vacations, and have been really, really busy...

I hope I'll have free resources to take a look at your bug, and we look forward to working with you to improve the JDBC crawler.

If you have solved anything in the meantime or have changed the code (improvements), feel free to report it here. We will check it and commit it.
Comment 4 arne anka CLA 2010-10-25 06:55:47 EDT
Well, I stopped working with SMILA (moved on to other stuff).
But one last remark:
to me, it turned out to be one of the synchronized() clauses, in getNext() I think.
Removing the clause and making the whole method synchronized resolved the lock (though I am not sure to what extent the synchronization is warranted at all, the crawler being limited to a single thread).
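
A sketch of that change, with simplified types (not the actual patch):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The inner synchronized (_openedMonitor) block is dropped and the whole
// method is synchronized on the crawler instance instead.
public class GetNextSketch {
  private static final int MAX_QUEUE_SIZE = 100;
  private final BlockingQueue<String> _internalQueue = new LinkedBlockingQueue<String>();

  public synchronized String[] getNext(String dataRef) {
    final List<String> tempList = new ArrayList<String>();
    tempList.add(dataRef);
    // Drain up to MAX_QUEUE_SIZE - 1 additional references in one call.
    final int size = _internalQueue.drainTo(tempList, MAX_QUEUE_SIZE - 1);
    return tempList.toArray(new String[size + 1]);
  }
}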

When the crawler fails, e.g. because of heap space errors or a disk filling up, many messages are apparently stored somewhere (no clear indication as to where) and are reprocessed when the crawler or SMILA is restarted -- since the backing data is gone, this never succeeds, driving the CPU to over 100% and rapidly filling the log with several hundred "error in ... processMessage" lines. There is a cryptic mention of a setting like "vm://localhost?retryAttempts=2" (from the top of my head, no code available), but it does not work reliably; the only real solution is to remove the whole workspace/ folder created below SMILA.application/ -- which is obviously no viable solution for a production environment.

Not sure to what extent it is related, but I encountered it all in the same setting.
Comment 5 Igor Novakovic CLA 2010-11-30 06:29:29 EST
Daniel, can you please take a look at this?
Comment 6 Daniel Stucky CLA 2010-12-01 12:19:32 EST
I tried to take a look at it today, but this seems to be a major issue. I'm not sure if it's a "simple" synchronization bug or a design bug. This problem requires thorough analysis. I currently have neither the resources (database + lots of data) nor the time to do this myself. It would be great if the initial committer (Michael Breidenband) could take a look at this.
Comment 7 Igor Novakovic CLA 2011-02-07 10:45:09 EST
Hi Thomas,

any chance of you taking a look at this?

Cheers
Igor
Comment 8 thomas menzel CLA 2011-02-07 11:03:13 EST
ATM I don't see this on my roadmap for the near future unless it becomes part of my project. But we can leave it assigned to me, and if someone else in the community needs it and wants to do it themselves, I can be the contact person.
Comment 9 Andreas Weber CLA 2012-10-15 11:42:00 EDT
The implementation doesn't exist anymore. The Connectivity framework was replaced by the new Importing framework. Grouping will be reimplemented.