| Summary: | While long crawling massive temporary files | ||
|---|---|---|---|
| Product: | z_Archived | Reporter: | nils.thieme |
| Component: | Smila | Assignee: | Andreas Weber <Andreas.Weber> |
| Status: | CLOSED FIXED | QA Contact: | |
| Severity: | enhancement | ||
| Priority: | P3 | CC: | daniel.stucky, marco.strack |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Whiteboard: | |||
Description
nils.thieme
Yes, this is an important issue. We will address it in the next release when we redesign the connectivity concept.

Connectivity has been replaced by the new Importing framework, but we still have problems with massive temporary files during import. So I am leaving this issue open and have just changed its component setting.

We have the following problem: "run once" jobs (e.g. crawl jobs) have only one workflow run. Temp objects in the job management are not removed until the whole workflow run has completed, so they are not removed after a successful task. The reason is that an input object could be shared between several workers in the workflow.

The idea: we try to identify whether a workflow has workers (or rather, actions) that share the same input bucket. If that is not the case, we call the workflow "non-forking". For non-forking workflows, we change the clean-up of temp objects in the job management: after each successful task, the worker's input object can be removed, because no other worker will work on the same object. Typically, crawl workflows are non-forking.

The above is implemented now: (non-persistent) input objects of workers in non-forking workflows are removed after the worker has successfully completed its task. In most cases (especially typical crawl workflows) this should be sufficient to avoid the massive accumulation of temp objects in the objectstore. However, it could be improved further by (a) checking the non-forking condition not for the whole workflow but more granularly for each workflow bucket, and (b) implementing clean-up logic for forking buckets too (i.e. clean up once all workers using that input bucket have successfully finished their task).

Added SMILA documentation: http://wiki.eclipse.org/SMILA/Documentation/WorkerAndWorkflows#Non-forking_workflows

Closed.
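Below is a minimal Python sketch of the non-forking check and the per-task clean-up rule. It rests on stated assumptions. `Action`, `is_non_forking`, `TempObjectStore`, `on_task_success`, and all worker, bucket, and object names are hypothetical. They do not reflect SMILA's actual job management code. The sketch covers only the workflow-level check that was implemented. It does not cover the per-bucket refinements (a) and (b).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Action:
    """One worker step of a workflow (hypothetical model, not SMILA's API)."""
    worker: str
    input_buckets: list[str] = field(default_factory=list)


def is_non_forking(actions: list[Action]) -> bool:
    """A workflow is non-forking if no input bucket is read by two or more actions."""
    readers = Counter(bucket for a in actions for bucket in set(a.input_buckets))
    return all(count == 1 for count in readers.values())


@dataclass
class TempObjectStore:
    """Hypothetical stand-in for the objectstore that holds temp objects."""
    objects: set[str] = field(default_factory=set)

    def on_task_success(self, inputs: list[str], persistent: set[str],
                        non_forking: bool) -> None:
        # Only in a non-forking workflow is it safe to delete right away:
        # no other worker will ever read these input objects.
        if not non_forking:
            return  # forking: leave clean-up to the end of the workflow run
        for obj in inputs:
            if obj not in persistent:  # persistent objects are never deleted here
                self.objects.discard(obj)


# Usage with made-up worker, bucket, and object names:
crawl = [
    Action("crawler"),                    # start action, no input bucket
    Action("fetcher", ["linksToFetch"]),
    Action("pusher", ["fetchedRecords"]),
]
forking = [Action("indexer", ["records"]), Action("archiver", ["records"])]

print(is_non_forking(crawl))    # True
print(is_non_forking(forking))  # False

store = TempObjectStore({"links-0001", "records-0001", "seed-config"})
store.on_task_success(["links-0001", "seed-config"], persistent={"seed-config"},
                      non_forking=is_non_forking(crawl))
print(sorted(store.objects))    # ['records-0001', 'seed-config']
```

Checking the whole workflow is coarse. A single shared bucket anywhere disables early clean-up for every action, which is exactly what refinement (a) would address.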