If Gyrex is shut down, the running jobs get canceled, but their job status stays stuck at the last status. This blocks further executions of a scheduled job after a restart or after the scheduler is enabled again. Furthermore, saving the job status in two places can lead to inconsistent information:
- The job status is kept in the context tree on the job itself and also on a node for each status, such as /gyrex/prefs/cloud/org.eclipse.gyrex.jobs/status/WAITING.
- If these two places get out of sync for whatever reason, it is hard to operate the system correctly.
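The out-of-sync scenario described in this report can be illustrated with a small, purely hypothetical in-memory model. This is a sketch only and does not use the Gyrex or preference APIs: the class `DuplicatedStatusSketch`, its fields, and the `RUNNING` status name are assumptions made for illustration (only `WAITING` appears in the report). It shows how a status kept in two places diverges if the process stops between the two writes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the duplicated status; not the Gyrex API. */
final class DuplicatedStatusSketch {
    // Location 1: the status stored on the job itself.
    final Map<String, String> statusByJob = new HashMap<>();
    // Location 2: one node per status, listing the jobs currently in that status.
    final Map<String, Set<String>> jobsByStatus = new HashMap<>();

    /**
     * Writes the status to both locations. When stopBetweenWrites is true,
     * the method returns after the first write, simulating a shutdown
     * that happens before the per-status node is updated.
     */
    void setStatus(String jobId, String status, boolean stopBetweenWrites) {
        String old = statusByJob.put(jobId, status);
        if (stopBetweenWrites) {
            return; // second write never happens
        }
        if (old != null) {
            Set<String> previous = jobsByStatus.get(old);
            if (previous != null) {
                previous.remove(jobId);
            }
        }
        jobsByStatus.computeIfAbsent(status, k -> new HashSet<>()).add(jobId);
    }

    /** Returns true only if both locations agree for every job. */
    boolean isConsistent() {
        for (Map.Entry<String, String> e : statusByJob.entrySet()) {
            Set<String> jobs = jobsByStatus.get(e.getValue());
            if (jobs == null || !jobs.contains(e.getKey())) {
                return false;
            }
        }
        for (Map.Entry<String, Set<String>> e : jobsByStatus.entrySet()) {
            for (String jobId : e.getValue()) {
                if (!e.getKey().equals(statusByJob.get(jobId))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In this model, a job that moves from `WAITING` to the assumed `RUNNING` status with an interrupted second write is still listed under the `WAITING` node, so the two locations disagree and `isConsistent()` returns false.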
A simple "hung" detection has been implemented in bug 356799. The remaining issue is indeed the duplication of status. The duplication was introduced for performance reasons because there is no efficient way to query for a specific status in the preference tree.
Fixed. The state node has been removed altogether. This makes lookups by state more expensive, but it avoids the duplication. We may reconsider when lookups actually become a bottleneck. Additionally, the hung detection has been further improved to also discover jobs hung in the WAITING state which are no longer in a queue but have already been scheduled on a worker.
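The trade-off described in this comment can be sketched as follows. This is a hypothetical in-memory model, not the actual Gyrex implementation: the class `SingleSourceStatusSketch`, the state names `NONE` and `RUNNING`, and the reset-to-`NONE` behavior are assumptions. With the state stored only on the job, a lookup by state becomes a linear scan, and a job that is `WAITING` but absent from every queue is treated as hung.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model with a single source of truth; not the Gyrex API. */
final class SingleSourceStatusSketch {
    /** Only WAITING appears in the report; NONE and RUNNING are assumed names. */
    enum State { NONE, WAITING, RUNNING }

    /** The only place where each job's state is stored. */
    final Map<String, State> stateByJob = new LinkedHashMap<>();

    /** Lookup by state without a per-state node: a linear scan over all jobs. */
    List<String> findByState(State wanted) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, State> e : stateByJob.entrySet()) {
            if (e.getValue() == wanted) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Assumed heuristic: a WAITING job that is no longer in any queue is hung. */
    List<String> findHungWaiting(Set<String> queuedJobIds) {
        List<String> hung = new ArrayList<>();
        for (String jobId : findByState(State.WAITING)) {
            if (!queuedJobIds.contains(jobId)) {
                hung.add(jobId);
            }
        }
        return hung;
    }

    /** Resets hung jobs to NONE so they can be scheduled again (assumed behavior). */
    void resetHung(Set<String> queuedJobIds) {
        for (String jobId : findHungWaiting(queuedJobIds)) {
            stateByJob.put(jobId, State.NONE);
        }
    }
}
```

The scan costs time proportional to the number of jobs, which is the extra lookup expense the comment accepts in exchange for having no second copy of the state to keep in sync.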