The cleanup of old jobs currently happens outside the job lock. The assumption was that a lock is not necessary here. However, in a distributed environment another node may modify the state of a job while the cleanup is running. The other node then gets an exception because the history node it tries to flush no longer exists.

Example stack trace:

..e = NoNode for /gyrex/prefs/cloud/..jobs/jobs/myjobid/history/1320761969322
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.createBackingStoreException(ZooKeeperBasedPreferences.java:262)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.flush(ZooKeeperBasedPreferences.java:481)
    at ..jobs.internal.manager.JobManagerImpl.setJobState(JobManagerImpl.java:718)
    at ..jobs.internal.manager.JobManagerImpl.queueJob(JobManagerImpl.java:455)
    at ..jobs.internal.scheduler.SchedulingJob.execute(SchedulingJob.java:131)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
Caused by: BackingStoreException: Error flushing node (node /cloud/..jobs/jobs/9myjobid/history). Error flushing node (node /cloud/..jobs/jobs/myjobid/history/1320761969322). KeeperErrorCode = NoNode for /gyrex/prefs/cloud/..jobs/jobs/myjobid/history/1320761969322
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.createBackingStoreException(ZooKeeperBasedPreferences.java:262)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.flush(ZooKeeperBasedPreferences.java:481)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.saveChildren(ZooKeeperBasedPreferences.java:1191)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.flush(ZooKeeperBasedPreferences.java:478)
    at ..jobs.internal.manager.JobManagerImpl.setJobState(JobManagerImpl.java:718)
    at ..jobs.internal.manager.JobManagerImpl.queueJob(JobManagerImpl.java:455)
Caused by: BackingStoreException: Error flushing node (node /cloud/..jobs/jobs/9myjobid/history/1320761969322). KeeperErrorCode = NoNode for /gyrex/prefs/cloud/..jobs/jobs/myjobid/history/1320761969322
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.createBackingStoreException(ZooKeeperBasedPreferences.java:262)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.flush(ZooKeeperBasedPreferences.java:481)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.saveChildren(ZooKeeperBasedPreferences.java:1191)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.flush(ZooKeeperBasedPreferences.java:478)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.saveChildren(ZooKeeperBasedPreferences.java:1191)
    at ..cloud.internal.preferences.ZooKeeperBasedPreferences.flush(ZooKeeperBasedPreferences.java:478)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /gyrex/prefs/cloud/..jobs/jobs/myjobid/history/1320761969322
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
    at ..cloud.internal.preferences.ZooKeeperPreferencesService$WriteProperties.call(ZooKeeperPreferencesService.java:647)
    at ..cloud.internal.preferences.ZooKeeperPreferencesService$WriteProperties.call(ZooKeeperPreferencesService.java:1)
    at ..cloud.internal.zk.ZooKeeperBasedService$ZooKeeperCallable.call(ZooKeeperBasedService.java:37)

I added the lock and also moved the history out of the jobs node. This should reduce the conflicts during flush.
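
For illustration only, here is a minimal sketch of the locking idea: cleanup and history writes for the same job go through the same per-job lock, so a writer never touches an entry that is being removed at the same time. This is not the actual change. The class and method names are hypothetical, the history is kept in an in-memory map rather than in ZooKeeper-backed preferences, and a `ReentrantLock` stands in for whatever cluster-wide job lock the real code uses.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the real job manager implementation.
class JobHistoryCleanupSketch {

    // Assumption: one lock per job id. In-process only; a real cluster
    // would need a distributed lock instead.
    private final Map<String, ReentrantLock> jobLocks = new ConcurrentHashMap<>();

    // Assumption: history is stored separately from the job state,
    // keyed by job id, then by timestamp.
    private final Map<String, TreeMap<Long, String>> history = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String jobId) {
        return jobLocks.computeIfAbsent(jobId, id -> new ReentrantLock());
    }

    // Records a history entry while holding the job lock.
    void recordHistory(String jobId, long timestamp, String result) {
        ReentrantLock lock = lockFor(jobId);
        lock.lock();
        try {
            history.computeIfAbsent(jobId, id -> new TreeMap<>()).put(timestamp, result);
        } finally {
            lock.unlock();
        }
    }

    // Removes all entries strictly older than cutoff while holding the
    // same job lock, and returns how many entries were removed.
    int cleanup(String jobId, long cutoff) {
        ReentrantLock lock = lockFor(jobId);
        lock.lock();
        try {
            TreeMap<Long, String> entries = history.get(jobId);
            if (entries == null) {
                return 0;
            }
            SortedMap<Long, String> old = entries.headMap(cutoff);
            int removed = old.size();
            old.clear(); // clearing the view removes the entries from the backing map
            return removed;
        } finally {
            lock.unlock();
        }
    }

    // Returns the number of history entries currently stored for the job.
    int historySize(String jobId) {
        ReentrantLock lock = lockFor(jobId);
        lock.lock();
        try {
            TreeMap<Long, String> entries = history.get(jobId);
            return entries == null ? 0 : entries.size();
        } finally {
            lock.unlock();
        }
    }
}
```

Because both paths take the same lock, a state change either completes before the cleanup starts or sees the history after the cleanup has finished. Keeping the history in its own structure also means that flushing the job state does not have to write history entries along with it.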