Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 315923

Summary: ConcurrentModificationException when shutting down PE resource manager
Product: [Tools] PTP Reporter: Greg Watson <g.watson>
Component: RM.PEAssignee: Randy Roberts <rsqrd>
Status: CLOSED WONTFIX QA Contact:
Severity: critical    
Priority: P1 CC: rsqrd
Version: unspecified   
Target Milestone: ---   
Hardware: Macintosh   
OS: Mac OS X   
Whiteboard:

Description Greg Watson CLA 2010-06-07 00:35:01 EDT
On the latest build I got the following error when shutting down the PE resource manager. I don't think this is PE specific however.

java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
	at java.util.HashMap$ValueIterator.next(HashMap.java:822)
	at org.eclipse.ptp.internal.core.elements.PJob.removeProcessesByJobRanks(PJob.java:381)
	at org.eclipse.ptp.internal.core.elements.PQueue.removeJobs(PQueue.java:181)
	at org.eclipse.ptp.rmsystem.AbstractResourceManager.removeQueues(AbstractResourceManager.java:1145)
	at org.eclipse.ptp.rmsystem.AbstractResourceManager.cleanUp(AbstractResourceManager.java:815)
	at org.eclipse.ptp.rmsystem.AbstractResourceManager.shutdown(AbstractResourceManager.java:580)
	at org.eclipse.ptp.ui.actions.StopResourceManagersObjectActionDelegate.run(StopResourceManagersObjectActionDelegate.java:60)
	at org.eclipse.ptp.ui.actions.AbstractResourceManagerSelectionActionDelegate.runWithEvent(AbstractResourceManagerSelectionActionDelegate.java:51)
	at org.eclipse.ui.internal.PluginAction.runWithEvent(PluginAction.java:241)
	at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:584)
	at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:501)
	at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:411)
	at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
	at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3764)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1343)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1366)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1351)
	at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1163)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3610)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3265)
	at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2624)
	at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2588)
	at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2422)
	at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:670)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:663)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
	at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:115)
	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:619)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:574)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1407)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1383)
Comment 1 Randy Roberts CLA 2010-06-07 12:23:42 EDT
Is this reliably reproducible?

I can try to fix this, but with ConcurentModificationException's it
can be hit or miss.
Comment 2 Randy Roberts CLA 2010-06-07 12:26:12 EDT
(In reply to comment #1)
> Is this reliably reproducible?
> 
> I can try to fix this, but with ConcurentModificationException's it
> can be hit or miss.

I cannot reproduce with OpenMPI RM.
Comment 3 Greg Watson CLA 2010-06-07 12:47:41 EDT
I only saw it once.
Comment 4 Greg Watson CLA 2010-06-07 12:51:03 EDT
The only way I can see this occurring is if getAttributeIndexSet is called during removeProcessesByJobRanks
Comment 5 Greg Watson CLA 2010-06-07 12:52:04 EDT
So you probably need to synchronize processAttributesMap
Comment 6 Randy Roberts CLA 2010-06-07 12:55:25 EDT
(In reply to comment #5)
> So you probably need to synchronize processAttributesMap

I've synchronized the following PJob methods
  getAttributeIndexSet(D)
  getProcessAttributeKeys(BitSet)
  getProcessAttributes(BitSet)
  getProcessAttributes(String, BitSet)
  removeProcessesByJobRanks(BitSet)

I've committed these.
Comment 7 Greg Watson CLA 2010-06-07 13:00:55 EDT
Thanks. Let's hope that fixes the problem. I'll reopen if I see the problem again.
Comment 8 Greg Watson CLA 2010-08-19 15:25:45 EDT
Unfortunately this now causes a deadlock in the model when the debugger launches. Currently launching more than a one process debug job deadlocks the UI in HEAD.

The problem is that the resource manager calls PJob#addProcessesByJobRanks to add new processes to the job. This in turn calls PJob#fireNewProcesses to indicate the new processes have been created. The debug view has registered a handler on the job to get notified of process changes and is called when this happens. The handler in turn calls getAttribute() to find the status of the process (in order to display the correct icon). This calls PProcess#getAttribute which calls PJob#getProcessAttribute. Since both PJob#addProcessesByJobRanks and PJob#getProcessAttribute are synchronized, we get a deadlock.

The synchronization used on PJob (and the other model element classes) needs to be checked to ensure that this kind of deadlock can't occur. I'd say you need to use finer grain synchronization than the whole method, particularly on methods that invoke handlers.
Comment 9 Randy Roberts CLA 2010-08-19 15:31:26 EDT
(In reply to comment #8)

Do you think I can just exclude the firing of the handlers
from the synchronization?

> Unfortunately this now causes a deadlock in the model when the debugger
> launches. Currently launching more than a one process debug job deadlocks the
> UI in HEAD.
> 
> The problem is that the resource manager calls PJob#addProcessesByJobRanks to
> add new processes to the job. This in turn calls PJob#fireNewProcesses to
> indicate the new processes have been created. The debug view has registered a
> handler on the job to get notified of process changes and is called when this
> happens. The handler in turn calls getAttribute() to find the status of the
> process (in order to display the correct icon). This calls
> PProcess#getAttribute which calls PJob#getProcessAttribute. Since both
> PJob#addProcessesByJobRanks and PJob#getProcessAttribute are synchronized, we
> get a deadlock.
> 
> The synchronization used on PJob (and the other model element classes) needs to
> be checked to ensure that this kind of deadlock can't occur. I'd say you need
> to use finer grain synchronization than the whole method, particularly on
> methods that invoke handlers.
Comment 10 Randy Roberts CLA 2010-08-19 15:40:57 EDT
(In reply to comment #9)
If so I will just unsynchronize
  PJob#addProcessAttributes(BitSet processIds, AttributeManager attributes)
  PJob#removeProcessesByJobRanks(BitSet processJobRanks)
and
  PJob#addProcessesByJobRanks(BitSet newProcessJobRanks, AttributeManager attrs)

and do the main stuff in a 

   synchronized (this) {
      block
   }

followed by an unsynchronized call
to the respective fireXXXProcesses(...) call.

> (In reply to comment #8)
> 
> Do you think I can just exclude the firing of the handlers
> from the synchronization?
>
Comment 11 Greg Watson CLA 2010-08-19 16:12:12 EDT
I would have thought that you'd only want to synchronize access to the data structures, so calling the handlers outside the synchronization is definitely a good idea. It seems to have fixed the problem anyway.
Comment 12 Greg Watson CLA 2012-10-26 09:59:19 EDT
Superceded by new RM framework.