Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 315923 - ConcurrentModificationException when shutting down PE resource manager
Summary: ConcurrentModificationException when shutting down PE resource manager
Status: CLOSED WONTFIX
Alias: None
Product: PTP
Classification: Tools
Component: RM.PE (show other bugs)
Version: unspecified   Edit
Hardware: Macintosh Mac OS X
: P1 critical (vote)
Target Milestone: ---   Edit
Assignee: Randy Roberts CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-07 00:35 EDT by Greg Watson CLA
Modified: 2012-10-26 09:59 EDT (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Greg Watson CLA 2010-06-07 00:35:01 EDT
On the latest build I got the following error when shutting down the PE resource manager. I don't think this is PE specific however.

java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
	at java.util.HashMap$ValueIterator.next(HashMap.java:822)
	at org.eclipse.ptp.internal.core.elements.PJob.removeProcessesByJobRanks(PJob.java:381)
	at org.eclipse.ptp.internal.core.elements.PQueue.removeJobs(PQueue.java:181)
	at org.eclipse.ptp.rmsystem.AbstractResourceManager.removeQueues(AbstractResourceManager.java:1145)
	at org.eclipse.ptp.rmsystem.AbstractResourceManager.cleanUp(AbstractResourceManager.java:815)
	at org.eclipse.ptp.rmsystem.AbstractResourceManager.shutdown(AbstractResourceManager.java:580)
	at org.eclipse.ptp.ui.actions.StopResourceManagersObjectActionDelegate.run(StopResourceManagersObjectActionDelegate.java:60)
	at org.eclipse.ptp.ui.actions.AbstractResourceManagerSelectionActionDelegate.runWithEvent(AbstractResourceManagerSelectionActionDelegate.java:51)
	at org.eclipse.ui.internal.PluginAction.runWithEvent(PluginAction.java:241)
	at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:584)
	at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:501)
	at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:411)
	at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
	at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3764)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1343)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1366)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1351)
	at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1163)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3610)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3265)
	at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2624)
	at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2588)
	at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2422)
	at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:670)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:663)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
	at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:115)
	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:619)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:574)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1407)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1383)
Comment 1 Randy Roberts CLA 2010-06-07 12:23:42 EDT
Is this reliably reproducible?

I can try to fix this, but with ConcurentModificationException's it
can be hit or miss.
Comment 2 Randy Roberts CLA 2010-06-07 12:26:12 EDT
(In reply to comment #1)
> Is this reliably reproducible?
> 
> I can try to fix this, but with ConcurentModificationException's it
> can be hit or miss.

I cannot reproduce with OpenMPI RM.
Comment 3 Greg Watson CLA 2010-06-07 12:47:41 EDT
I only saw it once.
Comment 4 Greg Watson CLA 2010-06-07 12:51:03 EDT
The only way I can see this occurring is if getAttributeIndexSet is called during removeProcessesByJobRanks
Comment 5 Greg Watson CLA 2010-06-07 12:52:04 EDT
So you probably need to synchronize processAttributesMap
Comment 6 Randy Roberts CLA 2010-06-07 12:55:25 EDT
(In reply to comment #5)
> So you probably need to synchronize processAttributesMap

I've synchronized the following PJob methods
  getAttributeIndexSet(D)
  getProcessAttributeKeys(BitSet)
  getProcessAttributes(BitSet)
  getProcessAttributes(String, BitSet)
  removeProcessesByJobRanks(BitSet)

I've committed these.
Comment 7 Greg Watson CLA 2010-06-07 13:00:55 EDT
Thanks. Let's hope that fixes the problem. I'll reopen if I see the problem again.
Comment 8 Greg Watson CLA 2010-08-19 15:25:45 EDT
Unfortunately this now causes a deadlock in the model when the debugger launches. Currently launching more than a one process debug job deadlocks the UI in HEAD.

The problem is that the resource manager calls PJob#addProcessesByJobRanks to add new processes to the job. This in turn calls PJob#fireNewProcesses to indicate the new processes have been created. The debug view has registered a handler on the job to get notified of process changes and is called when this happens. The handler in turn calls getAttribute() to find the status of the process (in order to display the correct icon). This calls PProcess#getAttribute which calls PJob#getProcessAttribute. Since both PJob#addProcessesByJobRanks and PJob#getProcessAttribute are synchronized, we get a deadlock.

The synchronization used on PJob (and the other model element classes) needs to be checked to ensure that this kind of deadlock can't occur. I'd say you need to use finer grain synchronization than the whole method, particularly on methods that invoke handlers.
Comment 9 Randy Roberts CLA 2010-08-19 15:31:26 EDT
(In reply to comment #8)

Do you think I can just exclude the firing of the handlers
from the synchronization?

> Unfortunately this now causes a deadlock in the model when the debugger
> launches. Currently launching more than a one process debug job deadlocks the
> UI in HEAD.
> 
> The problem is that the resource manager calls PJob#addProcessesByJobRanks to
> add new processes to the job. This in turn calls PJob#fireNewProcesses to
> indicate the new processes have been created. The debug view has registered a
> handler on the job to get notified of process changes and is called when this
> happens. The handler in turn calls getAttribute() to find the status of the
> process (in order to display the correct icon). This calls
> PProcess#getAttribute which calls PJob#getProcessAttribute. Since both
> PJob#addProcessesByJobRanks and PJob#getProcessAttribute are synchronized, we
> get a deadlock.
> 
> The synchronization used on PJob (and the other model element classes) needs to
> be checked to ensure that this kind of deadlock can't occur. I'd say you need
> to use finer grain synchronization than the whole method, particularly on
> methods that invoke handlers.
Comment 10 Randy Roberts CLA 2010-08-19 15:40:57 EDT
(In reply to comment #9)
If so I will just unsynchronize
  PJob#addProcessAttributes(BitSet processIds, AttributeManager attributes)
  PJob#removeProcessesByJobRanks(BitSet processJobRanks)
and
  PJob#addProcessesByJobRanks(BitSet newProcessJobRanks, AttributeManager attrs)

and do the main stuff in a 

   synchronized (this) {
      block
   }

followed by an unsynchronized call
to the respective fireXXXProcesses(...) call.

> (In reply to comment #8)
> 
> Do you think I can just exclude the firing of the handlers
> from the synchronization?
>
Comment 11 Greg Watson CLA 2010-08-19 16:12:12 EDT
I would have thought that you'd only want to synchronize access to the data structures, so calling the handlers outside the synchronization is definitely a good idea. It seems to have fixed the problem anyway.
Comment 12 Greg Watson CLA 2012-10-26 09:59:19 EDT
Superceded by new RM framework.