
Bug 327170

Summary: ConcurrentModificationException in DiscoveredServiceTrackerImpl
Product: [RT] ECF
Reporter: Bryan Hunt <bhunt>
Component: ecf.remoteservices
Assignee: Scott Lewis <slewis>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: bugs.eclipse.org, slewis
Version: 3.3.0
Target Milestone: 3.4.0
Hardware: All
OS: All
Whiteboard:
Attachments (Description / Flags):
YourKit exception stack screenshot / none
YourKit full exception stack screenshot / none
proposed fix / none
mylyn/context/zip / none

Description Bryan Hunt CLA 2010-10-06 23:35:13 EDT
Created attachment 180382 [details]
YourKit exception stack screenshot

I've been trying out YourKit on my server code and happened across a ConcurrentModificationException in DiscoveredServiceTrackerImpl.removeProxyServiceRegistrations().  If the problem is not obvious, I can try to debug further.
Comment 1 Scott Lewis CLA 2010-10-07 00:15:29 EDT
Thanks for the report.  It would be helpful to have the entire stack trace (the screen shot seems to only have a partial trace) if you have it.

And an explanation of the use case would also be helpful (i.e., your understanding of what's happening WRT service undiscovery when this occurs).

Also...does this occur consistently with some scenario?  Or is it non-deterministic?
Comment 2 Bryan Hunt CLA 2010-10-12 10:29:50 EDT
Created attachment 180672 [details]
YourKit full exception stack screenshot
Comment 3 Bryan Hunt CLA 2010-10-12 10:32:53 EDT
Scott, I'm not sure why my services are being undiscovered.  The remote services and consumer are all running on the same host machine.  My only thought right now is that it might be related to the problems I'm having with my IP phone.  I'm going to unplug my workstation from the IP phone and plug it straight into the wall outlet to see if anything changes.

It's not clear if the exception always happens when a service is undiscovered or not.  I'll try to investigate as I have more time.
Comment 4 Scott Lewis CLA 2010-10-12 10:46:06 EDT
(In reply to comment #3)
> Scott, I'm not sure why my services are being undiscovered.  The remote
> services and consumer are all running on the same host machine.  My only
> thought right now is that it might be related to the problems I'm having with
> my IP phone.  I'm going to unplug my workstation from the IP phone and plug it
> straight into the wall outlet to see if anything changes.
> 
> It's not clear if the exception always happens when a service is undiscovered
> or not.  I'll try to investigate as I have more time.

Ok, thanks for the report and full stack trace.  I'll take a look.  Any more info (e.g. about how to reproduce, etc) would be helpful and appreciated.

One question:  is the distribution using ECF generic, r-osgi, or some other provider?
Comment 5 Bryan Hunt CLA 2010-10-12 10:49:01 EDT
I'm using ECF generic for distribution and zookeeper for discovery.
Comment 6 Scott Lewis CLA 2010-10-12 19:48:41 EDT
(In reply to comment #5)
> I'm using ECF generic for distribution and zookeeper for discovery.

I've released to HEAD a fix that I believe should eliminate this concurrent modification exception.  Bryan, I would appreciate testing in your environment to verify the fix, since I cannot currently reproduce the problem in my environment.

This does not/will not address the cause of the notifications from the zookeeper discovery provider...I have not had a chance to look at that and it probably will require examination by the zookeeper provider implementers...Wim and/or Ahmed...but this fix *should* address the concurrent modification exception thrown in the stack trace attached to this bug.

I'll leave the bug unresolved until I hear back WRT the verification.  If at all possible, I would appreciate testing this in your environment before our scheduled 3.4 release (Oct 20).

The only change was to bundle org.eclipse.ecf.osgi.services.distribution, but you might need several of the other bundles (e.g. org.eclipse.ecf.remoteservices, org.eclipse.ecf.provider, and perhaps zookeeper) because of newly added dependencies...depending upon what base version of ECF you are currently working with.  In any case, you can get a build including this fix here https://ecf2.osuosl.org/hudson/job/C-HEAD-sdk.feature/  build 1293 or after.
Comment 7 Bryan Hunt CLA 2010-10-12 23:28:57 EDT
Scott,

The ConcurrentModificationException seems to be fixed.  I also suspect that this fixed a problem I just found with ECF leaking threads.  Thanks.
Comment 8 Scott Lewis CLA 2010-10-12 23:39:09 EDT
(In reply to comment #7)
> Scott,
> 
> The ConcurrentModificationException seems to be fixed.  I also suspect that
> this fixed a problem I just found with ECF leaking threads.  Thanks.

Ok, resolving this bug as fixed then.
Comment 9 Bryan Hunt CLA 2010-10-13 13:50:26 EDT
YourKit captured another ConcurrentModificationException with the same stack trace.  I've set the debugger to trigger a breakpoint and hopefully get additional information.
Comment 10 Scott Lewis CLA 2010-10-13 14:30:39 EDT
(In reply to comment #9)
> YourKit captured another ConcurrentModificationException with the same stack
> trace.  I've set the debugger to trigger a breakpoint and hopefully get
> additional information.

Additional information would be very helpful...e.g. is there more than one thread that's accessing this data structure to cause this to happen?

Given the changes recently released I actually don't understand how a concurrent modification exception could be generated...especially with same stack...but if it's doing so then as much information about this would be appreciated...since I can't currently reproduce.
Comment 11 Bryan Hunt CLA 2010-10-13 15:43:54 EDT
I just hit the breakpoint, and it's on DiscoveredServiceTrackerImpl line 443: ID containerID = (ID) i.next();

Just looking at the code, I see at line 452 that you remove an element from the Map that you are iterating over.  If you aren't removing the last element, wouldn't that cause the CME?
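
For readers following along, here is a minimal sketch of the failure mode described above. It is not the ECF code: the class CmeDemo, the method removeDuringIteration, and the String keys are hypothetical stand-ins for the Map of container IDs. A LinkedHashMap is used only to make the iteration order predictable; HashMap iterators are fail-fast in the same way.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and types are assumptions, not ECF's.
class CmeDemo {

    // Iterates over the key set and removes the matching key through the map
    // itself (not through the iterator). Returns true if that triggered a
    // ConcurrentModificationException.
    static boolean removeDuringIteration(Map<String, String> registrations, String target) {
        try {
            for (Iterator<String> i = registrations.keySet().iterator(); i.hasNext();) {
                String containerID = i.next(); // the fail-fast check happens here
                if (containerID.equals(target)) {
                    // Structural modification the iterator does not know about.
                    registrations.remove(containerID);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    static Map<String, String> sample() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a", "1");
        m.put("b", "2");
        m.put("c", "3");
        return m;
    }

    public static void main(String[] args) {
        // Removing an element that is not the last one visited: the loop
        // calls next() again after the removal.
        System.out.println("remove first: CME=" + removeDuringIteration(sample(), "a"));
        // Removing the last element visited: the loop ends before next()
        // is called again.
        System.out.println("remove last:  CME=" + removeDuringIteration(sample(), "c"));
    }
}
```

Note that only one thread is involved. The exception is raised by the iterator's own modification check, which is why it can appear without any concurrent access.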
Comment 12 Bryan Hunt CLA 2010-10-13 15:50:13 EDT
Maybe there's just a break; missing to get out of the iterator loop?
Comment 13 Scott Lewis CLA 2010-10-13 16:38:15 EDT
(In reply to comment #12)
> Maybe there's just a break; missing to get out of the iterator loop?

Perhaps you are right.  Perhaps I have been misinterpreting the effect that the remove has on the iterator.  I'll have another go at it.
Comment 14 Scott Lewis CLA 2010-10-13 16:49:37 EDT
Created attachment 180828 [details]
proposed fix

this new proposed fix takes the Map.remove(containerID) *out* of the iterator, and so should eliminate the ConcurrentModificationException.

Bryan, if you could test and see how this works I would appreciate it.  Apologies for the issues...I should have realized this was the issue the first time.
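
As an illustration of the general approach (the attached patch itself is not reproduced here), the sketch below shows two common ways to avoid modifying a Map while iterating over it: deferring the Map.remove() until after the loop, and removing through the iterator. The class SafeRemoval and its method names are hypothetical and are not taken from DiscoveredServiceTrackerImpl.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the code from the attached patch.
class SafeRemoval {

    // Option 1: remember the keys to drop during iteration and call
    // Map.remove() only after the loop has finished.
    static <K, V> void removeAfterLoop(Map<K, V> map, K target) {
        List<K> toRemove = new ArrayList<>();
        for (K key : map.keySet()) {
            if (key.equals(target)) {
                toRemove.add(key);
            }
        }
        for (K key : toRemove) {
            map.remove(key);
        }
    }

    // Option 2: remove through the iterator, which keeps its own
    // modification count in sync with the map.
    static <K, V> void removeViaIterator(Map<K, V> map, K target) {
        for (Iterator<K> i = map.keySet().iterator(); i.hasNext();) {
            if (i.next().equals(target)) {
                i.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> registrations = new HashMap<>();
        registrations.put("a", "1");
        registrations.put("b", "2");
        registrations.put("c", "3");

        removeAfterLoop(registrations, "a");
        removeViaIterator(registrations, "b");
        System.out.println(registrations.keySet());
    }
}
```

Adding a break immediately after the remove, as suggested in comment 12, would also avoid the exception when at most one key can match, because the iterator is never advanced again after the modification.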
Comment 15 Bryan Hunt CLA 2010-10-14 10:45:04 EDT
I think this problem is fixed now.
Comment 16 Markus Kuppe CLA 2010-10-14 10:51:32 EDT
+1, fixes this one too

log;+0200 2010.10.14 16:48:27:387;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remoteservice;code=0;message=No async remote service interface found with name=org.eclipse.ecf.services.quotes.QuoteServiceAsync for proxy service class=org.eclipse.ecf.services.quotes.QuoteService;severity2;exception=null;children=[]]]
[log;+0200 2010.10.14 16:48:30:129;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:serviceChanged:UNAVAILABLE;severity4;exception=java.util.ConcurrentModificationException;children=[]]]
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
	at java.util.HashMap$KeyIterator.next(HashMap.java:828)
	at org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl.removeProxyServiceRegistrations(DiscoveredServiceTrackerImpl.java:443)
	at org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl.serviceChanged(DiscoveredServiceTrackerImpl.java:175)
	at org.eclipse.ecf.internal.osgi.services.discovery.ServicePublicationHandler.notifyDiscoveredServiceTrackers(ServicePublicationHandler.java:124)
	at org.eclipse.ecf.internal.osgi.services.discovery.ServicePublicationHandler.serviceUndiscovered(ServicePublicationHandler.java:113)
	at org.eclipse.ecf.provider.discovery.CompositeDiscoveryContainer$CompositeContainerServiceListener.serviceUndiscovered(CompositeDiscoveryContainer.java:63)
	at org.eclipse.ecf.discovery.AbstractDiscoveryContainerAdapter.fireServiceUndiscovered(AbstractDiscoveryContainerAdapter.java:153)
	at org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer.fireUndiscovered(JMDNSDiscoveryContainer.java:355)
	at org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer$3.run(JMDNSDiscoveryContainer.java:349)
	at org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer$1.run(JMDNSDiscoveryContainer.java:125)
	at java.lang.Thread.run(Thread.java:619)
Comment 17 Scott Lewis CLA 2010-10-14 11:14:10 EDT
I've committed the proposed fix to the master branch in my local repository, but the push to the git.eclipse.org master branch is being rejected...for reasons I can't determine.
Comment 18 Markus Kuppe CLA 2010-10-14 11:46:43 EDT
Fix released to master
Comment 19 Markus Kuppe CLA 2010-10-14 11:46:45 EDT
Created attachment 180889 [details]
mylyn/context/zip