Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 348487 - JMDNS Discovery Thread is slowing down shutdown
Summary: JMDNS Discovery Thread is slowing down shutdown
Status: RESOLVED FIXED
Alias: None
Product: ECF
Classification: RT
Component: ecf.discovery (show other bugs)
Version: 3.5.0   Edit
Hardware: PC Linux
: P3 major (vote)
Target Milestone: ---   Edit
Assignee: Markus Kuppe CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 326228 355863
Blocks:
  Show dependency tree
 
Reported: 2011-06-06 20:40 EDT by Steffen Pingel CLA
Modified: 2011-11-30 14:16 EST (History)
2 users (show)

See Also:


Attachments
proposed fix (1.13 KB, patch)
2011-08-09 11:53 EDT, Steffen Pingel CLA
no flags Details | Diff
mylyn/context/zip (1.54 KB, application/octet-stream)
2011-08-09 11:53 EDT, Steffen Pingel CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Steffen Pingel CLA 2011-06-06 20:40:02 EDT
I noticed that after Eclipse appears to have shutdown it takes a long time for the Java process to exit. Looking at a thread dump all remaining threads are daemon threads except for JMDNS Discovery Thread. Shouln't this be marked as a daemon thread as well and/or shutdown when the bundle is stopped that started it?

Full thread dump Java HotSpot(TM) Server VM (16.3-b01 mixed mode):

"MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x0b4aa800 nid=0xd6e in Object.wait() [0xc0f76000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd71f4800> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0xd71f4800> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

"Thread-4" daemon prio=10 tid=0xc32f8800 nid=0xcb3 in Object.wait() [0xc06fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd71012c8> (a org.apache.commons.httpclient.util.IdleConnectionTimeoutThread)
        at org.apache.commons.httpclient.util.IdleConnectionTimeoutThread.run(IdleConnectionTimeoutThread.java:108)
        - locked <0xd71012c8> (a org.apache.commons.httpclient.util.IdleConnectionTimeoutThread)

"JMDNS Discovery Thread" prio=10 tid=0xc0db9000 nid=0xcaa in Object.wait() [0xc02fe000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd62fd768> (a org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue)
        at java.lang.Object.wait(Object.java:485)
        at org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue.peekQueue(SimpleFIFOQueue.java:53)
        - locked <0xd62fd768> (a org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue)
        at org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue.dequeue(SimpleFIFOQueue.java:41)
        - locked <0xd62fd768> (a org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue)
        at org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer$1.run(JMDNSDiscoveryContainer.java:121)
        at java.lang.Thread.run(Thread.java:619)

"Low Memory Detector" daemon prio=10 tid=0xc6f16c00 nid=0xc70 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0xc6f14c00 nid=0xc6f waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0xc6f12c00 nid=0xc6e waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0xc6f11400 nid=0xc6d waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0xc6f00800 nid=0xc6c in Object.wait() [0xc70f3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd3969058> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0xd3969058> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x096e0c00 nid=0xc6b in Object.wait() [0xc7144000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xd3981908> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0xd3981908> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x09640800 nid=0xc65 waiting on condition [0xf6992000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Shutdown.halt0(Native Method)
        at java.lang.Shutdown.halt(Shutdown.java:95)
        - locked <0xd3837c70> (a java.lang.Shutdown$Lock)
        at java.lang.Shutdown.exit(Shutdown.java:169)
        - locked <0xc7b25238> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:904)
        at org.eclipse.equinox.launcher.Main.main(Main.java:1394)

"VM Thread" prio=10 tid=0x096de000 nid=0xc6a runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x09647800 nid=0xc66 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x09649000 nid=0xc67 runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0964a800 nid=0xc68 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0964bc00 nid=0xc69 runnable 

"VM Periodic Task Thread" prio=10 tid=0xc6f18800 nid=0xc71 waiting on condition
Comment 1 Markus Kuppe CLA 2011-06-07 03:31:35 EDT
Hi Steffen,

threading and deadlocks have been an issue upstream (JmDNS) indeed, but so far I haven't had a chance to complete the work on bug #326228. 

Given the little resources I currently have, I will mark this bug a dependent on #326228 for the moment.

Thanks
Markus
Comment 2 Steffen Pingel CLA 2011-08-09 11:53:26 EDT
Created attachment 201156 [details]
proposed fix

This patch marks the notification thread as a daemon thread to avoid blocking shutdown. 

Not sure if this causes any bad side effects due to abrupt stopping of the thread on shutdown. To me it seems preferable to blocking shutdown as the bundle should properly end all threads in stop() anyways.
Comment 3 Steffen Pingel CLA 2011-08-09 11:53:29 EDT
Created attachment 201157 [details]
mylyn/context/zip
Comment 4 Scott Lewis CLA 2011-08-09 13:12:13 EDT
Thanks much for the patch Steffen.  

Assigning to Markus for comment/discussion/evaluation/testing/release of patch, as I think he's most familiar with the relevant JMDNS code (he's certainly more familiar with it than I am :).

Markus if this won't work for you then please just let all know here.
Comment 5 Markus Kuppe CLA 2011-08-10 03:16:57 EDT
Steffen, please check if  org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer.disconnect() gets called prior to shutdown.
Comment 6 Steffen Pingel CLA 2011-08-10 12:27:46 EDT
Looking at the Mylyn client code I am wondering if we are actually missing the disconnect() call there. The code is roughly as following:

 container = ContainerFactory.getDefault().createContainer(ECF_SINGLETON_DISCOVERY,
				new Object[] { ECF_DISCOVERY_JMDNS });
 container.connect(null, null);

Do we need to call SingletonDiscoveryContainer.disconnect() on bundle shutdown? It doesn't seem to make any difference. 

Looking at JMDNSDiscoveryContainer.initializeQueue() I am bit surprised by the condition:

  while (!disposed || queue.isStopped()) {
  
Shouldn't the thread stop running when the queue is stopped (I might be misunderstanding the code)?

While debugging I noticed that connect is called in the stop() method of the JMDNSPlugin bundle (shortly before disconnect() is called) which seemed a bit suspicious but maybe its intended. The odd thing is that is seems to connect and disconnect a different container than the one that is used by Mylyn.
 
 Thread [main] (Suspended (breakpoint at line 93 in JMDNSDiscoveryContainer))	
	owns: Object  (id=145)	
	owns: ServiceUse  (id=146)	
	owns: Object  (id=147)	
	owns: Framework  (id=148)	
	JMDNSDiscoveryContainer.connect(ID, IConnectContext) line: 93	
	JMDNSPlugin$1.getService(Bundle, ServiceRegistration) line: 94	
	ServiceUse$1.run() line: 120	
	AccessController.doPrivileged(PrivilegedAction<T>) line: not available [native method]	
	ServiceUse.getService() line: 118	
	ServiceRegistrationImpl.getService(BundleContextImpl) line: 449	
	ServiceRegistry.getService(BundleContextImpl, ServiceReferenceImpl) line: 430	
	BundleContextImpl.getService(ServiceReference) line: 667	
	JMDNSPlugin.stop(BundleContext) line: 129	
	BundleContextImpl$2.run() line: 843	
	AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: not available [native method]	
	BundleContextImpl.stop() line: 836	
	BundleHost.stopWorker(int) line: 474	
	BundleHost(AbstractBundle).suspend(boolean) line: 546	
	Framework.suspendBundle(AbstractBundle, boolean) line: 1098	
	StartLevelManager.decFWSL(int, AbstractBundle[]) line: 593	
	StartLevelManager.doSetStartLevel(int) line: 261	
	StartLevelManager.shutdown() line: 216	
	InternalSystemBundle.suspend() line: 266	
	Framework.shutdown(int) line: 685	
	Framework.close() line: 583	
	EclipseStarter.shutdown() line: 409	
	EclipseStarter.run(String[], Runnable) line: 200	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25	
	Method.invoke(Object, Object...) line: 597	
	Main.invokeFramework(String[], URL[]) line: 559	
	Main.basicRun(String[]) line: 514	
	Main.run(String[]) line: 1311	
	Main.main(String[]) line: 1287
Comment 7 Markus Kuppe CLA 2011-08-25 08:19:40 EDT
(In reply to comment #6)
> Looking at the Mylyn client code I am wondering if we are actually missing the
> disconnect() call there. The code is roughly as following:
> 
>  container =
> ContainerFactory.getDefault().createContainer(ECF_SINGLETON_DISCOVERY,
>                 new Object[] { ECF_DISCOVERY_JMDNS });
>  container.connect(null, null);
> 
> Do we need to call SingletonDiscoveryContainer.disconnect() on bundle shutdown?
> It doesn't seem to make any difference. 

Yes, you need to call disconnect() manually which would make jmDNS disconnect and dispose of the thread. However org.eclipse.ecf.provider.discovery.SingletonDiscoveryContainer.disconnect() does not delegate this. Why do you use SingletonDiscoveryContainer anyway?

I advice to use (jmDNS) ECF discovery in the way shown in https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit

> Looking at JMDNSDiscoveryContainer.initializeQueue() I am bit surprised by the
> condition:
> 
>   while (!disposed || queue.isStopped()) {
> 
> Shouldn't the thread stop running when the queue is stopped (I might be
> misunderstanding the code)?

This looks funny to me as well, though the "!disposed" check takes care of correctly stopping the thread in our tests. 

> While debugging I noticed that connect is called in the stop() method of the
> JMDNSPlugin bundle (shortly before disconnect() is called) which seemed a bit
> suspicious but maybe its intended. The odd thing is that is seems to connect
> and disconnect a different container than the one that is used by Mylyn.

This is a again side effect of SingletonDiscoveryContainer. SDC creates its own instance of a jmDNS discovery container which isn't handled by org.eclipse.ecf.internal.provider.jmdns.JMDNSPlugin.stop(BundleContext).
Comment 8 Steffen Pingel CLA 2011-08-25 11:45:52 EDT
(In reply to comment #7)
> Yes, you need to call disconnect() manually which would make jmDNS disconnect
> and dispose of the thread. However
> org.eclipse.ecf.provider.discovery.SingletonDiscoveryContainer.disconnect() does
> not delegate this. Why do you use SingletonDiscoveryContainer anyway?

Good question. I have opened bug 355863 to change that.

> I advice to use (jmDNS) ECF discovery in the way shown in
> https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit

I would prefer not to introduce a dependency on ECF 3.4 at this point.

> > While debugging I noticed that connect is called in the stop() method of the
> > JMDNSPlugin bundle (shortly before disconnect() is called) which seemed a bit
> > suspicious but maybe its intended. The odd thing is that is seems to connect
> > and disconnect a different container than the one that is used by Mylyn.
> 
> This is a again side effect of SingletonDiscoveryContainer. SDC creates its own
> instance of a jmDNS discovery container which isn't handled by
> org.eclipse.ecf.internal.provider.jmdns.JMDNSPlugin.stop(BundleContext).

I'm still seeing this even though the code does not longer use the SingletonDiscoveryContainer. 

Could you take a quick look at the change on bug 355863 and let me know how we can avoid the connect in stop() if this is indeed not an ECF bug?
Comment 9 Markus Kuppe CLA 2011-08-25 12:36:09 EDT
(In reply to comment #8)

> > I advice to use (jmDNS) ECF discovery in the way shown in
> > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit
> 
> I would prefer not to introduce a dependency on ECF 3.4 at this point.

What's the problem with moving up to 3.4? If we end up with a code change in ECF to address the Hudson issue, you would have to consume 3.5.2.
Comment 10 Markus Kuppe CLA 2011-08-25 12:38:35 EDT
(In reply to comment #9)
> (In reply to comment #8)
> 
> > > I advice to use (jmDNS) ECF discovery in the way shown in
> > > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit
> > 
> > I would prefer not to introduce a dependency on ECF 3.4 at this point.
> 
> What's the problem with moving up to 3.4? If we end up with a code change in
> ECF to address the Hudson issue, you would have to consume 3.5.2.

I guess I can answer my own question. Mylyn supports installation into an older  Eclipse like Galileo?!
Comment 11 Scott Lewis CLA 2011-08-25 12:59:21 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > 
> > > > I advice to use (jmDNS) ECF discovery in the way shown in
> > > > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit
> > > 
> > > I would prefer not to introduce a dependency on ECF 3.4 at this point.
> > 
> > What's the problem with moving up to 3.4? If we end up with a code change in
> > ECF to address the Hudson issue, you would have to consume 3.5.2.
> 
> I guess I can answer my own question. Mylyn supports installation into an older
>  Eclipse like Galileo?!

Although I haven't tested it (we would appreciate help with that), I know of no reason why 3.5.2 discovery wouldn't work on Eclipse Galileo.
Comment 12 Markus Kuppe CLA 2011-08-25 13:47:21 EDT
(In reply to comment #8)
> I'm still seeing this even though the code does not longer use the
> SingletonDiscoveryContainer. 
> 
> Could you take a quick look at the change on bug 355863 and let me know how we
> can avoid the connect in stop() if this is indeed not an ECF bug?

The second connect/disconnect cycle is indeed suboptimal. But local testing shows that no threads get stuck and thus considerably slows down shutdown. It creates a jmDNS discovery container and disconnects/disposes it immediately. This should hardly be noticeable for the user.
Comment 13 Steffen Pingel CLA 2011-08-25 16:25:05 EDT
I don't want to risk breaking compatibility with other Galileo based products that may expect the version that was contributed by ECF. It's certainly a trade off and occasionally we will lean towards consuming newer APIs (e.g. for EGit) but it has implications for the build and consumers who install Mylyn. I prefer to depend on the consistent set of integrations available from a particular train repository whenever feasible.

From my point of view the connect/disconnect cycle during bundle stop is minor and nothing that needs to be fixed in the service release (but of course that's entirely up to you). The actual problem that I originally reported was caused by an unintended use of the API on the Mylyn end and should be fixed in the next service release.

Thanks Markus for your help with tracking down and understanding the problem.
Comment 14 Markus Kuppe CLA 2011-11-30 14:16:38 EST
Pushed changes to not create a new IDC during shutdown [http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=cb9db515791a2b377703b2f3751f886f629e7113]