| Summary: | JMDNS Discovery Thread is slowing down shutdown | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [RT] ECF | Reporter: | Steffen Pingel <steffen.pingel> | ||||||
| Component: | ecf.discovery | Assignee: | Markus Kuppe <bugs.eclipse.org> | ||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||
| Severity: | major | ||||||||
| Priority: | P3 | CC: | bugs.eclipse.org, slewis | ||||||
| Version: | 3.5.0 | ||||||||
| Target Milestone: | --- | ||||||||
| Hardware: | PC | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Bug Depends on: | 326228, 355863 | ||||||||
| Bug Blocks: | |||||||||
| Attachments: |
|
||||||||
Hi Steffen, threading and deadlocks have been an issue upstream (JmDNS) indeed, but so far I haven't had a chance to complete the work on bug #326228. Given the little resources I currently have, I will mark this bug a dependent on #326228 for the moment. Thanks Markus Created attachment 201156 [details]
proposed fix
This patch marks the notification thread as a daemon thread to avoid blocking shutdown.
Not sure if this causes any bad side effects due to abrupt stopping of the thread on shutdown. To me it seems preferable to blocking shutdown as the bundle should properly end all threads in stop() anyways.
Created attachment 201157 [details]
mylyn/context/zip
Thanks much for the patch Steffen. Assigning to Markus for comment/discussion/evaluation/testing/release of patch, as I think he's most familiar with the relevant JMDNS code (he's certainly more familiar with it than I am :). Markus if this won't work for you then please just let all know here. Steffen, please check if org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer.disconnect() gets called prior to shutdown. Looking at the Mylyn client code I am wondering if we are actually missing the disconnect() call there. The code is roughly as following:
container = ContainerFactory.getDefault().createContainer(ECF_SINGLETON_DISCOVERY,
new Object[] { ECF_DISCOVERY_JMDNS });
container.connect(null, null);
Do we need to call SingletonDiscoveryContainer.disconnect() on bundle shutdown? It doesn't seem to make any difference.
Looking at JMDNSDiscoveryContainer.initializeQueue() I am bit surprised by the condition:
while (!disposed || queue.isStopped()) {
Shouldn't the thread stop running when the queue is stopped (I might be misunderstanding the code)?
While debugging I noticed that connect is called in the stop() method of the JMDNSPlugin bundle (shortly before disconnect() is called) which seemed a bit suspicious but maybe its intended. The odd thing is that is seems to connect and disconnect a different container than the one that is used by Mylyn.
Thread [main] (Suspended (breakpoint at line 93 in JMDNSDiscoveryContainer))
owns: Object (id=145)
owns: ServiceUse (id=146)
owns: Object (id=147)
owns: Framework (id=148)
JMDNSDiscoveryContainer.connect(ID, IConnectContext) line: 93
JMDNSPlugin$1.getService(Bundle, ServiceRegistration) line: 94
ServiceUse$1.run() line: 120
AccessController.doPrivileged(PrivilegedAction<T>) line: not available [native method]
ServiceUse.getService() line: 118
ServiceRegistrationImpl.getService(BundleContextImpl) line: 449
ServiceRegistry.getService(BundleContextImpl, ServiceReferenceImpl) line: 430
BundleContextImpl.getService(ServiceReference) line: 667
JMDNSPlugin.stop(BundleContext) line: 129
BundleContextImpl$2.run() line: 843
AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: not available [native method]
BundleContextImpl.stop() line: 836
BundleHost.stopWorker(int) line: 474
BundleHost(AbstractBundle).suspend(boolean) line: 546
Framework.suspendBundle(AbstractBundle, boolean) line: 1098
StartLevelManager.decFWSL(int, AbstractBundle[]) line: 593
StartLevelManager.doSetStartLevel(int) line: 261
StartLevelManager.shutdown() line: 216
InternalSystemBundle.suspend() line: 266
Framework.shutdown(int) line: 685
Framework.close() line: 583
EclipseStarter.shutdown() line: 409
EclipseStarter.run(String[], Runnable) line: 200
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25
Method.invoke(Object, Object...) line: 597
Main.invokeFramework(String[], URL[]) line: 559
Main.basicRun(String[]) line: 514
Main.run(String[]) line: 1311
Main.main(String[]) line: 1287
(In reply to comment #6) > Looking at the Mylyn client code I am wondering if we are actually missing the > disconnect() call there. The code is roughly as following: > > container = > ContainerFactory.getDefault().createContainer(ECF_SINGLETON_DISCOVERY, > new Object[] { ECF_DISCOVERY_JMDNS }); > container.connect(null, null); > > Do we need to call SingletonDiscoveryContainer.disconnect() on bundle shutdown? > It doesn't seem to make any difference. Yes, you need to call disconnect() manually which would make jmDNS disconnect and dispose of the thread. However org.eclipse.ecf.provider.discovery.SingletonDiscoveryContainer.disconnect() does not delegate this. Why do you use SingletonDiscoveryContainer anyway? I advice to use (jmDNS) ECF discovery in the way shown in https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit > Looking at JMDNSDiscoveryContainer.initializeQueue() I am bit surprised by the > condition: > > while (!disposed || queue.isStopped()) { > > Shouldn't the thread stop running when the queue is stopped (I might be > misunderstanding the code)? This looks funny to me as well, though the "!disposed" check takes care of correctly stopping the thread in our tests. > While debugging I noticed that connect is called in the stop() method of the > JMDNSPlugin bundle (shortly before disconnect() is called) which seemed a bit > suspicious but maybe its intended. The odd thing is that is seems to connect > and disconnect a different container than the one that is used by Mylyn. This is a again side effect of SingletonDiscoveryContainer. SDC creates its own instance of a jmDNS discovery container which isn't handled by org.eclipse.ecf.internal.provider.jmdns.JMDNSPlugin.stop(BundleContext). (In reply to comment #7) > Yes, you need to call disconnect() manually which would make jmDNS disconnect > and dispose of the thread. However > org.eclipse.ecf.provider.discovery.SingletonDiscoveryContainer.disconnect() does > not delegate this. Why do you use SingletonDiscoveryContainer anyway? Good question. I have opened bug 355863 to change that. > I advice to use (jmDNS) ECF discovery in the way shown in > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit I would prefer not to introduce a dependency on ECF 3.4 at this point. > > While debugging I noticed that connect is called in the stop() method of the > > JMDNSPlugin bundle (shortly before disconnect() is called) which seemed a bit > > suspicious but maybe its intended. The odd thing is that is seems to connect > > and disconnect a different container than the one that is used by Mylyn. > > This is a again side effect of SingletonDiscoveryContainer. SDC creates its own > instance of a jmDNS discovery container which isn't handled by > org.eclipse.ecf.internal.provider.jmdns.JMDNSPlugin.stop(BundleContext). I'm still seeing this even though the code does not longer use the SingletonDiscoveryContainer. Could you take a quick look at the change on bug 355863 and let me know how we can avoid the connect in stop() if this is indeed not an ECF bug? (In reply to comment #8) > > I advice to use (jmDNS) ECF discovery in the way shown in > > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit > > I would prefer not to introduce a dependency on ECF 3.4 at this point. What's the problem with moving up to 3.4? If we end up with a code change in ECF to address the Hudson issue, you would have to consume 3.5.2. (In reply to comment #9) > (In reply to comment #8) > > > > I advice to use (jmDNS) ECF discovery in the way shown in > > > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit > > > > I would prefer not to introduce a dependency on ECF 3.4 at this point. > > What's the problem with moving up to 3.4? If we end up with a code change in > ECF to address the Hudson issue, you would have to consume 3.5.2. I guess I can answer my own question. Mylyn supports installation into an older Eclipse like Galileo?! (In reply to comment #10) > (In reply to comment #9) > > (In reply to comment #8) > > > > > > I advice to use (jmDNS) ECF discovery in the way shown in > > > > https://bugs.eclipse.org/bugs/attachment.cgi?id=181065&action=edit > > > > > > I would prefer not to introduce a dependency on ECF 3.4 at this point. > > > > What's the problem with moving up to 3.4? If we end up with a code change in > > ECF to address the Hudson issue, you would have to consume 3.5.2. > > I guess I can answer my own question. Mylyn supports installation into an older > Eclipse like Galileo?! Although I haven't tested it (we would appreciate help with that), I know of no reason why 3.5.2 discovery wouldn't work on Eclipse Galileo. (In reply to comment #8) > I'm still seeing this even though the code does not longer use the > SingletonDiscoveryContainer. > > Could you take a quick look at the change on bug 355863 and let me know how we > can avoid the connect in stop() if this is indeed not an ECF bug? The second connect/disconnect cycle is indeed suboptimal. But local testing shows that no threads get stuck and thus considerably slows down shutdown. It creates a jmDNS discovery container and disconnects/disposes it immediately. This should hardly be noticeable for the user. I don't want to risk breaking compatibility with other Galileo based products that may expect the version that was contributed by ECF. It's certainly a trade off and occasionally we will lean towards consuming newer APIs (e.g. for EGit) but it has implications for the build and consumers who install Mylyn. I prefer to depend on the consistent set of integrations available from a particular train repository whenever feasible. From my point of view the connect/disconnect cycle during bundle stop is minor and nothing that needs to be fixed in the service release (but of course that's entirely up to you). The actual problem that I originally reported was caused by an unintended use of the API on the Mylyn end and should be fixed in the next service release. Thanks Markus for your help with tracking down and understanding the problem. Pushed changes to not create a new IDC during shutdown [http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=cb9db515791a2b377703b2f3751f886f629e7113] |
I noticed that after Eclipse appears to have shutdown it takes a long time for the Java process to exit. Looking at a thread dump all remaining threads are daemon threads except for JMDNS Discovery Thread. Shouln't this be marked as a daemon thread as well and/or shutdown when the bundle is stopped that started it? Full thread dump Java HotSpot(TM) Server VM (16.3-b01 mixed mode): "MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x0b4aa800 nid=0xd6e in Object.wait() [0xc0f76000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xd71f4800> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0xd71f4800> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122) "Thread-4" daemon prio=10 tid=0xc32f8800 nid=0xcb3 in Object.wait() [0xc06fe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xd71012c8> (a org.apache.commons.httpclient.util.IdleConnectionTimeoutThread) at org.apache.commons.httpclient.util.IdleConnectionTimeoutThread.run(IdleConnectionTimeoutThread.java:108) - locked <0xd71012c8> (a org.apache.commons.httpclient.util.IdleConnectionTimeoutThread) "JMDNS Discovery Thread" prio=10 tid=0xc0db9000 nid=0xcaa in Object.wait() [0xc02fe000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xd62fd768> (a org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue) at java.lang.Object.wait(Object.java:485) at org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue.peekQueue(SimpleFIFOQueue.java:53) - locked <0xd62fd768> (a org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue) at org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue.dequeue(SimpleFIFOQueue.java:41) - locked <0xd62fd768> (a org.eclipse.ecf.internal.provider.jmdns.SimpleFIFOQueue) at org.eclipse.ecf.provider.jmdns.container.JMDNSDiscoveryContainer$1.run(JMDNSDiscoveryContainer.java:121) at java.lang.Thread.run(Thread.java:619) "Low Memory Detector" daemon prio=10 tid=0xc6f16c00 nid=0xc70 runnable [0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0xc6f14c00 nid=0xc6f waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0xc6f12c00 nid=0xc6e waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0xc6f11400 nid=0xc6d waiting on condition [0x00000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0xc6f00800 nid=0xc6c in Object.wait() [0xc70f3000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xd3969058> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0xd3969058> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x096e0c00 nid=0xc6b in Object.wait() [0xc7144000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xd3981908> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0xd3981908> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x09640800 nid=0xc65 waiting on condition [0xf6992000] java.lang.Thread.State: RUNNABLE at java.lang.Shutdown.halt0(Native Method) at java.lang.Shutdown.halt(Shutdown.java:95) - locked <0xd3837c70> (a java.lang.Shutdown$Lock) at java.lang.Shutdown.exit(Shutdown.java:169) - locked <0xc7b25238> (a java.lang.Class for java.lang.Shutdown) at java.lang.Runtime.exit(Runtime.java:90) at java.lang.System.exit(System.java:904) at org.eclipse.equinox.launcher.Main.main(Main.java:1394) "VM Thread" prio=10 tid=0x096de000 nid=0xc6a runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x09647800 nid=0xc66 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x09649000 nid=0xc67 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x0964a800 nid=0xc68 runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x0964bc00 nid=0xc69 runnable "VM Periodic Task Thread" prio=10 tid=0xc6f18800 nid=0xc71 waiting on condition