Community
Participate
Working Groups
have a server setup for remote connection; restart eclipse; double click the server ...see that it is blocked for several seconds (seen between 10 seconds to minutes). Running jstack reveals that it is blocked on getting host by address. This is during the "check simple cases" section and so does not follow the API's guarantee of 250ms. "main" prio=8 tid=0000000013801000 nid=0xa0ba9540 runnable [00000000bfffd000] java.lang.Thread.State: RUNNABLE at java.net.Inet6AddressImpl.getHostByAddr(Native Method) at java.net.InetAddress$1.getHostByAddr(InetAddress.java:854) at java.net.InetAddress.getHostFromNameService(InetAddress.java:534) at java.net.InetAddress.getCanonicalHostName(InetAddress.java:505) at org.eclipse.wst.server.core.util.SocketUtil.isLocalhost(SocketUtil.java:296) at org.eclipse.wst.server.ui.internal.editor.OverviewEditorPart.updateRuntimeCombo(OverviewEditorPart.java:891) at org.eclipse.wst.server.ui.internal.editor.OverviewEditorPart.createGeneralSection(OverviewEditorPart.java:428) at org.eclipse.wst.server.ui.internal.editor.OverviewEditorPart.createPartControl(OverviewEditorPart.java:295) at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:241) at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:211) at org.eclipse.wst.server.ui.internal.editor.ServerEditor.createPages(ServerEditor.java:235) at org.eclipse.ui.part.MultiPageEditorPart.createPartControl(MultiPageEditorPart.java:348)
to note: this was noticed on OS X - I think Linux has much better/faster timeout handling.
I'm upgrading this to major, since it blocks UI and violates the javadoc in the class.
Please look into this?
This also affects Server wizard - just typing in that field makes eclipse freeze.
Angel: The code says: // check simple cases try { localHostaddr = InetAddress.getLocalHost(); if (host.equals(localHostaddr.getHostName().toLowerCase()) /*HERE*/ || host.equals(localHostaddr.getCanonicalHostName().toLowerCase()) || host.equals(localHostaddr.getHostAddress().toLowerCase())){ A look into localHostaddr.getHostName() and localHostaddr.getHostAddress() both show these return quickly. A look into localHostaddr.getCanonicalHostName() shows clearly this goes out to the getHostFromNameService() call. This is clearly a platform dependent line of code and, since it actually must poll some service, is most likely not a "simple" case at all. This case belongs to be covered by the complex cases which are bound by the thread maximums and thread caches and timeouts. I submit that line 296, the line marked with /*HERE*/ in my above example, be removed from the file.
I can see why it makes sense to remove getCanonicalHostName() from the simple check, since it is called in the CacheThread. However, more investigation is required before removing that line, since getCanonicalHostName() has existed in the simple case for a long time (at least since 2007).
I understand your view. More research is usually a good thing. Just wanted to add that getCanonicalHostName() really doesn't fit the definitions of a "simple" case, in my opinion, because it has to reach out to a service to do its work. The name service can clearly be platform dependent and its speed is implementation dependent. public String getCanonicalHostName() { if (canonicalHostName == null) { canonicalHostName = InetAddress.getHostFromNameService(this, true); } return canonicalHostName; } The performance hit is really striking and the UI gets quite frozen from this call. Basically, the getCanonicalHostName() call potentially (and for Mac, visibly) prevents the method from satisifying the javadoc: * On machines where the network configuration of the machine is bad or the * network has problems, the first call to this method will always return after * 250ms, even if the caching is not complete. At that point it may return * "false negative" results. (i.e. the method will return <code>false</code> * even though it may later determine that the host address is a local host) If a call to getCanonicalHostName() blocks for 10 seconds (not saying it does, but, it could), this method has no protection against it, thus violating the javadoc.
(In reply to comment #6) > I can see why it makes sense to remove getCanonicalHostName() from the simple > check, since it is called in the CacheThread. However, more investigation is > required before removing that line, since getCanonicalHostName() has existed in > the simple case for a long time (at least since 2007). Yes, and I think there have been bug reports about this issue ever since then ;) I hope we can find a way to get this fixed in maintanence since its making it rather hard for users to start use remote/cloud servers.
Steven: Can you please elaborate on what else needs to be done, and who has the ability to do it? ;) I'd rather this bug not stagnate under the pretence of "more study is needed" as a dead-end. I'd be glad to help in whatever way possible, but let's at least discuss some path to approval here. I've done my best to give a suggestion already.
Can steven or Elson please comment on this? It's marked for 3.3.1 which is rapidly approaching
Steven is currently looking into this and working on a solution. Stay tuned.
Created attachment 202168 [details] Patch v1.0 ==== Overview of changes ==== 1. The getCanonicalHostName is removed from the simple check. 2. The getCanonicalHostName check is now done before the more expensive complex checks. In the quick case, it is better to run this test (with a timeout of 100ms), since this will be less expensive than skipping the test completely and going to the complex case. 3. The JavaDoc was updated to increase the return time from 250ms to 350ms. The extra 100 ms is added for the getCanonicalHostName check. ==== Tests run ==== Fast network speed case (my current machine setup) 1. Test local canonical host name - the new thread will add the canonical host name correctly into the cache 2. Test local host name - existing code will correctly add the entry into the cache 3. Test local host address - existing code will correctly add the entry into the cache 4. Test remote host name - existing code will correctly add the entry into notLocalHostCache To test slower networks, a static method was created in SocketUtil that would call getCanonicalHostName after adding a specified delay Medium network speed case (150 ms delay) 1. Test canonical host name - the new thread will exit prematurely and cache is not updated as expected. The cached thread will be run and the canonical host name will be added there. The UI is not frozen due to the delay. 2. Test remote host name - existing code will correctly add the entry into notLocalHostCache. The UI is not frozen due to the delay. Slow network speed case (5 second delay) 1. Test canonical host name - the canonincal host name will eventually be added into the cache, the UI will not be frozen but may return a false negative (expected because the cache thread is not complete). 2. Test remote host name - existing code will correctly add the entry into notLocalHostCache. The UI is not frozen due to delay.
Hi Steven: Would it be possible to get a patch build for testing? We have a limited number of people who have the proper environment to test this. Thanks.
(In reply to comment #13) > Hi Steven: > > Would it be possible to get a patch build for testing? We have a limited number > of people who have the proper environment to test this. Thanks. Rob, we typically do not provide patch builds. Given that the patch has been attached to the defect, can you create the patch build so that it can be applied to the proper environment for those limited number of people?
Information for PMC candidate: * Explain why you believe this is a stop-ship defect. Or, if it is a "hotbug" (requested by an adopter) please document it as such. The reported freezing from 10 seconds to 1 minute blocks the UI. By freezing for such a long duration, this issue is breaking the Java Doc's timeout statement. * Is there a work-around? If so, why do you believe the work-around is insufficient? There is no workaround. * How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added? The fix has been tested under simulated network speeds, with the testing shown in comment #12. There is no JUnit test added, since this issue depends on the network speed. * Give a brief technical overview. Who has reviewed this fix? 1. The getCanonicalHostName is removed from the simple check. 2. The getCanonicalHostName check is now done before the more expensive complex checks. In the quick case, it is better to run this test (with a timeout of 100ms), since this will be less expensive than skipping the test completely and going to the complex case. 3. The JavaDoc was updated to increase the return time from 250ms to 350ms. The extra 100 ms is added for the getCanonicalHostName check. Change was reviewed by Elson Yuen. * What is the risk associated with this fix? There is a chance of false negatives (isLocalHost returns false, when it is local host), since previously we would call getCanonicalHostName in the simple check (without a timeout). Since it did not timeout, this did break the Java Doc's gurantee of 250ms, however it would return the correct result.
Keeping the getCanonicalHostName() check in the latest patch is the preferred way to go. The code changes looks good and the test coverage is sufficient. Assigning to 3.2.5. Given that it is the last week of 3.3.1 build and there are risk associated with this defect, this problem will be address on 3.3.2 on the 3.3.x stream.
Code released to 32M
Ugh god. Not in 3.3.1. FUD.
Code released to HEAD
Code released to 33M
New Gerrit change created: https://git.eclipse.org/r/109069