I ran the SDK tests on Windows on a 4.2 build on Hudson last night. I think there is a firewall installed on that machine that's preventing our UA tests from running. For instance, the UA tests set up a help center instance and run tests against it on a port in the ephemeral port range. This fails because the port isn't accessible, in this case, port 55032. I don't think the port is a predictable number.

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/JUnit-win2/lastCompletedBuild/testReport/

java.io.FileNotFoundException: http://localhost:55032/help/search?phrase=jehcyqpfjs+OR+duernfryehd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1610)
at java.net.URL.openStream(URL.java:1035)
at

This test doesn't fail on the Linux slave.
Actually I was wrong, I can't tell if this problem occurs on the Linux slaves because the ua tests failed completely during this test run.
Does the 3.X test set fail in this way? The AV program may be blocking (but a quick peek doesn't show anything in its logs). I've turned on the logging of the built-in firewall to see if that's the issue. -M.
(In reply to comment #2)
> Does the 3.X test set fail in this way?
>
> The AV program may be blocking(but a quick peak doesn't show anything in it's
> logs).
>
> I've turned on the logging of the build in firewall to see if that's the issue.
>
> -M.

Does the log tell us anything interesting? These tests are still consistently failing for us on the Hudson Windows slave, while they happily pass on the Hudson Linux slaves. Here's one example from last night's build:

http://localhost:57690/help/topic/org.eclipse.ua.tests/data/help/manual/dz2.html

java.io.FileNotFoundException: http://localhost:57690/help/topic/org.eclipse.ua.tests/data/help/manual/dz2.html
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1610)
at java.net.URL.openStream(URL.java:1035)
at org.eclipse.ua.tests.help.webapp.TocZipTest.readPage(TocZipTest.java:56)
While logging is on, the logs themselves seem to be empty. I've made some tweaks to the firewall definitions for Eclipse related apps, and have removed some of the dupe entries. Let me know if that makes any difference. -M.
(In reply to comment #4)
> While logging is on, the logs themselves seem to be empty.
>
> I've made some tweaks to the firewall definitions for Eclipse related apps, and
> have removed some of the dupe entries.
>
> Let me know if that makes any difference.
>
> -M.

No, no difference. The tests seem to pass fine on Linux and the Mac, but not on Windows, for either the 3.8 or the 4.2 tests. Is there any (easy) test we can do to help diagnose this?

= = =
http://localhost:61430/help/toc?lang=en

java.io.FileNotFoundException: http://localhost:61430/help/toc?lang=en
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1610)
at java.net.URL.openStream(URL.java:1035)
at org.eclipse.ua.tests.help.remote.TocServletTest.getTocContributions(TocServletTest.java:118)
at org.eclipse.ua.tests.help.remote.TocServletTest.testTocServletContainsFilteredToc(TocServletTest.java:54)
= = =
I have noticed on the Mac machine, in the logs for this test (where the tests are running and passing), that there are messages about proxy settings, such as:

!ENTRY org.eclipse.core.net 1 0 2012-06-01 07:24:31.992
!MESSAGE System property http.nonProxyHosts has been set to 127.0.0.1|localhost|*.localhost|local|*.local|169.254/16|*.169.254/16|eclipse.org|*.eclipse.org|hudson.eclipse.org|*.hudson.eclipse.org|dev.eclipse.org|*.dev.eclipse.org by an external source.

Webmasters, is that coming from something _you_ set on the machine? As far as I can tell, it is not in our test set-up code (though, I admit, it might be there in some "hidden" form). On the Windows machine (where the tests are failing) I do not see these messages, so I wonder if that machine is trying to go through a proxy, even though the URL says "localhost"? I will admit, though, I do not see them in the Linux logs either, and the tests pass there ... but, we all know Linux is magical so it may be it "just works" :)
I will also document here, I tried adding a -D argument directly to where we invoke the VM we use for testing, -Dhttp.nonProxyHosts="127.0.0.1|localhost|*.localhost|local|*.local|169.254/16|*.169.254/16|eclipse.org|*.eclipse.org|hudson.eclipse.org|*.hudson.eclipse.org|dev.eclipse.org|*.dev.eclipse.org'" but, it did not have any effect (as far as I know, these vm sort of properties are not completely standard?)
One more oddity I'll note here: in the "script files" that start the test jobs, both Linux and the Mac (where the tests work) had a "no_proxy" variable, but it was commented out:

#export no_proxy=localhost,dev.eclipse.org,hudson.eclipse.org

But the Windows script (.bat) had a similar line, NOT commented out:

set no_proxy=localhost,dev.eclipse.org,hudson.eclipse.org

I think I'll comment out the Windows one too, for consistency, but my "quick test" indicated it made no difference. I could not find that variable 'no_proxy' used anywhere. But I think this might imply there was some "work in progress" as Kim left that "got lost"? Or maybe she was trying something and found it didn't work? Just thought I'd mention it, since I think the fundamental issue here is related to proxy settings ... not ports, per se. (I think there would be a different error message if it were a port problem ... but "file not found" sounds like the request is actually going somewhere ... just not the local machine!?)

Speaking of proxy settings ... some of our Linux CVS tests are failing with messages like

Could not connect to :pserver:hudsontest@hudson.eclipse.org:/cvs/org.eclipse.tests: I/O exception occurred: ProxyHTTP: java.io.IOException: proxy error: Forbidden

(I thought a separate bug was being opened for that, but couldn't find it right off ... just thought I'd mention it here, in case "proxy setup" needs a broader look.) Thanks
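On the "file not found" point in the comment above: HttpURLConnection.getInputStream() reports an HTTP 404 (or 410) response as java.io.FileNotFoundException, so that exception usually means some HTTP server did answer, whereas a blocked or closed port would normally show up as a connect or timeout exception. The following is only a minimal sketch to demonstrate that behaviour; the class name NotFoundDemo and the throwaway 404 server are made up for illustration and are not part of the UA tests.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical demo class, not part of org.eclipse.ua.tests.
public class NotFoundDemo {

    /**
     * Starts a throwaway local server that answers every request with 404,
     * fetches one URL from it, and returns the class name of the exception
     * that getInputStream() throws.
     */
    static String fetchFrom404Server() throws IOException {
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(404, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URL url = new URL("http://127.0.0.1:" + port + "/help/index.jsp");
            // NO_PROXY so the demo itself cannot be diverted by proxy settings.
            HttpURLConnection con =
                    (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            try (InputStream in = con.getInputStream()) {
                return "no exception";
            } catch (IOException e) {
                return e.getClass().getName();
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchFrom404Server());
    }
}
```

If this reading is right, the interesting question for the failing tests is which server is sending the 404: the local help server, or something else the request was routed to.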
So searching the firewall logs with the port number and date from comment 5 I find these:

2012-06-01 04:31:43 ALLOW TCP 127.0.0.1 127.0.0.1 61432 61431 0 - 0 0 0 - - - RECEIVE
2012-06-01 04:31:43 ALLOW TCP 7f00:1:4c30:e585:100:20:5830:e585 7f00:1:4e61:6d65:8032:6786:100:e0 61433 61430 0 - 0 0 0 - - - SEND
2012-06-01 04:31:43 ALLOW TCP 127.0.0.1 127.0.0.1 61433 61430 0 - 0 0 0 - - - RECEIVE
2012-06-01 04:31:43 ALLOW TCP 7f00:1:4c30:e585:100:20:5830:e585 7f00:1:4e61:6d65:8032:6786:100:e0 61435 61434 0 - 0 0 0 - - - SEND
2012-06-01 04:31:43 ALLOW TCP 127.0.0.1 127.0.0.1 61435 61434 0 - 0 0 0 - - - RECEIVE

Which looks ok to me. I've explicitly added 127.0.0.1 to the no-proxy list in the network settings (even though 'bypass proxy server for local addresses' is turned on). I also took a quick peek in the proxy logs and I don't see any requests destined for 'localhost' (or 127.0.0.1).

Is there a way for me to manually start the infocenter on windows? -M.
> > Is there a way for me to manually start the infocenter on windows?
>

You mean just to test that it works, at all, right? The general instructions are at

http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.isv%2Fguide%2Fua_help_setup_infocenter.htm

I dusted off my Windows machine, and could start the info center this way:

C:\builds\eclipse-SDK-I20120604-1900-win32-x86_64\eclipse>C:\jdks\ibm-java-sdk-60-win-x86_64\sdk\jre\bin\java -classpath C:\builds\eclipse-SDK-I20120604-1900-win32-x86_64\eclipse\plugins\org.eclipse.help.base_3.6.100.v201206041900.jar org.eclipse.help.standalone.Infocenter -command start -eclipsehome C:\builds\eclipse-SDK-I20120604-1900-win32-x86_64\eclipse -port 8081

Once that's running, you should be able to "see" help in a browser with

http://localhost:8081/help/index.jsp

But, I did not have a Java 7 version available (for Windows), which is what we are trying to use in our tests ... so ... I'll be working on getting one of those. In the meantime, if you try it, be aware you'll see some harmless "logging" messages in the console that are not a sign of any problem ... it's just being informative that there is no slf4j logger (which is expected):

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Ok, I'll give those a try in a little while. Oddly enough I happened to have the desktop open when the Junit tests started running. So far I've seen one 'java.exe from Oracle wants to access' dialog. I ok'd that and I'll check to see if it made a change to the firewall ruleset once the tests are done. -M.
(In reply to comment #11)
> ... Oddly enough I happened to have
> the desktop open when the Junit tests started running. So far I've seen one
> 'java.exe from Oracle wants to access' dialog.

I assume that's an "access the internet" type of dialog? It is a bit surprising that would "still be around" (i.e. not previously "permitted"). Perhaps that Java version was "auto updated"? (If so, we probably want to turn off "auto updates" :) Or ... could be some difference between 'java' and 'javaw' which I am ashamed to admit I have never understood :)

FWIW, I did finally get around to testing Java 7 on Windows and the info center worked fine (though, I admit, that was with IBM's JRE) ... we might need to set a special "maxpermspace" or something with Oracle's JRE. We'll track it down eventually.
Created attachment 217039 [details] Windows desktop I tried starting the Infocenter manually and got the popup in the screen shot. I clicked ok and was able to access the help center. I then went through and manually added the javaw.exe to the firewall list, and rebooted the slave. I was then able to start the infocenter on port 55032 without the popup. -M.
(In reply to comment #13) > Created attachment 217039 [details] > Windows desktop > > I tried starting the Infocenter manually and got the popup in the screen shot. > I clicked ok and was able to access the help center. I then went through and > manually added the javaw.exe to the firewall list, and rebooted the slave. I > was then able to start the infocenter on port 55032 without the popup. > > -M. Great! Thanks. I think we'll get one more chance this evening to see if that works to fix the test suite as a whole! Is there any larger lesson to learn here? How to set firewalls? Windows security settings? Windows updates settings?
For me I think the takeaway is: if people are having problems with the windows slave, get the instructions to 'manually' do whatever is exploding and the login and see what errors/popups are produced. -M.
(In reply to comment #15) > For me I think the takeaway is: if people are having problems with the windows > slave, get the instructions to 'manually' do whatever is exploding and the > login and see what errors/popups are produced. > > -M. I was afraid you'd say that :)
no joy. same problem. Not too surprising ... these don't have the "look" of something that is "hung" waiting for a response. But ... after the "rush of the release" I'll take a closer look and/or "watch the tests", get more diagnostics, etc. Thanks for your continued help. Error http://localhost:50941/help/index.jsp java.io.FileNotFoundException: http://localhost:50941/help/index.jsp at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1610) at org.eclipse.ua.tests.help.webapp.HelpServerInterrupt.checkServer(HelpServerInterrupt.java:112) at org.eclipse.ua.tests.help.webapp.HelpServerInterrupt.testServerWithoutInterrupt(HelpServerInterrupt.java:56) at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:501)
I wanted to make some observations on what I'm seeing related to "proxies" on the test machines. First, I added some statements to dump all environment variables, right before the tests begin (so, should be pure "machine setup", nothing we do). All three (Linux, Mac, and Windows) had these:

ANT_ARGS=-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

ANT_OPTS=-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

JVM_OPTS=-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

JAVA_ARGS=-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

BUT only the Linux machine had these:

no_proxy=localhost,127.0.0.1,172.30.206.0,dev.eclipse.org,.eclipse.org
http_proxy=http://proxy.eclipse.org:9898
https_proxy=http://proxy.eclipse.org:9898
ftp_proxy=http://proxy.eclipse.org:9898

Does that sound right? One piece of evidence that this may make a difference is found in the Hudson logs.
One of the very first things I do to bootstrap the builds is issue a "wget" command to pull a cGit snapshot of a project (which has the scripts I execute in subsequent build steps). On both Windows and the Mac, it is clear from the log wget is using the proxy (here's a log snippet from windows): syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc --2012-07-10 12:02:41-- http://git.eclipse.org/c/platform/eclipse.platform.releng.eclipsebuilder.git/snapshot/master.zip Resolving proxy.eclipse.org... 206.191.52.57 Connecting to proxy.eclipse.org|206.191.52.57|:9898... connected. Proxy request sent, awaiting response... 200 OK Length: unspecified [application/x-zip] Saving to: `master.zip' But it's fairly clear that linux is not using the proxy to do this operation. Here's its snippet: + wget http://git.eclipse.org/c/platform/eclipse.platform.releng.eclipsebuilder.git/snapshot/master.zip --2012-07-10 12:02:03-- http://git.eclipse.org/c/platform/eclipse.platform.releng.eclipsebuilder.git/snapshot/master.zip Resolving git.eclipse.org... 172.25.25.51 Connecting to git.eclipse.org|172.25.25.51|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [application/x-zip] Saving to: `master.zip' So ... question is ... is the "no_proxy" variable missing from windows and mac slave setup? I'm not sure where they are set. (I actually tried setting as system variables, but didn't seem to matter) ... or ... is windows and mac simply not capable of this sophistication?
Another question/approach ... according to http://wiki.eclipse.org/Hudson#Why_use_a_Proxy.3F Hudson using a proxy is not literally _required_ (such as, for security) so I am wondering ... could all the proxy stuff be "turned off" for the Windows and the Mac slave? At least, say, for a couple of months, to see what difference it makes in our tests? [I doubt too many people actually _build_ on Windows or Mac, and it sounds like most of the reasons to use proxies are related to building, not testing.]
(In reply to comment #18)
>
> So ... question is ... is the "no_proxy" variable missing from windows and mac
> slave setup? I'm not sure where they are set. (I actually tried setting as
> system variables, but didn't seem to matter) ... or ... is windows and mac
> simply not capable of this sophistication?

The no_proxy vars are set via the Linux slave system config tools, not as part of Hudson itself. I don't see anything in a quick Google that indicates the Mac can't handle them, so I've added them to the Hudson user's startup script. A quick look at the Windows slave shows that it should already have these set.

(In reply to comment #19)

While part of our proxy usage is for security, it's also because the slaves are on a non-routable network, and without the proxy they have no access to the outside world. I'd be willing to turn off the proxy settings on the Windows and Mac slaves for a couple of days, but ultimately the tests will need to run with the proxy configured. -M.
Just to grasp at straws ... with my minor knowledge of networks ... using some special (temporary) access Matt granted to this machine, I see the hosts file is defined starting with

# localhost name resolution is handled within DNS itself
# 127.0.0.1 localhost
# ::1 localho
[and then some other "real" address-server pairs]

And, (sic) the comment does say 'localho' ... I hope DNS itself doesn't? :) Now, if I "ping localhost", it replies with

Pinging WINSLAVE [::1] with 32 bytes of data:
Reply from ::1: time<1ms
etc.

In other words ... IPv6 addresses. I'm relatively sure there's nothing in our help system or test set-ups that knows or cares about the difference between IPv4 and IPv6 ... but, thought I'd mention it in case others know better. But, I'm also wondering, webmasters: variables such as no_proxy use some explicit IPv4 addresses (e.g. 127.0.0.1), namely ...

no_proxy=localhost,127.0.0.1,172.30.206.0,dev.eclipse.org,.eclipse.org

Again, just grasping at straws, I'm wondering if IPv6 addresses need to be in that list? And, I have no idea what it means to "resolve localhost name in DNS itself" .... but ... I assume you've checked that and found it to be correctly defined for WINSLAVE? I suspect it is, since from the machine itself, I can ping both 127.0.0.1 and ::1 (as well as "localhost"). Perhaps you also know ... is IPv6 used throughout the Eclipse infrastructure? (So, it would be similar on the Linux and Mac machines and hence not something different or unique to this Windows test machine?)

I'm hoping soon, but when no other tests are running, to try a "stand alone" UA test on the machine itself ... that might tell us if it's something related to the machine itself ... or (perhaps) something related to running "within hudson". Thanks,
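Since "ping" only shows how Windows resolves the name, it may also be worth seeing how the test JVM itself resolves "localhost". Below is a small sketch for that, assuming nothing beyond the standard java.net API; the class name ResolveHost is made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the test suite.
public class ResolveHost {

    /** Returns every address the JVM resolves for the given host name, in order. */
    static List<String> resolve(String host) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.add(address.getHostAddress());
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        // The first entry is the address java.net.URL would normally try first.
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Run with the same JRE the tests use. If the first address printed is an IPv6 one, that would at least confirm the test client and the ping output agree.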
I have tried running the UA tests in a "standalone" environment, on the Windows slave, meant to completely simulate how they are run "in hudson" (by standalone, I mean I just initiated them from a command line). I tried this from both the hudsonbuild id and the e4build id and in both cases, the tests fail just like they do when run directly from Hudson. So ... I don't think it's Hudson related, per se, but ... not sure my "set up" was perfectly equivalent. I say this because I was led to look at the "ant log" from the test, in detail, and it captures an array of properties in effect at the time of the test. Among the many, it had

cur['http.nonProxyHosts'] = '<local>';
cur['http.proxyHost'] = 'proxy.eclipse.org';
cur['http.proxyPort'] = '9898';
cur['http.proxySet'] = 'true';

Sounds almost right, and at first, I thought the '<local>' was some sort of shorthand. But, this led me to look at the ant log for a recent I-build test in detail, and I found the difference between Linux and Windows interesting. (For the Mac, there appeared to be no such variables, which sounds consistent with previous comments in this bug.) [These tests were from midday 7/10.]

Windows:
cur['http.nonProxyHosts'] = 'build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.25.25.*|hudson.eclipse.org|127.0.0.1';
cur['http.proxyHost'] = 'proxy.eclipse.org';
cur['http.proxyPort'] = '9898';

Linux:
cur['http.nonProxyHosts'] = 'localhost|127.0.0.1|172.30.206.0|dev.eclipse.org|.eclipse.org';
cur['http.proxyHost'] = 'proxy.eclipse.org';
cur['http.proxyPort'] = '9898';

So, the http.nonProxyHosts on Windows does not contain the literal word "localhost". Could that explain the problem? Could that part of the "slave" definition be redefined or configured? Please?

Not to mention, the numerical addresses and other "named" addresses differ between the two ... For example, the Linux one does not contain "hudson.eclipse.org" and we have another bug open that appeared related to that. See bug 381661.
This would have been on slave 1 or 6 not sure which this particular test ran on, but we name them both in our "linux pool" so suggest both of them (at least) be checked. Not sure which is correct form: 172.30.206.0 or 172.30.206.* but ... would be best to be consistent (even if incorrect :) [it'd make it more obvious, probably]. Plus, I see you use ".eclipse.org" in a few spots, but many examples I see on the web appear to use only the end segments (by analogy, "eclipse.org") or some I think use "*.eclipse.org" ... so, given all the problems and amount of work this is causing us, I'd use all three forms :) ... unless you wanted to experiment. And, still not sure why I was seeing <local> in my console tests [but, I was experimenting] :)
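One way to take the guesswork out of the nonProxyHosts question is to ask the JVM directly which proxy it would pick for a given URL. The sketch below uses the standard java.net.ProxySelector API; the class name ProxyCheck and the sample URLs are only illustrative. What it prints depends on the JRE version and on the -D properties it is started with, so no particular result is claimed here.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Hypothetical diagnostic, not part of the test suite.
public class ProxyCheck {

    /** Returns the proxies the JVM's default selector would try for the URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        String[] urls = args.length > 0 ? args : new String[] {
            "http://localhost:55032/help/index.jsp",
            "http://127.0.0.1:55032/help/index.jsp"
        };
        System.out.println("http.proxyHost     = " + System.getProperty("http.proxyHost"));
        System.out.println("http.nonProxyHosts = " + System.getProperty("http.nonProxyHosts"));
        for (String url : urls) {
            // A list containing only DIRECT means the request bypasses the proxy.
            System.out.println(url + " -> " + proxiesFor(url));
        }
    }
}
```

Running it on the Windows slave with the same -D arguments the tests get (for example the current ANT_OPTS value) should show whether a localhost URL comes back as DIRECT or as the proxy.eclipse.org proxy.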
Created attachment 218829 [details] Screenshot I've disabled IPv6 and the Link Layer Topology bullcrap. I'm not sure if that will change anything, but skimming through the comments I can't seem to figure out what it is that I need to look into.
(In reply to comment #23) > Created attachment 218829 [details] > Screenshot > > I've disabled IPv6 and the Link Layer Topology bullcrap. I'm not sure if that > will change anything, but skimming through the comments I can't seem to figure > out what it is that I need to look into. Ok, thanks. Good to reduce number of variables. We'll see. Think the slaves have to be restarted to "take effect" (If so, its fine to kill the tests running there from Eclipse Project, especially on Windows and the Mac. I'd prefer to re-run them if any chance this might avoid the problem. I was still wondering about this part: <quote> So, the http.nonProxyHosts on Windows does not contain the literal word "localhost". Could that explain the problem? Could that part of the "slave" definition be redefined or configured? Please? [And, similar remark about hudson.eclipse.org on Linux]. </quote> Maybe I'll have to learn to set up my own Hudson master, and slaves, to know, but, I got the impression there's some "environment variables" you define in "Hudson slave startup" while "making a slave". Is that accurate? If so, then perhaps the variable http.nonProxyHosts needs to be changed so it is exactly the same on all slaves. Specifically, I'd recommend http.nonProxyHosts = localhost|build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org This might be more than needed, but would be the "maximum union" of all of the existing ones, (and that's assuming no IPv6 address used, else I'd also recommend |::1| be added. Let me know if I can/should help in some way.
(In reply to comment #24) > Specifically, I'd recommend > > http.nonProxyHosts = > > localhost|build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org > The http.proxy environment vars on the windows slave are set via Windows(not via the slave config page in hudson). I've added this one to the Hudson slave config to see if it has the desired effect. -M.
Based on the proxy logs, I'd suggest git and download be added too: localhost|build.eclipse.org|dev.eclipse.org|git.eclipse.org|download.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org
(In reply to comment #25)
> (In reply to comment #24)
> >
> The http.proxy environment vars on the windows slave are set via Windows(not
> via the slave config page in hudson).
>

Not sure if there's a reason for that? Because it's run as a service? But, I think I see part of the problem or miscommunication. http.nonProxyHosts (and https.nonProxyHosts), in its -D form, is actually part of EACH of the "main" arguments:

ANT_OPTS
ANT_ARGS
JAVA_ARGS
JVM_OPTS

And that's where it needs to be changed (since that's what Java, or Ant, reads as they start to execute). So, the current ANT_ARGS, on Windows, is defined as:

-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

so that would become (based on what we've said)

-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="localhost|build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org|git.eclipse.org|download.eclipse.org" -Dhttps.nonProxyHosts="localhost|build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org|git.eclipse.org|download.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

(All as one line. No EOLs. Technically the ftp ones should be similar, but I'd leave them as is, unless someone starts having trouble with ftp.)

To see if I could, I have made this change on the windows7slave, and it did let me 'save' that config. We could probably remove the "stand alone" http.nonProxyHosts you added. But I didn't.
In either case, I think you still need to restart the slave, and if I understand things right, you do have to do that, not me. As I mentioned before, you can restart windows7slave at any time, no need to wait for current eclipse test to finish.
Another tidbit: I'm fairly sure "we" are using ANT_ARGS incorrectly. According to the Ant documentation:

ANT_OPTS - command-line arguments that should be passed to the JVM. For example, you can define system properties or set the maximum Java heap size here.
ANT_ARGS - Ant command-line arguments. For example, set ANT_ARGS to point to a different logger, include a listener, and to include the -find flag.

Hence, ANT_OPTS is the one that needs the -D options for Java. I know of nothing we need to specify on ANT_ARGS.

But, I do think we need to set no_proxy as an environment variable in the Hudson environment (in addition to all the others). It is the one used "by the shell" or "wget" and, probably, other tools. The docs I've found about it are quite explicit that it only needs to be the "domain suffix" (e.g. "eclipse.org") or a numeric address with optional port:

<quote>
no_proxy
Some clients support the no_proxy environment variable that specifies a set of domains for which the proxy should not be consulted; the contents is a comma-separated list of domain names, with an optional :port part:
no_proxy="cern.ch,ncsa.uiuc.edu,some.host:8080"
export no_proxy
</quote>

So ... since I work late, and since I can restart Linux and the Mac (in theory), I may try this soon (if no one else is running tests).
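Because the two lists use different syntax ('|' separated with '*' wildcards for the Java properties, comma separated domain suffixes for no_proxy), it is easy for them to drift apart when maintained by hand. Here is a rough sketch of deriving one from the other. It assumes the tool reading no_proxy does plain suffix matching, as the quoted doc describes; the class name NoProxyConvert is made up, and entries with a wildcard anywhere other than a leading "*." are deliberately passed through unchanged for a human to review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, shown only to illustrate the two formats.
public class NoProxyConvert {

    /**
     * Turns a Java-style http.nonProxyHosts value ('|' separated) into a
     * comma-separated no_proxy value. A leading "*." is dropped so the entry
     * becomes a bare domain suffix; duplicates that result are removed.
     */
    static String toNoProxy(String nonProxyHosts) {
        List<String> out = new ArrayList<>();
        for (String entry : nonProxyHosts.split("\\|")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            if (e.startsWith("*.")) {
                e = e.substring(2); // "*.eclipse.org" -> "eclipse.org"
            }
            if (!out.contains(e)) {
                out.add(e);
            }
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        System.out.println(toNoProxy("localhost|*.eclipse.org|eclipse.org|127.0.0.1"));
    }
}
```

With the sample input in main, the two eclipse.org forms collapse into a single suffix entry.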
Some encouraging results. For the Mac and Linux6 I defined (in Hudson's web interface for slave configs) the following variables and restarted them, ran some short (partial) tests, and this seemed to fix a couple of issues. I've also defined the same variables on the Windows machine, but need you to restart that one. Please.

I set no_proxy to

localhost,build.eclipse.org,dev.eclipse.org,206.191.52.58,172.30.206.*,172.30.206.0,172.25.25.*,172.25.25.0,hudson.eclipse.org,127.0.0.1,*.eclipse.org,eclipse.org,git.eclipse.org,download.eclipse.org

The "fix" I saw here is that, on the Mac, the initial wget (to git) no longer went through the proxy, as it did when I first noted it in comment 18.

I removed ANT_ARGS (and the standalone http.nonproxyhosts) and defined these three, all with the same value:

ANT_OPTS
JAVA_ARGS
JVM_OPTS

-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="localhost|build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org|git.eclipse.org|download.eclipse.org" -Dhttps.nonProxyHosts="localhost|build.eclipse.org|dev.eclipse.org|206.191.52.58|172.30.206.*|172.30.206.0|172.25.25.*|172.25.25.0|hudson.eclipse.org|127.0.0.1|*.eclipse.org|eclipse.org|git.eclipse.org|download.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"

again, one long line, no EOLs.

Here, the "fix" was for bug 381661, in my quick and partial test. The CVS test could (finally) reach the "hudson" server where the test data was stashed. So, I am optimistic that once the Windows machine is restarted, it will fix the "localhost" problems we've been having with the help-info-center tests ... but you know ... windows ... hudson ... lots could still go wrong.

Two final comments:

A.
As I would watch the logs start up, I see some of these variables appear to be defined in .bashrc (or somewhere), as the variables would be printed out there (but with different values than I'd just put in). So ... I doubt we need them in both places, and the behavior of "combining" them may not be well defined (e.g. Hudson might ignore previous definitions, or it might ignore its own if the variables are already defined ... only a Hudson programmer would know).

B. Our current definitions are almost certainly overkill. If you ever think it important to reduce them to the minimum needed, let me know and we could try a few experiments, but I think the priority should be on getting it working first ... doing it better can come later.

So, if you'll restart the Windows slave, please, I'll test our problematic test case there.
(In reply to comment #29) > A. As I would watch the logs startup, I see some of these variables appear to > be defined in .bashrc (or, somewhere) as the variables would be printed out > there (but, with different values that I'd just put in). There is a 'global' .bashrc file that the hudson users on the slaves load, since in the past at least, the Hudson environment vars didn't always appear to job grand-children. > So, if you'll restart windows slave, please, I'll test our problematic test > case there. Done. -M.
(In reply to comment #30) > (In reply to comment #29) > > > > A. As I would watch the logs startup, I see some of these variables appear to > > be defined in .bashrc (or, somewhere) as the variables would be printed out > > there (but, with different values that I'd just put in). > > There is a 'global' .bashrc file that the hudson users on the slaves load, > since in the past at least, the Hudson environment vars didn't always appear to > job grand-children. > > > So, if you'll restart windows slave, please, I'll test our problematic test > > case there. > > Done. > Thanks Matt. This half helped. The initial "wget" to "git.eclipse.org" no longer goes through the proxy. So that's good. (but, after all, that 'wget' program is not a built in windows native program, but derives from linux :) But, the "help tests" still fail, as though there's a problem with "localhost" or ports, such as java.io.FileNotFoundException: http://localhost:59060/help/toc?lang=en at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1610) at java.net.URL.openStream(URL.java:1035) .... Maybe this is one of those cases these "environment variables" have to be defined as Windows "system variables". I'll try that next ... with any luck my RDP connection will stay up long enough to let me do that part of it. I'll just set those windows system variables to exactly the value we used in the hudson config. (FWIW, from the little logging I am able to (easily) do, it does appear the right values are being "seen" in the test log ... but, hard to tell when the test itself runs, perhaps they are not correctly passed through to ant (by Hudson) in which case the system variables might still help? I'd give it a 40% chance :/
> ... in which case the system > variables might still help? I'd give it a 40% chance :/ Ok, I've defined the Windows system variables. And, I did them as true "system variables". Not user variables. I still think, though, that only Hudson has to be restarted (no need to reboot the Windows machine ... though, that probably wouldn't hurt :) One odd thing though. I noticed in our test logs the variable ANT_ARGS, defined as it used to be, but I had removed it completely from the Hudson config. When I first saw it, I thought, well, it's probably defined as a System Variable. But, no ... it was not there. I added ANT_OPTS, JVM_OPTS, and JAVA_ARGS (and changed no_proxy). But, the fact that ANT_ARGS was showing up in our logs makes me wonder if there is a _third_ source of these values?! Perhaps in some Hudson startup script? If so, not sure if that will matter or not (i.e. which has "priority") but thought I'd mention that odd finding. So, after one more restart ... I'm giving up :)
BTW, when I RDP to the Windows slave and "ping localhost" it still comes back with the "::1" address ... that is, it sounds like the IPv6 changes you were going to make didn't "take". Not sure if something went wrong? Or is it just another sign that the Windows machine does need to be rebooted? I've never seen examples of it, but it seems that if localhost really was ::1, then we should add ::1 to the many versions of the "no proxy" settings.
I've rebooted the Windows slave. -M.
(In reply to comment #34) > I've rebooted the Windows slave. > > -M. Thanks Matt. But no joy. Our tests still fail in the same way. ping localhost still returns the IPv6 address, ::1, and listing all environment variables at the start of our tests still shows ANT_ARGS=-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org" which should no longer exist, given that slave's definition and the system variables I've defined. Something is wacky. If you can find where "ANT_ARGS" is defined (and remove it, along with any of the other variables that we have defined elsewhere), and figure out why it is still using the IPv6 address, given comment 23, there might be some hope. But, otherwise, I'd wipe the darn thing and start over :) [Just half kidding, but seriously, I've no faith this machine is working like it should and no one seems to understand why.]
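One thing worth noting about the ANT_ARGS value above: none of its nonProxyHosts lists names localhost. The following is only a rough sketch of the documented `http.nonProxyHosts` pattern rule ('|'-separated patterns, with '*' allowed as a leading or trailing wildcard). The class and method names are made up for illustration, the "extended" list is an assumed fix rather than the value actually configured on the slave, and the real JVM proxy selector may add its own defaults depending on the Java version.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or of the test framework.
class NonProxyHostsSketch {

    // Returns true if 'host' matches one of the '|'-separated patterns.
    // A '*' is honoured only as a leading or trailing wildcard.
    static boolean matches(String nonProxyHosts, String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String raw : nonProxyHosts.split("\\|")) {
            String p = raw.trim().toLowerCase(Locale.ROOT);
            if (p.isEmpty()) {
                continue;
            }
            if (p.equals("*") || p.equals(h)) {
                return true;
            }
            if (p.startsWith("*") && h.endsWith(p.substring(1))) {
                return true;
            }
            if (p.endsWith("*") && h.startsWith(p.substring(0, p.length() - 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Value of -Dhttp.nonProxyHosts as it appears in the logged ANT_ARGS.
        String fromAntArgs = "*.eclipse.org|172.30.206.*";
        // Assumed extension that names the loopback host explicitly.
        String extended = fromAntArgs + "|localhost|127.*";

        System.out.println(matches(fromAntArgs, "git.eclipse.org")); // true
        System.out.println(matches(fromAntArgs, "localhost"));       // false
        System.out.println(matches(extended, "localhost"));          // true
    }
}
```

If the old list really is the one in effect when the tests run, a request to localhost would not be exempted by it, which would be consistent with the help tests being sent to the proxy.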
> java.io.FileNotFoundException: http://localhost:59060/help/toc?lang=en Have we tried connecting to that URI from a web browser or a shell on the local machine? My experience with these things is that Java is dumb as bricks.
Ok, I removed all of the ANT and JAVA (and no_proxy) system vars. I needed the following (and a reboot) to remove IPv6: http://support.microsoft.com/kb/929852 And this to pull IPv6 from the loopback: http://www.blueimprint.com/2011/04/disabling-ipv6-loopback-on-windows-2008-server/ Based on the comments in the second link, it sounds like we aren't the first people to bump into this. -M.
(In reply to comment #36) > > java.io.FileNotFoundException: http://localhost:59060/help/toc?lang=en > > Have we tried connecting to that URI from a web browser or a shell on the local > machine? My experience with these things is that Java is dumb as bricks. I'll confess I haven't tried it, because a) my RDP connection only lasts about 5 to 10 seconds at a shot and b) it would only be expected to do anything when the test was running ... and it runs in about 22 seconds. (But, granted, some "special tests" could potentially be created to sit there and run for 10 minutes for the purpose of testing it, if needed). But, I'm banking on Matt's solution :) I'm rerunning the tests now. (Oh, and the test runs in 22 seconds, but takes 15-20 minutes to "set up" ... don't ask why :)
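Since the window for trying the URL by hand is so short, a small standalone probe might be easier than a browser. This is a minimal sketch only: the class name and the default URL are assumptions for illustration, and it is not part of org.eclipse.ua.tests. It requests the same URL twice, once forcing a direct connection with `Proxy.NO_PROXY` and once with whatever proxy settings the JVM has picked up, and prints the HTTP status for each.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

// Hypothetical diagnostic; names and defaults are illustrative only.
class LocalhostProbe {

    // Returns the HTTP status code, or -1 if no HTTP response was obtained.
    // Pass Proxy.NO_PROXY to force a direct connection, or null to let the
    // JVM apply its own proxy selection.
    static int probe(String url, Proxy proxy) {
        try {
            URL u = URI.create(url).toURL();
            HttpURLConnection c = (HttpURLConnection)
                    (proxy == null ? u.openConnection() : u.openConnection(proxy));
            c.setConnectTimeout(2000);
            c.setReadTimeout(2000);
            try {
                return c.getResponseCode();
            } finally {
                c.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            // Connection refused, timeout, malformed URL, and so on.
            return -1;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0
                ? args[0]
                : "http://localhost:59060/help/toc?lang=en"; // port taken from the failing test
        System.out.println("direct:  " + probe(url, Proxy.NO_PROXY));
        System.out.println("default: " + probe(url, null));
    }
}
```

One reason to look at the status code rather than the stream: `HttpURLConnection.getInputStream()` reports a 404 response as `FileNotFoundException`, so the exception in the test log suggests that something did answer the request. If the direct probe succeeds while the default one returns 404, the proxy settings would be the likely culprit.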
(In reply to comment #38) > (In reply to comment #36) > > > java.io.FileNotFoundException: http://localhost:59060/help/toc?lang=en > ... > But, I'm banking on Matt's solution :) I'm rerunning the tests now. No joy. Tests still fail in the same way. I confirmed pinging localhost returns 127.0.0.1, so there is at least one variable gone. But, the old "ANT_ARGS" still shows up as a system variable when I list them at the start of our tests. I just call "set" from the bat file that starts our tests, before doing anything of ours. (And, I did double check, and nothing in our setup mentions ANT_ARGS). I confirmed ANT_ARGS does NOT show up when running 'set' as the hudsonbuild user. But, socksnonProxyHosts still does ... but ... I doubt that's related to the failures. Just extra. But, ANT_ARGS is somehow coming from the Hudson process. And ... I did try http://localhost:59060/help/toc?lang=en in a web browser, when tests were NOT running ... figured best to confirm something else wasn't already running there, but got "not found" (as expected when tests are not running). So, that just leaves the mysterious "ANT_ARGS" as the only hint I know of. Not in the JNLP startup/setup script? Maybe Hudson "caches" it somehow, as an env. variable, from when the slave was first defined? (Would be odd, but you know Hudson). And, I'm not directly concerned about ANT_ARGS per se; as mentioned, we do not need it. I'm just concerned that maybe the other variables are defined/cached somewhere, and they are taking priority, somehow, in some "back door" way. If we need a "special test" that sets up and starts the unit test server, then waits while we try it, that'll have to wait for a week or two to find someone to do it. And, I'll confess ... who knows ... could be something else screwy like Windows path length limitations. I've glanced at where I think the test files are, and the paths didn't seem unreasonably long to me. But, not sure I know how/where to look.
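Calling "set" shows what the bat file sees, but not what the test JVM ends up with as system properties. As a complement, here is a minimal sketch (the class name is hypothetical) that could be run with the same JVM and from the same bat file. It prints only the environment variables and system properties whose name or value mentions "proxy", which would include ANT_ARGS because of its -Dhttp.proxyHost arguments.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic; not part of Hudson or of the Eclipse test framework.
class ProxySettingsDump {

    // Keeps the entries whose key or value mentions "proxy" (case-insensitive),
    // sorted by key so that two runs are easy to compare.
    static Map<String, String> proxyRelated(Map<String, String> all) {
        Map<String, String> out = new TreeMap<String, String>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            boolean inKey = key.toLowerCase(Locale.ROOT).contains("proxy");
            boolean inValue = value != null
                    && value.toLowerCase(Locale.ROOT).contains("proxy");
            if (inKey || inValue) {
                out.put(key, value);
            }
        }
        return out;
    }

    static void print(String title, Map<String, String> entries) {
        System.out.println("-- " + title + " --");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }

    public static void main(String[] args) {
        print("environment", proxyRelated(System.getenv()));

        // Copy the system properties into a String map before filtering.
        Map<String, String> props = new TreeMap<String, String>();
        for (String name : System.getProperties().stringPropertyNames()) {
            props.put(name, System.getProperty(name));
        }
        print("system properties", proxyRelated(props));
    }
}
```

Comparing the output from a Hudson-launched run with the output from an interactive run as the same user might show which values only appear when the job is started by Hudson.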
Not sure how to "find" or "grep" things on Windows, but if I did, I'd look for all occurrences of ANT_ARGS (unless you know whether there are some JNLP startup properties?) Again, just to make sure those other http.nonproxy values are not being defined somewhere else. Really appreciate the help and at least "solving" the mysterious IPv6 address issue.
(In reply to comment #39) > > So, that just leaves the mysterious "ANT_ARGS" as only hint I know of. Not in > JNLP startup/setup script? Maybe Hudson "caches" it somehow, as env. variable, If I log in as the Hudson user and issue 'set' in a command window I don't see ANT_ARGS listed, so I suspect it's coming from the master (where it is still set). -M.
Now jobs won't run on the windows slave with the old-fashioned usual error shortly after it starts ... so, guess it needs to be restarted again, just for the usual reasons: FATAL: remote file operation failed: c:\hb\workspace\JUnit-win2 at hudson.remoting.Channel@302d17ab:windows7tests hudson.util.IOException2: remote file operation failed: c:\hb\workspace\JUnit-win2 at hudson.remoting.Channel@302d17ab:windows7tests at hudson.FilePath.act(FilePath.java:754)
FWIW, I did try running the info center, directly on that Windows machine, with the same VM and port we use during the test. I was thinking this might help verify the "directory lengths" were ok. c:\java\jdk7u2\jre\bin\java -classpath C:\hb\workspace\JUnit-win2\workarea\I20120717-0800\eclipse-testing\test-eclipse\eclipse\plugins\org.eclipse.help.base_3.6.100.v201207170800.jar org.eclipse.help.standalone.Infocenter -command start -eclipsehome C:\hb\workspace\JUnit-win2\workarea\I20120717-0800\eclipse-testing\test-eclipse -port 59060 At first, I got the Windows firewall pop up "ok for Java to access the internet?" .... and I got all excited (but it ended up that didn't make any difference). Once I said "ok" (and, in fact, I went directly to Windows firewall and "permitted" all versions of Java to access the internet ... I think there's 4 or 5 versions on that machine), then I COULD start up the info center from the command line, and then I COULD access http://localhost:59060/help/toc?lang=en and other pages just fine. (But, remember, proxies are basically "off" now, when running from the command line, since the env. variables were removed ... guess I could make a bat file, setting them, and then see what happens). But, still, when I run the unit tests from Hudson, we get the same failures. I found that if I leave "show running processes" up and running, my RDP connection stays open longer :) so I "watched" as the test ran and didn't see any "pop up" messages about security, or anything. The only slightly odd thing I noticed was that the Eclipse workbench came up (not uncommon during unit tests) and after a few seconds, focus changed and the workbench said "not responding" (as though it was given a message to close, but was taking its own sweet time to close) ... but, in about 5 or 10 seconds it did close itself ... and the tests still showed as failed (not sure when they run with respect to the workbench opening, whether before or after).
I'll try with a bat file, setting the variables as we have in Hudson, and see if the info center still runs then.
> > I'll try with a bat file, setting the variables as we have in Hudson, and see > if the info center still runs then. Yes, still runs with proxy settings. I'm nearly at a loss of what to try next.
Created attachment 218962 [details] eclipse network preferences after hudson tests run Here's a significant finding. I thought to bring up the Eclipse IDE (the one used for the tests, after the tests run) on the "remote desktop". In the attached screen shot, you can see Eclipse is still "picking up" the OLD values. Not the new ones. I do not know if Eclipse Help even uses these values to resolve "localhost" ... I present them only as evidence that the old values are being used (presumably, by Hudson, when it runs the tests). So, perhaps under the covers, Hudson gives priority to the values it gets from the master ... as previously mentioned, it appears ANT_ARGS and its old values are coming from master. I'd suggest a good next step is to fix all the values on master, and restart it all. I think we've demonstrated the current values of the many "nonproxy" variables work well (e.g. they helped "wget" to no longer go through proxies, and helped cvs to "get" to hudson.eclipse.org without going through the proxy) so I think we'd want those improved values on master anyway? Not sure why Windows seems so much different than the others (Mac and Linux) but ... it's not surprising. As mentioned in http://wiki.eclipse.org/Hudson#Configuring_a_proxy_for_the_p2_director there is a way for us to "insert" these preferences into Eclipse ... and if we were using p2 we might have to do that ... but, I'm not sure they would be used by the appserver that serves up help. (I say "might have to do that for p2", because, note, these are marked "native", which means there is some Eclipse code that is inferring these from the operating system/environment.) So, I was thinking to fix on master would be a good next step to try?
> So, I was thinking to fix on master would be a good next step to try? And, if the variables on master were edited today, Friday, perhaps the auto-restart on Saturday night (Sunday AM) would suffice? Not sure how much trouble it is to restart everything, and I guess it would be better if someone was around for the restart after the variable changes (just in case something goes wrong) ... but, I wanted to suggest something that might simplify your workload, as well as be a squeaky wheel :)
Created attachment 219031 [details] eclipse network preferences after adjusting windows "internet options" Since there were no changes after the weekend restart, I looked at this again. In fact, I even searched bugs for issues/solutions. There was no bug directly related to this (in fact, there were many complaints that Eclipse's proxy settings were NOT used for Help) BUT ... reading the bugs reminded me that, in Windows, there is a place to change "Internet Options" in the "Control Panel". And, sure enough, there I could "see" the old values I've been talking about. So, there, I changed the "exception list" to our expanded list (such as where "localhost" is named explicitly, also added git.eclipse.org, etc.) and checked the box "bypass for local addresses" and 1) it did affect the "native" settings that Eclipse detects, as shown in the attached image, and 2) more important, allowed our "help" unit tests to run against "localhost"! Yes, that's right ... bug fixed! I'm sure there might be other ways to "fix" this settings issue (such as not setting anything in "Internet Options" and letting env. variables rule) but ... it works, I'm happy. Key, on Windows, is "Internet Options".
Changed title to better reflect what turned out to be the problem (instead of "able to access services in ephemeral ports range") BTW, I am ASSUMING these "Internet Options" were just set long ago, were not quite right, and no one thought to check them. That is, I am ASSUMING there's nothing that we or Hudson do to change them via variables, etc ... but, if it starts to fail after a restart or something, we might have to revisit that assumption :)
> Yes, that's right ... bug fixed! Congratulations! This one was wearing me out, and I didn't even do anything. > I'm sure there might be other ways to "fix" this settings issue (such as not > set anything in "Internet Options" and let env. variables rule) but ... it > works, I'm happy. Key, on windows, is "Internet Options". I'm not a Windows user, but I had always assumed that Internet Options were only used by MS apps, like Internet Explorer and Outlook Express, etc. Good find.