| Summary: | Scout HIPP Help | ||
|---|---|---|---|
| Product: | Community | Reporter: | Stephan Leicht Vogt <stephan.leichtvogt> |
| Component: | CI-Jenkins | Assignee: | CI Admin Inbox <ci.admin-inbox> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | major | ||
| Priority: | P3 | CC: | mikael.barbero, pierre-charles.david, webmaster |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Mac OS X | ||
| Whiteboard: | |||
| Attachments: | |||
|
Description
Stephan Leicht Vogt
I get ERR_CONNECTION_REFUSED from my browser (Chromium) on all https://hudson.eclipse.org/ sites (tried with https://hudson.eclipse.org/sirius, https://hudson.eclipse.org/scout/ and https://hudson.eclipse.org/ itself). curl tells me this:

% curl 'https://hudson.eclipse.org/'
curl: (7) Failed to connect to hudson.eclipse.org port 443: Connection refused

I am working on it. It seems there is an issue in the proxy server that serves all HIPPs. The HIPPs are safe and running, but not available from outside Eclipse's infra.

The proxy server has been restarted and everything seems to work properly now.

It still doesn't work for me.

(In reply to Stephan Leicht Vogt from comment #4)
> It still doesn't work for me.

Your instance was corrupted. I had to clean the caches and restart your HIPP. It seems to work again. Please reopen if you see any weird behavior.

(In reply to Mikael Barbero from comment #5)
> (In reply to Stephan Leicht Vogt from comment #4)
> > It still doesn't work for me.
>
> Your instance was corrupted. I had to clean the caches and restart your
> HIPP. It seems to work again. Please reopen if you see any weird behavior.

Looks good now, thanks.

(In reply to Mikael Barbero from comment #5)
> (In reply to Stephan Leicht Vogt from comment #4)
> > It still doesn't work for me.
>
> Your instance was corrupted. I had to clean the caches and restart your
> HIPP. It seems to work again. Please reopen if you see any weird behavior.

I think we have the same issue again. I can't access the Scout HIPP. First, downloads of Maven artifacts got stuck, then I restarted the HIPP instance via my account page. Now it seems unable to fully reboot.

I killed a bunch of Maven jobs and restarted your HIPP. It seems to work now.

I see that the Maven configuration is quite different from what we use on other HIPPs. I suggest reverting to the standard installation. Should I do that? It won't break any of your builds.
(In reply to Mikael Barbero from comment #9)

Thanks for the quick fix.

> I see that the Maven configuration is quite different from what we use
> on other HIPPs. I suggest reverting to the standard installation. Should I
> do that? It won't break any of your builds.

Yes, please do that. I am a fan of a homogeneous infrastructure.

(In reply to Stephan Leicht Vogt from comment #10)
> (In reply to Mikael Barbero from comment #9)
>
> Thanks for the quick fix.

You're welcome.

> > I see that the Maven configuration is quite different from what we use
> > on other HIPPs. I suggest reverting to the standard installation. Should I
> > do that? It won't break any of your builds.
>
> Yes, please do that. I am a fan of a homogeneous infrastructure.

Done. I updated the configuration to a more standard one (as we use on other HIPP servers). I am closing this bug; please reopen if the issue comes up again.

(In reply to Stephan Leicht Vogt from comment #0)
> After restarting Scout HIPP instance (https://hudson.eclipse.org/scout/)
> only "Service Unavailable" is resulting.

I have this problem again. Please "clean the caches and restart your HIPP".

Done. A job had gone wild and was not killed properly when I stopped the parent Hudson process: "org.eclipse.scout.rt_deploy_from_tag". I had to kill it manually. I don't know if it's related, but I thought it was worth letting you know.

(In reply to Mikael Barbero from comment #13)

Thanks

I have this problem again. Please "clean the caches and restart your HIPP".

(In reply to Stephan Leicht Vogt from comment #15)
> I have this problem again. Please "clean the caches and restart your HIPP".

I restarted the HIPP because Maven downloads from Maven Central hung until timeout. So the restart was a desperate attempt to solve or work around this issue.

Done. This time I've seen nothing abnormal (no wild process). It's really weird because you're the only one with such issues with Hudson on your server.

(In reply to Mikael Barbero from comment #17)
> Done. This time I've seen nothing abnormal (no wild process). It's really
> weird because you're the only one with such issues with Hudson on your
> server.

Thank you for the help. Do we have more traffic/load... than others?

However, the build still gets stuck downloading from Central: https://hudson.eclipse.org/scout/job/org.eclipse.scout.rt_5_2_and_higher_gerrit/78/console

@Mikael: Can you help us with the issue of 'stuck maven central download'?

It seems to be an HTTPS proxy issue. Matt, Denis, do you see anything weird on the proxy side?

I don't see anything in the proxy error logs that looks related, but it looks like https://repo.maven.apache.org/maven2 redirects to https://repo1.maven.org/maven2/ which 403s with a notice to use https://repo1.maven.org . It also appears to have a bad SSL cert domain name.

-M.

I was able to work around this problem with a Groovy script in https://hudson.eclipse.org/scout/script which downloads these files to the local Maven repository. I don't understand why this works while the same download from Maven does not...

Created attachment 258562 [details]
GroovyScript which downloads some files from maven central
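The attached Groovy script is not reproduced in this report. As a rough illustration of the same kind of workaround, the sketch below fetches a single artifact straight into the local Maven repository with curl. The function names `maven_path` and `fetch_artifact`, the `M2_REPO` variable, and the default repository location `~/.m2/repository` are assumptions made for this example; they are not taken from the attachment.

```shell
#!/bin/sh
# Hypothetical helper: map Maven coordinates to the repository-relative path
# shared by Maven Central and the local repository layout.
maven_path() {
    group_id=$1
    artifact_id=$2
    version=$3
    ext=${4:-jar}
    # Dots in the groupId become directory separators.
    group_dir=$(printf '%s' "$group_id" | tr '.' '/')
    printf '%s/%s/%s/%s-%s.%s\n' \
        "$group_dir" "$artifact_id" "$version" "$artifact_id" "$version" "$ext"
}

# Hypothetical helper: download one artifact into the local repository,
# bypassing Maven's own transport. M2_REPO is an assumed override variable.
fetch_artifact() {
    rel=$(maven_path "$@") || return 1
    dest="${M2_REPO:-$HOME/.m2/repository}/$rel"
    mkdir -p "$(dirname "$dest")" &&
        curl --fail --silent --show-error --location \
            --output "$dest" "https://repo1.maven.org/maven2/$rel"
}
```

A call would look like `fetch_artifact com.example my-artifact 1.0`, where the coordinates are placeholders. Note that this only copies the main file; a real workaround would probably also need the matching `.pom` (pass `pom` as the fourth argument).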
Can I wipe your workspace to try something and see if a clean build manages to download from Maven?

(In reply to Mikael Barbero from comment #25)
> Can I wipe your workspace to try something and see if a clean build manages
> to download from maven?

Sure, anything that gives us a hint about what the problem may be.

I've added a MAVEN_OPTS env variable with the proxy configuration and it seems to help. The build is green. Please reopen if you still face the issue. In the meantime, we are investigating the state of the machine your HIPP is running on to try to resolve your recurring issues.

(In reply to Mikael Barbero from comment #27)

Thank you.

Hi, after a restart we are again faced with "Service Unavailable". Please help.

Thanks
Stephan

(In reply to Stephan Leicht Vogt from comment #29)

Mh, it works again.

I've just killed all rogue processes and restarted the HIPP. That's why it works again. I will attach the list of rogue processes shortly.

(In reply to Mikael Barbero from comment #31)
> I've just killed all rogue processes and restarted the HIPP. That's why it
> works again. I will attach the list of rogue processes shortly.

Ah, ok. Thanks for the quick help!

Created attachment 258868 [details]
List of processes that did not stop when I killed the parent Hudson process
Please check the associated jobs
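The attached process list is not reproduced here. As a minimal sketch of how such leftover processes might be spotted, the filter below picks out java processes whose parent is PID 1. It assumes that a child is re-parented to PID 1 once the Hudson parent has been killed, which does not hold on every system (for example where a subreaper adopts orphans). The function name `orphaned_java_pids` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read "pid ppid comm" triples on stdin and print the
# PIDs of java processes whose parent is PID 1 (assumed to mean "orphaned").
orphaned_java_pids() {
    # Match a command named "java" or a full path ending in "/java".
    awk '$2 == 1 && $3 ~ /(^|\/)java$/ { print $1 }'
}
```

It could be fed from `ps -A -o pid=,ppid=,comm= | orphaned_java_pids`; the resulting PIDs should still be checked against the associated jobs before anything is killed.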
We have many jobs which lock up while running a Maven plugin:

*** from https://hudson.eclipse.org/scout/job/org.eclipse.scout.rt_5_2_and_higher_gerrit/217/console ***
[INFO] --- animal-sniffer-maven-plugin:1.14:check (enforce-java-api-compatibility) @ org.eclipse.scout.rt.server.jdbc.test ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0

But I'm still unsure whether this specific plugin is the culprit or whether the lock has another source. So we abort the stuck jobs, which may result in all those rogue processes.

Yes, every time I abort a job or it runs into a build timeout (https://hudson.eclipse.org/scout/job/org.eclipse.scout.rt_5_2_and_higher_gerrit/217/console) a process keeps running.

Can you create stack dumps (jstack) of every Java process currently running? And a heap dump of the youngest Java process (it should be a Maven process).

Created attachment 258882 [details]
stacktraces of running java processes

You should be able to download the jmap output from https://hudson.eclipse.org/scout/job/test-webmaster/ws/31776.jmap.bin

It was a combination of Surefire 2.19, which has this issue: https://issues.apache.org/jira/browse/SUREFIRE-1182, and the Surefire property "forkedProcessTimeoutInSeconds" being set to 30 seconds. It looks like a fork that runs into this 30-second timeout stops but doesn't release a lock. I have now downgraded to 2.18.1 and raised the timeout to 300 seconds.

Thanks for the analysis. I'm glad the infra is not part of the issue ;) |
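For reference, the stack and heap dumps requested in this thread can be produced with the JDK's `jstack` and `jmap` tools. The sketch below only prints the commands for review instead of running them; the function name `dump_commands`, the output directory, and the file naming are assumptions for illustration (the `.jmap.bin` suffix mirrors the file name mentioned above).

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the diagnostic commands.
# Usage: dump_commands OUT_DIR YOUNGEST_PID PID...
# Emits one jstack thread dump per PID and one jmap heap dump for YOUNGEST_PID.
dump_commands() {
    out_dir=$1
    youngest=$2
    shift 2
    for pid in "$@"; do
        printf 'jstack %s > %s/%s.jstack.txt\n' "$pid" "$out_dir" "$pid"
    done
    printf 'jmap -dump:format=b,file=%s/%s.jmap.bin %s\n' \
        "$out_dir" "$youngest" "$youngest"
}
```

On a machine with `pgrep`, the arguments could come from `dump_commands /tmp/dumps "$(pgrep -n -x java)" $(pgrep -x java)`, where `pgrep -n` selects the most recently started match. Both tools generally need to run as the same user as the target JVM.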