
Bug 482271

Summary: Scout HIPP Help
Product: Community
Reporter: Stephan Leicht Vogt <stephan.leichtvogt>
Component: CI-Jenkins
Assignee: CI Admin Inbox <ci.admin-inbox>
Status: RESOLVED WORKSFORME
QA Contact:
Severity: major
Priority: P3
CC: mikael.barbero, pierre-charles.david, webmaster
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Mac OS X
Whiteboard:
Attachments (Description, Flags):
GroovyScript which downloads some files from maven central (flags: none)
List of processes that did not stop when I killed the parent Hudson process (flags: none)
stacktraces of running java processes (flags: none)

Description Stephan Leicht Vogt CLA 2015-11-16 07:07:23 EST
After restarting the Scout HIPP instance (https://hudson.eclipse.org/scout/), it only returns "Service Unavailable".
Comment 1 Pierre-Charles David CLA 2015-11-16 07:38:02 EST
I get ERR_CONNECTION_REFUSED from my browser (Chromium) on all https://hudson.eclipse.org/ sites (tried with https://hudson.eclipse.org/sirius, https://hudson.eclipse.org/scout/ and https://hudson.eclipse.org/ itself).

curl tells me this:
% curl 'https://hudson.eclipse.org/'
curl: (7) Failed to connect to hudson.eclipse.org port 443: Connection refused
Comment 2 Mikaël Barbero CLA 2015-11-16 07:39:53 EST
I am working on it. It seems there is an issue in the proxy server that serves all hipps. HIPPs are safe and running, but not available from outside eclipse's infra.
Comment 3 Mikaël Barbero CLA 2015-11-16 08:24:23 EST
The proxy server has been restarted and everything seems to work properly now.
Comment 4 Stephan Leicht Vogt CLA 2015-11-16 08:52:02 EST
It still doesn't work for me.
Comment 5 Mikaël Barbero CLA 2015-11-16 09:20:40 EST
(In reply to Stephan Leicht Vogt from comment #4)
> It still doesn't work for me.

Your instance was corrupted. I had to clean the caches and restart your HIPP. It seems to work again. Please reopen if you see any weird behavior.
Comment 6 Stephan Leicht Vogt CLA 2015-11-16 09:28:58 EST
(In reply to Mikael Barbero from comment #5)
> (In reply to Stephan Leicht Vogt from comment #4)
> > It still doesn't work for me.
> 
> Your instance was corrupted. I had to clean the caches and restart your
> HIPP. It seems to work again. Please reopen if you see any weird behaviors.

Looks good now, thanks.
Comment 7 Stephan Leicht Vogt CLA 2015-11-18 04:28:56 EST
(In reply to Mikael Barbero from comment #5)
> (In reply to Stephan Leicht Vogt from comment #4)
> > It still doesn't work for me.
> 
> Your instance was corrupted. I had to clean the caches and restart your
> HIPP. It seems to work again. Please reopen if you see any weird behaviors.

I think we have the same issue again. I can't access the scout hipp.
Comment 8 Stephan Leicht Vogt CLA 2015-11-18 04:31:07 EST
First, downloads of Maven artifacts got stuck, so I restarted the HIPP instance via my account page. Now it does not seem to be able to fully reboot.
Comment 9 Mikaël Barbero CLA 2015-11-18 04:38:03 EST
I killed a bunch of Maven jobs and restarted your HIPP. It seems to work now. I see that the Maven configuration is quite different from what we use on other HIPPs. I suggest reverting to the standard installation. Should I do that? It won't break any of your builds.
Comment 10 Stephan Leicht Vogt CLA 2015-11-18 04:39:57 EST
(In reply to Mikael Barbero from comment #9)

Thanks for the quick fix.

> I see that the maven configuration is quite different from what we used
> to do in other HIPP. I suggest to revert to standard installation. Should I
> do that? It won't break any of your build.

Yes, please do that. I am a fan of a homogeneous infrastructure.
Comment 11 Mikaël Barbero CLA 2015-11-18 04:46:33 EST
(In reply to Stephan Leicht Vogt from comment #10)
> (In reply to Mikael Barbero from comment #9)
> 
> Thanks for the quick fix.

You're welcome

> 
> > I see that the maven configuration is quite different from what we used
> > to do in other HIPP. I suggest to revert to standard installation. Should I
> > do that? It won't break any of your build.
> 
> Yes, please do that. I am a fan of a homogeneous infrastructure.

Done. I updated the configuration to a more standard one (as we use on other HIPP servers). I am closing this bug, please reopen if the issue comes up again.
Comment 12 Stephan Leicht Vogt CLA 2015-11-30 04:42:05 EST
(In reply to Stephan Leicht Vogt from comment #0)
> After restarting Scout HIPP instance (https://hudson.eclipse.org/scout/)
> only "Service Unavailable" is resulting.

I have this problem again. Please "clean the caches and restart your HIPP".
Comment 13 Mikaël Barbero CLA 2015-11-30 04:53:38 EST
Done. A job had gone wild and was not killed properly when I stopped the parent Hudson process: "org.eclipse.scout.rt_deploy_from_tag". I had to kill it manually. I don't know if it's related, but I thought it was worth letting you know.
Comment 14 Stephan Leicht Vogt CLA 2015-11-30 04:54:37 EST
(In reply to Mikael Barbero from comment #13)
Thanks
Comment 15 Stephan Leicht Vogt CLA 2015-12-09 07:48:59 EST
I have this problem again. Please "clean the caches and restart your HIPP".
Comment 16 Stephan Leicht Vogt CLA 2015-12-09 07:50:05 EST
(In reply to Stephan Leicht Vogt from comment #15)
> I have this problem again. Please "clean the caches and restart your HIPP".

I restarted the HIPP because Maven downloads from Maven Central hung until they timed out. The restart was a desperate attempt to solve or work around this issue.
Comment 17 Mikaël Barbero CLA 2015-12-09 08:03:38 EST
Done. This time I've seen nothing abnormal (no wild process). It's really weird because you're the only one with such issues with Hudson on your server.
Comment 18 Stephan Leicht Vogt CLA 2015-12-09 08:10:33 EST
(In reply to Mikael Barbero from comment #17)
> Done. This time I've seen nothing abnormal (no wild process). It's really
> weird because you're the only one with such issues with Hudson on your
> server.

Thank you for the help. Do we have more traffic/load... than others?
Comment 19 Stephan Leicht Vogt CLA 2015-12-09 08:12:27 EST
The build still gets stuck when downloading from Central, though: https://hudson.eclipse.org/scout/job/org.eclipse.scout.rt_5_2_and_higher_gerrit/78/console
Comment 20 Stephan Leicht Vogt CLA 2015-12-09 09:09:35 EST
@Mikael: Can you help us with the issue of 'stuck maven central download'?
Comment 21 Mikaël Barbero CLA 2015-12-09 09:51:51 EST
It seems to be an HTTPS proxy issue.

Matt, Denis, do you see anything weird on the proxy side?
Comment 22 Eclipse Webmaster CLA 2015-12-09 14:02:21 EST
I don't see anything in the proxy error logs that looks related, but it looks like https://repo.maven.apache.org/maven2 redirects to https://repo1.maven.org/maven2/ which 403s with a notice to use https://repo1.maven.org . It also appears to have a bad SSL cert domain name.

-M.
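
The redirect chain described in comment 22 can be inspected from a shell. The following is only an illustrative sketch, not the command the webmaster ran: `redirect_chain` is a hypothetical helper name, and it assumes `curl`, `tr` and `awk` are available. It reduces the response headers of each hop to the status code and the `Location` target.

```shell
#!/bin/sh
# Sketch only: redirect_chain is a hypothetical helper, not part of any tool.
# It reads HTTP response headers on stdin (one block per hop) and prints
# the status code and the redirect target of each hop.
redirect_chain() {
    # Strip carriage returns first, because HTTP header lines end in CRLF.
    tr -d '\r' | awk '
        /^HTTP\// { print "status " $2 }
        tolower($1) == "location:" { print "location " $2 }
    '
}
```

Assuming network access, it could be fed with `curl -sIL https://repo.maven.apache.org/maven2/ | redirect_chain`, where `-I` requests headers only and `-L` follows redirects.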
Comment 23 Stephan Leicht Vogt CLA 2015-12-10 02:00:15 EST
I was able to work around this problem with a Groovy script run in https://hudson.eclipse.org/scout/script, which downloads these files to the local Maven repository. I don't understand why this works from the script but not from Maven...
Comment 24 Stephan Leicht Vogt CLA 2015-12-10 02:01:22 EST
Created attachment 258562 [details]
GroovyScript which downloads some files from maven central
Comment 25 Mikaël Barbero CLA 2015-12-10 04:24:50 EST
Can I wipe your workspace to try something and see if a clean build manages to download from maven?
Comment 26 Stephan Leicht Vogt CLA 2015-12-10 04:30:28 EST
(In reply to Mikael Barbero from comment #25)
> Can I wipe your workspace to try something and see if a clean build manages
> to download from maven?

Sure, anything that gives us any hint what the problem may be.
Comment 27 Mikaël Barbero CLA 2015-12-10 05:29:58 EST
I've added a MAVEN_OPTS environment variable with the proxy configuration and it seems to help. The build is green. Please reopen if you still face the issue.

In the meantime, we are investigating the state of the machine your HIPP is running on to try to resolve your recurrent issues.
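
The exact MAVEN_OPTS value from comment 27 is not recorded in this report. As a rough sketch of what such a setting can look like, the snippet below builds the standard JVM networking properties (`http.proxyHost`, `http.proxyPort`, `https.proxyHost`, `https.proxyPort`). The helper name `maven_proxy_opts`, the host `proxy.example.org` and the port `3128` are placeholders, not the Eclipse infrastructure's real values. Whether Maven honors these properties can depend on the Maven version and transport; a `<proxies>` entry in settings.xml is the other usual route.

```shell
#!/bin/sh
# Sketch only: host and port are placeholders, not the values used in comment 27.

# Print JVM proxy flags for the given proxy host ($1) and port ($2).
maven_proxy_opts() {
    proxy_host=$1
    proxy_port=$2
    printf '%s' "-Dhttp.proxyHost=${proxy_host} -Dhttp.proxyPort=${proxy_port} -Dhttps.proxyHost=${proxy_host} -Dhttps.proxyPort=${proxy_port}"
}

# Export the flags so that any mvn started from this shell picks them up.
MAVEN_OPTS="$(maven_proxy_opts proxy.example.org 3128)"
export MAVEN_OPTS
echo "MAVEN_OPTS=${MAVEN_OPTS}"
```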
Comment 28 Stephan Leicht Vogt CLA 2015-12-10 05:41:57 EST
(In reply to Mikael Barbero from comment #27)
Thank you.
Comment 29 Stephan Leicht Vogt CLA 2015-12-23 05:09:57 EST
Hi, after a restart we are again faced with "Service Unavailable". Please help.

Thanks
Stephan
Comment 30 Stephan Leicht Vogt CLA 2015-12-23 05:21:56 EST
(In reply to Stephan Leicht Vogt from comment #29)
Mh, it works again.
Comment 31 Mikaël Barbero CLA 2015-12-23 05:22:59 EST
I've just killed all rogue processes and restarted the HIPP. That's why it works again. I will attach the list of rogue processes shortly.
Comment 32 Stephan Leicht Vogt CLA 2015-12-23 05:25:41 EST
(In reply to Mikael Barbero from comment #31)
> I've just killed all rogue processes and restarted the HIPP. That's why it
> works again. I will attach the list of rogue processes shortly.

Ah, ok. Thanks for the quick help!
Comment 33 Mikaël Barbero CLA 2015-12-23 05:27:26 EST
Created attachment 258868 [details]
List of processes that did not stop when I killed the parent Hudson process

Please check the associated jobs
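
A list like the one attached in comment 33 could be approximated as follows. This is a sketch under assumptions, not the command actually used: it assumes that processes orphaned by the killed Hudson parent are re-parented to PID 1 (typical on Linux, but not guaranteed where a subreaper is in use), and `filter_orphaned_java` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes orphaned children end up with parent PID 1.

# Read "pid ppid command..." lines on stdin and keep the Java/Maven
# processes whose parent PID (second column) is 1.
filter_orphaned_java() {
    awk '$2 == 1 && /java|mvn/ { print }'
}

# List candidates on the current machine, if ps is available.
if command -v ps >/dev/null 2>&1; then
    ps -e -o pid= -o ppid= -o args= 2>/dev/null | filter_orphaned_java || true
fi
```

Each PID printed this way would still need to be checked against its job before being killed.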
Comment 34 Stephan Leicht Vogt CLA 2015-12-23 07:03:07 EST
We have many jobs that hang while running a Maven plugin:

***
from https://hudson.eclipse.org/scout/job/org.eclipse.scout.rt_5_2_and_higher_gerrit/217/console
***
[INFO] --- animal-sniffer-maven-plugin:1.14:check (enforce-java-api-compatibility) @ org.eclipse.scout.rt.server.jdbc.test ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0

But I'm still unsure whether this specific plugin is the culprit or whether the hang has another source.

So we abort the stuck jobs, which may be what leaves all those rogue processes behind.
Comment 35 Stephan Leicht Vogt CLA 2015-12-23 07:54:35 EST
Yes, every time I abort a job or it runs into a build timeout (https://hudson.eclipse.org/scout/job/org.eclipse.scout.rt_5_2_and_higher_gerrit/217/console) a process keeps running.

Can you create stack dumps (jstack) of every java process currently running?

And also a heap dump of the youngest Java process (which should be a Maven process)?
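
For reference, the dumps requested above can be produced with the JDK's `jstack` and `jmap` tools. The sketch below is illustrative only: the helper names `thread_dump` and `heap_dump` and the output file names are assumptions, and both tools generally have to run as the same user as the target JVM.

```shell
#!/bin/sh
# Sketch only: helper names and output file names are hypothetical.

# Write a thread dump of JVM $1 to $2/<pid>.jstack.txt.
thread_dump() {
    jstack "$1" > "$2/$1.jstack.txt"
}

# Write a binary heap dump of JVM $1 to $2/<pid>.jmap.bin.
heap_dump() {
    jmap -dump:format=b,file="$2/$1.jmap.bin" "$1"
}
```

Assuming `pgrep` is available, `for pid in $(pgrep java); do thread_dump "$pid" /tmp; done` would cover every running Java process, and `heap_dump "$(pgrep -n java)" /tmp` would target the most recently started one.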
Comment 36 Mikaël Barbero CLA 2015-12-23 08:26:44 EST
Created attachment 258882 [details]
stacktraces of running java processes

You should be able to download the jmap output from https://hudson.eclipse.org/scout/job/test-webmaster/ws/31776.jmap.bin
Comment 37 Stephan Leicht Vogt CLA 2015-12-23 10:01:30 EST
It was a combination of Surefire 2.19, which has this issue: https://issues.apache.org/jira/browse/SUREFIRE-1182, and the Surefire property "forkedProcessTimeoutInSeconds" being set to 30 seconds. It looks like a fork that runs into this 30-second timeout stops but doesn't release a lock.

I have now downgraded to 2.18.1 and raised the timeout to 300 seconds.
Comment 38 Mikaël Barbero CLA 2015-12-23 10:03:34 EST
Thanks for the analysis. I'm glad the infra is not part of the issue ;)