Bug 517029 - HIPP3 is unavailable
Summary: HIPP3 is unavailable
Status: CLOSED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: All All
Importance: P3 critical
Target Milestone: ---
Assignee: CI Admin Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 516950 517433 518106 518112 518115 519731 519752
Depends on:
Blocks:
 
Reported: 2017-05-21 04:06 EDT by Viktoria Dlugopolskaya CLA
Modified: 2017-09-26 03:42 EDT
CC List: 13 users

See Also:


Description Viktoria Dlugopolskaya CLA 2017-05-21 04:06:24 EDT
Hi,

Cannot open https://hudson.eclipse.org/rcptt/
Comment 1 Frederic Gurr CLA 2017-05-22 07:22:49 EDT
*** Bug 516950 has been marked as a duplicate of this bug. ***
Comment 2 Frederic Gurr CLA 2017-05-22 13:20:07 EDT
hipp3 and RCPTT HIPP are back online.
Comment 3 Viktoria Dlugopolskaya CLA 2017-05-22 14:01:50 EDT
https://hudson.eclipse.org/rcptt/ is unavailable again
Comment 4 Viktoria Dlugopolskaya CLA 2017-05-22 14:25:16 EDT
It is available now.

Are there any problems with the hipp? It is very difficult to use this instance when it is unstable.
Comment 5 Frederic Gurr CLA 2017-05-22 18:07:25 EDT
The host machine (hipp3) is having issues. We are still investigating.

RCPTT HIPP is online, therefore closing.
Comment 6 Viktoria Dlugopolskaya CLA 2017-05-25 00:50:01 EDT
https://hudson.eclipse.org/rcptt/ is unavailable now, error message is shown:

> This CI instance is currently unavailable. It may be turned off, or it may be unresponsive. Members of the project can restart this service using the HIPP Control tools in their Eclipse Foundation account (login required).

> If the problem persists, please contact the project team on their forum or file a bug.
Comment 7 Eclipse Webmaster CLA 2017-05-25 16:50:05 EDT
Hipp3 is experiencing an issue that is generating general protection faults (GPFs); we are investigating.

-M.
Comment 8 Viktoria Dlugopolskaya CLA 2017-05-30 09:12:30 EDT
Currently unavailable
Comment 9 Derek Toolan CLA 2017-05-30 09:31:09 EDT
*** Bug 517433 has been marked as a duplicate of this bug. ***
Comment 10 Derek Toolan CLA 2017-05-30 09:32:20 EDT
Hipp3 rebooted itself again about 30 minutes ago which caused the downtime.  Same situation as comment 7, and we are still investigating.

In the meantime, the hipps have been brought back online
Comment 11 Viktoria Dlugopolskaya CLA 2017-05-30 09:34:44 EDT
Thanks for clarification
Comment 12 Matthias Sohn CLA 2017-05-30 09:41:16 EDT
(In reply to Derek Toolan from comment #10)
> Hipp3 rebooted itself again about 30 minutes ago which caused the downtime. 
> Same situation as comment 7, and we are still investigating.
> 
> In the meantime, the hipps have been brought back online

I still can't reach the JGit HIPP. Did it crash again ?
Comment 13 Derek Toolan CLA 2017-05-30 10:13:01 EDT
(In reply to Matthias Sohn from comment #12)
> (In reply to Derek Toolan from comment #10)
> > Hipp3 rebooted itself again about 30 minutes ago which caused the downtime. 
> > Same situation as comment 7, and we are still investigating.
> > 
> > In the meantime, the hipps have been brought back online
> 
> I still can't reach the JGit HIPP. Did it crash again ?

It did, and it's the same fault again. I brought the hipps back online once again.
Comment 14 Matthias Sohn CLA 2017-05-30 10:22:28 EDT
I could reach it for a short time and now requests are timing out so it seems to be gone again
Comment 15 Matthias Sohn CLA 2017-05-30 10:24:59 EDT
Retrying once more, it seems to be responding, but very slowly.
Comment 16 Eclipse Webmaster CLA 2017-05-30 10:46:48 EDT
The system became unresponsive so we've had to completely restart it.  I'm currently running some tests on the memory to try to isolate the issue.  Once the tests are complete we'll restart the instances hosted on this node.

-M.
Comment 17 Matthias Sohn CLA 2017-05-30 18:46:13 EDT
Any ETA when we can expect Hudson to be back online ?
Comment 18 Matthias Sohn CLA 2017-05-30 18:56:35 EDT
the second retry to restart the JGit HIPP succeeded so it's back online
Comment 19 Eclipse Webmaster CLA 2017-05-31 09:11:28 EDT
After several hours the system restarted itself while running the memory tests.  While not conclusive, we're going to go forward presuming that some of the RAM is bad.  I'll have to try and find a donor system so we can try replacing the RAM.

All hipps hosted on this machine have been restarted.

-M.
Comment 20 Eclipse Webmaster CLA 2017-06-02 11:31:29 EDT
I've found a donor with compatible RAM that we can use.  The catch is there will be some performance impact as the donor only has 64G (the current system has 128G).

I'm going to swap the RAM Monday June 5th at 8am EDT, and I expect the server to be offline for about an hour or so.

Once the swap is finished we'll test it on the donor and if we find a fault we'll look into replacing the DIMM(s) in question.  If not and hipp3 remains unstable we'll engage with the hardware vendor about a repair/replacement.

-M.
Comment 21 Eclipse Webmaster CLA 2017-06-05 16:18:36 EDT
The memory swap was successful; however, hipp3 has faulted a couple of times today, so I'm not convinced that memory is the culprit.  The donor is still running tests on hipp3's RAM, so I'll wait for that to finish, but it's looking like something else is responsible.

Once the test results are in I'll look into getting in touch with the vendor and we'll go from there.

-M.
Comment 22 Frederic Gurr CLA 2017-06-12 05:52:16 EDT
Unfortunately HIPP3 is unavailable again, we will need to wait for webmaster to bring it back online.
Comment 23 Frederic Gurr CLA 2017-06-12 05:53:55 EDT
*** Bug 518112 has been marked as a duplicate of this bug. ***
Comment 24 Frederic Gurr CLA 2017-06-12 05:57:15 EDT
*** Bug 518106 has been marked as a duplicate of this bug. ***
Comment 25 Frederic Gurr CLA 2017-06-12 07:07:56 EDT
*** Bug 518115 has been marked as a duplicate of this bug. ***
Comment 26 Eclipse Webmaster CLA 2017-06-12 09:11:30 EDT
#1) The host has been restarted.

#2) I think this indicates that RAM was not the cause of these outages.  We'll engage with the hardware vendor.

-M.
Comment 27 Cristiano De Alti CLA 2017-06-12 12:16:59 EDT
I'm not sure if it's related but our kura-develop job is failing the Sonar step due to a JDBC connection problem [1].

[1] https://hudson.eclipse.org/kura/job/kura-develop/1076/consoleFull
Comment 28 Eclipse Webmaster CLA 2017-06-14 11:36:48 EDT
(In reply to Cristiano De Alti from comment #27)

I don't think that's related to the issue at hand.

I've spoken to our vendor and they are recommending we upgrade our firmware.  As such I'm going to shut the system down Friday June 16th at 3:30pm EDT and I'll replace the RAM and run the updates at the same time.  I'm expecting the system will be down for about an hour and a half while all of this happens.

-M.
Comment 29 Eclipse Webmaster CLA 2017-07-05 16:31:42 EDT
Since the firmware updates the system has been much more stable.  Closing as 'resolved', please re-open if something goes wrong.

-M.
Comment 30 Frederic Gurr CLA 2017-07-17 05:05:33 EDT
*** Bug 519731 has been marked as a duplicate of this bug. ***
Comment 31 Frederic Gurr CLA 2017-07-17 05:06:43 EDT
hipp3 is down again. :(
Comment 32 Frederic Gurr CLA 2017-07-17 07:50:19 EDT
*** Bug 519752 has been marked as a duplicate of this bug. ***
Comment 33 Mikaël Barbero CLA 2017-07-17 07:57:26 EDT
(In reply to Frederic Gurr from comment #31)
> hipp3 is down again. :(

then reopening
Comment 34 Frederic Gurr CLA 2017-07-17 11:37:42 EDT
hipp3 is back online.
Comment 35 Viktoria Dlugopolskaya CLA 2017-09-26 02:33:07 EDT
HIPP3 is unavailable now (try to use https://hudson.eclipse.org/rcptt/)
Comment 36 Mikaël Barbero CLA 2017-09-26 03:42:12 EDT
(In reply to Viktoria Dlugopolskaya from comment #35)
> HIPP3 is unavailable now (try to use https://hudson.eclipse.org/rcptt/)

hipp3 is back online.