Bug 540106

Summary: Platform servers need to be restarted?
Product: Community
Reporter: Andrey Loskutov <loskutov>
Component: Servers
Assignee: Eclipse Webmaster <webmaster>
Status: CLOSED INVALID
QA Contact:
Severity: major
Priority: P3
CC: akurtakov, christian.dietrich.opensource, daniel_megert, Lars.Vogel, mikael.barbero, mistria
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=540124
Whiteboard:

Description Andrey Loskutov CLA 2018-10-14 02:50:54 EDT
I guess the Platform UI Gerrit server must be restarted. Most jobs run for over 2 hours instead of only one and are then killed.


https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/
Comment 1 Andrey Loskutov CLA 2018-10-14 03:22:10 EDT
Probably this is also the reason why the official builds did not run for Linux: http://download.eclipse.org/eclipse/downloads/drops4/I20181012-1800/testResults.php
Comment 2 Andrey Loskutov CLA 2018-10-15 02:52:27 EDT
I've managed to restart Platform UI Gerrit, will see if this helps (I guess not, but who knows).
I've re-triggered the build for https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16088/ - if this does not succeed, we still have a problem.
Comment 3 Mikaël Barbero CLA 2018-10-15 04:12:36 EDT
I see nothing special on the infra side. The job has been configured for a long time to abort when stuck. The configured timeout may need to be tweaked.

Note however that yesterday, a change has been made by Lars to the job config: 

clean verify -Pbuild-individual-bundles -Pbree-libs

has been changed to 

clean verify -Pbuild-individual-bundles -Pbree-libs -fae

Asking Maven to fail at the end (-fae), so that it keeps building the remaining modules after a failure, may be the cause of the timeout.
Comment 4 Alexander Kurtakov CLA 2018-10-15 04:14:21 EDT
(In reply to Mikaël Barbero from comment #3)
> I see nothing special on the infra side. The job has been configured for a
> long time to abort when stuck. The configured may need to be tweaked.
> 
> Note however that yesterday, a change has been made by Lars to the job
> config: 
> 
> clean verify -Pbuild-individual-bundles -Pbree-libs
> 
> has been changed to 
> 
> clean verify -Pbuild-individual-bundles -Pbree-libs -fae
> 
> Asking maven to fail at end may be the cause of the timeout.

You can see that https://ci.eclipse.org/releng/job/ep410I-unit-cen64-gtk3/ also times out, and so does https://ci.eclipse.org/pde/job/eclipse.pde.ui-Gerrit/. So something is fishy with the infra IMHO.
Comment 5 Andrey Loskutov CLA 2018-10-15 04:22:14 EDT
(In reply to Andrey Loskutov from comment #2)
> I've managed to restart Platform UI Gerrit, will see if this helps (I guess
> not, but who knows).
> I've re-triggered the build for
> https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16088/ - if
> this does not succeed, we still have a problem.

The build is still running (for 1.5 hours now), so no, the Gerrit restart didn't help :-(


(In reply to Mikaël Barbero from comment #3)
> I see nothing special on the infra side. The job has been configured for a
> long time to abort when stuck. The configured may need to be tweaked.
> 
> Note however that yesterday, a change has been made by Lars to the job
> config: 

The last successful Platform UI build on Gerrit was two days ago, so the failures started before the job config change.

Please also note that the *official* Platform builds have not succeeded on Linux since the I20181012-1800 build. So something broke on Friday, or at some point after I20181010-1800.
Comment 6 Mikaël Barbero CLA 2018-10-15 04:24:04 EDT
(In reply to Andrey Loskutov from comment #5)
> (In reply to Andrey Loskutov from comment #2)
> > I've managed to restart Platform UI Gerrit, will see if this helps (I guess
> > not, but who knows).
> > I've re-triggered the build for
> > https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16088/ - if
> > this does not succeed, we still have a problem.
> 
> The build is still running (since 1.5 hours) => no, Gerrit restart didn't
> help :-(

What do you mean by Gerrit restart? Did you restart the Jenkins instance, or just the connection between Gerrit and Jenkins?
Comment 7 Andrey Loskutov CLA 2018-10-15 04:32:50 EDT
(In reply to Mikaël Barbero from comment #6)
> (In reply to Andrey Loskutov from comment #5)
> > (In reply to Andrey Loskutov from comment #2)
> > > I've managed to restart Platform UI Gerrit, will see if this helps (I guess
> > > not, but who knows).
> > > I've re-triggered the build for
> > > https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16088/ - if
> > > this does not succeed, we still have a problem.
> > 
> > The build is still running (since 1.5 hours) => no, Gerrit restart didn't
> > help :-(
> 
> What do you mean by gerrit restart? You restarted the Jenkins instance? or
> just the connection between Gerrit and Jenkins?

If I only knew... I clicked the "Restart" icon shown next to the "CI Control: Eclipse Platform: " entry on the https://accounts.eclipse.org/users/aloskutov page.

There is no hint as to what it is supposed to do, but it looks like it restarted the Jenkins instance serving https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/, because right after hitting the button I got a 502 error on that URL.
Comment 8 Mikaël Barbero CLA 2018-10-15 04:34:52 EDT
(In reply to Andrey Loskutov from comment #7)
> If I only knew... I've clicked on "Restart" icon shown next to "CI Control:
> Eclipse Platform: " entry on the
>  under https://accounts.eclipse.org/users/aloskutov page.
> 
> There is no hint what it is supposed to do, but it looks like it restarted
> Jenkins running on
> https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/, because
> just after hitting this button I've got 502 error on that URL.

It restarts the Jenkins instance. You've done the right thing.


I'm investigating.
Comment 9 Mikaël Barbero CLA 2018-10-15 05:10:15 EDT
I'm still seeing no issue with platform's JIPP. I'm still investigating.
Comment 10 Mikaël Barbero CLA 2018-10-15 05:13:04 EDT
I've noticed that in all aborted builds, the job is stuck in the same test:

----- testGetWorkbenchWindows
testGetWorkbenchWindows: setUp...

Has it been changed recently so that it cannot be run non-interactively? 

See 

https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16082/console
https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16083/console
https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16084/console
https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16088/console
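
One way to compare where several aborted builds stopped is to pull the last test marker out of each console log. The following is only a minimal sketch: it assumes that each test start is logged as a line of the form "----- testName", as in the excerpt above. The function name last_started_test and the sample text are hypothetical and not part of any Jenkins API.

```python
import re
from typing import Optional

# Assumption: every test start is logged as "----- <testName>" on its own line,
# matching the excerpt quoted in this comment.
TEST_MARKER = re.compile(r"^-{5}[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def last_started_test(console_text: str) -> Optional[str]:
    """Return the name of the last test marker in the log, or None if there is none."""
    names = TEST_MARKER.findall(console_text)
    return names[-1] if names else None


# Hypothetical sample, shaped like the tail of an aborted build's console output.
sample = """----- testSomethingEarlier
testSomethingEarlier: setUp...
----- testGetWorkbenchWindows
testGetWorkbenchWindows: setUp...
"""

if __name__ == "__main__":
    print(last_started_test(sample))
```

If the same name comes back for every aborted build, the hang is more likely tied to that test than to a random infrastructure stall.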
Comment 11 Mikaël Barbero CLA 2018-10-15 05:15:38 EDT
(In reply to Alexander Kurtakov from comment #4)
> You can see that https://ci.eclipse.org/releng/job/ep410I-unit-cen64-gtk3/
> also timeouts and so do
> https://ci.eclipse.org/pde/job/eclipse.pde.ui-Gerrit/ . So smth is fishy
> with infra IMHO.

For PDE, for all aborted builds, the jobs stay stuck in org.eclipse.ui.tests.smartimport.AllTests. Could it be related?
Comment 12 Alexander Kurtakov CLA 2018-10-15 05:16:07 EDT
Would you please check whether there is some kind of inode exhaustion? 'df -i' should give the info.
Also, can the machine be rebooted? (just in case :).
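
For a scripted version of the same check, the inode counters that 'df -i' reports can be read with os.statvfs. This is a rough sketch for Unix-like systems only; the helper names inode_usage_percent and check_inodes are hypothetical, and the path to check is an assumption (the build workspace location is not given in this bug).

```python
import os
from typing import Optional


def inode_usage_percent(total: int, free: int) -> Optional[float]:
    """Percentage of inodes in use, or None if the filesystem reports no inode count."""
    if total <= 0:
        # Some filesystems report 0 total inodes; a percentage is meaningless there.
        return None
    return 100.0 * (total - free) / total


def check_inodes(path: str = "/") -> Optional[float]:
    """Read inode counters for the filesystem containing `path` (Unix only)."""
    st = os.statvfs(path)
    return inode_usage_percent(st.f_files, st.f_ffree)


if __name__ == "__main__":
    # Assumption: "/" stands in for whatever filesystem holds the build workspace.
    print(check_inodes("/"))
```

A value close to 100 would point to inode exhaustion; None means the filesystem does not expose a fixed inode count.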
Comment 13 Andrey Loskutov CLA 2018-10-15 05:44:43 EDT
(In reply to Mikaël Barbero from comment #10)
> I've noted that in all aborted builds, the job is stuck in the test:
> 
> ----- testGetWorkbenchWindows
> testGetWorkbenchWindows: setUp...
> 
> Has it been changed recently so that it cannot be run non-interactively? 
> 
> See 
> 
> https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16082/console
> https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16083/console
> https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16084/console
> https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/16088/console

I'm not aware of any changes in the related code. When run from the IDE, the test passes for me on GTK 3.22 / RHEL 7.4.

Maybe there was some change in Tycho?
Comment 14 Mikaël Barbero CLA 2018-10-15 05:46:59 EDT
(In reply to Alexander Kurtakov from comment #12)
> Would you please check whether there isn't some kind of inode exhaustion?
> 'df -i' should give info. 

No inode exhaustion.

> Also can the machine be rebooted? (just in case :).

There are several other projects running on this machine. Rebooting it for no *visible* reason would be harsh.
Comment 15 Alexander Kurtakov CLA 2018-10-15 08:28:24 EDT
We are going to rerun the last tests that succeeded in order to determine whether this is a machine issue or a change in the platform.
Comment 16 Alexander Kurtakov CLA 2018-10-15 08:34:56 EDT
https://ci.eclipse.org/releng/job/ep410I-unit-cen64-gtk3/41/console - if it succeeds it's infra issue, if it doesn't it's platform change.
Comment 17 Alexander Kurtakov CLA 2018-10-15 08:35:28 EDT
(In reply to Alexander Kurtakov from comment #16)
> https://ci.eclipse.org/releng/job/ep410I-unit-cen64-gtk3/41/console - if it
> succeeds it's infra issue, if it doesn't it's platform change.

heh, actually the opposite
Comment 18 Andrey Loskutov CLA 2018-10-15 08:36:08 EDT
See bug 540124 comment 8 about a possible SWT change which might be causing this hang in some configs / tests.
Comment 19 Mikaël Barbero CLA 2018-10-15 15:11:56 EDT
https://ci.eclipse.org/releng/job/ep410I-unit-cen64-gtk3/41/ is successful. The issue is probably better discussed in bug 540124.