Bug 571351 - [OI2JIRO] Migration of Object Teams JIPP to new CI infrastructure
Summary: [OI2JIRO] Migration of Object Teams JIPP to new CI infrastructure
Status: CLOSED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Stephan Herrmann CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 571640
Blocks: 544221
Reported: 2021-02-19 06:48 EST by Frederic Gurr CLA
Modified: 2021-03-23 14:05 EDT
CC List: 3 users

See Also:


Attachments

Description Frederic Gurr CLA 2021-02-19 06:48:56 EST
In the coming days we will migrate the Object Teams JIPP to our new CI infrastructure.

Here is what's going to happen:

We will
* create a JIPP on the new infra
* copy job configurations, views, etc. to the new JIPP
* ask the project to double-check that everything works as expected on the new JIPP
* archive the old instance, once the project confirms that the new JIPP is functional

More info can be found here: https://wiki.eclipse.org/CBI/Jenkins_Migration_FAQ

Please let us know if you have any questions, concerns or upcoming releases that should be taken into account.
Comment 1 Frederic Gurr CLA 2021-03-02 13:59:38 EST
The Object Teams JIPP on Jiro is available here now:

=> https://ci-staging.eclipse.org/objectteams


PLEASE NOTE:
* Please double-check that all jobs on the new JIPP still work as expected. We recommend disabling all jobs on the old JIPP (https://ci.eclipse.org/objectteams) to avoid duplicate builds. You can still use it to compare the settings.

* Publishing to download.eclipse.org requires access via SCP. We've added the credentials to the JIPP. Please see https://wiki.eclipse.org/Jenkins#How_do_I_deploy_artifacts_to_download.eclipse.org.3F for more info (a sketch follows after this list).

* To simplify setting up jobs on our cluster-based infra, we provide a "migration" pod template that can also be used with Freestyle jobs. The pod template has the label "migration", which can be specified in the job configuration under "Restrict where this project can be run". The image should contain most of the dependencies that were available on the HIPP machines on the old infra.

* If you use UI tests (and see errors like 'FATAL: Cannot run program "Xvnc"'), please use the migration pod template.

* Also, the tools paths have changed. Tools can now be found under /opt/tools instead of /shared/common (see the sketch after this list). See also: https://wiki.eclipse.org/Jenkins#Tools_.28and_locations_on_the_default_JNLP_agent_container.29

* Please also note that the ci-staging.eclipse.org domain will only be used temporarily. Once the migration is confirmed to be successful, we will switch the old JIPP off and move the new JIPP over to ci.eclipse.org.
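
To tie the SCP and tools-path notes above together, here is a minimal sketch of a shell build step on the new infra. The exact tool directory under /opt/tools, the storage host and the target directory are illustrative assumptions, not taken from this bug; please check the wiki pages linked above for the actual values.

# pick up tools from the new location instead of /shared/common (directory name is an assumption)
$ export JAVA_HOME=/opt/tools/java/openjdk/jdk-11/latest
$ export PATH=$JAVA_HOME/bin:$PATH

# publish via SCP with the genie.objectteams credentials provided by the "SSH agent" build environment
# (host and target path are illustrative only)
$ scp -r updateSite/* genie.objectteams@projects-storage.eclipse.org:/home/data/httpd/download.eclipse.org/objectteams/updates/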

For even more info, see https://wiki.eclipse.org/Jenkins.
Comment 2 Stephan Herrmann CLA 2021-03-02 15:06:12 EST
Hi Fred, I could use a jump start right from the outset:

On the old Jenkins the build was bootstrapped by

$ git archive --remote=file://localhost/gitroot/objectteams/org.eclipse.objectteams.git ${branch} releng | tar xv

On the new instance none of these work:

(1) file://localhost/gitroot/objectteams/org.eclipse.objectteams.git

(2) git://git.eclipse.org/gitroot/objectteams/org.eclipse.objectteams.git

(3) https://git.eclipse.org/r/objectteams/org.eclipse.objectteams.git

I'm not surprised about (1).

(2) says:
access denied or repository not exported: /gitroot/objectteams/org.eclipse.objectteams.git

(3) says:
Operation not supported by protocol.


Surely Jenkins can access git :)  but how?


PS: Clicking on Preview for this comment says, the token is more than 3 days old and hence preview doesn't work. The truth is closer to 3 MINUTES, rather than DAYS. And I already had to log in twice within just a few minutes.
Comment 3 Stephan Herrmann CLA 2021-03-02 15:18:51 EST
One more failure

(4) ssh://genie.objectteams@git.eclipse.org:29418/objectteams/org.eclipse.objectteams.git
Permission denied (publickey).
fatal: The remote end hung up unexpectedly


(And after yet another login, bugzilla still doesn't like my token)
Comment 4 Frederic Gurr CLA 2021-03-03 06:07:52 EST
(In reply to Stephan Herrmann from comment #2)
> Surely Jenkins can access git :)  but how?
As you can see in https://ci-staging.eclipse.org/objectteams/job/webmaster-test/5/console, Jenkins can access git. I don't know why git:// and https:// do not work with git archive, and I won't dig into it.

(In reply to Stephan Herrmann from comment #3)
> (4)
> ssh://genie.objectteams@git.eclipse.org:29418/objectteams/org.eclipse.
> objectteams.git
> Permission denied (publickey).
> fatal: The remote end hung up unexpectedly
SSH requires credentials. Therefore the job config option "Build Environment -> SSH agent" needs to be enabled. Please make sure to select the correct credentials: "genie.objectteams (ssh://genie.objectteams@git.eclipse.org)"
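
For reference, a hedged sketch of how the bootstrap command from comment #2 might look once the SSH agent provides that key. Whether git archive is served over this remote is an assumption on my part; a shallow anonymous clone is shown as a fallback.

$ git archive --remote=ssh://genie.objectteams@git.eclipse.org:29418/objectteams/org.eclipse.objectteams.git ${branch} releng | tar xv
# fallback if git archive is not allowed over that remote:
$ git clone --depth 1 --branch ${branch} https://git.eclipse.org/r/objectteams/org.eclipse.objectteams.git ot-tmp && cp -r ot-tmp/releng .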

I've enabled the option for your buildAndTest build job. The git part worked, but Ant was not found. Therefore I've enabled the "Build Environment -> With Ant" option.
Comment 5 Stephan Herrmann CLA 2021-03-03 14:03:43 EST
Thanks Fred,

With your help I've inched along further, investing several hours today.

In the end it seemed I "killed" it: 
504 Gateway Time-out
The server didn't respond in time. 

The last thing I saw was that a build was triggered and waited (several minutes) for an executor.
Comment 6 Frederic Gurr CLA 2021-03-03 14:22:39 EST
Not sure what happened, but I gave the Jenkins instance a kick and it's now back online.
Comment 7 Stephan Herrmann CLA 2021-03-03 18:09:24 EST
Finally (after many hours of trial and error) I got one build to reach the testing stage, which is a huge success, BUT now I realize that the general design of my jobs is EOL since for all those years I've been relying on the fact that one job (post processing, publishing) can see the workspace of a previously run build of another job.

Not having a permanent workspace is a PITA in many regards.

Just git cloning several required repos from scratch for every build is a big waste of resources and time (where previously I simply pulled in a permanent clone).
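
One possible mitigation I might try (a sketch only, not something the new infra prescribes): shallow clones at least skip transferring the full history on every build.

$ git clone --depth 1 --branch ${branch} https://git.eclipse.org/r/objectteams/org.eclipse.objectteams.git
# --depth 1 fetches only the tip commit of the requested branch, not the full history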

Etc.

There are more jobs still waiting to be migrated.
Comment 8 Frederic Gurr CLA 2021-03-04 04:45:26 EST
(In reply to Stephan Herrmann from comment #7)
> Finally (after many hours of trial and error) I got one build to reach the
> testing stage, which is a huge success, BUT now I realize that the general
> design of my jobs is EOL since for all those years I've been relying on the
> fact that one job (post processing, publishing) can see the workspace of a
> previously run build of another job.
You should be able to work around that by archiving the relevant parts of your workspace with an "Archive the artifacts" post-build action and using them in your post-processing/publishing jobs.
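
For illustration, a downstream job could fetch an archived artifact via Jenkins' stable permalink URL (the artifact file name is made up, and this assumes the artifacts are readable without authentication; the Copy Artifact plugin is the other common option):

$ curl -O https://ci-staging.eclipse.org/objectteams/job/buildAndTest/lastSuccessfulBuild/artifact/updateSite.zip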

> Not having a permanent workspace is a PITA in many regards.
> 
> Just git cloning several required repos from scratch for every build is a
> big waste of resources and time (where previously I simply pulled in a
> permanent clone).
Ephemeral workspaces can be a blessing and a curse.
Comment 9 Stephan Herrmann CLA 2021-03-04 04:55:44 EST
(In reply to Frederic Gurr from comment #8)
> (In reply to Stephan Herrmann from comment #7)
> > Finally (after many hours of trial and error) I got one build to reach the
> > testing stage, which is a huge success, BUT now I realize that the general
> > design of my jobs is EOL since for all those years I've been relying on the
> > fact that one job (post processing, publishing) can see the workspace of a
> > previously run build of another job.
> You should be able to work around that by archiving the relevant parts of
> your workspace with a "Archive the artifacts" post-build action and use them
> in your post-processing/publishing jobs.

I'll try that.

Meanwhile, the first test run completed and shows many failures that were not observed on the old infrastructure - one issue seems to replicate bug 571640, but there are more, and of different kinds.

My guess is: addressing those is probably a matter of weeks rather than days, before I can give +1 to shutting down the old infra.
Comment 10 Frederic Gurr CLA 2021-03-04 05:15:14 EST
(In reply to Stephan Herrmann from comment #9)
> Meanwhile the first test run completed and shows many failures that have not
> been observed on the old infrastructure - one issue seems to replicate bug
> 571640, but there are more and of different kinds.
If you can help with providing a minimal test case, it would go a long way.
Comment 11 Stephan Herrmann CLA 2021-03-06 19:55:48 EST
Documenting a new option for debugging the build:

Since I needed to see what arguments were passed into the compiler, and since Object Teams uses its own variant of the compiler, I added a new system property
  ecj.batch.configure.verbose
When set to true, all compiler args are printed to stderr.
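
For example (the jar name and source layout are illustrative; the property itself is the one described above):

$ java -Decj.batch.configure.verbose=true -jar ecj.jar -d bin src/Hello.java
# with the property set, the compiler echoes all its arguments to stderr before compiling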
Comment 12 Frederic Gurr CLA 2021-03-16 06:52:49 EDT
Any update here?
Comment 13 Stephan Herrmann CLA 2021-03-16 10:45:11 EDT
Yes I can give some update:

OK: The main job buildAndTest has been extended to publish to a staging area to avoid needing to access artifacts from a non-existent workspace. 

* Build success is only blocked by bug 571640 (assumed to be fixable in JDT)

* BUILD TIME has roughly DOUBLED (going from 2 hours to 4 hours)
  - this is not caused by the additional publishing step (it's quick)
  - this, combined with the impossibility of running tests on demand from a populated workspace,
    will significantly slow down some processes.

@Fred: can you comment on execution time?
 - one known factor is inability to re-use git clones across builds,
   but I haven't yet measured the impact of this particular issue
------------------

OK: Promoting from staging to a permanent update site works.

I still need to work on two other channels for publishing:

* Publish our compiler as a standalone jar (no technical issues expected here)

* Publish compiler & runtime via maven. Honestly, this step has never worked automatically,
  but it's on my agenda anyway.
  - do I have to expect building against an empty local maven repo each time?

=> These issues do not require keeping the old HIPP.
------------------

TL;DR: I'm concerned about build times, but other than that nothing is blocking here.
Comment 14 Frederic Gurr CLA 2021-03-16 11:08:49 EDT
(In reply to Stephan Herrmann from comment #13)
> @Fred: can you comment on execution time?
>  - one known factor is inability to re-use git clones across builds,
>    but I haven't yet measured the impact of this particular issue
This is expected. We are still in the process of moving faster build machines to the new cluster. The build times should become shorter again once that is done. 

Ephemeral workspaces have an impact on speed, as does restricted access to resources (compared to the old infra). Once all CI instances are migrated, we will have a closer look at performance, both on the cluster side and on the build config side.

> * Publish compiler & runtime via maven. Honestly, this step has never worked
> automatically,
>   but it's on my agenda anyway.
>   - do I have to expect building against an empty local maven repo each time?
For the foreseeable future, yes. Most Maven artifacts are cached on our local Nexus instance though.

> TL;DR: I'm concerned about build times, but other than that nothing is
> blocking here.
So we can retire the old Jenkins instance?
Comment 15 Stephan Herrmann CLA 2021-03-16 18:20:37 EDT
(In reply to Frederic Gurr from comment #14)
> So we can retire the old Jenkins instance?

OK, fine by me.
Comment 16 Stephan Herrmann CLA 2021-03-16 18:29:37 EDT
For posterity I'm preserving here some up-to-date disk usage figures from the old instance:

cached git clones:
242580	git___git_eclipse_org_gitroot_jdt_eclipse_jdt_core_git
88456	git___git_eclipse_org_gitroot_jdt_eclipse_jdt_debug_git
236804	git___git_eclipse_org_gitroot_jdt_eclipse_jdt_ui_git
249228	git___git_eclipse_org_gitroot_objectteams_org_eclipse_objectteams_git
72804	git___git_eclipse_org_gitroot_platform_eclipse_platform_text_git
889872	total

workspace proper:
685636	workspace/testrun/build-root
13920	workspace/testrun/ecj
2148	workspace/testrun/jdtcoremodel_mem_trace.log
24240	workspace/testrun/otdt.jar
912852	workspace/testrun/test-root
26520	workspace/testrun/updateSite
25824	workspace/testrun/updateSiteRepack
26740	workspace/testrun/updateSiteSigned
49188	workspace/testrun/updateSiteTests
1767068	total

single build:
53380	builds/642
Comment 17 Frederic Gurr CLA 2021-03-17 11:14:06 EDT
The old JIPP has been removed.

The Object Teams JIPP on Jiro is now reachable at:
 => https://ci.eclipse.org/objectteams

This concludes the migration.
Comment 18 Stephan Herrmann CLA 2021-03-23 12:32:48 EDT
Recording some before / after comparison:

On the old infra, the buildAndTest job typically finished in less than 2 hours.

Recent builds on the new infra (with identical build input)

#682 saw lots of JVM crashes -> unusable

#683 took seven hours(!)
  - only a few well-known intermittent test failures

#684 took twelve hours(!!!)
  - hundreds of bogus test failures
    - leak tests reported failure to remove a jar file
    - subsequent tests reported "project x already exists" during project creation
    => did the file system lose the ability to delete files / directories?

#685 took six hours
  - tests timed out
  - then org.jenkinsci.remote.protocol.* saw java.nio.channels.ClosedChannelException
  - last words were
ERROR: Step ‘Archive the artifacts’ failed: no workspace for buildAndTest #685
ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for buildAndTest #685
ERROR: centos-7-xd7zf is offline; cannot locate oracle-jdk9-latest

#686 completed within 3 hours, hooray
  - just one intermittent test failure

#687 completed successfully after 3 h 16 m
  - SUCCESS 3 days later.

I repeat: all these builds had the exact same build input (I only improved the processing of crash logs, which previously created an avalanche of its own - something that never happened before).


I have no idea if there is any connection between all those breakdowns, but the unfortunate downside of it all is: if this is a resource problem, we enter a vicious circle - the more we see the problem, the more I will contribute to it, because I have to re-run my job again and again.

Hoping for better times ...
Comment 19 Mikaël Barbero CLA 2021-03-23 12:43:50 EDT
Stephan, we certainly have to smooth some rough edges here and there. We are facing some slowdowns these days for reasons explained in bug 571952. Hopefully, this will be resolved soon. Also, we've already migrated 200+ projects; most of those migrations were uneventful, with little to no issues.

For the sake of completeness, how long does it take you to run a build similar to the one running on Jenkins, but on your machine? Also, what are the rough specs of your machine (#cores/GHz/RAM/disk type)?

Thanks
Comment 20 Stephan Herrmann CLA 2021-03-23 14:05:37 EDT
Thanks Mikaël

(In reply to Mikaël Barbero from comment #19)
> For the sake of completness, how long does it take you to run a build
> similar to the one running on Jenkins but on your machine. Also, what are
> the rough specs of your machine (#cores/ghz/ram/disktype)? 

Frankly, I haven't run this locally for ages. I'll try it soonish and report back.