| Summary: | Make Gerrit run a sub-set of our tests suites when validating a change | ||
|---|---|---|---|
| Product: | [Modeling] Sirius | Reporter: | Pierre-Charles David <pierre-charles.david> |
| Component: | Core | Assignee: | Pierre-Charles David <pierre-charles.david> |
| Status: | CLOSED FIXED | QA Contact: | Pierre-Charles David <pierre-charles.david> |
| Severity: | normal | ||
| Priority: | P1 | CC: | florian.barbin |
| Version: | 2.0.0 | Keywords: | triaged |
| Target Milestone: | 3.0.0M7 | ||
| Hardware: | All | ||
| OS: | All | ||
| See Also: |
https://bugs.eclipse.org/bugs/show_bug.cgi?id=445371 https://git.eclipse.org/r/48925 https://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=e61f8736172eb84d66b6016eaa5a44a53fe49bc3 https://git.eclipse.org/r/49181 https://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=a74170d807cbed69ad82ea001558ee9291f9a9a0 |
||
| Whiteboard: | |||
|
Description
Pierre-Charles David
The new test suites with the corresponding Maven profiles have been created: http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=6fb1939a347798fe9f2ca6c3e090568e90fc5c72

After a few mistakes (e.g. forgetting to enable Xvnc, then forgetting to start a window manager...), this is starting to work on a small subset of the test suites. See https://hudson.eclipse.org/sirius/view/gerrit/job/sirius.gerrit/3199/PLATFORM=luna/consoleText:

Running org.eclipse.sirius.tests.suite.GerritJUnitSuite
Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.225 sec
Running org.eclipse.sirius.tests.suite.tree.AllSiriusTestSuite
[...]
Tests run: 49, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.673 sec
Running org.eclipse.sirius.tests.swtbot.suite.GerritSWTBotSuite
[...]
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 51.168 sec
Running org.eclipse.sirius.tests.swtbot.suite.GerritSequenceSWTBotSuite
[...]
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 57.343 sec

More tests added by commit a42b85fa1d51b9935757e960aadaf441c97cdace. Complete Gerrit validation of the change that added these took 27 minutes, but the current scope does not include any non-sequence SWTBot tests, and in the JUnit suite only the standalone subsets (which take less than 5 seconds) are included. Given how the Gerrit job is currently structured, with no parallelism, we will very quickly reach the initial budget of about 1h. After that the options are:

1. Leave the situation as is, with only a relatively small subset of the tests run by Gerrit. It's still better than the situation before.
2. Add more tests, at the cost of longer feedback from Gerrit.
3. Add more tests but invest the time required to make them run faster.
4. Restructure the Gerrit jobs to run the separate suites in parallel.

Options 3 and 4 are non-exclusive, and ideally we should try to do both.
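To make the trade-off behind option 4 concrete, here is a rough sketch comparing serial and parallel wall-clock time. It uses the per-suite durations reported later in this thread (23, 24 and 15 minutes); the function names are illustrative only, and the model assumes one executor per suite and ignores queueing and any build work shared between suites.

```python
# Per-suite durations in minutes, as reported later in this thread.
SUITE_MINUTES = {
    "junit": 23,
    "swtbot-sequence": 24,
    "swtbot": 15,
}

BUDGET_MINUTES = 60  # the "about 1h" budget mentioned in the thread


def serial_minutes(durations):
    """Wall-clock estimate when the suites run one after another."""
    return sum(durations.values())


def parallel_minutes(durations):
    """Wall-clock estimate when each suite runs on its own executor."""
    return max(durations.values())


if __name__ == "__main__":
    serial = serial_minutes(SUITE_MINUTES)
    parallel = parallel_minutes(SUITE_MINUTES)
    print(f"serial:   {serial} min (budget {BUDGET_MINUTES} min)")
    print(f"parallel: {parallel} min (headroom {BUDGET_MINUTES - parallel} min)")
```

Under these assumptions the serial estimate is 62 minutes and the parallel one is 24 minutes, which is in line with the "about 25 minutes (from 1h before)" figure given further down.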
This Gerrit patch adds more SWTBot tests: https://git.eclipse.org/r/#/c/38850/4 I selected quick and reliable tests to add. The last Gerrit run took 31 min. That leaves time to add new JUnit tests and several other SWTBot tests.

Currently the JUnit suite is the one with the fewest tests run by Gerrit. On the Eclipse HIPP, it takes about 39 minutes, but:
* Two tests fail systematically on the HIPP: BorderMarginTest.testAutoSize and DiagramMigrationTestCampaign10.testAllCustomisationsKeeped[0]. We've already seen these two fail systematically on some machines and pass reliably on others, with no hint as to the actual reasons (maybe a difference in the installed system fonts?).
* Two tests (AcceleoMTInterpreterOnPackageImportTests and SiriusLayoutDataManagerForSemanticElementsApplyWithPredefinedDataTest) seem responsible for a disproportionate amount of the total time (12 minutes and 9 minutes respectively).

I'll move all the JUnit tests except the 4 mentioned above into the Gerrit JUnit suite, and this should give us something close to 1h of Gerrit-triggered validation tests. We'll see after that if we can add some more of the SWTBot ones or if we need to remove a few of the JUnit ones.

(In reply to Pierre-Charles David from comment #3)
> 4. Restructure the Gerrit jobs to run the separate suites in parallel.
>
> Options 3 and 4 are non-exclusive, and ideally we should try to do both.

For option 4, it might be possible to use https://wiki.jenkins-ci.org/display/JENKINS/Parameterized+Trigger+Plugin to launch sub-jobs in parallel, one for each suite. http://strongspace.com/rtyler/public/gerrit-jenkins-notes.pdf might also contain some hints and tips.

Just an update: we have switched to a matrix job with two dimensions: PLATFORM(juno,kepler,luna)×SUITE(gerrit-junit,gerrit-sequence,gerrit-swtbot). The tests are only executed when building for Luna for now, in parallel on 3 different slots/slaves.
For Juno and Kepler, we use the "gerrit-junit" SUITE to perform a simple build, but do not do anything for the two other suites (there is no point in rebuilding the same thing 3 times).

We also now publish the test results in a form that Hudson can present properly. For the matrix elements where we do not actually execute any tests, we publish an empty test report to make Hudson happy (otherwise it considers it a failure).

The shell code which implements this different behavior depending on the branch, the platform, and the suite is starting to be a little complex. It should probably be moved into the repo itself, and the job could simply fetch it with curl and execute it. At least it would be properly versioned.

We also increased the number of executors to 9. This corresponds to the number of jobs launched in parallel by the sirius.gerrit matrix, even though 4 of the concrete jobs will do nothing and return in just a few seconds.

With all this, we are down to about 25 minutes (from 1h before) to get the "Verified" vote on a push to Gerrit. The individual suite times are:
* junit: 23 min. This includes almost all the JUnit tests we have (see comment 5).
* swtbot-sequence: 24 min, with 140 tests out of 440 available.
* swtbot: 15 min, with only 242 tests out of 1319 available.

25 minutes looks like a sustainable feedback time for now. We can start to add more tests to the Gerrit SWTBot suite until it reaches runtimes similar to the others.

I'm tempted to close this, as the system mostly works and it seems from now on it will only need small adjustments, but we still only run a relatively small subset of our complete suites, so moving to M7 instead: if time permits, we'll have one more look at what can be done to speed up the tests enough to include more tests in the Gerrit suites.

The gerrit-verify.sh script has proven to be unreliable, with builds passing green even with Tycho/p2 crashes during target platform resolution, for example.
Trying to handle all the cases with a single job is too complex and fragile. The sirius.gerrit job is being retired and replaced with two, more focused jobs:
* sirius.gerrit.build: a simple matrix job on PLATFORM={juno,kepler,luna,mars}, which only builds the code (Core and Tests) on all the platforms we support.
* sirius.gerrit.tests: a matrix job on PLATFORM={luna,mars} and SUITE={gerrit-junit,gerrit-swtbot,gerrit-sequence}, which builds and executes the "Gerrit Tests Suites" only for the current and next reference platforms.
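The difference between the retired single matrix and the two replacement jobs can be sketched by enumerating the matrix cells. This is only an illustration of the layout described in this thread, not the actual Hudson configuration or the gerrit-verify.sh logic: the function names and the action labels ("build+test", "build", "noop") are hypothetical.

```python
from collections import Counter
from itertools import product

SUITES = ["gerrit-junit", "gerrit-sequence", "gerrit-swtbot"]


def old_action(platform, suite):
    """Action of one cell of the retired sirius.gerrit matrix (illustrative)."""
    if platform == "luna":
        return "build+test"  # tests were only executed for Luna
    if suite == "gerrit-junit":
        return "build"       # simple build on Juno and Kepler
    return "noop"            # empty test report, returns in seconds


def old_matrix():
    """All cells of PLATFORM(juno,kepler,luna) x SUITE, with their action."""
    platforms = ["juno", "kepler", "luna"]
    return {cell: old_action(*cell) for cell in product(platforms, SUITES)}


def new_matrices():
    """Cells of the two replacement jobs; every cell does useful work."""
    build = [("sirius.gerrit.build", p, None)
             for p in ["juno", "kepler", "luna", "mars"]]
    tests = [("sirius.gerrit.tests", p, s)
             for p, s in product(["luna", "mars"], SUITES)]
    return build + tests


if __name__ == "__main__":
    print(Counter(old_matrix().values()))
    for job, platform, suite in new_matrices():
        print(job, platform, suite or "")
```

In the old layout 4 of the 9 cells do nothing, which matches the executor count discussed earlier. In the split layout, the decision about what a cell does is carried by the job itself rather than by branching inside a single script.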
While not perfect, the current solution works fine for now. Further improvements in feedback speed, test coverage and additional checks (e.g. CheckStyle) will be handled separately. The actual content of the suites executed by Gerrit will evolve over time, but the overall organization works fine and has already proved its value.

New Gerrit change created: https://git.eclipse.org/r/48925

Gerrit change https://git.eclipse.org/r/48925 was merged to [v2.0.x]. Commit: http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=e61f8736172eb84d66b6016eaa5a44a53fe49bc3

New Gerrit change created: https://git.eclipse.org/r/49181

Gerrit change https://git.eclipse.org/r/49181 was merged to [v2.0.x]. Commit: http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=a74170d807cbed69ad82ea001558ee9291f9a9a0

Available in Sirius 3.0.0. See https://wiki.eclipse.org/Sirius/3.0.0. |