| Summary: | transition to HIPP build of Sim. Release aggregation | | |
|---|---|---|---|
| Product: | Community | Reporter: | David Williams <david_williams> |
| Component: | Cross-Project | Assignee: | David Williams <david_williams> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | mikael.barbero, mistria, mknauer |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
Description
David Williams
So, one question I have is whether we should still have the "staging" and "maintenance" repositories on the "downloads" server. I assume many are "used to" the concept of using "latest-stable" URIs (which, in the case of "CLEAN_BUILD", would be "just like" what would be in "staging"). BUT, I am wondering if our version of Hudson (I assume it's routed through recent Jetty or Apache web servers?) is optimized for so many "downloads"? Or if it would be better to still put builds on the official "downloads server", which presumably is better able to handle lots of traffic? Webmasters, can you advise?

My remaining open question to committers: the advantage of this type of setup, with the three stages of building/checking the repo, is really only maximized if the b3aggrcon files "change every time there is a new contribution". That is, committers should change the URL to point to a simple repo that has only one version of their features in it, or the b3aggrcon files should be changed so that *exact* versions are listed. I know some projects do this already, but not all. Does anyone have any tools/procedures to make this easier, automated?

Either way, if we do not do it "right now", I plan to do it after the Mars release, so be thinking about what it takes, and if/when you'd be ready. I don't want to be heavy handed about it, but will be eventually. For example, one way of "enforcing" this *might* be to use the repo created by the cached BUILD stage as the "input" for all contributions, for the final CLEAN_BUILD. So, once something is in the cached area, it would not be reliably updated if the b3aggrcon file doesn't change. (Technically, I'm not sure about that statement :) ... but that's my idea of "heavy handed".)

I'll just make a quick note that the "reports" are currently only run in "CLEAN_BUILD", and the reports are provided "under" the repo, currently under the "buildinfo" directory. The jobs are "set up" so that there could be tests run in each stage ... and I believe eventually there will be tests run at each stage, AND the tests alone may "fail the build" and not proceed to the next stage until the issue is addressed. (For example, even in "cached build", if bundles or features are missing "legal files", I'd like to fail the build, and not waste time with a "CLEAN_BUILD" that should not be "deployed".) At least "eventually". Not sure if I (we) will have time for that in Mars.

So, with that, feedback/suggestions welcome. I think I have the security settings correct, so "committers" can run the "VALIDATE" job (if they don't want to wait 15 minutes after changing their b3aggrcon file in Git). And everyone should be able to "see" everything (workspace, etc.). I have added Markus as essentially "co-admin" so there is an "emergency backup" if something is amiss.

= = = =

Oh, and one last issue, if anyone knows how to solve it. One job "triggers another", but each has a separate workspace, so, for example, a number of VALIDATE jobs can run during the long period it takes to do a CLEAN_BUILD. One result of this setup is that requests for "CLEAN_BUILD" can sort of "stack up" in the queue, and all we really need to CLEAN_BUILD is the last successful BUILD job. Anyone know of a technique, or Hudson plugin, that basically "throws away" all pending jobs except the last?

= = = =

Thanks,
Mickael Istria

Nice change! I already imagine some future Gerrit validation hooks ;)

> BUT, I am wondering if our version of Hudson (I assume it's routed through
> recent Jetty or Apache web servers?) is optimized for so many "downloads"?
> Or if it would be better to still put builds on the official "downloads
> server", which presumably is better able to handle lots of traffic?

I don't think Hudson can face as much traffic as download.eclipse.org does. If those URLs are meant to be used widely, it seems better to use download.eclipse.org; if they are more "internal" URLs that no one, or very few people, will access, it seems fine to keep them on Hudson.

> My remaining open question to committers: the advantage of this type of
> setup, with the three stages of building/checking the repo, is really only
> maximized if the b3aggrcon files "change every time there is a new
> contribution". That is, committers should change the URL to point to a
> simple repo that has only one version of their features in it, or the
> b3aggrcon files should be changed so that *exact* versions are listed.
> I know some projects do this already, but not all.
> Does anyone have any tools/procedures to make this easier, automated?

Is it equivalent to looking in *.b3aggrcon files for all <feature> tags which have either no versionRange, or a versionRange not in the form of the x.y.z.qualifier pattern?

> Either way, if we do not do it "right now", I plan to do it after the Mars
> release, so be thinking about what it takes, and if/when you'd be ready. I
> don't want to be heavy handed about it, but will be eventually. For
> example, one way of "enforcing" this *might* be to use the repo created by
> the cached BUILD stage as the "input" for all contributions, for the final
> CLEAN_BUILD. So, once something is in the cached area, it would not be
> reliably updated if the b3aggrcon file doesn't change. (Technically, I'm not
> sure about that statement :) ... but that's my idea of "heavy handed".)

For the future, we could think of an automated job to validate that a given Gerrit contribution does respect such constraints, and that would vote "-1" on the review in case it introduces new floating versions in b3aggrcon files.

David Williams

(In reply to Mickael Istria from comment #5)
> Nice change! I already imagine some future Gerrit validation hooks ;)

Thanks.

> I don't think Hudson can face as much traffic as download.eclipse.org does.
> If those URLs are meant to be used widely, it seems better to use
> download.eclipse.org; if they are more "internal" URLs that no one or very
> few people will access, it seems fine to keep them on Hudson.

Yes, I think for now we'll stick with "staging" and "maintenance" ... if nothing else, it offers one more choice for "how often we update it" (such as, we might do it "just once a day" until the final week, or something similar).

> > Does anyone have any tools/procedures to make this easier, automated?
>
> Is it equivalent to looking in *.b3aggrcon files for all <feature> tags
> which have either no versionRange, or a versionRange not in the form of
> the x.y.z.qualifier pattern?

Pretty much. I was trying to think of a way to do it so it had more of a feel of "it is to your advantage to do such and such" rather than "this is the exact format you must use, even if you don't need to". But I can think of lots of advantages to requiring the exact format ... and then it's a question of how to make that easier for everyone ... some (Maven) build tool? Some extension to the b3 aggregator editor?

> For the future, we could think of an automated job to validate that a given
> Gerrit contribution does respect such constraints, and that would vote "-1"
> on the review in case it introduces new floating versions in b3aggrcon
> files.

Yep. That's one thing to do.

To me, there's only the question of "is it the same as specifying, in the URL, a simple repository that has only one version of each feature in it?" ... and then, we do, after all, log which feature versions were used. It is convenient, though, for adopters at least, to have the feature versions recorded in the b3aggrcon files AND to use a specific URL, just to provide more redundancy for sanity checking.
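As a concrete illustration of the kind of scan being discussed, here is a minimal, hypothetical Groovy sketch that flags feature entries in *.b3aggrcon files whose versionRange is missing or not pinned to a fully qualified x.y.z.qualifier version. The element and attribute names, and the "pinned" pattern, are assumptions about the aggregator model rather than a description of an existing tool:

```groovy
// Hypothetical check: report feature entries whose versionRange is absent or
// does not mention a fully qualified x.y.z.qualifier version.
def pinned = ~/.*\d+\.\d+\.\d+\.[^,\[\]]+.*/   // crude "has a full qualifier" test

new File('.').eachFileMatch(~/.*\.b3aggrcon/) { file ->
    def model = new XmlSlurper().parse(file)
    model.depthFirst()
         .findAll { it.name().toLowerCase().contains('feature') }   // e.g. <features ...> entries
         .each { feature ->
             def range = feature.@versionRange.text()
             if (!range || !(range ==~ pinned)) {
                 println "${file.name}: '${feature.@name}' has floating versionRange '${range ?: '(none)'}'"
             }
         }
}
```

Something along these lines could run as a standalone check, or be wired into the kind of Gerrit "-1" validation job mentioned above.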
Besides a plugin (or script) to cancel all queued jobs except the last, the other thing I need to do is be able to keep the "repo report" artifacts for a long time ... even though we don't want to keep "the whole repo" (except for the latest success). I think I've seen plugins that provide more "advanced archiving" functions, but I need to research and see if that works as expected. (So, if any readers have specific experience with that, any pointers would be appreciated.)

Here are some other issues I noticed during the M7 crunch ... so I don't forget.

Currently the tests are installed (for each type of build) if they have not already been installed. But this is not good automation, since when I made some changes to the tests, they did not get installed (since already installed), and that required me to "wipe the workspace" to get a fresh install. I will need to better track "which version" (hash?) of the tests is installed, to know if they changed, and then to know when to re-install them.

Second, this might be a point-in-time issue, but the way I set things up, I wanted to have the "tests" with the "build", to allow the possibility of "failing the build" if the tests failed. But during M7, this meant that if there was an error in the tests, or even if there was a small improvement I wanted to make before re-running the tests, I would have to "redo" the entire build (which takes a very long time for the "clean build"). Perhaps I need a "manual" job too, which would be for only re-running the tests?

As another "todo" item, I still need to add the post-build automation of creating the content.xml.xz file and replacing the p2.index file. Not hard to do manually ... but ... best to automate.
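Since that last to-do is a small, self-contained step, here is a rough, hypothetical Groovy sketch of what the post-build automation could look like, assuming the external xz tool is available on the build machine. The repository path and the exact factory-order values are illustrative assumptions and should be checked against the p2 documentation, not taken as the actual job configuration:

```groovy
// Hypothetical post-build step: compress content.xml and rewrite p2.index so
// clients try the xz variant first. Paths and property values are assumptions.
def repo = new File('/path/to/aggregation/repo')   // illustrative repository location

def xz = ['xz', '-e', '-k', new File(repo, 'content.xml').absolutePath].execute()
xz.waitForProcessOutput(System.out, System.err)
assert xz.exitValue() == 0 : 'xz compression failed'

new File(repo, 'p2.index').text = [
        'version = 1',
        'metadata.repository.factory.order = content.xml.xz,content.xml,!',
        'artifact.repository.factory.order = artifacts.xml,!'
].join('\n') + '\n'
```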
(In reply to David Williams from comment #7)
> Besides a plugin (or script) to cancel all queued jobs, except the last,
>
> the other thing I need to do is be able to keep the "repo report" artifacts,
> for a long time ... even though we don't want to keep "the whole repo"
> (except for latest success). I think I've seen plugins that provide more
> "advanced archiving" functions, but need to research and see if that works
> as expected. (So, if any readers have specific experience with that, any
> pointers would be appreciated).

I have just "fixed" this. It works well with my "test jobs" on my local machine, and with limited testing on the production machine it does work. We'll see how well, in practice, next time there is a big crunch. I had to write my own Groovy script to do what we want. Most "plugins" I could find either did not seem to work, or worked by only clearing the "whole cache", which we do not want; we only want to clear jobs of the same name.

-- First thing I learned is that for simple projects Hudson works automatically, just like we would want it to. So, to find a simple test case, I had to find out how to reproduce the "problem". It turns out it happens only with "parameterized builds", and THEN only if the parameters actually change from one job to another. Sort of makes sense, and the Hudson team has done a good job of handling the simple cases.

-- The solution I came up with runs the Groovy script when a long-running job first starts. If there are other jobs (of the same name) in the queue, then that currently starting job simply cancels itself, since subsequent jobs will pick up the same change (only a mild assumption there). But once a job starts, it will finish, no matter how many other jobs fill up the queue. And the last job in the queue will of course run, since it would not find any other jobs in the queue (of the same name). One advantage of this approach is that it lets Hudson do "queue management" how it normally would. That is, we do not have to figure out "which is the next one to run" ... we simply do not run the current one.

-- The Groovy script I use is in the org.eclipse.simrel.tools project, under a folder named groovyScripts. It is not actually run from there, but must be in a "build machine" location where it can always be found by the "master" node. (I did not try to deploy it on a branch-by-branch basis, but from what I've read, it needs to be in a persistent location.) See http://git.eclipse.org/c/simrel/org.eclipse.simrel.tools.git/tree/groovyScripts/clearCache

-- I currently use this for both the "build" and "build.clean" jobs.
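For readers who do not want to follow the link, the core idea of that clearCache script can be sketched as a system Groovy build step along the following lines. This is a simplified, hypothetical illustration (the bindings and exact API calls can differ between Hudson versions), and the script in org.eclipse.simrel.tools remains the authoritative version:

```groovy
// Simplified sketch of the "cancel myself if newer requests are queued" idea.
// 'build' is the object conventionally bound to a system Groovy build step;
// this is not the actual clearCache script, just the shape of it.
import hudson.model.Hudson

def jobName = build.project.name
def queuedSameJob = Hudson.instance.queue.items.findAll { it.task.name == jobName }

if (queuedSameJob) {
    // Later queue entries will pick up the same (or a newer) change, so this
    // freshly started run is redundant and aborts itself.
    println "Found ${queuedSameJob.size()} queued ${jobName} request(s); canceling this run."
    build.executor.interrupt()
} else {
    println "No queued ${jobName} requests; continuing."
}
```

Aborting the currently starting run, rather than removing items from the queue, is what lets Hudson keep doing its normal queue management, as described above.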
I'm closing this bug as "fixed". I did some work last week/weekend, so I feel more confident that "everything is working correctly". I did, though, open bug 476345 to make note of future improvements.

The "flush queue" logic seems to be working well. I also put in some logic to handle the Gerrit job in a special way, to make sure that we "get" the Gerrit change that triggered the build. There are probably easier ways to do that, if I better understood what Hudson is doing when it gets the data. But I also added an expression to save the b3aggrcon files that were used in a job as part of the "archives" of a build. So, if nothing else, committers can peek there and make sure their intended b3aggrcon file was "in the build".

I also changed the names of a few jobs:

simrel.neon.runaggregator.BUILD to simrel.neon.runaggregator.BUILD_CACHED, to help emphasize that it "keeps what it had", so it should be relatively fast if only a few things changed.

simrel.neon.runaggregator.CLEAN_BUILD to simrel.neon.runaggregator.BUILD__CLEAN, just to get it ordered better.

So, while the order is "reversed" on the page, in an ideal world the jobs would run in this order:

simrel.neon.runaggregator.VALIDATE.gerrit
simrel.neon.runaggregator.VALIDATE
simrel.neon.runaggregator.BUILD_CACHED
simrel.neon.runaggregator.BUILD__CLEAN

Here's a brief description of each.

== simrel.neon.runaggregator.VALIDATE.gerrit

Runs with the HEAD of o.e.simrel.build plus the Gerrit-proposed change, without merging that change into the main branch. If successful, committers still have to "review" and then "submit" to merge it into the main branch. It is especially important to use Gerrit if making a change to categories, included features, or contact names (since those all require changes to two files, and are a frequent source of build failures that are harder to track down).

== simrel.neon.runaggregator.VALIDATE

Checks every 5 minutes for a change to the contents of o.e.simrel.build, and if a change is detected, builds the HEAD of that branch. May include more than one change. It "stores" the commit hash it built with, so if successful, that exact same commit hash is used in the next step. Committers can "trigger" this .VALIDATE job, if needed, from the Hudson interface. This VALIDATE job checks only the metadata -- it confirms that all the version ranges "can be installed together". It does not fetch any artifacts.

== simrel.neon.runaggregator.BUILD_CACHED

Using the same commit as the previous step, builds everything, but does not clean out previous runs, so only new artifacts are fetched (hence, a quick sanity check on the new stuff). Is only triggered by a successful VALIDATE build.

== simrel.neon.runaggregator.BUILD__CLEAN

Using the same commit as the previous step, builds everything, but wipes everything clean, so it makes sure previously fetched artifacts can still be retrieved. If there are other BUILD__CLEAN jobs waiting in the queue, then "the current job" is automatically canceled, for efficiency, since those in the queue presumably have the same change. Is only triggered by a successful BUILD_CACHED job. This BUILD__CLEAN job also runs the tests (repo reports) against the repository, if the repository is built successfully. Eventually, we will also fail this build if certain tests fail.

- - - - - - - -

Please note any other improvements you'd like to see in bug 476345.