Bug 476345 - Improve Sim. Release Aggregation builds
Summary: Improve Sim. Release Aggregation builds
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: Cross-Project
Version: unspecified
Hardware: PC Linux
Importance: P3 enhancement
Target Milestone: ---
Assignee: David Williams CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-01 10:56 EDT by David Williams CLA
Modified: 2017-07-18 11:37 EDT
CC List: 2 users

See Also:


Description David Williams CLA 2015-09-01 10:56:03 EDT
As I close out bug 462859, I wanted to leave a few notes on what should be improved -- so I will consider *this* bug an organizing bug, to keep track of the big picture. 

1. Currently, our own scripts get the "right" content to build from Git. There might be ways to let Hudson do more of that work (but perhaps not). 

2. Regardless of '1', we need to separate "getting the data" and "building the data". Currently it is all in the same repository (o.e.simrel.tools). It is conceivable that some change might require "two runs": one to update the code that gets the data, and another to get the code that builds the data, since "getting the data" also "gets the build scripts". 

3. Both conceptually and practically, it would be easier if the "tools" and "tests" were installable, say from a p2 repository, so that Hudson only needed to get and manage one repository -- the one with the data (o.e.simrel.build). The most recent version of the tools and tests could be built in one job. And then the job that "did the build" would simply install the right version of the tools, with the right version of the data. 

4. Not sure if this is "funky Hudson/Gerrit" integration, but it is hard to look at some jobs and see whether the "changes" are the ones that committers are looking for. Sometimes "Changes" lists only changes in "tools", for example, not changes in data, and for that, it only lists a Gerrit commit. 
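
To make item 3 a bit more concrete, here is a rough sketch of how a build job might assemble a p2 director call that installs one specific version of the tools before building the data. It only constructs the argument list and does not run anything. The repository URL, installable unit id, version, and destination path are hypothetical placeholders, not values from the real simrel setup.

```python
def director_command(eclipse_exe, repo_url, iu_id, iu_version, destination):
    """Build the argument list for a p2 director install (not executed here)."""
    return [
        eclipse_exe,
        "-nosplash",
        "-application", "org.eclipse.equinox.p2.director",
        "-repository", repo_url,
        # The director accepts "<id>/<version>" to pin an exact version.
        "-installIU", f"{iu_id}/{iu_version}",
        "-destination", destination,
    ]


# All values below are made up for illustration.
cmd = director_command(
    eclipse_exe="eclipse",
    repo_url="https://example.org/simrel-tools/repository",
    iu_id="org.example.simrel.tools.feature.group",
    iu_version="1.0.0",
    destination="/tmp/simrel-tools",
)
print(" ".join(cmd))
```

With something like this, the job that "does the build" would only need the data repository plus a tools version to install, rather than a second Git checkout.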

.... 

I think there are separate bugs to "build the tests". Especially "the new tests" that Dennis Hubner has been working on. 

I'm sure there are others. If anyone has any suggestions for improvements, make a note of them here.
Comment 1 David Williams CLA 2015-09-01 11:02:52 EDT
Another thing to improve ... I should automate "promote to staging". 

Basically should be done "after a clean build" is successful. In the past, I sometimes did not want to do it after every clean build, since sometimes there are 3 or 4 of those per day ... but, not sure it matters that much, any more.
Comment 2 David Williams CLA 2015-09-07 17:07:44 EDT
For the record, I have set the jobs up so that on a successful CLEAN BUILD, the "promote" job is triggered. (Markus, let me know if this causes issues, such as it happening too often ... which I'm pretty sure it won't now, but may when we get the new faster-running tests in place.) 

Plus, I've noticed some odd behavior, and I'm not sure whether it will be an issue. The triggering worked fine, once. 

But another time, for Mars builds 304/305, the promote job seemed to get "hung up" between the two jobs, 304 and 305 ... keep in mind, the "clean build" and the "promote" jobs do have to acquire (the same) lock, so only one can run at a time (since they use a shared workspace). 

In this problematic case, CLEAN_BUILD 304 triggered the promote job but, I guess, by the time it was ready to start, CLEAN_BUILD 305 had started, so the promote job just "spun" (literally, just the spinning icon on the output page, no output other than that). On the one hand, this may be good. But, on the other, the promote jobs are configured to time out after 30 minutes, and since the current one has been "running" for 2.5 hours, it may "cancel itself" due to the timeout once it is allowed to run. And, if that's the case, when there are lots of CLEAN BUILDS, one after the other, the "promote" job may never get a chance to run? 

Well -- by now, all are finished, so I can report "actuals". It appears both promote jobs ran ... the one triggered by clean build 304, and the one triggered by clean build 305 ... but, they ran in "quick succession" ... both ran after clean build 305 was finished. I assume they both promoted the "same thing" but would (and did) trigger two EPP builds. Each presumably will be building the same thing. 
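
The sequence above can be modeled with a small sketch. It assumes (as I did above, without having verified it) that both promote runs see whatever the shared workspace holds once they finally get the lock, i.e. the output of clean build 305. The function name and the optional "skip duplicates" guard are hypothetical, just one way the double promotion might be avoided.

```python
def run_promotes(triggers, workspace_build, skip_duplicates=False):
    """Run queued promote jobs in order against the shared workspace.

    triggers: clean build numbers whose success queued a promote job.
    workspace_build: the build whose output is in the shared workspace
        when the promotes finally acquire the lock.
    Returns a list of (triggering build, promoted build) pairs; each pair
    stands for one promotion and therefore one downstream EPP build.
    """
    promoted = []
    last_promoted = None
    for trigger in triggers:
        if skip_duplicates and workspace_build == last_promoted:
            continue  # same content was just promoted, nothing new to do
        promoted.append((trigger, workspace_build))
        last_promoted = workspace_build
    return promoted


# What was observed: two promotes, both after 305 finished.
observed = run_promotes([304, 305], workspace_build=305)
# With the hypothetical guard, the second promote becomes a no-op.
guarded = run_promotes([304, 305], workspace_build=305, skip_duplicates=True)
print(observed)
print(guarded)
```

Without the guard the sketch yields two promotions of the same content, matching the two EPP builds seen; with it, only one.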

So ... something needs to be improved here -- especially since the "30 minute" timeout did not work. (It must start timing when the job acquires the lock.) 
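
A tiny sketch of that guess about the timeout: if the clock only starts once the lock is acquired, time spent waiting for the lock never counts against the 30-minute limit. The 2.5 hour wait and the 30-minute limit are from the case above; the 5 minutes of actual promote work is an assumed number for illustration, and the function itself is hypothetical, not how the plugin is implemented.

```python
def timed_out(queued_at, lock_acquired_at, work_minutes, limit_minutes,
              clock_starts_at_lock):
    """Return True if the job would exceed its timeout.

    All times are in minutes. The job finishes work_minutes after it
    acquires the lock; the only question is when the timeout clock starts.
    """
    finished_at = lock_acquired_at + work_minutes
    started_at = lock_acquired_at if clock_starts_at_lock else queued_at
    return (finished_at - started_at) > limit_minutes


# Promote job queued at t=0, waits 150 minutes (2.5 hours) for the lock,
# then does an assumed 5 minutes of real work.
from_lock = timed_out(0, 150, 5, limit_minutes=30, clock_starts_at_lock=True)
from_queue = timed_out(0, 150, 5, limit_minutes=30, clock_starts_at_lock=False)
print(from_lock, from_queue)
```

Under these assumptions the job survives only when the clock starts at lock acquisition, which is consistent with the promote jobs not being cancelled.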

There are flags about "don't build if downstream/upstream job is building". Maybe one of those will prevent "multiple jobs" from running? But I assume that depends on timing. 

Just remembered, I have a Job Priority plugin installed, and a CLEAN BUILD has priority 90, a "promote" has a priority of 110. 

Hmm, but reading 'help' for the plugin, sounds like that's the right order -- plus the issue is not really with "executors" being full ... more a matter of "locks", I guess?

<quote>
The priority for this job. Priorities are used when all executors are busy to decide which job in the build queue to run next. A job with higher priority is ran before jobs with lower priorities. 
</quote>
Comment 3 David Williams CLA 2015-09-23 17:59:22 EDT
I think there is a bug, too. When doing the Mars.1 respin with another branch, I thought I would only have to change the branch of the initial "validation job", and then it would pass along that value to the "cached build", which in turn would pass along the value to the "clean build". But that "passing along" did not work. 

Not sure if it was due to changing branches? Or if the subsequent jobs are somehow "overwriting" what the previous job sends them to build?
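
To illustrate the second guess only (I have not confirmed it is the cause), here is a small model of a job chain where each job either merges the parameters handed to it over its own defaults, or overwrites them with its defaults. The parameter name BRANCH, the branch names, and the "overwrites_params" flag are all hypothetical.

```python
def run_chain(jobs, initial_params):
    """Pass parameters down a chain of jobs and record the branch each one builds."""
    params = dict(initial_params)
    seen = []
    for job in jobs:
        if job["overwrites_params"]:
            # The job ignores what upstream sent and uses its own defaults.
            params = dict(job["defaults"])
        else:
            # Upstream values win over the job's defaults.
            params = {**job["defaults"], **params}
        seen.append((job["name"], params["BRANCH"]))
    return seen


def make_jobs(cached_overwrites):
    defaults = {"BRANCH": "default-branch"}
    return [
        {"name": "validation", "defaults": defaults, "overwrites_params": False},
        {"name": "cached build", "defaults": defaults, "overwrites_params": cached_overwrites},
        {"name": "clean build", "defaults": defaults, "overwrites_params": False},
    ]


expected = run_chain(make_jobs(False), {"BRANCH": "respin-branch"})
broken = run_chain(make_jobs(True), {"BRANCH": "respin-branch"})
print(expected)
print(broken)
```

In the second run, one job overwriting the value is enough for everything downstream of it to build the default branch instead of the respin branch.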
Comment 4 David Williams CLA 2017-07-18 11:37:48 EDT
Most, if not all, of these items, were fixed in the "great simplifications of 2016" :) so will mark as fixed.