On June 5, the VE integration build ran clean and returned signed jars. The whole build, including compilation, packing, signing, and testing, took 14 minutes. About 4 of those minutes were spent waiting for the signer to complete.
https://build.eclipse.org/hudson/job/cbi-ve-1.4-integration-Ganymede/81/consoleFull
(search for "Signing VE-Master" then "unpackUpdateJarsAndRepack")
Since then, builds have been timing out after waiting over 3hrs for the signer to return a zip with signed jars.
https://build.eclipse.org/hudson/job/cbi-ve-1.4-integration-Ganymede/83/console
I can put in place code that will
a) recover faster from signing, e.g., set a 1hr timeout
b) should signing fail, continue the build as if signing was never done
But the real question is why does this process take 4 minutes one day and over 3hrs the next?
> But the real question is why does this process take 4 minutes one day and over
> 3hrs the next?
My best educated guess is that your build process is looping over and over for some reason, as shown by this repeated message?
/opt/users/hudsonbuild/.hudson/jobs/cbi-ve-1.4-integration-Ganymede/workspace/build/org.eclipse.dash.common.releng/tools/scripts/buildAllHelper.xml:5: java.lang.StackOverflowError
(In reply to comment #1)
> > But the real question is why does this process take 4 minutes one day and over
> > 3hrs the next?
>
> My best educated guess is that your build process is looping over and over for
> some reason, as shown by this repeated message?
>
> /opt/users/hudsonbuild/.hudson/jobs/cbi-ve-1.4-integration-Ganymede/workspace/build/org.eclipse.dash.common.releng/tools/scripts/buildAllHelper.xml:5:
> java.lang.StackOverflowError
>
Well, yeah. It's polling the server every 2 minutes looking for the zip with signed jars. At some point the recursive looping kills the build (at about 3h20min). The issue is not that the code is bad/stupid, it's that the signing process takes 4 mins one day and more than 3hrs the next.
Should I post this question on stackoverflow.com instead? ;P
> But the real question is why does this process take 4 minutes one day and over
> 3hrs the next?
Why is it that doing the QEW at 1:00am takes 15 minutes, but at 6:30 takes 3 hours? We're all sharing a bunch of servers, what can I say... Feel free to cat /proc/loadavg before signing to determine if the build server is in any position to sign something quickly at any given moment. Or don't sign while you're building another project and populating searchcvs =)
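The /proc/loadavg check suggested above could be scripted roughly as follows. This is only a sketch, not part of the actual build scripts: the function names, the per-CPU threshold of 1.0, and the sample values are assumptions made for illustration. It relies on the standard Linux /proc/loadavg layout, where the first three fields are the 1-, 5-, and 15-minute load averages.

```python
def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def ok_to_sign(loadavg_text, cpu_count, max_load_per_cpu=1.0):
    """Hypothetical gate: sign only if the 1-minute load per CPU is at or below the threshold."""
    one_minute, _, _ = parse_loadavg(loadavg_text)
    return one_minute / cpu_count <= max_load_per_cpu


if __name__ == "__main__":
    import os

    # Read the live value on a Linux host; cpu_count() may return None, so fall back to 1.
    with open("/proc/loadavg") as handle:
        print(ok_to_sign(handle.read(), os.cpu_count() or 1))
```

On Linux the load average also counts processes blocked in uninterruptible disk waits, so a high value can reflect disk contention as well as CPU contention.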
(In reply to comment #3)
> > But the real question is why does this process take 4 minutes one day and over
> > 3hrs the next?
> Why is it that doing the QEW at 1:00am takes 15 minutes, but at 6:30 takes 3
> hours? We're all sharing a bunch of servers, what can I say... Feel free to
> cat /proc/loadavg before signing to determine if the build server is in any
> position to sign something quickly at any given moment. Or don't sign while
> you're building another project and populating searchcvs =)
Can we change the process priority on signing and on searchcvs so the former is high (an ambulance or police car on the QEW) and the latter is low (a minivan or Smart car)?
> Can we change the process priority on signing and on searchcvs so the former is
> high (an ambulance or police car on the QEW) and the latter is low (a minivan
> or Smart car)?
That is already done; however, the bottleneck here is not the CPU -- it's the disk access. There is no way for me to throttle how fast a process uses disk resources. Building, signing, and crawling CVS are all disk-intensive processes.
I guess this depends on me now doing some work to fix up the script to better recover from situations where signing will never complete (or will take WAY too long). See also bug 254205. Moving to Athena.
Signing will now do up to 15 checks at 180 second intervals. For GEF I build, this meant exactly ONE check.
https://build.eclipse.org/hudson/view/Athena%20CBI/job/cbi-gef-3.5.x-nightly/92/
In future, could look at parameterizing the number of checks and the intervals; for now, these properties remain as unexposed values which could possibly be set in build.properties:
  <property name="signed.zip.check.interval.seconds" value="180" />
  <property name="signed.zip.check.limit" value="15" />
Closing.
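For illustration, the bounded check described above could look like the sketch below. It is not the Athena script itself, which is Ant-based. The function name, the is_ready callback, and the injectable sleep are assumptions; only the defaults of 15 checks and 180 seconds come from the properties above. It uses a plain loop, not recursion, so a long wait cannot overflow the stack.

```python
import time


def wait_for_signed_zip(is_ready, check_limit=15, interval_seconds=180, sleep=time.sleep):
    """Poll is_ready() up to check_limit times.

    Returns the number of the check that succeeded, or None if the limit
    was reached, so the caller can continue the build unsigned.
    """
    for attempt in range(1, check_limit + 1):
        if is_ready():
            return attempt
        # Do not sleep after the final failed check; just give up.
        if attempt < check_limit:
            sleep(interval_seconds)
    return None


if __name__ == "__main__":
    # Hypothetical usage: the signed zip is already there on the first check.
    result = wait_for_signed_zip(lambda: True)
    print("signed on check", result)
```

In this sketch the worst case is 14 sleeps of 180 seconds, i.e. 42 minutes, before the build gives up on signing and moves on.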