Bug 280642 - Improve recovery from infinitely long signing queue
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Dash Athena
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Common Build Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 254205 282593
Blocks:
Reported: 2009-06-17 12:58 EDT by Nick Boldt CLA
Modified: 2012-01-30 11:31 EST (History)

Description Nick Boldt CLA 2009-06-17 12:58:07 EDT
On June 5, the VE integration build ran clean and returned signed jars. The whole build, including compilation, packing, signing, and testing, took 14 minutes; about 4 minutes of that was spent waiting for the signer to complete.

https://build.eclipse.org/hudson/job/cbi-ve-1.4-integration-Ganymede/81/consoleFull (search for "Signing VE-Master" then "unpackUpdateJarsAndRepack")

Since then, builds have been timing out after waiting more than 3 hours for the signer to return a zip with signed jars.

https://build.eclipse.org/hudson/job/cbi-ve-1.4-integration-Ganymede/83/console

I can put in place code that will

a) recover faster from a stalled signing job, e.g., by setting a 1hr timeout
b) should signing fail, continue the build as if signing was never done

But the real question is why does this process take 4 minutes one day and over 3hrs the next?
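
A minimal sketch of (a) and (b), written in Python rather than in the Ant scripts the build actually uses. The function names, the `is_ready` callback, and the 120-second polling interval are assumptions made for illustration; only the 1hr timeout and the "continue as if unsigned" fallback come from the proposal above.

```python
import time
from typing import Callable


def wait_for_signed_zip(
    is_ready: Callable[[], bool],
    timeout_seconds: float = 3600.0,   # the 1hr timeout suggested in (a)
    interval_seconds: float = 120.0,   # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_ready() returns True or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # give up instead of waiting indefinitely
        # Never sleep past the deadline.
        sleep(min(interval_seconds, remaining))


def pick_jars(signed_ok: bool) -> str:
    # (b): if signing failed or timed out, carry on with the unsigned jars.
    return "signed" if signed_ok else "unsigned"
```

The clock and sleep functions are parameters so that the timeout logic can be exercised without actually waiting an hour.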
Comment 1 Denis Roy CLA 2009-06-17 13:11:14 EDT
> But the real question is why does this process take 4 minutes one day and over
> 3hrs the next?

My best educated guess is that your build process is looping over and over for some reason, as shown by this repeated message:

/opt/users/hudsonbuild/.hudson/jobs/cbi-ve-1.4-integration-Ganymede/workspace/build/org.eclipse.dash.common.releng/tools/scripts/buildAllHelper.xml:5: java.lang.StackOverflowError


Comment 2 Nick Boldt CLA 2009-06-17 20:11:55 EDT
(In reply to comment #1)
> > But the real question is why does this process take 4 minutes one day and over
> > 3hrs the next?
> 
> My best educated guess is that your build process is looping over and over for
> some reason, as shown by this repeated message:
> 
> /opt/users/hudsonbuild/.hudson/jobs/cbi-ve-1.4-integration-Ganymede/workspace/build/org.eclipse.dash.common.releng/tools/scripts/buildAllHelper.xml:5:
> java.lang.StackOverflowError
> 

Well, yeah. It's polling the server every 2 minutes looking for the zip with signed jars. At some point the recursive looping kills the build (at about 3h20min).

The issue is not that the code is bad/stupid, it's that the signing process takes 4 mins one day and more than 3hrs the next.

Should I post this question on stackoverflow.com instead? ;P
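
To illustrate why recursive polling eventually dies while a plain loop does not, here is a small Python analogy. It is not the buildAllHelper.xml logic; the function names are hypothetical, the waits between checks are omitted, and Python reports the condition as RecursionError where the JVM reports StackOverflowError.

```python
def poll_recursive(check, depth=0):
    """Re-invokes itself after every failed check, so each poll adds a stack frame."""
    if check():
        return depth
    return poll_recursive(check, depth + 1)


def poll_iterative(check, max_checks):
    """Same polling as a bounded loop: stack depth stays constant."""
    for attempt in range(max_checks):
        if check():
            return attempt
    return None  # gave up after max_checks attempts


def never_ready():
    # Stands in for a signer that never returns the zip.
    return False
```

With a signer that never answers, `poll_recursive(never_ready)` fails once the interpreter's stack limit is reached, whereas `poll_iterative(never_ready, 15)` simply returns `None` after 15 checks.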
Comment 3 Denis Roy CLA 2009-06-18 10:06:25 EDT
> But the real question is why does this process take 4 minutes one day and over
> 3hrs the next?

Why is it that doing the QEW at 1:00am takes 15 minutes, but at 6:30 takes 3 hours?  We're all sharing a bunch of servers, what can I say...  Feel free to cat /proc/loadavg before signing to determine if the build server is in any position to sign something quickly at any given moment.  Or don't sign while you're building another project and populating searchcvs  =)
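
A rough sketch of the /proc/loadavg check suggested here, in Python. The threshold of 4.0 and the function names are assumptions for illustration, not values used on build.eclipse.org. On Linux the first three fields of /proc/loadavg are the 1-, 5-, and 15-minute load averages, and they also count tasks blocked on disk I/O.

```python
from typing import Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def ok_to_sign(loadavg_text: str, max_one_minute_load: float = 4.0) -> bool:
    # max_one_minute_load is a hypothetical threshold; tune it per machine.
    one_minute, _, _ = parse_loadavg(loadavg_text)
    return one_minute <= max_one_minute_load


def read_loadavg(path: str = "/proc/loadavg") -> str:
    # Only available on Linux.
    with open(path) as handle:
        return handle.read()
```

A build could call `ok_to_sign(read_loadavg())` before submitting jars and defer signing when it returns False.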
Comment 4 Nick Boldt CLA 2009-06-18 11:51:22 EDT
(In reply to comment #3)
> > But the real question is why does this process take 4 minutes one day and over
> > 3hrs the next?
> Why is it that doing the QEW at 1:00am takes 15 minutes, but at 6:30 takes 3
> hours?  We're all sharing a bunch of servers, what can I say...  Feel free to
> cat /proc/loadavg before signing to determine if the build server is in any
> position to sign something quickly at any given moment.  Or don't sign while
> you're building another project and populating searchcvs  =)

Can we change the process priority on signing and on searchcvs so the former is high (an ambulance or police car on the QEW) and the latter is low (a minivan or Smart car)?

Comment 5 Denis Roy CLA 2009-06-18 13:18:39 EDT
> Can we change the process priority on signing and on searchcvs so the former is
> high (an ambulance or police car on the QEW) and the latter is low (a minivan
> or Smart car)?

That is already done; however, the bottleneck here is not the CPU -- it's the disk access. There is no way for me to throttle how fast a process uses disk resources.  Building, signing, and crawling CVS are all disk-intensive processes.

Comment 6 Nick Boldt CLA 2009-06-18 16:12:30 EDT
I guess this depends on me now doing some work to fix up the script to better recover from situations where signing will never complete (or will take WAY too long). See also bug 254205. Moving to Athena.
Comment 7 Nick Boldt CLA 2009-07-15 00:15:32 EDT
Signing will now do exactly 15 checks at 180 second intervals. For GEF I build, this meant exactly ONE check.

https://build.eclipse.org/hudson/view/Athena%20CBI/job/cbi-gef-3.5.x-nightly/92/

In future, we could look at parameterizing the number of checks and the interval; for now, these remain unexposed properties, which could possibly be overridden in build.properties:

<property name="signed.zip.check.interval.seconds" value="180" />
<property name="signed.zip.check.limit" value="15" />

Closing.
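
For reference, a small Python sketch of what the two properties in comment 7 imply. With the defaults, 15 checks at 180 seconds bound the wait at roughly 15 x 180 = 2700 seconds (45 minutes), assuming one interval of waiting per check. The parser and function names below are hypothetical, and whether build.properties overrides are actually honoured depends on how the Athena scripts load that file.

```python
DEFAULTS = {
    "signed.zip.check.interval.seconds": 180,
    "signed.zip.check.limit": 15,
}


def load_overrides(properties_text: str) -> dict:
    """Parse simple key=value lines; blank lines and '#' or '!' comments are skipped."""
    overrides = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        # Ignore anything that is not one of the two signing properties.
        if sep and key in DEFAULTS:
            overrides[key] = int(value.strip())
    return overrides


def max_wait_seconds(properties_text: str = "") -> int:
    """Upper bound on the signing wait: check limit times check interval."""
    settings = {**DEFAULTS, **load_overrides(properties_text)}
    return (
        settings["signed.zip.check.interval.seconds"]
        * settings["signed.zip.check.limit"]
    )
```

For example, a build.properties containing only `signed.zip.check.limit=5` would lower the bound to 900 seconds under these assumptions.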