Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 312787

Summary: Can signing processes be moved or added to new build2 hardware?
Product: Community Reporter: David Williams <david_williams>
Component: ServersAssignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED QA Contact:
Severity: enhancement    
Priority: P3 CC: aniefer, john.arthorne, kim.moir, nicolas.bros, overholt, stepper
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=313722
Whiteboard:

Description David Williams CLA 2010-05-13 09:59:18 EDT
I suspect you've already thought of this ... but, "signing" adds significantly to our builds. * Sometimes, I suspect, simply because some other large build is in the queue. But, seems it is a bottle neck, in either case. 

Can the process take advantage of the (currently) idle hardware? 

Hopefully transparently to those of us building on 'build'. 

* We had one case of a 30 minute build taking 3 hours when signing turned on. But more routinely it at least doubles our build time ... a normally 4-5 hour build/test cycle taking 8, 10, or even 12 hours, when signing turned on.
Comment 1 Denis Roy CLA 2010-05-13 14:18:43 EDT
Before we all jump ships, I think we should wait a bit.  Also, I don't want to touch the signing process until Helios is out the door.
Comment 2 David Williams CLA 2010-05-13 14:36:30 EDT
Sounds quite sane. 

Do those signing time "waits" we encounter sounds typical? problematic? Or what you would expect on busy server? I haven't seen build.eclipse.org pegged at average load of 20 to 30 lately ... more the ordinary overload of 5 to 10 ... so maybe we'll limp through.
Comment 3 Denis Roy CLA 2010-05-13 14:45:01 EDT
Without a doubt signing is computationally intensive, as well as I/O since it unpacks, processes, signs, repacks files.  Also, jarsigner doesn't scale beyond one CPU, so signing can only be as fast as the fastest CPU core we can get.  On build, we can run multiple signing queues in parallel to take advantage of the multiple CPUs.

In the end, I think signing will be much faster on build2 since the CPU cores are much faster there.  Perhaps a better option would be to host a 4-CPU virtual machine on the idle build3 and use it to perform signing exclusively.
Comment 4 David Williams CLA 2010-05-13 21:09:37 EDT
> [...] Perhaps a better option would be to host a 4-CPU
> virtual machine on the idle build3 and use it to perform signing exclusively.

exclusively for webtools, right? (ha ha) 

But, seriously, I think a good idea to explore. It is a known current need that is a bottle neck. The "windows test machine" use for that machine is something new. While important, and useful, seems to me the signing bottle neck is an actual problem that needs to be solved.
Comment 5 Denis Roy CLA 2010-05-18 08:25:24 EDT
I tried to run the signing on build2 this morning, but invoking the jarprocessor throws an error:

/shared/common/jdk-1.6.0_10/bin/java /home/admin/jarprocessor.jar
Exception in thread "main" java.lang.NoClassDefFoundError: /home/admin/jarprocessor/jar
Caused by: java.lang.ClassNotFoundException: .home.admin.jarprocessor.jar
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: /home/admin/jarprocessor.jar.  Program will exit.

Any ideas?
Comment 6 Denis Roy CLA 2010-05-18 08:31:24 EDT
> Any ideas?

Invoking with java -jar (jarname) seems to work.
Comment 7 David Williams CLA 2010-05-18 08:48:15 EDT
(In reply to comment #6)
> > Any ideas?
> 
> Invoking with java -jar (jarname) seems to work.

I've only ever used -jar. Glad that worked.
Comment 8 Denis Roy CLA 2010-05-18 08:55:42 EDT
I've got it working, but I have a snag wrt signing Hudson artifacts.  Since they are placed in the hudsonbuild user directory locally, build2 cannot access files in build, and vice-versa.

ls: cannot access /opt/users/hudsonbuild/.hudson/jobs/buckminster-nightly/workspace/result/signing/site_609787677.zip: No such file or directory

From Build2 I can't access that job on build.
Comment 9 David Williams CLA 2010-05-18 09:07:10 EDT
(In reply to comment #8)
> I've got it working, but I have a snag wrt signing Hudson artifacts.  Since
> they are placed in the hudsonbuild user directory locally, build2 cannot access
> files in build, and vice-versa.
> 

Do you mean this specific to hudson? I'm not familiar with signing there. In webtools, we put our files in 
/opt/public/download-staging.priv/webtools 
to be signed (and we get them from there). 

Is that what you are trying to make "transparent" (and you'd put to and from 
/opt/public/download-staging.priv/webtools 
and would have some problem as your hudson example? 

Thanks for looking into this.
Comment 10 Denis Roy CLA 2010-05-18 10:36:29 EDT
*** Bug 313364 has been marked as a duplicate of this bug. ***
Comment 11 Denis Roy CLA 2010-05-18 12:01:47 EDT
Signing is taking forever on build2.

On build, I see this:
2010-05-18 11:59:13 SIGNER(31098): Finished signing (some jar)
Adding (stuff)
Processing (stuff)
Extracting (stuff)
Running Repack Sign on (some jar)
2010-05-18 11:59:20 SIGNER(31098): asked to sign

jarprocessor.jar takes about 7 seconds to 'process' the jar after signing.



On build2, the same processing takes about 1m30s to 2m.  Could the 'java' VM I'm using make such a difference?
Comment 12 David Williams CLA 2010-05-18 12:53:40 EDT
I assume exact same jar? And, no ... the vm itself would probably not make _that_ big a difference ... but 64 bit PPC might ... that architecture should be good for something, eh? 

But ... I've no idea.
Comment 13 David Williams CLA 2010-05-18 12:54:49 EDT
John might know?
Comment 14 John Arthorne CLA 2010-05-18 14:40:16 EDT
(In reply to comment #13)
> John might know?

CCing Andrew who is the expert on the jarprocessor. I/O is certainly the big factor here, so if the machines have different disk speeds or different latency to a network disk that could make a big difference.
Comment 15 Denis Roy CLA 2010-05-18 14:45:59 EDT
I was wrong -- jarprocessor.jar is not at fault.  Invoking jarsigner directly takes forever.  My guess is that the wait is with the tsa and I have a good idea what the problem may be -- there is no reverse DNS entry for build2.
Comment 16 Andrew Niefer CLA 2010-05-18 16:47:34 EDT
The JarProcessor itself is very much I/O bound.  Individual steps will have more cpu requirements.  If someone wanted to invest effort into making this more performant, here are some places to look:

1) Given a zip file full of jars, we extract a jar, process it, and add it to a new zip.  ->  This is quite linear and could be made parallel

2) For a given jar, we must 
   a) extract it from its container (zip or jar) (copy #1 to disk)
   b) call the pack200 tool on it, (copy #2 to disk)
   c) call unpack (copy #3 to disk)
   d) call signer (done inplace, no new file)
   e) normalize it (copy #4 to disk)

The normalize step was necessary because all of these tools left the resulting jar in differing zip formats wrt the crc, size, etc information for individual zip entries, this must be "normalized" in order to support nested packed jars.

I can imagine this being very much improved if we could do each of pack, sign & unpack programmatically with input/output streams instead of execing separate tools to do the job.  There were memory leak issues around the java API for pack/unpack, there is bug 307518.  Programmatic signing has never been investigated, but I found http://onjava.com/pub/a/onjava/2001/04/12/signing_jar.html?page=2
Comment 17 Denis Roy CLA 2010-05-19 08:17:52 EDT
> I was wrong -- jarprocessor.jar is not at fault.  Invoking jarsigner directly
> takes forever.  My guess is that the wait is with the tsa and I have a good
> idea what the problem may be -- there is no reverse DNS entry for build2.

My hunch was dead on.  With a reverse DNS entry for the build2 server, the signing of a single 24K JAR takes 3 seconds instead of 2~4 minutes.

I guess before issuing a timestamp, the remote server attempts to do a reverse lookup on our IP.
Comment 18 Denis Roy CLA 2010-05-19 13:21:46 EDT
> My hunch was dead on.

*sigh*  no it wasn't.  I must have screwed up my test for some reason, since that didn't fix the problem.

What _did_ fix it was downloading the IBM JDK for x86_64 and using it to sign.  Consistently, the Sun jarsigner would take 1m30s+ if I used the timestamp, and 2 seconds if not.  The IBM jarsigner takes 3 seconds to sign if I use the timestamp.

I'm testing signing on build2 now...
Comment 19 Denis Roy CLA 2010-05-20 08:19:19 EDT
I have moved signing to build2.  This also fixed the case where build could not sign the builds on build2.
Comment 20 David Williams CLA 2010-05-20 13:51:47 EDT
This is very cool, Denis. That you were able to do this at such a busy time. 

I don't keep accurate enough data to quantitatively compare, but I'm sure will be faster and therefore help relieve the build bottle neck. 

I assume this is all transparent to us users of the service ... we put it where we always do, and your scripts can sign it using whatever cpu they want to? (i.e. nothing we have to do to get the benefit?). 

Again, greatest thanks.
Comment 21 Kim Moir CLA 2010-05-20 13:53:30 EDT
>>I have moved signing to build2.

Thanks!!!!  Our build is even faster today with this change.  Our 8am build was ready at 11:20 today.  It used to take until 1pm or 2pm to complete before the NFS and signing changes.
Comment 22 Denis Roy CLA 2010-05-20 15:01:02 EDT
Yes, this is totally transparent.  My hard work leads to huge payoffs for you, the beloved committer.

Oh, the debug log file has moved to a location on NFS for easy(er) access:
/home/data/httpd/download-staging.priv/arch/signer.log
Comment 23 Kim Moir CLA 2010-05-20 15:14:12 EDT
Could you update the message that you see when you invoke the signing process vai the command line to point to the new log /home/data/httpd/download-staging.priv/arch/signer.log.  The current message still refers to the old one.

We beloved committers are pretty burned out at this stage of the release and my memory retention is shot right now :-)
Comment 24 Denis Roy CLA 2010-05-20 15:24:56 EDT
I've updated the message about the log.
Comment 25 Denis Roy CLA 2010-05-21 15:50:25 EDT
> Thanks!!!!  Our build is even faster today with this change.  Our 8am build was
> ready at 11:20 today.  It used to take until 1pm or 2pm to complete before the
> NFS and signing changes.

Great... you can report this back to the folks who authorized the hardware donation to prove it's being well used (and that they should send more) :)
Comment 26 Kim Moir CLA 2010-05-21 15:58:32 EDT
I plan to do that :-)
Comment 27 Eike Stepper CLA 2010-05-25 06:51:23 EDT
My build still fails with this:

     [java] java.io.IOException: java.io.IOException: /usr/bin/sign: cannot execute
     [java] 	at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
     [java] 	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
     [java] 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
     [java] 	at java.lang.Runtime.exec(Runtime.java:591)
     [java] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     [java] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     [java] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     [java] 	at java.lang.reflect.Method.invoke(Method.java:592)
     [java] 	at org.apache.tools.ant.taskdefs.Execute$Java13CommandLauncher.exec(Execute.java:832)
     [java] 	at org.apache.tools.ant.taskdefs.Execute.launch(Execute.java:447)
     [java] 	at org.apache.tools.ant.taskdefs.Execute.execute(Execute.java:461)
     [java] 	at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:621)
     [java] 	at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:662)
     [java] 	at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:487)
     [java] 	at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
     [java] 	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
     [java] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     [java] 	at java.lang.reflect.Method.invoke(Method.java:592)
     [java] 	at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
     [java] 	at org.apache.tools.ant.Task.perform(Task.java:348)
     [java] 	at org.apache.tools.ant.Target.execute(Target.java:357)
     [java] 	at org.apache.tools.ant.Target.performTasks(Target.java:385)
     [java] 	at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1337)
     [java] 	at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
     [java] 	at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
     [java] 	at org.eclipse.ant.internal.core.ant.EclipseDefaultExecutor.executeTargets(EclipseDefaultExecutor.java:32)
     [java] 	at org.apache.tools.ant.Project.executeTargets(Project.java:1189)
     [java] 	at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:662)
     [java] 	at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:495)
     [java] 	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
     [java] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     [java] 	at java.lang.reflect.Method.invoke(Method.java:592)
     [java] 	at org.eclipse.buckminster.ant.AntRunner.run(AntRunner.java:318)
     [java] 	at org.eclipse.buckminster.ant.actor.AntActor.internalPerform(AntActor.java:252)
     [java] 	at org.eclipse.buckminster.core.actor.AbstractActor.perform(AbstractActor.java:186)
     [java] 	at org.eclipse.buckminster.core.internal.actor.PerformManager$DirectActionInvocation.execute(PerformManager.java:143)
     [java] 	at org.eclipse.buckminster.core.internal.actor.PerformManager.internalPerform(PerformManager.java:454)
     [java] 	at org.eclipse.buckminster.core.internal.actor.PerformManager.perform(PerformManager.java:293)
     [java] 	at org.eclipse.buckminster.core.internal.actor.PerformManager.perform(PerformManager.java:305)
     [java] 	at org.eclipse.buckminster.core.commands.Perform.internalRun(Perform.java:108)
     [java] 	at org.eclipse.buckminster.core.commands.WorkspaceCommand.run(WorkspaceCommand.java:91)
     [java] 	at org.eclipse.buckminster.cmdline.AbstractCommand.basicRun(AbstractCommand.java:194)
     [java] 	at org.eclipse.buckminster.cmdline.Headless.run(Headless.java:317)
     [java] 	at org.eclipse.buckminster.cmdline.Headless.run(Headless.java:136)
     [java] 	at org.eclipse.buckminster.cmdline.Headless.start(Headless.java:156)
     [java] 	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
     [java] 	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
     [java] 	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
     [java] 	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
     [java] 	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
     [java] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     [java] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     [java] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     [java] 	at java.lang.reflect.Method.invoke(Method.java:592)
     [java] 	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:619)
     [java] 	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:574)
     [java] 	at org.eclipse.equinox.launcher.Main.run(Main.java:1407)
     [java] 	at org.eclipse.equinox.launcher.Main.main(Main.java:1383)

Am I supposed to configure something differently or is it still a bug?
Comment 28 Nicolas Bros CLA 2010-05-25 07:43:36 EDT
(In reply to comment #27)
> My build still fails with this:
> 
>      [java] java.io.IOException: java.io.IOException: /usr/bin/sign: cannot
> execute

I made a bug for this: "Bug 313722 - signing fails on build2".
Comment 29 Eike Stepper CLA 2010-05-25 09:51:46 EDT
Thank you Nicolas.