Bug 339416 - Performance benchmark/test job
Summary: Performance benchmark/test job
Status: RESOLVED WORKSFORME
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Thanh Ha CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 367238
 
Reported: 2011-03-09 14:25 EST by Denis Roy CLA
Modified: 2013-10-18 16:29 EDT
6 users

See Also:


Attachments

Description Denis Roy CLA 2011-03-09 14:25:07 EST
I'd like to create a benchmark/test job on our Hudson instance that will invoke a bunch of common build actions, in the hope of giving us some troubleshooting tools should builds begin to fail or take an unusually long time.

Ideally, this job would:
a) checkout some random piece of code from Eclipse CVS/SVN and Git
b) fetch dependencies from eclipse.org servers
c) fetch dependencies from one or two remote servers
d) build stuff
e) move build artifacts to /shared
f) work as-is for the next few years

If each step is timestamped, over time we should have an idea of how Hudson performs at the various tasks.  Also, when builds seem to hang, we (webmasters) and you could run this job to help with troubleshooting and to determine whether all systems are working correctly.
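A minimal sketch of what a timestamped step wrapper could look like (hypothetical; `run_step` and the placeholder commands are illustrative, not an existing script on build.eclipse):

```shell
#!/bin/sh
# Hypothetical wrapper: run one benchmark step, print a timestamped line with
# the step name, wall-clock duration, and exit status, then propagate the status.
run_step() {
    name="$1"; shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    printf '%s step=%s duration=%ss status=%s\n' \
        "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$name" "$((end - start))" "$status"
    return "$status"
}

# Example: "true" stands in for a real checkout/fetch/build command.
run_step checkout true
```

Each of steps a) through e) would then be one `run_step` call, and the resulting one-line-per-step log stays greppable over the years.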

At this point, I'm not sure where to start beyond "Create a new job".

Thoughts?
Comment 1 Gunnar Wagenknecht CLA 2011-03-09 14:51:40 EST
(In reply to comment #0)
> Ideally, this job would:
> a) checkout some random piece of code from Eclipse CVS/SVN and Git
[...]
> f) work as-is for the next few years

Hmm ... I see a conflict here. :)

What about monitoring execution times of a set of existing jobs in order to gather some stats/metrics? If there is a trend across a few jobs, that could serve as a health indicator.
Comment 2 David Williams CLA 2011-03-09 15:04:52 EST
(In reply to comment #0)

> Thoughts?

I think it is a great idea. IMHO we shouldn't worry too much about the "build stuff" step, assuming that means "run the compiler and compile some code", since a) that's mostly CPU and file creation and b) it would be hard to "work as-is" over the years. I am not saying it is impossible, or a bad idea ... just probably one of the least important things to measure/monitor, so worry about that last, if at all.

I'd also suggest a "delete stuff on file system" step in there somewhere, specifically to measure and monitor some known quantity, since in my experience that sometimes takes an abnormally long time on build.eclipse (or perhaps I should say NFS?) and seems to vary from time to time, I assume due to network load or something.

I guess one issue to settle up front is how to "drive" the whole thing. A series of shell scripts? Ant files? I think I'd lean toward shell scripts, as it's easier and cleaner to have a very specific action that is easily timed, but ... that's a little unrealistic compared to how people normally build, I guess. But I think that's OK. I think the best "performance tests" do some fairly isolated operation, so if that operation started taking longer, you'd already have it narrowed down to a more specific area (e.g. "cvs checkout" or "copy a few large files" or "copy many small files").
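As an illustration of the "fairly isolated operation" idea, here is a hedged sketch that times one known quantity, creating and deleting many small files, which is exactly the kind of step that can go abnormally slow on NFS (the file count and paths are arbitrary placeholders):

```shell
#!/bin/sh
# Hypothetical isolated benchmark: create and delete 500 empty files in a
# temporary directory and report the elapsed wall-clock time. A regression in
# this number points at filesystem/NFS behaviour rather than CVS or compilation.
workdir=$(mktemp -d)
start=$(date +%s)
i=0
while [ "$i" -lt 500 ]; do
    : > "$workdir/f$i"     # create an empty file
    i=$((i + 1))
done
rm -rf "$workdir"          # delete them all again
end=$(date +%s)
echo "create+delete 500 small files: $((end - start))s"
```

Because the operation is deliberately narrow, a jump in its duration needs no log archaeology to interpret.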

The alternative approach, for example, would be just to take some "known build", say "build WTP XML Tools", and rebuild the exact same versions over and over ... but then, if it "takes longer" at some point, you'd have to start from scratch to figure out how/why (could be cvs, could be file copy, etc.).

Having shell scripts for specific operations might also make it easier for you to "capture the statistics" you wanted, and put them in a format you wanted. I see that gathering and analysis as one of the harder parts of this effort.

I can't commit to it right now, but I'd be interested in helping, maybe, a little. We'll see. 

But, thanks for suggesting and being willing. I'm sure it'd help things long term.
Comment 3 Ed Willink CLA 2011-03-09 15:10:15 EST
The https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/ job seems to have demonstrated an ability to malfunction in various ways:

Stall before CVS
Stall during CVS
Stall during mv
Stall during archive artefacts
Stall despite being killed many times; CVS continues regardless

It might be appropriate to replicate MDT/OCL as a single Hello world plugin with the same project dependencies, and then run it when Hudson has collapsed and verify that a Hello World program also collapses.
Comment 4 Denis Roy CLA 2011-03-09 15:26:02 EST
> What about monitoring execution times of a set of existing jobs

The problem is that I don't really have any insight into what all the existing jobs do.  So if job A usually takes 10 minutes and now takes 30, how does that help me?  Plowing through 200KB of logs is tedious.

(In reply to comment #2)
> I guess one issue to settle up front, is how to "drive" the whole thing. A
> series of shell scripts? Ant files?

I was hoping anything but shell scripts.  I can easily log into the shell and test CVS, the shell Proxy settings etc. but it's not easy for me to test the JAVA_OPTS proxy settings.


(In reply to comment #3)
> The https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/
> job seems to have demonstrated an ability to malfunction in various ways

How do you determine which components are malfunctioning? I see your last builds as "success", and a 754KB log file that makes my head spin.  Regardless, what I'm looking for is the simplest of jobs that will help me pinpoint any potential problem, rather than having me look at huge logfiles.
Comment 5 Matthias Sohn CLA 2011-03-09 16:12:14 EST
Maybe the monitoring plugin [1] could also help here.

[1] http://wiki.jenkins-ci.org/display/JENKINS/Monitoring
Comment 6 Ed Willink CLA 2011-03-09 16:39:03 EST
Possibly two simple log tests are available.

a) initial stall: the log file contains just two lines after 5 minutes.

b) Hudson crash:

This is the current return from https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/302/console, demonstrating that Hudson needs a restart NOW.

ERROR: Failed to archive artifacts: MDT-OCL.*/**, publishroot/**, promote.properties
hudson.util.IOException2: Failed to extract /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.1-nightly/MDT-OCL.*/**, publishroot/**, promote.properties
	at hudson.FilePath.readFromTar(FilePath.java:1577)
	at hudson.FilePath.copyRecursiveTo(FilePath.java:1491)
	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:117)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:603)
	at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:582)
	at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:560)
	at hudson.model.Build$RunnerImpl.post2(Build.java:156)
	at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:529)
	at hudson.model.Run.run(Run.java:1361)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:140)
Caused by: java.io.IOException
	at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
	at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
	at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
	at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
	at org.apache.tools.tar.TarBuffer.readBlock(TarBuffer.java:257)
	at org.apache.tools.tar.TarBuffer.readRecord(TarBuffer.java:223)
	at hudson.org.apache.tools.tar.TarInputStream.read(TarInputStream.java:345)
	at java.io.FilterInputStream.read(FilterInputStream.java:90)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
	at hudson.util.IOUtils.copy(IOUtils.java:33)
	at hudson.FilePath.readFromTar(FilePath.java:1565)
	... 12 more


A grep for "at hudson." is probably a pretty strong smell.
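The two log tests above could be combined into one small check, sketched here as a hypothetical `check_log` helper (the function name and thresholds are assumptions, not an existing tool):

```shell
#!/bin/sh
# Hypothetical log health check: flag a console log as unhealthy if it
# contains Hudson stack-trace frames ("at hudson."), or as a suspected
# stall if it has produced only a couple of lines.
check_log() {
    log="$1"
    if grep -q "at hudson\." "$log"; then
        echo "UNHEALTHY: stack trace in $log"
        return 1
    fi
    lines=$(wc -l < "$log")
    if [ "$lines" -le 2 ]; then
        echo "SUSPECT: only $lines line(s) in $log (possible stall)"
        return 2
    fi
    echo "OK: $log"
}

# Example: check_log /path/to/console.log
```

Run against a fresh console log, this gives a yes/no answer without anyone reading a 754KB file.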
Comment 7 Denis Roy CLA 2013-02-19 10:11:38 EST
I was discussing this with Thanh last week.
Comment 8 Denis Roy CLA 2013-10-18 16:29:07 EDT
I don't think we need this anymore.  As more projects move to their individual HIPP instances, performance discrepancies can be easily observed at the OS level.