| Summary: | Performance benchmark/test job | | |
|---|---|---|---|
| Product: | Community | Reporter: | Denis Roy <denis.roy> |
| Component: | CI-Jenkins | Assignee: | Thanh Ha <thanh.ha> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | david_williams, denis.roy, ed, gunnar, matthias.sohn, webmaster |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 367238 | | |
Description
Denis Roy
(In reply to comment #0)
> Ideally, this job would:
> a) checkout some random piece of code from Eclipse CVS/SVN and Git
> [...]
> f) work as-is for the next few years

Hmm ... I see a conflict here. :)

What about monitoring execution times of a set of existing jobs in order to gather some stats/metrics? If there is a trend across a few jobs, then this could lead to a health indicator.

(In reply to comment #0)
> Thoughts?

I think it is a great idea.

IMHO we shouldn't worry too much about the "build stuff" step, assuming that means "run the compiler and compile some code", since a) that's mostly CPU and file creation, and b) it would be hard to have it "work as-is" over the years. I am not saying it is impossible, or a bad idea ... just probably one of the least important things to measure/monitor, so worry about that last, if at all.

I'd also suggest a "delete stuff on the file system" step in there somewhere, specifically to measure and monitor some known quantity, since, in my experience, that sometimes takes an abnormally long time on build.eclipse (or, perhaps I should say, NFS) and seems to differ from time to time, I assume due to network load or something.

I guess one issue to settle up front is how to "drive" the whole thing. A series of shell scripts? Ant files? I think I'd lean toward shell scripts, as it is easier and cleaner to have a very specific action that is easily timed, but ... that is a little unrealistic compared to how people normally do things, I guess. But I think that's ok. The best "performance tests" do some fairly isolated operation, so if that operation started taking longer, you'd already have it narrowed down to a more specific area (e.g. "cvs checkout", "copy a few large files", or "copy many small files"). The alternative approach would be to take some "known build", say "build WTP XML Tools", and just rebuild the exact same versions over and over ... but then, if it "takes longer" at some point, you'd have to start from scratch to figure out how and why (could be CVS, could be file copy, etc.).

Having shell scripts for specific operations might also make it easier for you to capture the statistics you wanted and put them in the format you wanted. I see the gathering and analysis as one of the harder parts of this effort.

I can't commit to it right now, but I'd be interested in helping, maybe, a little. We'll see. But thanks for suggesting this and being willing. I'm sure it'd help things long term.

The https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/ job seems to have demonstrated an ability to malfunction in various ways:

- Stall before CVS
- Stall during CVS
- Stall during mv
- Stall during archive artefacts
- Stall despite being killed many times; CVS continues regardless

It might be appropriate to replicate MDT/OCL as a single Hello World plugin with the same project dependencies, then run it when Hudson has collapsed and verify that the Hello World build also collapses.

> What about monitoring execution times of a set of existing jobs

The problem is that I don't really have any insight into what all the existing jobs do. So if job A usually takes 10 minutes and now takes 30, how does that help me? Plowing through 200KB of logs is tedious.

(In reply to comment #2)
> I guess one issue to settle up front is how to "drive" the whole thing.
> A series of shell scripts? Ant files?

I was hoping for anything but shell scripts. I can easily log into the shell and test CVS, the shell proxy settings, etc., but it's not easy for me to test the JAVA_OPTS proxy settings.
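Whether driven by shell or Ant, a job built from a handful of isolated, individually timed operations (as suggested earlier in this discussion) might look roughly like the shell sketch below. This is only an illustration under assumptions: the CVS root, the module name, the working paths, and the results file are hypothetical placeholders, not an actual Eclipse Foundation job configuration.

```sh
#!/bin/sh
# Hypothetical benchmark job: run a few isolated operations, time each one,
# and append the results as CSV so trends can be graphed over time.
# The CVS root/module, paths, and results file below are placeholders.

RESULTS=/shared/benchmarks/results.csv
WORK=/tmp/benchmark.$$

# Time a single named step and record "timestamp,step,seconds".
step() {
    name=$1; shift
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$name,$((end - start))" >> "$RESULTS"
}

mkdir -p "$WORK" && cd "$WORK" || exit 1
mkdir -p small-copies

# Isolated operations, timed separately so a slowdown points at one area
# (checkout vs. large-file copy vs. many-small-files copy vs. delete).
step cvs_checkout cvs -d :pserver:anonymous@dev.eclipse.org:/cvsroot/example co example-module
step copy_large   cp -r example-module copy-of-module
step copy_small   find example-module -type f -exec cp {} small-copies/ \;
step delete_files rm -rf copy-of-module small-copies

cd / && rm -rf "$WORK"
```

Because each step appends a single CSV row, a slowdown in, say, the delete step shows up directly in the numbers without anyone having to read a 200KB log.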
(In reply to comment #3)
> The https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/
> job seems to have demonstrated an ability to malfunction in various ways

How do you determine which components are malfunctioning? I see your last builds as "success" and a 754KB log file that makes my head spin.

Regardless, what I'm looking for is the simplest possible job that will help me pinpoint any potential problem, rather than having me look at huge logfiles.

Maybe the monitoring plugin [1] could also help here.

[1] http://wiki.jenkins-ci.org/display/JENKINS/Monitoring

Possibly two simple log tests are available.

a) Initial stall: the log file contains just two lines after 5 minutes.

b) Hudson crash: the following is the current return from https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/302/console, demonstrating that Hudson needs a restart NOW.

    ERROR: Failed to archive artifacts: MDT-OCL.*/**, publishroot/**, promote.properties
    hudson.util.IOException2: Failed to extract /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.1-nightly/MDT-OCL.*/**, publishroot/**, promote.properties
        at hudson.FilePath.readFromTar(FilePath.java:1577)
        at hudson.FilePath.copyRecursiveTo(FilePath.java:1491)
        at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:117)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
        at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:603)
        at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:582)
        at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:560)
        at hudson.model.Build$RunnerImpl.post2(Build.java:156)
        at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:529)
        at hudson.model.Run.run(Run.java:1361)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:140)
    Caused by: java.io.IOException
        at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
        at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
        at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
        at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
        at org.apache.tools.tar.TarBuffer.readBlock(TarBuffer.java:257)
        at org.apache.tools.tar.TarBuffer.readRecord(TarBuffer.java:223)
        at hudson.org.apache.tools.tar.TarInputStream.read(TarInputStream.java:345)
        at java.io.FilterInputStream.read(FilterInputStream.java:90)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
        at hudson.util.IOUtils.copy(IOUtils.java:33)
        at hudson.FilePath.readFromTar(FilePath.java:1565)
        ... 12 more

A grep for "at hudson." is probably a pretty strong smell.

I was discussing this with Thanh last week. I don't think we need this anymore. As more projects move to their individual HIPP instances, the performance discrepancies can be easily observed at the OS level.
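For reference, the two log tests proposed above could be scripted in a few lines. The sketch below is only an illustration of how that might be wired up, not an existing Eclipse Foundation script: the console URL uses the standard Hudson/Jenkins consoleText endpoint for the job quoted in this bug, and the line-count threshold and warning messages are assumed values.

```sh
#!/bin/sh
# Sketch of the two log checks proposed above. The console URL points at the
# job quoted in this bug via the standard consoleText endpoint; thresholds
# and warning messages are illustrative assumptions.

CONSOLE_URL="https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.1-nightly/lastBuild/consoleText"
LOG=/tmp/console.txt

curl -s -o "$LOG" "$CONSOLE_URL" || exit 1

# a) Initial stall: run this ~5 minutes after the build starts; a log that
#    still has only a couple of lines suggests the build is stuck.
lines=$(wc -l < "$LOG")
if [ "$lines" -le 2 ]; then
    echo "WARNING: console log has only $lines line(s); build may be stalled"
fi

# b) Hudson crash: a stack trace full of "at hudson." frames, like the
#    archive-artifacts failure above, hints that the master needs a restart.
if grep -q "at hudson\." "$LOG"; then
    echo "WARNING: Hudson stack trace in console log; instance may need a restart"
fi
```

Either warning could then be turned into an email or a failing "canary" build, so nobody has to read a 754KB console log by hand.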