
Bug 389048

Summary: test results sometimes do not get routed back to the right download page summary
Product: [Eclipse Project] Platform
Reporter: David Williams <david_williams>
Component: Releng
Assignee: David Williams <david_williams>
Status: VERIFIED FIXED
QA Contact:
Severity: major
Priority: P2
CC: daniel_megert
Version: 4.2
Target Milestone: 4.3 M3
Hardware: PC
OS: Linux
Whiteboard:

Description David Williams CLA 2012-09-07 08:21:39 EDT
Specific case: our M20120905-2300 build (RC3 candidate) shows only the Windows tests summarized and linked on its page. But, looking at the Hudson logs, there was a Linux test run that completed normally at

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-JUnit-Linux2/230/

I think the main cause is that part of our logic, at the end of a build, asks Hudson for the "next build number" and uses that to figure out which job to watch and where to put the results once we get them back.

But, if there are several jobs in the queue, my guess is that "next build number" reports the same number for each of them (since it is read before each one is added to the queue). So our results from M20120905-2300 might actually have ended up on the M20120905-1640 page?
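To illustrate the suspected race, here is a toy Python model. It assumes, as guessed above, that the "next build number" only advances when a queued run actually starts; `ToyHudsonJob` and its methods are hypothetical stand-ins, not the real Hudson API. Two builds that trigger the test job while it is busy both record the same number, so one build ends up watching the other's run.

```python
from __future__ import annotations


class ToyHudsonJob:
    """Hypothetical model of one Hudson test job; not the real Hudson API."""

    def __init__(self, next_build_number: int) -> None:
        self.next_build_number = next_build_number
        self.queue: list[str] = []

    def enqueue(self, build_id: str) -> None:
        # Assumption: queuing a request does NOT consume a build number.
        self.queue.append(build_id)

    def run_queue(self) -> dict[int, str]:
        # Each queued request gets the next number, in order, as it starts.
        started: dict[int, str] = {}
        for build_id in self.queue:
            started[self.next_build_number] = build_id
            self.next_build_number += 1
        self.queue.clear()
        return started


job = ToyHudsonJob(next_build_number=229)
guesses: dict[str, int] = {}
for build_id in ["M20120905-1640", "M20120905-2300"]:
    # Old logic: remember "next build number" at trigger time.
    guesses[build_id] = job.next_build_number
    job.enqueue(build_id)

actual = job.run_queue()
print(guesses)  # {'M20120905-1640': 229, 'M20120905-2300': 229}
print(actual)   # {229: 'M20120905-1640', 230: 'M20120905-2300'}
```

Both builds "watch" run 229, while the M20120905-2300 results actually land in run 230, which nothing is watching.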

So ... we need to do something to improve this logic and "flow". 

This isn't an issue when there are no jobs in the queue, only when we get busy.

Just wanted to capture the issue for now.
Comment 1 David Williams CLA 2012-09-07 08:23:00 EDT
FWIW, I can probably "manually" get the results back to M20120905-2300 ... but not until this afternoon, since I have several appointments and meetings until then.
Comment 2 David Williams CLA 2012-10-22 19:01:18 EDT
I have completed the initial design and code changes to fix this problem. 

Will try it in tonight's (10/22) nightly build.

In short, it now relies on a cron job (running under a committer ID) to "check for data" and, if any is found, to "get the zipped artifacts" from Hudson.

The key is that "the data" is written when the test job completes, so there is perfect knowledge of which Hudson job (name and number) goes with which build (eclipseStream and buildId). For example, the data written to a "testdata..." file might be

ep4-unit-lin64 295 N20121021-0230 4.3.0

With those 4 pieces of data, we can retrieve the results and then do whatever we were doing before. The initial, root problem, after all, was that we knew 3 of those pieces of data but did not know the build number for sure.
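As a sketch of how one such record might be read back, here is a small Python parser. It assumes the fields are whitespace-separated and in the order of the example above (Hudson job name, job number, buildId, eclipseStream). `JobDataRecord`, `parse_job_data`, and `job_url` are hypothetical names, and the URL pattern `<base>/job/<name>/<number>/` is an assumption modeled on the Hudson link in the description.

```python
from dataclasses import dataclass

# Assumed base URL, modeled on the Hudson link in the description.
HUDSON_BASE = "https://hudson.eclipse.org/hudson"


@dataclass(frozen=True)
class JobDataRecord:
    """Hypothetical holder for the four pieces of data in a record."""
    job_name: str        # Hudson job name, e.g. ep4-unit-lin64
    job_number: int      # exact Hudson build number of the test run
    build_id: str        # Eclipse build, e.g. N20121021-0230
    eclipse_stream: str  # e.g. 4.3.0

    def job_url(self) -> str:
        # Assumption: Hudson serves each run at <base>/job/<name>/<number>/.
        return f"{HUDSON_BASE}/job/{self.job_name}/{self.job_number}/"


def parse_job_data(line: str) -> JobDataRecord:
    """Parse one record, assuming whitespace-separated fields in the example's order."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    job_name, job_number, build_id, eclipse_stream = fields
    return JobDataRecord(job_name, int(job_number), build_id, eclipse_stream)


record = parse_job_data("ep4-unit-lin64 295 N20121021-0230 4.3.0")
print(record.build_id, record.job_url())
# N20121021-0230 https://hudson.eclipse.org/hudson/job/ep4-unit-lin64/295/
```

Because the job number is recorded by the test job itself, no guessing is involved: the record says exactly which run belongs to which build.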

In terms of mechanics, the new design adds a final build step that triggers a downstream job, "ep-collectResults". It is a very simple job: it is passed the data and simply writes it to /shared/eclipse/sdk/testjobdata in files named "testjobdata<timestamp>.txt", such as testjobdata201210220658.txt, each containing only those 4 pieces of data.
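The write side could look roughly like the following Python sketch of what ep-collectResults does with the data it is passed. `write_job_data` is a hypothetical helper; the `yyyyMMddHHmm` timestamp format is inferred from the example name testjobdata201210220658.txt, and a temporary directory stands in for /shared/eclipse/sdk/testjobdata so the example runs anywhere.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def write_job_data(data_dir: Path, job_name: str, job_number: int,
                   build_id: str, eclipse_stream: str,
                   now: Optional[datetime] = None) -> Path:
    """Write one record to testjobdata<timestamp>.txt (hypothetical helper)."""
    # yyyyMMddHHmm, matching the example name testjobdata201210220658.txt
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M")
    path = data_dir / f"testjobdata{stamp}.txt"
    # Mode "x" raises FileExistsError rather than silently overwriting a
    # record another job wrote in the same minute.
    with path.open("x", encoding="utf-8") as f:
        f.write(f"{job_name} {job_number} {build_id} {eclipse_stream}\n")
    return path


# Stand-in for /shared/eclipse/sdk/testjobdata.
with tempfile.TemporaryDirectory() as tmp:
    written = write_job_data(Path(tmp), "ep4-unit-lin64", 295,
                             "N20121021-0230", "4.3.0",
                             now=datetime(2012, 10, 22, 6, 58))
    path_name = written.name
    content = written.read_text(encoding="utf-8")

print(path_name)  # testjobdata201210220658.txt
```

Minute resolution means two test jobs finishing in the same minute would collide on the file name; the exclusive-create mode at least makes that failure loud instead of losing a record.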

It was done this way to have "the data" in one central place. Strictly speaking, it is not good practice to write to /shared from Hudson ... at one point it was "disallowed" for a short time ... but the practice seems pretty well entrenched and unlikely to change. If it does change, then "the data" will at least be (or could be) at one URL, and a cron job could essentially fetch it from there instead of from the file system ... though that would be wasteful and/or harder to do with "JSON APIs".

So, the whole thing is as simple as keeping track of the exact data, instead of relying on a "next build number" that might no longer be accurate by the time the job actually runs.

All the associated scripts and tasks are still overly convoluted (so there is some chance I've missed something), but I'm going to mark this as fixed and keep an eye on tonight's nightly. The cron job has been running for a while, just detecting and reading the data files and echoing what it reads without invoking "the work" script, but those scripts have been used and tested independently a fair amount.

This should be a much more efficient design, where the only thing looping is the normal Linux cron job (which is what cron is made to do), and our code simply reacts to any changes it detects (or else does nothing).
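The "react to changes, or else do nothing" pass that cron runs could be sketched in Python as below. Everything here is a hypothetical illustration rather than the actual releng scripts: `process_pending` is an invented name, `handle` stands in for "the work" script, and moving handled files into a `processed` subdirectory is one assumed way to make sure each record is acted on only once. The `dry_run` flag mirrors the trial period described above, where the job only echoed what it read.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def process_pending(data_dir: Path, handle: Callable[[List[str]], None],
                    dry_run: bool = False) -> List[str]:
    """One cron-triggered pass: act on any new data files, else do nothing."""
    done_dir = data_dir / "processed"  # hypothetical archive location
    handled: List[str] = []
    # Timestamped names sort chronologically, so oldest records go first.
    for path in sorted(data_dir.glob("testjobdata*.txt")):
        fields = path.read_text(encoding="utf-8").split()
        if len(fields) != 4:
            print(f"skipping malformed {path.name}: {fields}")
            continue
        if dry_run:
            # Trial mode: only echo what was read.
            print(f"would collect {fields}")
            continue
        handle(fields)  # stand-in for "the work" script
        done_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(done_dir / path.name))
        handled.append(path.name)
    return handled


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "testjobdata201210220658.txt").write_text(
        "ep4-unit-lin64 295 N20121021-0230 4.3.0\n", encoding="utf-8")
    seen: List[List[str]] = []
    first = process_pending(d, seen.append)
    second = process_pending(d, seen.append)  # nothing new: does nothing
```

Because the glob is not recursive, archived files are never picked up again, so a later pass with no new data does no work at all.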
Comment 3 David Williams CLA 2012-10-22 19:52:58 EDT
To correct some terminology (now that I am setting up the maintenance build test jobs): it is not actually a "final build step" that was added, but use of "Trigger parameterized build", which required a plugin of the same name to be installed on the Hudson instance.
Comment 4 David Williams CLA 2013-05-30 16:44:37 EDT
Mass change to 'verified', as these bugs are either routine or obviously fixed build breaks.