Bug 498077 - performance tests have not been running for a while.
Summary: performance tests have not been running for a while.
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.7
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 4.7 M1
Assignee: David Williams CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-18 13:11 EDT by David Williams CLA
Modified: 2016-07-27 16:55 EDT

See Also:


Description David Williams CLA 2016-07-18 13:11:45 EDT
They fail with a "timeout", I think while trying to install the derby feature/bundles. 

This *might* be due to some proxy/IP address changes made recently? 

The "build" machine, where the db server is running, is 172.25.25.57. 

The nonProxiedHosts we use in test machines (for use in workbench) says

nonProxiedHosts=*.eclipse.org|172.30.206.*. 

Perhaps this should now be 

nonProxiedHosts=*.eclipse.org|172.30.206.*|172.25.25.*

Seems worth a try anyway.
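
To see what the proposed change would do, the matching can be approximated with a small shell sketch. It assumes the setting behaves like a '|'-separated list of glob patterns (the real proxy code may match differently), and the function name host_bypasses_proxy is made up for illustration.

```shell
# Hypothetical helper: succeeds if host ($1) matches any '|'-separated
# glob pattern in the list ($2). Runs in a subshell so that the IFS and
# 'set -f' changes do not leak into the calling script.
host_bypasses_proxy() (
  host=$1
  set -f        # no pathname expansion while splitting the pattern list
  IFS='|'
  for pattern in $2; do
    case $host in
      $pattern) exit 0 ;;   # unquoted, so it is treated as a glob pattern
    esac
  done
  exit 1
)

db_host=172.25.25.57
for list in '*.eclipse.org|172.30.206.*' '*.eclipse.org|172.30.206.*|172.25.25.*'; do
  if host_bypasses_proxy "$db_host" "$list"; then
    echo "$list: $db_host bypasses the proxy"
  else
    echo "$list: $db_host goes through the proxy"
  fi
done
```

With the current list the db server address would go through the proxy; with the extended list it would not.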
Comment 1 David Williams CLA 2016-07-18 13:18:56 EDT
http://git.eclipse.org/c/platform/eclipse.platform.releng.aggregator.git/commit/?id=77b7d8f882cb3e1d90632d72be743da3cbe8e051

I will be optimistic and say "fixed", but other changes may be needed. 

For example, slightly above the code that is "timing out" is a comment that says

    <property
      name="perfrepoLocation"
      value="http://build.eclipse.org/eclipse/buildtools" />
    <!-- use 'file://' if problems with proxies
    <property
      name="perfrepoLocation"
      value="file:///shared/eclipse/buildtools/" />
    -->

We can try that next, if adding the IP address doesn't work.
Comment 2 David Williams CLA 2016-07-18 23:15:39 EDT
My changes didn't make any difference. 

It might be hanging when it tries to run the first test. 

And might be because it can not find the database server. 

In the main Hudson job, there is a phrase that says 


if [[ -n "${eclipse_perf_dbloc}" ]]
then
  PERF_DBLOC_ARG="-Declipse.perf.dbloc=${eclipse_perf_dbloc}"
fi

The job later prints "empty value" for PERF_DBLOC_ARG. I would assume that if it is 
empty, it gets set elsewhere, deeper in the scripts, but just in case it is not, I changed it to:  


if [[ -n "${eclipse_perf_dbloc}" ]]
then
  PERF_DBLOC_ARG="-Declipse.perf.dbloc=${eclipse_perf_dbloc}"
else 
  PERF_DBLOC_ARG="-Declipse.perf.dbloc=//172.25.25.57:1527"
fi
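
The same fallback can also be written with a shell default expansion. This is only a sketch of an equivalent form, not what is in the Hudson job; the function name perf_dbloc_arg is made up for illustration.

```shell
# Hypothetical helper: builds the -D argument from $1, falling back to
# the build machine's address when $1 is unset or empty.
perf_dbloc_arg() {
  # ${1:-default} expands to the default if $1 is unset or empty
  printf '%s\n' "-Declipse.perf.dbloc=${1:-//172.25.25.57:1527}"
}

PERF_DBLOC_ARG=$(perf_dbloc_arg "${eclipse_perf_dbloc:-}")
echo "PERF_DBLOC_ARG: ${PERF_DBLOC_ARG}"
```

Either form guarantees the argument is never empty, so a later "empty value" in the log would point at a different variable.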

= = = = = = = = = =


I should note that on my local test server, the "baseline" performance tests work fine -- well, they "run the tests" anyway.
Comment 3 David Williams CLA 2016-07-19 08:45:42 EDT
Another problem is we still list "previous release" as "4.5.2". That should be "4.6.0" now. 

I think that "previousRelease" is used in several tests (such as p2, possibly, API analysis, and performance tests).
Comment 4 David Williams CLA 2016-07-19 09:53:27 EDT
(In reply to David Williams from comment #3)
> Another problem is we still list "previous release" as "4.5.2". That should
> be "4.6.0" now. 
> 
> I think that "previousRelease" is used in several tests (such as p2,
> possibly, API analysis, and performance tests).

I created bug 498137 for "previous release" and made the changes under that bug.
Comment 5 David Williams CLA 2016-07-20 01:29:35 EDT
I am still seeing this in the log: 

!ENTRY org.eclipse.equinox.p2.artifact.repository 4 1000 2016-07-19 21:36:35.605
!MESSAGE No repository found at http://build.eclipse.org/eclipse/buildtools.


I don't think this is a problem with proxies, but instead an infrastructure setup problem where http requests from the performance machine do not go through the normal web server, which maps build machine directories to URLs. 

While the webmasters might fix that eventually, I think a fix more in our control is to try the direct file approach. That is, use 

    <property
      name="perfrepoLocation"
      value="file:///shared/eclipse/buildtools/" />

instead of 

    <property
      name="perfrepoLocation"
      value="http://build.eclipse.org/eclipse/buildtools/" />
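
If the scripts were to pick between the two locations automatically, one possible sketch is to prefer the shared directory whenever it is visible on the machine. That is an assumption about how it could be done, not what the build currently does, and the function name choose_perfrepo is made up for illustration.

```shell
# Hypothetical helper: prints a file:// URL when the local directory ($1)
# exists, otherwise falls back to the http URL ($2).
choose_perfrepo() {
  if [ -d "$1" ]; then
    printf 'file://%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

perfrepoLocation=$(choose_perfrepo /shared/eclipse/buildtools/ \
  http://build.eclipse.org/eclipse/buildtools/)
echo "perfrepoLocation: ${perfrepoLocation}"
```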
Comment 6 David Williams CLA 2016-07-20 09:23:38 EDT
(In reply to David Williams from comment #5)
> I am still seeing in log, 
> 
> !ENTRY org.eclipse.equinox.p2.artifact.repository 4 1000 2016-07-19
> 21:36:35.605
> !MESSAGE No repository found at http://build.eclipse.org/eclipse/buildtools.
> 
> 

We should create a one-time job of something like 

wget --no-proxy http://build.eclipse.org/eclipse/buildtools/p2.index 

If that works, then it is our "proxy problem". 

If not, then it is because from the performance machine "build.eclipse.org" 
does not "resolve" correctly. 

In the meantime, I have tried the "file:///" approach and we'll see if that's any better.
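
A sketch of how that one-time job could interpret its results. It goes slightly beyond the single wget above by also fetching with the default proxy settings, so the two outcomes can be compared; the function name diagnose_fetch is made up for illustration.

```shell
# Hypothetical helper: interprets the exit statuses of two fetch attempts.
#   $1 = status of the fetch with the proxy bypassed (wget --no-proxy)
#   $2 = status of the fetch with the default proxy settings
diagnose_fetch() {
  if [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
    echo "proxy problem"
  elif [ "$1" -ne 0 ]; then
    echo "host not reachable or not resolving"
  else
    echo "no problem detected"
  fi
}

# Intended use on the performance machine (left commented out here):
#   url=http://build.eclipse.org/eclipse/buildtools/p2.index
#   wget --no-proxy -q -O /dev/null "$url"; direct=$?
#   wget -q -O /dev/null "$url"; proxied=$?
#   diagnose_fetch "$direct" "$proxied"
```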
Comment 7 David Williams CLA 2016-07-25 17:53:21 EDT
(In reply to David Williams from comment #6)
> (In reply to David Williams from comment #5)
> > I am still seeing in log, 
> > 
> > !ENTRY org.eclipse.equinox.p2.artifact.repository 4 1000 2016-07-19
> > 21:36:35.605
> > !MESSAGE No repository found at http://build.eclipse.org/eclipse/buildtools.
> > 
> > 
> 
> We should create a one-time job of something like 
> 
> wget --no-proxy http://build.eclipse.org/eclipse/buildtools/p2.index 
> 
> If that works, then it is our "proxy problem". 
> 
> If not, then it is because from the performance machine "build.eclipse.org" 
> does not "resolve" correctly. 
> 
> In the meantime, I have tried the "file:///" approach and we'll see if
> that's any better.

One of the latest, obvious failures I see on the performance test machines says: 
00:18:01  http://build.eclipse.org/eclipse/miscutils/xvfb-run:
00:18:01  2016-07-25 00:18:01 ERROR 503: Service Unavailable.

This does "smell like" "build.eclipse.org" is not accessible from Hudson infrastructure, but a further problem is that we should not be executing that code in the first place. There is some logic that says (basically) 

"if NOT running Hudson then use xvfb-run, else use Xvnc provided by Hudson". 

Nonetheless, I have opened bug 498462 on infrastructure, since we have now seen two cases of something that relied on http://build.eclipse.org/ and worked before, but no longer does.
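
The xvfb-run versus Xvnc decision described above could be sketched roughly as follows. It assumes that a non-empty JOB_NAME is a good enough sign of running under Hudson, which may not be how the real scripts detect it; the function name display_strategy is made up for illustration.

```shell
# Hypothetical helper: $1 is the value of JOB_NAME (empty when not on Hudson).
display_strategy() {
  if [ -n "$1" ]; then
    echo "xvnc"        # Hudson provides the display via its Xvnc option
  else
    echo "xvfb-run"    # outside Hudson, the script has to fetch and use xvfb-run
  fi
}

echo "display strategy: $(display_strategy "${JOB_NAME:-}")"
```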
Comment 8 David Williams CLA 2016-07-25 18:07:38 EDT
(In reply to David Williams from comment #7)

> One of the latest, obvious failures I see on performance tests machines say: 
> 00:18:01  http://build.eclipse.org/eclipse/miscutils/xvfb-run:
> 00:18:01  2016-07-25 00:18:01 ERROR 503: Service Unavailable.
> 
> This does "smell like" "build.eclipse.org" is not accessible from Hudson
> infrastructure, but a further problem is that we should not be executing
> that code in the first place. There is some logic that says (basically) 
> 
> "if NOT running Hudson then use xvfb-run, else use Xvnc provided by Hudson". 
> 
> None the less, I have opened bug 498462 on infrastructure, since we have now
> seen two cases of something that relied on http://build.eclipse.org/  that
> worked before, but no longer does.

I looked at this some more and discovered that the one place where it does work is where I had put the --no-proxy option on wget; the other place, where I had not added it, is the one that fails. So I think this is just a proxy issue. 

Plus, this is NOT the place where I had the pseudo logic mentioned above to check if we were running on Hudson or not. I should add that logic there in the Hudson scripts too.
Comment 9 David Williams CLA 2016-07-26 11:31:42 EDT
Getting closer! 

Adding "--no-proxy" to the wget calls (and using file:/// for the p2 repo) seems to allow each test to run, in the latest "nightly". 

The program that "runs the analysis" failed, though, reporting "no more handles" for swt. That is because the performance analysis program needs a "UI" to create its pretty pictures. 

Therefore in the "collectResults" job, I enabled "use Xvnc" in that Hudson job. (And restarted the I-build performance tests, which did not yet have the --no-proxy added). 

= = = = = = = = = 

One remaining issue may be that the "collectResults" job treats "baseline" jobs and "current build" jobs the same (from a quick glance). 

But, in fact, there should be no analysis done on "just" the baseline jobs. They provide data into the database that is then used when the "current job" is analyzed. 

So, we may have to add a "quick exit" from collect results, if the triggering job ends with "-baseline". I am not sure what will happen if an analysis is attempted on that alone, but I would guess it would fail?
Comment 10 David Williams CLA 2016-07-26 11:53:41 EDT
(In reply to David Williams from comment #9)
> Getting closer! 
> 
> Adding "--no-proxy" to the wget calls (and using file:/// for the p2 repo)
> seems to allow each test to run, in the latest "nightly". 
> 
> The program that "runs the analysis" failed, though, reporting "no more
> handles" for swt. That is because the performance analysis program needs a
> "UI" to create its pretty pictures. 
> 
> Therefore in the "collectResults" job, I enabled "use Xvnc" in that Hudson
> job. (And restarted the I-build performance tests, which did not yet have
> the --no-proxy added). 
> 
> = = = = = = = = = 
> 
> One remaining issue may be that the "collectResults" job treats "baseline"
> jobs and "current build" jobs the same (from a quick glance). 
> 
> But, in fact, there should be no analysis done on "just" the baseline jobs.
> They provide data into database that is then used when the "current job" is
> analyzed. 
> 
> So, may have to add a "quick exit" from collect results, if the triggering
> job ends with "-baseline". I am not sure what will happen if an analysis is
> attempted on that alone, but I would guess it would fail?

I put this logic at the end of the performance jobs, where "collectResults" is called: 

if [[ $JOB_NAME =~ .*-baseline ]] 
then 
   printf "\n\t[INFO] Collect results not called since this job was a \"-baseline\" job."
   exit 0
fi

It is a normal exit, not an error condition. 

I have also *tried* to execute last night's N-build performance job with all steps temporarily disabled, except the last, to see if "collectResults" does any better now that Xvnc has been enabled on the "collectResults" job. Not sure if that "re-execute" will work or not.
Comment 11 David Williams CLA 2016-07-26 12:38:24 EDT
(In reply to David Williams from comment #10)

> if [[ $JOB_NAME =~ .*-baseline ]] 

I should have anchored this regex, but I can never remember whether $ needs to be escaped or not, so I will do that a bit later, after testing some simple cases. 

But, I think it just needs to be 

if [[ $JOB_NAME =~ .*-baseline$ ]]
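
For what it is worth, in bash an unescaped $ at the end of the pattern in [[ ... =~ ... ]] does act as the end anchor. A sketch that sidesteps the escaping question altogether is a glob match with case, which is anchored at both ends by definition. The function name is_baseline_job and the job names below are made up for illustration.

```shell
# Hypothetical helper: succeeds only if the job name ($1) ends with "-baseline".
is_baseline_job() {
  case $1 in
    *-baseline) return 0 ;;   # a case pattern must match the whole string
    *)          return 1 ;;
  esac
}

for job in ep-example-perf-baseline ep-example-perf-baseline-rerun ep-example-perf; do
  if is_baseline_job "$job"; then
    echo "$job: baseline job"
  else
    echo "$job: not a baseline job"
  fi
done
```

Only the first of the three example names is reported as a baseline job; the unanchored regex would also have matched the second.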
Comment 12 David Williams CLA 2016-07-26 17:22:37 EDT
(In reply to David Williams from comment #9)
> Getting closer! 
> 
> Adding "--no-proxy" to the wget calls (and using file:/// for the p2 repo)
> seems to allow each test to run, in the latest "nightly". 
> 
> The program that "runs the analysis" failed, though, reporting "no more
> handles" for swt. That is because the performance analysis program needs a
> "UI" to create its pretty pictures. 
> 
> Therefore in the "collectResults" job, I enabled "use Xvnc" in that Hudson
> job. (And restarted the I-build performance tests, which did not yet have
> the --no-proxy added). 
> 
> = = = = = = =

The Xvnc in ep-collectResults job did the trick: 
https://hudson.eclipse.org/releng/view/Releng/job/ep-collectResults/

The graphical results are now displayed again, at least after rerunning the performance tests from last night's nightly build: 
http://download.eclipse.org/eclipse/downloads/drops4/N20160725-2000/performance/performance.php


> 
> One remaining issue may be that the "collectResults" job treats "baseline"
> jobs and "current build" jobs the same (from a quick glance). 
> 
> But, in fact, there should be no analysis done on "just" the baseline jobs.
> They provide data into database that is then used when the "current job" is
> analyzed. 
> 
> So, may have to add a "quick exit" from collect results, if the triggering
> job ends with "-baseline". I am not sure what will happen if an analysis is
> attempted on that alone, but I would guess it would fail?

But I was wrong about this "baseline" part, I guess. Since the "baseline junit results" are missing from that performance.php page, I guess ep-collectResults has to be run on them first, and the difference about calling the "performance.ui" App is handled elsewhere. 

So, I have removed that check for .*-baseline and we'll see if the I-build does any better. 

Since we did two I-builds today, I will likely cancel test-jobs I see from the first I-build, and just use those from the second I-build. 

But, bottom line, after fixing proxy issues, some hardcoded bits of code, and enabling Xvnc I think we are pretty close to "where we were". 

But even then, I suspect one of the "hard coded" pieces is wrong (and was wrong before), since one of the labels on the "line graphs" says 
R-4.6--4.6-201606061100.
Comment 13 David Williams CLA 2016-07-27 16:55:07 EDT
I'm going to call this "fixed". The performance tests and results appear to be running as they were before -- which is not to say "correctly" -- but they at least work in a similar way. 

The one difference is that the "unit test" results from the "baseline" run are not published (summarized). There is just a blank column, but that is unrelated to these other issues (which were related to the move to Hudson, changing the major version, etc.), so I will handle it separately. I have opened bug 498694 for that.