Bug 293830

Summary: reduce the number of platforms in the build
Product: [Eclipse Project] Platform Reporter: Kim Moir <kim.moir>
Component: Releng Assignee: Kim Moir <kim.moir>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: cocoakevin, daniel_megert, denis.roy, donald.smith, d_a_carver, eclipse.felipe, grant_gayed, irbull, john.arthorne, Michael_Rennie, mik.kersten, Mike_Wilson, mober.at+eclipse, Olivier_Thomann, pwebster, remy.suen, Silenio_Quarti, stefan.hausmann, susan, tjwatson
Version: 3.6   
Target Milestone: 3.6 M6   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on: 294378    
Bug Blocks: 298901    
Attachments:
Description Flags
patch to remove wpf platforms from build none

Description Kim Moir CLA 2009-10-30 15:54:39 EDT
These are the platforms that we build today. We are also adding linux.gtk.ppc64 to 3.6M4.

Windows XP
Windows (x86_64)
Windows (x86/WPF) **early access**
Linux (x86/GTK 2) 
Linux (x86/Motif) **testing only**	
Linux (x86_64/GTK 2)
Linux (PPC/GTK 2) 
Linux (S390/GTK 2) **early access** 
Linux (S390x/GTK 2) **early access**
Solaris 10 (SPARC/GTK 2)	
Solaris 10 (x86/GTK 2)	
HP-UX (IA64_32/Motif)	
AIX (PPC/Motif)	
Mac OSX (Mac/Cocoa) 
Mac OSX (Mac/Cocoa/x86_64)
Mac OSX (Mac/Carbon) 

I'd like to remove some of them to speed up the build and reduce the amount of data that we ship to eclipse.org and the mirrors.

Here are my initial candidates

Windows (x86/WPF) **early access**
Linux (S390/GTK 2) **early access** 
Linux (S390x/GTK 2) **early access**
Linux (x86/Motif) **testing only**	
Solaris 10 (SPARC/GTK 2)
Comment 1 John Arthorne CLA 2009-10-30 19:37:27 EDT
Solaris/SPARC 10 is a reference platform so I think we should still be building that. We might be ok with removing Linux PPC 32 that we build today once we have Linux PPC 64 (maybe with some small overlap). This is what we did on the Helios plan (removed PPC-32 and added PPC-64).

To clarify, you are just referring to stopping producing the zips of these platforms? I assume they would still be built and in our repository - and someone could produce an install for such a platform using the p2 director if desired? We could perhaps just provide the script to do that.
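
A minimal sketch of what such a script might look like, assuming the standard p2 director application and its command-line arguments. The repository URL, the destination path, and the choice of `org.eclipse.sdk.ide` as the unit to install are illustrative assumptions, not values taken from this bug. The function only assembles the command line; it does not run anything.

```python
def director_command(eclipse, repository, iu, destination, os_name, ws, arch):
    """Build the argument list for one p2 director install.

    Every value passed in by the caller below is a hypothetical example.
    """
    return [
        eclipse,
        "-nosplash",
        "-application", "org.eclipse.equinox.p2.director",
        "-repository", repository,
        "-installIU", iu,
        "-destination", destination,
        "-profile", "SDKProfile",
        # Target platform of the install, independent of the host platform.
        "-p2.os", os_name,
        "-p2.ws", ws,
        "-p2.arch", arch,
    ]


if __name__ == "__main__":
    cmd = director_command(
        eclipse="./eclipse",
        repository="http://example.org/eclipse/updates/3.6",  # placeholder URL
        iu="org.eclipse.sdk.ide",                              # assumed IU id
        destination="/tmp/eclipse-linux-gtk-s390",             # placeholder path
        os_name="linux",
        ws="gtk",
        arch="s390",
    )
    print(" ".join(cmd))
```

Someone who needs one of the dropped platforms could pass the resulting list to `subprocess.run` against a build repository, changing only the os/ws/arch triple.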
Comment 2 Kim Moir CLA 2009-10-31 13:29:33 EDT
Yes, I'd like to reduce the number of zips in the build.  It takes a long time to build each one and zip them up.  Also, they consume a lot of space and many of them have small download numbers.  Yes, the bundles and metadata would still be in the repository; running a script to build them seems like a good idea.
Comment 3 Martin Oberhuber CLA 2009-11-04 10:37:32 EST
I think that rather than stripping off certain host platforms, we could save way more by producing zipped p2 repositories for download, rather than individual downloadable ZIP archives.

The ZIP archives for various platforms duplicate a huge amount of data, whereas in a zipped repository every item is included only once. Many other Eclipse projects have migrated to zipped repositories already.

While zipped repositories could help reduce the disk space needed, the need for processing power and network bandwidth could be reduced drastically by not signing every build. Especially for N-builds, I do not really see why each and every one needs to be signed. If I were to vote on where I get more for my time and money, I'd rather have more reliable builds and more performance test runs than signed N-builds.
Comment 4 Kim Moir CLA 2009-11-04 10:56:21 EST
Martin, we do have individual zipped p2 repositories for download.  The only runnable bits available are the SDKs, platform zips, p2 agent etc.  I agree that signing takes a long time.  The actual signing process takes about 50 minutes to complete, and longer if there are others ahead of us in the queue at the foundation.  When we first started signing builds, I asked if all builds should be signed.  The arch council decided to sign every build because we didn't want to wait a week to find a bug related to signing in an integration build.  I can bring this up at the arch meeting again if you would like to disable signing on nightly builds.
Comment 5 Martin Oberhuber CLA 2009-11-04 11:10:46 EST
Signing technology at Eclipse was young when we started this, and I have been involved in a few issues related to signing (with nested bundles etc) in the beginning. But that's two years ago now, and I cannot remember any signing related bug ever since. At the moment, I believe that signing is stable enough to allow doing it weekly / on I-builds only. Especially given that there's still enough time until the endgame.
Comment 6 Kim Moir CLA 2009-11-04 11:35:56 EST
Disabling signing will save a lot more time than removing any platforms from the build.  If the PMC agrees to this, I would be happy to remove it from the N-builds; it will literally take 2 min to do and save 50 :-)

One of the suggestions in the arch call was to only create the three platforms we test for the nightly build. Silenio has indicated that he also needs the 64 bit platforms to test.  This won't really save much time and will also require a significant build script refactoring.   I run the same scripts for nightly and integration builds.  Anyways, it might just be a better use of our time to disable signing for N-builds and remove the zips that we don't use or that don't have active committers, such as the s390 builds etc.
Comment 7 John Arthorne CLA 2009-11-04 11:59:27 EST
We could skip signing on the N-builds but I think the more important issue is the time for the I-builds. During milestone weeks, or when rebuilds of the weekly I-build are required, the turnaround time on the builds is too long.

I was thinking an easier alternative for end users would be to reduce the number of platforms that we produce Eclipse SDK and Platform SDK for, but continue producing the Platform runtime binary on all platforms. That way users on those platforms can simply download the platform, and then install any remaining pieces they need via the platform update UI. The platform runtime binaries are much smaller and presumably faster to build.
Comment 8 Martin Oberhuber CLA 2009-11-04 12:45:53 EST
Quite frankly, I don't understand what makes any of the ZIPs faster or slower to build. First of all, the entire repository has to be built anyways, for ALL platforms. After that, the repository is signed (or not), and then all that happens is slicing the repository into ZIP archives. 

I can see how the slicing process takes disk space, but what is it that's taking time in the slicing process?
Comment 9 Martin Oberhuber CLA 2009-11-04 12:47:57 EST
PPS another option that might TREMENDOUSLY speed up build time is reducing network traffic for initial CVS checkout.

If the builds are still running outside Eclipse.org, checking out the entire Eclipse repository takes considerable time. This could be reduced by either running the build on Eclipse.org, or by doing incremental data transfer. Having a local git clone and doing "git pull" would be one option to optimize that.
Comment 10 John Arthorne CLA 2009-11-04 12:57:07 EST
(In reply to comment #8)
> I can see how the slicing process takes disk space, but what is it that's
> taking time in the slicing process?

Producing the zips isn't just a matter of slicing. It has to perform an install (run the p2 director) for each of those zips. The repository is entirely in compressed form, but the zips are in "runnable" form, so various plugins need to be extracted and other configuration steps need to be performed. I don't know if Kim has exact numbers but I recall this zip production step is a very time consuming part of the build (40+ of these install operations), and a higher cost than the actual transfer of bytes to/from eclipse.org.
Comment 11 Kim Moir CLA 2009-11-04 13:56:04 EST
I'd love to invoke the actual build on build.eclipse.org.  You can do this today for test builds.  See bug 247332 comment 14.  Unfortunately, until I get the tests running in the cloud I can't run the real build on eclipse.org (See bug 247320).  The Eclipse Foundation doesn't have any platform specific slave test servers associated with their Hudson instance.  I need to test VMware images of Windows, Linux and Mac as Hudson slaves and have a persistent repo for our bundles. Amazon EC2 has pretty slow upload speeds so it doesn't make sense to copy our bundles there every build.  In addition, I'll have to modify the build to run more tests in parallel so the test results are available more quickly. Also, I'll have to figure out who will actually pay for this service because right now I'm paying for my testing with my credit card :-) My testing is in the early stages for this change.

I looked at the numbers for last night's build
CVS checkout and fetching Orbit bundles - 20 minutes
signing  - 1 hour 14 minutes - The signing queue is a black box so I can't tell what other builds were ahead of us last night.
director calls - 20 minutes
slicing and zipping p2 repos (for JDT, PDE, RCP etc zips) - 4 minutes
zipping up SDKs and platform zips - 30 minutes
Running tests - 6 hours 40 minutes

Of course, there are other steps such as compiling source etc. that take time.
Note that the director calls and all the zipping happens using ant parallel tasks to save time, and compilation is also run in a parallel mode.
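
To make the relative weight of each step easier to see, here is a small sketch that tabulates the numbers above. It assumes the listed steps can simply be added up, which overstates wall-clock time wherever steps overlap, and it leaves out the steps that were not timed (compilation and so on).

```python
# Step durations in minutes, copied from the list above.
STEPS = {
    "CVS checkout and fetching Orbit bundles": 20,
    "signing": 74,
    "director calls": 20,
    "slicing and zipping p2 repos": 4,
    "zipping up SDKs and platform zips": 30,
    "running tests": 400,
}


def total_minutes():
    """Sum of all measured steps, assuming no overlap between them."""
    return sum(STEPS.values())


def minutes_for(*names):
    """Combined minutes for a chosen subset of steps."""
    return sum(STEPS[name] for name in names)


def share(name):
    """Percentage of the measured total spent in one step."""
    return 100.0 * STEPS[name] / total_minutes()


if __name__ == "__main__":
    for name, minutes in sorted(STEPS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<42} {minutes:>4} min  {share(name):5.1f}%")
    zips = minutes_for("director calls", "zipping up SDKs and platform zips")
    print(f"producing runnable zips: {zips} min of {total_minutes()} min measured")
```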
Comment 12 Martin Oberhuber CLA 2009-11-04 15:16:15 EST
Wow. Those numbers are very interesting. Assuming that nobody wants to download / consume a build before the unit tests are done, it looks like the 6:40 hours of test time has the biggest potential for improvement.

Perhaps tests could be separated into two classes: a fast set of sanity tests to declare the build good, plus a slower set of additional tests that are allowed to continue running after the I-build has been declared good? S-builds and R-builds would only be declared after running all tests, of course, but from what John said I understood that making I-builds available faster should be an initial goal. 
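
A rough sketch of that split, assuming per-suite durations are already known from an earlier run. The suite names, the durations, and the two-minute cutoff are invented for illustration.

```python
def split_tiers(durations, cutoff_seconds):
    """Partition suites into (sanity, extended) by their last known duration.

    `durations` maps suite name to seconds. Suites at or under the cutoff
    gate the build; the rest may keep running after the build is declared.
    """
    sanity, extended = [], []
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1]):
        if seconds <= cutoff_seconds:
            sanity.append(name)
        else:
            extended.append(name)
    return sanity, extended


if __name__ == "__main__":
    # Hypothetical suites and timings, not measurements from the real build.
    last_run = {"smoke": 30, "core": 90, "ui": 1200, "perf": 3600}
    sanity, extended = split_tiers(last_run, cutoff_seconds=120)
    print("gate the I-build on:", sanity)
    print("run afterwards:     ", extended)
```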

For that goal, running tests on unsigned build while the signing is still proceeding might be another option, but that's a slippery path.
Comment 13 David Carver CLA 2009-11-04 15:37:53 EST
(In reply to comment #12)
> Wow. Those numbers are very interesting. Assuming that nobody wants to
> download / consume a build before the unit tests are done, it looks like 6:40
> hours test time has the biggest potential for improvement.
> 
> Perhaps tests could be separated into two classes. A fast set of sanity tests
> to declare the build good, plus a slower set of additional tests that are
> allowed to continue running after the I-build has been declared good? S-builds
> and R-builds would only be declared after running all tests, of course, but
> from what John said I understood that making I-builds available faster should
> be an initial goal. 
> 
> For that goal, running tests on unsigned build while the signing is still
> proceeding might be another option, but that's a slippery path.

Martin, what you suggest is pretty standard practice.  Tests tend to get broken down into various groups and run at different stages of builds.  Nightly/CI builds may run a certain set of critical tests, and integration builds may run another set that tests the integration amongst components.  There may also be a set of tests that is run at release time.

Tests can take a long time to run, and breaking the long running build into smaller component based builds can be of great benefit.  Yes, it's a bit more work for the poor build engineer, but this is why I think the team needs to be responsible for their builds rather than a single person.

The component based builds can run their specific tests, and feed into the chain of builds farther down the line until you get to the complete integration build which is assembling everything.

Breaking your build into smaller stages can give developers feedback sooner, allowing them to respond to issues sooner.  12 to 14 hours is way too long for a developer to wait for feedback that some unit test is failing in their particular component.


So my suggestion would be to look at ways to break up your build into those pieces that are necessary for the type of build being done.  Not everything has to be done the same for every build, as each build suits a different purpose.

Also, if you can deploy your tests to multiple machines, run them in parallel; it'll save you time.

The other thing to consider is whether the tests are written efficiently and are testing the correct things.  Are the test setups and tear downs overly complicated?  Why are the tests taking so long to run?  I know the platform runs a lot of tests, but as an example, I have about 8000 tests for wst.xsl that run in under a minute.   I've seen test suites in eclipse with far fewer tests take 10 times as long.  I'd look at which test suites are the longest running and see if you can determine why they are taking so long.
Comment 14 Kim Moir CLA 2009-11-04 16:10:53 EST
Martin, the build is available for download as soon as it's zipped up.  The JUnit tests are invoked as soon as all the zips needed for testing are available.  The build page is updated again when the test results are available.

Running tests on an unsigned build that isn't the actual set of binaries that will be released seems dicey.

Setup for Eclipse tests:
We unzip a new eclipse
We install the test bundles
We run the tests
Clean up
repeat

This ensures that the previous test run doesn't pollute the next one.
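
The cycle above can be sketched as follows. This is only an illustration of the isolation idea: a fresh temporary directory stands in for "unzip a new eclipse", writing a placeholder file stands in for installing the test bundles, and the `run` callback is a hypothetical hook for launching the tests.

```python
import shutil
import tempfile
from pathlib import Path


def run_suite_in_fresh_install(suite, run):
    """Run one suite in a brand-new directory and always remove it afterwards."""
    # Stand-in for "unzip a new eclipse".
    install_dir = Path(tempfile.mkdtemp(prefix="eclipse-test-"))
    try:
        # Stand-in for "install the test bundles".
        plugins = install_dir / "plugins"
        plugins.mkdir()
        (plugins / f"{suite}.jar").write_text("placeholder")
        # "Run the tests": delegated to the caller-supplied hook.
        return run(suite, install_dir)
    finally:
        # "Clean up", even if the run raised an exception.
        shutil.rmtree(install_dir, ignore_errors=True)


def run_all(suites, run):
    """Repeat the cycle for every suite, one fresh install each."""
    return {suite: run_suite_in_fresh_install(suite, run) for suite in suites}


if __name__ == "__main__":
    def list_plugins(suite, install_dir):
        # Hypothetical runner: report what the suite can see in its install.
        return sorted(p.name for p in (install_dir / "plugins").iterdir())

    print(run_all(["suite.a", "suite.b"], list_plugins))
```

Because every suite gets its own directory, nothing a suite leaves behind is visible to the next one, at the cost of repeating the setup each time.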

Again, chaining smaller builds on hudson at build.eclipse.org means I need test hardware there.  I don't have it.  So I have to test hardware in the cloud.

The easy solution to slow builds is: Double the number of test machines.  Presto: Build times drop by three hours.

It's not magic.  I just need hardware.
Comment 15 John Arthorne CLA 2009-11-04 16:46:14 EST
To return to the original topic, from the timing data in comment #11 it looks like there are 50 minutes taken to produce the zips, so there is an opportunity to reduce the build time somewhat by not producing some of the esoteric zips that almost nobody downloads anyway.
Comment 16 Denis Roy CLA 2009-11-05 13:13:47 EST
> signing  - 1 hour 14 minutes - The signing queue is a black box so I can't
> tell what other builds were ahead of us last night.

FWIW, there is no signing queue anymore.  Separate processes are spawned once a signing request is made, so simultaneous signing actions are performed in parallel.  See bug 220037.
Comment 17 Denis Roy CLA 2009-11-05 13:15:12 EST
> Again, chaining smaller builds on hudson at build.eclipse.org means I need test
> hardware there.  I don't have it.  So I have to test hardware in the cloud.

What kind of hardware do you need?  Please entertain me.
Comment 18 Kim Moir CLA 2009-11-05 14:00:42 EST
Thanks, I didn't know that about the queues. I thought there were still just two queues.  

This isn't really the right bug for this discussion. But anyways, today we have 

JUNIT
2 windows
2 linux
1 mac
1 cvs test machine (linux) Could reside on the same machine as the database.

Performance
2 windows
2 linux
1 performance database machine (linux)

This discussion evolved to "I need hardware" because people were complaining about how long our builds take and that our build wasn't modular enough.  The solutions to this are:
1) Modularity: Run chained builds on Hudson on build.eclipse.org
2) Faster test results: More test machines

In order to run our builds on Hudson, we need to have machines accessible via the hudson install at eclipse.org.  The existing machines are inside an IBM firewall and aren't mine to give away :-) I need to find a mechanism to run tests from Hudson on build.eclipse.org.  

Today, each test run takes between 6-8 hours per machine.  This is too long.  We have 54,000 tests.  That's why I'm testing cloud computing.  More machines = faster test results.
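
As a back-of-the-envelope illustration of "more machines = faster test results", the sketch below spreads a set of suites over N machines with a simple longest-first greedy assignment and reports when the busiest machine finishes. The suite durations are made up, and the model ignores setup time and platform-specific suites.

```python
import heapq


def makespan(durations, machines):
    """Finish time of the busiest machine under longest-first greedy assignment."""
    if machines < 1:
        raise ValueError("need at least one machine")
    loads = [0] * machines  # a list of equal values is already a valid heap
    for duration in sorted(durations, reverse=True):
        # Give the next-longest suite to the machine with the least work so far.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


if __name__ == "__main__":
    # Hypothetical suite durations in minutes (they happen to total 400).
    suites = [120, 90, 60, 60, 40, 30]
    for n in (1, 2, 4):
        print(f"{n} machine(s): {makespan(suites, n)} min")
```

In this model the gain levels off once the longest single suite dominates, so very long suites would need to be split before extra machines help further.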

The build user also needs to control the display on all the machines because we run UI tests where there can't be any other UI events occurring to interfere with the tests. So the uid that the tests run as is also the uid of the person logged into X. The machines also need a mechanism to be managed remotely so they can be rebooted etc.
Comment 19 Dani Megert CLA 2009-11-11 09:34:32 EST
Completely agree with John's comments:
- we should get back to the original topic and discuss whether we can leave out 
  some of the esoteric builds
- N-builds aren't the issue for me either - it's the I-builds during the ramp 
  downs
- not signing N-builds is OK for me if that helps anyone in any way

Separating the tests into different categories sounds nice but it is definitely not something we can afford to invest our resources into at this point.
Comment 20 Kim Moir CLA 2009-11-11 17:08:10 EST
Disabling signing on N-builds will make the test results available about an hour earlier.  I've enabled this for the upcoming builds.  

With respect to I-builds, I've opened bug 294919 to make this process more efficient.
Comment 21 Kim Moir CLA 2010-01-25 14:11:51 EST
Any news from the PMC if the following platforms can be removed?

Linux (x86/Motif) **testing only**    
Windows (x86/WPF) **early access**

If not, I'll close this bug.
Comment 22 John Arthorne CLA 2010-01-27 10:45:25 EST
(In reply to comment #21)
> Any news from the PMC if the following platforms can be removed?
> Linux (x86/Motif) **testing only**    
> Windows (x86/WPF) **early access**

I just had a discussion with both the PMC and Silenio about this. It turns out Linux x86/Motif is still used regularly by the SWT team for testing. It is the easiest way for them to test/develop Motif since they have many more Linux machines than Solaris or AIX machines. They would like to keep this build around for their testing.

I have heard no support for WPF from anyone. This build has been fairly broken for a long time, so until someone steps up to develop and support it I think we can remove it (in 3.6 stream). I will send a message to eclipse-dev about this so there is some advance warning before we remove it.
Comment 23 Kim Moir CLA 2010-01-27 15:20:39 EST
Created attachment 157461 [details]
patch to remove wpf platforms from build
Comment 24 Kim Moir CLA 2010-02-01 11:51:35 EST
WPF removed for N20100201-2000 and tagged for I-build.
Comment 25 Stefan Hausmann CLA 2010-03-22 07:18:05 EDT
WPF will be the future of programming applications in the Microsoft Windows world.
Well, Win32-API will be supported by Microsoft as long as possible. But there will be features only available with WPF. Maybe not today but for sure in the future.