Bug 312645

Summary: Consider moving to b3 aggregator for Helios
Product: Community    Reporter: David Williams <david_williams>
Component: Cross-Project    Assignee: David Williams <david_williams>
Status: RESOLVED FIXED    QA Contact:
Severity: normal
Priority: P3    CC: filip.hrbek, henrik.lindberg, john.arthorne, karel.brezina, Kenn.Hussey, mknauer, sbouchet, thomas
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows 7   
Whiteboard:
Bug Depends on:    
Bug Blocks: 312656    
Attachments:
Description                                                        Flags
diff on the directories for quick read of how the repos differ      none
another 'diff' as of 5/17                                            none

Description David Williams CLA 2010-05-12 11:18:27 EDT
This would affect only the "back end", the under-the-covers part ... no changes would be required to .build files, or other processes, for Helios. 

The b3 aggregator (batch mode) is the replacement for the old "bucky builder". There has been talk of using it for Helios for months, but it didn't receive much attention (from me) until recently. I've been using and testing it "locally" for a few weeks, and it at least seems feasible to consider. The Buckminster team has been very responsive in fixing bugs, most of which have been about making the "transitional" use of it invisible to Helios contributors. Long term, there is a new contribution file format that we can use next release, but I suspect there is no compelling reason for that large a change this late in the release. From what I've seen locally, we can "swap in" the new b3 aggregator without any changes to .build files for the Helios release. 

But there are a few issues, one small, one big. 

The small one is that the Hudson jobs for our Helios aggregation have to be changed. I'd want to do it in a way that both aggregators could still be used, for at least a few weeks, while we do final evaluations of the new one and its results. So there is always some risk of "breaking things" when I do that. I've been "practicing" on my local machine, and as a result also made general improvements to the scripts ... but, from experience, no matter how perfectly my local tests go, there's likely to be a little "down time" when moving to production. I'd expect a few hours. I plan on adding the new jobs this afternoon (Wednesday, Eastern time) unless there are quick objections. 

The larger issue is that in my local builds, the repositories produced seem to have some small differences. Of course, I've not run them at _exactly_ the same time, so it may simply be that contributions changed from one run to another, but I think the resulting repositories will have to be tested, and the differences explained, before making a final decision. 

There are a few motivations for making this move, even though it's late in the development cycle. First, pragmatically, it means there is only one aggregator to run and maintain in the future. More important to contributors to Helios, it has some new functionality that may be desired, and it does somewhat better error checking (from what I've seen), catching a few problems (when certain options are turned on) that have so far gone undetected by the older builder. 
 
So, I am convinced we should at least run it and take a look at the results. 

The final decision will be made after discussion in this bugzilla. 

The main issue is timing. Ideally we'd have a month or so to evaluate, but also ideally we'd use the new results in our RC1 repo (due to be finished by 5/19, for availability on 5/21). That gives us only one week to evaluate and make the decision. Let me be aggressive in setting the plan, and say we will set 5/18 as the go/no-go date, where "go" means it looks good enough to use for RC1, and "no-go" means more time is needed to evaluate.

Seem reasonable? 

thanks,
Comment 1 David Williams CLA 2010-05-12 23:41:59 EDT
I've set up the hudson jobs to run both or either builder. 
https://build.eclipse.org/hudson/view/Repository%20Aggregation/

I have only the old bucky builder on "automatic" for now; the aggregator takes a "manual" start. 

The results from the last aggregator build are in 
/shared/helios/aggregation

The results from the last buckyBuild are in 
/shared/helios/buildresults

Normally, at least. While setting things up, I accidentally started a buckyBuild and cancelled it, so there are no results for it. 

But, before making the switch, I promoted the buckyBuild results to the normal staging site at  
~/downloads/releases/staging

And ... there is quite a difference: a 1009M repo produced by the bucky builder versus 966M produced by the aggregator. They were based on the same .build files (though, as always, someone might have changed their repo in between runs). I haven't looked at the metadata per se, but I did a diff of the directories, which I'll attach. At a quick glance it appears to be mostly "extra" stuff in the bucky builder repo, but not entirely. 
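For anyone who wants to repeat that kind of comparison, here is a rough Python sketch of a directory diff that reports files only in the old repo, only in the new repo, and files present in both whose contents differ. The function name `compare_repos` and the use of SHA-1 content hashes are illustrative assumptions; the attached diff was made with ordinary `diff`, not with this code.

```python
import hashlib
from pathlib import Path


def _digest(path: Path) -> str:
    """Return the SHA-1 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_repos(old_root: str, new_root: str):
    """Compare two repository trees by relative path and file content.

    Returns (only_in_old, only_in_new, differ), each a sorted list of
    relative POSIX-style paths such as 'plugins/foo_1.0.0.jar'.
    """
    old = {p.relative_to(old_root).as_posix(): p
           for p in Path(old_root).rglob("*") if p.is_file()}
    new = {p.relative_to(new_root).as_posix(): p
           for p in Path(new_root).rglob("*") if p.is_file()}
    only_in_old = sorted(old.keys() - new.keys())
    only_in_new = sorted(new.keys() - old.keys())
    # Same relative path in both trees, but different bytes.
    differ = sorted(k for k in old.keys() & new.keys()
                    if _digest(old[k]) != _digest(new[k]))
    return only_in_old, only_in_new, differ
```

Pointed at, say, the buildresults and aggregation output directories, the three lists correspond roughly to the "Only in ..." and "Files ... differ" lines of a recursive `diff -rq`.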

So .. which is right? Is the aggregator doing a better job? Or missing something?
Comment 2 David Williams CLA 2010-05-12 23:45:43 EDT
Created attachment 168315 [details]
diff on the directories for quick read of how the repos differ

I have, by the way, now set the "--production" flag on both builds, so email should be sent out on failure. I think you can see the other options I specified by looking at the build logs. Let me know if the difference in repos is some setting I've missed. 

Obviously, we need to understand these differences before going much further, or even turning on the aggregator build to be "automatic".
Comment 3 David Williams CLA 2010-05-12 23:50:24 EDT
Oh, and perhaps I should be explicit, as this is reviewed: remember the bucky builder is based on 3.5.2 (and the p2 that comes with it), whereas the b3 aggregator is based on 3.6 M7 ... so it's not literally only the builder that differs, but also the version of p2.
Comment 4 Thomas Hallgren CLA 2010-05-14 06:31:41 EDT
David asked me to produce a list of compelling reasons to move to the b3.aggregator. So here it is:

1. New functionality.
  - The b3.aggregator makes it possible to create hybrid repositories that are
    directly consumable by both p2 and Maven (bug 312656).
  - All currently known platform configurations can be added (bug 312011). Many
    of them are not supported in the old model.
  - Numerous improvements in p2 make the new aggregator both
    faster and less of a memory hog. I'm convinced that this is also the
    reason for the exclusion of a lot of seemingly redundant files when
    the new aggregator is used (investigation is still ongoing).
  - Using the latest version of p2 makes it possible to read and understand
    the next generation repository format (negated requirements in
    particular).
  - Although not yet released, the b3 aggregator is already used in several
    production environments and has received a lot of attention. It is well
    documented [1] and the project is very active.

2. A much improved model.
The old buckyBuilder is based on the Amalgamation build model, designed with the objective of building a product. This was one major reason for creating the new aggregator, which now runs from a model that is tailor-made for repository aggregation. 

At present, a model-to-model transformation layer is inserted that makes it possible to run from exactly the same build files as before, but after the initial release we plan to change this and start using the new format.

The editor for the new format contains a repository browser, which makes it very easy to create contributions. It is also possible to create contributions that include everything without specifying any features, or that specify exclusions rather than inclusions.

3. Moving forward benefits the community.
When trying the new aggregator for Helios, we immediately discovered one major bug in p2 (bug 312175) and drew attention to another (bug 310591). We also discovered a flaw in the platform repository (bug 312181) and revealed some inconsistencies in one of the contributions (bug 312149). I think this is worth mentioning because the challenges of switching really helped bring other problems to light and get them fixed.

[1] http://wiki.eclipse.org/Eclipse_b3/aggregator/manual
Comment 5 Thomas Hallgren CLA 2010-05-14 08:18:17 EDT
Karel Brezina has made a really thorough analysis of the differences between the results from the old and the new aggregator. I include his findings here together with some comments:

> I've done some more analyses of the MDRs with the following results:
>
> - IU with id "a.jre" is missing in the NEW build; however, it's not required by any other IU.
> There exist two IUs with similar ids "a.jre.javase" and "config.a.jre.javase" in the NEW build
>
This is to be expected. The old build was based on the Amalgamation product model and a product was included, hence also a generated 'a.jre' and associated entries.

> - there are 139 other IUs which are missing in the NEW build, but these are IUs with
> older versions. The NEW build always provides a newer version.
>
This is most likely due to a p2 bug. I didn't find the exact bug, but the planner is obviously better at finding an optimal plan where redundant IUs are excluded.
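The "139 IUs missing in NEW, but NEW always has a newer version" classification can be sketched as follows. This assumes the IUs have already been extracted from each repository's content metadata as (id, version) pairs, and it uses OSGi version ordering (numeric major, minor, micro, then a plain string comparison of the qualifier). The function names are hypothetical; this is not how Karel produced his numbers.

```python
from typing import Iterable, List, Tuple

Version = Tuple[int, int, int, str]


def parse_osgi_version(text: str) -> Version:
    """Parse 'major.minor.micro.qualifier' into a tuple that sorts like OSGi.

    Missing numeric segments default to 0 and a missing qualifier to ''.
    """
    parts = text.split(".", 3)
    nums = [int(p) for p in parts[:3]]
    nums += [0] * (3 - len(nums))
    qualifier = parts[3] if len(parts) > 3 else ""
    return (nums[0], nums[1], nums[2], qualifier)


def superseded(old_ius: Iterable[Tuple[str, str]],
               new_ius: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (id, version) pairs found only in OLD for which NEW has a
    strictly newer version of the same id."""
    new_set = set(new_ius)
    newest = {}
    for iu_id, ver in new_set:
        v = parse_osgi_version(ver)
        if iu_id not in newest or v > newest[iu_id]:
            newest[iu_id] = v
    result = []
    for iu_id, ver in old_ius:
        if (iu_id, ver) in new_set:
            continue  # present in both builds
        if iu_id in newest and newest[iu_id] > parse_osgi_version(ver):
            result.append((iu_id, ver))
    return sorted(result)
```

IUs missing in NEW that are not returned by `superseded` (such as 'a.jre') are the ones that need a separate explanation.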

> - there are 4 IUs which are missing in the OLD build
>
> org.eclipse.ecf.core.featurepatch.feature.group    3.3.0.v20100512-0830       missing in OLD 

Wow!
Completely forgot that one. The old aggregator will simply apply the patch! That's really bad. The new aggregator, however, will include the patch IU and all IUs needed both when the patch is applied and when it's not. The details of this are covered in bug 295740.

This is yet another compelling reason to swap.

> org.eclipse.team.svn.ui.capabilities               0.7.9.I20100512-1900        missing in OLD 
> org.eclipse.team.svn.ui.capabilities.feature.group 0.7.9.I20100512-1900        missing in OLD 
> org.eclipse.team.svn.ui.capabilities.feature.jar   0.7.9.I20100512-1900       missing in OLD 
>

All bundles with names ending in '.capabilities', and features named '*.capabilities.feature.group', are skipped by the old builder today. The Subversive team should no longer include this in their contribution unless they really want it.
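A contributor who wants to know which of their IUs the old builder's skip rule would have dropped could use a filter like this hypothetical helper. The first two suffixes follow the rule as described above; '.capabilities.feature.jar' is an assumption added only because the diff also lists that IU as missing in OLD.

```python
# Suffixes for the old builder's '.capabilities' skip rule. The third entry
# is an assumption based on the diff above, not on the builder's source.
SKIPPED_SUFFIXES = (".capabilities",
                    ".capabilities.feature.group",
                    ".capabilities.feature.jar")


def skipped_by_old_builder(iu_id: str) -> bool:
    """Return True if an IU id matches the '.capabilities' skip rule."""
    return iu_id.endswith(SKIPPED_SUFFIXES)
```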

> - there are 18 external bundles which are provided in multiple versions in the NEW
> build. The same versions are in the OLD build; I just wonder if all of them are really
> needed. Some differences are only in version qualifier:
>
This is OK from a p2 standpoint but it is probably good cause for a warning. It means that there are conflicting references to those bundles but since the bundles are not singletons a viable plan can be produced anyway. But it's definitely not good.
Comment 6 David Williams CLA 2010-05-14 09:43:55 EDT
Thanks for the comments and analysis of the differences. This makes me comfortable turning both to "automatic" for the next few days, so we can hopefully produce repositories which are equivalent. I say "hopefully" since even though they will be based on the same set of .build files, I will let the old builder run first, and then the new one, and it is possible someone will change their repository while the jobs are running and give different results for that reason. But it seems worth trying. [And all this will have to wait for the Hudson server to be restarted ... sigh].
Comment 7 David Williams CLA 2010-05-14 09:46:04 EDT
Can you clarify the status of the b3.aggregator project? Is it incubating? What is its release plan, exactly? Next year? Off cycle? 

I'm not sure the old builder was ever released, per se (just a "build time tool"?) but I think that could be another reason to switch?
Comment 8 David Williams CLA 2010-05-14 15:57:55 EDT
Oh, and one thing that wasn't addressed is why some of the jars (with exact name and version number) are "different". Such as .. 

Files aggregation/final/aggregate/plugins/org.eclipse.stp.bpmn_1.2.0.201005050221.jar and /home/data/users/david_williams/downloads/releases/staging/aggregate/plugins/org.eclipse.stp.bpmn_1.2.0.201005050221.jar differ

They all come from the same project, so _maybe_ the project just corrected something and did an update (though it doesn't seem the qualifiers should ever be the same if the jars are different). 

One reason for this might be unconditioned jars that are pack200'd, as recently discussed in bug 312802, where some checksums have been found to be in error. 

But it seems to me that, ideally, the aggregator itself would never "create" a jar (just copy/mirror), thus minimizing opportunities for it to ever change anything itself. 

Were those differences investigated? (Not sure if the differences still exist, in current state of repos).
Comment 9 David Williams CLA 2010-05-16 22:31:08 EDT
FWIW, I opened bug 313052 to cover the svn capabilities feature.
Comment 10 David Williams CLA 2010-05-17 10:27:13 EDT
Created attachment 168735 [details]
another 'diff' as of 5/17

This diff shows the difference in repositories produced as of this morning. There are some contributions that have been removed due to dependency problems, but I thought it instructive to show the latest. 

Note we still get the "jars differ" listing, for stp jars. Given the builder bug described in bug 312802, where jars are re-created from pack.gz files, but their checksum copied, it makes me wonder if the new aggregator has a different algorithm. Could it in fact be copying the jars instead of re-creating them? If so, I think that's the way it should work anyway (bug 312976) but then the option should be renamed to "COPY_BOTH". 
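One way to spot jars affected by the problem in bug 312802 (a jar re-created from its pack.gz while the original checksum was copied) is to recompute each file's MD5 and compare it with the recorded value. This sketch assumes the recorded checksums have already been extracted into a dict of file name to MD5 hex string; p2 artifact metadata of this era typically records them as a `download.md5` property, but pulling them out of the metadata is left out here. `checksum_mismatches` is a hypothetical name.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_mismatches(plugins_dir: str, recorded: Dict[str, str]) -> List[str]:
    """Return names of files whose MD5 differs from the recorded value.

    Recorded names with no matching file on disk are ignored.
    """
    bad = []
    for name, expected in sorted(recorded.items()):
        path = Path(plugins_dir) / name
        if path.is_file() and md5_of(path) != expected.lower():
            bad.append(name)
    return bad
```

Running this over both repositories would show whether the differing stp jars match their recorded checksums in one build but not the other.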

Just wanted to provide the latest comparison data in this diff out file.
Comment 11 Thomas Hallgren CLA 2010-05-17 10:55:39 EDT
(In reply to comment #10)
> ... Could it in fact be copying the jars instead of re-creating them? If
> so, I think that's the way it should work anyway (bug 312976) but then the
> option should be renamed to "COPY_BOTH". 
> 
No, only the packed jar is copied if it's found. Could the reason that the jars differ be that they are unconditioned and then unpacked on different platforms?
Comment 12 David Williams CLA 2010-05-17 12:13:45 EDT
(In reply to comment #11)
> (In reply to comment #10)
> > ... Could it in fact be copying the jars instead of re-creating them? If
> > so, I think that's the way it should work anyway (bug 312976) but then the
> > option should be renamed to "COPY_BOTH". 
> > 
> No, only the packed jar is copied if it's found. Could the reason that the jars
> differ be that they are unconditioned and then unpacked on different platforms?

Both Java 5, if that's what you mean. Different Eclipse platforms, but both use Java 5. In both cases, I launch with 'eclipse', explicitly set the VM, and explicitly set
-Dorg.eclipse.update.jarprocessor.pack200=${JAVA_5_HOME}/jre/bin

Anything in this area change between Eclipse 3.5.2 and 3.6 M7? 

The only funny thing I've noticed is that some Java environment variables appear to be set to the "system VM" ... instead of our /shared/common/ Java 5 ... but I can't imagine the Eclipse Platform would pick one of those: 

[echoproperties] env.JAVA_HOME=/shared/common/ibm-java2-ppc-50

[echoproperties] env.JAVA_BINDIR=/usr/lib/jvm/java/bin
[echoproperties] env.JAVA_ROOT=/usr/lib/jvm/java
[echoproperties] env.JDK_HOME=/usr/lib/jvm/java
[echoproperties] env.JRE_HOME=/usr/lib/jvm/java/jre
[echoproperties] env.SDK_HOME=/usr/lib/jvm/java

Maybe I should unset them? To be safe?
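If we do decide to unset them, one option is to launch the build with a scrubbed copy of the environment. This is a sketch only: the variable names come from the echoproperties output above, while the `scrubbed_env` helper and the decision to keep JAVA_HOME (already pointing at the shared Java 5 VM) are assumptions.

```python
import os
from typing import Dict, Mapping, Optional

# Variables from the echoproperties output that point at the system VM.
SYSTEM_VM_VARS = ("JAVA_BINDIR", "JAVA_ROOT", "JDK_HOME", "JRE_HOME", "SDK_HOME")


def scrubbed_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without the system-VM variables.

    JAVA_HOME is left alone, since it already points at the shared Java 5 VM.
    """
    env = dict(os.environ if base is None else base)
    for name in SYSTEM_VM_VARS:
        env.pop(name, None)
    return env
```

The result can be passed as `subprocess.run(cmd, env=scrubbed_env())`, where `cmd` is the existing eclipse launch command line, unchanged.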
Comment 13 David Williams CLA 2010-05-18 02:06:30 EDT
Another anomaly, that I've seen in the latest build, is that the b3 aggregator seems to "miss" a org.apache.commons.discovery pack.gz file: 

Only in buildresults/final/aggregate/plugins: org.apache.commons.discovery_0.2.0.v201004190315.jar.pack.gz

For details, buckyBuilder ends up with these two pairs: 

 67966 2010-05-17 23:48 org.apache.commons.discovery_0.2.0.v200905122109.jar
 66756 2010-05-17 23:48 org.apache.commons.discovery_0.2.0.v200905122109.jar.pack.gz
 67964 2010-05-17 23:46 org.apache.commons.discovery_0.2.0.v201004190315.jar
 66768 2010-05-17 23:57 org.apache.commons.discovery_0.2.0.v201004190315.jar.pack.gz


The aggregator ends up with this pair and a half: 

67966 2010-05-18 00:44 org.apache.commons.discovery_0.2.0.v200905122109.jar
66756 2010-05-18 00:44 org.apache.commons.discovery_0.2.0.v200905122109.jar.pack.gz
67964 2010-05-18 00:42 org.apache.commons.discovery_0.2.0.v201004190315.jar


I'm not sure how this would "make sense" .. but, wanted to document it here.
Comment 14 David Williams CLA 2010-05-18 02:12:06 EDT
To report a positive thing for b3 aggregator, today I saw buckyBuilder work and work and work, for 2 hours before deciding "not all dependencies could be satisfied". 

The b3 aggregator came to the same conclusion in 5 or 10 minutes! Not exactly run under scientific conditions ... but it illustrates the improvements in the newer p2.
Comment 15 David Williams CLA 2010-05-18 02:25:51 EDT
So, here we are on the 18th (barely), the go/no-go day, and my view is we should switch to the b3 aggregator. I wish we could have gotten more builds in (only 3 or so that could be compared) ... too many had problems that caused them to fail for half a day, and by then someone else would be failing, as it always goes. 

But, from what I've seen, the old builder has plenty of 
problems itself, (bug 312802) so part of my view is "how much worse could it get"? 

I know that's not a ringing endorsement, but seems to reflect the current state of things. 

So, while lots of fixes/explanations/tests still should be done, I plan to run only the b3 aggregator and soon promote its results to staging, so they can pass through the EPP packaging jobs, to make sure there are no surprises.
Comment 16 Thomas Hallgren CLA 2010-05-18 03:26:11 EDT
(In reply to comment #13)
> The aggregator ends up with this pair and a half: 
> 
> 67966 2010-05-18 00:44 org.apache.commons.discovery_0.2.0.v200905122109.jar
> 66756 2010-05-18 00:44 org.apache.commons.discovery_0.2.0.v200905122109.jar.pack.gz
> 67964 2010-05-18 00:42 org.apache.commons.discovery_0.2.0.v201004190315.jar
> 
> 
> I'm not sure how this would "make sense" .. but, wanted to document it here.

Do you still have the log output from the b3.aggregator that was generated when this happened (I'm interested in the part that says mirroring artifact ...).
Comment 17 David Williams CLA 2010-05-18 05:53:40 EDT
(In reply to comment #16)
> (In reply to comment #13)
> > The aggregator ends up with this pair and a half: 
> > 
> > 67966 2010-05-18 00:44 org.apache.commons.discovery_0.2.0.v200905122109.jar
> > 66756 2010-05-18 00:44 org.apache.commons.discovery_0.2.0.v200905122109.jar.pack.gz
> > 67964 2010-05-18 00:42 org.apache.commons.discovery_0.2.0.v201004190315.jar
> > 
> > 
> > I'm not sure how this would "make sense" .. but, wanted to document it here.
> 
> Do you still have the log output from the b3.aggregator that was generated when
> this happened (I'm interested in the part that says mirroring artifact ...).


https://build.eclipse.org/hudson/view/Repository%20Aggregation/job/helios.runBothBuilders/23/consoleFull
Comment 18 Thomas Hallgren CLA 2010-05-18 06:58:00 EDT
The explanation for the anomaly with org.apache.commons.discovery_0.2.0.v201004190315 is that it is found in more than one artifact repository:

This one has both an unpacked and packed version:
http://download.eclipse.org/webtools/downloads/drops/R3.2.0/S-3.2.0RC1-20100513125036/repository/

Here, the packed version is missing:
http://download.eclipse.org/birt/update-site/2.6-interim/

The b3.aggregator encounters the latter first, very likely due to the quite different and more efficient query mechanism in p2.

We currently have no mechanism to continue searching for the same artifact in other repositories. Once it's found and the copy is successful, packed or not, the aggregator is content.
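To make the behavior concrete, here is a toy model (not b3 aggregator code) in which each repository maps an artifact key to the formats it offers. `first_found` stops at the first repository that has the artifact, as described above; `prefer_packed` is a hypothetical alternative that keeps searching for a repository offering a packed form, which is the mechanism the aggregator does not currently have.

```python
from typing import Dict, Optional, Sequence, Set

# Toy model: a repository maps an artifact key to the formats it offers.
Repo = Dict[str, Set[str]]


def first_found(key: str, repos: Sequence[Repo]) -> Optional[Set[str]]:
    """Return the formats of the first repository that has `key`.

    The search stops as soon as the artifact is found, packed or not.
    """
    for repo in repos:
        if key in repo:
            return repo[key]
    return None


def prefer_packed(key: str, repos: Sequence[Repo]) -> Optional[Set[str]]:
    """Hypothetical alternative: keep looking for a repo with a packed form,
    falling back to the first canonical-only hit."""
    fallback = None
    for repo in repos:
        formats = repo.get(key)
        if formats is None:
            continue
        if "packed" in formats:
            return formats
        if fallback is None:
            fallback = formats
    return fallback
```

With the BIRT repository (jar only) searched before the WTP repository (jar and pack.gz), `first_found` returns only the canonical jar, which matches the "pair and a half" seen for commons.discovery.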
Comment 19 David Williams CLA 2010-05-18 08:26:11 EDT
Thanks for digging into that commons.discovery issue, Thomas. 

Seems a small loss, if only one out of a thousand. 

I've opened bug 313332 for this one case, and I guess in general, if we find a "jar only" we can look backwards in the log to see where it came from (and eventually determine whether it should have a pack.gz counterpart or not). 
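The first step of that triage, finding the "jar only" entries, can be sketched in a few lines of Python. `jars_without_pack` is a hypothetical helper; a jar it reports is only a candidate, and whether it should have a pack.gz counterpart still has to be checked against the log as described above.

```python
from pathlib import Path
from typing import List


def jars_without_pack(plugins_dir: str) -> List[str]:
    """List .jar files in plugins_dir that have no .jar.pack.gz counterpart."""
    d = Path(plugins_dir)
    # '*.jar' does not match 'x.jar.pack.gz', so only plain jars are checked.
    return sorted(p.name for p in d.glob("*.jar")
                  if not (d / (p.name + ".pack.gz")).exists())
```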

Thanks again,
Comment 20 David Williams CLA 2010-05-18 09:16:45 EDT
For the record, another difference I've discovered is that the b3 aggregator doesn't include any mirrorsURL in the artifacts repo. With the old builder, we used to find the mirrorsURL it inserted and change it to the correct value ... but this is OK; I have an app that uses p2 APIs to add the mirrorsURL.
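For comparison, and not as a replacement for the p2-based app, here is a sketch of the same change made directly on the artifact repository XML with ElementTree. It assumes the usual layout of a `<repository>` element containing `<properties size="N">` with `<property name=... value=.../>` children, and the property name `p2.mirrorsURL`; reading and writing the actual file (artifacts.jar or artifacts.xml, including its processing-instruction header, which ElementTree does not preserve) is left out. `set_mirrors_url` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MIRRORS_PROP = "p2.mirrorsURL"  # assumed property name


def set_mirrors_url(repository: ET.Element, url: str) -> None:
    """Add or replace the mirrors URL property on a <repository> element."""
    props = repository.find("properties")
    if props is None:
        # Put a new <properties> element first, ahead of mappings/artifacts.
        props = ET.Element("properties")
        repository.insert(0, props)
    for prop in props.findall("property"):
        if prop.get("name") == MIRRORS_PROP:
            prop.set("value", url)
            break
    else:
        ET.SubElement(props, "property", name=MIRRORS_PROP, value=url)
    # Keep the size attribute consistent with the number of properties.
    props.set("size", str(len(props.findall("property"))))
```

Calling it a second time simply replaces the value, so it is safe to rerun after each aggregation.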
Comment 21 David Williams CLA 2010-05-18 09:18:12 EDT
I have just a bit ago promoted the b3 aggregation results to 'staging', which is where the EPP packages pick them up ... hmm, has anyone told Markus? :)
Comment 22 Markus Knauer CLA 2010-05-18 09:24:18 EDT
(In reply to comment #21)
> I have just a bit ago promoted to b3 aggregation results to 'staging' which is
> where EPP packages pick it up ... hmm, has anyone told Markus? :)

Not really necessary, I am CC'ed to this bug.
I reverted my changes (that were not working anyway) and started a new build of the packages, which hopefully finishes. The last builds were interrupted when some people had to restart Hudson.
Comment 23 David Williams CLA 2010-06-14 02:38:52 EDT
I'd say this is done. But before marking it as 'fixed', we need to decide when to move to "native" b3aggr files. I'd assume about a month after the release? 
I'm assuming we'd want to do so for the maintenance release (as well as future streams)? 

We'll need a little "education plan", and be able to give people notice, so they have time to schedule any changes to scripts they might have that use/update the .build files.
Comment 24 David Williams CLA 2010-08-06 01:06:02 EDT
(In reply to comment #23)
> I'd say this is done. But before marking as 'fixed', we need to decide when to
> move to "native" b3aggr files? I'd assume about a month after the release? 
> I'm assuming we'd want to for maintenance release? (as well as future streams). 
> 

Just in time for the Helios SR1 warm-up, I have converted all the .build files to the "native" b3 aggregator format, so I will count this as fixed. The conversion went smoothly and the first build worked; see https://build.eclipse.org/hudson/job/helios.runAggregator/239/

Naturally, this will need to be confirmed by those providing content, but at least the build itself went fine. 

The 'helios.build' file was automatically converted to a 'helios.b3aggr' file by the b3 aggregator (in the IDE; it's always been doing that during batch builds).

From there, I selected each contribution in the aggregator editor and ran its "detach resource" function. I gave each contribution a file name similar to what it had, but with a .b3aggrcon extension instead of .build.

See http://wiki.eclipse.org/Eclipse_b3/aggregator/manual for more information about b3 aggregator. 

In general, projects can contribute (edit) .b3aggrcon files that change version numbers of features, or repository location URLs, but editing of things like categories, contributors names and email addresses, or even adding or removing features should be done by using the aggregator editor on the main helios.b3aggr file, which will update all relevant files related to those changes. 

If needed for comparison, or reverting!, I tagged 
org.eclipse.helios.build and 
org.eclipse.helios.tools with 
v201008060015preconversion

I will send a message to the cross-project list as the final step. 

Thanks to Filip and Thomas for their support and tools.