Currently, when a new service release comes out, the content of the simultaneous release repository is replaced with the contents of the new release. This limits certain uses of the combined release repository: rollbacks can't be performed, old builds that use the site aren't reproducible, etc. Instead, we should use a composite repository so that new releases are added to the repository without deleting the old contents.

David, Pascal and I had an offline discussion about this recently, so I'll record some of the useful details from that discussion here.

Currently the release site is a simple metadata repository, combined with a composite artifact repository with two children:

  <child location='http://download.eclipse.org/eclipse/updates/3.5/'/>
  <child location='aggregate'/>

To avoid deletions, the repository for each release can instead be placed in a separate sub-directory, and each sub-directory added to the main release repository as a child, e.g.:

  <child location='http://download.eclipse.org/eclipse/updates/3.5/'/>
  <child location='aggregate-SR0'/>
  <child location='aggregate-SR1'/>
  <child location='aggregate-SR2'/>

Adding the child to the composite metadata/artifact repositories would be performed as a final step, once all the child repository contents are fully mirrored, etc.
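For reference, the full composite files would just be the standard p2 composite format with those children listed. A compositeContent.xml along these lines (the repository name, timestamp and child list here are illustrative, not the actual values used on the release site):

  <?xml version='1.0' encoding='UTF-8'?>
  <?compositeMetadataRepository version='1.0.0'?>
  <repository name='Galileo Releases'
      type='org.eclipse.equinox.internal.p2.metadata.repository.CompositeMetadataRepository'
      version='1.0.0'>
    <properties size='1'>
      <!-- Illustrative timestamp value only -->
      <property name='p2.timestamp' value='1253000000000'/>
    </properties>
    <children size='4'>
      <child location='http://download.eclipse.org/eclipse/updates/3.5/'/>
      <child location='aggregate-SR0'/>
      <child location='aggregate-SR1'/>
      <child location='aggregate-SR2'/>
    </children>
  </repository>

The matching compositeArtifacts.xml is the same apart from the processing instruction (compositeArtifactRepository) and the repository type (...artifact.repository.CompositeArtifactRepository).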
The most obvious disadvantage with this approach is that it will take longer to download the release. Downloading more fragmented meta-data also makes the download more error-prone. Sitting on the other side of the Atlantic gives you a different perspective on problems like that.

Other implications:

An early requirement for the release trains was that everything they contain should be installable together without conflicts. That cannot be done for an aggregation like this. We can of course do it for each of the sources, but what requirements do we have on the result? What tests should be made there? That everything "latest" is installable?

A downstream consumer who uses the release as input to his own aggregation, and relies on the fact that it is consistent with respect to the requirement I just mentioned, can no longer do that. He must change his aggregation to use the parts of the source instead of the source itself. The parts change over time, so that opens up a new can of worms. I bring this up because we have live customers that do this today.

I would suggest another approach: let a release repository stay focused on the current release, like today. Introduce another repository that is the composite of all releases (intended for builds etc.) but give it another name. After all, the use-cases where this composite makes sense probably account for less than a percent of the total downloads. So why should every normal consumer be hit by this?
Another consequence of using a composite, one that currently breaks our builds, is that people make mistakes. We cannot use the update site at downloads/eclipse/updates/3.5.x, for instance. If we do, we run into problems with fragments that are in conflict, ultimately caused by an incorrect version on the last org.eclipse.equinox.executable feature. We are forced to use the workaround suggested by Andrew: bug 290653 comment# c8, option b). Not a problem if everything is done right at all times. I just want to stress the point that validation controls are essential, not just on the individual parts, but on the composite as a whole.
I have opened a request to use Nexus as a repo manager to help us alleviate some of this pain. See bug #290806.
We also discussed a retention policy at the planning council while reviewing the new guidelines for the simultaneous release (http://www.eclipse.org/helios/planning/EclipseSimultaneousRelease.php). I think it is important that we have a consistent retention policy across all projects participating in the common discovery site.
(In reply to comment #3)
> I have opened a request to use Nexus as a repo manager to help us alleviate
> some of this pain. See bug #290806.

Not sure I see how Nexus would help with this. Can you please elaborate?
To me, one of the root causes of this problem is the way we are creating repositories and how we manage them (and maybe the tools we currently use). For example, why is it simpler to delete the old content than to merge the two repos into one? Sure, composite repos provide nice isolation if one repo needs to be taken "out", however as you pointed out this creates additional points of failure and also duplicates a lot of artifacts and metadata.

Now, how I see things fitting with Nexus (note that I have not used the system yet and have only seen demos and slideware) is that for each team contributing to the train, the project lead/release engineer would post on Nexus the version of their component they want to make available. From all of these components we would create the train candidate repo from which the testing would be done. Once everybody is happy and has signed off, this repo could be merged into the main repo, thus cutting down on duplication.

So to summarize, I think the advantages I see are around:
- Staging and promotion of builds
- Repository merging
- Web UI
(In reply to comment #6)
> So to summarize I think the advantages I see are around:
> - Staging and promotion of builds
> - Repository merging
> - Web UI

The Galileo Builder currently performs repository merging, so I guess it's one of the tools you mention. It uses build contributions managed by each team as input to a fine-grained aggregation process that uses the planner to verify the result on various platforms. Staging and build promotion is already in place and has been used for some time. We have a web UI for Hudson and a graphical EMF-based editor for the contributions. Over the last 3-4 months, we have been working hard on a vastly improved successor to the Galileo Builder. If you have ideas on enhancements pertaining to repository merging I'd be very interested in hearing more about that. What has been mentioned so far in this bugzilla is mostly covered (we lack a good demo, but it's being worked on).

Regardless of the choice of tools, I think there are two questions that need to be answered:

1. Is it a good thing to expose each and every user to composites that contain far more than they need? I understand that there are some cases (Jeff's build, for instance) where it comes in handy, but aren't they fairly rare? The approach does indeed have a negative performance impact as well. Why not let the consumer choose? I bet 9 out of 10 would select the one with only SR1 in it.

2. What kind of validation do we want to perform on the composite that contains all versions? How do we certify that it is coherent with respect to rollbacks etc., and what responsibilities will each team have in this?
The real tension here is between use cases where versions matter and those where they don't. IMHO these are both well-represented in the community, but today we only facilitate the latter case. It is not just build scenarios that matter here. There are a large number of teams out there that use tooling at a certain level and do not want to move. If they need a new install, where do they get it? In the past they got a zip for a package and then grovelled around for zips with all the other things they needed. p2 enables us to rise above the zip muck, but without the content we are still sucking dirt (love the images...). Or they had to replicate the parts of the repo they needed. The Eclipse tooling itself (in the form of PDE target provisioning) follows the version-specific flow. This can be good or bad, but clearly some people are interested in it.

FWIW, in Yoxos we support both scenarios by maintaining a massive repo that we "slice" (not p2 slices) regularly into logical repos. Each slice is internally self-consistent and all are durable. This way users can choose the slice they want or simply use the latest. The need for durable repos is well-recognized in other communities (e.g., Maven central) and in Eclipse with all the discussion around Orbit and projects supporting builds. It makes sense to me that Eclipse support these cases.
(In reply to comment #6)
> So to summarize I think the advantages I see are around:
> - Staging and promotion of builds
> - Repository merging
> - Web UI

I think the same can be achieved with the new Aggregator, with the additional benefit that anyone could run the same aggregation locally should they need to debug. The area where we can increase quality is in defining what the result should be (which repositories should be created, with what content, and what tests should be run on these repositories). Tools are, after all, just tools - had Nexus been used, I believe the end result would have been the same.
(In reply to comment #4)
> We also discussed a retention policy at the planning council while reviewing
> the new guidelines for the simultaneous release
> (http://www.eclipse.org/helios/planning/EclipseSimultaneousRelease.php). I
> think it is important that we have a consistent retention policy across all
> projects participating in the common discovery site.

I suggest you make a concrete proposal on the planning mailing list and get some buy-in there (eclipse.org-planning-council@eclipse.org).
Adding myself as assignee, just to avoid mass emails. Be sure to add yourself to the CC list if you are interested in getting all future updates.
Here's a collection of thoughts on this issue (not necessarily related to each other).

Jeff, can you spell out when and why "version matters"? I'm not saying it doesn't, just not sure I understand clearly. In most cases I've seen, it was sort of a mistake on the consumer's end that made it matter. And I think some of us, as projects, implicitly support "only the latest version" (mostly). So I just want to be sure we are explicit.

Thomas, I too have concerns about performance. I know I saw degradation when we at WebTools used to have multiple versions on the same site, but I never measured it (and we no longer have multiple versions). Would you be willing to measure it? Devise some performance tests to quantify how much the increase would be? I ask because I think this issue may depend on details. If it was, for example, only a 10% increase, I'd think it probably worth it for the "extra" functionality, but if it was a 300% increase, that probably would not be worth it.

Another, probably minor, concern I have is the effect on mirrors. The current Galileo site is approx. 400 MB. So, if we have multiple releases, that'd be roughly 800 MB, eventually 1200 MB. So ... is that too much? What is too much? Technically, if "old releases" were not used much, I suppose there is a way to move them to 'archives' and simply update the artifacts file, right?

Pascal, I do not think we should start off "merging" repositories (to avoid duplicates, etc.), partially since having separate directories for releases allows more and easier flexibility with things like archiving. Plus, separate directories allow some incremental "testing of the waters" of having multiple releases in one site, and if worst came to worst we could at least "delete" old ones from the composite URL and just leave them in place with their own unique URLs.

I'm also concerned about UI. If someone unchecks "show only latest version" they will see quite a confusing list of stuff (or, maybe I should say, a more confusing one :). Anyone have any suggestions to improve that user experience? Is anyone else concerned?

I should add, I do think it's reasonable and good to have old versions around. If you imagine these (and/or project-level) repos are supposed to replace zip files, then it's similar to the way we currently archive old releases to the 'archive' site. I'm less positive about whether the URL should be exactly the same, or if, as some suggested, people should use another URL to get to the old stuff.

I'm also not clear on how individual projects should support this, or whether they should, in fact, be the "source" of the older builds. They currently are for the archived zips. They would have to be for projects (or even components) not in the common repository. I'm not sure it's a great practice to encourage consumers to use the "common" URL, as opposed to using individual project URLs. Is there a particular reason the common repository URL is important for these scenarios where exact versions and old releases are required? We don't normally do something "as a group" unless everyone in the group does it. Are we "forcing" some sort of support expectation on individual projects they may not know about, or agree to? Are we doing them a favor, or adding a burden?

Can someone clarify this talk of "rollback"? I assume if someone installed something new, the old stuff was left in their installation, and a rollback was all handled locally with the install's metadata. Is that not the case? Or by "rollback" in this context, do people just mean it more conceptually ... reinstalling an older version?
Thanks.
(In reply to comment #12)
> Jeff, can you spell out when and why "version matters"? I'm not saying it
> doesn't, just not sure I understand clearly. In most cases I've seen, it was
> sort of a mistake on the consumer's end that made it matter. And I think some
> of us, as projects, implicitly support "only the latest version" (mostly). So
> I just want to be sure we are explicit.

Certainly. You have a team that is using tooling based on Galileo (not SR* but actually the June release). Your team has their own bundles, or bundles from elsewhere, that are known to work with Galileo. You are interested in everyone on the team using a consistent toolchain. Version matters. If the content at Eclipse.org is not durable then you have to cache and manage your own copy.

Similarly, say your team is using whatever tooling but building a product due to ship on Galileo (again, not whatever update happens to be available). Further, your team is using target platforms with the target provisioning software site support. This support identifies specific versions of features. If those are not durable, your target fails to load as soon as the repo is modified. The same could be true of a build system that fetches things prior to building.

In both cases, yes, it would be possible for the team to grab a copy of whatever they need and cache it somewhere. That's not a great solution in general, however, and does not at all diminish the need for durability (in fact it precisely illustrates the need).

> Thomas, I too have concerns about performance. I know I saw degradation when
> we at WebTools used to have multiple versions on the same site, but I never
> measured it (and we no longer have multiple versions). Would you be willing
> to measure it? Devise some performance tests to quantify how much the
> increase would be? I ask because I think this issue may depend on details. If
> it was, for example, only a 10% increase, I'd think it probably worth it for
> the "extra" functionality, but if it was a 300% increase, that probably would
> not be worth it.

Note that in the end it is location durability that matters here. Aggregating into the current repo is one possible answer. In the end it does not matter as long as expectations are set relative to the management of the URIs. People need to know that this Galileo repo URI is not durable and, as is argued here, that there is a durable URI for each release's content.

> Another, probably minor, concern I have is the effect on mirrors. The current
> Galileo site is approx. 400 MB. So, if we have multiple releases, that'd be
> roughly 800 MB, eventually 1200 MB. So ... is that too much? What is too much?

Agreed. Mirrors should not have to mirror everything.

> Technically, if "old releases" were not used much, I suppose there is a way
> to move them to 'archives' and simply update the artifacts file, right?

Moving on disk is fine. Moving in URI space breaks everyone and does not help.

> I'm also concerned about UI. If someone unchecks "show only latest version"
> they will see quite a confusing list of stuff (or, maybe I should say, a more
> confusing one :). Anyone have any suggestions to improve that user
> experience? Is anyone else concerned?

Well, if we are only going to have the current release in the repo, perhaps we just remove the option to show multiple ;-P

> I should add, I do think it's reasonable and good to have old versions around.
> If you imagine these (and/or project-level) repos are supposed to replace zip
> files, then it's similar to the way we currently archive old releases to the
> 'archive' site.
>
> I'm less positive about whether the URL should be exactly the same, or if, as
> some suggested, people should use another URL to get to the old stuff.

IMHO changing the URL breaks people. .target files no longer work, PDE build map files with versions fail, ... Surely in this day and age we can manage to create 4 durable URLs per year?!

> I'm also not clear on how individual projects should support this, or whether
> they should, in fact, be the "source" of the older builds. They currently are
> for the archived zips. They would have to be for projects (or even
> components) not in the common repository. I'm not sure it's a great practice
> to encourage consumers to use the "common" URL, as opposed to using
> individual project URLs. Is there a particular reason the common repository
> URL is important for these scenarios where exact versions and old releases
> are required? We don't normally do something "as a group" unless everyone in
> the group does it. Are we "forcing" some sort of support expectation on
> individual projects they may not know about, or agree to? Are we doing them a
> favor, or adding a burden?

See the Maven central justification and use case for answers to the above. Individual projects should not *have* to do anything. By participating in the train their release would be put into a durable repo location. That is step one. The next step is to allow projects to opt in with other/more content.

> Can someone clarify this talk of "rollback"? I assume if someone installed
> something new, the old stuff was left in their installation, and a rollback
> was all handled locally with the install's metadata. Is that not the case? Or
> by "rollback" in this context, do people just mean it more conceptually ...
> reinstalling an older version?

Unused content is garbage collected from your install, so it may or may not be retained locally depending on when the GC ran. From p2's point of view that is simply a caching strategy. It will happily go and redownload the bytes as needed when rolling back your install to something earlier.
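To make the durability point concrete: a PDE target definition (.target file) that pins specific versions against the common release URL looks roughly like the sketch below. The feature id and version qualifier are made up for illustration; the point is that both the <unit> versions and the <repository> location are recorded, so if either disappears from the repo the target no longer resolves.

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <?pde version="3.5"?>
  <target name="Galileo Toolchain">
    <locations>
      <!-- Resolved with the planner against a specific repository URL -->
      <location includeAllPlatforms="false" includeMode="planner" type="InstallableUnit">
        <!-- Exact version pinned; id and qualifier are illustrative only -->
        <unit id="org.eclipse.emf.sdk.feature.group" version="2.5.0.v200906151043"/>
        <repository location="http://download.eclipse.org/releases/galileo"/>
      </location>
    </locations>
  </target>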
(In reply to comment #12)
> Thomas, I too have concerns about performance. I know I saw degradation when
> we at WebTools used to have multiple versions on the same site, but I never
> measured it (and we no longer have multiple versions). Would you be willing
> to measure it? Devise some performance tests to quantify how much the
> increase would be? I ask because I think this issue may depend on details. If
> it was, for example, only a 10% increase, I'd think it probably worth it for
> the "extra" functionality, but if it was a 300% increase, that probably would
> not be worth it.

I think it's hard to measure the actual impact since it depends on a lot of factors. One thing is certain: the size of the meta-data when SR2 is released will be 3 times the size of one SR. That will give us at least 3 times the download time unless we actually merge the repositories into one file. It will also be 3 times as vulnerable to network failures.

> Another, probably minor, concern I have is the effect on mirrors. The current
> Galileo site is approx. 400 MB. So, if we have multiple releases, that'd be
> roughly 800 MB, eventually 1200 MB. So ... is that too much? What is too much?

IMO, the different releases should share one common artifact repository to avoid duplication of artifacts that remain the same between releases. That in turn stresses one aspect of how things are built: some build systems add timestamps or build numbers to the qualifiers regardless of whether the actual content has changed, which makes for lots of unnecessary copies.

My personal reflection on all of this is that we should provide the user with a choice. Durability is necessary to support a number of cases. But the fact remains, the most common use case (simply updating or enriching your IDE) gets no benefit from downloading meta-data with multiple versions stretching far back in time; it's rather the opposite. So why should that behavior be the default? If we can agree that it's not ideal, how do we present the different choices to the consumer? How is this done with Yoxos?
(In reply to comment #14)
> (In reply to comment #12)
> I think it's hard to measure the actual impact since it depends on a lot of
> factors.

If we cannot figure out a way to measure some differences, I do not think we are in any position to "complain" that the performance is bad. And I'm not suggesting anything complex ... just a typical use case. How about picking a known, relatively complicated case, such as the Java EE Feature installed into a bare Platform? That would pull many hundreds of plugins, with a wide range of possible version range constraints. One test run could use sub-repos directly, such as /releases/helios/200911122150/ (which represents one "release", or milestone in this case), and another test run could use the composite repo at /releases/helios/ (which, as of just right now, contains two milestones).

> IMO, the different releases should share one common artifact repository to
> avoid duplication of artifacts that remain the same between releases.

I agree with this as a good idea in theory, but am concerned because I've not seen the Eclipse Platform do it this way ... why not? It seems the "staging" (or rollout) over mirrors would work best if a whole new sub-directory was created, allowed to mirror, and then later "pointed to" via the top-level composite files. I'm not sure how to do that if there's only one big repo. [And, in case it's not obvious ... I'm not looking to have it explained to me ... but for someone to actually do it! :) ]
(In reply to comment #15)
> If we cannot figure out a way to measure some differences, I do not think we
> are in any position to "complain" that the performance is bad.

I'm not so sure about that. Network performance degrades by at least a factor of N, where N is the number of repositories present in the composite. That alone is grounds for complaints IMO.
(In reply to comment #15)
> I agree with this as a good idea in theory, but am concerned because I've not
> seen the Eclipse Platform do it this way ... why not?

Mainly because it is easier to make changes when they are kept separate. Let's say we discover a serious bug in the latest build and want to remove that single build, but keep the old ones. Or, for the I-build site, we want to retain the last N builds, so we regularly remove old ones. It's also easier to roll out a new build in a more controlled way (as you say, wait for the entire new repository to mirror and then make the single change in the parent composite). It would be possible to support such scenarios with one single artifact repository, but it would require more complex tooling and I suspect would be more error prone. We do make sure that for bundles that remain the same we copy forward the old metadata/artifacts rather than producing new ones (we never want a different set of bits with the same id/version).
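For what it's worth, that final "single change in the parent composite" can be scripted. Assuming the p2 repository tools Ant tasks are available in the releng setup, a sketch would look something like the following (the destination path and child name are placeholders, and the task/attribute names are as I recall them from the repository tools, so double-check before relying on this):

  <!-- Add a fully mirrored SR sub-repository as a new child of the existing
       parent composite. Existing children are left untouched. -->
  <target name="add-child-to-composite">
    <p2.composite.repository failOnExists="false">
      <!-- The parent composite repository (placeholder path) -->
      <repository location="file:${releases.dir}/galileo" name="Galileo"/>
      <add>
        <!-- Child referenced relative to the parent composite -->
        <repository location="aggregate-SR2"/>
      </add>
    </p2.composite.repository>
  </target>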
(In reply to comment #17)
> (In reply to comment #15)
> ... We do make sure that for bundles that remain the same we copy
> forward the old metadata/artifacts rather than producing new ones (we never
> want a different set of bits with the same id/version).

Maybe a side issue, but what does this mean, exactly? I sort of understand the "artifacts" part ... that'd be the jars of code, etc., right? It seems sort of risky to me. I assume you use a checksum comparator to detect cases where the version says two things are the same but they really are different? I'm just not sure what the generally recommended procedure should be. And as for copying or reusing metadata, I just don't understand what those words mean. Is that literally reusing some subset of elements from a content.xml file? I don't mean to get off topic, but this seems relevant to our procedures. Thanks,
(In reply to comment #18)
> Maybe a side issue, but what does this mean, exactly? I sort of understand
> the "artifacts" part ... that'd be the jars of code, etc., right? It seems
> sort of risky to me. I assume you use a checksum comparator to detect cases
> where the version says two things are the same but they really are different?
> I'm just not sure what the generally recommended procedure should be.

There is a pluggable IArtifactComparator used by the p2 mirroring application. There is a simple checksum comparator (MD5), but for Java code we use a byte code comparator that compares the class files to see if the actual byte codes are the same. A simple checksum isn't sufficient because things like pack200 jar conditioning can alter the actual bits but keep the byte codes the same.

> And as for copying or reusing metadata, I just don't understand what those
> words mean. Is that literally reusing some subset of elements from a
> content.xml file?

I misspoke, we only do this check for artifacts, not metadata.
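As an aside, for anyone wanting to run the same check in their own releng scripts: the mirroring application can be pointed at a comparator when it mirrors. A sketch using the p2 Ant task, with the comparator id and attribute names given as I remember them (treat them as assumptions to verify against the repository tools documentation):

  <!-- Mirror artifacts and compare any id/version collisions by byte code
       rather than by raw checksum (pack200 conditioning changes the bits). -->
  <target name="mirror-with-comparator">
    <p2.artifact.mirror
        source="${source.repo}"
        destination="${dest.repo}"
        comparatorId="org.eclipse.equinox.p2.repository.tools.jar.comparator"
        comparatorLog="${basedir}/comparator.log"/>
  </target>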
I think we can count this as fixed now. We've been "practicing" with the Helios milestones (and learned a few things) and are now prepared to "leave" Galileo SR1 available while we roll out Galileo SR2. As described, I did "move" the current Galileo SR1 content "down" to a timestamp directory, and left composite jars/xml files at the root of http://download.eclipse.org/releases/galileo/ where p2 can find them. I hope "pulling" the specific repo jar, http://download.eclipse.org/releases/galileo/content.jar, was not published anywhere as something to do. It's not, right? I've heard one report of "that URL to content.jar no longer works" and I'm not sure how/where anyone would have thought to use that URL directly.