Bug 358415 - Too many update sites in common repo pollutes p2 UI
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: Cross-Project
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: David Williams
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-21 09:44 EDT by David Williams
Modified: 2011-09-26 09:31 EDT

See Also:


Description David Williams 2011-09-21 09:44:50 EDT
I'll jump ahead to what I think is part of the solution, before describing the problem: remove all update site references in the common repository. I think this can be done by the b3 aggregator (not requiring changes in any project). 

As I understand it, the current advice is that update site references should only go in _products_ (such as EPP packages), not features, as having them in a feature does not do anyone much good. 

So, now, I'll better state the problem, as I see it, and maybe someone knows of better solutions. I'll use Indigo as a concrete example, though I'm suggesting the changes go only in Juno, with no changes to Indigo. 

When I first install the Java EE Package, I can see 4 repositories:
 
Indigo	http://download.eclipse.org/releases/indigo	Enabled
Mylyn for Eclipse Indigo	http://download.eclipse.org/mylyn/releases/indigo	Enabled
The Eclipse Project Updates	http://download.eclipse.org/eclipse/updates/3.7	Enabled
The Eclipse Web Tools Platform (WTP) software repository	http://download.eclipse.org/webtools/repository/indigo	Enabled

That's all well and good, by design, and they get added (and enabled) by the EPP package. (And, the exact list varies from package to package). 

But, then, if I do anything that accesses a repository, such as clicking the "add repository" button or running an update, there are _dozens_ of other repositories added, in a disabled state. These come from p2 querying the enabled repositories under the covers, if I understand it right. 

The problem is that a) this is too many to be useful to ordinary end-users, b) many of them are invalid anyway, and, c) I suspect, 99% of anything anyone could get from these additional repos could be gotten more simply from the main .../releases/indigo repository. 

Here's the list of 42 added simply by accessing the initial enabled repos: 

http://dev.eclipse.org/svnroot/dsdp/org.eclipse.tm.tcf/releases/0.4.0
http://download.eclipse.org/birt/update-site/3.7
http://download.eclipse.org/datatools/updates
http://download.eclipse.org/egit/updates
http://download.eclipse.org/jwt/update-site
http://download.eclipse.org/mat/1.1/update-site/
http://download.eclipse.org/modeling/emf/updates/
http://download.eclipse.org/modeling/emft/eef/updates/0.7.1/
http://download.eclipse.org/modeling/emft/mwe/updates/
http://download.eclipse.org/modeling/emft/updates/
http://download.eclipse.org/modeling/gmf/updates/milestones/
http://download.eclipse.org/modeling/gmf/updates/releases/
http://download.eclipse.org/modeling/m2t/updates/
http://download.eclipse.org/modeling/m2t/updates/releases/
http://download.eclipse.org/modeling/m2t/xpand/updates/
http://download.eclipse.org/modeling/mdt/papyrus/updates/milestones/0.8
http://download.eclipse.org/modeling/mdt/papyrus/updates/releases/
http://download.eclipse.org/modeling/mdt/updates/
http://download.eclipse.org/modeling/tmf/updates/
http://download.eclipse.org/mylyn/releases/3.6
http://download.eclipse.org/objectteams/updates/2.0
http://download.eclipse.org/objectteams/updates/contrib
http://download.eclipse.org/rt/rap/1.3/runtime
http://download.eclipse.org/rt/rap/1.3/tooling
http://download.eclipse.org/sequoyah/updates/2.0/
http://download.eclipse.org/stp/updates/
http://download.eclipse.org/technology/actf/0.9/update-site/
http://download.eclipse.org/technology/emft/updates/
http://download.eclipse.org/technology/epp/updates/udc/
http://download.eclipse.org/technology/subversive/0.7/update-site/
http://download.eclipse.org/tm/updates/3.3
http://download.eclipse.org/tools/cdt/releases/indigo
http://download.eclipse.org/tools/gef/updates/milestones/
http://download.eclipse.org/tools/gef/updates/releases
http://download.eclipse.org/tools/mylyn/update/e3.4/
http://download.eclipse.org/tools/orbit/downloads/drops/R20100519200754/repository
http://download.eclipse.org/tools/orbit/downloads/drops/updateSite
http://download.eclipse.org/tools/ptp/updates/indigo/
http://www.eclipse.org/modeling/emft/?project=search#search
http://www.eclipse.org/modeling/emft/updates/
http://www.eclipse.org/modeling/mdt/?project=papyrus#papyrus
http://www.eclipse.org/modeling/updates
Comment 1 Bouchet Stéphane 2011-09-21 10:03:49 EDT
Wow, 

this one is 2 years old and not part of Indigo:
http://download.eclipse.org/modeling/emft/eef/updates/0.7.1/

I really don't know where Eclipse could have picked it up...
Comment 2 David Williams 2011-09-21 10:43:06 EDT
The total number can get even worse ... if a user enables those 40 update sites, and presses "check for updates" again, they get even more update sites added. If they enable those additional ones, and "check for updates" again, even more get added ... this "growth" seems to exhaust itself after about 4 rounds of this ... I guess it finally runs out of new update sites referenced in features ... and the total list is about 80 sites long. And, the list of "obviously invalid" sites is even longer. 
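
The growth described above behaves like a repeated expansion over the graph of site references until no new sites turn up. A minimal sketch in Python, using an invented reference graph (the site names and the `discover_rounds` helper are hypothetical, not p2 API):

```python
from typing import Dict, List, Set


def discover_rounds(references: Dict[str, Set[str]], enabled: Set[str]) -> List[Set[str]]:
    """Return the newly discovered sites for each "check for updates" round.

    `references` maps a site to the sites it refers to. After each round the
    user is assumed to enable every site that was just discovered.
    """
    known = set(enabled)
    frontier = set(enabled)
    rounds: List[Set[str]] = []
    while frontier:
        found: Set[str] = set()
        for site in frontier:
            found |= references.get(site, set())
        new_sites = found - known
        if not new_sites:
            break  # nothing new is referenced: the growth has exhausted itself
        rounds.append(new_sites)
        known |= new_sites
        frontier = new_sites
    return rounds


# Hypothetical reference graph, for illustration only.
EXAMPLE_REFERENCES = {
    "releases": {"project-a", "project-b"},
    "project-a": {"project-c"},
    "project-b": {"releases"},
    "project-c": set(),
}

rounds = discover_rounds(EXAMPLE_REFERENCES, {"releases"})
print([len(new_sites) for new_sites in rounds])  # [2, 1]
```

The loop stops once a round adds nothing new, which matches the observation that the list stops growing after a few rounds.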

While one solution would be to "police" everyone's contributed update sites, and make sure they were necessary and valid, I suspect this is beyond our practical capability. 

So, I feel a good first step in cleaning things up would be to remove them from the "common repo". I do not think they are really useful for much, anyway. I'll tentatively plan on investigating this approach for Juno M2, unless someone knows why that would clearly be the wrong thing to do. 

Then, beyond that, I think there still could be a lot of clean up to do ... but, hopefully then it'd be more obvious when "installing project x results in unnecessary or invalid sites being added" and the community can police what affects them. I don't know about others ... but, I routinely delete a lot of sites from my lists anyway because there are too many to read and no way to tell where they come from. :/ 

Oh, and FYI ... this type of "query the metadata just to retrieve additional update site info" might be one of the causes of excessive p2 traffic to eclipse.org ... though, no idea what percentage it would be and if large enough to worry about for that reason.
Comment 3 Greg Watson 2011-09-21 19:49:33 EDT
From the PTP perspective, our repository is not only up-to-date, but it is used constantly by our users to update to the latest version since we bring out releases much more frequently than the normal Eclipse release cycle.

We would actually like to go further and have it enabled by default, but I haven't worked out a way to do that.

For EPP, these update sites are actually not very useful. By design (a bug in our view), Check for Updates only updates things that have been explicitly installed, not all plugins/features. This means that the update site won't be checked for the EPP package. See bug 345503.
Comment 4 David Williams 2011-09-21 20:44:28 EDT
(In reply to comment #3)
> From the PTP perspective, our repository is not only up-to-date, but it is used
> constantly by our users to update to the latest version since we bring out
> releases much more frequently than the normal Eclipse release cycle.
> 

Not that you are saying this, but I should emphasize, update sites in general won't be going away. 

And, I notice your update site is added (and enabled by default) for your PTP EPP package ... as it should be. (You might even want to add a few others, such as cdt, or mylyn, or other pre-reqs?) ... and bug 345503 is fixable :) 

The "lost function" scenario here will be if users install just Eclipse SDK (or platform), then scrolls down to find 

    http://download.eclipse.org/tools/ptp/updates/indigo

in that list of 40, enables it, and installs from there. Under my proposal, your users would have to click the "add" button and paste in the URL, if they do not install an EPP Package. 

Is this "lost function" scenario your concern? To me ... and maybe its just me ... that list of 40 is confusing and too much to wade through, and I'd prefer just to add new ones by the "add" button (if I don't use a "product" such as an EPP package). 

Providing "extra" update sites used to be a way we encouraged Eclipse users to explore/find other things at Eclipse that were not installed by default. But ... I think that's been "eclipsed" by the Marketplace Client (pun intended :).

One scenario where it might make sense to continue to have the list-of-40 ... faulty as it is ... is if there are some situations where, before installing some adopter "add-on", users have to go in and enable like 6 or 10 other update sites, in order to pull in everything needed by their add-on. In that case, it probably would be easier to click "enable" 6 times, rather than "add/paste" 6 URLs. But, I do not know of any situations like that? And doubt anyone is counting on that? 
 
I would have preferred to go with providing a moderate list of accurate URLs ... like, one for each TLP ... but ... that'd be even more work to get right! :/
Comment 5 Bouchet Stéphane 2011-09-22 06:05:17 EDT
(In reply to comment #4)
> Is this "lost function" scenario your concern? To me ... and maybe its just me
> ... that list of 40 is confusing and too much to wade through, and I'd prefer
> just to add new ones by the "add" button (if I don't use a "product" such as an
> EPP package). 
> 

+1. I prefer adding a repository myself to querying the planet when I install new software and forget to uncheck "contact every update site ..."
Comment 6 Greg Watson 2011-09-22 07:36:38 EDT
(In reply to comment #4)
> 
> The "lost function" scenario here will be if users install just Eclipse SDK (or
> platform), then scrolls down to find 
> 
>     http://download.eclipse.org/tools/ptp/updates/indigo
> 
> in that list of 40, enables it, and installs from there. Under my proposal,
> your users would have to click the "add" button and paste in the URL, if they
> do not install an EPP Package. 
> 
> Is this "lost function" scenario your concern? To me ... and maybe its just me
> ... that list of 40 is confusing and too much to wade through, and I'd prefer
> just to add new ones by the "add" button (if I don't use a "product" such as an
> EPP package). 

Yes. Users used to do this, but it requires knowing the URL. Currently all they need to do is find PTP in the list and enable it. I don't disagree that wading through 40 is a pain, but so is having to find the URL, copy it, create a new site, paste the URL, etc.
Comment 7 John Arthorne 2011-09-22 09:58:42 EDT
I completely agree, David. The root of the problem is feature discovery and update site URLs in feature.xml files. In the Eclipse TLP we removed these URLs from all of our features. I think it was a design mistake to have this information specified at the feature level. It is the product designer/packager that should be able to specify where updates come from, and what available extras are seen by their end user. I.e., it should be a consumer decision rather than a producer decision.

In p2 we attempted to correct this design mistake by removing update site references from features entirely, but for backwards compatibility the p2 publisher converts feature update site references into repository-level references. I definitely think the long term direction should be that everyone removes all such URLs from their feature.xml files. Product designers (such as EPP packages) then have the control over what sites are presented, and can avoid this pollution problem.
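
To make the conversion concrete: a feature.xml can carry a `<url>` element with `<update>` and `<discovery>` children, and each of those URLs is a candidate for a repository-level reference. The following Python sketch only collects them from a made-up feature.xml; it is not the p2 publisher's code, and the feature id, the URLs, and the `collect_site_references` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical feature.xml content, for illustration only.
SAMPLE_FEATURE_XML = """\
<feature id="org.example.feature" version="1.0.0">
  <url>
    <update label="Example updates" url="http://example.org/updates"/>
    <discovery label="Example extras" url="http://example.org/extras"/>
  </url>
  <plugin id="org.example.plugin" version="1.0.0"/>
</feature>
"""


def collect_site_references(feature_xml: str) -> List[Tuple[str, str]]:
    """Return (kind, url) pairs for every update/discovery site in a feature.xml."""
    root = ET.fromstring(feature_xml)
    found: List[Tuple[str, str]] = []
    for url_element in root.findall("url"):
        for child in url_element:
            site_url = child.get("url")
            if child.tag in ("update", "discovery") and site_url:
                found.append((child.tag, site_url))
    return found


for kind, site_url in collect_site_references(SAMPLE_FEATURE_XML):
    print(kind, site_url)
```

A feature with no `<url>` element yields an empty list, which is the state the Eclipse TLP features are described as being in after the URLs were removed.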

Since getting all teams to fix their feature.xml files might be too difficult, this could possibly be addressed at aggregation time by stripping *all* repository references out of the release train repository. Maybe the tooling could support such an option to make things easier, but to be honest I don't know exactly what tool chain is used to produce the aggregate release train repo so I don't have a concrete suggestion there.
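
A rough illustration of what such stripping could look like, assuming the metadata is available as an uncompressed content.xml whose root element has a `<references>` child holding `<repository>` entries. That layout is an assumption, the sample document and the `strip_repository_references` helper are invented, and this is not how any particular aggregation tool does it:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, heavily simplified repository metadata, for illustration only.
SAMPLE_CONTENT_XML = """\
<repository name="Example aggregate" version="1">
  <references size="2">
    <repository uri="http://example.org/updates/a" url="http://example.org/updates/a" type="0" options="0"/>
    <repository uri="http://example.org/updates/a" url="http://example.org/updates/a" type="1" options="0"/>
  </references>
  <units size="1">
    <unit id="org.example.feature.group" version="1.0.0"/>
  </units>
</repository>
"""


def strip_repository_references(content_xml: str) -> Tuple[str, List[str]]:
    """Remove every <references> block and report the URIs that were dropped."""
    root = ET.fromstring(content_xml)
    removed: List[str] = []
    # findall returns a list, so removing children while looping over it is safe.
    for references in root.findall("references"):
        removed.extend(ref.get("uri", "") for ref in references.findall("repository"))
        root.remove(references)
    return ET.tostring(root, encoding="unicode"), removed


stripped_xml, removed_uris = strip_repository_references(SAMPLE_CONTENT_XML)
print(len(removed_uris))  # 2
```

Returning the removed URIs alongside the stripped document makes it easy to log what was dropped, which would help if the change ever had to be reverted.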
Comment 8 David Williams 2011-09-23 20:17:04 EDT
I have made the changes to our scripts to no longer copy the feature.xml sites to the common repo for Juno M2. And it seems to work just fine (i.e. no NPEs during aggregation or updates :). 

To document the details in case we need to revert: there is a b3 batch command argument, --mirrorReferences (in 'production.properties'), that we had been using to cause them to be copied/published to the common repo during aggregation. Additionally, we had been excluding "site.xml" files ... not sure there are still any ... but there were for a year or two ... --referenceExcludePattern .*/site.xml 

For more details on aggregator settings, 
see http://wiki.eclipse.org/Eclipse_b3/aggregator/manual
Comment 9 John Arthorne 2011-09-26 09:31:51 EDT
As a follow-on to this change, I have opened bug 358887 to suggest that project repository links appear on auto-generated project pages.