There is an inherent "cost" to mirroring the downloads area. It's a balancing act between having enough mirrors to save bandwidth and having too many mirrors without enough downloads to justify them. However, there is a major structural flaw at eclipse.org with respect to downloads, which I believe wastes a lot of bandwidth: most of the planet doesn't care about daily/nightly builds, and this is likely true of all the projects.
For instance, if Platform ran 5 builds x 3 GB = 15 GB yesterday (it's milestone week), and we have about 21 "Full" mirrors (out of about 50), then the total output for those builds would be 15 GB x 21 = 315 GB (322,560 MB). Assuming 30% of our bandwidth went to those 21 mirrors (an optimistic estimate), it would take them close to 40 hours to catch up, or just under 2 days. That is full saturation for 2 days to mirror files that very few people will download... and what about the daily/nightly builds from all the other projects?
As of this writing, it's Sunday night at 5:32 PM, and we're still saturated from M5 (which seems to have started Thursday), with 49 mirrors currently syncing. The workweek will be beginning shortly in Europe.
Two considerations:
a) The internal mirrors (IBM's fullmoons, and other companies) need the complete download.eclipse.org. They are the consumers of these daily/nightly builds.
b) Not every project uses the N-, R-, M-, S- and I- naming convention, which makes exclusions difficult to catch.
The proposed solution is to build an exclusion list of daily/nightly builds for the projects, and exclude this list from our default RSYNC stanza. We would also create a separate RSYNC stanza for internal mirrors (like IBM's fullmoons), which need the entire download set. Thoughts? Concerns?
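The arithmetic in the estimate above can be checked with a short sketch. The build count, build size, and mirror count come from the comment; the link speed is not stated anywhere in this thread, so `usable_mbps` is a hypothetical parameter, and the 18 Mbps used in the last line is only an assumed value that happens to land near the 40 hours quoted.

```python
# Figures quoted in the comment above; nothing here is measured.
BUILDS_PER_DAY = 5
GB_PER_BUILD = 3
FULL_MIRRORS = 21


def mirror_output_gb(builds, gb_per_build, mirrors):
    """Total data the origin sends so that every mirror receives every build."""
    return builds * gb_per_build * mirrors


def hours_to_sync(total_gb, usable_mbps):
    """Hours needed to push total_gb at usable_mbps (megabits per second).

    usable_mbps is an assumption: the share of the uplink that actually
    goes to the mirrors. It uses 1 GB = 1024 MB, as the comment does.
    """
    megabits = total_gb * 1024 * 8
    return megabits / usable_mbps / 3600


total_gb = mirror_output_gb(BUILDS_PER_DAY, GB_PER_BUILD, FULL_MIRRORS)
print(total_gb, "GB =", total_gb * 1024, "MB")

# Hypothetical 18 Mbps share, chosen only for illustration.
print(round(hours_to_sync(total_gb, 18), 1), "hours")
```

Because the cost scales linearly with the number of mirrors, removing a build from the default mirror set saves its size once per full mirror.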
Just a few thoughts ... I'm not even sure that we at IBM, to continue your example, are a big consumer of daily/nightly builds ... sure, there are a few people that download parts of each build, but it's probably less than you think (or maybe I just don't know about other sites/teams?). [Also, I believe some apparent "mirrors" in IBM (e.g. fullmoon.ottawa) actually have some content first (e.g. the base eclipse), and then push that up to eclipse.org ... so maybe that's enough for IBM's use?] Second, the exclusion filters would have to be more than "R-", "N-", etc., but with a little more (perhaps a regex expression?) it would work for the webtools project. We actually sometimes use the "R" letter, for example, to create warm-up "release builds", which we put in the directory ~/downloads/webtools/committers/drops/R-xxxxx first, to confirm and test them, and we do not expect most of the world to be interested in these. Once a "committers" drop suits us, we copy it to ~/downloads/webtools/downloads/drops/R-xxxx, and it is that version we announce ("declare") to the world and would want to be sure is mirrored. The only ones we "declare" are weekly builds (and milestones and releases). There are, though, other directories such as ~/downloads/webtools/presentations/.... where we put "big" zip files of downloadable presentations that would probably profit (some) from mirroring.
*** Bug 128187 has been marked as a duplicate of this bug. ***
I agree with Denis' proposed solution. We push all platform builds to eclipse.org from the fullmoon.ottawa server. If you don't want to push the non-R platform builds to mirrors, that is fine with me because they will still reside on the fullmoon servers. As an aside, on Friday I deleted all but two of the last integration builds, so the mirrors should not be trying to catch up with all the integration builds from last week, because there are only two remaining. As for the other projects, David, there is a requirement to store all builds on fullmoon mirrors... let me know if you need further info. But we could have a separate script for the non-internal mirrors, as Denis suggested. All non-platform builds are pulled from eclipse.org to the fullmoon ottawa server, which then acts as the distribution point to other internal mirrors.
Sorry, I should have mentioned that stable builds should still be sent to external mirrors, but I think this is the intent.
So we can agree that Releases (R-) *need* to be mirrored (and when I say mirrored, I exclude internal mirrors), and so do S- builds, which don't happen very often and are still quite popular. Kim, David, can you come up with a non-regexp list of paths (using wildcards) that I could test exclusions with? For instance:
/eclipse/downloads/drops/I-*
/eclipse/downloads/drops/N-*
> All non-platform builds are pulled from eclipse.org to the fullmoon ottawa server
> which then acts as the distribution point to other internal mirrors.
Are you sure? I have an entry for RTP and Torolab in my RSYNC config. I'd have to check the log files, but I think they're pulling from us. It would be really great if Ottawa could pull from us, and all other IBM internal mirrors pulled from Ottawa. D.
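One way to try out a wildcard exclusion list before putting it into the RSYNC config is a small script like the one below. It is only a sketch: the two patterns are the ones quoted in the comment above, while the sample directory names are hypothetical. Note also that Python's `fnmatch` lets `*` match across `/`, whereas a single `*` in an rsync pattern stops at slashes, so this approximates rsync's matching rather than reproducing it.

```python
from fnmatch import fnmatchcase

# Wildcard patterns quoted in the comment above.
EXCLUDE_PATTERNS = [
    "/eclipse/downloads/drops/I-*",
    "/eclipse/downloads/drops/N-*",
]


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    """Return True if path matches any of the exclusion patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical directory names, used only to exercise the patterns.
samples = [
    "/eclipse/downloads/drops/I-example",
    "/eclipse/downloads/drops/N-example",
    "/eclipse/downloads/drops/R-example",
    "/eclipse/downloads/drops/S-example",
]

for path in samples:
    print(path, "excluded" if is_excluded(path) else "mirrored")
```

With these patterns the I- and N- directories are skipped while the R- and S- directories stay in the mirrored set, which matches the agreement that releases and stable builds must still be mirrored.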
Mark verified with the RTP and Toronto mirror admins that they both pull from the Ottawa mirror. Perhaps the Toronto and RTP rsync definitions are just old because originally all the fullmoon servers pulled from ottawa when eclipse.org was located here. Today, only fullmoon.ottawa should pull a full mirror from eclipse.org.
For webtools, the exclusion pattern would be ~/downloads/webtools/committers/* This would exclude all our "warm up" and "continuous" builds from the mirroring system but, naturally, leave all the "declared" builds.
I need to modify download.php for it to respect the exclusion list - otherwise a user that wants to download such files will be offered a full list of mirrors. D.
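The check described above could look roughly like the following. This is a hypothetical sketch written in Python, not the actual download.php code: the mirror host names are invented, the patterns are the drops exclusions proposed earlier in this thread, and the assumption is that an excluded file can only be served from download.eclipse.org itself.

```python
from fnmatch import fnmatchcase

ORIGIN = "download.eclipse.org"

# Hypothetical mirror hosts, for illustration only.
MIRRORS = ["mirror-a.example.org", "mirror-b.example.org"]

# Exclusion patterns proposed earlier in this thread.
EXCLUDE_PATTERNS = [
    "/eclipse/downloads/drops/I-*",
    "/eclipse/downloads/drops/N-*",
]


def hosts_for(path):
    """Return the hosts to offer for path.

    Files under an excluded directory are never synced to the mirrors,
    so only the origin is offered for them.
    """
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDE_PATTERNS):
        return [ORIGIN]
    return [ORIGIN] + MIRRORS


# Hypothetical file paths.
print(hosts_for("/eclipse/downloads/drops/N-example/build.zip"))
print(hosts_for("/eclipse/downloads/drops/R-example/build.zip"))
```

Keeping the patterns in one list shared by the sync configuration and the download page would avoid the two drifting apart, though how the real script stores them is not stated here.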
BTW, I'd like /callisto/staging and /callisto/testUpdates added to whatever "exclusion list" there is, so that most mirrors will *not* replicate these. These are often temporary "getting ready" directories, and some things may change a great deal over the course of a few days. (The /releases and /interim directories should still be fully mirrored.)
Requested directories have been removed from the mirrors. -M
Closing as fixed. Some preliminary exclusions on drops/I* and drops/N* seem to have reduced our bandwidth (and mirror disk space) substantially. D.
M6 was just rolled out? Wow, I didn't feel a thing :) Cutting N- and I- builds globally from the RSYNC stanza has, so far, cut our average bandwidth consumption by 6 to 10 Mbps. This was a great improvement, thanks for helping with this. D.