| Summary: | Emphasize p2.index should be included in p2 repository sites | ||
|---|---|---|---|
| Product: | Community | Reporter: | Pascal Rapicault <pascal> |
| Component: | Cross-Project | Assignee: | David Williams <david_williams> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | alex.blewitt, david_williams, denis.roy, d_a_carver, irbull, john.arthorne, nicolas.bros, sbouchet, stepper |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Mac OS X - Carbon (unsup.) | ||
| Whiteboard: | |||
|
Description
Pascal Rapicault
Why just Indigo? Why not Helios? Why not all? And, would it kill 'ya to attach the files you want put there? :)

I say that (with good humor) mostly since the documentation provided at http://wiki.eclipse.org/Equinox/p2/p2_index is sparse, at best. For example, the section on http://wiki.eclipse.org/Equinox/p2/p2_index#Jar_vs_XML_extension is completely empty. And, while the description talks about "order to search", there is not one example of ordered files. Seems one would always want to provide a complete list? Just vary the order? And, since it does document clearly "if its wrong it will break the repository", I'd prefer to get your concrete attachments.

But, sincerely appreciate the p2 team's help improving performance!

I found this comment in bug 310441#c2 interesting ...

<quote>
The p2.index file is an optional control file that is used in rare cases (where you have multiple different repositories at the same location and you want to instruct p2 which repo to load).
</quote>

Sounds like the purpose of this file is evolving :)

(In reply to comment #2)
> I found this comment in bug 310441#c2 interesting ...
>
> <quote>
> The p2.index file is an optional control file that is used in rare cases (where
> you have multiple different repositories at the same location and you want to
> instruct p2 which repo to load).
> </quote>
>
> Sounds like the purpose of this file is evolving :)

No, nothing is evolving, this is still optional. The index file will tell p2 which 'type' of repository to load if there is more than one repository at a given location. If there is only one 'type' of repository at a given location this file is not needed as p2 will 'figure it out'. However, the file can 'help' p2 'figure it out' faster in this case.

I'm not sure it would actually speed up much in this case. The release train already uses the highest priority file format. In fact the only extra round trips are due to searching for the p2.index file :(

(In reply to comment #4)
> I'm not sure it would actually speed up much in this case. The release train
> already uses the highest priority file format. In fact the only extra round
> trips are due to searching for the p2.index file :(

Yes, in this case you're probably right. IIRC compositeContent.jar is third on our search order (behind content.jar and content.xml). With a p2.index we would find it on the 4th request (p2.index, content.jar, content.xml, compositeContent.jar). We could move that up to number 2. If the child repos are content.jar, then we would find them with two requests. However, if the child repos are again more composite repos, then the p2.index file may make sense.

(In reply to comment #5)
> Yes, in this case you're probably right. IIRC compositeContent.jar is third on
> our search order (behind content.jar and content.xml). With a p2.index we
> would find it on the 4th request (p2.index, content.jar, content.xml,
> compositeContent.jar). We could move that up to number 2. If the child repos
> are content.jar, then we would find them with two requests.

I wasn't seeing any hits on content.jar and content.xml when I traced it, but that's because we remember the "last successful suffix" and try that one first. So after the first search, we hit the right file every time (except for the extra lookup of the p2.index file itself). Maybe we could also remember when a repository doesn't have an index file... I'll open a separate bug for that.

So short answer: Adding a p2.index file will improve performance on the first repository lookup only, but still well worth doing.

The problem is more visible in the case of builds as described in bug #337022. This is a follow up of bug #347403.

It can also be done to Helios and other repositories. I just wanted to not scare everybody off. And as for the scarcity of the page, I actually just created it to open this bug :) I will improve it.
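The request counts discussed in the comments above (four requests without an index, two with one, and two once the "last successful suffix" is remembered) can be illustrated with a small model. This is only a sketch of the behaviour as described in this bug, not p2's actual code: the function name, its parameters, and the idea of passing plain file names are all assumptions made for illustration.

```python
# Hypothetical model of the metadata lookup described in this bug.
# It is NOT p2's implementation; names and parameters are illustrative only.

# Default search order as recalled in the comments above.
DEFAULT_ORDER = ["content.jar", "content.xml", "compositeContent.jar"]


def requests_until_found(available, index_order=None, last_successful=None):
    """Return the file names requested, in order, until one is found.

    available:       set of metadata file names that exist at the repository.
    index_order:     file names to try when the repository ships a p2.index
                     (None means the p2.index request comes back 404).
    last_successful: file name remembered from an earlier successful lookup.
    """
    requested = ["p2.index"]  # the index is always asked for first
    if index_order is not None:
        candidates = list(index_order)
    else:
        candidates = list(DEFAULT_ORDER)
        if last_successful in candidates:
            # Try the remembered name before the rest of the default order.
            candidates.remove(last_successful)
            candidates.insert(0, last_successful)
    for name in candidates:
        requested.append(name)
        if name in available:
            break
    return requested
```

Under these assumptions, a composite repository with no index and no remembered suffix costs four requests on the first lookup, while either an index or a remembered suffix brings that down to two, which matches the "first repository lookup only" conclusion.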
>
> So short answer: Adding a p2.index file will improve performance on the first
> repository lookup only, but still well worth doing.
What? For Indigo? We are past RC2 now, right? Time for blocking or critical bugs only, IMHO. I don't have the impression this is blocking or critical, but instead a little tweak. I don't mind trying things like this at the start of Juno ... maybe "back-porting" to Indigo and Helios maintenance but, IMHO this idea needs a little more time to flesh out. I'm not exactly opposed to the concept (though, conceptually, it does seem a bit of a work-around hack) ... but I am reluctant to change anything (especially "central" things) this late in the release. (Unless blocking or critical regression and obviously safe, of course).
(In reply to comment #9)
> ... And as for the scarcity of the page, I actually just
> created it to open this bug :) I will improve it.

I appreciate your honesty. :)

*** Bug 350030 has been marked as a duplicate of this bug. ***

From bug 350030, here's what the contents of the p2.index file should be for the http://download.eclipse.org/releases/indigo/

version=1
metadata.repository.factory.order=compositeContent.jar,!
artifact.repository.factory.order=compositeArtifacts.jar,!

I have used this file in several P2 sites in the past without any problems, and will reduce the amount of 404s seen by the server when (a) requesting this file, and (b) in not hitting the content.jar with a HEAD subsequently, at least for first time connections.

(NB there's a bug in the wiki page - artifact.repository.factory.order= compositeContent.xml, ! should be compositeArtifacts.xml)

*** Bug 307075 has been marked as a duplicate of this bug. ***

I think the most likely approach now is to start doing this in Juno immediately, and possibly add to Indigo SR1 if all goes well.

I just want to say I have not lost track of this bug :) and still willing to see how to implement for Juno, since would appear to reduce a few empty trips to eclipse.org. But, I've got to ask ...

a) "hand editing" (or, placement) of files is usually a bad idea. If p2 "wants" these files, why don't p2 publishers publish them? (I realize they were originally seen as "rare" to handle special cases, but seems they are not advocated as "normal" to reduce a few file guesses.)

and

b) why does p2 use so many files to begin with? content.xml vs. compositeContent.xml? What the heck? Why doesn't p2 just use one file, and ... like xml was designed to do ... describe its own content with tags? (I'm asking, seriously, if it is worth opening a p2 enhancement for that? Am I missing something obvious?)

... are not advocated as "normal" ==> ... are _now_ advocated as "normal"

(In reply to comment #13)
> From bug 350030, here's what the contents of the p2.index file should be for
> the http://download.eclipse.org/releases/indigo/
>
> version=1
> metadata.repository.factory.order=compositeContent.jar,!
> artifact.repository.factory.order=compositeArtifacts.jar,!
>
> I have used this file in several P2 sites in the past without any problems, and
> will reduce the amount of 404s seen by the server when (a) requesting this
> file, and (b) in not hitting the content.jar with a HEAD subsequently, at least
> for first time connections.

Just to not leave a misleading post here, I think this information is in error, as far as I can tell.

for .../releases/indigo I think it would be

version=1
metadata.repository.factory.order=compositeContent.xml,\!
artifact.repository.factory.order=compositeArtifacts.xml,\!

I do not think there is a "jar" factory at all. I tried some local tests, and using "jar" seemed to fail, so I don't think that's correct at all ... if it worked for you, you may have already had some information cached about what worked before. A good test has to start completely fresh.

Or ... else I'm seeing something wrong ... but, pretty sure 'xml' is the correct value to use and have tried to clarify this in http://wiki.eclipse.org/Equinox/p2/p2_index

Another question on the "docs" and effects (time) of 404 errors. The doc at http://wiki.eclipse.org/Equinox/p2/p2_index says

"Given that a composite repository is just a repository that refers to other repositories, the full benefit of p2.index can only be achieved if every child repo has the file."

But, I know of at least one case in our common repo where I don't think this is true ... we have a bit of a different structure, so that (some of) the "artifacts" are in their own directory, with an artifacts.jar file (no content, no composites). I think when p2 goes to look there for artifacts, it looks for p2.index, gets 404, and then tries the artifacts.jar file. So, I could imagine it might take a little longer for p2 to find the p2.index file, read it, find out it should look for the artifacts.jar file, and then read the artifacts.jar file. Is there some "hidden cost" of 404 errors? If not, this would be one case where a p2.index file doesn't really do any good. (Probably one of the few cases). Clarifications welcome.

(In reply to comment #19)
> (In reply to comment #13)
>
> for .../releases/indigo I think it would be
>
> version=1
> metadata.repository.factory.order=compositeContent.xml,\!
> artifact.repository.factory.order=compositeArtifacts.xml,\!
>

For completeness, I'll document here that the p2.index files for the three .../releases/indigo/<datetimedirectory>/ would be

version=1
metadata.repository.factory.order=content.xml,\!
artifact.repository.factory.order=compositeArtifacts.xml,\!

So, after studying this a while (and trying some modest local tests) I'm fine adding these 4 files to the main 4 directories in common repo. Heck, I'd even do it for Helios, given the number of hits still going on there!

Any objections?

If this happens to change the date/time stamps so mirrors appear invalidated, and back down to zero, I know how to 'touch' them gently to reset to the original time.

Comments welcome.

P.S. After going to all this trouble ... I hope someone has some profound before/after measurements :) [I know, just kidding, I think we've already established "wouldn't be profound" ... but, if anyone can, please do the performance tests so maybe would motivate others?]

In addition to all the metadata JAR files and their potential XML equivalents, now we have p2.index? Who comes up with these ideas? :)
> I'm not sure it would actually speed up much in this case. The release train
> already uses the highest priority file format. In fact the only extra round
> trips are due to searching for the p2.index file :(
From yesterday's logfile for download.eclipse.org... BTW these are all "404 Not Found" :
hits file
264757 /releases/indigo/p2.index
237864 /releases/indigo/201202240900/p2.index
221080 /technology/epp/packages/indigo/SR2/p2.index
220684 /releases/indigo/201109230900/p2.index
220180 /technology/epp/packages/indigo/SR1/p2.index
219370 /technology/epp/packages/indigo/R/p2.index
215248 /releases/indigo/201106220900/p2.index
185022 /eclipse/updates/3.7/R-3.7-201106131736/p2.index
182851 /eclipse/updates/3.7/R-3.7.1-201109091335/p2.index
180052 /eclipse/updates/3.7/R-3.7.2-201202080800/p2.index
179895 /eclipse/updates/3.7/p2.index
129375 /mylyn/drops/3.6.5/v20120215-0100/p2.index
110441 /releases/helios/p2.index
104294 /eclipse/updates/3.7/categories/p2.index
102986 /releases/helios/201006230900/p2.index
102398 /technology/epp/packages/helios/p2.index
101479 /releases/helios/201009240900/p2.index
101315 /technology/epp/packages/helios/R/p2.index
101074 /technology/epp/packages/helios/SR1/p2.index
100671 /releases/helios/201102250900/p2.index
... and the list goes on, to total 5,407,368 404's looking for various p2.index files. For yesterday.
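For anyone who would rather generate the file than hand-edit it, here is a minimal sketch that builds p2.index contents in the three-line shape quoted earlier in this bug (`version=1` plus the two `factory.order` properties, each ending in `,\!`). The function name and its parameters are made up for illustration; nothing here is part of p2 or any publisher, and the values to pass in still have to come from the wiki page or from local testing.

```python
# Hypothetical helper; not part of p2. It only reproduces the file shape
# quoted in this bug and does not validate the values it is given.

def p2_index_text(metadata_order, artifact_order):
    """Return p2.index contents for the given factory orders.

    metadata_order / artifact_order: lists of entries such as
    ["compositeContent.xml"] and ["compositeArtifacts.xml"].
    """
    lines = [
        "version=1",
        # Each list ends with ",\!" exactly as in the examples in this bug.
        "metadata.repository.factory.order=" + ",".join(metadata_order) + ",\\!",
        "artifact.repository.factory.order=" + ",".join(artifact_order) + ",\\!",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values proposed in this bug for .../releases/indigo
    print(p2_index_text(["compositeContent.xml"], ["compositeArtifacts.xml"]), end="")
```

Writing the returned text to a file named `p2.index` next to the repository's metadata files would then be a one-liner in whatever build script publishes the site.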
(In reply to comment #21)
> P.S. After going to all this trouble ... I hope someone has some profound
> before/after measurements :) [I know, just kidding, I think we've already
> established "wouldn't be profound" ... but, if anyone can, please do the
> performance tests so maybe would motivate others?

A standard "404" error for a non-existent p2.index file is at least 43 bytes of totally useless data, so providing 119 bytes of useful (to p2) data is already a step in the right direction.

In terms of the release train repos, does putting a p2.index file up save us from some of this?

"GET /eclipse/updates/4.2/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /eclipse/updates/4.2/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /releases/juno/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /releases/juno/compositeContent.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /releases/juno/201202030900/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /releases/juno/201202030900/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /releases/juno/201112160900/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /releases/juno/201112160900/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /technology/epp/packages/juno/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /technology/epp/packages/juno/compositeContent.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /technology/epp/packages/juno/M5.180/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /technology/epp/packages/juno/M5.180/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /technology/epp/packages/juno/M4.174/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /technology/epp/packages/juno/M4.174/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /technology/epp/packages/juno/M3.116/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /technology/epp/packages/juno/M3.116/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /technology/epp/packages/juno/M2.53/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /technology/epp/packages/juno/M2.53/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /technology/epp/packages/juno/M1.9/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /technology/epp/packages/juno/M1.9/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"
"GET /e4/updates/0.12/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
"HEAD /e4/updates/0.12/content.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"

BTW -- for how long does p2 cache these results? I ran a Check for Software updates 3 hours ago, and again just now, and it returned to the server for all the goodness above.

> BTW -- for how long does p2 cache these results? I ran a Check for Software
> updates 3 hours ago, and again just now, and it returned to the server for all
> the goodness above.
The expected behaviour is for p2 to cache the various content.jar/xml as well as artifacts.jar, and to only check for the file timestamp. The p2.index file is not cached.
Does that match what you see?
> Does that match what you see?
Yes, thanks
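Project owners who want to check their own sites could tally missing p2.index requests from request lines of the shape quoted earlier in this thread. The sketch below is an assumption-laden illustration: it only understands the quoted fragment (`"METHOD path HTTP/x.y" status ...`), the function name is made up, and real download.eclipse.org log lines presumably carry more fields in front of the request that this pattern simply skips over.

```python
import re
from collections import Counter

# Matches the quoted request fragment, e.g.
# "GET /releases/juno/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def count_missing_index(lines):
    """Count 404 responses per path for requests ending in /p2.index."""
    hits = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line in the expected shape
        if match.group("status") == "404" and match.group("path").endswith("/p2.index"):
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '"GET /releases/juno/p2.index HTTP/1.1" 404 13 "-" "Jakarta Commons-HttpClient/3.1"',
        '"HEAD /releases/juno/compositeContent.jar HTTP/1.1" 200 - "-" "Jakarta Commons-HttpClient/3.1"',
    ]
    for path, count in count_missing_index(sample).most_common():
        print(count, path)
```

Sorting with `most_common()` gives the same "hits file" layout as the table posted from the server logs, so a before/after comparison is easy to eyeball.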
David, I think we should definitely go ahead with the addition of the p2.index. The "profound measurement" will be obtained by looking at the http logs.

In terms of additional improvements, I think that exposing one content.xml over multiple would also make for a big improvement since the user would not download a lot of duplicated metadata when contacting repos like indigo. But this is the topic of another bug if you think it worth it (I think it does) :).

I have, just now, 2:30 Eastern, 3/1/2012, put p2.index files in following directories. This of course reduces the 404s for those files, and might reduce 404's for "content.xml" from the main (top level) sites (for first time installers, since no longer will look for them at all on that URL in that first-time update (pre-cached) case ... but doubt that number is too large anyway).

I'll change title and leave open a bit to change focus to "everyone, especially with composite repos, should use p2.index".

= = = = =

.../releases/helios/
.../releases/helios/201006230900/
.../releases/helios/201009240900/
.../releases/helios/201102250900/
.../releases/indigo/
.../releases/indigo/201106220900/
.../releases/indigo/201109230900/
.../releases/indigo/201202240900/
.../releases/juno
.../releases/juno/201112160900/
.../releases/juno/201202030900/

(In reply to comment #26)
> David, I think we should definitely go ahead with the addition of the p2.index.
> The "profound measurement" will be obtained by looking at the http logs.

Well, getting 404s was designed into p2 on purpose, apparently, as part of their "look for all these files" logic ... instead of having one and only one file that contained everything needed (as the magic of XML should easily allow) ... so I think "number of 404s" doesn't mean much ... I was hoping for more quantitative "round trip" numbers as given in bug 347403. (But, no big deal, mostly my way of saying I've spent 8 hours on this! and think those that made the design decisions could have done more to better document what's needed ... such as so far, no one from p2 team has commented on comment 13 which I'm pretty sure is wrong, and if not wrong, it'd be nice to have someone explain). [And, I'm just grousing about the lost opportunity of using one XML file with self-described-data ... don't mean to flame anyone ... I'm sure the decisions seemed right at the time.]

> In terms of additional improvements, I think that exposing one content.xml over
> multiple would also make for a big improvement since the user would not
> download a lot of duplicated metadata when contacting repos like indigo. But
> this is the topic of another bug if you think it worth it (I think it does) :).

We, at common repo, do only have one ... one per release (SR0, SR1, SR2). If you mean have one, period, that's changed for SR1 and SR2 ... then ... volunteers welcomed! :)

Changing to help "finish off" this bug as being about documenting and encouraging use by everyone with a p2 repo. If any of you on Denis' "hot list" have questions or "special repo shapes" that are not covered by the instructions in http://wiki.eclipse.org/Equinox/p2/p2_index then please ask, and will try to help figure out and document special cases.

(In reply to comment #6)
> (In reply to comment #5)
> So after the first search, we hit the right file every time (except for the
> extra lookup of the p2.index file itself). Maybe we could also remember when a
> repository doesn't have an index file... I'll open a separate bug for that.

To cross reference, there are two related bugs open in p2-land ...

bug 310546 2010-04-26 Add caching support to the p2.index file reader
bug 302909 2010-02-15 [publisher] Generate the p2.index

I've added a blurb to the IT Infrastructure document at http://wiki.eclipse.org/IT_Infrastructure_Doc#Include_a_p2.index_file_at_p2_repository_site.3F and a little "clarification" clause in Sim. Rel. Document, just to help spread the word. I'll send note to cross-project list about this and the p2.mirrorsURL "requirement".

And, I don't say thanks enough ... I know, I must always come across as complaining :) ... but in this case I do want to say a special thanks to Thomas Hallgren. He (or team) added the function to publish these p2.index files in the b3 aggregator -- and they are even commented! -- and without that, I would not have been able to figure out and feel confident about what was needed here. Thanks Thomas.

I've updated all the docs I can think of, and sent note to cross-project list, so will close this particular bug as "fixed". But, if anyone sees any especially bad problem cases, I suggest bugs be opened for those specific projects. If anyone ever encounters oddities or has "how to" questions, feel free to ask on cross-project list (or p2-dev, if it seems really p2 specific ... but cross-project if it appears to be something that would affect several projects). |