| Summary: | p2 again installs optional dependencies (unwanted) | ||
|---|---|---|---|
| Product: | [Eclipse Project] Equinox | Reporter: | ekkehard gentz <ekke> |
| Component: | p2 | Assignee: | Pascal Rapicault <pascal> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | blocker | ||
| Priority: | P3 | CC: | benno.baumgartner, christian.campo, elias, florian.pirchner, gregory.amerson, leberre, lothar, pascal, remy.suen, rsternberg, ruediger.herrmann |
| Version: | 3.6 | ||
| Target Milestone: | 3.6 | ||
| Hardware: | All | ||
| OS: | All | ||
| See Also: | https://bugs.eclipse.org/bugs/show_bug.cgi?id=306279 | ||
| Whiteboard: | |||
| Attachments: | |||
|
Description
ekkehard gentz
Created attachment 168098 [details]
import these updatesites - failed in 1 of 5 times for me
Created attachment 168099 [details]
import these updatesites - failed in 3 of 5 times for me
Created attachment 168100 [details]
after deleting rap plugins from IDE - target platform reports error
I've updated the target platform for our headless PDE build to eclipse and riena M7 this week. After that the resulting build includes RAP bundles. These bundles prevent our RCP from starting because of bug #307432. Also the build takes about 10 minutes longer now (the p2 director does something for about 5 minutes, and the p2.mirror step also takes several minutes).

As a workaround I've tried to remove the RAP bundles from our target platform. But then the director, and hence the build, fails (although these bundles are optional):

[stderr] BUILD FAILED
[stderr] /var/local/hudson/jobs/MonoRCP/eclipse-target-36M7/eclipse/plugins/org.eclipse.pde.build_3.6.0.v20100428/scripts/productBuild/productBuild.xml:44: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/eclipse-target-36M7/eclipse/plugins/org.eclipse.pde.build_3.6.0.v20100428/scripts/build.xml:118: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/rcp/ch.mobility.mono.releng/build-files/customTargets.xml:70: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/rcp/ch.mobility.mono.releng/build-files/allElements.xml:5: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/rcp/ch.mobility.mono.releng/build-files/allElements.xml:25: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/eclipse-target-36M7/eclipse/plugins/org.eclipse.pde.build_3.6.0.v20100428/scripts/genericTargets.xml:186: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/pluginbuilder/assemble.org.eclipse.pde.build.container.feature.all.xml:25: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/pluginbuilder/assemble.org.eclipse.pde.build.container.feature.all.xml:15: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/rcp/ch.mobility.mono.releng/build-files/customTargets.xml:56: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/pluginbuilder/assemble.org.eclipse.pde.build.container.feature.p2.xml:38: The following error occurred while executing this line:
[stderr] /var/local/hudson/jobs/MonoRCP/workspace/pluginbuilder/assemble.org.eclipse.pde.build.container.feature.p2.xml:514: Messages while mirroring artifact descriptors.
[stderr] Artifact not found: canonical: osgi.bundle,org.eclipse.rap.jface,1.3.0.20100504-1139.

I've realized that our RCP works when I just delete the rap bundles in the product built by the PDE build script. So as a workaround I remove the rap bundles with a bash script from the finalized product after the pde build has finished. The only drawback so far is that the RCP logs hundreds of messages when starting, like:

!ENTRY org.eclipse.update.configurator 4 0 2010-05-12 12:37:08.535
!MESSAGE Could not install bundle plugins/org.eclipse.update.ui_3.2.300.v20100428.jar Bundle "org.eclipse.update.ui" version "3.2.300.v20100428" has already been installed from: reference:file:plugins/org.eclipse.update.ui_3.2.300.v20100428.jar

Created attachment 168115 [details]
build log
Created attachment 168127 [details]
fix the order of the elements in the objective function and add a penalty to non greedy variables
Here are some insights about the observed behavior:
1) reproducibility of the results
We iterate over the slice in the projector to define the objective function.
Different orders of the elements in the objective function may lead the solver to different optimal solutions.
=> we should sort those elements
2) rap appears sometimes in the solution
It means that, for the solver, whether rap is part of the solution or not does not matter (the objective function has the same value).
I suggest putting a penalty on non-greedy variables, which will prevent the solver from installing different non-greedy requirements when the same one can be used for everything.
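
A minimal sketch of the two suggestions above (sort the elements of the objective function, and penalize non-greedy variables), written in Java. This is not the attached patch and not p2's projector code: `ObjectiveSketch`, `Candidate`, `buildObjective` and the weight values are hypothetical names and numbers chosen for illustration, and comparing versions as plain strings is a simplification.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; these types are hypothetical and not part of p2. */
final class ObjectiveSketch {

    /** Hypothetical stand-in for a unit the solver may or may not install. */
    public static final class Candidate {
        final String id;
        final String version;
        /** True if the unit is only reachable through non-greedy requirements. */
        final boolean nonGreedyOnly;

        public Candidate(String id, String version, boolean nonGreedyOnly) {
            this.id = id;
            this.version = version;
            this.nonGreedyOnly = nonGreedyOnly;
        }
    }

    static final int BASE_WEIGHT = 1;         // assumed cost of installing any unit
    static final int NON_GREEDY_PENALTY = 10; // assumed extra cost, value is arbitrary

    /**
     * Builds the terms of a linear objective function to minimize.
     * The candidates are sorted first, so the order of the terms no longer
     * depends on the iteration order of the incoming collection.
     */
    public static Map<String, Integer> buildObjective(Collection<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparing((Candidate c) -> c.id)
                .thenComparing(c -> c.version)); // naive string comparison of versions

        // LinkedHashMap keeps the sorted order when the terms are read back.
        Map<String, Integer> objective = new LinkedHashMap<>();
        for (Candidate c : sorted) {
            int weight = BASE_WEIGHT + (c.nonGreedyOnly ? NON_GREEDY_PENALTY : 0);
            objective.put(c.id + "/" + c.version, weight);
        }
        return objective;
    }
}
```

With such weights, two solutions that differ only by a unit pulled in through a non-greedy requirement no longer have the same objective value, so a minimizing solver would prefer the one without it.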
Shouldn't the title of the bug be "p2 again installs **non greedy** dependencies (unwanted)"?

I feel a need to let you know some thoughts about the progress on this bug and its predecessors.
Despite the "p2 works, go fix your stuff" attitude shown in some of the previous bugs, we still have a massive problem, and that shortly before the release candidate for Helios.
There were several people who showed that the problem exists in real-life use cases, so I can only wonder why the p2 team did not use these use cases to verify that they fixed the problematic behavior. Why is it that only dedicated volunteers like Ekke find these (same) bugs over and over? My kudos here to Ekke, who put in all this work to create reproducible scenarios for you to work with.
I did not even try M7 (yet), as I am swamped with other (non-Eclipse-related) work and I knew that if I don't read a "yeah, it works" in Ekke's blogs, it's not even worth wasting my time to try.
P.S.
Sorry for the rant, but the situation is very frustrating.

> Despite the "p2 works, go fix your stuff" attitude shown in some of the
> previous bugs, we still have a massive problem and that shortly before the
> release candidate for Helios.

I'm sorry that this is how you feel, but this comment shows one thing: you don't understand the problem, and probably not even the nature of the interaction that happened, nor even how open source works. Sorry for you.
To set the record straight, there were problems on both sides, p2 and Riena, and we all (Daniel, Christian and I) spent a non-trivial amount of time working on a solution. Fixes got released on the p2 side, metadata got tweaked on the Riena side, and several test cases were added.

> There were several people that showed that the problem exists in real life use
> cases, so I can only wonder as to why the p2 team did not use use these use
> cases to verify that they fixed the problematic behavior?
> Why is it that only
> dedicated volunteers like Ekke find these (same) bugs over and over? My kudos
> here to Ekke who put all this work in to create reproducible scenarios for you
> to work with.

All the test cases we wrote were based on the scenarios that were given to us with detailed steps, and they try to match reality as closely as possible. Ekke is the one who has the most intricate scenarios and has a vested interest in this, so it is normal that he tests this. I'm testing all the scenarios that I'm aware of to the best extent that I know. Again you seem to misunderstand how open source works.

> Sorry for the rant, but the situation is very frustrating.

This is less than optimal. Next time, make sure to breathe before doing this.

Pascal, I really respect the huge amount of time you, christian and others spent on this.
I also couldn't count the days and nights I spent on this ;-)
But it's Open Source and I want Helios to rock, so I'm testing a lot and also trying to create reproducible scenarios, which isn't easy in these cases.
These bugs around RAP-in-IDE are not the only bugs I'm involved with - over the last 2 months I think 30% or more of my time was spent on helping to fix bugs.
(Keep in mind that I'm self-employed, so I have to pay for all this myself, but on the other side I couldn't do my work without the help from the eclipse community, so it's my way to give value back.)
What I'm missing is some kind of monitoring tool where I can see what P2 is doing while fetching content and finding out dependencies.
The worst thing for me is: I've been using Riena in the IDE for 2 years now without any problems, and trouble started directly before EclipseCon, as Riena M6 moved into the Helios repository and I got rap into the IDE.
Then all the work on different bugs around this happened, and 2 weeks ago or so I could report that all works again.
Now, as Riena M7 was moved into the Helios repository, RAP comes into the IDE again. But this time not every time, only randomly - so it seems to be another reason.
All p2.inf files from Riena are correct - I verified each of them.
Perhaps it has to do with which remote repository P2 fetches artifacts from? I don't know - the only thing I know is that redView cannot work without using a Designer Editor, and after 2 years of hard work we want to release 1.0 together with Helios.
We also want to move to EclipseLabs (if available) and later to Eclipse, and there are some other Eclipse projects waiting to use redView.
I really hope that this can be fixed - and I imagine that I'm not the only one running into such scenarios, because more and more projects support RAP and so it may happen again.
This is the other reason I'm working on this: to avoid frustration of other consumers of Eclipse products after the Helios release.
I also would like to put redView into Eclipse Market Place, and Yoxos wants to provide an installation - but all this can't be done while the problem of RAP in the IDE exists.
thx again to all helping.
ekke

(In reply to comment #8)
> I feel a need to let you know some thoughts about the progress on this bug and
> it's predecessors.
> ...
>
> P.S.
> Sorry for the rant, but the situation is very frustrating.

Hi Lothar
I feel with you. I've spent countless days trying to set up and keep our headless builds running. And it's very, very frustrating work. For now the build sometimes works and sometimes it does not (because of missing dependencies). I guess the problem is that a sat solver is used which produces random, unpredictable results with unpredictable execution time. It is a complete mystery to me why a sat solver is required to compile a few java classes and zip them into a jar file.
But I also understand P2's position. They are confronted with "it doesn't work, and I don't know why" bug reports (especially from me :-). What else can they do than say: "but it does work!"? I'm convinced that they do their best to fix the issues.
The problem for me is that I cannot write good bug reports (like ekke does) because I don't understand P2. I've ordered the book "OSGi and Equinox: Creating Highly Modular Java Systems" and hope that this will at least enable me to write better bug reports.
Benno

(In reply to comment #11)
...
> issues. The problem for me is that I can not write good bug reports (like ekke
> does) because I don't understand P2.

Benno, I'm no P2 expert yet, but hopefully I will become one over the next year (one of my todos: P2, and also learn B3 and build systems),
but I've been developing software for 30 years and I know what to do to make a scenario reproducible ;-)

> I've ordered the book "OSGi and Equinox:
> Creating Highly Modular Java Systems" and hope that this will at least enable
> me to write better bug reports.
>

This is a great idea - the book is more than worth the money.
ekke

(In reply to comment #9)
> All the test cases we wrote were based on the scenarios that got given to us
> with detailed steps and try to match as close as possible the reality. Ekke is
> the one that has the most intricate scenarios and has vested interest in this
> so it is normal that he tests this. I'm testing all the scenarios that I'm
> aware of to the best extent that I know. Again you seem to misunderstand how
> open source work.

I, too, spent several hours chasing down update problems starting with M6. See bug 306279, and in
https://bugs.eclipse.org/bugs/show_bug.cgi?id=306279#c30
I pointed to bug 307432, where I mention that I simply try to install all of Helios except for the runtime part:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=307432#c5
That is an easy and reproducible way to test that anything a user may want to install can be installed. IMHO these simple tests should be done by the release team for every milestone (and release) before it's made publicly available, and if something does not work they can pinpoint it to the components that cause the problem(s) and get them fixed.
I am running that install right now and will report how it went.

(In reply to comment #13)
> (In reply to comment #9)
> That is an easy and reproducible way to test that anything a user may want to
> install can be installed. IMHO this simple tests should be done by the release
> team for every milestone (and release) before it's made publicly available and
> if something does not work they can pinpoint it to the components that cause
> the problem(s) and get them fixed.
>
> I am running that install right now and will report how it went.

There is an inherent chicken and egg problem here.
Helios milestone content follows the SDK content (which includes p2) by some lag time. It doesn't get stabilized and aggregated until afterward. So if there is something in the Helios M7 content that demonstrates a bug in p2, then it has to be addressed in the next milestone pass.
When you as a user try to install everything in the M7 repo, you are seeing it for the first time at the same time that we are. Sure, we can try it against the previous milestone, but I've been involved in bugs that aren't observed with the older content and only appear after Helios milestone testing begins. This is the nature of multiple moving targets.
I'm sure there are improvements we could make in our testing, but we can only test against data that is there.

(In reply to comment #14)
> There is an inherent chicken and egg problem here.
> Helios milestone content follows the SDK content (which includes p2) by some
> lag time. It doesn't get stabilized and aggregated until afterward. So if
> there is something in the Helios M7 content that demonstrates a bug in p2, then
> it has to be addressed in the next milestone pass

Maybe you need a staging repo that reflects what the next milestone is going to be, and test against that, so that when the milestone is produced there's a certain confidence that it will work.
> When you as a user try to install everything in the M7 repo, you are seeing it
> for the first time at the same time that we are. Sure, we can try it against
> the previous milestone, but I've been in involved in bugs that aren't observed
> with the older content and only appear after Helios milestone testing begins.
> This is the nature of multiple moving targets.

I always had the impression that especially the later milestones, M5 and up, were considered stable enough for regular users to try and report bugs against. So maybe after M5 a cycle with a staging repo for internal testing might be a wise choice going forward. This seems more important than ever, as more and more projects join the release trains (and may cause interdependency problems).

> I'm sure there are improvements we could make in our testing, but we can only
> test against data that is there.

Thanks for the clarification about how testing is done today, Susan

(In reply to comment #13)
> I am running that install right now and will report how it went.

The install finished and seems to be working OK. I did not have the time to do a thorough test, though. However, if it's true that it only fails 1 out of 5 times, that proves nothing.

(In reply to comment #11)
> I feel with you. I've spend countless days trying to set up and keep our
> headless builds running. And it's a very, very frustrating work. For now the
> build sometimes works and sometimes it does not (because of missing
> dependencies). I guess the problem is that a sat solver is used which produces
> random, unpredictable, results with unpredictable execution time. It is a
> complete mystery to me why a sat solver is required to compile a few java
> classes and zip them into a jar file.

The sat solver is predictable and highly reliable, provided that it is always fed the same way. The fact that rap only sometimes appears in the IDE is a problem because it is hard to reproduce.
The SAT solver does not compile anything, it just computes which bundles should be installed to satisfy the dependencies.
If you are interested in understanding what's going on here, and why we use a SAT solver, I would suggest you take a look at this paper:
http://www.cril.univ-artois.fr/spip/publications/iwoce907-leberre.pdf

(In reply to comment #15)
I usually stay out of these discussions, but here goes.

> Maybe you need a staging repo, that reflects what the next milestone is going
> to be and test against that, so that when the milestone is produced there's a
> certain confidence that it will work.

There is a staging repo. The content changes constantly. I don't see how you can expect something to reflect "what the next milestone is going to be" before the milestone happens. Can you give me a "staging" drop today of the code you will be developing tomorrow? That way I can test my code today against your future code to make sure what you do tomorrow does not expose a new problem.
I'm sorry if that sounds harsh, but I find your suggestion to test against "what the next milestone is going to be" rather funny. I don't know if you intend it that way.
What I hear is that you are saying that you want the stability of the annual release. You should probably stay with 3.5.2 until 3.6 ships if this is the stability you want. We spend 4-6 weeks locking down the platform, and then another four weeks making sure the release train members are running smoothly. Development of the platform stops for this period, with fixes only going in if we determine the train is broken. Progress is slow during this time but we get stability. This is the nature of the releases.
Milestones are a completely different thing. This is where folks like ekke jump in and see what's working. Sure, there are problems they don't want to deal with and they have to spend days and days figuring it out.
And with so many release train components, we all spend time on things that weren't on our component's agenda. We have to make sure it all works together.

> I always had the impression that especially the later milestones M5 and up were
> considered stable enough for regular users to try and report bugs against. So
> maybe after M5 a cycle with a staging repo for internal testing might be a wise
> choice going forward. This seems more important than ever, as more and more
> projects join the release trains (and may cause interdependency problems).

A milestone is a milestone. In fact, depending on the nature of the milestone (API freeze, function freeze, etc.) we all know that some milestones are more "sketchy" than others. (google on M5 and M5a or M5eh for history)...
Only yearly do we stop development long enough to do full-board integration testing of the train.

> Thanks for the clarification about how testing is done today, Susan

You are welcome. It's really not the test process I'm trying to clarify, it's the nature of open source and milestones. You as a user are empowered to choose nightly builds, integration builds, milestone builds, or wait for the final release.
The expectations you are describing (stable staging, a period of time to find integration problems, going back to fix the platform to deal with a problem found in the train)...that happens once a year at release time. If you are wishing it happened sooner so you could get all the cool stuff that has been developed since 3.5, well...that stuff only gets developed because we only play the release end game once a year.

(In reply to comment #18)
> (In reply to comment #15)
> I usually stay out of these discussions, but here goes.
> Milestones are a completely different thing. This is where folks like ekke
> jump in and see what's working. Sure, there are problems they don't want to
> deal with and they have to spend days and days figuring it out.
> And with so
> many release train components, we all spend time on things that weren't on our
> component's agenda. We have to make sure it all works together.
>

Susan, you hit exactly the point.
Normally I start testing the next release train with M3 to see what's new and if all runs well.
Starting with M4 or M5 I switch my whole development to the Milestones, and if there are bugzillas I reported or try to help with, sometimes between two Milestones I'm also using I-Builds or N-Builds to test, like what happened this year between M6 and M7.
This is much work, but this is the way open source works.
If I compare 3.6 against 3.5, there are so many new things developed and new projects included into the release train, and all will help me in the future to work better and to create better products or help customers.
My projects are using and based on many eclipse projects, from modeling to runtime, and I know I always tap into traps ;-) - but in the end, with the help from others (like pascal, christian and others in this case), I know it will work.
Esp. scenarios like the current one are very difficult to test, and projects like P2, PDE, Riena, RAP etc. are involved, and I really appreciate their help to make it all run.
ekke

...after all these 'global' discussions about milestones, the eclipse-process etc., which sometimes are needed to help others to understand it, we should go back to the issue of why there are situations where P2 randomly installs RAP into the IDE while installing new software ;-)
If I could help more, please let me know.
thx
ekke

a) some mirrors don't carry all bundles from m7, and sometimes p2 sees bundle metadata from m6
b) since the dependencies between the riena bundles are, I believe, not versioned, so bundle X is dependent on Y but on any version of Y, p2 might sometimes pick the metadata from the wrong version (m6)
Does that make sense, and is any of this possible?
If X is dependent on Y, without mentioning a specific version of Y, the latest version of Y should preferably be picked up.

The fact that the behavior is not always reproducible is a problem. Are we certain that if several update sites are selected they are always contacted in the same order?

Ekke, first of all, I think you should try on the very latest I-build, so that we know the very latest fixes are in play. (For example, bug 312169 could cause older category content to be shown, which would not affect what the solver would find, but could affect scenarios if categories were selected in the Helios repo and assumed to point to the latest.)

(In reply to comment #22)
> if X is dependent of Y, without mentioning a specific version of Y, the latest
> version of Y should be preferably picked-up.
>
> The fact that the behavior is not always reproducible is a problem. Are we
> certain that if several update sites are selected they are always contacted in
> the same order?

(In reply to comment #0)
> I noticed that in the cases where it fails the step
> "calculating requirements and dependencies" while installing new software
> took very long time to execute

This suggests there is something different in either the request itself or the set of repos contacted. I assume that in your trials
[ ] Contact all sites to find requirements
is always unchecked?
For example, when a repo first loads, the sites it references might be added to the repository manager. The next time that you tried this test, you would be contacting more sites (if the checkbox were enabled). If they in turn had references and were loaded, then....
As for sorting, we currently sort the list of repositories using a comparator that ensures local sites appear first.
So, Ekke, in order to stabilize the test case, I suggest:
- use the latest I-build (one that has the fix for 312169) so we know when you are selecting categories that you are selecting what you think you are
- ensure that [ ] contact all sites is unchecked so that we aren't potentially changing the pool of contacted sites on each request

If you again see the problem, and you see that "calculating requirements" is taking a long time, can you watch the wizard progress bar (and the status progress in the workbench window) and see if there are any files being shown downloaded, etc. that may help understand the problem.

(In reply to comment #23)
> Ekke, first of all, I think you should try on the very latest I-build,

ok - I'll do

> I assume that in your trials [ ] Contact all sites to
> find requirements is always unchecked?

No - this IS checked, because some features depend on others, which will be found from other update sites, so I have to check it.
If I only test using a small updatesite containing the needed Riena bundles, then it never fails.
Problems happen if I rely on features from other sites like Helios etc.
>
> So, Ekke, in order to stabilize the test case, I suggest:
> - use the latest I-build (one that has the fix for 312169) so we know when you
> are selecting categories that you are selecting what you think you are

will use the newest one

> - ensure that [ ] contact all sites is unchecked so that we aren't potentially
> changing the pool of contacted sites on each request
>

As I told above - I need other sites to be contacted,
but each of my tests runs in a new fresh installation with a fresh workspace, so there's never a bundlepool filled from requests before
(this was one of the time-consuming parts, that I have to use a fresh installation for each test, without side-effects from the bundle pool ;-)

> If you again see the problem, and you see that "calculating requirements" is
> taking a long time, can you watch the wizard progress bar (and the status
> progress in the workbench window) and see if there's any files being shown
> downloaded, etc. that may help understand the problem.

I'll do my best
ekke

I once imported the slicer of p2 into my workspace to debug a problem and added maybe 4 or 5 system outs that showed the number of bundles that it added to the considered list and what dependencies were checked because of that (including some greedy metadata information).
Is it too late to ask to have that added to the slicer, so that it spits out a tracefile (say, if a system property is set)? My experience was that it made detecting problems sooo much easier. Watching a flickering progress bar is error prone, and the changes to add tracing weren't that complicated and were so helpful.
@Pascal what do you think?

(In reply to comment #22)
> if X is dependent of Y, without mentioning a specific version of Y, the latest
> version of Y should be preferably picked-up.
>
> The fact that the behavior is not always reproducible is a problem. Are we
> certain that if several update sites are selected they are always contacted in
> the same order?
I misspoke in an earlier comment about sorting local first. That is only done for artifacts.
In Ekke's case (contact all sites is checked), we get the list of known locations from the repo manager. These locations are stored in a HashTable and the values are retrieved, so there's no specific ordering or sorting implied. These are further manipulated by the ProvisioningContext and stored in a set, with the result passed to a CompoundQueryable. So we aren't doing anything specific to dictate a stable ordering of repositories from request to request.

I'll start testing again in 1 hour or so - had to finish some other work before
....will report my experiences soon
ekke

Susan,
I did some more tests:
Installed Helios SDK M7
updated to latest I-Build 20100513-0800
Each test was made in a new fresh installation and a new fresh workspace, so there's no bundlepool with side-effects.
I did tests on: OSX-cocoa-64, Windows7-64 and Ubuntu-64
From 10 tests I got:
6 times RAP installed in IDE
4 times it was ok - no RAP
I watched the details of progress while installing the software, but couldn't detect anything useful.
ekke

How are things looking with RC2 now being staged?

I went through the test case again today and I have not been able to reproduce the problem. I would think that the issue was mostly caused by old metadata sitting around in a repo.
This shows the limitation of the non-greedy approach. The non-greedy approach is meant for someone to express that they have a dependency on something but that it does not need to be brought in. However, it is also implied that if the IU satisfying this dependency were brought in, then everything would just be fine. In this particular case, this last part of the statement is not true: bringing RAP into the wrong environment can cause problems. So, even assuming p2 was flawless wrt non-greedy (which it probably is), it would only take one bad IU / bad dependency to cause RAP to be brought in and break the installation.
Up until now, we have been using non-greedy in 3.5 and it has worked (in reality, partially worked). However, this has never been a perfect solution; it was just compensating for p2's inability to express strict negation, since what we really wanted to express is that SWT and RWT cannot be installed together.
At this point I'm going to close this issue and just point you to the bug where the introduction of negation in the RAP IUs is being explored: bug #306709. |