First off, a disclaimer: I have to admit that my understanding of how equinox / p2 / osgi is supposed to work is rather poor. As such, I'm not totally convinced that what I'm reporting here is a bug in equinox / p2 rather than a misunderstanding on my part of how it is supposed to work (perhaps I am expecting more from p2 than it is designed to do?).

Here's the situation: a user reported to us that after downloading STS 4.7.0.RELEASE they installed 'Papyrus for UML' from an update site. (Original report here: https://github.com/spring-projects/sts4/issues/499) After that, STS no longer starts up. There is an NPE and a series of package-use conflicts littering the error log.

My assumption when seeing these kinds of bundle resolution issues would be that the set of constraints implied by all the installed bundles is not 'solvable'. However, when we subsequently relaunch STS with the `-clean` option, the problem disappears. Subsequent restarts are also fine. This leads me to conclude that the set of constraints must actually be solvable after all.

My somewhat idealistic understanding is that *if* the constraints are solvable, equinox / p2 should be able to find a solution. Therefore, the fact that it failed to do so should be considered a bug in equinox / p2. Is my understanding correct? (If not, I'll be more than happy to be corrected :-)

If, however, you agree this sounds like a bug, here is a way to reproduce the issue and hopefully debug and fix it:

- download this distribution: https://drive.google.com/file/d/1dLpnDBHcDEYpdQKIVELfQ_chHujCIWkT/view?usp=sharing
- start it, making sure to use a **JDK 8** for launch (the problem is magically solved on a newer JDK for some reason) => it will crash and a bunch of package-use conflicts will get logged in the error log
- edit the STS.ini file and add -clean
- relaunch => no more package-use conflicts

The -clean option can now be removed and subsequent launches are also fine.
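For anyone following the steps above: as far as I know, the launcher .ini file takes one argument per line, and launcher options such as -clean have to appear before the -vmargs line, because everything after -vmargs is handed to the JVM. A sketch of the edit; the surrounding lines are placeholders, not the actual contents of STS.ini:

```text
...existing launcher options, unchanged...
-clean
-vmargs
...existing JVM options, unchanged...
```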
Note: my distribution is a Linux-based one, but the problem does not appear to depend on the OS. It can also be reproduced by downloading STS 4.7.0 from here:

- https://download.springsource.com/release/STS4/4.7.0.RELEASE/dist/e4.16/spring-tool-suite-4-4.7.0.RELEASE-e4.16.0-linux.gtk.x86_64.tar.gz
- https://download.springsource.com/release/STS4/4.7.0.RELEASE/dist/e4.16/spring-tool-suite-4-4.7.0.RELEASE-e4.16.0-macosx.cocoa.x86_64.dmg
- https://download.springsource.com/release/STS4/4.7.0.RELEASE/dist/e4.16/spring-tool-suite-4-4.7.0.RELEASE-e4.16.0-win32.win32.x86_64.self-extracting.jar

Then launch it and install "Papyrus for UML" from 'all available update sites' that come pre-configured with the distribution.

Note: there is some vague resemblance to this bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=564475 (that problem is also fixed by `-clean`, but doesn't seem to involve package-use conflicts).
Just to clarify: the Google Drive link is a zip of what you get when you install 'Papyrus for UML' into STS 4.7.0 for Linux. It is in a 'still broken' state, i.e. it has not yet been started with the '-clean' option.
Tom, I don't think this is a p2 issue. Looking at https://github.com/spring-projects/sts4/issues/499, it does look very similar to https://bugs.eclipse.org/bugs/show_bug.cgi?id=564475 in that the PartRenderingEngine also has problems:

java.lang.NullPointerException
at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$5.run(PartRenderingEngine.java:1091)

Given that -clean fixes the problem, and given that there are apparently wiring problems being logged, the overall installation is in principle runnable, so p2 has not installed an invalid/inappropriate combination of bundles.
Isn't this just a duplicate of bug 564475?
It could be the same cause. But to my thinking the types of errors in each log are quite different. The other bug shows no evidence at all of any bundles being unresolved, i.e. there are no 'package use conflict' type errors in its logs. So in short:

- the other bug, to me, is about the NPE, which may point at something *triggered* by p2. But when all is said and done, it may actually *not* be pointing to a *problem* with p2, but rather to a problem with bundles making assumptions about startup order where startup order may not be guaranteed. So that could just be a problem with the bundles themselves rather than the resolver.
- this bug, however, is about the package-use conflicts, and I think it points at a buggy incremental caching optimisation in the resolver.

It is possible of course that they are actually the same bug and that solving the resolver issue also addresses the NPE. In this bug, however, I'd rather ignore the NPE as somewhat irrelevant to diagnosing the resolver issue and focus instead on the package-use conflicts only.

If it is easier to lump both problems together, I have no objections if we call this bug a duplicate of the other. (I care more about whether the problems get solved than about whether we classify them as the same or different :-)
> Given that -clean fixes the problem and given there are apparently wiring problems being logged, the overall installation is in principle runnable so p2 has not installed an invalid/inappropriate combination of bundles.

Okay, but there is a difference between 'runnable' when it runs with a bunch of package-use conflicts and 'runnable' when it runs without any bundle resolution errors being logged.

The NPE may be the cause of the crash, but could the NPE be the cause of the package-use conflicts? That seems unlikely to me (but I admit I don't really understand how the resolver does its magic, so maybe the NPE somehow does end up disrupting/confusing the resolver).

Anyway, the reason for raising this bug is that intuitively I would expect that

(A + B) + C = (A + B + C)

We have an identical set of constraints. This set of constraints is either solvable without conflicts, or it is not. The ability of the resolver to find a good wiring that satisfies all the constraints should not depend on where we put the parentheses in that expression. So it doesn't make a lot of sense to me that starting with '-clean' makes the conflicts go away, unless the resolver is malfunctioning.
Yes, it's more a question of whether p2 installed something inconsistent, and hence this is a p2 bug, versus "OSGi remembers (caches) older wirings and when newer wirings are added incrementally, it results in a wiring that doesn't work (where a clean forces a full new rewiring that does actually work)". And then of course there is the meta-question: if it is indeed a p2 bug, who will actually pay (spend the time) to fix it?
I guess I'm not familiar enough with how it all works together to make a distinction between 'p2 vs osgi vs equinox'. From a bird's-eye view, to me they are all just 'the thing that is supposed to wire all the bundles together correctly'. Which part of this wiring is done by p2 versus osgi versus equinox, and what the difference between all of these things really is, is beyond me. How the various components of 'the resolver' work together isn't (and I think shouldn't really be) my concern.

> "OSGi remembers (caches) older wirings and when newer wirings are added incrementally, it results in a wiring that doesn't work (where a clean forces a full new rewiring that does actually work)".

So I think what you are saying is:

- "(A + B) + C = wiring that does not work" is the incremental scenario
- "(A + B + C) = wiring that does work" is the `-clean` scenario

And so clearly that means what we have is a situation where "(A + B + C) != (A + B) + C".

Now the question is, should we consider that a bug in 'the resolver'? (By 'the resolver' I mean all the collective stuff that works together on wiring my bundles, not specifically blaming p2 or osgi or another specific component of the whole system.)

If we can agree this is indeed a bug, then we can try to figure out how/where/what might be done to fix it. If, however, we think this is actually how it is supposed to work, for whatever reason, then there is nothing more to do, and the bug ticket can be closed as invalid.
Jumping the gun a bit in making the assumption that we agree there is a bug here :-)

> "OSGi remembers (caches) older wirings and when newer wirings are added incrementally, it results in a wiring that doesn't work (where a clean forces a full new rewiring that does actually work)"

So... if that is a good description of why things are going wrong here, then why not make it so that installing extra bundles automatically forces a full rewiring on the next restart? I.e. just make it work as if the '-clean' flag were added automatically on the next restart.
(In reply to Kris De Volder from comment #8)
> Jumping the gun a bit in making the assumption that we agree there is a bug
> here :-)
>
> > "OSGi remembers (caches) older wirings and when newer wirings are added incrementally, it results in a wiring that doesn't work (where a clean forces a full new rewiring that does actually work)"
>
> So... if that is a good description of why things are going wrong here...
> then why not make it so that installing extra bundles should automatically
> force a full rewiring on the next restart? I.e. just make it work as if the
> '-clean' flag is added automatically on the next restart.

Yes, I wondered the same thing.
(In reply to Ed Merks from comment #9)
> (In reply to Kris De Volder from comment #8)
> > Jumping the gun a bit in making the assumption that we agree there is a bug
> > here :-)
> >
> > > "OSGi remembers (caches) older wirings and when newer wirings are added incrementally, it results in a wiring that doesn't work (where a clean forces a full new rewiring that does actually work)"
> >
> > So... if that is a good description of why things are going wrong here...
> > then why not make it so that installing extra bundles should automatically
> > force a full rewiring on the next restart? I.e. just make it work as if the
> > '-clean' flag is added automatically on the next restart.
>
> Yes, I wondered the same thing.

It is true that the framework caches resolution from previous runs of Eclipse. This does mean that the following can happen:

(A+B)+C != (A+B+C)

This happens because the decisions made to resolve (A+B) get locked in, which may prevent the proper decisions from being made to resolve C.

The p2 simpleconfigurator could be updated to tell the framework, when new bundles are installed, to simply forget about the resolution wirings for all bundles except the framework itself and simpleconfigurator. I can provide a gerrit review to consider such a solution if we have someone willing to test that out in their scenario.
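To make the locked-in effect concrete, here is a minimal toy sketch in Java. It is only an illustration under invented assumptions and does not reflect how the Equinox resolver is implemented: the bundles A, B, C, the providers X1, X2, Y1, and the "same provider" rule (a crude stand-in for a package-uses constraint) are all hypothetical. Wirings passed in as `locked` are never revisited, which models the cache from a previous launch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model only; it does not reflect how the Equinox resolver is implemented. */
class ToyWiring {

    // Hypothetical data: each bundle must be wired to exactly one candidate provider.
    static final Map<String, List<String>> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("A", Arrays.asList("X1", "X2"));
        CANDIDATES.put("B", Arrays.asList("Y1"));
        CANDIDATES.put("C", Arrays.asList("X2"));
    }

    // Stand-in for a package-uses constraint: A and C must pick the same provider.
    static final List<String[]> SAME_PROVIDER =
            Collections.singletonList(new String[] {"A", "C"});

    /** Backtracking search; entries in 'locked' are kept untouched (the cached wirings). */
    static Optional<Map<String, String>> resolve(List<String> bundles, Map<String, String> locked) {
        Map<String, String> wiring = new LinkedHashMap<>(locked);
        List<String> open = new ArrayList<>(bundles);
        open.removeAll(locked.keySet());
        if (search(open, 0, wiring)) {
            return Optional.of(wiring);
        }
        return Optional.<Map<String, String>>empty();
    }

    private static boolean search(List<String> open, int index, Map<String, String> wiring) {
        if (index == open.size()) {
            return true; // every open bundle has a provider
        }
        String bundle = open.get(index);
        for (String provider : CANDIDATES.get(bundle)) {
            wiring.put(bundle, provider);
            if (consistent(wiring) && search(open, index + 1, wiring)) {
                return true;
            }
            wiring.remove(bundle); // undo and try the next candidate
        }
        return false;
    }

    private static boolean consistent(Map<String, String> wiring) {
        for (String[] pair : SAME_PROVIDER) {
            String first = wiring.get(pair[0]);
            String second = wiring.get(pair[1]);
            if (first != null && second != null && !first.equals(second)) {
                return false;
            }
        }
        return true;
    }

    /** (A + B) + C: resolve A and B, lock the result in, then add C. */
    static Optional<Map<String, String>> incremental() {
        Map<String, String> cached =
                resolve(Arrays.asList("A", "B"), new LinkedHashMap<String, String>()).get();
        return resolve(Arrays.asList("A", "B", "C"), cached);
    }

    /** (A + B + C): resolve everything together, as after -clean. */
    static Optional<Map<String, String>> clean() {
        return resolve(Arrays.asList("A", "B", "C"), new LinkedHashMap<String, String>());
    }

    public static void main(String[] args) {
        System.out.println("incremental: " + incremental());
        System.out.println("clean:       " + clean());
    }
}
```

In this toy, the first pass wires A to X1 because nothing argues against it yet. Once that choice is frozen, C (which can only use X2) has no consistent option, whereas resolving all three together backtracks and wires A to X2.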
I'm certainly willing to test it with our scenario. The trouble is that I'm not totally sure how I would test it. The problem is easy to reproduce with STS's release, but to see if a fix is effective, I guess we'd need to figure out some way to create a version of STS's build that is pretty much the same, except with the 'fixed' version of 'p2 / equinox'. I don't know if this is possible.

On the other hand, if we try to reproduce the problem with a newer snapshot build of the Eclipse platform, I think you can no longer even run that with JDK 8, and so the problem wouldn't be reproducible anymore even without the fix (since folks have noticed the issue doesn't affect newer JDKs).

So, yes, I'm willing to test this, but I may need some creative input in figuring out how to do that.
The crux is: if a fix is available / installable into Eclipse 4.16, testing shouldn't be a problem. If the fix is only available for e 4.17, then I don't know how/if I can test it.
(In reply to Kris De Volder from comment #12)
> The crux is, if a fix is availabe / installable into Eclipse 4.16, testing
> shouldn't be a problem, if fix is only available for e 4.17, then I don't
> know how/if I can test it.

At a minimum you could simply replace the simpleconfigurator JAR in your install with the one containing the fix. You would have to do that replacement from the very start, right after you unzipped/installed the base of your test install scenario.

If this is reproducible on Mac I could try it out myself.
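The JAR swap described above could be scripted roughly as follows. This is only a sketch using plain java.nio; the class and method names are made up for illustration, and it assumes that the bundle sits in the install's plugins directory as a single file named `<symbolic name>_<version>.jar` and that the install keeps referring to it by that file name, so the patched JAR is written under the existing name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not part of Eclipse or p2. */
class BundleJarSwap {

    /**
     * Overwrites the single JAR in pluginsDir whose name starts with
     * symbolicName + "_" with the contents of patchedJar. The original file
     * name is kept and a copy of the original is stored in backupDir.
     */
    static Path replaceBundleJar(Path pluginsDir, String symbolicName,
                                 Path patchedJar, Path backupDir) throws IOException {
        Path existing = null;
        try (DirectoryStream<Path> jars =
                Files.newDirectoryStream(pluginsDir, symbolicName + "_*.jar")) {
            for (Path jar : jars) {
                if (existing != null) {
                    throw new IOException("More than one JAR found for " + symbolicName);
                }
                existing = jar;
            }
        }
        if (existing == null) {
            throw new IOException("No JAR found for " + symbolicName + " in " + pluginsDir);
        }
        // Keep the backup outside the plugins directory so it is not picked up as a bundle.
        Files.createDirectories(backupDir);
        Files.copy(existing, backupDir.resolve(existing.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        Files.copy(patchedJar, existing, StandardCopyOption.REPLACE_EXISTING);
        return existing;
    }
}
```

For this bug the symbolic name would be org.eclipse.equinox.simpleconfigurator, and the swap has to happen on a fresh install before its first launch, as noted above.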
New Gerrit change created: https://git.eclipse.org/r/c/equinox/rt.equinox.p2/+/166216
I was able to reproduce the problem, and I confirmed that replacing the simpleconfigurator bundle with the proposed fix does solve it.

I noted that in the reproducing scenario (on Mac) the restart after installing 'Papyrus for UML' had a long delay while the resolver tried to work out a valid solution. I did not time it exactly, but it seemed to take more than 30 seconds while the CPU worked overtime. With the proposed fix the restart was much faster, even though it re-resolves all 1000+ bundles instead of trying to do the incremental resolution.

There may be some concern about the performance of such a solution in scenarios where the initial set of bundles is very large (2000+ bundles), because it will force a resolution of all bundles each time a new set of bundles is installed. I argue this only makes the behavior of an incremental install consistent with that of an initial start with the full set of bundles.
> There may be some concern about performance of such a solution in some
> scenarios where the initial set of bundles is very high (2000+ bundles) because
> it will force a resolution of all bundles each time a new set of bundles is
> installed.

I would argue that:

1. it is worth it to get consistent behavior rather than behavior that depends on installation order.
2. it is not clear that incremental resolve is actually faster.

For point 2, as you probably also noticed in the reproduction scenario, the incremental resolve takes much longer and sends the CPU into overdrive (for me, my fans start blazing like mad).

So it seems that, for whatever reason, the 'difficulty' of the SAT problem being solved under the hood doesn't necessarily scale in a simplistic/intuitive way with just the number of bundles. Maybe the fact that some bundle resolutions are already fixed ahead of time just increases the number of constraints on the SAT problem? And maybe that actually makes it harder and more time-consuming to solve?
The suggested changes seem like good ones to me. It does generally seem better to me that wiring not depend on history but rather only the overall set of available bundles.
(In reply to Kris De Volder from comment #16)
> So it seems that, for whatever reason, the 'difficulty' of the SAT problem
> being solved under the hood doesn't necessarily scale in a
> simplistic/intuitive way just with the number of bundles. Maybe the fact
> that some bundle resolutions are already fixed ahead of time just increases
> the number of constraints on the SAT problem? And maybe that actually makes
> it harder and more time consuming to solve?

To clarify, p2 uses the SAT-based resolver for resolution during provisioning operations. This resolution is done before any bundles are installed into the actual framework. AFAIK this resolution step has no incremental behavior; it always does a full resolution.

For the issue here, we are talking about the bundle resolution done at runtime, after p2 has installed all the bundles, by the Equinox OSGi framework itself. The Equinox OSGi framework does not use SAT4J in its resolver implementation. The framework always has a current set of wirings for the existing installed bundles, and it then tries to resolve (incrementally) the newly installed set of bundles on top of the existing wirings. My proposed fix forces the runtime OSGi framework to throw out the existing wirings for all the previously installed/resolved bundles so they can be re-resolved together with all the newly installed bundles.

Side note: some thought has been given to having the framework use a SAT4J resolver for runtime bundle resolution, but there are complications in mapping the package uses directive construct into the boolean expressions needed by the SAT4J resolver. It may be possible, but it is far from being a straightforward task and nobody has spent enough time to get it to work. This does imply that the p2 resolution step totally ignores all the package uses constraints that the runtime actually enforces.
(In reply to Ed Merks from comment #17)
> The suggested changes seem like good ones to me. It does generally seem
> better to me that wiring not depend on history but rather only the overall
> set of available bundles.

I agree here. Tom, please land it.
(In reply to Alexander Kurtakov from comment #19)
> (In reply to Ed Merks from comment #17)
> > The suggested changes seem like good ones to me. It does generally seem
> > better to me that wiring not depend on history but rather only the overall
> > set of available bundles.
>
> I agree here.Tom, please land it in.

I had to modify the solution a bit to make sure the existing behavior is preserved if anyone kicks p2 to do a dynamic update while the platform is running (instead of at startup, which is the default), for example by calling the org.eclipse.equinox.internal.provisional.configurator.Configurator.applyConfiguration methods. The UI for p2 no longer gives the option to apply changes while the platform is running, but the service still exists to do this. Forcing a refresh of all bundles while the platform is running is doomed to cause the system to fail pretty badly. It is safer to keep the old behavior in that scenario in case anyone is using it to dynamically update a running RCP application.
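As a rough illustration of the rule described above (not the code from the Gerrit change), the decision about which cached wirings to discard could be modelled like this. The class and method names are hypothetical, bundles are represented by their symbolic names only, and returning an empty set is my shorthand for "keep the old incremental behavior".

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the policy; not the actual simpleconfigurator code. */
class RefreshPolicy {

    static final String FRAMEWORK = "org.eclipse.osgi";
    static final String SIMPLE_CONFIGURATOR = "org.eclipse.equinox.simpleconfigurator";

    /**
     * Returns the bundles whose cached wirings should be thrown away.
     * An empty result means: leave existing wirings alone and resolve
     * any new bundles incrementally, as before the fix.
     */
    static Set<String> bundlesToRefresh(Collection<String> alreadyInstalled,
                                        Collection<String> newlyInstalled,
                                        boolean atStartup) {
        Set<String> refresh = new LinkedHashSet<>();
        if (newlyInstalled.isEmpty() || !atStartup) {
            // Nothing new, or a dynamic update of a running platform:
            // refreshing everything here would be too disruptive.
            return refresh;
        }
        for (String name : alreadyInstalled) {
            // The framework and simpleconfigurator keep their wirings.
            if (!name.equals(FRAMEWORK) && !name.equals(SIMPLE_CONFIGURATOR)) {
                refresh.add(name);
            }
        }
        return refresh;
    }
}
```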
Gerrit change https://git.eclipse.org/r/c/equinox/rt.equinox.p2/+/166216 was merged to [master]. Commit: http://git.eclipse.org/c/equinox/rt.equinox.p2.git/commit/?id=907ec00a829e2fe03f156089775c70ddac834d6e
*** Bug 564475 has been marked as a duplicate of this bug. ***