| Summary: | [planner] P2 does not pick up higher version of already installed plug-in from dropins | ||
|---|---|---|---|
| Product: | [Eclipse Project] Equinox | Reporter: | Natalia Bartol <natalia.bartol> |
| Component: | p2 | Assignee: | Daniel Le Berre <leberre> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | ||
| Priority: | P3 | CC: | aniefer, dj.houghton, john.arthorne, leberre, martin.skorsky, pascal |
| Version: | unspecified | ||
| Target Milestone: | 3.6 M6 | ||
| Hardware: | PC | ||
| OS: | Windows XP | ||
| See Also: | https://bugs.eclipse.org/bugs/show_bug.cgi?id=301446 | ||
| Whiteboard: | |||
| Bug Depends on: | |||
| Bug Blocks: | 303202, 303203 | ||
| Attachments: | |||
|
Description
Natalia Bartol
Created attachment 158853 [details]
NewerVersionOfPluginNotPickedUpTestCase
Test case for simple "Hello world" plug-ins, for 3_4_maintenance branch.
.profile includes <unit id='com.dcns.rsm.hello' version='1.0.0.v20100101'>
In repository are:
<unit id='com.dcns.rsm.hello' version='1.0.0.v20100101'>
<unit id='com.dcns.rsm.hello' version='1.0.1.v20100102'>
<unit id='com.dcns.rsm.hello' version='1.0.2.v20100103'>
Provisioning plan returned for this set of plug-ins is empty - no update will be performed.
Apart from the different bundle names, this looks the same as bug 301446. To be clear it's not as simple as the description in comment #0. The problem happens in a system that already has several thousand optionally installed bundles. (In reply to comment #2) > Apart from the different bundle names, this looks the same as bug 301446. To be > clear it's not as simple as the description in comment #0. The problem happens > in a system that already has several thousand optionally installed bundles. The difference is, that test case for bug 301446 passes for 3_5_maintenance branch classes. This test case does not. And this test case is just with one "hello world" plug-in in three versions, so it is much simpler than in bug 301446. I hope this will be helpful during analysis. Could you also please check the behavior on 3.6 and report. I understand that 3.4 is what you have to maintain, but 3.6 is also what the next generation of products will be shipping on and having these problems fixed before it ships is as important. Maybe this is just coincidence (due to Bug 301893) but behavior seems to differ if bundle is installed via dropins from scratch (bug 301446) or if update should be performed. Created attachment 158873 [details]
3.6 test case
fyi the HelloWorld test case passes when run against the current code from 3.6.
Created attachment 158883 [details] Bug302582.java Test case passes but only because there is one line missing: assertFalse("Nothing in plan to add.", expectedOperands.length == 0); The returned plan is empty for me in 3.6. So generally test fails. I am attaching corrected test case class. During repeating tests with 3.4.2 once I got result: The plan: [R]com.dcns.rsm.hello 1.0.0.v20100101 will be replaced with [R]com.dcns.rsm.hello 1.0.2.v20100103 So test passed. So we have the same situation here as in bug 301446. Plan returned from SAT4J solver is inconsistent. > During repeating tests with 3.4.2 once I got result:
> The plan:
> [R]com.dcns.rsm.hello 1.0.0.v20100101 will be replaced with
> [R]com.dcns.rsm.hello 1.0.2.v20100103
> So test passed.
>
> So we have the same situation here as in bug 301446. Plan returned from SAT4J
> solver is inconsistent.
I've realised that this one test passed only because I removed
<unit id='com.dcns.rsm.hello' version='1.0.0.v20100101'>
from repo.
Maybe this could be a hint ;>
Created attachment 158894 [details] Bug302582.java There was an error in comparing iu.getVersion() and org.osgi.framework.Version - some differences between 3.4 and 3.6 branches. This is why DJ's test case passed - profile change request was empty. I'm attaching test case class once more. Now this works as should. Yep, I realized that when debugging... the change request being generated was empty due to the change from Version to OSGIVersion. I have released the test case to HEAD with modifications. I confirm that the test fail on 3.6. I have the feeling that the test case is going against the behavior that we were asked for in Bug 259537 and that has been released in 3.6M5. (In reply to comment #12) > I have the feeling that the test case is going against the behavior that we > were asked for in Bug 259537 and that has been released in 3.6M5. No, we're not consistent here currently. We have an automated test (BasicTests#testSingleton) that does the following: 1) Add version 1.0 to dropins 2) Reconcile -> v 1.0 is resolved 3) Add version 2.0 to dropins 4) Reconcile -> v 2.0 is resolved So in a very simple test case this is working as expected (in 3.6 M5 and latest HEAD). However in the attached test where there are a large number of other completely unrelated optional bundles, this same sequence fails to find the higher version. Either we're not getting consistent solutions here, or perhaps SAT4J is timing out before it finds the optimal solution. >or perhaps SAT4J is timing out before it finds the optimal solution.
This is most likely the case, as when I tried it on my old mac I had to set the time out on conflicts to 20000 (see Projector#encode)
(In reply to comment #13) > (In reply to comment #12) > > I have the feeling that the test case is going against the behavior that we > > were asked for in Bug 259537 and that has been released in 3.6M5. > > No, we're not consistent here currently. We have an automated test > (BasicTests#testSingleton) that does the following: > > 1) Add version 1.0 to dropins > 2) Reconcile -> v 1.0 is resolved > 3) Add version 2.0 to dropins > 4) Reconcile -> v 2.0 is resolved > > So in a very simple test case this is working as expected (in 3.6 M5 and latest > HEAD). However in the attached test where there are a large number of other > completely unrelated optional bundles, this same sequence fails to find the > higher version. Either we're not getting consistent solutions here, or perhaps > SAT4J is timing out before it finds the optimal solution. BasicTests.testSingleton is currently failing for me in my workspace. Just to be clear, I talked to Andrew and he is running against an old build from the middle of January. The tests in question pass in the latest integration build. (In reply to comment #15) > BasicTests.testSingleton is currently failing for me in my workspace. Sorry, false alarm, I was running an older build. The test is passing for me. (In reply to comment #14) > This is most likely the case, as when I tried it on my old mac I had to set > the time out on conflicts to 20000 (see Projector#encode) Actually this one doesn't appear to be timing out. We tried setting the timeout to 50000 and got the same result, and it didn't take any longer so I think it's not hitting a limit like that. I suspect Daniel is right that this is related to the recent change to make the install more stable, but I don't understand why the simple case doesn't behave like the complex case. John, When you have many packages, chances that the weight of a criteria compensates another criteria is greater. How can I run BasicTests#testSingleton? I got the following error: junit.framework.AssertionFailedError: Test suite not initialized, check log for previous errors. The reconciler tests are different in that they don't test what you have in your workspace, but what is in a zip file. You need to download a platform zip from the downloads page and then set the:
org.eclipse.equinox.p2.reconciler.tests.platform.archive
System property to point to it.
The test suite then unzips the archive and runs the code in it for the test.
I usually use nightly builds since 1) they contain the latest code and 2) all the qualifiers are the same so it is easier to replace bundles with updates.
If you have updates or changes that you want to make and test, you have to replace the bundles in the zip file.
Also if you want to remote debug the reconciliation, take a look at AbstractReconcilerTest#reconcile. You will have to change the command-line to be something like:
String[] command = new String[] {(new File(output, "eclipse/eclipse")).getAbsolutePath(), "--launcher.suppressErrors", "-nosplash", "-application", "org.eclipse.equinox.p2.reconciler.application", "-vm", exe.getAbsolutePath(), "-vmArgs", "-Dosgi.checkConfiguration=true", "-Xdebug", "-Xnoagent", "-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000"};
If you run the specific tests for bug 302582 and bug 302580 then it runs the code in your workspace and you can debug it right from there without jumping through hoops.
Created attachment 159062 [details]
Objective function taking into account already installed plugins
DJ,
I updated the objective function in this patch to take into account the already installed plugins.
That way, we should not see any difference on examples with few or many already installed plugins.
Pascal, could you take a look at it?
That patch is not meant to solve the current issue, but to have a consistent behavior across all type of profiles.
I confirm that with this patch all the existing tests pass. I'm attaching another patch improving how we count for what was installed. Created attachment 159083 [details]
Improved version of the patch
Feel free to release Daniel. (In reply to comment #22) > I confirm that with this patch all the existing tests pass. I'm attaching > another patch improving how we count for what was installed. I've tested posted patches with 3.6 test case and unfortunately test fails. Natalia, as already mentioned, that patch does not aim at solving your issue, but to have the same behavior with a huge installed codebase or a new system. your test case cannot pass with current objective function in 3.6. To sort this out, we may need to have a special policy for IUs available in dropins. DJ, Pascal, Andrew, should we consider that in contrast with what is available on update sites, bundles available in dropins should always be updated to the latest version? I used to update Eclipse like that in the old days (started with Eclipse 1.0 :)), so I can imagine that it makes sense to have a different policy for update sites and dropins folder. For instance, we know that the plugins are already there, that they have been dropped in on purpose, etc. (In reply to comment #26) > DJ, Pascal, Andrew, should we consider that in contrast with what is available > on update sites, bundles available in dropins should always be updated to the > latest version? Yes, I think this makes sense to have a different policy for dropins. People who use this are not looking for stability, they are always expecting the new version to be found (these are "trendy" rather than "paranoid" users). I tried a quick test on the weekend to hack this by adding the following in Projector#isInstalled: if (iu.getId().equals("com.dcns.rsm.hello")) return false; I.e., always pretend this bundle is not installed. I also removed this entry from the alreadyInstalledIUs field in the Projector#encode method. However even with these changes the test still failed, which suggests it's not the fact that version 1.0.0 is already installed that is causing the problem (not the same failure occurred in 3.6 M4 as well prior to the recent projector change for M5). DJ is working on a simpler test case - one with a profile that fails to upgrade, and one with a profile that succeeds to upgrade - to help isolate the problem further. Created attachment 159219 [details]
user friendly version of the problem we try to solve
Here is a textual version of the problem we try to solve in the 3.6 use case.
One thing that I do not understand is why the 1.0.2 plugin is considered as "already installed" (weight one).
Else the objective function and the constraints look ok.
ok, I know what is the problem (I have no solution yet). The 1.0.0 plugin got twice a negative weight, so it becomes the best option because of that. I need further investigations to understand why it gets that weight twice. So it is neither an encoding (constraints) problem nor a SAT4J (e.g. timeout, bug, etc) problem: it is an optimization function problem. I have released a simple test case (Bug302582c) which contains only a couple of IUs which might help you. The problem is coming from here: (near line 264 in Projector.createOptimizationFunction()) Collection<IRequirement> reqs = metaIu.getRequiredCapabilities(); reqs contains several times the same IU (in bug302582c, contains twice aaa 1.0.1). Pascal, John, DJ, is it normal? Yes, I was debugging and came to the same conclusion. I believe it is in/around SimplePlanner#updatePlannerInfo. Pascal are you familiar with this code? Created attachment 159259 [details]
Fix for this bug
ok, I got a simple fix for that bug.
I guess it should not be a problem to backport it to 3.4.
I let Pascal check it and commit it.
I have verified the tests also pass if we change the gatheredRequirements to a Set. (equivalent change) I'll let you guys decide which is better since you are more familiar with the code. Thanks for looking into this. The fix with a set is nicer. I tried that first, but stopped when I saw it was incompatible with one method signature. I did not pay attention that the modification propagation would have stopped there. I released that fix in head. DJ, I also updated the objective function to take into account already installed IUs to compute the weights (I used Pascal's improved version). Ok, now I'm confused. Has any of this been released...? On the code that went into the build v20100215, I have applied Daniel's patch https://bugs.eclipse.org/bugs/attachment.cgi?id=159259 and the patch https://bugs.eclipse.org/bugs/attachment.cgi?id=159083 and added to the AllTest class a bunch of new tests (Bug302582*) and all the tests passed However in HEAD, there are changes that make both patches *no longer* applicable and also cause the test Bug302582d (formerly Bug300104bis) to fail. So now that we have something that works and the patches look good, can someone please release everything that is necessary so we can move on :) (In reply to comment #37) > Ok, now I'm confused. Has any of this been released...? > On the code that went into the build v20100215, I have applied Daniel's patch > https://bugs.eclipse.org/bugs/attachment.cgi?id=159259 and the patch > https://bugs.eclipse.org/bugs/attachment.cgi?id=159083 and added to the AllTest > class a bunch of new tests (Bug302582*) and all the tests passed > > However in HEAD, there are changes that make both patches *no longer* > applicable and also cause the test Bug302582d (formerly Bug300104bis) to fail. > > So now that we have something that works and the patches look good, can someone > please release everything that is necessary so we can move on :) Yes, I released everything (the change in the objective function and the fix for that bug) (see my previous comments). I am checking Bug300104bis. Pascal, It is the change in SimplePlanner (use of a set instead of a List) that makes the test fail. I cannot dig into it right now. When I revert to lists and add the last patch Daniel created all the tests pass. I have done this change in HEAD. Please review. hum, so it probably means that here, the order matters. The last patch posted by Daniel works great with 3.6 (It is still not released to HEAD, though). I applied this patch to 3.5. Test with simple "Hello world" plugin passes, however test with more plugins (the initial one, case1 for Bug300104bis in cvs) fails with result: The plan: [R]com.dcns.rsm.profile.equipment 1.0.3.v20090513 will be installed (should be 1.2.2.v20100108) [R]com.dcns.rsm.profile.gemo 3.7.2.v20100108 will be installed [R]com.dcns.rsm.profile.system 4.2.2.v20100112 will be installed [R]com.dcns.rsm.rda 5.1.0.v20100106 will be installed Before running test, I applied also Pascal's patch 302582.patch in 3.5, with little changes. Maybe it is not applied correctly - there are differences in parameters passed to encoding between 3.5 and 3.6. Pascal, could you prepare backport of this patch for 3.5 and 3.4 ? Can this be the reason why tests fail? Applying Daniel's patch to 3.4 does not seem to have any effect at all. Test for "Hello world" plugin still fails. Natalia, The only change that is required in 3.6 to fix this bug is my last patch to be applied on SimplePlanner. No changes are required on Projector class. (In reply to comment #42) > Pascal, could you prepare backport of this patch for 3.5 and 3.4 ? DJ and I will work on backporting the fix to 3.5 and 3.4. Officially the Equinox project no longer does any maintenance builds for these older releases so this is purely an IBM support problem. Daniel/Pascal: thank you for all your work on solving this! Daniel, Pascal - Thank you! Patches have been released to HEAD. We believe the test failure for Bug300104bis is covered by the scenario in bug 301446. Thanks to everyone for looking into this. Closing. I don't think we want to completely close that yet, because the set vs list thing is intriguing. Assigning to Daniel for investigation on that. Pascal, the 302582d test data seems to try to install multiple versions of the dcns bundles and *none* are already installed in the profile. Is that the right scenario? I thought that the old versions should be already existing in the profile, that way we would use the "favor what is already installed" code. DJ, If the test fails, then it is a good scenario :) It should pass, just like the others. Let's try to solve it today. I won't go to bed before fixing it. Got it. The problem is the way SAT4J guesses which value to assign to a specific literal when a literal appears several times in the objective function. When using a list, the order is preserve, and it is compatible with the best choice. When using a list, the order is not preserve, and the solver makes the wrong decision, thus takes ages to find the best solution. I fixed that in SAT4J: I merge all the coefficients in the objective function in DependencyHelper. That way, the orders of the IUs does not matter. Created attachment 159494 [details]
updated version of SAT4J core (2.2.0v20100217)
Created attachment 159495 [details]
updated version of SAT4J pb (2.2.0v20100217)
Note that the merging problem should not appear in 3.4 and 3.5. It has been introduced in 3.6, because of the weight of 1 assigned to already installed plugins. Since the installed IU from the dropins folder are considered as optional, they also get a negative weight to force the solver to select them. So for a same IU a, the objective function looks like 1 a + ... + -xxxxxxx a The solver should try to satisfy it (because the weight is negative). This is what happens in SAT4J because the last weight is negative. However, if the objective function appears like -xxxxxxx a + .... + 1 a then the solver will try to falsify a, which is a bad idea. To prevent that, I make sure to sum up the weight before calling the solver. That's the reason of the different behavior between Set and List, and why that problem should not affect the previous releases (no need IMHO to update SAT4J in 3.4 or 3.5). My bad. It might be the cause of the problems in 3.4 and 3.5 as well: the weight of the versions (positive) and the optionality (negative) also need to be summed up. The updated version of SAT4J will fix that issue in 3.5, because it calls directly SAT4J API. It won't work for 3.4, that calls SAT4J as a black box. Just let me know if the updated version of SAT4J fixes the problem in 3.5. Daniel, did you try changing the list to a set to confirm that the test is good. I think DJ is concerned about the validity of the test and I just quickly looked at it and I think I agree. Natalia, could you please go through all the test cases that are in HEAD and related to this bug report (named Bug302582*) and verify that they are testing something relevant. Thx. Yes, I did. If you increase a lot the timeout, the current solver will eventually find the expected solution. (In reply to comment #54) > My bad. > > It might be the cause of the problems in 3.4 and 3.5 as well: the weight of the > versions (positive) and the optionality (negative) also need to be summed up. > > The updated version of SAT4J will fix that issue in 3.5, because it calls > directly SAT4J API. It won't work for 3.4, that calls SAT4J as a black box. > > Just let me know if the updated version of SAT4J fixes the problem in 3.5. Daniel, New version of SAT4J fixes issue in 3.5. It works visibly faster than previous one. (In reply to comment #50) > I fixed that in SAT4J: I merge all the coefficients in the objective function > in DependencyHelper. That way, the orders of the IUs does not matter. If order of the IUs does not matter any more, can bug 301893 be closed? (In reply to comment #55) > Natalia, could you please go through all the test cases that are in HEAD and > related to this bug report (named Bug302582*) and verify that they are testing > something relevant. All test cases are relevant. Maybe test case Bug302582d could be assigned to bug 301446, but I think the problem is caused rather by IUs order and duplicated IUs entries that are already fixed, than timeout. I committed the changes to SimplePlanner using a Set instead of a List. The 302582d test will fail until we upgrade to SAT4J 2.2.0. I believe we're using SAT4J 2.2.0 now in 3.6 M6. Does this mean we are all good here and this can be closed? Yes. Closing. |