| Summary: | Provide ability to adjust default projector timeout value | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Equinox | Reporter: | DJ Houghton <dj.houghton> | ||||||||
| Component: | p2 | Assignee: | DJ Houghton <dj.houghton> | ||||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||||
| Severity: | normal | ||||||||||
| Priority: | P3 | CC: | krzysztof.daniel, leberre, pascal | ||||||||
| Version: | 3.7 | ||||||||||
| Target Milestone: | 3.7 M6 | ||||||||||
| Hardware: | PC | ||||||||||
| OS: | Mac OS X - Carbon (unsup.) | ||||||||||
| Whiteboard: | |||||||||||
| Bug Depends on: | |||||||||||
| Bug Blocks: | 336968, 363963 | ||||||||||
Description
DJ Houghton
Yes, we can do this. It is not a big deal. It would be nice to collect those use cases in order to see if we can speed up the resolution process.

Same as Daniel. I'm all for adding the new support, but we also need a system that gathers the information so we can more easily recreate the problem. Note that I suspect this is a dropins-style install, where a complete product is installed through dropins, which means that the search space is much less restricted than in typical p2 use cases.

Yes, this is another case where everything (~3000 plug-ins) is installed via the dropins mechanism, so everything is considered optional, etc. I've collected the data (the profile registry before and after the execution, and the content.xml of the IUs we are trying to install) and will try to put together a test case.

Created attachment 188812
patch
Patch. Only sets the timeout to the user-specified value if it is a positive integer larger than the default (currently 1000).
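For reference, a minimal Java sketch of the rule the patch describes: the user-supplied value is used only if it parses as an integer larger than the default of 1000; otherwise the default is kept. The class, method, and system property names (`ProjectorTimeout`, `resolveTimeout`, `example.p2.projector.timeout`) are hypothetical and are not taken from the patch itself.

```java
/**
 * Sketch of the rule in the patch description: the user-specified timeout is
 * used only if it is a positive integer larger than the default.
 * All names here, including the system property, are hypothetical.
 */
final class ProjectorTimeout {
    // Default conflict-based timeout mentioned in this bug.
    static final int DEFAULT_TIMEOUT = 1000;

    // HYPOTHETICAL property name, for illustration only.
    static final String TIMEOUT_PROPERTY = "example.p2.projector.timeout";

    /** Returns the parsed value if it is larger than the default, else the default. */
    static int resolveTimeout(String userValue) {
        if (userValue == null) {
            return DEFAULT_TIMEOUT;
        }
        try {
            int value = Integer.parseInt(userValue.trim());
            // Any value above 1000 is necessarily positive, so one check covers both conditions.
            return value > DEFAULT_TIMEOUT ? value : DEFAULT_TIMEOUT;
        } catch (NumberFormatException e) {
            // Non-numeric input is ignored and the default is kept.
            return DEFAULT_TIMEOUT;
        }
    }

    /** Reads the (hypothetical) system property and applies the same rule. */
    static int resolveTimeout() {
        return resolveTimeout(System.getProperty(TIMEOUT_PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println(resolveTimeout("10000")); // 10000
        System.out.println(resolveTimeout("500"));   // 1000
        System.out.println(resolveTimeout());        // 1000 unless the property is set above 1000
    }
}
```

Values at or below the default, negative values, and non-numeric strings all fall back to 1000, so a user can only raise the limit, never lower it.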
Patch released to HEAD.

I had a good chat with Pascal (thanks!) and he explained how this value is really used. I'll paste some information here so we have it recorded for others to read and reference.

-------------

Rather than referring to a 1-second timeout, a "1000 timeout" refers to the number of times to retry when there are conflicts. So when the value is 1000, the solver tries to find an optimal solution, and if it doesn't find one by the end of 1000 tries, it returns the best one it has found so far. Increasing the number to 10,000 means it will try to find a solution 10,000 times at most.

This was done so that it produces consistent results across multiple machines. If the timeout value were a real (wall-clock) timeout, the result for the same call on multiple machines would be highly dependent on the processor, etc., and would most likely differ in cases where there are a lot of conflicts.

Another subtle aspect of the "timeout on conflict" value is that it is reset each time a better solution is found. So it really means:
- found a solution
- try up to 1000 times to find a better one
- found a better one
- count is reset to 0
- try up to 1000 times to find a better one

This should explain (because I know you are all asking) why it takes longer than 1 second (or 10 seconds) to present the solution to the installation problem when trying to install new software. Each attempt to find a solution can take an arbitrary amount of time, so it is hard to predict how much longer installs will take if you increase this value too much.

In the general non-dropins case it shouldn't matter much, because everything is installed via the UI or API calls and the dependencies and requirements are considered strict.
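The reset-on-improvement rule in the list above can be modelled in a few lines of Java. This is a toy model, assuming each attempt corresponds to one conflict and yields a candidate solution cost (lower is better); it is not SAT4J code, and the names are hypothetical. Later in this thread the reset behaviour is called into question, so treat it only as an illustration of the rule as described here.

```java
import java.util.List;

/**
 * Toy model, not SAT4J code: an optimization loop whose stopping rule counts
 * attempts (conflicts) instead of elapsed time, and resets the count whenever
 * a strictly better solution is found.
 */
final class ConflictBudgetModel {

    /** Best cost found (lower is better) and number of attempts consumed. */
    static final class Result {
        final int bestCost;
        final int attempts;

        Result(int bestCost, int attempts) {
            this.bestCost = bestCost;
            this.attempts = attempts;
        }
    }

    /** Each element of candidateCosts stands for one attempt yielding a solution with that cost. */
    static Result optimize(List<Integer> candidateCosts, int budget) {
        int best = Integer.MAX_VALUE;
        int sinceImprovement = 0; // attempts since the last better solution
        int attempts = 0;
        for (int cost : candidateCosts) {
            attempts++;
            if (cost < best) {
                best = cost;          // found a better solution...
                sinceImprovement = 0; // ...so the count is reset to 0
            } else if (++sinceImprovement >= budget) {
                break;                // budget used up: keep the best found so far
            }
        }
        return new Result(best, attempts);
    }

    public static void main(String[] args) {
        List<Integer> costs = List.of(10, 9, 9, 9, 8, 8, 8, 8);
        Result r = optimize(costs, 3);
        System.out.println(r.bestCost + " after " + r.attempts + " attempts"); // 8 after 8 attempts
    }
}
```

Because the rule counts attempts rather than seconds, the same input always gives the same answer regardless of processor speed. With a budget of 2 instead of 3, the same input stops after 4 attempts with cost 9, which is why raising the value can improve the result at the price of more work.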
The way things are installed through the dropins, everything is installed optionally, so when we try to compute what needs to be installed, everything (including all 3500 previously installed bundles) is considered optional and we try to recalculate the best solution. That is why we hit so many conflicts and why it takes so many tries to get the optimal solution.

This should not be the case. If it is indeed the case, then it is a bug in SAT4J. There is a notion of grouped calls to the solver in which the timeout should not be reset. It is true that I usually set the timeout on time, not on conflicts. I will check that ASAP.

Daniel, do not worry. I did not check the SAT4J code. I was telling DJ about the restart behaviour from memory, and it seems that I misled him. Apologies to you both.

I opened the following bug for SAT4J: http://jira.ow2.org/browse/SAT-5

I noticed that there is a possible issue when the timeout in seconds is reached between two calls to the isSatisfiable() method. I need to investigate further to see whether it can also happen with a conflict-based timeout.

Created attachment 190468
New version of sat4j core with a fix for the timeout during optimization
Created attachment 190469
New version of sat4j pb with a fix for the timeout during optimization
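To make the "grouped calls" point concrete, here is a toy Java model (not the SAT4J API; all names are hypothetical) of one optimization made up of several successive solver calls, such as repeated isSatisfiable() invocations. With one conflict budget shared by the whole group, total work is capped by the budget; if the budget restarts for every call, the total can grow well past it.

```java
/**
 * Toy model, not the SAT4J API: one optimization is a sequence of solver calls,
 * each needing some number of conflicts. Returns the total conflicts spent.
 */
final class GroupedCallBudget {

    static int totalConflicts(int[] conflictsPerCall, int budget, boolean sharedAcrossCalls) {
        int total = 0;
        for (int needed : conflictsPerCall) {
            if (sharedAcrossCalls) {
                // Grouped calls: one budget for the whole optimization, never reset.
                total += Math.min(needed, budget - total);
                if (total >= budget) {
                    break;
                }
            } else {
                // Budget restarts for every call: only a single call hitting it stops the search.
                total += Math.min(needed, budget);
                if (needed >= budget) {
                    break;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] calls = {600, 600, 600};
        System.out.println(totalConflicts(calls, 1000, true));  // 1000
        System.out.println(totalConflicts(calls, 1000, false)); // 1800
    }
}
```

With three calls each needing 600 conflicts and a budget of 1000, the shared budget stops after 1000 conflicts, while per-call budgets allow 1800 in total.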
DJ, could you try your test cases with these new sat4j jars?
Their version number is 2.3.0.v20110305.
They should fix the issues you encountered when changing the value of the timeout.
Unfortunately I no longer have access to the machine that exhibited the problem, but I do have a copy of the profile, repo, etc. that I was using to put together a reproducible stand-alone test case.

I got access to the test machine and tested the new JARs, and they worked 7 out of 8 times. During the 6th invocation, lower versions of the bundles were installed. I was just using all default values and not passing in any special System properties, etc. Also, I've released a new (currently disabled) test to p2.tests called Bug301446. It has a copy of the profile along with the content.xml from the metadata repository of the dropins. I cannot get the test to fail consistently yet, but wanted to capture the data so we have it on hand.

Thanks DJ! It is strange that the behavior is not exactly the same each time with a conflict-based timeout. We must be feeding the solver slightly differently each time (i.e. the order of the IUs must change).

Yes, I'm not sure the input is the same every time. We are relying on the reconciler to discover what needs to be installed, and we are running "eclipse -clean" each time, so if something was installed the first time, it wouldn't be included in the "potential IUs to install" the second time. That is, it wasn't a clean run each time; each run was based on the previous results. Also note that the test is being run on a VMware image, so there are some more constraints there.
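The suspicion that the IU order changes between runs matters because a budget-bounded search is order-sensitive. A minimal Java illustration, assuming a simplified search that examines at most `budget` candidates in input order and keeps the highest version seen (hypothetical names, not p2 or SAT4J code):

```java
import java.util.List;

/**
 * Toy model (not p2 or SAT4J code): a search that may examine only a limited
 * number of candidates, in the order they are supplied, and keeps the highest
 * version seen. With a tight budget the answer depends on the input order.
 */
final class OrderSensitivity {

    /** Returns the highest version among the first budget candidates (Integer.MIN_VALUE if none). */
    static int bestWithinBudget(List<Integer> candidateVersions, int budget) {
        int best = Integer.MIN_VALUE;
        int examined = 0;
        for (int version : candidateVersions) {
            if (examined == budget) {
                break; // budget exhausted: later candidates are never seen
            }
            examined++;
            best = Math.max(best, version);
        }
        return best;
    }

    public static void main(String[] args) {
        // Same candidates, different order, same budget.
        System.out.println(bestWithinBudget(List.of(3, 1, 2), 2)); // 3
        System.out.println(bestWithinBudget(List.of(1, 2, 3), 2)); // 2
    }
}
```

The same three versions give 3 in one order and 2 in another, which is the kind of effect that could lead to lower versions being installed on one run out of eight when the input differs between runs.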