Bug 407855

Summary: Fragment resolution raises exception when fragment is listed after host
Product: [RT] Virgo
Reporter: GianMaria Romanato <gm.romanato>
Component: tooling
Assignee: Project Inbox <virgo-inbox>
Status: CLOSED DUPLICATE
QA Contact:
Severity: major
Priority: P3
CC: glyn.normington, mlippert
Version: 3.6.1.RELEASE
Keywords: helpwanted
Target Milestone: ---
Hardware: PC
OS: All
Whiteboard:
Attachments:
* A trivial bundle host and bundle fragments to demonstrate the problem (Eclipse Projects in Virgo Tools format). Flags: none

Description GianMaria Romanato CLA 2013-05-13 05:08:01 EDT
Created attachment 230845
A trivial bundle host and bundle fragments to demonstrate the problem (Eclipse Projects in Virgo Tools format)

=== Configuration ===
* Virgo Server For Apache Tomcat 3.6.1
* Virgo Tools 1.0.0
* Eclipse 4.2.2

Reproduced on: Ubuntu Linux 12.04 and 13.04 64-bit, Windows 7, and Mac OS X. Seems to be OS independent.

=== Symptom ===
If a fragment is added to a server instance in Virgo Tools and is listed after its host, the quasi resolver raises a resolution exception without stating a reason.

-----------------------------------------
Started bundle 'SampleHost' version '1.0.0'. 
[2013-05-06 15:10:18.830]  TCP Connection(2)-127.0.0.1 <DE0000I> Installing bundle 'SampleFragment' version '1.0.0'. 
[2013-05-06 15:10:24.230]  TCP Connection(2)-127.0.0.1 <ME0003I> Dump 'serviceability/dump/2013-05-06-15-10-946' generated 
[2013-05-06 15:10:24.238]  TCP Connection(2)-127.0.0.1 <DE0002E> Installation of bundle 'SampleFragment' version '1.0.0' failed. org.eclipse.virgo.kernel.osgi.framework.UnableToSatisfyBundleDependenciesException: Unable to satisfy dependencies of bundle 'SampleFragment' at version '1.0.0': Cannot resolve: SampleFragment

	at org.eclipse.virgo.kernel.install.pipeline.stage.resolve.internal.QuasiResolveStage.process(QuasiResolveStage.java:46)
	at org.eclipse.virgo.kernel.install.pipeline.internal.StandardPipeline.doProcessGraph(StandardPipeline.java:62)
	at org.eclipse.virgo.kernel.install.pipeline.internal.CompensatingPipeline.doProcessGraph(CompensatingPipeline.java:73)
	at org.eclipse.virgo.kernel.install.pipeline.stage.AbstractPipelineStage.process(AbstractPipelineStage.java:41)
	at org.eclipse.virgo.kernel.install.pipeline.internal.StandardPipeline.doProcessGraph(StandardPipeline.java:62)
	at org.eclipse.virgo.kernel.install.pipeline.stage.AbstractPipelineStage.process(AbstractPipelineStage.java:41)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.driveInstallPipeline(PipelinedApplicationDeployer.java:359)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.doInstall(PipelinedApplicationDeployer.java:185)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.install(PipelinedApplicationDeployer.java:140)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.deploy(PipelinedApplicationDeployer.java:253)
	at org.eclipse.virgo.kernel.deployer.management.StandardDeployer.deploy(StandardDeployer.java:52)
-----------------------------------

No exception is raised if the fragment is listed before the host.

This occurs even if the fragment does not require any extra import package besides those required by the host, i.e. it is not a problem related to a fragment trying to "rewrite" the host metadata after the host has been resolved.

In fact, I attached to this bug report two empty OSGi bundles created with the Virgo Tools, one being the host and the other being the fragment, with which I can reproduce the problem. 

If the fragment is listed before the host the server bootstraps properly, but if the fragment is listed after the host I get the above exception.

Interestingly, if I log into the user region console afterwards, the fragment appears as resolved even though the exception has been logged:

-------------------------------------
osgi> ss Sample
"Framework is launched."


id	State       Bundle
151	ACTIVE      SampleHost_1.0.0
	            Fragments=152
152	RESOLVED    SampleFragment_1.0.0
	            Master=151
-------------------------------------

I am filing this bug against the Virgo Tools because no error appears if the bundles are built, packaged as binaries and deployed in Virgo Server using a plan, regardless of the order in which they are listed in the plan.

Finally, I am giving this issue major severity because we use fragments for testing purposes and in some cases we need the fragment to be listed after the host. In those situations, the server bootstrap is very slow because of the exception and the related dumps. (The application consists of about 50 bundles plus some fragments.)
Comment 1 Glyn Normington CLA 2013-05-13 05:13:20 EDT
Thanks for raising this bug.

Flagging as helpwanted because the SpringSource tooling team do not have sufficient resources to work on the Virgo tooling, so others are invited to develop the necessary skills/contribute/become committers.
Comment 2 GianMaria Romanato CLA 2013-05-13 05:25:23 EDT
Hi Glyn,

is anyone available to give hints on where to look for the issue?

I spent a couple of hours debugging the code backward from the exception notification.

I found a small difference in behavior in class org.eclipse.virgo.kernel.userregion.internal.quasi.DependencyCalculator
limited to statement:
    
     "StateDelta delta = state.resolve(bundles);"

If the fragment is listed before the host, the statebits of the BundleDescription are set to 0x201.

If the fragment is listed after the host the statebits are set to 0x200.

Maybe this is related to the problem, maybe not?
Comment 3 Glyn Normington CLA 2013-05-13 05:55:26 EDT
Hi GianMaria

I wonder if the tooling is deploying each bundle separately? If so, in the failing scenario, the host will be resolved before the fragment is resolved (in a separate resolution step, that is). That could explain the additional state bit - refer to Equinox source code for the meaning of the bit.

However, the fragment *should* be able to attach to an already-resolved host (the fragment-attachment directive of the host's Bundle-SymbolicName header, which is not specified in the attached example, defaults to "always"), unless this OSGi standard behaviour is somehow overridden by Equinox. That wouldn't be the first time that non-standard behaviour was the default, see for example, bug 393848.
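
For readers unfamiliar with the directive mentioned above, the following is a minimal sketch of how the fragment-attachment directive could be read from a Bundle-SymbolicName header value, falling back to "always" when it is absent. The class and method names are hypothetical and are not part of Virgo or Equinox; the parsing is deliberately naive and assumes no ";" inside quoted values.

```java
// Hypothetical helper for illustration only; not Virgo or Equinox code.
final class FragmentAttachment {

    // Returns the fragment-attachment directive of a Bundle-SymbolicName
    // header value, or "always" (the OSGi default) if it is not specified.
    static String directiveOf(String bundleSymbolicNameHeader) {
        String[] parts = bundleSymbolicNameHeader.split(";");
        // parts[0] is the symbolic name; the remaining parts are parameters.
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            // Directives are written with ":=", attributes with "=".
            int sep = part.indexOf(":=");
            if (sep > 0 && part.substring(0, sep).trim().equals("fragment-attachment")) {
                String value = part.substring(sep + 2).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return "always";
    }

    public static void main(String[] args) {
        // The attached example specifies no directive, so the default applies.
        System.out.println(directiveOf("SampleHost"));
        System.out.println(directiveOf("SampleHost;fragment-attachment:=\"never\""));
    }
}
```

With the header of the attached example (no directive), the sketch yields "always", which is why the fragment would be expected to attach to an already-resolved host.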

To reproduce this behaviour outside the tooling and check the Equinox behaviour, you could drop the host into pickup, wait for it to deploy, and then drop the fragment into pickup and see if it attaches ok.

If you get the same behaviour, then there are some potential areas for improvement: poor diagnostics from Virgo's resolution failure detective, a restriction in the current tooling that you have to list fragments before their hosts, an improvement in the tooling to remove that restriction (by installing all bundles before starting them), and why Equinox (or perhaps Virgo) is not allowing the fragment to attach to a resolved host.

If the behaviour isn't the same, then more debugging will be required to find out why. You might like to check the state bits in the pickup scenario to see if they are the same as the tooling scenario.

Hope that gives you some lines of investigation.

Regards,
Glyn
Comment 4 GianMaria Romanato CLA 2013-05-13 06:35:32 EDT
Hi Glyn,

thank you very much for your reply.

I exported the host and the fragment as JARs and dropped them in the pickup folder, the host first and the fragment after the host had been resolved, and I got the same problem, but this time with a more detailed log.

----
[2013-05-13 12:19:58.820] fs-watcher                   <DE0000I> Installing bundle 'SampleHost' version '1.0.0'. 
[2013-05-13 12:19:58.840] fs-watcher                   <DE0001I> Installed bundle 'SampleHost' version '1.0.0'. 
[2013-05-13 12:19:58.852] fs-watcher                   <DE0004I> Starting bundle 'SampleHost' version '1.0.0'. 
[2013-05-13 12:19:58.855] start-signalling-3           <DE0005I> Started bundle 'SampleHost' version '1.0.0'. 
[2013-05-13 12:20:02.860] fs-watcher                   <HD0001I> Hot deployer processing 'CREATED' event for file 'SampleFragment.jar'. 
[2013-05-13 12:20:02.923] fs-watcher                   <DE0000I> Installing bundle 'SampleFragment' version '1.0.0'. 
[2013-05-13 12:20:06.026] fs-watcher                   <ME0003I> Dump 'serviceability/dump/2013-05-13-12-20-931' generated 
[2013-05-13 12:20:06.027] fs-watcher                   <DE0002E> Installation of bundle 'SampleFragment' version '1.0.0' failed. org.eclipse.virgo.kernel.osgi.framework.UnableToSatisfyBundleDependenciesException: Unable to satisfy dependencies of bundle 'SampleFragment' at version '1.0.0': Cannot resolve: SampleFragment
    Resolver report:
        The fragment could not be resolved because of a constraint conflict with a host, possibly because the host is already resolved. The affected fragment is org.eclipse.persistence.jpa.equinox_2.4.1.v20121003-ad44345. Resolver error data <Import-Package: org.eclipse.persistence.internal.jpa; version="0.0.0">. Caused by missing constraint in bundle <org.eclipse.persistence.jpa.equinox_2.4.1.v20121003-ad44345>
             constraint: <Import-Package: org.eclipse.persistence.internal.jpa; version="0.0.0">
            Possible hosts:
                org.eclipse.persistence.jpa.osgi_2.4.1.v20121003-ad44345 (resolved)
            Constraint conflict:
                Import-Package: org.eclipse.persistence.internal.jpa; version="0.0.0"
        The fragment could not be resolved because of a constraint conflict with a host, possibly because the host is already resolved. The affected fragment is org.eclipse.persistence.jpa.equinox_2.4.1.v20121003-ad44345. Resolver error data <Import-Package: org.eclipse.persistence.jpa.equinox.weaving; version="0.0.0">. Caused by missing constraint in bundle <org.eclipse.persistence.jpa.equinox_2.4.1.v20121003-ad44345>
             constraint: <Import-Package: org.eclipse.persistence.jpa.equinox.weaving; version="0.0.0">
            Possible hosts:
                org.eclipse.persistence.jpa.osgi_2.4.1.v20121003-ad44345 (resolved)
            Constraint conflict:
                Import-Package: org.eclipse.persistence.jpa.equinox.weaving; version="0.0.0"
        The fragment could not be resolved because of a constraint conflict with a host, possibly because the host is already resolved. The affected fragment is org.eclipse.persistence.jpa.equinox_2.4.1.v20121003-ad44345. Resolver error data <Import-Package: org.eclipse.persistence.jpa.osgi; version="0.0.0">. Caused by missing constraint in bundle <org.eclipse.persistence.jpa.equinox_2.4.1.v20121003-ad44345>
             constraint: <Import-Package: org.eclipse.persistence.jpa.osgi; version="0.0.0">
            Possible hosts:
                org.eclipse.persistence.jpa.osgi_2.4.1.v20121003-ad44345 (resolved)
            Constraint conflict:
                Import-Package: org.eclipse.persistence.jpa.osgi; version="0.0.0"

	at org.eclipse.virgo.kernel.install.pipeline.stage.resolve.internal.QuasiResolveStage.process(QuasiResolveStage.java:46)
	at org.eclipse.virgo.kernel.install.pipeline.internal.StandardPipeline.doProcessGraph(StandardPipeline.java:62)
	at org.eclipse.virgo.kernel.install.pipeline.internal.CompensatingPipeline.doProcessGraph(CompensatingPipeline.java:73)
	at org.eclipse.virgo.kernel.install.pipeline.stage.AbstractPipelineStage.process(AbstractPipelineStage.java:41)
	at org.eclipse.virgo.kernel.install.pipeline.internal.StandardPipeline.doProcessGraph(StandardPipeline.java:62)
	at org.eclipse.virgo.kernel.install.pipeline.stage.AbstractPipelineStage.process(AbstractPipelineStage.java:41)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.driveInstallPipeline(PipelinedApplicationDeployer.java:359)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.doInstall(PipelinedApplicationDeployer.java:185)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.install(PipelinedApplicationDeployer.java:140)
	at org.eclipse.virgo.kernel.deployer.core.internal.PipelinedApplicationDeployer.deploy(PipelinedApplicationDeployer.java:253)
	at org.eclipse.virgo.nano.deployer.hot.HotDeploymentFileSystemListener.deploy(HotDeployerFileSystemListener.java:225)
	at org.eclipse.virgo.nano.deployer.hot.HotDeploymentFileSystemListener.onChange(HotDeployerFileSystemListener.java:79)
	at org.eclipse.virgo.util.io.FileSystemChecker.notifyListeners(FileSystemChecker.java:373)
	at org.eclipse.virgo.util.io.FileSystemChecker.check(FileSystemChecker.java:282)
	at org.eclipse.virgo.nano.deployer.hot.WatchTask.run(WatchTask.java:48)
	at java.lang.Thread.run(Thread.java:662)

[2013-05-13 12:20:06.029] fs-watcher                   <DE0003E> Install failed for bundle 'SampleFragment' version '1.0.0'. 
---

Apparently, the resolver fails to resolve my fragment but reports an error about another fragment in my usr bundle repository.

I repeated the test with a vanilla installation of Virgo 3.6.1 for Apache Tomcat and the problem no longer occurs.

So the failure of some other fragment in the usr repository prevents this unrelated fragment from resolving when it is deployed after the host.

I'll try to figure out why the EclipseLink fragment fails. Is it the expected behavior that an unrelated fragment in error prevents my fragment from resolving?

Thanks for your help.

GianMaria.
Comment 5 Glyn Normington CLA 2013-05-13 06:47:36 EDT
(In reply to comment #4)

> Is it the expected behavior that an unrelated fragment in error prevents
> my fragment from resolving?

Perhaps. I took a quick look at DependencyCalculator and I didn't see any logic to try ditching specific fragments which cause resolution failures.

IIRC the rationale for this was that:

(a) it is difficult to know for sure which fragment(s) to discard without attempting multiple resolutions (which could be time-consuming for a complex application), and

(b) although it is technically acceptable to discard a fragment to get the host to resolve, that's probably not what users want most of the time. If there is a fragment in the system, it should be there for a good reason.
Comment 6 GianMaria Romanato CLA 2013-05-13 08:33:36 EDT
Hi Glyn,

if that is the case, feel free to reject this bug. I will figure out what's wrong with my EclipseLink bundles.

And thank you again, your suggestions were very helpful in diagnosing the problem.

GianMaria.
Comment 7 Glyn Normington CLA 2013-05-13 08:55:18 EDT
Working as designed.
Comment 8 GianMaria Romanato CLA 2013-05-14 05:19:33 EDT
I think I rejoiced too early.

Yesterday I solved the issue with my offending EclipseLink fragment, and I stopped experiencing resolution errors when the host and the fragment are deployed as binaries via the /pickup folder. In that case, however, the host must be deployed first, otherwise the fragment won't resolve.

So it was indeed the failed resolution of the unrelated EclipseLink fragment that prevented my fragment from attaching to the host when deployed via the /pickup folder, and that made me think that the behavior of the Virgo server and the Virgo Tools was the same.

After fixing the offending EclipseLink fragment, deployment via the Virgo /pickup folder started to work as expected, but the test environment offered by the Virgo Tools continues to fail if the fragment is listed after the host. Interestingly, if I start the server in Virgo Tools without listing host and fragment, and later add host and fragment, in this order, to the running server via the context menu "Add and Remove...", everything works even in the Virgo Tools.

I also checked the state bits as per Glyn's suggestion, and I confirm that when the fragment resolves correctly (either via /pickup, or when added to an already running test environment via the "Add and Remove..." action), the statebits are set to 0x201. When the fragment does not resolve (that is, when it is part of the initial test environment configuration and listed after its host), the statebits are set to 0x200.

So, this defect should be reopened. And, if someone is available to give a hint about where to look for the problem, I am available to spend another few hours to try to figure out the cause of the problem.
Comment 9 Glyn Normington CLA 2013-05-14 05:35:30 EDT
(In reply to comment #8)
> So, this defect should be reopened. And, if someone is available to give a
> hint about where to look for the problem, I am available to spend another
> few hours to try to figure out the cause of the problem.

You should be able to re-open the bug.

I suggest finding the meaning of the state bits using Equinox source as a next step.
Comment 10 GianMaria Romanato CLA 2013-05-14 05:41:31 EDT
Reopening because I wrongly suggested rejecting this issue but later realized that the bug is still present.
Comment 11 GianMaria Romanato CLA 2013-08-13 10:34:35 EDT
I spent another couple of hours trying to understand the source of the problem.
As per Glyn's suggestion in comment 9, I tried to figure out why the state bits change from 201 (0xC9, resolved) to 200 (0xC8).

The RESOLVED bit is switched off in line 134 of DependencyCalculator:

  StateDelta delta = state.resolve(bundles);
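
As a small aid for reading these values, here is a sketch that compares the two state-bit values quoted in this comment (201 and 200, taken as decimal). It assumes, as the text above implies, that the lowest bit is the "resolved" flag; the constant name is illustrative and is not copied from the Equinox source.

```java
// Illustration only; RESOLVED_BIT is an assumed name and value (lowest bit),
// inferred from "201 (0xC9, resolved)" versus "200 (0xC8)" in the comment.
final class StateBits {

    static final int RESOLVED_BIT = 0x1;

    // True if the assumed "resolved" flag is set in the given state bits.
    static boolean isResolved(int stateBits) {
        return (stateBits & RESOLVED_BIT) != 0;
    }

    // Bits that differ between two observations of the state bits.
    static int changedBits(int before, int after) {
        return before ^ after;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(201)); // c9
        System.out.println(Integer.toHexString(200)); // c8
        System.out.println(isResolved(201));          // true
        System.out.println(isResolved(200));          // false
        System.out.println(changedBits(201, 200));    // 1
    }
}
```

The only bit that differs between the two values is the lowest one, which is consistent with the resolved flag being cleared by the resolve() call.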

In debug mode I noticed that the 'state' object comes from the Equinox project, not from Virgo, so I stepped through many lines of the resolve() method to understand what is going on.
It turns out that Equinox finds my fragment and also finds a matching host for it. Before attaching the fragment to the host and declaring the fragment resolved, a check is performed to verify that there is not another version of the same fragment attached to the host. This check takes place in lines 331 to 338 of class ResolverBundle:

  // need to make sure there is not already another version of this fragment 
  // already attached to this host
  for (Iterator<ResolverBundle> iFragments = fragments.iterator(); iFragments.hasNext();) {
	ResolverBundle existingFragment = iFragments.next();
	String bsn = existingFragment.getName();

	if (bsn != null && bsn.equals(fragment.getName()))
		return;
  } 

Now, the issue is that the above lines of code find the fragment already attached to the host!
To double-check what was happening, I paused the resolution process in the debugger, to make sure that the fragment was not yet resolved, and logged into the OSGi console. Using the "ss" command I could indeed verify that an instance of the fragment was already deployed and attached to the host.

So the real source of this problem is that, for some strange reason, the fragment seems to be installed twice, with the same name and version. The first instance is properly resolved and attached to the host; the second instance results in a dump and an exception.
That's why, as I reported in the description, the application works even if a dump is generated and an exception is logged: the fragment does get attached.
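
The effect described here can be reproduced in isolation with a simplified model of the quoted check. This is only a sketch: HostModel is a hypothetical stand-in for the host side of the resolver and is not Equinox's ResolverBundle. It shows that a second instance with the same symbolic name is silently refused while the first stays attached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not Equinox's ResolverBundle.
final class HostModel {

    private final List<String> attachedFragments = new ArrayList<>();

    // Attaches a fragment by symbolic name. Returns false, without attaching,
    // if a fragment with the same symbolic name is already attached; this
    // mirrors the early "return" in the loop quoted above.
    boolean attach(String fragmentSymbolicName) {
        for (String existing : attachedFragments) {
            if (existing != null && existing.equals(fragmentSymbolicName)) {
                return false;
            }
        }
        attachedFragments.add(fragmentSymbolicName);
        return true;
    }

    public static void main(String[] args) {
        HostModel host = new HostModel();
        System.out.println(host.attach("SampleFragment")); // true: first instance attaches
        System.out.println(host.attach("SampleFragment")); // false: duplicate is refused
    }
}
```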

So the problem now becomes understanding why a fragment gets duplicated in the container when it is listed after its host.

To add some noise to the problem, further testing by my colleagues reveals that the operating system has some influence. The error disappears if fragments are listed before the host on Linux, while they must be listed after the host on Windows and Mac OS. Still, in my case I need some fragments to always be listed after the host, regardless of the operating system.
Comment 12 GianMaria Romanato CLA 2014-01-23 04:33:55 EST
In my previous comment I described how the fragment resolution failure was due to the fact that the fragment got installed and started twice (same symbolic name, same version, two different numeric bundle id), and  the second instance would result in an annoying exception slowing down the server bootstrap. My application would still work despite the exception, because the first instance of the fragment was correctly resolved and attached to the host.

Now, in the last months I started to experience the problem of duplicate bundles in the Virgo Tools test environment, meaning that certain bundles would be installed and started twice in the test environment. This was verified by logging into the user region console and issuing an "ss" command that printed some bundles twice, with a different id, although the symbolic name and version were the same. This seems very similar to the subject of this bug report.
It was annoying, as I had to log into the user region console to manually uninstall the duplicate bundles on every server start.

I had a bundle A that was being duplicated. Yesterday I spotted a mistake in a bundle B that imports packages of bundle A. I fixed the MANIFEST.MF of bundle B and from that moment A was no longer duplicated.

For backward compatibility, my application contains two versions of the same bundle A, let's call them A1 and A2. 
A1 exports all of its packages with version 1.0.0, while A2 exports all of its packages with version 2.0.0.
Bundle B is meant to depend on A2 and to import all A2 packages specifying version 2.0.0. Unfortunately, one of the imported packages was imported without specifying the version.

This resulted in A2 being installed and started twice in the Virgo Tools test environment. Once the import in the MANIFEST.MF of B was corrected to properly specify that version 2.0.0 was required, the duplicate A2 instance disappeared.
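
To illustrate why the unversioned import was ambiguous, here is a minimal sketch of OSGi-style matching for a single minimum version: an Import-Package entry without a version attribute accepts any version from 0.0.0 upwards, while version="2.0.0" accepts 2.0.0 and higher. The class and method names are hypothetical, qualifiers and upper bounds are ignored, and the sketch does not explain why the tooling installs A2 twice.

```java
// Illustration only; simplified version matching, not the Equinox resolver.
final class ImportMatch {

    // Compares two versions of the form major.minor.micro; missing
    // segments count as 0 and qualifiers are not supported.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // minimumVersion == null models an import with no version attribute,
    // which is treated as a minimum of 0.0.0.
    static boolean satisfies(String exportedVersion, String minimumVersion) {
        String floor = (minimumVersion == null) ? "0.0.0" : minimumVersion;
        return compare(exportedVersion, floor) >= 0;
    }

    public static void main(String[] args) {
        // Unversioned import: exports from both A1 and A2 are candidates.
        System.out.println(satisfies("1.0.0", null));    // true
        System.out.println(satisfies("2.0.0", null));    // true
        // Import with version="2.0.0": only the A2 export qualifies.
        System.out.println(satisfies("1.0.0", "2.0.0")); // false
        System.out.println(satisfies("2.0.0", "2.0.0")); // true
    }
}
```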

This odd behaviour seems to show up only in the Virgo Tools test environment, in the sense that we never had duplicates in an actual Virgo server instance.
Still, my application was working fine even with the duplicate; I just had to uninstall it manually and everything would work. B was always wired to the first instance of A2 and as such it always worked properly. I still had to manually uninstall the duplicate because it was fooling some extenders.

Any idea why such behaviour may be introduced by a wrong import in a consumer bundle? At least now I have a hint about what to look at if a similar problem shows up again: double-check the MANIFEST.MF of the projects that consume the duplicated bundle.

As a last note, this was reproducible on several workstations: people having a similar server configuration as mine were also experiencing double instances of a bundle, although the duplicated bundle was not always the same!
Comment 13 GianMaria Romanato CLA 2015-05-22 07:22:28 EDT
I believe that the root cause for this issue is the same as #454015 which I also opened.
As such I am now marking it as duplicate of #454015.

*** This bug has been marked as a duplicate of bug 454015 ***