Bug 348873

Summary: Stopping/Starting DS component remoted with R-OSGi fails to restart
Product: [RT] ECF Reporter: Alex Blewitt <alex.blewitt>
Component: ecf.remoteservices Assignee: ecf.core-inbox <ecf.core-inbox>
Status: RESOLVED INVALID QA Contact:
Severity: normal    
Priority: P3 CC: bugs.eclipse.org, slewis
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
Whiteboard:

Description Alex Blewitt CLA 2011-06-09 07:51:30 EDT
I'm testing a declarative services setup with r-osgi.

I have a service registered under DS in one VM, and another DS client in a different VM.

I see the warnings "WARNING: Port 9278 already in use. This instance of R-OSGi is running on port 9280" in both client VMs (the client is 9279).

When I start both of these, I see that the component is registered. All is good.

If I stop the DS component service, it stops but I see this error:

osgi> dis 1
Sent request for disabling component foo

osgi> [log;+0100 2011.06.09 12:41:11:953;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:handleDiscoveredServiceAvailable:getRemoteServiceReferences result is empty. containerHelper=RemoteServiceContainer [containerID=StringID[AqtyFGtobI6szak3Cydmsw49HBg=], container=org.eclipse.ecf.provider.generic.TCPClientSOContainer@e964fe, containerAdapter=org.eclipse.ecf.provider.remoteservice.generic.RegistrySharedObject@ba8180]remoteReferences=null;severity4;exception=null;children=[]]]

The client (as expected) becomes unsatisfied, because the remote service has gone away.

However, problems occur when I re-enable the service:

osgi> en 1

[log;+0100 2011.06.09 12:41:13:609;INFO;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=0;message=Exception creating container from ContainerTypeDescription=ContainerTypeDescription[name=ecf.generic.server;instantiator=org.eclipse.ecf.provider.generic.GenericContainerInstantiator@9db992;desc=ECF Generic Server;;severity4;exception=org.eclipse.ecf.core.ContainerCreateException: createInstance;children=[]]]
org.eclipse.ecf.core.ContainerCreateException: createInstance
	at org.eclipse.ecf.provider.generic.GenericContainerInstantiator.createInstance(GenericContainerInstantiator.java:158)
	at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:288)
	at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:246)
	at org.eclipse.ecf.osgi.services.distribution.AbstractContainerFinder.createContainer(AbstractContainerFinder.java:165)
	at org.eclipse.ecf.osgi.services.distribution.AbstractHostContainerFinder.createRSContainer(AbstractHostContainerFinder.java:295)
	at org.eclipse.ecf.osgi.services.distribution.AbstractHostContainerFinder.createDefaultRSContainers(AbstractHostContainerFinder.java:235)
	at org.eclipse.ecf.osgi.services.distribution.AbstractHostContainerFinder.createAndConfigureHostContainers(AbstractHostContainerFinder.java:205)
	at org.eclipse.ecf.osgi.services.distribution.DefaultHostContainerFinder.findHostContainers(DefaultHostContainerFinder.java:47)
	at org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.findHostContainers(EventHookImpl.java:175)
	at org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.handleRegisteredServiceEvent(EventHookImpl.java:98)
	at org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.event(EventHookImpl.java:62)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.notifyEventHooksPrivileged(ServiceRegistry.java:1143)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEventPrivileged(ServiceRegistry.java:743)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEvent(ServiceRegistry.java:711)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.register(ServiceRegistrationImpl.java:130)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.registerService(ServiceRegistry.java:206)
	at org.eclipse.osgi.framework.internal.core.BundleContextImpl.registerService(BundleContextImpl.java:507)
	at org.eclipse.equinox.internal.ds.InstanceProcess.registerService(InstanceProcess.java:504)
	at org.eclipse.equinox.internal.ds.InstanceProcess.buildComponents(InstanceProcess.java:259)
	at org.eclipse.equinox.internal.ds.Resolver.buildNewlySatisfied(Resolver.java:441)
	at org.eclipse.equinox.internal.ds.Resolver.enableComponents(Resolver.java:213)
	at org.eclipse.equinox.internal.ds.SCRManager.performWork(SCRManager.java:800)
	at org.eclipse.equinox.internal.ds.SCRManager$QueuedJob.dispatch(SCRManager.java:767)
	at org.eclipse.equinox.internal.ds.WorkThread.run(WorkThread.java:89)
	at org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor.run(Executor.java:70)
Caused by: java.net.BindException: Address already in use: JVM_Bind
	at java.net.PlainSocketImpl.socketBind(Native Method)
	at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
	at java.net.ServerSocket.bind(ServerSocket.java:319)
	at java.net.ServerSocket.<init>(ServerSocket.java:185)
	at java.net.ServerSocket.<init>(ServerSocket.java:141)
	at org.eclipse.ecf.provider.comm.tcp.Server.<init>(Server.java:39)
	at org.eclipse.ecf.provider.generic.TCPServerSOContainerGroup.putOnTheAir(TCPServerSOContainerGroup.java:65)
	at org.eclipse.ecf.provider.generic.TCPServerSOContainer.<init>(TCPServerSOContainer.java:60)
	at org.eclipse.ecf.provider.generic.TCPServerSOContainer.<init>(TCPServerSOContainer.java:96)
	at org.eclipse.ecf.provider.generic.GenericContainerInstantiator.createInstance(GenericContainerInstantiator.java:153)
	... 24 more
[log;+0100 2011.06.09 12:41:13:609;WARNING;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=2;message=org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl:handleRegisteredServiceEvent:No remote service containers found for serviceReference={com.example.foo.IFoo}={component.name=foo, component.id=1, service.exported.interfaces=*, service.id=53}. Service NOT EXPORTED;severity2;exception=null;children=[]]]
Sent request for enabling component foo

It looks like at this point it's trying to create an ECF generic server (instead of using the r-osgi service) and failing, presumably because there's an ECF Generic Server already running somewhere.

So what's happening? It looks like it forgets that it should be distributed over R-OSGi and is trying to spin up a generic ECF container. Of course, only the first ECF container is ever going to succeed on a single machine, as it maintains a singleton port; but it's less clear why it's even trying to do this.
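
For reference, the BindException at the bottom of the trace can be reproduced outside ECF with plain sockets. This is only a minimal sketch, assuming nothing about how the generic server picks its port: the class and method names (PortBindingDemo, canBind) are made up for illustration. It shows that a second listener on an already bound port fails, and that passing port 0 lets the operating system choose a free port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class; not part of ECF or R-OSGi.
final class PortBindingDemo {

    /** Tries to open a listening socket on the given port; false means the port is taken. */
    static boolean canBind(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (BindException e) {
            // Same kind of failure as in the stack trace: another listener owns the port.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any free port, so nothing is hard-coded.
        ServerSocket first = new ServerSocket(0);
        int port = first.getLocalPort();
        System.out.println("first listener bound to port " + port);
        System.out.println("second bind while first is open: " + canBind(port));
        first.close();
        System.out.println("bind after first is closed: " + canBind(port));
    }
}
```

If the generic server were created with a fixed port while an earlier instance still held it, the second constructor call would fail in the same way as the `canBind` probe does here.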
Comment 1 Alex Blewitt CLA 2011-06-09 08:14:50 EDT
It seems here's the problem:

		Collection rsContainers = findExistingHostContainers(serviceReference,
				serviceExportedInterfaces, serviceExportedConfigs,
				serviceIntents);

		if (rsContainers.size() == 0 && autoCreateContainer) {
			// If no existing containers are found we'll go through
			// finding/creating/configuring/connecting
			rsContainers = createAndConfigureHostContainers(serviceReference,

It can't be finding the container after the declarative service is disabled and then re-enabled, and thus going for the createAndConfigure to create a new one.

(I'm also happy if this turns out to be an issue with org.eclipse.ecf.osgi.services.distribution, but I think the problem is that the R-OSGi container seems to be crashing.)
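
To make the suspected control flow concrete, here is a stripped-down sketch of the find-or-create fallback. It is not the ECF implementation: HostContainerLookup, findOrCreate and the use of plain strings for containers are hypothetical stand-ins, and the failing creator only simulates the ContainerCreateException seen in the log.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the lookup logic quoted above; containers are plain strings here.
final class HostContainerLookup {

    /** Reuses existing containers when any were found, otherwise falls back to creating new ones. */
    static List<String> findOrCreate(List<String> existing, boolean autoCreate,
                                     Supplier<List<String>> creator) {
        if (existing.isEmpty() && autoCreate) {
            // This is the branch that ends in the BindException when the port is already taken.
            return creator.get();
        }
        return existing;
    }

    public static void main(String[] args) {
        Supplier<List<String>> failingCreator = () -> {
            throw new IllegalStateException("createInstance: address already in use");
        };
        // An existing container is found, so the creator is never called.
        System.out.println(findOrCreate(Arrays.asList("ecf.r_osgi.peer"), true, failingCreator));
        // Nothing is found, so the create path runs and fails.
        try {
            findOrCreate(Collections.<String>emptyList(), true, failingCreator);
        } catch (IllegalStateException e) {
            System.out.println("create path taken: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the error only appears when the lookup comes back empty, so the interesting question is why the lookup finds nothing after the component is re-enabled.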
Comment 2 Scott Lewis CLA 2011-06-09 11:44:53 EDT
(In reply to comment #1)
> It seems here's the problem:
> 
>         Collection rsContainers = findExistingHostContainers(serviceReference,
>                 serviceExportedInterfaces, serviceExportedConfigs,
>                 serviceIntents);
> 
>         if (rsContainers.size() == 0 && autoCreateContainer) {
>             // If no existing containers are found we'll go through
>             // finding/creating/configuring/connecting
>             rsContainers = createAndConfigureHostContainers(serviceReference,
> 
> It can't be finding the container after the declarative service is disabled and
> then re-enabled, 

I don't (yet) understand why it's not finding the r-osgi container when the ds component is reactivated...as unless you explicitly remove the r-osgi container when your ds component deactivates, it should still be present and available.  

>and thus going for the createAndConfigure to create a new one.

I agree with your analysis...but I don't understand why it's not finding the existing r-osgi container.

In your remote service registration...are you specifying the standard OSGi remote services property 'service.exported.configs'? e.g. (in ds markup)

   <property name="service.exported.configs" type="String" value="ecf.r_osgi.peer"/>

The reason this is important is that when there are two or more providers present (e.g. r-osgi and generic)...and no service.exported.configs is specified...one has to be picked as the default...and it seems from your stack trace that perhaps the generic one is being picked.  The OSGi remote service spec defines 'service.exported.configs' as the way to specify a specific provider.

Note also that if you don't want the generic provider as the default for ECF RSA to use, then you can reset the default with this system property:

org.eclipse.ecf.osgi.services.remoteserviceadmin.hostDefaultConfigType

e.g.

-Dorg.eclipse.ecf.osgi.services.remoteserviceadmin.hostDefaultConfigType=ecf.r_osgi.peer
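
The precedence described above could be sketched roughly as follows. This is an illustration only, not the ECF code: ConfigTypeResolver and resolve are hypothetical names, the built-in fallback of ecf.generic.server is an assumption taken from the stack trace, and only a single String value of service.exported.configs is handled (the spec also allows multiple values).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the selection order; not part of ECF.
final class ConfigTypeResolver {

    static final String EXPORTED_CONFIGS = "service.exported.configs";
    static final String DEFAULT_CONFIG_PROP =
            "org.eclipse.ecf.osgi.services.remoteserviceadmin.hostDefaultConfigType";
    // Assumption: the generic server is the fallback, as the stack trace suggests.
    static final String ASSUMED_FALLBACK = "ecf.generic.server";

    /** Explicit service property first, then the system property, then the assumed fallback. */
    static String resolve(Map<String, ?> serviceProps, Map<String, String> systemProps) {
        Object explicit = serviceProps.get(EXPORTED_CONFIGS);
        if (explicit instanceof String && !((String) explicit).isEmpty()) {
            return (String) explicit;
        }
        String fromSystem = systemProps.get(DEFAULT_CONFIG_PROP);
        if (fromSystem != null && !fromSystem.isEmpty()) {
            return fromSystem;
        }
        return ASSUMED_FALLBACK;
    }

    public static void main(String[] args) {
        Map<String, Object> serviceProps = new HashMap<>();
        serviceProps.put("service.exported.interfaces", "*");
        Map<String, String> systemProps = new HashMap<>();
        System.out.println(resolve(serviceProps, systemProps));
        serviceProps.put(EXPORTED_CONFIGS, "ecf.r_osgi.peer");
        System.out.println(resolve(serviceProps, systemProps));
    }
}
```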

> (I'm also happy if this is an issue with the
> org.eclipse.ecf.osgi.services.distribution but I think the problem is the
> R-OSGi container seems to be crashing.)

You may be right...but I can't tell yet if it's just an issue of which provider is being used to export/reexport, or whether something is going wrong with r-osgi on your component deactivation.
Comment 3 Alex Blewitt CLA 2011-06-09 12:15:52 EDT
I'm not specifying the service.exported.configs property; I'm just hoping it will find it by virtue of the r-osgi container being present (and started). I'm also not sure why the generic container is there - I'm guessing that it's because I had the org.eclipse.ecf.provider bundle in my runtime, which seems to be needed via the provider.r_osgi and provider.remoteservice bundles. (So, as a corollary, there's no way of using r_osgi without the generic ECF server being installed - yet another reason why the generic ECF server needs to auto-choose a port!)

It's fairly easy for me to reproduce - is there an OSGi service I can look for that disappears which will confirm one way or another?
Comment 4 Alex Blewitt CLA 2011-06-09 12:24:07 EDT
FWIW I only have one component. It may also be a race condition with my remote container. When the service gets removed, it's possible that it's being returned by the remote container (as the service may not have gone away at that point).
Comment 5 Alex Blewitt CLA 2011-06-09 12:35:06 EDT
So, adding the service.exported.configs property switched it over to using R-OSGi. So I was obviously using the generic container before without knowing it.

The question that then comes to mind is why the generic container goes away :) Also, if the port is selected at startup and it then attempts to re-use that port, the server socket may not have been closed (so it tries to bind the same port again, giving that error message).

I get an error when I disable and re-enable the component again. Let me see if I can replicate the example on one of the out-of-the-box ECF samples to rule out problems with my discovery container.
Comment 6 Scott Lewis CLA 2011-06-09 12:47:07 EDT
(In reply to comment #3)
> I'm not specifying the service.exported.configs property; I'm just hoping it
> will find it by virtue of the r-osgi container being present (and started). I'm
> also not sure why the generic container is there - I'm guessing that it's
> because I had the org.eclipse.ecf.provider bundle in my runtime, which seems to
> be needed via the provider.r_osgi and provider.remoteservice bundles. (So, as a
> corollary, there's no way of using r_osgi without the generic ECF server being
> installed 

You can use service.exported.configs or set the default provider as per comment 2.

(In reply to comment #5)
> So, putting the service.exported.configs switched it over to using R-OSGi. So I
> was obviously using the generic container before unknowingly.
> 
> The question is then why the generic container goes away comes to mind :) 

I don't understand what you mean by this.

> Also,
> if the port is selected at startup and then it attempts to re-use that port,
> the serversocket may not have been closed (and so tries to re-use the same one,
> giving that error message). 
> 
> I get an error when I disable and re-enable the component again. 

What error are you receiving now that you are explicitly specifying the use of the r-osgi container?  Is it the same as reported in the original bug report? (i.e. port in use?)

> Let me see if
> I can replicate the example on one of the out-of-the-box ECF samples to rule
> out problems with my discovery container.

Ok.
Comment 7 Markus Kuppe CLA 2011-06-09 12:58:36 EDT
(In reply to comment #3)
> I'm not specifying the service.exported.configs property; I'm just hoping it
> will find it by virtue of the r-osgi container being present (and started).

This goes into the direction of bug #326132
Comment 8 Alex Blewitt CLA 2011-06-09 13:04:46 EDT
I like the idea of bug #326132 which looks up an OSGi service based on priority and picks the highest one :)

I've tried to reproduce this problem with the Hello DS example, switching the ecf.generic.server for ecf.r_osgi.peer, but that seems to work as expected. So something else is odd with my setup.

At least now I know it's registering it via r_osgi, thanks to me setting it in the service.exported.configs.

Let me see if I can create a pared down example to attach to this bug, or dig deeper into the problem.
Comment 9 Alex Blewitt CLA 2011-06-09 13:18:09 EDT
Hmm... wondering if it's my container at fault (again). It's a different error:

osgi> [log;+0100 2011.06.09 18:15:21:172;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:registerRemoteServiceReferences:Remote service is null for remote reference RemoteServiceReference[remoteServiceID=org.eclipse.ecf.remoteservice.RemoteServiceID[containerID=r-osgi://myhostname:9280;containerRelativeID=63];ref=RemoteServiceReference{r-osgi://myhostname:9280#63-[com.gs.example.foo.IFoo]}];severity4;exception=null;children=[]]]

I think I'm ending up with two services for my published service, so when I unregister, it tries to bind to the phantom twin. But by the time the phantom twin goes, the 'getService' call will return null. And so my container still thinks there's a service there, when in fact the service reference will never resolve. In turn, that throws an exception which prevents any other listeners down the line from seeing it.
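
One defensive pattern for this kind of failure is to treat a null service as a stale reference and to isolate each listener, so one failure cannot starve the rest. The following is a minimal sketch under those assumptions, with hypothetical names (SafeDispatch, deliver); it is not how ECF or my container actually dispatches events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical dispatcher; illustrates null-guarding and per-listener isolation only.
final class SafeDispatch {

    /** Delivers the service to every listener and returns how many accepted it without throwing. */
    static <T> int deliver(T service, List<Consumer<T>> listeners) {
        if (service == null) {
            // Stale reference: the service went away before it could be resolved.
            return 0;
        }
        int delivered = 0;
        for (Consumer<T> listener : listeners) {
            try {
                listener.accept(service);
                delivered++;
            } catch (RuntimeException e) {
                // Log and continue so that later listeners still see the service.
                System.err.println("listener failed: " + e.getMessage());
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<Consumer<String>> listeners = new ArrayList<>();
        listeners.add(s -> { throw new IllegalStateException("phantom twin"); });
        listeners.add(s -> System.out.println("received " + s));
        System.out.println("delivered to " + deliver("IFoo", listeners) + " listener(s)");
    }
}
```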
Comment 10 Scott Lewis CLA 2011-06-09 13:28:29 EDT
(In reply to comment #7)
> (In reply to comment #3)
> > I'm not specifying the service.exported.configs property; I'm just hoping it
> > will find it by virtue of the r-osgi container being present (and started).
> 
> This goes into the direction of bug #326132

I have no objection to such a strategy...in fact, the IHostContainerSelector structure allows such strategies to be pretty easily implemented and substituted for the default behavior.
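
As a rough idea of what a priority-based strategy might look like, here is a small sketch that picks the candidate with the highest ranking. It does not implement IHostContainerSelector, whose signature is not shown in this bug; RankedSelection, Candidate and the ranking values are all hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of "pick the highest priority provider"; not the ECF API.
final class RankedSelection {

    /** A provider candidate with an assumed integer ranking (higher wins). */
    static final class Candidate {
        final String configType;
        final int ranking;

        Candidate(String configType, int ranking) {
            this.configType = configType;
            this.ranking = ranking;
        }
    }

    /** Returns the candidate with the highest ranking, or empty when there are no candidates. */
    static Optional<Candidate> pickHighest(List<Candidate> candidates) {
        return candidates.stream().max(Comparator.comparingInt(c -> c.ranking));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("ecf.generic.server", 0),
                new Candidate("ecf.r_osgi.peer", 10));
        pickHighest(candidates).ifPresent(c -> System.out.println("selected " + c.configType));
    }
}
```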
Comment 11 Alex Blewitt CLA 2011-06-09 14:21:19 EDT
I think this can be resolved INVALID - I was returning the set of advertised services as well as discovered, and I had another bug which was tickling a different kind of problem. With those two fixed, this seems to work now.
Comment 12 Markus Kuppe CLA 2011-06-09 14:22:47 EDT
Closing as per comment