| Summary: | Stopping/Starting DS component remoted with R-OSGi fails to restart | ||
|---|---|---|---|
| Product: | [RT] ECF | Reporter: | Alex Blewitt <alex.blewitt> |
| Component: | ecf.remoteservices | Assignee: | ecf.core-inbox <ecf.core-inbox> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | bugs.eclipse.org, slewis |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | All | ||
| Whiteboard: | |||
It seems here's the problem:
Collection rsContainers = findExistingHostContainers(serviceReference,
serviceExportedInterfaces, serviceExportedConfigs,
serviceIntents);
if (rsContainers.size() == 0 && autoCreateContainer) {
// If no existing containers are found we'll go through
// finding/creating/configuring/connecting
rsContainers = createAndConfigureHostContainers(serviceReference,
It can't be finding the container after the declarative service is disabled and then re-enabled, and thus going for the createAndConfigure to create a new one.
(I'm also happy if this is an issue with org.eclipse.ecf.osgi.services.distribution, but I think the problem is that the R-OSGi container seems to be crashing.)
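For readers following along, the find-then-create fallback in the snippet above can be sketched roughly as follows. This is a minimal stand-alone illustration, not ECF's actual code: the class `HostContainerLookupSketch`, its fields, and the use of plain strings for containers are all hypothetical. It only shows why a lookup that matches nothing falls through to the create path even when some other container already exists.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the finder logic quoted above; not ECF's actual code.
class HostContainerLookupSketch {
    // Containers that already exist, identified only by their config type.
    private final List<String> existing = new ArrayList<>();
    private final boolean autoCreateContainer;
    int created = 0; // counts how often the create fallback ran

    HostContainerLookupSketch(boolean autoCreateContainer) {
        this.autoCreateContainer = autoCreateContainer;
    }

    void addExisting(String configType) {
        existing.add(configType);
    }

    // Returns every existing container whose config type matches the request.
    Collection<String> findExisting(String requestedConfig) {
        List<String> matches = new ArrayList<>();
        for (String configType : existing) {
            if (configType.equals(requestedConfig)) {
                matches.add(configType);
            }
        }
        return matches;
    }

    // Mirrors the quoted control flow: look first, create only if nothing matched.
    Collection<String> findHostContainers(String requestedConfig) {
        Collection<String> found = findExisting(requestedConfig);
        if (found.isEmpty() && autoCreateContainer) {
            created++;
            existing.add(requestedConfig);
            found = findExisting(requestedConfig);
        }
        return found;
    }

    public static void main(String[] args) {
        HostContainerLookupSketch finder = new HostContainerLookupSketch(true);
        finder.addExisting("ecf.r_osgi.peer");
        // A matching request reuses the existing container.
        finder.findHostContainers("ecf.r_osgi.peer");
        System.out.println("created after matching lookup: " + finder.created);
        // A request for a different type ignores it and takes the create path.
        finder.findHostContainers("ecf.generic.server");
        System.out.println("created after non-matching lookup: " + finder.created);
    }
}
```

In this sketch the existing container is never "lost"; it just does not match what is being asked for, which is one way to end up in the create path.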
(In reply to comment #1)
> It seems here's the problem:
>
> Collection rsContainers = findExistingHostContainers(serviceReference,
> serviceExportedInterfaces, serviceExportedConfigs,
> serviceIntents);
>
> if (rsContainers.size() == 0 && autoCreateContainer) {
> // If no existing containers are found we'll go through
> // finding/creating/configuring/connecting
> rsContainers = createAndConfigureHostContainers(serviceReference,
>
> It can't be finding the container after the declarative service is disabled and
> then re-enabled,
I don't (yet) understand why it's not finding the r-osgi container when the ds component is reactivated...as unless you explicitly remove the r-osgi container when your ds component deactivates, it should still be present and available.
> and thus going for the createAndConfigure to create a new one.
I agree with your analysis...but I don't understand why it's not finding the existing r-osgi container.
In your remote service registration...are you specifying the standard OSGi remote services property 'service.exported.configs'? e.g. (in ds markup)
<property name="service.exported.configs" type="String" value="ecf.r_osgi.peer"/>
The reason this is important is that when there are two or more providers present (e.g. r-osgi and generic)...and no service.exported.configs is specified...one has to be picked as the default...and it seems from your stack trace that perhaps the generic one is being picked. The OSGi remote service spec defines 'service.exported.configs' as the way to specify a specific provider.
Note also that if you don't want the generic provider as the default for ECF RSA to use, then you can reset the default with this system property: org.eclipse.ecf.osgi.services.remoteserviceadmin.hostDefaultConfigType e.g.
-Dorg.eclipse.ecf.osgi.services.remoteserviceadmin.hostDefaultConfigType=ecf.r_osgi.peer
> (I'm also happy if this is an issue with the
> org.eclipse.ecf.osgi.services.distribution but I think the problem is the
> R-OSGi container seems to be crashing.)
You may be right...but I can't tell yet if it's just an issue of which provider is being used to export/reexport, or whether something is going wrong with r-osgi on your component deactivation.
I'm not specifying the service.exported.configs property; I'm just hoping it will find it by virtue of the r-osgi container being present (and started). I'm also not sure why the generic container is there - I'm guessing that it's because I had the org.eclipse.ecf.provider bundle in my runtime, which seems to be needed via the provider.r_osgi and provider.remoteservice bundles. (So, as a corollary, there's no way of using r_osgi without the generic ECF server being installed - yet another reason why the generic ECF server needs to auto-choose a port!)
It's fairly easy for me to reproduce - is there an OSGi service I can look for that disappears which will confirm one way or another? FWIW I only have one component.
It may also be a race condition with my remote container. When the service gets removed, it's possible that it's being returned by the remote container (as the service may not have gone away at that point).
So, putting the service.exported.configs switched it over to using R-OSGi. So I was obviously using the generic container before unknowingly.
The question is then why the generic container goes away comes to mind :) Also, if the port is selected at startup and then it attempts to re-use that port, the serversocket may not have been closed (and so tries to re-use the same one, giving that error message).
I get an error when I disable and re-enable the component again. Let me see if I can replicate the example on one of the out-of-the-box ECF samples to rule out problems with my discovery container.
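For anyone registering the service from code rather than DS markup, the same two export properties discussed above can be put into the properties dictionary. The sketch below only builds that dictionary with the standard library; the class and method names are made up for illustration. The assumption is that the result would be handed to the standard OSGi `BundleContext.registerService(String, Object, Dictionary)` call, which is not shown here so that the example stays runnable without an OSGi framework.

```java
import java.util.Hashtable;

// Illustrative helper; the class and method names are hypothetical.
class ExportPropertiesSketch {
    // Builds the service properties that ask for remote export.
    // Passing null for configType leaves the provider choice to the default.
    static Hashtable<String, Object> exportProperties(String configType) {
        Hashtable<String, Object> props = new Hashtable<>();
        // Export all interfaces the service is registered under.
        props.put("service.exported.interfaces", "*");
        if (configType != null) {
            // Pin the distribution provider, e.g. "ecf.r_osgi.peer".
            props.put("service.exported.configs", configType);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(exportProperties("ecf.r_osgi.peer"));
        System.out.println(exportProperties(null));
    }
}
```

Leaving `service.exported.configs` out corresponds to the situation in the original report, where the default provider ends up being used.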
(In reply to comment #3)
> I'm not specifying the service.exported.configs property; I'm just hoping it
> will find it by virtue of the r-osgi container being present (and started). I'm
> also not sure why the generic container is there - I'm guessing that it's
> because I had the org.eclipse.ecf.provider bundle in my runtime, which seems to
> be needed via the provider.r_osgi and provider.remoteservice bundles. (So, as a
> corollary, there's no way of using r_osgi without the generic ECF server being
> installed
You can use service.exported.configs or set the default provider as per comment 2.
(In reply to comment #5)
> So, putting the service.exported.configs switched it over to using R-OSGi. So I
> was obviously using the generic container before unknowingly.
>
> The question is then why the generic container goes away comes to mind :)
I don't understand what you mean by this.
> Also,
> if the port is selected at startup and then it attempts to re-use that port,
> the serversocket may not have been closed (and so tries to re-use the same one,
> giving that error message).
>
> I get an error when I disable and re-enable the component again.
What error are you receiving now that you are explicitly specifying the use of the r-osgi container? Is it the same as reported in the original bug report? (i.e. port in use?)
> Let me see if
> I can replicate the example on one of the out-of-the-box ECF samples to rule
> out problems with my discovery container.
Ok.
(In reply to comment #3)
> I'm not specifying the service.exported.configs property; I'm just hoping it
> will find it by virtue of the r-osgi container being present (and started).
This goes into the direction of bug #326132
I like the idea of bug #326132 which looks up an OSGi service based on priority and picks the highest one :)
I've tried to reproduce this problem with the Hello DS example, switching the ecf.generic.server for ecf.r_osgi.peer, but that seems to work as expected. So something else is odd with my set up. At least now I know it's registering it via r_osgi, thanks to me setting it in the service.exported.configs. Let me see if I can create a pared down example to attach to this bug, or dig deeper into the problem.
Hmm... wondering if it's my container at fault (again). It's a different error:
osgi> [log;+0100 2011.06.09 18:15:21:172;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:registerRemoteServiceReferences:Remote service is null for remote reference RemoteServiceReference[remoteServiceID=org.eclipse.ecf.remoteservice.RemoteServiceID[containerID=r-osgi://myhostname:9280;containerRelativeID=63];ref=RemoteServiceReference{r-osgi://myhostname:9280#63-[com.gs.example.foo.IFoo]}];severity4;exception=null;children=[]]]
I think I'm ending up with two services for my published service, so when I unregister it tries to bind to the phantom twin. But by the time the phantom twin goes, the 'getService' will return null. And so my container still thinks there's a service there, when in fact the service reference will never resolve. In turn, that throws an exception which prevents any other listeners down the line from seeing it.
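To make the failure mode above concrete, here is a rough sketch of the two defensive steps it suggests: treat a null result from the lookup as a stale reference, and keep one failing listener from hiding the event from the rest. None of this is ECF or OSGi API; `ListenerIsolationSketch`, `deliver`, and the use of `Supplier` and `Consumer` as stand-ins for a service reference and its listeners are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration; Supplier stands in for a 'getService' lookup.
class ListenerIsolationSketch {
    // Resolves the service and hands it to every listener.
    // Returns a list of problems instead of letting one failure stop delivery.
    static List<String> deliver(Supplier<Object> getService,
                                List<Consumer<Object>> listeners) {
        List<String> problems = new ArrayList<>();
        Object service = getService.get();
        if (service == null) {
            // The reference is stale (the "phantom twin"); skip it quietly.
            problems.add("stale reference: getService returned null");
            return problems;
        }
        for (Consumer<Object> listener : listeners) {
            try {
                listener.accept(service);
            } catch (RuntimeException e) {
                // Record the failure and continue with the remaining listeners.
                problems.add("listener failed: " + e.getMessage());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Object> seen = new ArrayList<>();
        List<Consumer<Object>> listeners = new ArrayList<>();
        listeners.add(s -> { throw new IllegalStateException("boom"); });
        listeners.add(seen::add);

        System.out.println(deliver(() -> "IFoo service", listeners));
        System.out.println("second listener saw: " + seen);
        System.out.println(deliver(() -> null, listeners));
    }
}
```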
(In reply to comment #7)
> (In reply to comment #3)
> > I'm not specifying the service.exported.configs property; I'm just hoping it
> > will find it by virtue of the r-osgi container being present (and started).
>
> This goes into the direction of bug #326132
I have no objection to such a strategy...in fact, the IHostContainerSelector structure allows such strategies to be pretty easily implemented and substituted for the default behavior.
I think this can be resolved INVALID - I was returning the set of advertised services as well as discovered, and I had another bug which was tickling a different kind of problem. With those two fixed, this seems to work now.
Closing as per comment
I'm testing a declarative services setup with r-osgi. I have a service registered under DS in one VM, and another DS client in a different VM. I see the warnings "WARNING: Port 9278 already in use. This instance of R-OSGi is running on port 9280" in both client VMs (the client is 9279).
When I start both of these, I see that the component is registered. All is good.
If I stop the DS component service, it stops but I see this error:
osgi> dis 1
Sent request for disabling component foo
osgi> [log;+0100 2011.06.09 12:41:11:953;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:handleDiscoveredServiceAvailable:getRemoteServiceReferences result is empty. containerHelper=RemoteServiceContainer [containerID=StringID[AqtyFGtobI6szak3Cydmsw49HBg=], container=org.eclipse.ecf.provider.generic.TCPClientSOContainer@e964fe, containerAdapter=org.eclipse.ecf.provider.remoteservice.generic.RegistrySharedObject@ba8180]remoteReferences=null;severity4;exception=null;children=[]]]
The client (as expected) becomes unsatisfied, because the remote service has gone away.
However, problems occur when I re-enable the service:
osgi> en 1
[log;+0100 2011.06.09 12:41:13:609;INFO;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=0;message=Exception creating container from ContainerTypeDescription=ContainerTypeDescription[name=ecf.generic.server;instantiator=org.eclipse.ecf.provider.generic.GenericContainerInstantiator@9db992;desc=ECF Generic Server;;severity4;exception=org.eclipse.ecf.core.ContainerCreateException: createInstance;children=[]]]
org.eclipse.ecf.core.ContainerCreateException: createInstance
    at org.eclipse.ecf.provider.generic.GenericContainerInstantiator.createInstance(GenericContainerInstantiator.java:158)
    at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:288)
    at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:246)
    at org.eclipse.ecf.osgi.services.distribution.AbstractContainerFinder.createContainer(AbstractContainerFinder.java:165)
    at org.eclipse.ecf.osgi.services.distribution.AbstractHostContainerFinder.createRSContainer(AbstractHostContainerFinder.java:295)
    at org.eclipse.ecf.osgi.services.distribution.AbstractHostContainerFinder.createDefaultRSContainers(AbstractHostContainerFinder.java:235)
    at org.eclipse.ecf.osgi.services.distribution.AbstractHostContainerFinder.createAndConfigureHostContainers(AbstractHostContainerFinder.java:205)
    at org.eclipse.ecf.osgi.services.distribution.DefaultHostContainerFinder.findHostContainers(DefaultHostContainerFinder.java:47)
    at org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.findHostContainers(EventHookImpl.java:175)
    at org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.handleRegisteredServiceEvent(EventHookImpl.java:98)
    at org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.event(EventHookImpl.java:62)
    at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.notifyEventHooksPrivileged(ServiceRegistry.java:1143)
    at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEventPrivileged(ServiceRegistry.java:743)
    at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEvent(ServiceRegistry.java:711)
    at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.register(ServiceRegistrationImpl.java:130)
    at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.registerService(ServiceRegistry.java:206)
    at org.eclipse.osgi.framework.internal.core.BundleContextImpl.registerService(BundleContextImpl.java:507)
    at org.eclipse.equinox.internal.ds.InstanceProcess.registerService(InstanceProcess.java:504)
    at org.eclipse.equinox.internal.ds.InstanceProcess.buildComponents(InstanceProcess.java:259)
    at org.eclipse.equinox.internal.ds.Resolver.buildNewlySatisfied(Resolver.java:441)
    at org.eclipse.equinox.internal.ds.Resolver.enableComponents(Resolver.java:213)
    at org.eclipse.equinox.internal.ds.SCRManager.performWork(SCRManager.java:800)
    at org.eclipse.equinox.internal.ds.SCRManager$QueuedJob.dispatch(SCRManager.java:767)
    at org.eclipse.equinox.internal.ds.WorkThread.run(WorkThread.java:89)
    at org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor.run(Executor.java:70)
Caused by: java.net.BindException: Address already in use: JVM_Bind
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
    at java.net.ServerSocket.bind(ServerSocket.java:319)
    at java.net.ServerSocket.<init>(ServerSocket.java:185)
    at java.net.ServerSocket.<init>(ServerSocket.java:141)
    at org.eclipse.ecf.provider.comm.tcp.Server.<init>(Server.java:39)
    at org.eclipse.ecf.provider.generic.TCPServerSOContainerGroup.putOnTheAir(TCPServerSOContainerGroup.java:65)
    at org.eclipse.ecf.provider.generic.TCPServerSOContainer.<init>(TCPServerSOContainer.java:60)
    at org.eclipse.ecf.provider.generic.TCPServerSOContainer.<init>(TCPServerSOContainer.java:96)
    at org.eclipse.ecf.provider.generic.GenericContainerInstantiator.createInstance(GenericContainerInstantiator.java:153)
    ... 24 more
[log;+0100 2011.06.09 12:41:13:609;WARNING;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=2;message=org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl:handleRegisteredServiceEvent:No remote service containers found for serviceReference={com.example.foo.IFoo}={component.name=foo, component.id=1, service.exported.interfaces=*, service.id=53}. Service NOT EXPORTED;severity2;exception=null;children=[]]]
Sent request for enabling component foo
It looks like at this point it's trying to create an ECF generic server (instead of using the r-osgi service) and failing, presumably because there's an ECF Generic Server already running somewhere.
So what's happening? It looks like it forgets that it should be distributed over R-OSGi and is trying to spin up a generic ECF container. Of course, only the first ECF container is ever going to succeed on a single machine as it maintains a singleton port; but it's less clear why it's even trying to do this.
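The `BindException` in the trace above is ordinary `java.net.ServerSocket` behaviour and can be reproduced without ECF. The sketch below uses only the standard library; the class name `PortReuseSketch` and the helper `canBind` are made up for illustration. It shows that a second listener on a port that is still bound is rejected, that the port becomes usable again once the first listener is closed, and that passing port 0 lets the OS pick a free port.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Illustrative only: plain java.net sockets, not the ECF generic server itself.
class PortReuseSketch {
    // Tries to open a listener on the given port and closes it again at once.
    static boolean canBind(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Includes java.net.BindException ("Address already in use").
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        int port;
        // Port 0 asks the OS for any free port instead of a fixed one.
        try (ServerSocket first = new ServerSocket(0)) {
            port = first.getLocalPort();
            System.out.println("bind while first listener is open: " + canBind(port));
        }
        // The try-with-resources block above has closed the first listener.
        System.out.println("bind after first listener is closed: " + canBind(port));
    }
}
```

Whether the generic server in this report is hitting a leftover listener from an earlier container or simply a second creation attempt on its fixed port is not something this sketch can tell; it only shows the socket-level behaviour behind the error message.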