| Summary: | ECF remote discovery fails over XMPP | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [RT] ECF | Reporter: | Eugen Reiswich <reiswich> | ||||||
| Component: | ecf.providers | Assignee: | ecf.core-inbox <ecf.core-inbox> | ||||||
| Status: | RESOLVED WORKSFORME | QA Contact: | |||||||
| Severity: | major | ||||||||
| Priority: | P3 | CC: | slewis | ||||||
| Version: | unspecified | ||||||||
| Target Milestone: | --- | ||||||||
| Hardware: | Macintosh | ||||||||
| OS: | Mac OS X - Carbon (unsup.) | ||||||||
| Whiteboard: | |||||||||
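| Attachments: | SourceCode to reproduce the problem (attachment 173454); New SessionServiceImpl.java (attachment 173954) | ||||||||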
Description
Eugen Reiswich
Created attachment 173454
SourceCode to reproduce the problem
Would you please describe what you would like the working state to be? What I mean is something like the following:
Service Interface: IFoo
Impl class: Foo
Processes involved: A, B, C (XMPP clients)
Process A - host for IFoo service
Process B and C consumers/clients of IFoo service
All three assumed/setup to be on each other's roster
Expected sequence (e.g.):
A connects
A roster: empty (B and C not online)
B connects
A roster: B present
B roster: A present
C connects
A roster: B, C present
B roster: A, C present
C roster: A, B present
A calls registerRemoteService(IFoo,Foo,props);
B calls getRemoteServiceReference(...)
C calls getRemoteServiceReference(...)
Desired state: B and C have proxy (consumer) to IFoo service implemented by A (host)
B calls methods on IFoo local proxy (and/or sends messages via IRemoteService.sendAsync)
C calls methods on IFoo local proxy (and/or sends messages via IRemoteService.sendAsync)
A description of the desired state...and of the expected sequence of XMPP connect and remote service registration...will help me to understand the intended use case for the code in attachment 173454.
(In reply to comment #2)
Hi Scott,

> Service Interface: IFoo
> Impl class: Foo

The IExampleService & ExampleServiceImpl are in the org.remotercp.ecf.exampleservice.test bundle.

> Processes involved: A, B, C (XMPP clients)
> Process A - host for IFoo service

IExampleService is hosted by org.remotercp.ecf.producer.test using OSGi DS. This bundle has a listener for presence changes and immediately registers the IExampleService for each newly connected user.

> Process B and C consumers/clients of IFoo service

I'd rather run the example with two hosts and one consumer, because that's the requirement in my case.

Process A & B: hosts for IExampleService; C: consumer of IExampleService.

The consumer is org.remotercp.ecf.consumer.test. As soon as the bundle is started it registers a CommandProvider so that OSGi console commands can be used. Consumer commands:

getService - consumer tries to retrieve a remote service proxy for the IExampleService
getAnyService - consumer tries to retrieve any available service
getUserService #userName - consumer tries to retrieve the IExampleService for a specific host, e.g. "getUserService testa" tries to retrieve the IExampleService registered by host testa.

> Expected sequence:

Host A connects
A roster: empty
Host B connects
A roster: B present --> call registerRemoteService(IExampleService, impl, targetIDs[userB_Id])
B roster: A present --> call registerRemoteService(IExampleService, impl, targetIDs[userA_Id])

Consumer C connects:
A roster: B, C present --> call registerRemoteService(IExampleService, impl, targetIDs[userC_Id])
B roster: A, C present --> call registerRemoteService(IExampleService, impl, targetIDs[userC_Id])
C roster: A, B present (there is no need yet to register services as a consumer, might be useful later)

C calls getRemoteServiceReference(null, IExampleService, null) --> result will be an array with two proxies, one provided by host A and one by host B.
C calls methods on both IExampleService local proxies (and/or sends messages)

C calls getRemoteServiceReference(filterIDs[hostA], IExampleService, null) --> result will be one proxy for host A
C calls methods on the IExampleService local proxy provided by host A

C calls getRemoteServiceReference(filterIDs[hostB], IExampleService, null) --> result will be one proxy for host B
C calls methods on the IExampleService local proxy provided by host B

Hi Eugen,
Thanks for the explanation. I will take some time over the next few days to digest, test, and understand this further.
I have a couple of immediate thoughts/comments/responses...and there will probably be others as we jointly work with this use case.
Some observations from what I've examined so far:
1) In SessionServiceImpl there is this code (in connect method):
container.connect(xmppid, connectContext);
logger.info("************************* \n User: " + userName
+ " connected \n ******************************");
registerRosterListener();
I believe this could be responsible for some of your race conditions with presence notifications, because the roster listener is added (via registerRosterListener()) *after* connect. Upon connect, the XMPP roster is updated asynchronously...and sometimes very quickly...and it can/could happen *before* the registerRosterListener() is even called (by the main thread). This would result in the roster listener *not* being notified about some roster/presence updates (which is what you were seeing at one point I believe...right?).
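The race described above can be reproduced with a toy model. This is a hedged sketch, not ECF code: `ToyContainer`, `addPresenceListener` and `run` are hypothetical names. The model also makes the worst case deterministic by delivering the initial roster presence *inside* `connect()`. That stands in for an asynchronous XMPP roster push that arrives before the calling thread reaches `registerRosterListener()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model, not ECF code: a listener added after connect()
// misses presence updates that were already delivered during connect().
public class LateListenerDemo {

    static class ToyContainer {
        private final List<Consumer<String>> presenceListeners = new ArrayList<>();
        private final List<String> onlineContacts;

        ToyContainer(List<String> onlineContacts) {
            this.onlineContacts = onlineContacts;
        }

        void addPresenceListener(Consumer<String> listener) {
            presenceListeners.add(listener);
        }

        void connect() {
            // Worst case of the race: the server's presence pushes are
            // dispatched before connect() returns to the calling thread.
            for (String contact : onlineContacts) {
                for (Consumer<String> listener : presenceListeners) {
                    listener.accept(contact);
                }
            }
        }
    }

    // Returns the contacts the listener was told about.
    static List<String> run(boolean listenerBeforeConnect) {
        ToyContainer container = new ToyContainer(List.of("userB", "userC"));
        List<String> seen = new ArrayList<>();
        if (listenerBeforeConnect) {
            container.addPresenceListener(seen::add);
            container.connect();
        } else {
            container.connect();
            container.addPresenceListener(seen::add); // too late: pushes already delivered
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("listener after connect:  " + run(false)); // []
        System.out.println("listener before connect: " + run(true));  // [userB, userC]
    }
}
```

The model only covers missed notifications. Later in this thread, the recommended ordering for listeners that *send* messages is refined, because the current XMPP container cannot send until connect has completed.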
2) The ECF remote service API does *not* by default interact with the OSGI service registry (and therefore does not interact with declarative services either). This interaction (i.e. between a remote service and the local OSGI service registry) is the job of the ECF OSGI 4.2 remote services standard impl...i.e. plugins org.eclipse.ecf.osgi.services.discovery and o.e.e.osgi.services.distribution. Based upon your existing code, I think you are assuming that the ECF remote services API usage (i.e. IRemoteServiceContainerAdapter.registerRemoteService) interacts/should interact with the OSGI service registry...and therefore trigger declarative service references...but this is not the case. Actually, just to simplify the interaction and timing of the registration and lookup of remote services I would suggest that we not use declarative services to *start*...because with remote services I'm of the opinion that declarative services can make the *timing* of things somewhat harder to understand (since with remote services everything is happening dynamically rather than statically).
I will be examining things in this code base more over today and subsequent days, and we can continue with diagnosing things.
3) With XMPP, the registerRemoteService(...) call has to be qualified with the target receivers (XMPPIDs) via the service property: org.eclipse.ecf.remoteservice.Constants.SERVICE_REGISTRATION_TARGETS. This is necessary because with XMPP there is no scoping for add registration messages...i.e. who should receive these messages (it can't be *everyone* on IM), it can't be *no one*, so it's not clear who should receive the add registration messages. If you want to see an example of this see this class:
org.eclipse.ecf.tests.provider.xmpp.remoteservice.RemoteServiceTest
in this bundle: org.eclipse.ecf.tests.provider.xmpp in the tests/bundles module:
cvsroot: /cvsroot/rt
module: org.eclipse.ecf/tests/bundles/org.eclipse.ecf.tests.provider.xmpp
You will see a method customizeProperties, which is called by the test code to set up the properties for a given call to registerRemoteService (which is made in the test cases defined in the abstract superclass: org.eclipse.ecf.tests.remoteservice.AbstractRemoteServiceTest).
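The following hedged toy model makes the scoping concrete. It is not the ECF XMPP provider:

- A host's add-registration message reaches only the IDs it lists as targets, in the spirit of `SERVICE_REGISTRATION_TARGETS`.
- Each receiver's local registry can then be queried with an optional host filter, loosely mirroring `getRemoteServiceReferences(ID[] idFilter, String clazz, String filter)`.

`ToyNetwork`, `Ref` and the user names are hypothetical, and real IDs would be XMPP IDs rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not the ECF XMPP provider: add-registration
// messages are delivered only to the listed target IDs.
public class TargetedRegistrationDemo {

    // What a consumer learns from one add-registration message.
    static final class Ref {
        final String hostId;
        final String clazz;

        Ref(String hostId, String clazz) {
            this.hostId = hostId;
            this.clazz = clazz;
        }
    }

    static class ToyNetwork {
        // Each online user's local registry of remote registrations received so far.
        private final Map<String, List<Ref>> registries = new HashMap<>();

        void join(String userId) {
            registries.putIfAbsent(userId, new ArrayList<>());
        }

        // The host sends the add-registration message only to the given targets.
        void registerRemoteService(String hostId, String clazz, String... targets) {
            for (String target : targets) {
                List<Ref> registry = registries.get(target);
                if (registry != null) { // targets that are not online never see it
                    registry.add(new Ref(hostId, clazz));
                }
            }
        }

        // Consumer-side lookup; a null idFilter means "any host".
        List<Ref> getRemoteServiceReferences(String consumerId, String[] idFilter, String clazz) {
            List<Ref> result = new ArrayList<>();
            for (Ref ref : registries.getOrDefault(consumerId, List.of())) {
                boolean hostMatches = idFilter == null || Arrays.asList(idFilter).contains(ref.hostId);
                if (hostMatches && ref.clazz.equals(clazz)) {
                    result.add(ref);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToyNetwork net = new ToyNetwork();
        for (String user : List.of("hostA", "hostB", "consumerC", "userD")) {
            net.join(user);
        }
        String svc = "IExampleService";
        net.registerRemoteService("hostA", svc, "consumerC");
        net.registerRemoteService("hostB", svc, "consumerC");

        System.out.println(net.getRemoteServiceReferences("consumerC", null, svc).size());                    // 2
        System.out.println(net.getRemoteServiceReferences("consumerC", new String[] {"hostA"}, svc).size()); // 1
        System.out.println(net.getRemoteServiceReferences("userD", null, svc).size());                        // 0
    }
}
```

In this model, userD is online but not listed as a target, so its lookup returns an empty result.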
One enhancement that I've been considering for the XMPP provider is to (optionally) allow a filter to be defined (rather than an ID[]) that will send to any matching ID rather than an explicit set. Other approaches are possible as well (e.g. sending to all active IDs on the roster). Neither of these approaches has been added yet (except for the SERVICE_REGISTRATION_TARGETS, of course), but this/your usage could very well be a driving use case.
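The two alternatives mentioned above amount to computing the target set from the roster at registration time. One is a filter instead of an explicit `ID[]`; the other is "all active IDs on the roster". Below is a hedged sketch of both using a hypothetical roster type, a map from user ID to a hypothetical `Presence` value. Neither is an existing ECF option. In a real host, the resulting list would still have to be converted to XMPP IDs and passed via `SERVICE_REGISTRATION_TARGETS`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch, not an ECF API: derive registration targets from a roster.
public class RosterTargetsDemo {

    enum Presence { AVAILABLE, AWAY, UNAVAILABLE }

    // "Filter" variant: every roster ID that matches the predicate.
    static List<String> matching(Map<String, Presence> roster, Predicate<String> filter) {
        return roster.keySet().stream().filter(filter).collect(Collectors.toList());
    }

    // "All active IDs" variant: every roster entry that is currently online.
    static List<String> allActive(Map<String, Presence> roster) {
        return matching(roster, id -> roster.get(id) != Presence.UNAVAILABLE);
    }

    static Map<String, Presence> sampleRoster() {
        Map<String, Presence> roster = new LinkedHashMap<>(); // keeps insertion order
        roster.put("testa@example.org", Presence.AVAILABLE);
        roster.put("testb@example.org", Presence.AWAY);
        roster.put("testc@example.org", Presence.UNAVAILABLE);
        roster.put("bob@example.net", Presence.AVAILABLE);
        return roster;
    }

    public static void main(String[] args) {
        Map<String, Presence> roster = sampleRoster();
        System.out.println(allActive(roster));
        // [testa@example.org, testb@example.org, bob@example.net]
        System.out.println(matching(roster, id -> id.startsWith("test")));
        // [testa@example.org, testb@example.org, testc@example.org]
    }
}
```

Either way, the target set is a snapshot. A user who comes online after the registration is not covered, which is why a reliable presence listener is still needed.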
(In reply to comment #3)
> (In reply to comment #2)
<stuff deleted>
>
> IExampleService is hosted by org.remotercp.ecf.producer.test using OSGi DS.
> This bundle has a listener for presence changes and registers immediately the
> IExampleService to just connected user.
>
> > Process B and C consumers/clients of IFoo service
> I try to run the example rather with two hosts and one consumer because that's
> the requirement in my case.
>
> Process A & B hosts for IExampleService, C consumer for IExampleService
>
> Consumer is org.remotercp.ecf.consumer.test. As soon as the bundle is started
> it registers a CommandProvider in order to use OSGi console commands. Consumer
> commands:
>
> getService - consumer tries to retrieve a remote service proxy for the
> IExampleService
> getAnyService - consumer tries to retrieve any available service
> getUserService #userName - consumer tries to retrieve the IExampleService for a
> specific host e.g. "getUserService testa" tries to retrieve the IExampleService
> registered by host testa.
>
> > Expected sequence:
>
> Host A connects
> A roster empty
> Host B connects
> A roster: B present --> call registerRemoteService(IExampleService, impl,
> targetIDs[userB_Id])
> B roster: A present --> call registerRemoteService(IExampleService, impl,
> targetIDs[userA_Id])
>
> Consumer C connects:
> A roster: B, C present: --> call registerRemoteService(IExampleService,
> impl, targetIDs[userC_Id])
> B roster: A, C present: --> call registerRemoteService(IExampleService,
> impl, targetIDs[userC_Id])
> C roster: A, B present (there is no need yet to register services as a
> consumer, might be useful later)
>
> C calls getRemoteServiceReference(null, IExampleService, null) --> result will
> be an array with two proxies. One provided by host A, one by host B.
> C calls methods on both IExampleService local proxies (and/or sends
> messages)
>
> C calls getRemoteServiceReference(filterIDs[hostA], IExampleService, null) -->
> result will be one proxy for host A
> C calls methods on IExampleService local proxy provided by hostA
>
> C calls getRemoteServiceReference(filterIDs[hostB], IExampleService, null) -->
> result will be one proxy for hostB
> C calls methods on IExampleService local proxy provided by hostB

(In reply to comment #4)
Hi Scott,

> Some observations from what I've examined so far:
>
> 1) In SessionServiceImpl there is this code (in connect method):
>
> container.connect(xmppid, connectContext);
>
> logger.info("************************* \n User: " + userName
> + " connected \n ******************************");
>
> registerRosterListener();
>
> I believe this could be responsible for some of your race conditions with presence notifications, because the registerRosterListener() is added *after* connect. Upon connect, the XMPP roster is updated asynchronously...and sometimes very quickly...and it can/could happen *before* the registerRosterListener() is even called (by the main thread). This would result in the roster listener *not* being notified about some roster/presence updates (which is what you were seeing at one point I believe...right?).

I thought registering presence listeners required an initialized ECF container object, which is only created after a connection is made. But if that's not true, I'll try to register listeners before connecting to an XMPP server.

> 2) The ECF remote service API does *not* by default interact with the OSGI service registry (and therefore does not interact with declarative services either). This interaction (i.e. between a remote service and the local OSGI service registry) is the job of the ECF OSGI 4.2 remote services standard impl...i.e. plugins org.eclipse.ecf.osgi.services.discovery and o.e.e.osgi.services.distribution. Based upon your existing code, I think you are assuming that the ECF remote services API usage (i.e. IRemoteServiceContainerAdapter.registerRemoteService) interacts/should interact with the OSGI service registry...and therefore trigger declarative service references...but this is not the case.

I don't expect ECF to interact with either the OSGi service registry or DS.
I use DS in my example only for the ISessionService, which my host and client bundles need in order to register and retrieve the IExampleService. It's just for convenience, instead of working with ServiceTrackers.

> 3) With XMPP, the registerRemoteService(...) call has to be qualified with the target receivers (XMPPIDs) via the service property: org.eclipse.ecf.remoteservice.Constants.SERVICE_REGISTRATION_TARGETS. This is necessary because with XMPP there is no scoping for add registration messages...i.e. who should receive these messages (it can't be *everyone* on IM), it can't be *no one*, so it's not clear who should receive the add registration messages.

That's exactly why I need a working presence listener! Since XMPP users might connect and disconnect over and over again, I need a reliable way to listen for new users and to register remote services for newly connected users.

> If you want to see an example of this see this class:
>
> org.eclipse.ecf.tests.provider.xmpp.remoteservice.RemoteServiceTest

I'm already doing this within the ISessionService.registerRemoteService(...) method.

> One enhancement that I've been considering for the XMPP provider is to (optionally), allow a filter to be defined (rather than an ID[]) that will send to any matching ID rather than an explicit set. Other approaches are possible as well (e.g. sending to all active IDs on the roster).

Sending a service registration to all active IDs would be really helpful, especially because users can connect and disconnect over time. It's comparable to OSGi services, which can also come and go at any time. Before DS was introduced to OSGi, it was a nightmare to handle these dynamics with ServiceTracker.

> Neither of these approaches has been added yet (except for the SERVICE_REGISTRATION_TARGETS, of course), but this/your usage could very well be a driving use case.

As soon as I get my example application working reliably with ECF over XMPP, I'd be glad to think about further collaboration. For now I'm working on too many different projects and would like to finish some of them.

(In reply to comment #3)
<stuff deleted>
> > Process B and C consumers/clients of IFoo service
> I try to run the example rather with two hosts and one consumer because that's the requirement in my case.
>
> Process A & B hosts for IExampleService, C consumer for IExampleService

Eugen, would you let me know how to launch the two hosts (i.e. A & B)? I mean, you have an ECF Example Host.launch launch config, but that will run one host (right?). How should the other one be started?

<stuff deleted>
> > Expected sequence:
>
> Host A connects
> A roster empty
> Host B connects
> A roster: B present --> call registerRemoteService(IExampleService, impl, targetIDs[userB_Id])
> B roster: A present --> call registerRemoteService(IExampleService, impl, targetIDs[userA_Id])

If I'm understanding your use case, these two registrations are irrelevant and not needed. The reason for this is that your registration on A is targeted to B (i.e. targetIDs[userB_Id]), and the one on B is targeted to A (i.e. targetIDs[userA_Id]). Unless I'm misunderstanding, Host B isn't a consumer of Host A's service, and Host A is not a consumer of Host B's service...so these are superfluous (I think).

>
> Consumer C connects:
> A roster: B, C present: --> call registerRemoteService(IExampleService, impl, targetIDs[userC_Id])
> B roster: A, C present: --> call registerRemoteService(IExampleService, impl, targetIDs[userC_Id])
> C roster: A, B present (there is no need yet to register services as a consumer, might be useful later)

These are the important registrations...because they are targeted at the consumer (C).

>
> C calls getRemoteServiceReference(null, IExampleService, null) --> result will be an array with two proxies. One provided by host A, one by host B.
> C calls methods on both IExampleService local proxies (and/or sends messages)

What are you seeing returned from getRemoteServiceReference(null, IExampleService, null) on the consumer? Nothing returned (null)? Or something else?

>
> C calls getRemoteServiceReference(filterIDs[hostA], IExampleService, null) --> result will be one proxy for host A
> C calls methods on IExampleService local proxy provided by hostA
>
> C calls getRemoteServiceReference(filterIDs[hostB], IExampleService, null) --> result will be one proxy for hostB
> C calls methods on IExampleService local proxy provided by hostB

Again, what are you actually seeing when these calls are made?

Thanks. I think we're narrowing things down...and we'll get this working/fixed, one way or another. Please just be patient.

BTW...one thought: You ultimately might want to consider using ECF's RemoteServiceTracker on the consumer...to more easily deal with the asynchronous/dynamic nature and simplify the client code. See this class: org.eclipse.ecf.remoteservice.util.tracker.RemoteServiceTracker. Note it's based upon the ServiceTracker, but works explicitly with the IRemoteServiceContainerAdapter (and IRemoteServiceListener). I'm not suggesting making this change at the moment, however...let's see if we can get things working as desired/expected with your existing codebase before making such a change.

WRT DS and dynamics...I have to admit to you that although I very much like the DS service dependency handling, I sometimes find DS harder to understand WRT dynamics (i.e. than ServiceTracker)...partially because the implications of the cardinality constraints (i.e. 0..n) can be hard (admittedly for me) to understand...and because everything happens at runtime (policy="dynamic") rather than framework start time (policy="static"). The timing of things can be very complicated. At least with the ServiceTrackerCustomizer the runtime behavior is clear (i.e.
*this method* is called when a [remote] service is added to the local service registry...and it's called *every time* the service is added). I do understand that the binding/component activation is similarly done when the policy is dynamic...but in my own mind this is somewhat more subtle...especially WRT timing. In any event, I'm not trying to convince you of anything...I'm just making an observation. I still think that DS is great.

Hi Scott,

> Eugen, would you let me know how to launch the two hosts (i.e. A & B)? I mean, you have an ECF Example Host.launch launch config, but that will run one host (right?). How should the other one be started?

org.remotercp.ecf.producer.test has a server.properties file where you can change the login information: testb, testc and testd.
1. testa is reserved for the consumer
2. choose e.g. testb to start hostB
3. change the login to testc and use the same launch config to start hostC
4. change the login to testd and use the same launch config to start hostD

> > Host A connects
> > A roster empty
> > Host B connects
> > A roster: B present --> call registerRemoteService(IExampleService, impl, targetIDs[userB_Id])
> > B roster: A present --> call registerRemoteService(IExampleService, impl, targetIDs[userA_Id])
>
> If I'm understanding your use case, these two registrations are irrelevant and not needed. The reason for this is that your registration on A is targeted to B (i.e. targetIDs[userB_Id]), and the one on B is targeted to A (i.e. targetIDs[userA_Id]). Unless I'm misunderstanding, Host B isn't a consumer of Host A's service, and Host A is not a consumer of Host B's service...so these are superfluous (I think).

Basically you are right. But in a peer-to-peer network I don't really know who's a host and who's a client. That's why I always register services with every user who comes online.

> > C calls getRemoteServiceReference(null, IExampleService, null) --> result will be an array with two proxies.
> > One provided by host A, one by host B.
> > C calls methods on both IExampleService local proxies (and/or sends messages)
>
> What are you seeing returned from getRemoteServiceReference(null, IExampleService, null) on the consumer? Nothing returned (null)? Or something else?

Before you fixed the bug I got an NPE. Now I get an empty IRemoteServiceReference[] array.

> > C calls getRemoteServiceReference(filterIDs[hostA], IExampleService, null) --> result will be one proxy for host A
> > C calls methods on IExampleService local proxy provided by hostA
> >
> > C calls getRemoteServiceReference(filterIDs[hostB], IExampleService, null) --> result will be one proxy for hostB
> > C calls methods on IExampleService local proxy provided by hostB
>
> Again, what are you actually seeing when these calls are made?

These methods always return an empty IRemoteServiceReference[] array.

> Thanks. I think we're narrowing things down...and we'll get this working/fixed, one way or another. Please just be patient.

Sure, thank you.

> BTW...one thought: You ultimately might want to consider using ECF's RemoteServiceTracker on the consumer...to more easily deal with the asynchronous/dynamic nature and simplify the client code. See this class: org.eclipse.ecf.remoteservice.util.tracker.RemoteServiceTracker. Note it's based upon the ServiceTracker, but works explicitly with the IRemoteServiceContainerAdapter (and IRemoteServiceListener).

Good to know, but right now I'm not able to retrieve any remote service from my IRemoteServiceContainerAdapter, and that adapter is exactly what the RemoteServiceTracker requires as a constructor parameter. I'll pay attention to dynamics in my application as soon as I'm able to retrieve remote services reliably.

> WRT DS and dynamics...I have to admit to you that although I very much like the DS service dependency handling, I sometimes find DS harder to understand WRT dynamics (i.e.
> than ServiceTracker)...partially because the implications of the cardinality constraints (i.e. 0..n) can be hard (admittedly for me) to understand...and because everything happens at runtime (policy="dynamic") rather than framework start time (policy="static"). The timing of things can be very complicated.
> At least with the ServiceTrackerCustomizer the runtime behavior is clear (i.e. *this method* is called when a [remote] service is added to the local service registry...and it's called *every time* the service is added). I do understand that the binding/component activation is similarly done when the policy is dynamic...but in my own mind this is somewhat more subtle...especially WRT timing.
> In any event, I'm not trying to convince you of anything...I'm just making an observation. I still think that DS is great.

I agree that DS is sometimes hard to understand. In particular, the policies you've mentioned introduce a service component lifecycle in addition to the bundle lifecycle, and it took me a while to understand how the two relate to each other. But what I really like about DS is the service orchestration concept for services that require other services in order to run properly: optional and mandatory dependencies, and the handling of concurrency. This cannot practically be done with ServiceTrackers.

BTW: to start a component together with the framework, you can use the option "This component is immediately activated" in the DS editor.

Hi Eugen.
I've tracked down the problem. The main reason that your code is not working is because the initialization of the RegistrySharedObject (actually class org.eclipse.ecf.internal.provider.xmpp.XMPPRemoteServiceAdapterFactory.XMPPRegistrySharedObject) is *lazy*...meaning that the initialization/creation of the remote service registry doesn't happen until the first access...and in the case of the consumer the first access might very well be only when the user calls 'getService' from the command provider. If that's when it's initialized (as it is in your code), then it will have already missed the add registration message sent from the remote host.
The workaround is very simple, however...it's just to initialize the remote service adapter 'more' eagerly...e.g. this code change to SessionServiceImpl.connect does it:
public void connect(String userName, String password, String server)
throws URISyntaxException, ECFException {
...
container.connect(xmppid, connectContext);
// XXX this is added so that the remote service registry gets initialized
// The first call to getAdapter will result in the creation/addition of the
// XMPPRegistrySharedObject
IRemoteServiceContainerAdapter adapter = (IRemoteServiceContainerAdapter) container.getAdapter(IRemoteServiceContainerAdapter.class);
logger.info("Container adapter created="+adapter);
// Then add roster listener
registerRosterListener();
...
In my tests (admittedly with just one host and one consumer as that's all I've had time for today), this allows the getService call to succeed, and it returns the remote IExampleService as desired (on the consumer, of course).
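A toy model can illustrate two things: the effect of the lazy `XMPPRegistrySharedObject` creation, and why touching the adapter right after connect helps. This is a hedged sketch, not ECF code. `ToyContainer`, `getAdapter()` and `onRegistrationMessage` are hypothetical stand-ins. The model treats an add-registration message that arrives before the registry exists as lost, matching the "missed the add registration message" description above. How the real provider handles that case internally is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not ECF code: a remote service registry that is
// created lazily on the first getAdapter() call, so earlier messages are lost.
public class LazyRegistryDemo {

    static class ToyContainer {
        private List<String> registry; // null until the first getAdapter() call

        // Stand-in for container.getAdapter(IRemoteServiceContainerAdapter.class).
        List<String> getAdapter() {
            if (registry == null) {
                registry = new ArrayList<>(); // created on first access
            }
            return registry;
        }

        // Called when a host's add-registration message arrives.
        void onRegistrationMessage(String registration) {
            if (registry != null) {
                registry.add(registration);
            }
            // else: no registry exists yet, so the message is dropped
        }
    }

    // Returns how many remote services the consumer sees when it finally looks.
    static int servicesSeen(boolean eagerAdapter) {
        ToyContainer consumer = new ToyContainer();
        // container.connect(...) would happen here.
        if (eagerAdapter) {
            consumer.getAdapter(); // the workaround: touch the adapter right after connect
        }
        consumer.onRegistrationMessage("IExampleService from hostA"); // host registers immediately
        // Later, the user types 'getService' on the OSGi console:
        return consumer.getAdapter().size();
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + servicesSeen(false)); // lazy:  0
        System.out.println("eager: " + servicesSeen(true));  // eager: 1
    }
}
```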
I understand that this behavior should probably be fixed...i.e. perhaps optionally having the rs adapter created as eagerly as possible. There's another interaction issue, however, with the xmpp connect and presence delivery sequence...since xmpp presence messages are delivered asynchronously to the connect sequence it's possible to receive a presence notification *before* the thread that calls connect is complete...and for the current XMPP container the connect has to be completed before it can start sending messages (i.e. if it receives a presence message and as a result registers a remote service as your code does, then at that point it is sending a message).
So the upshot of this is that currently the thread that calls connect should first call connect, then get the IRemoteServiceContainerAdapter (as above change does), and then register presence listeners (that have the behavior of registering remote services and consequently sending messages).
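The ordering constraint above can be sketched with another toy model (hypothetical names, not ECF code). Here `connect()` dispatches presence updates *before* it marks the container as connected, and `send()` refuses to run until then. This mirrors the statement that the current XMPP container must finish connecting before it can send messages. The exception type and message are assumptions of the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model, not ECF code: a presence listener that sends
// messages fails if it fires before connect() has completed.
public class ConnectOrderingDemo {

    static class ToyContainer {
        private final List<Consumer<String>> presenceListeners = new ArrayList<>();
        final List<String> sent = new ArrayList<>();
        private boolean connected;

        void addPresenceListener(Consumer<String> listener) {
            presenceListeners.add(listener);
        }

        // Presence for already-online contacts may arrive before connect() completes.
        void connect(List<String> onlineContacts) {
            onlineContacts.forEach(this::presenceArrived);
            connected = true;
        }

        void presenceArrived(String contact) {
            presenceListeners.forEach(listener -> listener.accept(contact));
        }

        void send(String message) {
            if (!connected) {
                throw new IllegalStateException("not connected yet: " + message);
            }
            sent.add(message);
        }
    }

    // Wires things up in the given order; returns the sent messages or the error.
    static String run(boolean listenerFirst) {
        ToyContainer c = new ToyContainer();
        // Like the example host: register the service for every user who comes online.
        Consumer<String> registerFor = contact -> c.send("register IExampleService -> " + contact);
        try {
            if (listenerFirst) {
                c.addPresenceListener(registerFor);
                c.connect(List.of("consumerC"));
            } else {
                c.connect(List.of("consumerC"));
                // ...the remote service adapter would be fetched here, before any listener...
                c.addPresenceListener(registerFor);
            }
            c.presenceArrived("consumerD"); // a user who comes online later
            return c.sent.toString();
        } catch (IllegalStateException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // error: not connected yet: register IExampleService -> consumerC
        System.out.println(run(false)); // [register IExampleService -> consumerD]
    }
}
```

In this model, the working order gives no registration to consumerC, who was already online at connect time. The thread leaves open how to relax this without reintroducing the timing problem.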
I will think about how this can/should be made less restrictive while still avoiding the timing issues. There may be something that can be done in the container implementation to deal with these asynchrony-induced complications. I'll also think about alternatives WRT the whole service registration/lookup protocol for XMPP...i.e. to deal with the fact that XMPP has no context for host registration (i.e. no way to know which users the registration is intended for in advance).
I'm going on vacation for 5 days, and will be away from Internet access (and even phone access). I will think about these issues while gone, and will return to fixing them properly upon return. In the mean time, I think the workaround should get you going.
Created attachment 173954
New SessionServiceImpl.java
This is a new SessionServiceImpl.java source file, with the adapter access discussed in the comments and with an IRemoteServiceListener added for debugging (it will asynchronously notify when a remote service is added).
(In reply to comment #9)
Hi Scott,

> I've tracked down the problem. The main reason that your code is not working is because the initialization of the RegistrySharedObject (actually class org.eclipse.ecf.internal.provider.xmpp.XMPPRemoteServiceAdapterFactory.XMPPRegistrySharedObject) is *lazy*...meaning that the initialization/creation of the remote service registry doesn't happen until the first access...and in the case of the consumer the first access might very well be only when the user calls 'getService' from the command provider. If that's when it's initialized (as it is in your code), then it will have already missed the add registration message sent from the remote host.
>
> The workaround is very simple, however...it's just to initialize the remote service adapter 'more' eagerly...e.g. this code change to SessionServiceImpl.connect does it:
>
> public void connect(String userName, String password, String server)
> throws URISyntaxException, ECFException {
>
> ...
>
> container.connect(xmppid, connectContext);
>
> // XXX this is added so that the remote service registry gets initialized
> // The first call to getAdapter will result in the creation/addition of the
> // XMPPRegistrySharedObject
> IRemoteServiceContainerAdapter adapter = (IRemoteServiceContainerAdapter)
> container.getAdapter(IRemoteServiceContainerAdapter.class);
> logger.info("Container adapter created="+adapter);
>
> // Then add roster listener
> registerRosterListener();

I just tried this fix and it seems to work! I will do more testing within the next few days to see if the problem is really gone. Thanks for your help!

> I understand that this behavior should probably be fixed...i.e. perhaps optionally having the rs adapter created as eagerly as possible.

I ran into this problem because I didn't know that the IRemoteServiceContainerAdapter had to be initialized first, and I guess other ECF users will run into it as well.
Resolving as WORKSFORME. Will pursue resolution of the lazy-loading consequences described in comment 8 and comment 10 via other bugs.