Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 334453 - [remoteservice][r-osgi] ChannelWorkerThread: NullPointerException
Summary: [remoteservice][r-osgi] ChannelWorkerThread: NullPointerException
Status: RESOLVED FIXED
Alias: None
Product: ECF
Classification: RT
Component: ecf.providers (show other bugs)
Version: 3.4.0   Edit
Hardware: PC Windows Vista
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: ecf.core-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-15 05:49 EST by Martin Petzold CLA
Modified: 2016-12-03 20:33 EST (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Petzold CLA 2011-01-15 05:49:41 EST
There are several services beeing registered roughly the same
time.

There is this ERROR ("getRemoteServiceReferences result is empty")
sometimes not, and often also with an Exception on provider side:

----- Consumer end-----

[log;+0100 2011.01.14
18:39:27:542;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remoteservice;code=0;message=No
async remote service interface found with
name=devsosgi.simulator.IAtomicSimulatorAsync for proxy service
class=devsosgi.simulator.IAtomicSimulator;severity2;exception=null;children=[]]]
[log;+0100 2011.01.14
18:39:27:557;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remoteservice;code=0;message=No
async remote service interface found with
name=devsosgi.simulator.IAtomicSimulatorAsync for proxy service
class=devsosgi.simulator.IAtomicSimulator;severity2;exception=null;children=[]]]
[log;+0100 2011.01.14
18:39:29:973;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:handleDiscoveredServiceAvailable:getRemoteServiceReferences
result is empty. containerHelper=RemoteServiceContainer
[containerID=r-osgi://jumper:9278,
container=org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer@1b8d4c9a,
containerAdapter=org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer@1b8d4c9a]remoteReferences=null;severity4;exception=null;children=[]]]

----- Provider end-----

!ENTRY org.eclipse.osgi 4 0 2011-01-14 18:39:23.347
!MESSAGE An unexpected runtime error has occurred.
!STACK 0
java.lang.NullPointerException
    at
ch.ethz.iks.r_osgi.impl.RemoteOSGiServiceImpl$3.modifiedService(RemoteOSGiServiceImpl.java:556)
    at
org.osgi.util.tracker.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:909)
    at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:237)
    at
org.osgi.util.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:840)
    at
org.eclipse.osgi.internal.serviceregistry.FilteredServiceListener.serviceChanged(FilteredServiceListener.java:104)
    at
org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(BundleContextImpl.java:933)
    at
org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:227)
    at
org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:149)
    at
org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEventPrivileged(ServiceRegistry.java:756)
    at
org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEvent(ServiceRegistry.java:711)
    at
org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:171)
    at
org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer.registerRemoteService(R_OSGiRemoteServiceContainer.java:370)
    at
org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer.registerRemoteService(R_OSGiRemoteServiceContainer.java:342)
    at
org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.handleRegisteredServiceEvent(EventHookImpl.java:126)
    at
org.eclipse.ecf.internal.osgi.services.distribution.Activator.registerExistingServices(Activator.java:196)
    at
org.eclipse.ecf.internal.osgi.services.distribution.Activator.access$0(Activator.java:187)
    at
org.eclipse.ecf.internal.osgi.services.distribution.Activator$1.run(Activator.java:181)
    at java.lang.Thread.run(Unknown Source)

!ENTRY ch.ethz.iks.r_osgi.remote 4 0 2011-01-14 18:39:23.348
!MESSAGE 
!STACK 0
java.lang.NullPointerException
    at
ch.ethz.iks.r_osgi.impl.RemoteOSGiServiceImpl$3.modifiedService(RemoteOSGiServiceImpl.java:556)
    at
org.osgi.util.tracker.ServiceTracker$Tracked.customizerModified(ServiceTracker.java:909)
    at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:237)
    at
org.osgi.util.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:840)
    at
org.eclipse.osgi.internal.serviceregistry.FilteredServiceListener.serviceChanged(FilteredServiceListener.java:104)
    at
org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(BundleContextImpl.java:933)
    at
org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:227)
    at
org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:149)
    at
org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEventPrivileged(ServiceRegistry.java:756)
    at
org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEvent(ServiceRegistry.java:711)
    at
org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:171)
    at
org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer.registerRemoteService(R_OSGiRemoteServiceContainer.java:370)
    at
org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer.registerRemoteService(R_OSGiRemoteServiceContainer.java:342)
    at
org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.handleRegisteredServiceEvent(EventHookImpl.java:126)
    at
org.eclipse.ecf.internal.osgi.services.distribution.Activator.registerExistingServices(Activator.java:196)
    at
org.eclipse.ecf.internal.osgi.services.distribution.Activator.access$0(Activator.java:187)
    at
org.eclipse.ecf.internal.osgi.services.distribution.Activator$1.run(Activator.java:181)
    at java.lang.Thread.run(Unknown Source)
[log;+0100 2011.01.14
18:39:27:369;INFO;org.eclipse.ecf.osgi.services.discovery;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.discovery;code=1;message=serviceDiscovered:ignoring
loopback ServiceReference:
ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://192.168.0.104:9279/svc_qmjBascz9BO4EwHO4/uSYij33dU=;full=_osgiservices._tcp.default._iana@osgiservices://192.168.0.104:9279/svc_qmjBascz9BO4EwHO4/uSYij33dU=];severity1;exception=null;children=[]]]
[14.01.2011 18:39:28][devsosgi.simulator.atomic.AtomicComponentTracker] INFO :
devsosgi.model.crossing.Crossing -> Atomic simulator created and registered.
[log;+0100 2011.01.14
18:39:29:972;INFO;org.eclipse.ecf.osgi.services.discovery;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.discovery;code=1;message=serviceDiscovered:ignoring
loopback ServiceReference:
ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://192.168.0.104:9279/svc_0PPDnFHzpBeWvffGume2FweTrHk=;full=_osgiservices._tcp.default._iana@osgiservices://192.168.0.104:9279/svc_0PPDnFHzpBeWvffGume2FweTrHk=];severity1;exception=null;children=[]]]

----- Sometimes with this on provider end-----

Exception in thread "r-OSGi ChannelWorkerThread0"
java.lang.NullPointerException
    at
ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl.populateLease(ChannelEndpointImpl.java:1383)
    at
ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl.handleMessage(ChannelEndpointImpl.java:1025)
    at
ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl$2.run(ChannelEndpointImpl.java:288)
    at
ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl$1.run(ChannelEndpointImpl.java:253)
Comment 1 Markus Kuppe CLA 2011-01-15 05:55:03 EST
To see if this is a race condition please synchronize org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.handleRegisteredServiceEvent(ServiceReference, Collection) and rerun your tests.
Comment 2 Martin Petzold CLA 2011-01-15 06:43:46 EST
In this case I have a ServiceTracker tracking local services in the framework and in the addingService() method registering for each of them a remote service (different type and use!).

The error is now fixed for me with a synchronize around my registering inside of addingService().
Comment 3 Markus Kuppe CLA 2011-01-15 06:45:31 EST
(In reply to comment #2)
> In this case I have a ServiceTracker tracking local services in the framework
> and in the addingService() method registering for each of them a remote service
> (different type and use!).
> 
> The error is now fixed for me with a synchronize around my registering inside
> of addingService().

That's not what I wanted to know. Please synchronize org.eclipse.ecf.internal.osgi.services.distribution.EventHookImpl.handleRegisteredServiceEvent(ServiceReference,
Collection) instead of your client code.
Comment 4 Martin Petzold CLA 2011-01-15 06:48:23 EST
Hi Markus,

I know, was just happy to have it running for me just now =).

I will check the other synchronize in a while, so we can fix this in the distribution.
Comment 5 Martin Petzold CLA 2011-01-15 09:59:18 EST
I have now used the working project of "org.eclipse.ecf..osgi.services.distribution" and the problem does not come up anymore (without changing "handleRegisteredServiceEvent(...)" to synchronized).

Possibly this bug has been fixed due to this fix https://bugs.eclipse.org/bugs/show_bug.cgi?id=331836

I'm really sorry I did not do the update before! Thanks a lot!

*** This bug has been marked as a duplicate of bug 331836 ***
Comment 6 Martin Petzold CLA 2011-01-15 10:54:50 EST
I'm sorry, I have to reopen this. There is (only sometimes!) still the exception on the consumer side. It is very hard to force it, I will try to work on a test case!

-----

Exception in thread "r-OSGi ChannelWorkerThread0" java.lang.NullPointerException
	at ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl.handleMessage(ChannelEndpointImpl.java:1084)
	at ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl$2.run(ChannelEndpointImpl.java:288)
	at ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl$1.run(ChannelEndpointImpl.java:253)
[log;+0100 2011.01.15 16:44:57:845;ERROR;org.eclipse.ecf.osgi.services.distribution;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.distribution;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.DiscoveredServiceTrackerImpl:handleDiscoveredServiceAvailable:getRemoteServiceReferences result is empty. containerHelper=RemoteServiceContainer [containerID=r-osgi://jumper:9278, container=org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer@13ca5df9, containerAdapter=org.eclipse.ecf.internal.provider.r_osgi.R_OSGiRemoteServiceContainer@13ca5df9]remoteReferences=null;severity4;exception=null;children=[]]]
Comment 7 Martin Petzold CLA 2011-02-01 08:21:09 EST
I am working on this bug.

But just for ECF 3.5, in my opinion this is critical (so I raised the importance, even though it's just with a lot rs interacting). Perhaps Jan Rellermeyer will have a look at it then?!
Comment 8 Markus Kuppe CLA 2011-02-01 10:30:46 EST
(In reply to comment #7)
> I am working on this bug.
> 
> But just for ECF 3.5, in my opinion this is critical (so I raised the
> importance, even though it's just with a lot rs interacting). 

Good question, does a bug in a provider justify postponing an ECF release. It's probably best if you take this discussion to the mailinglist. But given that there is no ETA for a possible fix, I'm not sure the community agrees to postpone 3.5 indefinitely.

> Perhaps Jan
> Rellermeyer will have a look at it then?!

I wouldn't count on it. AFAIK Jan is busy defending his Phd thesis. The person most likely to fix this, is you. 
Nevertheless, if you can isolate the issue to be a pure r-OSGi bug (manage to reproduce it independently of ECF), raising a bug over at http://r-osgi.sf.net won't hurt either.
Comment 9 Martin Petzold CLA 2011-02-01 11:14:08 EST
Just to document this. A 'SERVICE_MODIFIED' msg of a service is processed (in handleMessage) before the 'SERVICE_ADDED' msg of this service. So either a thread concurrency or a msg queue problem.

PUT: r-osgi://jumper:9279#1022RemoteServiceReference{r-osgi://jumper:9279#1022-[devsosgi.event.IEventHandler]}
ZooDiscovery> Service Discovered: 01.02.2011 17:06:10. ServiceInfo[uri=osgiservices://jumper:9279/svc_2V/AY3tGIuX369IEdm1bs9cKR7k=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://jumper:9279/svc_2V/AY3tGIuX369IEdm1bs9cKR7k=;full=_osgiservices._tcp.default._iana@osgiservices://jumper:9279/svc_2V/AY3tGIuX369IEdm1bs9cKR7k=];priority=0;weight=0;props=ServiceProperties[{event.topics=devsosgi/model/crossing_3_3/out_west, osgi.remote.service.interfaces=devsosgi.event.IEventHandler, ecf.rsvc.ns=ecf.namespace.r_osgi.remoteservice, ecf.sp.cns=ecf.namespace.r_osgi, ecf.sp.ect=ecf.r_osgi.peer, ecf.rsvc.id=org.eclipse.ecf.discovery.ServiceProperties$ByteArrayWrapper@1092d6d2, org.eclipse.ecf.internal.discovery.id=org.eclipse.ecf.discovery.ServiceProperties$ByteArrayWrapper@18557d7e, ecf.sp.cid=org.eclipse.ecf.discovery.ServiceProperties$ByteArrayWrapper@14a1e7ad}]]
r-osgi://jumper:9279#1024
RS MAP SIZE:294
[01.02.2011 17:06:10][org.apache.zookeeper.server.NIOServerCnxn] ERROR: Thread Thread[r-OSGi ChannelWorkerThread0,5,WorkerThreadsChannelEndpoint(TCPChannel (r-osgi://jumper:9279))] died
java.lang.NullPointerException
	at ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl.handleMessage(ChannelEndpointImpl.java:1096)
	at ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl$2.run(ChannelEndpointImpl.java:289)
	at ch.ethz.iks.r_osgi.impl.ChannelEndpointImpl$1.run(ChannelEndpointImpl.java:254)
PUT: r-osgi://jumper:9279#1024RemoteServiceReference{r-osgi://jumper:9279#1024-[devsosgi.event.IEventHandler]}
Comment 10 Martin Petzold CLA 2011-02-01 12:06:39 EST
For each msg a Thread is created and then put in a work queue (as in http://www.ibm.com/developerworks/library/j-jtp0730.html). But the second thread get's notified first or is just perhaps faster (because of multicore system)?

@Scott: This should be somehow a known issue, as concurrency speciallist, do you have any link to some multicore processing safe worker queue pattern?

----

MSG: 9 - 1 - 415
ZooDiscovery> Service Discovered: 01.02.2011 18:00:38. ServiceInfo[uri=osgiservices://jumper:9279/svc_4yTjR7pJM/jGwCgSmcGqAQattZ0=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://jumper:9279/svc_4yTjR7pJM/jGwCgSmcGqAQattZ0=;full=_osgiservices._tcp.default._iana@osgiservices://jumper:9279/svc_4yTjR7pJM/jGwCgSmcGqAQattZ0=];priority=0;weight=0;props=ServiceProperties[{event.topics=devsosgi/model/crossing_1_5/out_south, osgi.remote.service.interfaces=devsosgi.event.IEventHandler, ecf.rsvc.ns=ecf.namespace.r_osgi.remoteservice, ecf.sp.cns=ecf.namespace.r_osgi, ecf.sp.ect=ecf.r_osgi.peer, ecf.rsvc.id=org.eclipse.ecf.discovery.ServiceProperties$ByteArrayWrapper@2f6e4ddd, org.eclipse.ecf.internal.discovery.id=org.eclipse.ecf.discovery.ServiceProperties$ByteArrayWrapper@72ba007e, ecf.sp.cid=org.eclipse.ecf.discovery.ServiceProperties$ByteArrayWrapper@11768b0a}]]
MSG: 9 - 2 - 415
HANDLE MSG: 9 - 2 - 415
r-osgi://jumper:9279#415
HANDLE MSG: 9 - 1 - 415
PUT: r-osgi://jumper:9279#415RemoteServiceReference{r-osgi://jumper:9279#415-[devsosgi.event.IEventHandler]}
Comment 11 Scott Lewis CLA 2011-02-01 17:20:09 EST
(In reply to comment #10)
> For each msg a Thread is created and then put in a work queue (as in
> http://www.ibm.com/developerworks/library/j-jtp0730.html). But the second
> thread get's notified first or is just perhaps faster (because of multicore
> system)?
> 
> @Scott: This should be somehow a known issue, as concurrency speciallist, do
> you have any link to some multicore processing safe worker queue pattern?

First, I need to say that I haven't looked at this code at all (yet)...so I'm not completely clear on what's going wrong now.

From this bug, it sounds as if perhaps incoming messages are currently processed by multiple threads...and that this can lead to messages getting processed out of the correct order (because thread B created/started after thread A might actually have the message processed first).  This obviously is problematic...if the protocol depends upon message ordering (and I would be surprised if it didn't).

If message processing can occur out of order (by out-of-order processing by multiple ChannelWorkerThreads), then it seems it might make make more sense to have the socket read thread place messages into a fifo queue...which could then be read by a single message processing thread.  But before I speculate any further, I'll look some more at the existing thread.  But unfortunately I don't have a lot of resources (time) for this at the moment.  It would be best if Jan could be engaged (even with his PhD work), since he's obviously the person most familiar with this code.

One possibility...perhaps if this system property was to be set to 1:

-Dch.ethz.iks.r_osgi.threadsPerEndpoint=1

Since MAX_THREADS_PER_ENDPOINT defaults to 2...i.e:

	static final int MAX_THREADS_PER_ENDPOINT = Integer.getInteger(
			THREADS_PER_ENDPOINT, 2).intValue();

Setting THREADS_PER_ENDPOINT to 1 may make the issue go away (because then only one ChannelWorkerThread will be created rather than 2).
Comment 12 Martin Petzold CLA 2011-02-01 17:24:15 EST
Thanks Scott, the problem is exactly that. I have already disabled the threaded message processing, this system property is also nice to know!

Indeed it's not that easy to solve this problem, but now I know what it is about and will present some patch suggestion.
Comment 13 Markus Kuppe CLA 2011-02-01 23:25:43 EST
(In reply to comment #9)
> Just to document this. A 'SERVICE_MODIFIED' msg of a service is processed (in
> handleMessage) before the 'SERVICE_ADDED' msg of this service. So either a
> thread concurrency or a msg queue problem.

What does the network msg order look like? Maybe the producer happens to send the SERVICE_MODIFIED before SERVICE_ADDED.
Comment 14 Martin Petzold CLA 2011-02-02 05:12:30 EST
Network msg order is correct. As the output (#10) shows, first comes MSG with type 1 (ADDED) and then type 2 (MODIFIED), but the handler is called the other way round (HANDLE MSG). This is because the second thread in some case is a bit faster than the first one and calls handleEvent some ticks earlier.

The worker thread should work order safe for msgs form the same service id. Not that easy, had a first idea that failed, but I will find something!
Comment 15 Martin Petzold CLA 2011-02-20 10:00:47 EST
Reduced importance to normal due to workaround in #11.
Comment 16 Scott Lewis CLA 2015-07-24 18:51:34 EDT
Was there going to be something further done about this issue, or should it be resolved?
Comment 17 Scott Lewis CLA 2016-12-03 20:33:21 EST
Resolving due to work around presented in comment 11.  Please reopen if more is needed.