Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 315248 - [Discovery] [ZooDiscovery] GUID contains characters to confuse Zookeeper nodes buildup
Summary: [Discovery] [ZooDiscovery] GUID contains characters to confuse Zookeeper node...
Status: RESOLVED FIXED
Alias: None
Product: ECF
Classification: RT
Component: ecf.discovery (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows Vista
: P3 major (vote)
Target Milestone: 3.3.0   Edit
Assignee: Wim Jongman CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-01 12:22 EDT by Wim Jongman CLA
Modified: 2010-06-05 17:30 EDT (History)
3 users (show)

See Also:


Attachments
Patch - See description (19.94 KB, patch)
2010-06-04 08:17 EDT, Wim Jongman CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Wim Jongman CLA 2010-06-01 12:22:58 EDT
Zookeeper is using a node path in this form 'root/child/grandchild'. We have used the ECF GUID class to get an unique node. However, we have discovered (haha) that the GUID sometimes contains slashes. This is not according to the Zookeeper specs and will result in services not being published. We have a patch.
Comment 1 Scott Lewis CLA 2010-06-01 12:36:10 EDT
Hi Wim.  If the patch can be applied in next couple of days and tested and reviewed by a couple of committers before next Saturday 5/5, we should be able to get into RC4+1 (Monday, 5/7).
Comment 2 Scott Lewis CLA 2010-06-02 10:00:40 EDT
Setting target milestone to 3.3.0.  I hope we can get this in for RC4+1.
Comment 3 Wim Jongman CLA 2010-06-04 08:17:34 EDT
Created attachment 171092 [details]
Patch - See description

Here is a patch that fixes it along with other instabilities with the failing test on ecf2. I have setup a stresstest where every 15 seconds a service is activated by DS, picked up by Zookeeper and delivered to consumers. The host and three consumers are running already for three days non-stop and they work as expected.

From the stack trace, the following method was behind the failing hudson-zoodiscovery test
org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer.getServices(ZooDiscoveryContainer.java:384)

Ahmed changed it so that it never returns a null but an empty collection instead even if the watchManager is null.
Ahmed also changed  method NodeReader.dispose() because it has the potential of throwing a null pointer exception.
Some unnecessary/error prone  threads/ static fields were taken care of.
Comment 4 Scott Lewis CLA 2010-06-04 10:47:54 EDT
(In reply to comment #3)
> Created an attachment (id=171092) [details]
> Patch - See description
> 
> Here is a patch that fixes it along with other instabilities with the failing
> test on ecf2. I have setup a stresstest where every 15 seconds a service is
> activated by DS, picked up by Zookeeper and delivered to consumers. The host
> and three consumers are running already for three days non-stop and they work
> as expected.
> 
> From the stack trace, the following method was behind the failing
> hudson-zoodiscovery test
> org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer.getServices(ZooDiscoveryContainer.java:384)
> 
> Ahmed changed it so that it never returns a null but an empty collection
> instead even if the watchManager is null.
> Ahmed also changed  method NodeReader.dispose() because it has the potential of
> throwing a null pointer exception.
> Some unnecessary/error prone  threads/ static fields were taken care of.


Thanks Wim.  I would like to get this patch reviewed, approved, and released to head by end of day tomorrow (Sat 6/5), for RC4+1 build (Sunday evening pacific time).

I will try to review this later today/Friday and give my +1.  I would like Markus to review as well...and we have Wim's review of course.
Comment 5 Markus Kuppe CLA 2010-06-04 13:40:34 EDT
Hi,

+1

just one issue:

- org.eclipse.ecf.provider.zookeeper.DiscoveryActivator.start(...).new ServiceTracker() {...}.removedService(ServiceReference, Object) has to call super.removedService(...) in org.osgi.util.tracker.ServiceTracker.removedService(ServiceReference, Object)

And please for patches this close to a release only change what is strictly necessary to fix the bug (especially formatting). Makes it a lot easier for us reviewer. :-)
Comment 6 Wim Jongman CLA 2010-06-05 09:07:02 EDT
Hi, Thanks for the review. And sorry for the work, you know how it goes...
I have added the super call.
Comment 7 Scott Lewis CLA 2010-06-05 12:04:05 EDT
+1 on patch 171092.  Thanks for the quick work and testing. Please release to HEAD and we'll introduce for rc4+1 build 5/6.
Comment 8 Wim Jongman CLA 2010-06-05 17:30:34 EDT
released to HEAD