| Summary: | [zoodiscovery] hang in WatchManager.publish | | |
|---|---|---|---|
| Product: | [RT] ECF | Reporter: | Scott Lewis <slewis> |
| Component: | ecf.discovery | Assignee: | Scott Lewis <slewis> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | critical | | |
| Priority: | P3 | CC: | ahmed.aadel, bugs.eclipse.org, wim.jongman |
| Version: | 3.5.0 | | |
| Target Milestone: | 3.5.2 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
|
Description
Scott Lewis
Setting target milestone, and adding people to the cc list. Moving to critical, as I think this has a number of far-reaching effects for users of zookeeper discovery, especially in server environments.

As I cannot reproduce your case, would you please test the following and see if it fixes the problem? Please replace the method WatchManager.publish() with:
public void publish(AdvertisedService published) {
    Assert.isNotNull(published);
    String serviceid = published.getServiceID().getServiceTypeID().getInternal();
    if (getNodeWriters().containsKey(serviceid))
        return;
    try {
        /* wait for the server to get ready */
        while (!writeRootLock.isOpen())
            Thread.sleep(300);
    } catch (InterruptedException e) {
        Logger.log(LogService.LOG_DEBUG, e.getMessage(), e);
    }
    NodeWriter nodeWriter = new NodeWriter(published, writeRoot);
    getNodeWriters().put(serviceid, nodeWriter);
    allKnownServices.put(published.getServiceID().getName(), published);
    nodeWriter.publish();
}

(In reply to comment #3)

Thanks for the patch Ahmed. I've applied the patch and done some very basic regression testing and so far I haven't been able to reproduce the hang. That's good, of course...but I have a clarifying question: How is it guaranteed that !writeRootLock.isOpen() eventually will fail (and sleeping will stop)? Is it via successful completion of the watch() method? Also...there are two publish methods (i.e. AdvertisedService and ServiceReference). Should the ServiceReference one be modified as well? If so, please provide another code fragment as before.

Proposed fix pushed to master: http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=af351781639f9798e185a329186970771ee24680

I'll leave bug open for continued/additional testing over next few days. Please report any testing here.

In my tests so far, the issue has not shown itself. We'll hope it's gone and not to return. I'll resolve this bug and reopen if necessary.

Changing target milestone to 3.6

(In reply to comment #6)

> Changing target milestone to 3.6

Why not 3.5.2?

This bug is resolved (for 3.5.2), so my changing to 3.6 target was a mistake.
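
For reference, regarding the question above about whether the `!writeRootLock.isOpen()` loop is guaranteed to end: one way to make the wait bounded, independent of how the lock is eventually opened, is to wait with a timeout. The following is only a minimal sketch of that idea and not the code that was pushed. `OpenableLock`, `BoundedPublish`, `awaitOpen` and `waitUntilReady` are hypothetical names made up for illustration; the real type of `writeRootLock` in zoodiscovery may look quite different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the lock guarding the write root (not the ECF class). */
final class OpenableLock {
    private final CountDownLatch opened = new CountDownLatch(1);

    /** Marks the lock as open and releases every waiting thread. */
    void open() {
        opened.countDown();
    }

    boolean isOpen() {
        return opened.getCount() == 0;
    }

    /** Blocks until open() has been called or the timeout elapses; true if open. */
    boolean awaitOpen(long timeout, TimeUnit unit) throws InterruptedException {
        return opened.await(timeout, unit);
    }
}

final class BoundedPublish {
    /**
     * Returns true if publishing may proceed, false if the lock did not open
     * within timeoutMillis or the waiting thread was interrupted.
     */
    static boolean waitUntilReady(OpenableLock lock, long timeoutMillis) {
        try {
            return lock.awaitOpen(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a sketch like this, a publish method would call `waitUntilReady(...)` first and skip creating the node writer when it returns false, so a server that never becomes ready cannot block the caller indefinitely. What timeout is appropriate, and what to do on failure, would still need to be decided for the real code.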