Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 349176 - [provider] Allow generic server instantiator to use any port as default
Summary: [provider] Allow generic server instantiator to use any port as default
Status: VERIFIED FIXED
Alias: None
Product: ECF
Classification: RT
Component: ecf.server (show other bugs)
Version: unspecified   Edit
Hardware: PC All
: P3 normal (vote)
Target Milestone: 3.5.2   Edit
Assignee: Scott Lewis CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks:
 
Reported: 2011-06-13 06:51 EDT by Alex Blewitt CLA
Modified: 2011-06-20 14:25 EDT (History)
2 users (show)

See Also:


Attachments
Patch for 4daddd11eeeba5f896ce92e6434b82d3e06f37e4 (1.96 KB, patch)
2011-06-17 16:56 EDT, Alex Blewitt CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Blewitt CLA 2011-06-13 06:51:39 EDT
Both the R-OSGi and ECF Generic server suffer from the problem that they expect/want a specific port number on start-up. This should not be necessary if discovery is used, since the whole point of discovery is supposed to be able to announce the host/port that the service is available on. So this restriction, coupled with firing up multiple services on the same host, fails unless you explicitly manage the set of ports.

I'd argue that both the R-OSGi and ECF containers should, if created with the default arguments, should Just Work (TM). Below is the stacktrace of an example which is attempting to auto-create an ECF generic container, and failing, because something else is already on that port (another instance on the same machine):

It shouldn't be necessary to set a random property to make this work, it should Just Work out of the box.

So, it should be possible to do:

a = ContainerFactory.getDefault().createContainer("ecf.generic.server");
b = ContainerFactory.getDefault().createContainer("ecf.generic.server");

and have them both be successful in the same VM.

On the other hand, if someone explicitly requests a port then this should fail of the port can't be reused:

c = ContainerFactory.getDefault().createContainer("ecf.generic.server","localhost","1234");
d = ContainerFactory.getDefault().createContainer("ecf.generic.server","localhost","1234");

I think this can be achieved (in the GenericContainerInstantiator) by changing the logic (155-160) from:

		if (newID == null) {
			int defaultPort = TCPServerSOContainer.DEFAULT_PORT;
			boolean useFallbackPort = TCPServerSOContainer.DEFAULT_FALLBACK_PORT;
			if (useFallbackPort && !defaultPortIsFree(defaultPort)) {
				defaultPort = getFreePort();
			}

to:

		if (newID == null) {
			int defaultPort = TCPServerSOContainer.DEFAULT_PORT;
			if (!defaultPortIsFree(defaultPort)) {
				defaultPort = getFreePort();
			}


Note that the 'getFreePort' here suffers from a race condition if two servers are starting up at once; they may both report that port (default) is free, but when it comes to it, find out that the other has gotten there first. I'd almost prefer to get rid of the whole concept of 'default port' and replace it with a dynamically allocated port in any case, if it's not specified:

  if(newID == null) {
    ServerSocket ss = new ServerSocket();
    ss.bind(null);
    defaultPort = ss.getLocalPort();
    ss.close();
    ss = null;
  }

Since the bound server address will be from an ever-incrementing number of port numbers, you should be able to close this socket and then reuse it later (but I'd be tempted to do socket.setReuseAddress(true) when you recreate it).

Alternatively, could you pass the ServerSocket into the config and let that reuse the ServerSocket as is? I don't know enough of the wiring under the covers to know if that's feasible.


--- excerpt below --- 

osgi> [log;+0100 2011.06.13 11:24:30:228;ERROR;org.eclipse.ecf.osgi.services.remoteserviceadmin;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.remoteserviceadmin;code=4;message=org.eclipse.ecf.internal.osgi.services.distribution.BasicTopologyManager:handleInvalidExportRegistration:exportRegistration=org.eclipse.ecf.osgi.services.remoteserviceadmin.RemoteServiceAdmin$ExportRegistration@122dad5;severity4;exception=org.eclipse.ecf.osgi.services.remoteserviceadmin.SelectContainerException: Exception creating or configuring container;children=[]]]
org.eclipse.ecf.osgi.services.remoteserviceadmin.SelectContainerException: Exception creating or configuring container
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractContainerSelector.createContainer(AbstractContainerSelector.java:153)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractHostContainerSelector.createRSContainer(AbstractHostContainerSelector.java:333)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractHostContainerSelector.createMatchingContainer(AbstractHostContainerSelector.java:319)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractHostContainerSelector.createAndConfigureHostContainers(AbstractHostContainerSelector.java:244)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.HostContainerSelector.selectHostContainers(HostContainerSelector.java:60)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.RemoteServiceAdmin.exportService(RemoteServiceAdmin.java:219)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractTopologyManager.handleServiceRegistering(AbstractTopologyManager.java:305)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractTopologyManager.handleEvent(AbstractTopologyManager.java:257)
	at org.eclipse.ecf.internal.osgi.services.distribution.BasicTopologyManager.event(BasicTopologyManager.java:87)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.notifyEventHooksPrivileged(ServiceRegistry.java:1143)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEventPrivileged(ServiceRegistry.java:743)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEvent(ServiceRegistry.java:711)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.register(ServiceRegistrationImpl.java:130)
	at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.registerService(ServiceRegistry.java:206)
	at org.eclipse.osgi.framework.internal.core.BundleContextImpl.registerService(BundleContextImpl.java:507)
	at org.eclipse.equinox.internal.ds.InstanceProcess.registerService(InstanceProcess.java:504)
	at org.eclipse.equinox.internal.ds.InstanceProcess.buildComponents(InstanceProcess.java:212)
	at org.eclipse.equinox.internal.ds.Resolver.buildNewlySatisfied(Resolver.java:441)
	at org.eclipse.equinox.internal.ds.Resolver.enableComponents(Resolver.java:213)
	at org.eclipse.equinox.internal.ds.SCRManager.performWork(SCRManager.java:800)
	at org.eclipse.equinox.internal.ds.SCRManager$QueuedJob.dispatch(SCRManager.java:767)
	at org.eclipse.equinox.internal.ds.WorkThread.run(WorkThread.java:89)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.eclipse.ecf.core.ContainerCreateException: Create of containerType=ecf.generic.server failed.
	at org.eclipse.ecf.provider.generic.GenericContainerInstantiator.createInstance(GenericContainerInstantiator.java:220)
	at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:296)
	at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:387)
	at org.eclipse.ecf.core.ContainerFactory.createContainer(ContainerFactory.java:374)
	at org.eclipse.ecf.osgi.services.remoteserviceadmin.AbstractContainerSelector.createContainer(AbstractContainerSelector.java:135)
	... 22 more
Caused by: java.net.BindException: Address already in use: JVM_Bind
	at java.net.PlainSocketImpl.socketBind(Native Method)
	at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
	at java.net.ServerSocket.bind(ServerSocket.java:319)
	at java.net.ServerSocket.<init>(ServerSocket.java:185)
	at java.net.ServerSocket.<init>(ServerSocket.java:141)
	at org.eclipse.ecf.provider.comm.tcp.Server.<init>(Server.java:40)
	at org.eclipse.ecf.provider.generic.TCPServerSOContainerGroup.putOnTheAir(TCPServerSOContainerGroup.java:53)
	at org.eclipse.ecf.provider.generic.TCPServerSOContainer.<init>(TCPServerSOContainer.java:76)
	at org.eclipse.ecf.provider.generic.TCPServerSOContainer.<init>(TCPServerSOContainer.java:112)
	at org.eclipse.ecf.provider.generic.GenericContainerInstantiator.createInstance(GenericContainerInstantiator.java:215)
	... 26 more
Comment 1 Alex Blewitt CLA 2011-06-13 06:53:19 EDT
Note that the use of the ephemeral ports may be desirable in the general case, and to deprecate the concept of a 'default port'.
Comment 2 Alex Blewitt CLA 2011-06-13 07:03:48 EDT
NB if the '-Dorg.eclipse.ecf.provider.generic.port.fallback=true' is set, this works. However, I can't think of a time when you wouldn't want this set to true.
Comment 3 Scott Lewis CLA 2011-06-13 11:04:38 EDT
In general, what you are asking for is not possible (having any provider use any port).  The reason for this is that different providers have different approaches to managing the port. 

WRT the issues described for the generic impl...please attach fix for race...and/or alternative desired implementation.

>NB if the '-Dorg.eclipse.ecf.provider.generic.port.fallback=true' is set, this
>works. However, I can't think of a time when you wouldn't want this set to
>true.

I would say that there are plenty of cases when having the port be set explicitly would be desired/desirable...e.g. when the remote services host is a server with specific port-range/security constraints.
Comment 4 Alex Blewitt CLA 2011-06-13 11:23:33 EDT
> I would say that there are plenty of cases when having the port be set
> explicitly would be desired/desirable...e.g. when the remote services host is a
> server with specific port-range/security constraints.

In that case, shouldn't a container with an explicit port be supplied for the creation, instead of leaving it through to the default 'create anything'?
Comment 5 Scott Lewis CLA 2011-06-13 12:19:13 EDT
(In reply to comment #4)
> > I would say that there are plenty of cases when having the port be set
> > explicitly would be desired/desirable...e.g. when the remote services host is a
> > server with specific port-range/security constraints.
> 
> In that case, shouldn't a container with an explicit port be supplied for the
> creation, instead of leaving it through to the default 'create anything'?

Yes...that's why (for example) the generic provider allows the specification of port (and hostname, and path) upon construction.

R-osgi was not designed that way, however.
Comment 6 Alex Blewitt CLA 2011-06-17 16:53:25 EDT
Patch (for generic server) available at:

https://github.com/alblue/ecf/tree/bug349176
Comment 7 Alex Blewitt CLA 2011-06-17 16:56:41 EDT
Created attachment 198215 [details]
Patch for 4daddd11eeeba5f896ce92e6434b82d3e06f37e4
Comment 8 Scott Lewis CLA 2011-06-17 23:37:16 EDT
Assigning.
Comment 9 Scott Lewis CLA 2011-06-18 00:12:48 EDT
(In reply to comment #7)
> Created attachment 198215 [details]
> Patch for 4daddd11eeeba5f896ce92e6434b82d3e06f37e4

After applying and looking over this patch, I have some questions.

It looks like the only effect of this patch is to eliminate the use of the 'DEFAULT_FALLBACK_PORT' system property...meaning that when not specified upon construction, instead of using a default port (gotten from DEFAULT_PORT) that a random open port would always be selected (via getFreePort()).  It seems to me that nother way to put this is that it switches the default for DEFAULT_FALLBACK_PORT from 'false' (there is a default...set to DEFAULT_PORT) to 'true' (the default is random no matter what the setting of DEFAULT_PORT).  But it also eliminates the possibility of having/using DEFAULT_PORT.

One reason this is an issue for me is that DEFAULT_PORT (and DEFAULT_FALLBACK_PORT) have been in place for some time, and this changes/eliminates the behavior that people have been/may be using.   That seems to be sort of a high price to avoid setting a system property (as it will break everyone that is using DEFAULT_PORT).

Perhaps a reasonable compromise would be to set the default of DEFAULT_FALLBACK_PORT to 'true' rather than 'false'...while leaving the option of setting it to false explicitly (and thereby using DEFAULT_PORT).  Then without setting any system properties the random open port would be used (your desired outcome), but people that wished to use DEFAULT_PORT could set DEFAULT_FALLBACK_PORT to 'false' (via system property) to use the DEFAULT_PORT rather than have a random port selected.
Comment 10 Scott Lewis CLA 2011-06-18 15:35:15 EDT
Adding Markus for comment.  Markus put in the code in GenericContainerInstantiator that's being modified by Alex's patch, so he should probably at least take a look at Alex's patch...and please also take a look at comment 9.
Comment 11 Markus Kuppe CLA 2011-06-18 15:44:54 EDT
(In reply to comment #9)
> Perhaps a reasonable compromise would be to set the default of
> DEFAULT_FALLBACK_PORT to 'true' rather than 'false'...while leaving the option
> of setting it to false explicitly (and thereby using DEFAULT_PORT).  Then
> without setting any system properties the random open port would be used (your
> desired outcome), but people that wished to use DEFAULT_PORT could set
> DEFAULT_FALLBACK_PORT to 'false' (via system property) to use the DEFAULT_PORT
> rather than have a random port selected.

The reason I originally set the default off was just to stay backward compatible as much as possible (IIRC it was close to a release).
But with Indigo out of the door, I'm +1 for flipping the default.
Comment 12 Scott Lewis CLA 2011-06-18 21:23:06 EDT
(In reply to comment #11)
> (In reply to comment #9)
> > Perhaps a reasonable compromise would be to set the default of
> > DEFAULT_FALLBACK_PORT to 'true' rather than 'false'...while leaving the option
> > of setting it to false explicitly (and thereby using DEFAULT_PORT).  Then
> > without setting any system properties the random open port would be used (your
> > desired outcome), but people that wished to use DEFAULT_PORT could set
> > DEFAULT_FALLBACK_PORT to 'false' (via system property) to use the DEFAULT_PORT
> > rather than have a random port selected.
> 
> The reason I originally set the default off was just to stay backward
> compatible as much as possible (IIRC it was close to a release).
> But with Indigo out of the door, I'm +1 for flipping the default.

Here's my proposal:

1) The default of TCPServerSOContainer.DEFAULT_FALLBACK_PORT is set to *true* (currently set to false as per Markus' comment above).

2) This be the changed logic in GenericContainerInstantiator (starting on line 155):

...
if (newID == null) {
  int defaultPort = TCPServerSOContainer.DEFAULT_PORT;
  // Default is true...i.e. useFallbackPort
  boolean useFallbackPort = TCPServerSOContainer.DEFAULT_FALLBACK_PORT;
  Object lock = new Object();
  // This prevents multithreaded reentrancy...to avoid timing issues
  // with selecting a port
  synchronized (lock) {
    // if useFallbackPort is true (now default) and the DEFAULT_PORT is 
    // not available, then a random free port is selected
    if (useFallbackPort && !defaultPortIsFree(defaultPort)) {
      defaultPort = getFreePort();
    }
  }
newID = IDFactory.getDefault().createStringID(TCPServerSOContainer.DEFAULT_PROTOCOL + "://" + TCPServerSOContainer.DEFAULT_HOST + ":" + defaultPort + TCPServerSOContainer.DEFAULT_NAME);//$NON-NLS-1$ //$NON-NLS-2$
	
1 should satisfy this original bug request...i.e. that the default for the generic provider is that if no id is specified to the instantiator, that an available port is selected...and that no system property need be set to get that to happen.

However, if someone wants to set and use DEFAULT_PORT, then they can set the DEFAULT_PORT and DEFAULT_FALLBACK_PORT (to false) via system properties and it will work that (old) way.

Also, I put a synchronized block around the defaultPortIsFree and getFreePort() in order to prevent timing issues with multithreaded access to the server creation (this is guaranteed not to happen in the RSA impl, but it could happen under other circumstances...so this synchronized block should prevent that.

Thanks Alex for the bug report and the patch/proposed fix.
Comment 13 Alex Blewitt CLA 2011-06-19 03:47:20 EDT
This doesn't fix the race condition though. If the property is set for ANY_PORT then we should just go ahead and select a getFreePort. Only if it is not set should we try DEFAULT and then fail if it is in use. 

The race is as follows:

* A checks default port free
** A gets ID with default port
* B checks default port free
** B gets ID with default port
* A starts SOServer
* B starts SOServer, fails

The lock around the first pair of bullets doesn't fix the problem because the race is to do with another method elsewhere.
Comment 14 Scott Lewis CLA 2011-06-19 12:55:08 EDT
(In reply to comment #13)
> This doesn't fix the race condition though. 

Yes thanks...I realized this shortly after posting and will produce a correct solution for the race.
Comment 15 Scott Lewis CLA 2011-06-19 19:06:16 EDT
Changed title to reflect change in scope from all providers to generic provider.

Pushed fix described in comment 12 (with change to synchronization as per comment 13) to master

http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=c9f6bbbd80e7962f352766999764da27a942ba05

Resolving as fixed.
Comment 16 Alex Blewitt CLA 2011-06-20 04:35:54 EDT
This still doesn't fix the race condition; it fixes it in *this vm* but doesn't fix it if two servers are starting up at the same time.

The fix is not to check the default port if the flag is 'true', i.e. change it from:

if(useAnyport && !defaultPortIsFree)

to:

if(useAnyPort) {
 ...
} else {
  if(defaultPortIsFree()) 
    port = DEFAULT_PORT
  else
    throw
}
Comment 17 Markus Kuppe CLA 2011-06-20 05:26:12 EDT
I suggest to not close the socket in getFreePort() or defaultPortIsFree() and pass on the open socket object. That might require to instantiate a org.eclipse.ecf.provider.comm.tcp.Server instead of just a java.net.ServerSocket.
Comment 18 Alex Blewitt CLA 2011-06-20 08:54:24 EDT
I still think it's a bad idea to try and hover around any kind of blessed port by default. If you don't care what the port it is running on then it shouldn't even try to focus on some hard-coded port initially, and just live with an ephemeral one. Otherwise you start getting into situations where you accidentally assume that it's always running on port xxxx but the first time you launch two of them, you find that the second runs on a different port (c.f. the R-OSGi 'WARNING' message).

That said, I also agree with comment #16 that we really shouldn't be open/close/opening the same server socket, if we can pass it through the generic server objects.
Comment 19 Scott Lewis CLA 2011-06-20 10:55:25 EDT
(In reply to comment #18)
> I still think it's a bad idea to try and hover around any kind of blessed port
> by default. If you don't care what the port it is running on then it shouldn't
> even try to focus on some hard-coded port initially, and just live with an
> ephemeral one. Otherwise you start getting into situations where you
> accidentally assume that it's always running on port xxxx but the first time
> you launch two of them, you find that the second runs on a different port (c.f.
> the R-OSGi 'WARNING' message).

At this point (lots of history with a default port), we can't really change that behavior without breaking a number of clients...so whether it's a good idea or not, that's the way it is for existing providers.

Of course, new providers can be fairly easily created (e.g. based upon existing providers) that can have whatever behavior is desired.

> 
> That said, I also agree with comment #16 that we really shouldn't be
> open/close/opening the same server socket, if we can pass it through the
> generic server objects.

This will require an API change (new constructor), so will be some time in coming.  I think that in the short term (next couple of months), ECF is only going to be able to have a maintenance release (3.5.2), and API changes can't be introduced in a maintenance release.
Comment 20 Markus Kuppe CLA 2011-06-20 10:58:50 EDT
(In reply to comment #19)

> This will require an API change (new constructor), so will be some time in
> coming.  I think that in the short term (next couple of months), ECF is only
> going to be able to have a maintenance release (3.5.2), and API changes can't
> be introduced in a maintenance release.

Isn't there a difference between an API addition and an API change?
Comment 21 Alex Blewitt CLA 2011-06-20 11:00:05 EDT
(In reply to comment #19)
> At this point (lots of history with a default port), we can't really change that behavior without breaking a number of clients...so whether it's a good idea or not, that's the way it is for existing providers.

But we've just changed the default from 'fail hard and fast' to 'allow any port'. Why can't we also extend this to *not* start on a so-called blessed port as well? The flag can be used to enable only blessed port if needed.
Comment 22 Alex Blewitt CLA 2011-06-20 11:01:52 EDT
NB re comment #19 this still has demonstrable race conditions in it with multiple processes. Changing the default to not start on 'blessed port' would fix those race conditions, and still give a configurable way of enabling that if needed.
Comment 23 Scott Lewis CLA 2011-06-20 13:36:18 EDT
ok.  I've changed the generic instantiator to have the following logic:

...
int port = -1;
if (TCPServerSOContainer.DEFAULT_FALLBACK_PORT) {
  port = getFreePort();
} else if (portIsFree(TCPServerSOContainer.DEFAULT_PORT)) {
  port = TCPServerSOContainer.DEFAULT_PORT;
}
if (port < 0) throw new IDCreateException("Server port for generic server cannot be -1"); 
...create ID and container

and I've pushed this to master:

http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=c6f3801e8baf62dd43aa6d67f60cdbe2e260f90b

If other use cases for managing the port upon container construction (e.g. selecting from defined port range, etc) are needed, then it would be a good idea to define a new provider to do so...as that way breaking/surprising old clients won't be an issue.
Comment 24 Alex Blewitt CLA 2011-06-20 14:25:03 EDT
I think the new approach looks good. I suspect that the TCPServerSOContainer.DEFAULT_FALLBACK_PORT is now misleadingly named but this may be API and so not changeable in the near future.