Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 318480

Summary: Remote C/C++ build does not cancel
Product: [Tools] PTP Reporter: Peter Wang <peterwan>
Component: RDTAssignee: Chris Recoskie <recoskie>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: dmcknigh, g.watson, ptp-inbox, recoskie, vivkong
Version: unspecifiedFlags: g.watson: review+
Target Milestone: 4.0.3   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on: 318372    
Bug Blocks:    
Attachments:
Description Flags
patch, replaced runCommand
none
patch that uses the spawner to launch remote commands
recoskie: iplog-
updated patch - externalizes strings in plugin.xml recoskie: iplog-

Description Peter Wang CLA 2010-06-30 10:52:29 EDT
Cancelling a remote C/C++ build currently just hangs the UI while the build completes on the remote shell.
Comment 1 Peter Wang CLA 2010-06-30 11:09:10 EDT
Created attachment 173105 [details]
patch, replaced runCommand

Please commit to HEAD / ptp_4_0
Comment 2 Chris Recoskie CLA 2010-06-30 11:17:25 EDT
This will have to wait until the issue in Bug 318372 is resolved.
Comment 3 Chris Recoskie CLA 2010-07-12 09:30:03 EDT
Ok so I've looked into this patch, and we're in a bit of a tough spot.

We currently use IHostShell.runCommand(...).  This API runs a single command, and then exits.  The reason that cancel does not work right in this case is that RSE launches the external process with Runtime.exec(...).  The underlying implementation of this on UNIX retains a PID of the external process in order that it can send it signals, such as when you need to kill it with Process.destroy().  However, there are some issues with this.  Firstly, the PID it retains is of the parent shell process (csh/bash/whatever).  Secondly, there are a lot of bugs open on Java about how doing Process.destroy() does not always kill child processes.  The upshot of this is that when you try to cancel the build command, RSE does a Process.destroy() like it should, but this only kills the shell process, and make continues running.  I guess make is ignoring SIGTERM?

I tried the patch Peter created, and it's no good because some things have changed in RSE since the release that was used in RDp 7.6.  Firstly, using launchShell(...) and writeToShell(...) doesn't provide you a way of knowing when the command you wrote to it completes.  So in the case of a normal build, the builder just hangs there forever after the build has actually finished on the remote end, with there being no way for it to know it should stop.  Secondly, there is a bunch of garbage that gets printed to the console because any semicolons in the output get escaped, but the escaping doesn't get converted back.

The real solution to this problem is that we need to do what CDT does and have a native spawner that launches the external process.  That way we can get PIDs for any child processes if we so choose, send whatever signal we want to them (SIGKILL for instance if SIGTERM doesn't work).  This is the only way of reliably ensuring that a launched process is terminated, and is the reason that CDT went that direction.  However, this means introducing native code into the server, so there would be releng work involved, and the servers would not be as easily portable (right now it will pretty much work on anything with Java 1.5).  I think we don't have a choice though.  People need to cancel builds.

Since CDT already has a spawner, it doesn't make sense to write a new one.  We should just bundle the CDT spawner with the server.  The questions is, what form should that take?  Should RSE use the spawner in the dstore server?  That means adding a dependency on CDT to RSE, which they probably won't want.  We could add another Miner to PTP that allowed for process launching and manipulation, since PTP already has a dependency on CDT.  Then the RSEProcess class could talk to that Miner in order to launch processes, instead of launching them with the existing API.
Comment 4 Chris Recoskie CLA 2010-07-12 10:29:01 EDT
RSE could also copy/paste the spawner stuff into RSE.  However, that will mean updates to the original in CDT won't propagate automatically.
Comment 5 Greg Watson CLA 2010-07-12 11:23:40 EDT
Remote Tools doesn't have this problem since it starts remote processes in a subshell which is killed using SIGKILL. I would not like to see a dependency that also kills the server's portability because of RSE limitations, unless it was optional. I guess that means that we'd need to have a Remote Tools-only version, which is not ideal, but may be the only option. Alternatively, perhaps RDT could use an RSE SSH connection for the remote builder instead of the dstore connection?
Comment 6 Chris Recoskie CLA 2010-07-12 11:47:09 EDT
(In reply to comment #5)
> Remote Tools doesn't have this problem since it starts remote processes in a
> subshell which is killed using SIGKILL. I would not like to see a dependency
> that also kills the server's portability because of RSE limitations, unless it
> was optional. I guess that means that we'd need to have a Remote Tools-only
> version, which is not ideal, but may be the only option. Alternatively, perhaps
> RDT could use an RSE SSH connection for the remote builder instead of the
> dstore connection?

The dependency would only come into play if you were using RSE anyway.  Remote Tools wouldn't be affected.

We could make it so that if the spawner can't be loaded, we fall back to calling the existing API.  Then of course the build won't cancel.  It shouldn't be a problem to provide a spawner for all our officially supported platforms as CDT already supplies them for a lot of platforms.

We have a requirement that everything must work with one dstore connection, so using SSH only is not an option.
Comment 7 Chris Recoskie CLA 2010-08-11 16:54:39 EDT
Created attachment 176408 [details]
patch that uses the spawner to launch remote commands

The attached patch changes the RSEProcesssBuilder so that it will attempt to use the new SpawnerSubsystem/SpawnerMiner in order to launch processes.  If the attempt fails, then it will fall back to using the existing RSE API to launch the command.

Vivian is working on the releng aspect of modifying the build scripts to build all the servers.  So far we plan on having a Generic Unix, Generic Linux, AIX, Linux/x86, and Linux/PPC servers.  The Generic servers will not bundle a spawner library, so they will cause the client to use the fallback behaviour of launching with RSE API.

This patch also has some added benefits.  Firstly, you can now properly override environment variables that the shell initialization file already defined.  Secondly, it seems to have sped up the build marginally.  The speedup is just in terms of the output processing and the build operation within the IDE... the speed of the external make process is of course unchanged.
Comment 8 Vivian Kong CLA 2010-08-11 17:01:44 EDT
(In reply to comment #7)
> so far we plan on having a Generic Unix, Generic Linux, AIX,
> Linux/x86, and Linux/PPC servers.  The Generic servers will not bundle a
> spawner library, so they will cause the client to use the fallback behaviour of
> launching with RSE API.

We will continue to have Generic MacOSx and Generic Windows servers as well.
Comment 9 Chris Recoskie CLA 2010-08-11 17:35:52 EDT
Created attachment 176414 [details]
updated patch - externalizes strings in plugin.xml
Comment 10 Chris Recoskie CLA 2010-08-12 16:34:10 EDT
Patch applied to ptp_4_0 and HEAD

Releng changes still pending.
Comment 11 Vivian Kong CLA 2010-08-13 09:12:10 EDT
(In reply to comment #10)
> Patch applied to ptp_4_0 and HEAD
> 
> Releng changes still pending.

Releng changes are now in ptp_4_0 and HEAD.

Here's a build if anyone is interested to try the new server packages:
/home/www/tools/ptp/builds/helios/I.I201008130756
Comment 12 Chris Recoskie CLA 2010-08-13 09:15:13 EDT
I'm marking this as FIXED.

BTW there seems to be some kind of problem with the spawner library for Linux/PPC.  It segfaults the JVM when loaded.  The one from CDT was built in 2005.  It might be just too old/built on a really old GCC.  Investigating whether building a new one helps at all.
Comment 13 Greg Watson CLA 2010-08-17 14:50:59 EDT
This patch introduces an API change in 4.0. Please enable API tooling so that you correct these problems in the future. Also, if we're going to make an exception in this case, I think it needs to be raised on the ptp-dev mailing list.
Comment 14 Chris Recoskie CLA 2010-08-18 09:17:24 EDT
(In reply to comment #13)
> This patch introduces an API change in 4.0. Please enable API tooling so that
> you correct these problems in the future. Also, if we're going to make an
> exception in this case, I think it needs to be raised on the ptp-dev mailing
> list.

Huh?

I was using the API tooling... there are no APIs added that I am aware of.  The bulk of the changes are in internal classes (which are flagged as such in the manifest file).  The rest are method body changes, which doesn't constitute an API change.

Can you point to something specific?
Comment 15 Greg Watson CLA 2010-08-18 09:55:19 EDT
Unfortunately the API tooling doesn't tell you *what* has changed. All I'm seeing is:

Description	Resource	Path	Location	Type
The minor version should be incremented in version 4.0.1.qualifier, since new APIs have been added since version 4.0.0.201006142322	MANIFEST.MF	/org.eclipse.ptp.remote.rse.core/META-INF	line 5	Version Numbering Problem

I wonder if the problem is that the messages package is exported? It probably shouldn't be.
Comment 16 Chris Recoskie CLA 2010-08-18 10:07:19 EDT
(In reply to comment #15)
> Unfortunately the API tooling doesn't tell you *what* has changed. All I'm
> seeing is:
> Description    Resource    Path    Location    Type
> The minor version should be incremented in version 4.0.1.qualifier, since new
> APIs have been added since version 4.0.0.201006142322    MANIFEST.MF   
> /org.eclipse.ptp.remote.rse.core/META-INF    line 5    Version Numbering
> Problem
> I wonder if the problem is that the messages package is exported? It probably
> shouldn't be.

Hmmm... I have seen that error transiently, so I have been ignoring it as I was assuming it was a glitch in the tooling.  Usually doing a clean build gets rid of it.

Since the package policy at Eclipse is best summarized as "discourage, but don't block access", if that package is the source of the problem, I'll mark it as internal.
Comment 17 Greg Watson CLA 2010-08-18 10:16:35 EDT
I tried cleaning but the thing is persistently there, so I guess it's real.

I've made all the PTP message packages internal as it's one less API to have to worry about. It also seems pretty unlikely that an external plugin is going to need access. In hindsight, I should have named the packages internal. Oh well...
Comment 18 Chris Recoskie CLA 2010-08-18 10:48:22 EDT
(In reply to comment #17)
> I tried cleaning but the thing is persistently there, so I guess it's real.
> I've made all the PTP message packages internal as it's one less API to have to
> worry about. It also seems pretty unlikely that an external plugin is going to
> need access. In hindsight, I should have named the packages internal. Oh
> well...

I'll mark the package as internal then.
Comment 19 Chris Recoskie CLA 2010-08-18 11:01:10 EDT
(In reply to comment #18)
> I'll mark the package as internal then.

This has been done on ptp_4_0 and HEAD.