| Summary: | Remote C/C++ build does not cancel | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Tools] PTP | Reporter: | Peter Wang <peterwan> | ||||||||
| Component: | RDT | Assignee: | Chris Recoskie <recoskie> | ||||||||
| Status: | RESOLVED FIXED | QA Contact: | |||||||||
| Severity: | normal | ||||||||||
| Priority: | P3 | CC: | dmcknigh, g.watson, ptp-inbox, recoskie, vivkong | ||||||||
| Version: | unspecified | Flags: | g.watson:
review+
|
||||||||
| Target Milestone: | 4.0.3 | ||||||||||
| Hardware: | PC | ||||||||||
| OS: | Windows XP | ||||||||||
| Whiteboard: | |||||||||||
| Bug Depends on: | 318372 | ||||||||||
| Bug Blocks: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
Peter Wang
Created attachment 173105 [details]
patch, replaced runCommand
Please commit to HEAD / ptp_4_0
This will have to wait until the issue in Bug 318372 is resolved. Ok so I've looked into this patch, and we're in a bit of a tough spot. We currently use IHostShell.runCommand(...). This API runs a single command, and then exits. The reason that cancel does not work right in this case is that RSE launches the external process with Runtime.exec(...). The underlying implementation of this on UNIX retains a PID of the external process in order that it can send it signals, such as when you need to kill it with Process.destroy(). However, there are some issues with this. Firstly, the PID it retains is of the parent shell process (csh/bash/whatever). Secondly, there are a lot of bugs open on Java about how doing Process.destroy() does not always kill child processes. The upshot of this is that when you try to cancel the build command, RSE does a Process.destroy() like it should, but this only kills the shell process, and make continues running. I guess make is ignoring SIGTERM? I tried the patch Peter created, and it's no good because some things have changed in RSE since the release that was used in RDp 7.6. Firstly, using launchShell(...) and writeToShell(...) doesn't provide you a way of knowing when the command you wrote to it completes. So in the case of a normal build, the builder just hangs there forever after the build has actually finished on the remote end, with there being no way for it to know it should stop. Secondly, there is a bunch of garbage that gets printed to the console because any semicolons in the output get escaped, but the escaping doesn't get converted back. The real solution to this problem is that we need to do what CDT does and have a native spawner that launches the external process. That way we can get PIDs for any child processes if we so choose, send whatever signal we want to them (SIGKILL for instance if SIGTERM doesn't work). This is the only way of reliably ensuring that a launched process is terminated, and is the reason that CDT went that direction. However, this means introducing native code into the server, so there would be releng work involved, and the servers would not be as easily portable (right now it will pretty much work on anything with Java 1.5). I think we don't have a choice though. People need to cancel builds. Since CDT already has a spawner, it doesn't make sense to write a new one. We should just bundle the CDT spawner with the server. The questions is, what form should that take? Should RSE use the spawner in the dstore server? That means adding a dependency on CDT to RSE, which they probably won't want. We could add another Miner to PTP that allowed for process launching and manipulation, since PTP already has a dependency on CDT. Then the RSEProcess class could talk to that Miner in order to launch processes, instead of launching them with the existing API. RSE could also copy/paste the spawner stuff into RSE. However, that will mean updates to the original in CDT won't propagate automatically. Remote Tools doesn't have this problem since it starts remote processes in a subshell which is killed using SIGKILL. I would not like to see a dependency that also kills the server's portability because of RSE limitations, unless it was optional. I guess that means that we'd need to have a Remote Tools-only version, which is not ideal, but may be the only option. Alternatively, perhaps RDT could use an RSE SSH connection for the remote builder instead of the dstore connection? (In reply to comment #5) > Remote Tools doesn't have this problem since it starts remote processes in a > subshell which is killed using SIGKILL. I would not like to see a dependency > that also kills the server's portability because of RSE limitations, unless it > was optional. I guess that means that we'd need to have a Remote Tools-only > version, which is not ideal, but may be the only option. Alternatively, perhaps > RDT could use an RSE SSH connection for the remote builder instead of the > dstore connection? The dependency would only come into play if you were using RSE anyway. Remote Tools wouldn't be affected. We could make it so that if the spawner can't be loaded, we fall back to calling the existing API. Then of course the build won't cancel. It shouldn't be a problem to provide a spawner for all our officially supported platforms as CDT already supplies them for a lot of platforms. We have a requirement that everything must work with one dstore connection, so using SSH only is not an option. Created attachment 176408 [details]
patch that uses the spawner to launch remote commands
The attached patch changes the RSEProcesssBuilder so that it will attempt to use the new SpawnerSubsystem/SpawnerMiner in order to launch processes. If the attempt fails, then it will fall back to using the existing RSE API to launch the command.
Vivian is working on the releng aspect of modifying the build scripts to build all the servers. So far we plan on having a Generic Unix, Generic Linux, AIX, Linux/x86, and Linux/PPC servers. The Generic servers will not bundle a spawner library, so they will cause the client to use the fallback behaviour of launching with RSE API.
This patch also has some added benefits. Firstly, you can now properly override environment variables that the shell initialization file already defined. Secondly, it seems to have sped up the build marginally. The speedup is just in terms of the output processing and the build operation within the IDE... the speed of the external make process is of course unchanged.
(In reply to comment #7) > so far we plan on having a Generic Unix, Generic Linux, AIX, > Linux/x86, and Linux/PPC servers. The Generic servers will not bundle a > spawner library, so they will cause the client to use the fallback behaviour of > launching with RSE API. We will continue to have Generic MacOSx and Generic Windows servers as well. Created attachment 176414 [details]
updated patch - externalizes strings in plugin.xml
Patch applied to ptp_4_0 and HEAD Releng changes still pending. (In reply to comment #10) > Patch applied to ptp_4_0 and HEAD > > Releng changes still pending. Releng changes are now in ptp_4_0 and HEAD. Here's a build if anyone is interested to try the new server packages: /home/www/tools/ptp/builds/helios/I.I201008130756 I'm marking this as FIXED. BTW there seems to be some kind of problem with the spawner library for Linux/PPC. It segfaults the JVM when loaded. The one from CDT was built in 2005. It might be just too old/built on a really old GCC. Investigating whether building a new one helps at all. This patch introduces an API change in 4.0. Please enable API tooling so that you correct these problems in the future. Also, if we're going to make an exception in this case, I think it needs to be raised on the ptp-dev mailing list. (In reply to comment #13) > This patch introduces an API change in 4.0. Please enable API tooling so that > you correct these problems in the future. Also, if we're going to make an > exception in this case, I think it needs to be raised on the ptp-dev mailing > list. Huh? I was using the API tooling... there are no APIs added that I am aware of. The bulk of the changes are in internal classes (which are flagged as such in the manifest file). The rest are method body changes, which doesn't constitute an API change. Can you point to something specific? Unfortunately the API tooling doesn't tell you *what* has changed. All I'm seeing is: Description Resource Path Location Type The minor version should be incremented in version 4.0.1.qualifier, since new APIs have been added since version 4.0.0.201006142322 MANIFEST.MF /org.eclipse.ptp.remote.rse.core/META-INF line 5 Version Numbering Problem I wonder if the problem is that the messages package is exported? It probably shouldn't be. (In reply to comment #15) > Unfortunately the API tooling doesn't tell you *what* has changed. All I'm > seeing is: > Description Resource Path Location Type > The minor version should be incremented in version 4.0.1.qualifier, since new > APIs have been added since version 4.0.0.201006142322 MANIFEST.MF > /org.eclipse.ptp.remote.rse.core/META-INF line 5 Version Numbering > Problem > I wonder if the problem is that the messages package is exported? It probably > shouldn't be. Hmmm... I have seen that error transiently, so I have been ignoring it as I was assuming it was a glitch in the tooling. Usually doing a clean build gets rid of it. Since the package policy at Eclipse is best summarized as "discourage, but don't block access", if that package is the source of the problem, I'll mark it as internal. I tried cleaning but the thing is persistently there, so I guess it's real. I've made all the PTP message packages internal as it's one less API to have to worry about. It also seems pretty unlikely that an external plugin is going to need access. In hindsight, I should have named the packages internal. Oh well... (In reply to comment #17) > I tried cleaning but the thing is persistently there, so I guess it's real. > I've made all the PTP message packages internal as it's one less API to have to > worry about. It also seems pretty unlikely that an external plugin is going to > need access. In hindsight, I should have named the packages internal. Oh > well... I'll mark the package as internal then. (In reply to comment #18) > I'll mark the package as internal then. This has been done on ptp_4_0 and HEAD. |