| Summary: | Resource manager connections closing for no apparent reason | ||
|---|---|---|---|
| Product: | [Tools] PTP | Reporter: | Greg Watson <g.watson> |
| Component: | Remote Tools | Assignee: | Greg Watson <g.watson> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | arossi |
| Version: | 5.0 | ||
| Target Milestone: | 5.0.4 | ||
| Hardware: | Macintosh | ||
| OS: | Mac OS X | ||
| Whiteboard: | |||

Description
Greg Watson

I have observed the following behavior. If I open three new RMs that have associated LML drivers (remote polling done once a minute) and follow this sequence of actions, I get the error reported below.

1. RM1 on Connection1 to host A
2. RM2 on Connection2 to host B
3. RM3 on Connection2 to host B

Then:

1. RM1 submits a job to the scheduler.
2. RM2 submits a job to the scheduler.
3. RM3 submits an "interactive" pseudoTerminal session in order to use the -I option on the scheduler.

When the job on RM1 completes, I activate a context menu action which attempts to stream the output file to the console. The connection to the head node of A seems to be slow, or the head node has a high load, so this stream can take up to a minute even though the amount being streamed is not considerable (50-100k). Because this overlaps with the next polling from the other two drivers, they attempt to run in the meantime. But when the streaming completes (successfully; the output arrives in its entirety at the console), one of the other two RMs, usually the interactive one, reports a "pipe closed" error and the RM goes into error mode. It can be terminated, but the thread doing the polling needs to be canceled separately using the progress bar.

This looks like something at the connection level is being shared that shouldn't be.

Al

Running on separate connections greatly reduces the chance of this happening, but it still seems to pop up occasionally. My gut feeling is that there is some non-thread-safe behavior in the underlying JSch classes.

Al

This issue looks like Remote Tools connections are not thread safe for remote processes. I'm not sure whether this is in Remote Tools or JSch. The workaround is to use separate connections for each RM. Lowering severity to normal and moving to Remote Tools. This will need to be fixed in an update release.

Also, the original diagnosis that it had to do with the long I/O read was a red herring. It will happen on the boot of the LML driver if you share a connection between RMs.

Al

I've added a fix that synchronizes the call to channel.connect(), which seems to fix the use of RemoteProcessBuilder from multiple threads. My tests are not showing any issues with this fix, but because this is a concurrency issue there may still be problems that manifest from time to time. Closing as fixed in ptp_5_0 and HEAD.
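
For illustration only, the locking pattern described in the last comment (serializing the connect step when several threads share one connection) might look roughly like the sketch below. This is not the actual Remote Tools change and it does not use the JSch API: `SharedConnection`, `openChannel` and `SharedConnectionDemo` are hypothetical stand-ins, and the read-modify-write on the counter merely models shared connection state that is unsafe to touch from two threads at once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for one connection shared by several RMs.
// It only models the locking pattern; it is NOT JSch or Remote Tools code.
class SharedConnection {
    private final Object connectLock = new Object();
    private int nextChannelId = 0; // guarded by connectLock

    // Stand-in for the step that would call channel.connect() in real code.
    int openChannel() {
        // Only one thread at a time may run the connect step on this connection.
        synchronized (connectLock) {
            int id = nextChannelId;  // read shared state
            Thread.yield();          // widen the window a race would need
            nextChannelId = id + 1;  // write shared state
            return id;
        }
    }
}

class SharedConnectionDemo {
    // Opens one channel per thread on a single shared connection and
    // returns how many distinct channel ids were handed out.
    static int openConcurrently(int threads) {
        SharedConnection connection = new SharedConnection();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = connection::openChannel;
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            Set<Integer> ids = new HashSet<>();
            for (Future<Integer> result : results) {
                ids.add(result.get());
            }
            return ids.size();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("distinct channel ids: " + openConcurrently(8));
    }
}
```

With the `synchronized` block in place every thread gets its own channel id. Removing it lets two threads read the same counter value, which is the kind of intermittent failure a shared connection could produce. As the last comment notes, a lock around one call does not prove the rest of the shared state is safe.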