| Summary: | TerminalServiceHostShell with SshTerminalShell on slower targets will hang the shell when writeToShell with commands on shell that is not ready to accept command yet | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Tools] Target Management | Reporter: | Yufen Kuo <ykuo> | ||||||||||||||
| Component: | RSE | Assignee: | Martin Oberhuber <mober.at+eclipse> | ||||||||||||||
| Status: | RESOLVED FIXED | QA Contact: | Martin Oberhuber <mober.at+eclipse> | ||||||||||||||
| Severity: | blocker | ||||||||||||||||
| Priority: | P3 | CC: | anna.dushistova, chandrayya.gk, dmcknigh | ||||||||||||||
| Version: | unspecified | ||||||||||||||||
| Target Milestone: | 3.3.1 | ||||||||||||||||
| Hardware: | All | ||||||||||||||||
| OS: | Linux | ||||||||||||||||
| Whiteboard: | |||||||||||||||||
| Attachments: |
|
||||||||||||||||
|
Description
Yufen Kuo
Created attachment 202371 [details]
stack trace when it hangs
If I place sleep calls in between each writeToShell then the problem goes away, but this is not the right fix. It seems wrong to assume that once shell is open, it is ready and you can call writeToShell immediately. Created attachment 202843 [details]
patch to wait until output stream is ready when launching shell before writing to shell
added output listener to hostshell when launchShell to make sure that it receives output from the shell invocation commands. If the shell is not ready, it will block until there is output before returning the hostShell, therefore it prevents writeToShell before the shell is ready.
I am raising the priority to blocker since the functionality is not workable with slower targets. On slower targets, once it hangs the shell, you will need to restart the workbench in order to kill the hanging shell. The ssh subprocess is critical to us and without this fix, remote debugging, getting process lists does not work at all. There is an API change which I added "removeOutputListener" API. This API should be in there in the first place since there is already a "addOutputListener" api, adding this API does not break backward compatibility. Please consider apply the patch for the SR1 since this feature is critical to our product. Thank you. Yufen Created attachment 203018 [details]
Yu-Fen's patch without API changes
Martin, Dave, Could you please review if we can check in the modified patch for 3.3.1? Or do you think we should for now move the removeOutputListener further up to the TerminalHostShellOutputReader? Created attachment 203335 [details] patch v3 (without API) Sorry guys, but this patch is completely bogus in a number of ways: 1.) First of all, I very much doubt that it's really incorrect to send commands before having received a prompt. I have seen shells which don't ever send any prompt (eg during an scp handshake), so it seems incorrect to me that we forcefully wait for a prompt before sending anything. At the time we send something, SSH Streams have been established successfully; if a remote shell fails to read commands from those Streams I'd consider it buggy. What i have seen though, is that opening an RSE Shell doesn't print the first couple lines of output occasionally. Could it be that the remote board is really too fast rather than too slow, and it has already completed processing the command when we start reading, rather than not yet executed it? If it really turns out we have to wait for a prompt before sending commands (which I doubt), then it must be a property of the connection whether it should wait or not. 2.) If we really want to wait for output, then TerminalShellService is the wrong place to wait. The right place would be TerminalServiceHostShell line 86. Because in there, we send an initial "cd" command - which would be sent before waiting with the current patch. This can easily be tested in RSE: Open the Files Subsystem, drill down to a folder, then right-click > Launch Shell. Does it cd into the right folder? Does it do so before the shell prints its prompt? Also note that if the change is made in TerminalServiceHostShell we do not have to change any API since we can use the internal Stream Readers. To me, this seems to be a defect in the shell implementation on that fsl board. 3.) Regarding API: Just removing methods from the interface still keeps API in the abstract base classes. API Tooling should have told you that you've been adding API. I hope you've all got API Tooling set up and a baseline configured ? See downloadable baselines here: http://wiki.eclipse.org/TM/Code_Streams I actually don't see a strong need for removing the output listener. It gets added only once and does practically nothing, so for a service release it's fine to just keep it lurking there. 4.) The change around "started" in AbstractHostShellOutputReader is bogus. API Docs for isAlive() clearly say that a Thread is considered alive if it has been started - checking that again with our own variable is bogus and unnecessary: http://download.oracle.com/javase/6/docs/api/java/lang/Thread.html#isAlive() 5.) The code that's waiting is not safe on multi-core CPU's since connectionReady must be declared "volatile" to properly sync between Threads. Instedad of Thread.sleep() with a fixed timeout, using wait / notifyAll is more elegant since it's faster. http://www.javamex.com/tutorials/synchronization_volatile.shtml Unfortunately I had already fixed issues 3-5 before I discovered that issues 1 and 2 already made the change completely bogus; so find attached my version of the patch in the hope that it's eventually helpful in some way (eg after refactoring / moving into TerminalServiceHostShell, and making conditional with an ifWaitForPrompt in order to fix issues 2 and 1). But to be clear again, I don't think we can functionally change the TerminalService to forcefully wait for a prompt on connect, since there might be shells out there which just don't send a prompt. Please reconsider / reinvestigate things with your target! Created attachment 203338 [details]
patch v4
Here is a better patch, which also waits for output before sending the initial cd command. This also seems to address the issue of missing output.
I'm still not entirely sure whether it's acceptable to forcefully wait before sending anything to the remote (there might be remotes which never send a prompt), but I'd be OK committing this patch if a majority of committers wants it that way.
I believe that in most cases it should be OK to require a prompt since we're dealing with interactive shells. What I still dislike about this patch is that it waits in the constructor; so client code can't forcefully stop waiting if something goes wrong.
It would be better to wait in the shell writer thread, allowing to cancel the shell if need be; and it would be better to have a Property of some kind to indicate whether we should wait or not. But again, if a majority of committers wants it that way I'm OK committing it. Note that deadline for committing is TODAY so we'll discuss this in our monthly meeting today.
Yu-fen please test whether this patch addresses your needs and report here (please test thoroughly!)
We discussed this in the TM monthly meeting: http://wiki.eclipse.org/TM/Meetings/14-Sep-2011 And committers agreed to include this change in Indigo SR1. We thought that the value of the patch was higher than the potential risk of breaking remotes which don't send any output on initial connect. At some point I'd still like to see the patch improved though, by deferring the wait into the command sender rather than blocking the constructor (which cannot be canceled). Comment on attachment 202843 [details]
patch to wait until output stream is ready when launching shell before writing to shell
iplog- since Yufen's patch wasn't used at all.
I have released the change, a build is available: Update Site: http://download.eclipse.org/tm/updates/3.3milestones Downloads: http://download.eclipse.org/tm/downloads/drops/I20110914-1310/ Please test thoroughly, to ensure (a) it fixes the issue and (b) we don't have any regressions. Especially RSE Shell over Terminal (SSH, Telnet) must be tested against various hosts. We have only today to find any issues since tomorrow is drop-dead-date for Indigo SR1 contributions. Created attachment 203371 [details] remote shell screen shot >2.) If we really want to wait for output, then TerminalShellService is the > wrong place to wait. The right place would be TerminalServiceHostShell > line 86. Because in there, we send an initial "cd" command - which > would be sent before waiting with the current patch. This can easily > be tested in RSE: Open the Files Subsystem, drill down to a folder, > then right-click > Launch Shell. Does it cd into the right folder? > Does it do so before the shell prints its prompt? Attaching the screenshot of selecting the /bin folder, then right-click > Launch Shell. As you can see, it does not cd to the /bin folder, I typed in command "pwd" twice and both time, it outputs pwd is /home/root You can see the command cd /bin is displayed but before the prompt return with "Last login" line. Doing Launch Terminal sometimes works and sometimes don't. When it does not work, I get the empty terminal window and in console, I get the following error message: Exception in thread "Thread-31" java.lang.NullPointerException at org.eclipse.rse.internal.terminals.ui.views.RSETerminalConnectionThread.readDataForever(RSETerminalConnectionThread.java:99) at org.eclipse.rse.internal.terminals.ui.views.RSETerminalConnectionThread.run(RSETerminalConnectionThread.java:70) We are doing testing now with your patch and so far it seems to work, I have tested with getting processes, and remote debugging with ssh. Will continue testing. tested with telnet only remote system connection type and it is working with your patch. But I found that I need to hit a carriage return in Terminals view for the prompt to be displayed in terminals view. (In reply to comment #7) > Created attachment 203335 [details] > patch v3 (without API) > > Sorry guys, but this patch is completely bogus in a number of ways: > > 1.) First of all, I very much doubt that it's really incorrect to send commands > before having received a prompt. I have seen shells which don't ever send > any prompt (eg during an scp handshake), so it seems incorrect to me that > we forcefully wait for a prompt before sending anything. > At the time we send something, SSH Streams have been established > successfully; if a remote shell fails to read commands from those Streams > I'd consider it buggy. Currently in the code, it always sends the shell invocation command In TerminalShellService's launchShell, it calls runCommand which passes in the SHELL_INVOCATION as the initial command, and in runCommand when TerminServiceHostShell gets created, it checks if commandToRun is SHELL_INVOCATION and then sends the prompt command, so it always sends the prompt command. So if you always send the prompt command, we can expect the prompt command return some output. > > What i have seen though, is that opening an RSE Shell doesn't print the > first couple lines of output occasionally. Could it be that the remote > board is really too fast rather than too slow, and it has already completed > processing the command when we start reading, rather than not yet executed > it? As you can see from the screenshot I attached with file subsystem -> Launch Shell, the cd command was not processed, so I'd like to rule out that the remote board is really too fast and process all the commands already. > > If it really turns out we have to wait for a prompt before sending > commands (which I doubt), then it must be a property of the connection > whether it should wait or not. > > 2.) If we really want to wait for output, then TerminalShellService is the > wrong place to wait. The right place would be TerminalServiceHostShell > line 86. Because in there, we send an initial "cd" command - which > would be sent before waiting with the current patch. This can easily > be tested in RSE: Open the Files Subsystem, drill down to a folder, > then right-click > Launch Shell. Does it cd into the right folder? > Does it do so before the shell prints its prompt? > > Also note that if the change is made in TerminalServiceHostShell we do > not have to change any API since we can use the internal Stream Readers. > To me, this seems to be a defect in the > shell implementation on that fsl board. > > 3.) Regarding API: Just removing methods from the interface still keeps > API in the abstract base classes. API Tooling should have told you that > you've been adding API. I hope you've all got API Tooling set up and > a baseline configured ? See downloadable baselines here: > http://wiki.eclipse.org/TM/Code_Streams > > I actually don't see a strong need for removing the output listener. > It gets added only once and does practically nothing, so for a service > release it's fine to just keep it lurking there. > > 4.) The change around "started" in AbstractHostShellOutputReader is bogus. > API Docs for isAlive() clearly say that a Thread is considered alive > if it has been started - checking that again with our own variable is > bogus and unnecessary: > > http://download.oracle.com/javase/6/docs/api/java/lang/Thread.html#isAlive() > I've found that for the standard error's AbstractHostShellOutputReader, it does not return the correct value for isAlive(). When the thread is already started, isAlive() still returns false, and if I don't add my own "started" flag, I would get: java.lang.IllegalThreadStateException at java.lang.Thread.start(Thread.java:638) at org.eclipse.rse.services.shells.AbstractHostShellOutputReader.startIfNotAlive(AbstractHostShellOutputReader.java:111) at org.eclipse.rse.services.shells.AbstractHostShellOutputReader.addOutputListener(AbstractHostShellOutputReader.java:140) at org.eclipse.rse.services.shells.HostShellProcessAdapter.<init>(HostShellProcessAdapter.java:59) at org.eclipse.rse.internal.subsystems.processes.shell.linux.LinuxShellProcessService.listAllProcesses(LinuxShellProcessService.java:142) at org.eclipse.rse.subsystems.processes.servicesubsystem.ProcessServiceSubSystem.listAllProcesses(ProcessServiceSubSystem.java:134) at org.eclipse.rse.subsystems.processes.core.subsystem.impl.RemoteProcessSubSystemImpl.internalResolveFilterString(RemoteProcessSubSystemImpl.java:140) at org.eclipse.rse.core.subsystems.SubSystem.internalResolveFilterStrings(SubSystem.java:2924) at org.eclipse.rse.core.subsystems.SubSystem.resolveFilterStrings(SubSystem.java:2300) at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.internalGetChildren(SystemViewFilterReferenceAdapter.java:466) at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:284) at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:292) at org.eclipse.rse.ui.operations.SystemFetchOperation.execute(SystemFetchOperation.java:441) at org.eclipse.rse.ui.operations.SystemFetchOperation.run(SystemFetchOperation.java:182) at org.eclipse.rse.ui.view.AbstractSystemViewAdapter.fetchDeferredChildren(AbstractSystemViewAdapter.java:2290) at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:235) at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54) > 5.) The code that's waiting is not safe on multi-core CPU's since > connectionReady must be declared "volatile" to properly sync between > Threads. Instedad of Thread.sleep() with a fixed timeout, using wait / > notifyAll is more elegant since it's faster. > http://www.javamex.com/tutorials/synchronization_volatile.shtml > > Unfortunately I had already fixed issues 3-5 before I discovered that issues 1 > and 2 already made the change completely bogus; so find attached my version of > the patch in the hope that it's eventually helpful in some way (eg after > refactoring / moving into TerminalServiceHostShell, and making conditional with > an ifWaitForPrompt in order to fix issues 2 and 1). > > But to be clear again, I don't think we can functionally change the > TerminalService to forcefully wait for a prompt on connect, since there might > be shells out there which just don't send a prompt. Please reconsider / > reinvestigate things with your target! (In reply to comment #14) > Currently in the code, it always sends the shell invocation command > In TerminalShellService's launchShell, it calls runCommand which passes > in the SHELL_INVOCATION as the initial command, and in runCommand when > TerminServiceHostShell gets created, it checks if commandToRun is > SHELL_INVOCATION and then sends the prompt command Yes, but (a) in other cases SHELL_INVOCATION would not be sent, and (b) if your case about the shell not reacting to early commands the prompt command could be sent too early. So the shell would not react. I think the new patch fixes both issues. > As you can see from the screenshot I attached with file subsystem -> Launch > Shell, the cd command was not processed, so I'd like to rule out that the > remote board is really too fast and process all the commands already. I agree that your board can't be too fast. Still I claim that your board is buggy. When SSH has set up a Stream, that Stream must not lose characters before the Shell reads them. Looks like some buffering is missing on your board, or there's a race condition with pipes between the original Stream opener (be it inetd or getty) and the shell. Anyways, it looks like the patch works around your buggy board. > I've found that for the standard error's AbstractHostShellOutputReader, it > does not return the correct value for isAlive(). I assume that isAlive() does return the correct value, but the Thread has already ended. Since internalReadLine() returns null and handle() terminates the Thread when receiving null. We get IllegalThreadStateException since we can't re-start the Thread. This won't be an issue any more in this context, since the new patch doesn't add Output Listeners; but it can be an issue in other cases. We must not try to re-start a reader Thread that has already ended. Can you please file a new bug for this. (In reply to comment #15) > I assume that isAlive() does return the correct value, but the Thread has > already ended. Since internalReadLine() returns null and handle() terminates > the Thread when receiving null. We get IllegalThreadStateException since we > can't re-start the Thread. > > This won't be an issue any more in this context, since the new patch doesn't > add Output Listeners; but it can be an issue in other cases. We must not try to > re-start a reader Thread that has already ended. Can you please file a new bug > for this. bug submitted. https://bugs.eclipse.org/bugs/show_bug.cgi?id=357886 |