Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 546710

Summary: [console] Race condition in ProcessConsole init with fast terminating processes
Product: [Eclipse Project] Platform Reporter: Paul Pazderski <paul-eclipse>
Component: DebugAssignee: Paul Pazderski <paul-eclipse>
Status: VERIFIED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: loskutov, sarika.sinha
Version: 4.12   
Target Milestone: 4.12 M3   
Hardware: All   
OS: All   
See Also: https://git.eclipse.org/r/141087
https://git.eclipse.org/c/platform/eclipse.platform.debug.git/commit/?id=f0cc33023dda08a2bae26b32c8e22a2863729724
https://git.eclipse.org/r/141174
https://git.eclipse.org/c/platform/eclipse.platform.debug.git/commit/?id=33e75e385ad0cf1f967ae7cf69c8b5cfd4adc257
https://git.eclipse.org/r/141288
https://git.eclipse.org/r/141289
https://git.eclipse.org/c/platform/eclipse.platform.debug.git/commit/?id=72f672fe7e6becae36c083ab3f9162f47476b128
https://git.eclipse.org/c/platform/eclipse.platform.debug.git/commit/?id=f09f1e9f05795fb820e3c859273faae755ce94e2
Whiteboard:
Bug Depends on:    
Bug Blocks: 552030    
Attachments:
Description Flags
Not closable consoles none

Description Paul Pazderski CLA 2019-04-24 14:08:20 EDT
As a short reminder: a launch first creates the native process and than allocates a ProcessConsole for it. To recognize process termination a thread is spawned to waitFor the native process.

The process may already terminate before the console is initialized and this is already considered in ProcessConsole implementation but I think there is a small chance for a race condition.

The init checks if the process is terminated and if not registers the listener for process termination. But the termination monitor and therefore the notification is run in another thread and if the init is interrupted just between the terminated check and register listener it can miss process termination. 

I was not affected by this myself but could trigger it using debugger or Thread.sleep.

The fix sets the listener unconditionally and after that check if the process already terminated.  There is small chance the "onTermination" actions are applied twice but that should be no problem for ProcessConsole. To be safe I synchronized resetName().
Comment 1 Eclipse Genie CLA 2019-04-24 14:10:40 EDT
New Gerrit change created: https://git.eclipse.org/r/141087
Comment 2 Andrey Loskutov CLA 2019-04-25 03:06:05 EDT
Created attachment 278386 [details]
Not closable consoles

Today's build shows 3 hanging consoles and two (not three!) hanging input jobs:

eclipse.platform/ant/org.eclipse.ant.core/buildfiles/buildExtraJAR.xml
eclipse.platform.resources/bundles/org.eclipse.core.resources/scripts/buildExtraJAR.xml
rt.equinox.p2/bundles/org.eclipse.equinox.p2.jarprocessor/localbuild.xml

There is definitely something wrong with short living process jobs / consoles for them.

Additionally, the consoles can't be closed by clicking on "Remove launch" button, and "Remove all" is disabled.

"Worker-13: Process Console Input Job" #57 prio=5 os_prio=0 cpu=50.03ms elapsed=1393.68s tid=0x00007ffe48001000 nid=0x1a9a9 in Object.wait()  [0x00007ffe638ee000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(java.base@11.0.2/Native Method)
        - waiting on <no object reference available>
        at java.lang.Object.wait(java.base@11.0.2/Object.java:328)
        at org.eclipse.ui.console.IOConsoleInputStream.waitForData(IOConsoleInputStream.java:142)
        at org.eclipse.ui.console.IOConsoleInputStream.read(IOConsoleInputStream.java:93)
        - waiting to re-lock in wait() <0x000000042552e8a0> (a org.eclipse.ui.console.IOConsoleInputStream)
        at org.eclipse.ui.console.IOConsoleInputStream.read(IOConsoleInputStream.java:115)
        at org.eclipse.debug.internal.ui.views.console.ProcessConsole$InputReadJob.run(ProcessConsole.java:752)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

"Worker-38: Process Console Input Job" #82 prio=5 os_prio=0 cpu=45.77ms elapsed=1393.20s tid=0x00007ffecc05d800 nid=0x1a9c3 in Object.wait()  [0x00007ffe61bd5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(java.base@11.0.2/Native Method)
        - waiting on <no object reference available>
        at java.lang.Object.wait(java.base@11.0.2/Object.java:328)
        at org.eclipse.ui.console.IOConsoleInputStream.waitForData(IOConsoleInputStream.java:142)
        at org.eclipse.ui.console.IOConsoleInputStream.read(IOConsoleInputStream.java:93)
        - waiting to re-lock in wait() <0x0000000425603af0> (a org.eclipse.ui.console.IOConsoleInputStream)
        at org.eclipse.ui.console.IOConsoleInputStream.read(IOConsoleInputStream.java:115)
        at org.eclipse.debug.internal.ui.views.console.ProcessConsole$InputReadJob.run(ProcessConsole.java:752)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Comment 3 Andrey Loskutov CLA 2019-04-25 03:10:23 EDT
Thanks to https://git.eclipse.org/r/#/c/141013/ I was able to cancel the two input jobs. Note: this did not had any effect for hanging consoles, so the "hanging" consoles are unrelated to "hanging" jobs, common is only the root cause of fast terminated launches.
Comment 5 Paul Pazderski CLA 2019-04-25 14:47:18 EDT
I still could not reproduce this problem but still searched in the code what could break your remove buttons.

I think I found a race condition which could explain the disabled "remove all" button. (but you must have some bad luck if you actually hit it) If you see the disabled "remove all" button again I would also be interested in the enable state of the "Remove All Terminated" action in Console context menu and the "Remove All Terminated Launches" action in Debug View.

For the not working "remove single" button I have still no clue who this can get broken at all.
Comment 6 Eclipse Genie CLA 2019-04-25 14:48:00 EDT
New Gerrit change created: https://git.eclipse.org/r/141174
Comment 8 Andrey Loskutov CLA 2019-04-25 15:48:58 EDT
(In reply to Paul Pazderski from comment #5)
> I still could not reproduce this problem but still searched in the code what
> could break your remove buttons.
> 
> I think I found a race condition which could explain the disabled "remove
> all" button. (but you must have some bad luck if you actually hit it) If you
> see the disabled "remove all" button again I would also be interested in the
> enable state of the "Remove All Terminated" action in Console context menu
> and the "Remove All Terminated Launches" action in Debug View.

I've setup my SDK IDE to start with debugger port open. Next time it happens I should be able to look into it.

> For the not working "remove single" button I have still no clue who this can
> get broken at all.

I've also tried to understand what could it be, I assume it could be wrong process implementation of AntProcess. Somehow it probably manages to answer that it is not terminated, even if it is.
Comment 9 Paul Pazderski CLA 2019-04-25 16:06:19 EDT
(In reply to Andrey Loskutov from comment #8)
> I assume it could be wrong process implementation of AntProcess.

Had the same idea. But the only problem I could found was bug 546755.
Comment 10 Andrey Loskutov CLA 2019-04-25 16:09:23 EDT
(In reply to Paul Pazderski from comment #9)
> (In reply to Andrey Loskutov from comment #8)
> > I assume it could be wrong process implementation of AntProcess.
> 
> Had the same idea. But the only problem I could found was bug 546755.

Note: the problems I observed are caused by ant launched in the same JVM, you patch seem to touch separated JVM case.
Comment 11 Paul Pazderski CLA 2019-04-25 16:12:08 EDT
(In reply to Andrey Loskutov from comment #10)
> Note: the problems I observed are caused by ant launched in the same JVM,
> you patch seem to touch separated JVM case.

Correct. Also the other bug would had blocked the entire build to finish.
Comment 12 Andrey Loskutov CLA 2019-04-26 04:42:02 EDT
OK, now I've got 5 consoles (and 3 input stream jobs waiting in org.eclipse.ui.console.IOConsoleInputStream.waitForData()), but in debugger I see that there is only one (!) launch known in org.eclipse.debug.internal.core.LaunchManager.getLaunches().

After removing this one, there are no launches left, which explains why ConsoleRemoveAllTerminatedAction is disabled and ConsoleRemoveLaunchAction doesn't work anymore: the code below simply does nothing, even if *current* launch of the console is not null:

public synchronized void run() {
    	ILaunch launch = getLaunch();
    	if (launch != null) {
	        ILaunchManager launchManager = DebugPlugin.getDefault().getLaunchManager();
	        launchManager.removeLaunch(launch);
    	}
    }

The console removal code waits for the removal event from LaunchManager, which is of course not sent because LaunchManager doesn't remove any thing.

So the problem is in the way how launches are created / managed, not in the console action updates.
Comment 13 Andrey Loskutov CLA 2019-04-26 11:26:38 EDT
OK, I think I've got the root cause. It is not launch management issue, it is console management issue.

All starts with this two:
org.eclipse.debug.internal.ui.launchConfigurations.LaunchConfigurationManager.removeTerminatedLaunches(ILaunch)
org.eclipse.debug.internal.ui.views.console.ProcessConsoleManager

1) If the preference to remove terminated launches is set (default), it will remove terminated launches on starting a new one.
2) If a org.eclipse.debug.ui plugin is activated (opened Breakpoints or Debug views), ProcessConsoleManager will watch for launch events to open/close consoles.
3) If a launch is *added*, ProcessConsoleManager listener which will also create the associated console.
4) If a launch is *removed*, ProcessConsoleManager listener which will also remove the associated console.

The code to add/remove consoles in ProcessConsoleManager runs asynchronously to the launch / process lifecycle.

This means: if the console creation thread is behind the launch execution, we will have consoles opened *after* their process/launch is gone and *removed* from LaunchManager list. Those consoles will never be removed!

I have workspace with few projects using ant builders, so they started pretty much one after each other. The last one "wins" and can be closed, because no one runs after him who can trigger removeTerminatedLaunches(). All before are potential "victims" so they consoles may be opened after all events are already dispatched and they launches are gone.

To reproduce, at least two projects are needed, each one containing ant build launch, and a breakpoint at ProcessConsoleManager.getConsole(IProcess) line: 166.

Once breakpoint is set, trigger "Project->Build all" and let only the build job to access to consoles, but do not let "create console" job to do so. After the build, let the "create console" jobs proceed. Kaboom. Consoles are opened and will stay forever.

Thread [Worker-32: Creating console for /data/runtime-New configuration/A/build.xml] (Suspended (breakpoint at line 166 in ProcessConsoleManager))	
	ProcessConsoleManager.getConsole(IProcess) line: 166	
	ProcessConsoleManager.getConsoleDocument(IProcess) line: 216	
	ProcessConsoleManager$ConsoleCreation.run(IProgressMonitor) line: 70	
	Worker.run() line: 63	


Thread [Worker-30: Building Workspace] (Suspended (breakpoint at line 166 in ProcessConsoleManager))	
	ProcessConsoleManager.getConsole(IProcess) line: 166	
	ProcessConsoleManager.removeProcess(IProcess) line: 150	
	ProcessConsoleManager.removeLaunch(ILaunch) line: 137	
	ProcessConsoleManager.launchRemoved(ILaunch) line: 130	
	LaunchManager$LaunchNotifier.run() line: 430	
	SafeRunner.run(ISafeRunnable) line: 45	
	LaunchManager$LaunchNotifier.notify(ILaunch, int) line: 417	
	LaunchManager.fireUpdate(ILaunch, int) line: 1039	
	LaunchManager.removeLaunch(ILaunch) line: 2223	
	LaunchConfigurationManager.removeTerminatedLaunches(ILaunch) line: 334	
	LaunchConfigurationManager.launchAdded(ILaunch) line: 320	
	LaunchManager$LaunchNotifier.run() line: 427	
	SafeRunner.run(ISafeRunnable) line: 45	
	LaunchManager$LaunchNotifier.notify(ILaunch, int) line: 417	
	LaunchManager.fireUpdate(ILaunch, int) line: 1039	
	LaunchManager.addLaunch(ILaunch) line: 679	
	LaunchConfiguration.launch(String, IProgressMonitor, boolean, boolean) line: 811	
	LaunchConfiguration.launch(String, IProgressMonitor, boolean) line: 720	
	LaunchConfiguration.launch(String, IProgressMonitor) line: 715	
	ExternalToolBuilder.launchBuild(int, ILaunchConfiguration, Map<String,String>, IProgressMonitor) line: 183	
	ExternalToolBuilder.doBuildBasedOnScope(IResource[], int, ILaunchConfiguration, Map<String,String>, IProgressMonitor) line: 171	
	ExternalToolBuilder.build(int, Map<String,String>, IProgressMonitor) line: 90	
	BuildManager$2.run() line: 833	
	SafeRunner.run(ISafeRunnable) line: 45	
	BuildManager.basicBuild(int, IncrementalProjectBuilder, Map<String,String>, MultiStatus, IProgressMonitor) line: 220	
	BuildManager.basicBuild(IBuildConfiguration, int, IBuildContext, ICommand[], MultiStatus, IProgressMonitor) line: 263	
	BuildManager$1.run() line: 316	
	SafeRunner.run(ISafeRunnable) line: 45	
	BuildManager.basicBuild(IBuildConfiguration, int, IBuildContext, MultiStatus, IProgressMonitor) line: 319	
	BuildManager.basicBuildLoop(IBuildConfiguration[], IBuildConfiguration[], int, MultiStatus, IProgressMonitor) line: 371	
	BuildManager.build(IBuildConfiguration[], IBuildConfiguration[], int, IProgressMonitor) line: 392	
	Workspace.buildInternal(IBuildConfiguration[], int, boolean, IProgressMonitor) line: 504	
	Workspace.build(int, IProgressMonitor) line: 404	
	GlobalBuildAction$1.run(IProgressMonitor) line: 180	
	Worker.run() line: 63
Comment 14 Paul Pazderski CLA 2019-04-26 12:17:02 EDT
Great! Now I finally could reproduce it. I was so close but failed to imagine this specific thread order. Also I tested multiple ant targets, multiple ant builders but not multiple projects...

Should I work on a fix or will you do it yourself?
Comment 15 Andrey Loskutov CLA 2019-04-26 13:14:39 EDT
(In reply to Paul Pazderski from comment #14)
> Great! Now I finally could reproduce it. I was so close but failed to
> imagine this specific thread order. Also I tested multiple ant targets,
> multiple ant builders but not multiple projects...
> 
> Should I work on a fix or will you do it yourself?

If you have time & interst, feel free to push a patch ;)
Comment 16 Eclipse Genie CLA 2019-04-28 10:52:22 EDT
New Gerrit change created: https://git.eclipse.org/r/141288
Comment 17 Eclipse Genie CLA 2019-04-28 10:57:47 EDT
New Gerrit change created: https://git.eclipse.org/r/141289
Comment 18 Paul Pazderski CLA 2019-04-28 11:06:28 EDT
Writing the patch was quite easy. Writing a test too. But figuring out why my test was so unstable was hard.

All this started in bug 546641 with the JobManager complaining about a still running input read job on shutdown. On shutdown the LaunchManager removes all launches and as consequence all consoles and therefore all input read jobs. We found a few problems there but also on shutdown the ConsoleManager removes all consoles (again) and therefore ends all input read jobs.

Now I found a new combination. To initialize and dispose consoles the AbstractConsole registers a listener on ConsoleManager. The managers listener list is initialized lazy without synchronization. If two consoles are created simultaneous (which is possible due to creation done in a job) it can happen that only one of the listeners is added. Then one of the consoles had started the input read job in constructor but never stops due to not received notifications.
Comment 21 Andrey Loskutov CLA 2019-04-28 13:02:52 EDT
Not closing yet, will check tomorrow if my Linux setup (which was affected) is fine or if there is still some leaks.
Comment 22 Andrey Loskutov CLA 2019-04-29 13:04:03 EDT
(In reply to Andrey Loskutov from comment #21)
> Not closing yet, will check tomorrow if my Linux setup (which was affected)
> is fine or if there is still some leaks.

Looks good. Thanks Paul.
Comment 23 Sarika Sinha CLA 2019-05-21 04:52:10 EDT
@Paul,
Please verify this bug with the new I build.
Comment 24 Paul Pazderski CLA 2019-05-21 14:04:31 EDT
Verified with Build id: I20190520-1805