
Bug 323071

Summary: Memory leak in running JUnit tests
Product: [Tools] CDT
Reporter: Andy Jin <ajin>
Component: cdt-debug-dsf-gdb
Assignee: Marc Khouzam <marc.khouzam>
Status: RESOLVED FIXED
QA Contact: Marc Khouzam <marc.khouzam>
Severity: normal
Priority: P3
CC: john.cortell, pawel.1.piech
Version: 7.0
Flags: marc.khouzam: review? (john.cortell)
Target Milestone: 8.0   
Hardware: PC   
OS: Linux   
Whiteboard:
Attachments:
Fix to avoid large memory leak (flags: marc.khouzam: iplog-)

Description Andy Jin CLA 2010-08-18 13:55:28 EDT
Build Identifier: I20100608-0911

Details please reference this CDT mailing list post: http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg20179.html

There is a memory leak when running JUnit tests. A large number of tests must be run in order to expose the heap out-of-memory error.

Reproducible: Always

Steps to Reproduce:
1. Run a large number of DSF/GDB unit tests, or set the PermSize small enough (32M)
2. The tests hang at some point, with a heap out-of-memory error in the log file
3. Many "rejected execution exception" messages appear in the console view
Comment 1 Marc Khouzam CLA 2010-08-28 21:17:14 EDT
From the email thread:

> -----Original Message-----
> From: cdt-dev-bounces@xxxxxxxxxxx 
> [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Andy Jin
> Sent: Wednesday, August 18, 2010 1:19 PM
> To: cdt-dev@xxxxxxxxxxx
> Subject: [cdt-dev] [DSF-GDB] Memory leak when running JUnit tests
> 
> I am writing JUnit tests based on the current MI*test from 
> DSF-GDB. When
> I run my JUnit tests (140+ tests) on multiple targets (total is 1000+
> tests) I always run out of heap memory. Some tracing reveals there is
> memory leak in the range of 1M-2M per test.
> 
> Further debugging led me to this:
> 
> In my launch delegate I create the GDB CLI process before running my
> finalLaunchSequence
> (https://bugs.eclipse.org/bugs/show_bug.cgi?id=223154), and 
> this creates
> the GDB runtime process allocating around 2M buffer for input/output
> (AbstractCLIProcess). When the launch is terminated this GDB runtime
> process needs to be released and this is done at 
> GDBControl.terminate()
> where it schedules a MIGDBExit().
> 
> However, most of the time (8 out of 10) this MIGDBExit() is not
> completed, thus the future task above it will run after 2 second. This
> task does not kill the GDB process because the state is never
> "State.STARTED" in this case. Not sure if it's related but 
> running this
> task results in a "rejected execution exception" which can be 
> seen from
> the console view.
> 
> I also tried running the DSF/GDB unit tests (e.g. 
> MIRegisterTest) I can
> see the same behavior that the MIGDBExit() is not completed but the
> future task is run.
Comment 2 Marc Khouzam CLA 2010-08-28 21:17:34 EDT
(In reply to comment #1)
> From the email thread:
> 
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx
> > [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Andy Jin
> > Sent: Wednesday, August 18, 2010 1:19 PM
> > To: cdt-dev@xxxxxxxxxxx
> > Subject: [cdt-dev] [DSF-GDB] Memory leak when running JUnit tests
> >
> > I am writing JUnit tests based on the current MI*test from
> > DSF-GDB. When
> > I run my JUnit tests (140+ tests) on multiple targets (total is 1000+
> > tests) I always run out of heap memory. Some tracing reveals there is
> > memory leak in the range of 1M-2M per test.

I've seen some out-of-memory errors when running a large number
of JUnit tests also, so I believe we are leaking memory somehow.

> >
> > Further debugging led me to this:
> >
> > In my launch delegate I create the GDB CLI process before running my
> > finalLaunchSequence
> > (https://bugs.eclipse.org/bugs/show_bug.cgi?id=223154), and
> > this creates
> > the GDB runtime process allocating around 2M buffer for input/output
> > (AbstractCLIProcess). When the launch is terminated this GDB runtime
> > process needs to be released 

Up to here I'm following.

> > and this is done at GDBControl.terminate()
> > where it schedules a MIGDBExit().

Here I'm confused.
The two LargePipedInputStream instances of AbstractCLIProcess seem to be released in
AbstractCLIProcess#closeIO(), which is called by AbstractCLIProcess#dispose(),
which is called by GDBControl.CommandProcessorsStep, which is triggered by the
ShutdownSequence, which I would like to believe is always being called.

Did I misunderstand what you meant?
Comment 3 Andy Jin CLA 2010-08-31 13:43:15 EDT
Yes, I see that AbstractCLIProcess#closeIO() is being run and the two IO streams are closed. What I really meant is that the GDB runtime process itself is not killed.

It's a native process, so it needs to be killed by calling Process.destroy(). GDBBackendCLIProcess#destroy() tries to call GDBControl#terminate() to kill the GDB process, but I don't see that happening, or it's not reliable, for the reason given in comment #1.
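
A minimal sketch of the kill step being discussed here, using only the standard java.lang.Process API; the class name and command below are illustrative, and this is not the actual GDBBackendCLIProcess/GDBControl code path:

    import java.io.IOException;

    public class DestroyIfRunningSketch {
        public static void main(String[] args) throws IOException {
            // Launch a native process ("gdb --version" is just an example command).
            Process gdb = new ProcessBuilder("gdb", "--version").start();

            // Pre-Java-8 idiom: exitValue() throws while the process is still alive.
            boolean stillRunning;
            try {
                gdb.exitValue();
                stillRunning = false;
            } catch (IllegalThreadStateException e) {
                stillRunning = true;
            }

            if (stillRunning) {
                gdb.destroy(); // forcibly terminate the native process
            }
        }
    }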
Comment 4 Marc Khouzam CLA 2010-08-31 23:19:45 EDT
(In reply to comment #1)
 
> > However, most of the time (8 out of 10) this MIGDBExit() is not
> > completed, thus the future task above it will run after 2 second. This
> > task does not kill the GDB process because the state is never
> > "State.STARTED" in this case. 

If the state is not STARTED then it means MonitorJob.fExited has been set to true, which means the process is already killed.

Two questions:
1- Do we have to call Process.destroy() all the time, or only if the process is still running? It looks like the code only tries to call destroy() if GDB is still running.

2- Could we have a concurrency issue with MonitorJob.fExited?  That field is accessed by two threads: the MonitorJob and the Executor in GDBBackend#getState.  I think that last method must synchronize its access, no?
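
On question 2, a minimal sketch of the visibility concern, assuming simplified names (this is not the actual GDBBackend/MonitorJob code): making the flag volatile, or synchronizing its accessors, guarantees the executor thread sees the monitor thread's write.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BackendStateSketch {
        public enum State { STARTED, TERMINATED }

        // volatile makes the monitor thread's write visible to the executor thread;
        // a plain field read from another thread could observe a stale value.
        private volatile boolean fExited = false;

        public void processExited() {        // called from the monitor thread
            fExited = true;
        }

        public State getState() {            // called from the executor thread
            return fExited ? State.TERMINATED : State.STARTED;
        }

        public static void main(String[] args) throws Exception {
            final BackendStateSketch backend = new BackendStateSketch();
            Thread monitor = new Thread(new Runnable() {
                public void run() { backend.processExited(); }
            });
            monitor.start();
            monitor.join();

            ExecutorService executor = Executors.newSingleThreadExecutor();
            executor.submit(new Runnable() {
                public void run() { System.out.println(backend.getState()); }
            });
            executor.shutdown();
        }
    }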
Comment 5 Marc Khouzam CLA 2010-09-02 15:42:04 EDT
I've been running the JUnit tests and checking my memory usage, and I don't see any memory leak.  Are you sure you see one?
Comment 6 Marc Khouzam CLA 2011-05-09 16:03:38 EDT
I've confirmed that AbstractCLIProcess is the problem, through its use of LargePipedInputStream.  Here is how I know.

With LargePipedInputStream using 1024*1024 bytes, I can now run 450 launches before running out of memory.  So, I made LargePipedInputStream use only 1024 bytes of memory, and I can then run more than 1000 tests.  Therefore, it is LargePipedInputStream that is leaking.

LargePipedInputStream is used in AbstractCLIProcess and MIInferiorProcess.  I removed its use from MIInferiorProcess, but the memory leak still happened.  So, instead, I removed it from AbstractCLIProcess (while leaving it in MIInferiorProcess), and then I could run 1000 tests again.  That means that it is AbstractCLIProcess that is the problem, as Andy believed all along.
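
A sketch of the kind of pipe-size experiment described above; the class below is a stand-in for the real LargePipedInputStream, not the CDT implementation. Two such pipes per launch would account for roughly 2 MB each time, in line with the 1M-2M per-test growth reported in the description.

    import java.io.PipedInputStream;

    // Stand-in for LargePipedInputStream: a pipe whose buffer size can be dialed
    // down from ~1 MiB to 1 KiB to check whether the per-launch leak shrinks too.
    public class SizedPipedInputStream extends PipedInputStream {
        public static final int LARGE_SIZE = 1024 * 1024; // normal size
        public static final int SMALL_SIZE = 1024;        // size used for the experiment

        public SizedPipedInputStream(int pipeSize) {
            super(pipeSize); // PipedInputStream(int) is available since Java 6
        }
    }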

My current theory (still to be proven) is that the JUnit tests do not clear a terminated launch, so the AbstractCLIProcess remains alive and its memory is not reclaimed, just as if the Debug view were still showing a terminated launch.

I'm going to see if we can either remove the terminated launch, or remove AbstractCLIProcess from the launch, to get the memory freed up.
Comment 7 Marc Khouzam CLA 2011-05-11 16:25:32 EDT
Created attachment 195427 [details]
Fix to avoid large memory leak

(In reply to comment #6)
> My current theory (still to be proven) is that the JUnit tests do not clear a
> terminated launch, so the AbstractCLIProcess remains alive and its memory is
> not reclaimed, just as if the Debug view were still showing a terminated launch.

This turns out to be wrong.  We have all kinds of memory leaks which eventually prevent the release of AbstractCLIProcess, which in turn keeps a hold of its LargePipedInputStream instances.

Bug 331232 is about our memory leaks in general, while Bug 345476 is about a specific one in the model proxy strategy.

Until those bugs are fixed, I think we can address the simpler issue of the large memory leak of LargePipedInputStream.  The attached patch does this by setting the variables holding the LargePipedInputStream to null when we dispose AbstractCLIProcess.  Note that we need to set all four streams to null because the two that are not LargePipedInputStream each hold a reference to one.
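
A sketch of the approach the patch takes, with illustrative field names rather than the actual AbstractCLIProcess fields:

    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;

    public class CliStreamsSketch {
        // Two large input pipes plus the two output pipes connected to them.
        private PipedInputStream fConsoleInput;
        private PipedInputStream fLogInput;
        private PipedOutputStream fConsoleOutput;
        private PipedOutputStream fLogOutput;

        public CliStreamsSketch() throws IOException {
            fConsoleInput = new PipedInputStream(1024 * 1024);
            fLogInput = new PipedInputStream(1024 * 1024);
            fConsoleOutput = new PipedOutputStream(fConsoleInput);
            fLogOutput = new PipedOutputStream(fLogInput);
        }

        public void dispose() {
            closeIO();
            // Null all four fields: each output pipe holds a reference to its input
            // pipe, so leaving any of them set would keep the large buffers reachable
            // even after close() (close() does not release the internal byte array).
            fConsoleInput = null;
            fLogInput = null;
            fConsoleOutput = null;
            fLogOutput = null;
        }

        private void closeIO() {
            try {
                if (fConsoleOutput != null) fConsoleOutput.close();
                if (fLogOutput != null) fLogOutput.close();
                if (fConsoleInput != null) fConsoleInput.close();
                if (fLogInput != null) fLogInput.close();
            } catch (IOException e) {
                // Streams may already be closed; ignore.
            }
        }
    }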
Comment 8 Marc Khouzam CLA 2011-05-11 16:27:02 EDT
John, what do you think of this trick to improve the situation?

I'm running the JUnit tests now and I'm expecting to go from 450 to all 1500 of them in a single run...  I'll report when they are done.
Comment 9 John Cortell CLA 2011-05-11 16:54:01 EDT
(In reply to comment #8)
> John, what do you think of this trick to improve the situation?
> 
> I'm running the JUnit tests now and I'm expecting to go from 450 to all 1500 of
> them in a single run...  I'll report when they are done.

Marc, this looks like a great improvement to me...if and only if we can be assured that the dispose() and the code that uses the pipes never, in practice, run at the same time. If it's possible they do, then this change could introduce a very intermittent NPE bug. Fortunately, in this case you can add synchronization around the new cleanup code and the pipe usage with no risk of deadlocks, so that's what I would do.
Comment 10 Marc Khouzam CLA 2011-05-12 09:29:59 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > John, what do you think of this trick to improve the situation?
> > 
> > I'm running the JUnit tests now and I'm expecting to go from 450 to all 1500 of
> > them in a single run...  I'll report when they are done.
> 
> Marc, this looks like a great improvement to me...if and only if we can be
> assured that the dispose() and the code that uses the pipes never, in practice,
> run at the same time. If it's possible they do, then this change could
> introduce a very intermittent NPE bug. Fortunately, in this case you can add
> synchronization around the new cleanup code and the pipe usage with no risk of
> deadlocks, so that's what I would do.

Good point, I will do that.

Good news: I was able to run all tests (1544 of them), and it took about 30 minutes.
Comment 11 Marc Khouzam CLA 2011-05-13 10:49:00 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > John, what do you think of this trick to improve the situation?
> > 
> > I'm running the JUnit tests now and I'm expecting to go from 450 to all 1500 of
> > them in a single run...  I'll report when they are done.
> 
> Marc, this looks like a great improvement to me...if and only if we can be
> assured that the dispose() and the code that uses the pipes never, in practice,
> run at the same time. If it's possible they do, then this change could
> introduce a very intermittent NPE bug. Fortunately, in this case you can add
> synchronization around the new cleanup code and the pipe usage with no risk of
> deadlocks, so that's what I would do.

Sorry John, but I misunderstood what you meant.  Re-reading now, I believe you are worried that AbstractCLIProcess could have an NPE if we null the stream variables and then those streams are used.

To protect against this, the patch surrounds every use of the streams with a null check.  Is that good enough?

Thanks
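
A sketch of the guard being described, with an illustrative field name; as the next comment explains, the check and the use are not atomic, so another thread running dispose() in between still leaves a window for an intermittent NPE:

    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;

    public class NullCheckSketch {
        private PipedOutputStream fConsoleOutput;

        public NullCheckSketch() throws IOException {
            fConsoleOutput = new PipedOutputStream(new PipedInputStream());
        }

        public void dispose() {
            fConsoleOutput = null; // clear the field so the pipe buffer can be collected
        }

        public void writeToConsole(byte[] data) throws IOException {
            if (fConsoleOutput != null) {
                fConsoleOutput.write(data); // dispose() on another thread could null
                                            // the field between the check and this use
            }
        }
    }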
Comment 12 John Cortell CLA 2011-05-13 11:34:07 EDT
> Sorry John, but I misunderstood what you meant.  Re-reading now, I believe you
> are worried that AbstractCLIProcess could have an NPE if we null the stream
> variables and then those streams are used.
> 
> To protect against this the patch surrounds every use of the streams with a
> null check.  Is that good enough?

In micro time, the gap between the null reference check and the use of the reference (if it's not null) is huge; another thread might be given the CPU right after the check and run quite a bit of code. In theory, thread A shouldn't be in the dispose() method while thread B is in another method of the object. But very often those constraints aren't ensured because, at a minimum, there aren't well-defined threading rules for using an object.

What my comment tried to point out is that if we do have that case happening today (thread A in dispose(), thread B in another method), there isn't going to be an NPE. With your change, there's likely to be an intermittent NPE. If we know the object is only ever exercised on one thread (e.g., the UI thread), then forget everything I've said. Otherwise, we need to consider the problem scenario I brought up. And what I'm ultimately saying is that you can just synchronize the clearing of the references and the uses of them, and that will work in all cases. Of course, there's performance to keep in mind.
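
A sketch of the synchronization being suggested: guard both the cleanup and every use of the pipes with the same lock, so dispose() can never interleave with a write (names are illustrative). Since no other lock is held while this one is taken, it introduces no deadlock risk, only a small amount of contention on the write path.

    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;

    public class SynchronizedPipeSketch {
        private final Object fLock = new Object();
        private PipedOutputStream fConsoleOutput;

        public SynchronizedPipeSketch() throws IOException {
            fConsoleOutput = new PipedOutputStream(new PipedInputStream());
        }

        public void dispose() {
            synchronized (fLock) {
                fConsoleOutput = null; // cleared while holding the lock
            }
        }

        public void writeToConsole(byte[] data) throws IOException {
            synchronized (fLock) {
                if (fConsoleOutput != null) {
                    fConsoleOutput.write(data); // check and use are now atomic
                }
            }
        }
    }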
Comment 13 Marc Khouzam CLA 2011-05-13 13:10:59 EDT
(In reply to comment #12)
 
> In micro time, the gap between the null reference check and the use of the
> reference (if it's not null) is huge; another thread might be given the CPU
> right after the check and run quite a bit of code. In theory, thread A
> shouldn't be in the dispose method() while thread B is in another method of the
> object. 

Of course, you are right.  That is the downside of coding for DSF so much: I forget to think about multi-threading.

I'll post a patch.
Comment 14 Marc Khouzam CLA 2011-05-13 13:20:13 EDT
(In reply to comment #12)
> If we know the object is only ever exercised on one thread 
> (e.g., the UI thread), then forget everything I've said.

Ok, I think I got lucky.  The setting to null is done in dispose(), which is marked @ConfinedToDsfExecutor, while the reading of the streams is done in:

1- closeIO()
  private method called by dispose() => OK

2- IEventListener.eventReceived() 
  also @ConfinedToDsfExecutor => ok

3- checkMissingSecondaryPrompt()
  called by ICommandListener.commandSent() which is @ConfinedToDsfExecutor => ok

The last point that I'm not sure about is that getErrorStream() and 
getInputStream() return the streams.  Do I have to synchronize those with dispose()?
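
The confinement argument above, as a minimal sketch: if every access to the fields is submitted to the same single-threaded executor (which is what @ConfinedToDsfExecutor documents for the DSF session executor), the accesses cannot overlap and no extra locking is needed. The names below are illustrative, not the DSF API.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ExecutorConfinementSketch {
        // Stand-in for the single-threaded DSF session executor.
        private final ExecutorService fExecutor = Executors.newSingleThreadExecutor();
        private Object fStream = new Object(); // stand-in for a pipe field

        public void dispose() {
            fExecutor.submit(new Runnable() {
                public void run() { fStream = null; }  // runs on the executor thread
            });
        }

        public void useStream() {
            fExecutor.submit(new Runnable() {
                public void run() {                    // same single thread, so this can
                    if (fStream != null) {             // never interleave with dispose()
                        System.out.println("using " + fStream);
                    }
                }
            });
        }

        public void shutdown() {
            fExecutor.shutdown();
        }
    }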
Comment 15 John Cortell CLA 2011-05-13 14:12:00 EDT
(In reply to comment #14)
> The last point that I'm not sure about is that getErrorStream() and 
> getInputStream() return the streams.  Do I have to synchronize those with
> dispose()?

Nope. Those aren't exposed to the potential problem I mentioned.

That said...technically, even a simple field read operation should be made thread-safe when there's a possibility another thread might be modifying the field at the same time ("Java Concurrency in Practice", Goetz;) However, I think that level of diligence is rarely followed and I'm not sure how real the danger is.
Comment 16 Marc Khouzam CLA 2011-05-13 14:22:55 EDT
(In reply to comment #15)
> (In reply to comment #14)
> > The last point that I'm not sure about is that getErrorStream() and 
> > getInputStream() return the streams.  Do I have to synchronize those with
> > dispose()?
> 
> Nope. Those aren't exposed to the potential problem I mentioned.
> 
> That said...technically, even a simple field read operation should be made
> thread-safe when there's a possibility another thread might be modifying the
> field at the same time ("Java Concurrency in Practice", Goetz;) However, I
> think that level of diligence is rarely followed and I'm not sure how real the
> danger is.

I was also told this during a Java course, which referred to the same book :-)  But I haven't seen anyone really follow that rule...

Even in that specific case, I don't think we are in much danger, because using getErrorStream() and getInputStream() at the same time as disposing the GDB process, and therefore the debug session, does not seem likely to cause much of a functional issue.

Are you ok with this patch for HEAD?

Thanks
Comment 17 John Cortell CLA 2011-05-13 14:26:02 EDT
> Are you ok with this patch for HEAD?

Ship it! :-)
Comment 18 Marc Khouzam CLA 2011-05-13 14:33:22 EDT
(In reply to comment #17)
> > Are you ok with this patch for HEAD?
> 
> Ship it! :-)

Done.
We can now execute all the DSF-GDB JUnit tests in a single run, so they could be run nightly.  That should help us feel safer about changes.

I'm running them now.  Just for fun :-)

Thanks for your help.
Comment 19 CDT Genie CLA 2011-05-13 15:23:05 EDT
*** cdt cvs genie on behalf of mkhouzam ***
Bug 323071: Make sure LargePipedInputStream does not leak so that we can run all JUnit tests in a single run.

[*] AbstractCLIProcess.java 1.8 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf-gdb/org.eclipse.cdt.dsf.gdb/src/org/eclipse/cdt/dsf/mi/service/command/AbstractCLIProcess.java?root=Tools_Project&r1=1.7&r2=1.8