Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 480076

Summary: Excessive threads created (more than 30,000) for some tests, on some machines
Product: [Eclipse Project] Platform Reporter: David Williams <david_williams>
Component: UIAssignee: Markus Keller <markus.kell.r>
Status: VERIFIED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: arunkumar.thondapu, daniel_megert, denis.roy, Lars.Vogel, loskutov, markus.kell.r, mikael.barbero, pascal, tjwatson, webmaster
Version: 4.6   
Target Milestone: 4.5.2   
Hardware: All   
OS: All   
See Also: https://git.eclipse.org/r/60675
https://git.eclipse.org/r/60732
https://git.eclipse.org/c/platform/eclipse.platform.ui.git/commit/?id=5987551acfa800096c0d908175e5a813abced853
https://git.eclipse.org/r/62565
Whiteboard:
Bug Depends on: 481281    
Bug Blocks:    
Attachments:
Description Flags
one jstack output during excessive threading.
none
patch for build I20151117-0800
none
feature patch with latest Gerrit patch none

Description David Williams CLA 2015-10-18 20:38:10 EDT
Might have started with the previous I-build, not sure.

And, so far, not sure how to reproduce. 

But, the problem is I will have an IDE open, but be doing other things, and then get an "out of memory" exception in the iDE (after, say 30 minutes or an hour). The first message (before "out of memory") is "can not create a native thread" ... I think perhaps says "can not create native thread for GC". 

It does seem to depend on "having an editor open". I've not tracked which one, but might be XML? For Feature editor? 

Also, so far, have seen on only one machine, and on that machine, besides everything else, I do have unit tests and performance tests running "in the background". 

The "system monitor" does show that memory is full -- up to 16G (which, is very rare) but so far not even positive it is the IDE that is consuming that memory. 

One theory, is that I have mx set too high, so that for all the instances of IDE's, and various tests running, in total they are more memory than the machine has? 

Just wanted to make a note, in case others see, or in case I have time to systemically reproduce. 

The only reason I even suspect "Eclipse", is that is did seem to "just start" recently, after upgrading to latest I-build, but the things I am doing (background and foreground) are not that different. 

I did write a script to monitor "number of threads", and one of them is excessive, when the error occurs. 

        java PID:       2301    Number of threds in process:    25
        java PID:       5001    Number of threds in process:    25
        java PID:       5041    Number of threds in process:    25
        java PID:       5150    Number of threds in process:    36
        java PID:       5398    Number of threds in process:    36
        java PID:       5678    Number of threds in process:    72
        java PID:       5889    Number of threds in process:    19
        java PID:       20412   Number of threds in process:    55
        java PID:       21491   Number of threds in process:    2024
        java PID:       23008   Number of threds in process:    40
        java PID:       26458   Number of threds in process:    41

A few minutes can pass (without me responding to IDE prompt to restart) and the threads go back down, and memory total memory goes back down (total, in system, as shows up in System Monitor).
Comment 1 David Williams CLA 2015-10-18 20:44:26 EDT
Markus or Tom, 

I doubt this is an "Eclipse Problem" ... but, if it is, it's serious. 

So, wanted to CC you to find out if you know of any *easy* ways to narrow down the source on "memory consumer"?  Plus, if non-Eclipse memory is exhausted, would that make Eclipse throw "out of memory" error (even though Eclipse itself wasn't literally increasing memory usage? -- i.e. could Eclipse's method trying to reclaim garbage/heap throw a spurious error?) 

just asking ... don't spend more than 5 minutes on it. :) Might be "just me".
Comment 2 David Williams CLA 2015-10-18 22:33:18 EDT
I do see, with MANIFEST Editor open that the little "heap indicator" (in Eclipse) does "cycle" from about 200M of 1000M up to around 500M of 1000M, and then back down again. I do not think it's growing or leaking, but do see as a case where if "external memory" was all consumed by a unit test, or a performance test, then during one of those "up swings", Eclipse would "run out of memory". More specifically, it probably "tries and tries" to reduce memory, again, each time creating a new thread, that "can't run"? Eventually running out of memory to create a GC thread". 

Or, something like that. 

I do not know if this is new behavior, or if I'm running unit and performance tests more often, so it increases the probability of overlapping,
Comment 3 Andrey Loskutov CLA 2015-10-19 00:42:39 EDT
If you can, please make jstack dump at time where your Eclipse is running with high number of threads. 

I guess your are hitting this: http://stackoverflow.com/questions/763579/how-many-threads-can-a-java-vm-support, but the question is: who is creating that high number of new threads in your workspace? Jstack could help to see what all those 2000 threads are doing and probably could help to identify the root cause.
Comment 4 David Williams CLA 2015-10-19 09:02:03 EDT
(In reply to Andrey Loskutov from comment #3)
> If you can, please make jstack dump at time where your Eclipse is running
> with high number of threads. 
> 
> I guess your are hitting this:
> http://stackoverflow.com/questions/763579/how-many-threads-can-a-java-vm-
> support, but the question is: who is creating that high number of new
> threads in your workspace? Jstack could help to see what all those 2000
> threads are doing and probably could help to identify the root cause.

For my education, is jstack better than just using kill -3 <pid>? 
If you are able to read such stacks, I assume best to use "all" the options? 

jstack -F -m -l <pid>

Thanks for the suggestion.
Comment 5 Andrey Loskutov CLA 2015-10-19 09:13:45 EDT
(In reply to David Williams from comment #4)
> For my education, is jstack better than just using kill -3 <pid>? 
> If you are able to read such stacks, I assume best to use "all" the options? 
> 
> jstack -F -m -l <pid>
> 
> Thanks for the suggestion.

kill -3 <pid> does not work for me with any process on RHEL7 (I also never used this), so don't know if this is better as jstack or not.

We use jstack a lot, and if it works, it works fine.

I would run "jstack -l <pid> > dump.log" because "-F" sometimes does not work (depends on which packages are installed on your machine), and "-m" is not relevant for your case.
Comment 6 Andrey Loskutov CLA 2015-10-19 09:16:16 EDT
(In reply to Andrey Loskutov from comment #5)
> We use jstack a lot, and if it works, it works fine.

You must make sure that your jstack is from same JDK installation path as the running JVM, at least not from older JDK version, sometimes they won't work together.
Comment 7 David Williams CLA 2015-10-19 20:14:45 EDT
Created attachment 257367 [details]
one jstack output during excessive threading.

I don't have much, but will attach the little I have, I maybe it will be a clue (to someone) on how to better collect the data. 

The attached jstack output shows many many threads, and the few places that has a recognizable class name, it appears those threads are coming from the unit tests that are running in the background, on an xvfb port :111 (in other words, I don't "see" what's running). 

But, I did get an "out of memory" message in the IDE, and this small bit of lot in that IDE's .metadata/.log: 

!SESSION 2015-10-19 16:32:22.391 -----------------------------------------------
eclipse.buildId=4.6.0.I20151019-1802
java.version=1.8.0_60
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Command-line arguments:  -os linux -ws gtk -arch x86_64

!ENTRY org.eclipse.core.jobs 4 2 2015-10-19 16:34:06.323
!MESSAGE An internal error occurred during: "Collecting garbage".
!STACK 0
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:714)
	at java.lang.ref.Finalizer$1.run(Finalizer.java:130)
	at java.lang.ref.Finalizer$1.run(Finalizer.java:123)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.lang.ref.Finalizer.forkSecondaryFinalizer(Finalizer.java:122)
	at java.lang.ref.Finalizer.runFinalization(Finalizer.java:146)
	at java.lang.Runtime.runFinalization0(Native Method)
	at java.lang.Runtime.runFinalization(Runtime.java:712)
	at java.lang.System.runFinalization(System.java:1015)
	at org.eclipse.ui.internal.ide.application.IDEIdleHelper$3.run(IDEIdleHelper.java:187)
	at org.eclipse.core.runtime.jobs.Job$1.run(Job.java:162)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)




It is hard for me to understand how the unit tests can be related to the IDE (on display :0). 

Does jstack list "all threads that VM is involved in"? Even from different PIDs?
Comment 8 Pascal Rapicault CLA 2015-10-19 20:22:58 EDT
You may want to try to run with the following additional VM args (you will need to tweak the path):
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="c:\temp\dump2.hprof"

(from http://stackoverflow.com/questions/4935520/generate-java-dump-when-outofmemory)

These will cause a heap dump to be generated when the VM runs out of memory. With this in hand, we may be able to figure out what is happening using a tool like Eclipse Mat.
Comment 9 Andrey Loskutov CLA 2015-10-20 16:25:02 EDT
(In reply to David Williams from comment #7)
> Created attachment 257367 [details]
> one jstack output during excessive threading.

Some UI test goes wild??? 

> It is hard for me to understand how the unit tests can be related to the IDE
> (on display :0). 

Haven't you started UI tests from that IDE?
 
> Does jstack list "all threads that VM is involved in"? Even from different
> PIDs?

Nope. jstack always shows threads from given process only. So in doubt just check how many Eclipse instances you run with "jps -l".
Comment 10 David Williams CLA 2015-10-20 17:08:45 EDT
(In reply to Andrey Loskutov from comment #9)
> (In reply to David Williams from comment #7)
> > Created attachment 257367 [details]
> > one jstack output during excessive threading.
> 
> Some UI test goes wild??? 

Yes, that's one of them running during "performance test" which isn't 
too surprising. Not sure if it is doing something wild on purpose, a bug, 
or something wrong with my xvfb setup. 

> > It is hard for me to understand how the unit tests can be related to the IDE
> > (on display :0). 
> 
> Haven't you started UI tests from that IDE?

No. And that, to me, is the weird part of this bug. I think what happens 
is the performance test exhausts resources on the whole machine, so that the 
IDE can not operate "normally" in trying to reclaim some heap. (Even though, it does not really need to, at that moment.) I'm not even sure the IDE needs to do that at all any more ... I think most JVMs do that themselves, these days?  
  
> > Does jstack list "all threads that VM is involved in"? Even from different
> > PIDs?
> 
> Nope. jstack always shows threads from given process only. So in doubt just
> check how many Eclipse instances you run with "jps -l".

Good tip. I was trying to figure out how to tell the difference, easily, bwtwen all the version of java that are running (for Ant, Hudson, etc., plus IDEs and unit tests.
Comment 11 David Williams CLA 2015-10-20 17:12:24 EDT
I think by coincidence I just started seeing around I20151014-1100. So removing that from the title. 

I suspect this is a relatively 'minor' bug (i.e. would not normally be "noticeable", but if users had constrained machines to begin with (say with merely 8 Gig of memory installed :) they might see more often?
Comment 12 David Williams CLA 2015-10-26 09:29:01 EDT
classifying as "minor", since hard to reproduce. I will eventually try to isolate which performance test is using so many threads to have a stand-alone test case, but I think specific to my machine and procedure and is interesting only as a "stress test". 

FWIW, once this performance test "eats up" all the threads, I actually see problems in code that has nothing to do with Java or the IDE, such as cron jobs that say "can not fork". 

As far as I know, could be related to the version of the Linux Kernel I am running (4.1.8), or something odd like that.
Comment 13 David Williams CLA 2015-11-03 22:12:36 EST
I think the test responsible, or, related, to this issue is GenerateIdentifiersTest. Which is a test of retrieval for "activitesManager". 

From a quick read, it creates 10000 IDs, and then has this section, inside a "runnable":

                startMeasuring();
                for(int i = 0; i < ids.length; i++) {
                    activityManager.getIdentifier(ids[i]);
                }
                stopMeasuring();

I am not sure, but would not be surprised if "startMeasuring" and "stopMeasuring" are done in their own thread or process? I'd have to dig deep, to know. 

But, the problem does not occur on the production machine where we normally run the tests. 

I've compared the machines settings to see what might account for the difference. Technically, I looked at "build.eclipse.org", not the "performance test machine", but I suspect it would have the same settings. And compared those "build machine" settings, to my "Ubuntu test machine". 

One thing that stood out, is "nproc" (number of processes per user) is set to 
 192436 on build machine, but on my test machine is only 62790. While I doubt his test generates 62790 threads or processes, it may generate enough to "tip over". Part of the reason is that on test machine is that my main user id "does everything" ... runs database, performance tests, hudson, and countless other things that a "one user" machine might do. On the production machine, all these roles and responsibilities are more professionally divided between different "users". 

Bottom line, I will change my setting to 4 times what it current is, and see if I can reproduce. (Only reason I haven't yet, is that I need to reboot (or, at least log out).
Comment 14 David Williams CLA 2015-11-04 01:04:55 EST
(In reply to David Williams from comment #13)

> One thing that stood out, is "nproc" (number of processes per user) is set
> to 
>  192436 on build machine, but on my test machine is only 62790. While I
> doubt his test generates 62790 threads or processes, it may generate enough
> to "tip over". Part of the reason is that on test machine is that my main
> user id "does everything" ... runs database, performance tests, hudson, and
> countless other things that a "one user" machine might do. On the production
> machine, all these roles and responsibilities are more professionally
> divided between different "users". 


Just for completeness, and later comparison?, on my test machine

$ cat /proc/sys/kernel/threads-max
125580

while on build.eclipse.org (and I assume the performance machine), the value is 
384873

I do not think this is "directly settable", but is combination of 
other variables, according to https://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux

But, that post is kind of contradictory, and also it led me to read that "getconf -a" and setconf may be the "modern way" to view and control some of these settings ... so, less sure my change will help :/
Comment 15 David Williams CLA 2015-11-04 18:45:03 EST
Just to document status, increasing nproc to 251160 did not help. 

The error still occurs, on my test machine. 

I did measure "number of threads" at around 31000, 

java PID:       20929   Number of threds in process:    31498

and overall threads being ran by my id was not a lot more than that: 


        ps -u davidw | wc -l == 148

        ps -L -u davidw | wc -l == 32150

And, did confirm, it is definitely during org.eclipse.ui.tests.performance.UIPerformanceTestSuite

20929 org.eclipse.core.launcher.Main -application org.eclipse.test.uitestapplication -data /data/shared/hudson/hudsonhome/jobs/ep46I-perf-lin64-baseline/workspace/workarea/I20151104-2057/eclipse-testing/test-eclipse/eclipse/performance-workspace-platform-ui formatter=org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter,/data/shared/hudson/hudsonhome/jobs/ep46I-perf-lin64-baseline/workspace/workarea/I20151104-2057/eclipse-testing/test-eclipse/eclipse/org.eclipse.ui.tests.performance.UIPerformanceTestSuite.xml -testPluginName org.eclipse.ui.tests.performance -className org.eclipse.ui.tests.performance.UIPerformanceTestSuite -os linux -ws gtk -arch x86_64 -consolelog -timeout 7200000 -junitReportOutput /data/shared/hudson/hudsonhome/jobs/ep46I-perf-lin64-baseline/workspace/workarea/I20151104-2057/eclipse-testing/results/linux.gtk.x86_64_8.0

At this moment, I am out of ideas. Guess I'll continue to compare test machine, with production machine.
Comment 16 Markus Keller CLA 2015-11-09 11:50:31 EST
(In reply to David Williams from comment #13)
> I think the test responsible, or, related, to this issue is
> GenerateIdentifiersTest.

I don't see anything that would support this hypothesis. org.eclipse.test.performance is pretty self-contained and the implementation doesn't use Jobs or Threads. start/stopMeasuring() have to do the measurement in the calling thread, and the default performance meter (OSPerformanceMeter) is also single-threaded.

I manually started the UIPerformanceTestSuite in Debug mode and set a conditional breakpoint in Thread#start() with this condition:

ThreadGroup g= Thread.currentThread().getThreadGroup();
ThreadGroup p= g.getParent();
while (p != null) {
	g= p;
	p= g.getParent();
}
int c= g.activeCount();
return c > 25;

The breakpoint was only hit in the OpenMultipleEditorTest, where the "java" tests open 100 editors and therefore create and start 100 reconciler threads. I increased the limit to 120, and then the breakpoint wasn't hit any more until the suite finished.

(In reply to David Williams from comment #7)
> Created attachment 257367 [details]
> one jstack output during excessive threading.

Most of the 31332 threads are "Worker" threads that are created by the Eclipse Jobs framework. 

> java.lang.OutOfMemoryError: unable to create new native thread

I think the OOME is just an abuse of that error type. It's not about running out-of-memory, but about running into the OS limit for the maximal number of threads. Once OS resources are used up, this exception can happen in any process that tries to create a new thread (which can be totally unrelated to the process that caused the problem).


I suggest you just try to run the UIPerformanceTestSuite from the workspace with the conditional breakpoint I mentioned above. Set the limit to 150 and let it run over lunch (or whenever you can survive 1h without that machine;-).
Alternatively, you could also set a conditional breakpoint in org.eclipse.core.internal.jobs.Worker#Worker(WorkerPool) and stop if nextWorkerNumber > 5000.
Comment 17 David Williams CLA 2015-11-10 03:32:49 EST
(In reply to Markus Keller from comment #16)
> (In reply to David Williams from comment #13)
> > I think the test responsible, or, related, to this issue is
> > GenerateIdentifiersTest.
> 
> I don't see anything that would support this hypothesis.
> org.eclipse.test.performance is pretty self-contained and the implementation
> doesn't use Jobs or Threads. start/stopMeasuring() have to do the
> measurement in the calling thread, and the default performance meter
> (OSPerformanceMeter) is also single-threaded.

I do not disagree with anything you've said. I suspected GenerateIdentifiersTest based on code inspection only. I was looking for things that might call "garbage collection" rapidly. I've lost track now, but 
a) know the IDE itself does at times, such as in IDEIdelHelper, and 
b) the performance framework does (at least, used to ... I just noticed tonight one of those "runtim.gc calls is commented out?). 

I suspect IDEIdelHelper::createGarbageCollectionJob is why I get the error in my idle IDE simply because the performance tests is running consuming resources. The performance test would be using a completely different instance of Eclipse. It would, in my case, be using the same VM, but a different invocation of it, of course. 

But, you are right, Markus, the "place it happens" is not "what is causing it". It is worse in this case, since it is in *native threads*, so I do not think debugging threads or jobs will necessarily lead to an answer. During the performance tests, I think "GC" (and/or "finalization") are called a number of times, sometimes rapidly, and it is my guess that the native code that supports those methods is creating the native threads, for things that are hard to guess. Such as how many threads does GC create? How many does finalization create? Does System.

Worse yet, it varies by Operating System and (I am guessing) VM! See 
https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread

My *main* concern in spotting this behavior is not that the performance tests are "big hogs" (that is sort of by design, though we may need to tweak them some) but the concern is on a memory or thread constrained system, the "idle IDE" ends up "throwing an error" ... instead of continuing to sit there idle. 

It appears that IDEIdleHelper::createGarbageCollectionJob does have some logic in it so that it will stop trying to garbage collect if it takes more than 8 seconds. IF (and, a big IF, I have not confirmed this is the code causing the "pop up") ... IF this the code is what causes the dialog to "pop up" in an Idle IDE, then we might want to make it a little smarter. For example, if one of these "out of native threads" errors occurs, to just have it print a warning or something to the log, and to disable explicit GC. Whether or not the IDE is still usable depends on how long it might be before the number of native threads returns to normal levels -- not to mention how long until a user actually tries to use the idle IDE! 

= = = = = = 

The second aspect, what causes so many native threads to be used during performance tests is probably a less important problem. It would be good to solve ... but not one that would impact users. 

I did (mostly?) rule out "database effects" tonight, since I tried running only the 'uiperformance' target in my local performance testing system without a database (so the measurements were simply printed to the logs for the tests) and the wild number of threads still occurred. 

I also turned off the "Heap Status" indicator, but only in my idle IDE. Do we turn that on by default, these days? I'll have to check. That widget also calls "gc" and "finalize" and I *think* some of our tests "simulate" mouse clicks? (mouse-up is what triggers the GC in that little widget). 

= = = = = =

I think I've reported before, but I am using Ubuntu 12.04, and Oracle's 1.8 VM -- if it turns out that matters. 

According to that reference above: 
https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread
My troubles begin right when it says it should, for Ubuntu 12.04 ... at around 31 K threads! 

= = = = = =

Thanks for taking a look Markus. As always, it is good to have your sanity checks ... to keep me from going insane! :)
Comment 18 Markus Keller CLA 2015-11-10 08:08:04 EST
> The second aspect, what causes so many native threads to be used during
> performance tests is probably a less important problem.

I think that's the main problem and tackling this is the only way to find out what's really going wrong. Investigations around GC code or at the point where threads are already exhausted are not going to help.

Comment 7 shows that the problem is that the Jobs frameworks creates and starts way too many Worker threads. It's likely a bug in the framework, and it would be interesting to know why that happens in your setup.

To launch the command line from comment 15 in debug mode from Eclipse, I'd create a new Eclipse Application launch config, choose "Run an application: org.eclipse.test.uitestapplication", and then paste the rest of the command line (everything after "-data xxx") into "Arguments > Program arguments".
Comment 19 David Williams CLA 2015-11-11 12:39:25 EST
I have not been able to reproduce this memory/thread problem in the IDE. 

My latest "theory" is that it may be related to the version of GTK that is "detected" while running the performance tests in their "test environment". The biggest factor in the "test environment" is that they are running under Xfvb, and, I am not sure if all Xvfb are "equal" when it comes to GTK+ code. 

Plus, while "old news", I have found some issues with Xvfb and memory leaks. For example, see 

http://blog.jeffterrace.com/2012/07/xvfb-memory-leak-workaround.html

Anyone remember a good command line method (or, Java, or Ant :) to determine which version of GTK is "seen" and especially if there might be "mixed" versions? 
As I recall, this was a problem on Ubuntu 12.04? 

Plus, I think in my local test environment, I'll try some runs with 
export SWT_GTK=0, to see if that makes a difference. 
(That is how I normally run my IDEs, though not sure it is still required?) 

I am not sure how Xvfb "interacts" with GTK, but do notice on Ubuntu, I have

xorg-server 1.11.3
ant the eclipse foundation appears to have 
xorg-server 1.6.5
Comment 20 David Williams CLA 2015-11-11 12:40:36 EST
Arun, 

Adding you to CC, in case you know anything about "Xvfb" and GTK?
Comment 21 David Williams CLA 2015-11-15 17:42:10 EST
(In reply to David Williams from comment #19)
> I have not been able to reproduce this memory/thread problem in the IDE. 

Actually, I have been able to reproduce easily, in the IDE. I think I had trouble before before because I had some break points set, or something, that caused it not to run "full speed" in the critical parts. 

The easiest way, for me, was to put a breakpoint in the constructor of Worker, and have it suspend VM there, when nextWorkerNumber > 200. 

Also, I have learned from watching several runs, when to hit "pause" in the debugger after it has a hundred workers or so but before it gets to that 200 limit. I mention that, since all the "excess" worker threads are at a different point, when using that method. ... So, that just complicates me trying to interpret the results. 

In the latter case, hundreds of the worker threads stacks look like (sorry if formatted poorly): 

Thread [Worker-99] (Suspended)	
	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=238)	
	org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed() line: 726	
	org.eclipse.ui.internal.activities.MutableActivityManager$3$1(org.eclipse.ui.progress.UIJob).run(org.eclipse.core.runtime.IProgressMonitor) line: 81	
	org.eclipse.core.internal.jobs.Worker.run() line: 55	


That section of MutableActivityManager 
is just checking if display is disposed, before starting a new job?! 
        Display asyncDisplay = getDisplay();
        if (asyncDisplay == null || asyncDisplay.isDisposed()) {
            return Status.CANCEL_STATUS;
        }
        asyncDisplay.asyncExec(new Runnable() {
            @Override
			public void run() {

= = = = =

Main is at at interesting point. The test actually has completed, and called 'tear down', which eventually goes into its own readAndDispatch loop. 

 while (display.readAndDispatch())

Here's the "top" of the thread: 

Thread [main] (Suspended)	
	owns: org.eclipse.swt.widgets.RunnableLock  (id=306)	
		waited by: Thread [WorkbenchTestable] (Suspended)	
	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=238)	
	org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed() line: 726	
	org.eclipse.swt.widgets.Display.readAndDispatch() line: 3434	
	org.eclipse.ui.tests.harness.util.UITestCase.processEvents() line: 229	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).doTearDown() line: 221	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.performance.BasicPerformanceTest).doTearDown() line: 112	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).tearDown() line: 209	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(junit.framework.TestCase).runBare() line: 146	
	junit.framework.TestResult$1.protect() line: 122	
	junit.framework.TestResult.runProtected(junit.framework.Test, junit.framework.Protectable) line: 142	
	junit.framework.TestResult.run(junit.framework.TestCase) line: 125	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(junit.framework.TestCase).run(junit.framework.TestResult) line: 129	
	org.eclipse.ui.tests.performance.ActivitiesPerformanceSuite(junit.framework.TestSuite).runTest(junit.framework.Test, junit.framework.TestResult) line: 252	
	org.eclipse.ui.tests.performance.ActivitiesPerformanceSuite(junit.framework.TestSuite).run(junit.framework.TestResult) line: 247	
	org.eclipse.ui.tests.performance.UIPerformanceTestSuite(junit.framework.TestSuite).runTest(junit.framework.Test, junit.framework.TestResult) line: 252	
	org.eclipse.ui.tests.performance.UIPerformanceTestSuite(junit.framework.TestSuite).run(junit.framework.TestResult) line: 247	
	org.eclipse.ui.tests.performance.UIPerformanceTestSetup(junit.extensions.TestDecorator).basicRun(junit.framework.TestResult) line: 23	

= = = = = = = = =

If I wait and let the constructor breakpoint stop everything, then many of the excess threads are nearing their end: 

Thread [Worker-61] (Suspended)	
	waiting for: org.eclipse.core.internal.jobs.WorkerPool  (id=279)	
	org.eclipse.core.internal.jobs.WorkerPool.decrementBusyThreads() line: 86	
	org.eclipse.core.internal.jobs.WorkerPool.endJob(org.eclipse.core.internal.jobs.InternalJob, org.eclipse.core.runtime.IStatus) line: 109	
	org.eclipse.core.internal.jobs.Worker.run() line: 72	

But the Worker Pool is creating new ones faster than they can end: 

Thread [Worker-68] (Suspended (breakpoint at line 30 in org.eclipse.core.internal.jobs.Worker))	
	owns: org.eclipse.core.internal.jobs.WorkerPool  (id=279)	
	org.eclipse.core.internal.jobs.Worker.<init>(org.eclipse.core.internal.jobs.WorkerPool) line: 30	
	org.eclipse.core.internal.jobs.WorkerPool.jobQueued() line: 143	
	org.eclipse.core.internal.jobs.WorkerPool.startJob(org.eclipse.core.internal.jobs.Worker) line: 244	
	org.eclipse.core.internal.jobs.Worker.run() line: 52	

If I expand that "owns:" list, it "owns" several hundred of he "excess" threads: 

Thread [Worker-68] (Suspended (breakpoint at line 30 in org.eclipse.core.internal.jobs.Worker))	
	owns: org.eclipse.core.internal.jobs.WorkerPool  (id=279)	
		waited by: Thread [Worker-199] (Suspended)	
		waited by: Thread [Worker-198] (Suspended)	
		waited by: Thread [Worker-197] (Suspended)	
		waited by: Thread [Worker-196] (Suspended)	
		waited by: Thread [Worker-195] (Suspended)	
		waited by: Thread [Worker-194] (Suspended)	
		waited by: Thread [Worker-193] (Suspended)	
[deleted a hundred or so here]
		waited by: Thread [Worker-3] (Suspended)	
		waited by: Thread [Worker-2] (Suspended)	
		waited by: Thread [Worker-1] (Suspended)	
		waited by: Thread [Worker-0] (Suspended)	
	org.eclipse.core.internal.jobs.Worker.<init>(org.eclipse.core.internal.jobs.WorkerPool) line: 30	
	org.eclipse.core.internal.jobs.WorkerPool.jobQueued() line: 143	
	org.eclipse.core.internal.jobs.WorkerPool.startJob(org.eclipse.core.internal.jobs.Worker) line: 244	
	org.eclipse.core.internal.jobs.Worker.run() line: 52

= = = = = = = =

The main thread in this case is still "tearing down" the GenerateIdentifiersTest and still doing a read and dispatch loop. 

Thread Group [main]	
	Thread [main] (Suspended)	
		owns: java.lang.Object  (id=307)	
		owns: org.eclipse.swt.widgets.RunnableLock  (id=308)	
		owns: org.eclipse.swt.widgets.RunnableLock  (id=306)	
			waited by: Thread [WorkbenchTestable] (Suspended)	
		org.eclipse.core.internal.jobs.JobManager.changeState(org.eclipse.core.internal.jobs.InternalJob, int) line: 385	
		org.eclipse.core.internal.jobs.JobManager.endJob(org.eclipse.core.internal.jobs.InternalJob, org.eclipse.core.runtime.IStatus, boolean) line: 689	
		org.eclipse.ui.internal.activities.MutableActivityManager$3$1(org.eclipse.core.internal.jobs.InternalJob).done(org.eclipse.core.runtime.IStatus) line: 198	
		org.eclipse.ui.internal.activities.MutableActivityManager$3$1(org.eclipse.core.runtime.jobs.Job).done(org.eclipse.core.runtime.IStatus) line: 322	
		org.eclipse.ui.progress.UIJob$1.run() line: 110	
		org.eclipse.swt.widgets.RunnableLock.run() line: 35	
		org.eclipse.ui.internal.UISynchronizer(org.eclipse.swt.widgets.Synchronizer).runAsyncMessages(boolean) line: 183	
		org.eclipse.swt.widgets.Display.runAsyncMessages(boolean) line: 3795	
		org.eclipse.swt.widgets.Display.readAndDispatch() line: 3434	
		org.eclipse.ui.tests.harness.util.UITestCase.processEvents() line: 229	
		org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).doTearDown() line: 221	
		org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.performance.BasicPerformanceTest).doTearDown() line: 112	
		org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).tearDown() line: 209	
		org.eclipse.ui.tests.performance.GenerateIdentifiersTest(junit.framework.TestCase).runBare() line: 146	


= = = = = = = =

I am just giving this "dump" here, since I've been obsessed with this "minor" problem, I think I need to put in on the back burner. 

I've tried various "guesses" of making different things synchronized or similar, and nothing seemed to help. At most just "moved the choke point" around. 

And before anyone else does any re-investigation, there were some "big changes" made MutableActivityManager in that in Neon it uses a new Job API (See bug 474273 and bug 474264) but this problem occurs also when running tests against Mars so I do not think related to that. 

My intuition is that there there is some tight loop that "hogs" a CPU and creates jobs faster than another thread can finish them (That is, not a true "deadlock" or anything -- though, will admit there are a few "iffy" parts of code ... where one 'job', will create 2 or 3 or 5 others? But, apparently that normally works ok.  I may try to recompile my Linux kernel with some different options, but other than that, I am, again, out of ideas. And, again, I'll say "thank goodness it is only me and my machine" :) That is, I still think minor, for that reason. But would be nice to understand before someone reports an issue "from the field".
Comment 22 Markus Keller CLA 2015-11-16 15:17:22 EST
(In reply to David Williams from comment #21)
> Thread [Worker-99] (Suspended)	
> 	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=238)	

> Thread [main] (Suspended)	
> 	owns: org.eclipse.swt.widgets.RunnableLock  (id=306)	
> 		waited by: Thread [WorkbenchTestable] (Suspended)	
> 	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=238)	
> 	org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed()
> line: 726	

It would be interesting to know who owns the monitor for the Device.class in these cases. I think the owner will reveal who is causing the lock contention that makes the WorkerPool create so many Workers. The WorkerPool itself doesn't know why all workers are busy.

This could also be caused by another bundle in your workspace that synchronizes on the Device.class for longer periods.
Comment 23 David Williams CLA 2015-11-16 19:40:23 EST
(In reply to Markus Keller from comment #22)
> (In reply to David Williams from comment #21)
> > Thread [Worker-99] (Suspended)	
> > 	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=238)	
> 
> > Thread [main] (Suspended)	
> > 	owns: org.eclipse.swt.widgets.RunnableLock  (id=306)	
> > 		waited by: Thread [WorkbenchTestable] (Suspended)	
> > 	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=238)	
> > 	org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed()
> > line: 726	
> 
> It would be interesting to know who owns the monitor for the Device.class in
> these cases. I think the owner will reveal who is causing the lock
> contention that makes the WorkerPool create so many Workers. The WorkerPool
> itself doesn't know why all workers are busy.
> 
> This could also be caused by another bundle in your workspace that
> synchronizes on the Device.class for longer periods.

I'm not sure if there is a systematic way to tell but by "hitting pause" again and then manually looking through the thread/locks in the debugger, I do see one suspicious one [pasted below]. It was only other instance of "waiting for device" I could see on this particular "pause". Other than the type I pasted earlier .. also pasted below.  
It appears this "WorkbenchTestable" thread is "owned by Main", and both threads are "waiting for Device", with one (main) owning "RunnableLock" and the other waiting for it. 

If I understand you, though, the "smoking gun" will be finding one that say it "owns" the Device class? Guess I try a few more runs? 


Thread Group [main]	
	Thread [main] (Suspended)	
		owns: org.eclipse.swt.widgets.RunnableLock  (id=529)	
			waited by: Thread [WorkbenchTestable] (Suspended)	
		waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=265)	
		org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed() line: 726	
		org.eclipse.swt.widgets.Display.checkDevice() line: 766	
		org.eclipse.swt.widgets.Display.readAndDispatch() line: 3416	
		org.eclipse.ui.tests.harness.util.UITestCase.processEvents() line: 229	
		org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).doTearDown() line: 221	
                [....]

Thread [WorkbenchTestable] (Suspended)	
	waiting for: org.eclipse.swt.widgets.RunnableLock  (id=529)	
		owned by: Thread [main] (Suspended)	
			waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=265)	
	java.lang.Object.wait(long) line: not available [native method]	
	org.eclipse.swt.widgets.RunnableLock(java.lang.Object).wait() line: 502	
	org.eclipse.ui.internal.UISynchronizer(org.eclipse.swt.widgets.Synchronizer).syncExec(java.lang.Runnable) line: 247	
	org.eclipse.ui.internal.UISynchronizer.syncExec(java.lang.Runnable) line: 145	
	org.eclipse.swt.widgets.Display.syncExec(java.lang.Runnable) line: 4634	
	org.eclipse.e4.ui.internal.workbench.swt.E4Testable.runTest(java.lang.Runnable) line: 115	
	org.eclipse.test.UITestApplication.runTests() line: 194	
	org.eclipse.e4.ui.internal.workbench.swt.E4Testable$1.run() line: 73	
	java.lang.Thread.run() line: 745	



Thread [Worker-397] (Suspended)	
	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=265)	
	org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed() line: 726	
	org.eclipse.ui.internal.activities.MutableActivityManager$3$1(org.eclipse.ui.progress.UIJob).run(org.eclipse.core.runtime.IProgressMonitor) line: 81	
	org.eclipse.core.internal.jobs.Worker.run() line: 55
Comment 24 David Williams CLA 2015-11-16 20:13:06 EST
Here's one that's probably what you had in mind ... hope if means more to you, than me :) Main is waiting for device, but is is owned by Worker 34, and Worker 34 is the one generating lots of other jobs. 

Thread [main] (Suspended)	
	owns: org.eclipse.swt.widgets.RunnableLock  (id=212)	
	waiting for: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=126)	
		owned by: Thread [Worker-34] (Suspended)	
	org.eclipse.swt.widgets.Display(org.eclipse.swt.graphics.Device).isDisposed() line: 726	
	org.eclipse.swt.widgets.Display.readAndDispatch() line: 3434	
	org.eclipse.ui.tests.harness.util.UITestCase.processEvents() line: 229	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).doTearDown() line: 221	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.performance.BasicPerformanceTest).doTearDown() line: 112	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(org.eclipse.ui.tests.harness.util.UITestCase).tearDown() line: 209	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(junit.framework.TestCase).runBare() line: 146	
	junit.framework.TestResult$1.protect() line: 122	
	junit.framework.TestResult.runProtected(junit.framework.Test, junit.framework.Protectable) line: 142	
	junit.framework.TestResult.run(junit.framework.TestCase) line: 125	
	org.eclipse.ui.tests.performance.GenerateIdentifiersTest(junit.framework.TestCase).run(junit.framework.TestResult) line: 129	
	org.eclipse.ui.tests.performance.ActivitiesPerformanceSuite(junit.framework.TestSuite).runTest(junit.framework.Test, junit.framework.TestResult) line: 252	
	org.eclipse.ui.tests.performance.ActivitiesPerformanceSuite(junit.framework.TestSuite).run(junit.framework.TestResult) line: 247	
	org.eclipse.ui.tests.performance.UIPerformanceTestSuite(junit.framework.TestSuite).runTest(junit.framework.Test, junit.framework.TestResult) line: 252	
	org.eclipse.ui.tests.performance.UIPerformanceTestSuite(junit.framework.TestSuite).run(junit.framework.TestResult) line: 247	
	org.eclipse.ui.tests.performance.UIPerformanceTestSetup(junit.extensions.TestDecorator).basicRun(junit.framework.TestResult) line: 23	
	junit.extensions.TestSetup$1.protect() line: 23	
	junit.framework.TestResult.runProtected(junit.framework.Test, junit.framework.Protectable) line: 142	
	org.eclipse.ui.tests.performance.UIPerformanceTestSetup(junit.extensions.TestSetup).run(junit.framework.TestResult) line: 27	
  [...]
  

Thread [Worker-34] (Suspended)	
	owns: java.lang.Object  (id=282)	
	owns: java.lang.Class<T> (org.eclipse.swt.graphics.Device) (id=126)	
		waited by: Thread [Worker-404] (Suspended)	
		waited by: Thread [Worker-403] (Suspended)	
		waited by: Thread [Worker-402] (Suspended)	
		waited by: Thread [Worker-401] (Suspended)	
		waited by: Thread [Worker-400] (Suspended)	
                [several hundred more of these theads waiting]
		waited by: Thread [Worker-3] (Suspended)	
		waited by: Thread [Worker-2] (Suspended)	
		waited by: Thread [Worker-1] (Suspended)	
		waited by: Thread [main] (Suspended)	
	org.eclipse.ui.internal.UISynchronizer(org.eclipse.swt.widgets.Synchronizer).addLast(org.eclipse.swt.widgets.RunnableLock) line: 111	
	org.eclipse.ui.internal.UISynchronizer(org.eclipse.swt.widgets.Synchronizer).asyncExec(java.lang.Runnable) line: 140	
	org.eclipse.ui.internal.UISynchronizer.asyncExec(java.lang.Runnable) line: 123	
	org.eclipse.swt.widgets.Display.asyncExec(java.lang.Runnable) line: 732	
	org.eclipse.ui.internal.activities.MutableActivityManager$3$1(org.eclipse.ui.progress.UIJob).run(org.eclipse.core.runtime.IProgressMonitor) line: 84	
	org.eclipse.core.internal.jobs.Worker.run() line: 55
Comment 25 David Williams CLA 2015-11-16 20:14:46 EST
Changing title to be a little more descriptive of "root" problem.
Comment 26 David Williams CLA 2015-11-17 01:51:29 EST
(In reply to Markus Keller from comment #22)

> This could also be caused by another bundle in your workspace that
> synchronizes on the Device.class for longer periods.

I think that is the problem, and think I know a solution. 

In MutableActivityManager, near the very end, around line 953 is this code: 

UIJob notifyJob = new UIJob("Identifier Update Job") { //$NON-NLS-1$
	@Override
	public IStatus runInUIThread(
			IProgressMonitor monitor) {
		notifyIdentifiers(identifierEventsByIdentifierId);
		return Status.OK_STATUS;
	}
 };

A job forced on to UI thread, that "notifies identifiers". (what ever that is). 

That method, notifyIdentifiers can be long running (esp. in performance test). 
And, I'm guessing each notify-ee may themselves start UI jobs? 
AND, from what I can quickly see, it is called from other places without being forced on to the UI thread. 
I think a more typical pattern for "notifiers" is they run on a non-UI thread, and if the receiver happens to need to do something on the UI thread, then it is their responsibility to do so -- in general, they may, or may not. But, I will admit, I do not much about these "identifiers". 

So, I would suggest replacing that UIJob code with a simple job. (I've embellished a little, to check for "cancel" since that seems like the better behaved "job"). Namely, 

Job notifyJob = new Job("Identifier Update Job") { //$NON-NLS-1$
	@Override
	public IStatus run(
		    IProgressMonitor monitor) {
		if (monitor.isCanceled()) {
			return Status.CANCEL_STATUS;
		}
		notifyIdentifiers(identifierEventsByIdentifierId);
		return Status.OK_STATUS;
	}
};

Running this code locally, from IDE, all the performance tests run, with no excess threads created. And, there was no "called from non UI thread" exceptions that I feared :) ... though, of course, this was only the performance tests. 

From what I can tell from history, the code has been that way (as a UI Job) for a very long time ... but, I think it's wrong. 

Anyone else know?
Comment 27 David Williams CLA 2015-11-17 17:03:31 EST
If we do not need that "inner job" to be a UI Job, then I do not think there is a need to create a new job there at all. We can just call 
notifyIdentifiers(identifierEventsByIdentifierId);
in that if-statement (which is already in a job). 

That saves a little more overhead (relative to the large overhead savings of a UI Job). 

In the absence of obvious errors (such as and SET Error of "called from wrong thread") the only downside I can think of is that sometimes people intentionally, or accidentally, use the Display thread to make things "single threaded" so they occur in the correct order. I doubt that is the case here, and if that is needed, there are better ways of doing that than tying up the UI Thread. 

I've done a local build, and test, and the UI unit tests all pass as well as they ever do on my system. No obvious errors. And, I am fairly sure that "changing themes" occurs noticeably faster with the fix, than without. (At least on Linux, have not tested others). Therefore have classified as "normal" severity instead of "minor" (that is, I think effects more than performance tests). 

I'll prepare Gerrit patch ... and perhaps feature patch (after one more build), if anyone wants to "try it out". Would be interesting to check on Ubuntu 15.10 and see if it helps with bug 482230 or other UI anomalies ... Or, if makes them worse. :/
Comment 28 Eclipse Genie CLA 2015-11-17 21:29:14 EST
New Gerrit change created: https://git.eclipse.org/r/60675
Comment 29 David Williams CLA 2015-11-17 22:23:53 EST
Created attachment 258106 [details]
patch for build I20151117-0800

Seems a little faster changing themes, to me, but I am not sure. Plus, I don't change themes -- ever -- so not sure what to look for or expect. 

And, again, I've only tested on Linux. 

If others could test, I'd appreciating knowing if you think it helps performance in other areas, or if this is just a "performance test" fix. (and, not to mention, other platforms). 

As typical, the patch is just something I built locally .. so is unsigned. And, there is no categories in the zipped up repo.
Comment 30 Eclipse Genie CLA 2015-11-18 13:11:16 EST
New Gerrit change created: https://git.eclipse.org/r/60732
Comment 31 Markus Keller CLA 2015-11-18 13:39:31 EST
I can't tell for sure if everything called from org.eclipse.ui.internal.activities.MutableActivityManager#notifyIdentifiers(Map) is actually non-UI-thread safe.

"Show Annotations" on MutableActivityManager from 2aff330eaf228b7bbd4356a439d617956c381ad5 (before E4) brought up bug 191503 comment 4, which says the UIJob was necessary to fix spurious NPEs.

A safer alternative would be to avoid the tons of jobs and just pack all the updates into a single UIJob. Pushed https://git.eclipse.org/r/60732 for this.
I couldn't measure a relevant delay with this, and the ActivitiesPerformanceSuite only triggers a single UIJob instead of thousands.

I realized a bit late that the org.eclipse.ui.tests project actually prevented me from seeing the thousands of UIJobs in my workspace. Close that dreaded project and the UIJob bomb explodes.
Comment 32 David Williams CLA 2015-11-18 14:02:45 EST
(In reply to Markus Keller from comment #31)
> I can't tell for sure if everything called from
> org.eclipse.ui.internal.activities.
> MutableActivityManager#notifyIdentifiers(Map) is actually non-UI-thread safe.
> 
> "Show Annotations" on MutableActivityManager from
> 2aff330eaf228b7bbd4356a439d617956c381ad5 (before E4) brought up bug 191503
> comment 4, which says the UIJob was necessary to fix spurious NPEs.
> 
> A safer alternative would be to avoid the tons of jobs and just pack all the
> updates into a single UIJob. Pushed https://git.eclipse.org/r/60732 for this.
> I couldn't measure a relevant delay with this, and the
> ActivitiesPerformanceSuite only triggers a single UIJob instead of thousands.
> 
> I realized a bit late that the org.eclipse.ui.tests project actually
> prevented me from seeing the thousands of UIJobs in my workspace. Close that
> dreaded project and the UIJob bomb explodes.

Safer, yes, but a spurious NPE from 7 years ago doesn't seem enough reason to hog the UI thread. That does sound like a case of solving a problem by forcing Eclipse to run single threaded. And, again I'll remind you notifyIdentifiers(identifierEventsByIdentifierId);
is called elsewhere without being put on a UI Thread. I also check the API descriptions, and none implied they were automatically executed on the UI Thread. 

If you go this route, though, I'd suggest at least naming the jobs differently. Currently, they are both named "Identifier Update Job" and when I turn on debugging options, it is a little hard to tell which is which, in the output, without reading real closely. I'd suggest just adding "UI" so the UI one is "Identifier Update UI Job"?
Comment 33 Lars Vogel CLA 2015-11-18 14:16:10 EST
+1 for trying Davids solution and reverting it in case we find to (hard to fix) problems.
Comment 34 Markus Keller CLA 2015-11-18 14:49:29 EST
(In reply to David Williams from comment #32)
> Safer, yes, but a spurious NPE from 7 years ago doesn't seem enough reason
> to hog the UI thread.

That was just one example. https://bugs.eclipse.org/bugs/buglist.cgi?classification=Eclipse&longdesc=MutableActivityManager&longdesc_type=allwordssubstr shows that nobody fully understands threading in this class.

> And, again I'll remind you
> notifyIdentifiers(identifierEventsByIdentifierId);
> is called elsewhere without being put on a UI Thread.

Where? In the test and in my workspace, I only see it being called on the UI thread. Situations where it's called in other threads can be found in the above bug query.

> I also check the API
> descriptions, and none implied they were automatically executed on the UI
> Thread. 

http://help.eclipse.org/mars/index.jsp?topic=/org.eclipse.platform.doc.isv/guide/swt_threading.htm&anchor=workbench_threads admits that the threading rules in Platform UI are not very clear. But it says explicitly that the absence of a specification doesn't mean an API is thread-safe.

> I'd suggest just adding "UI" so the UI one is "Identifier Update UI Job"?

Fair enough.
Comment 35 David Williams CLA 2015-11-18 15:39:35 EST
(In reply to Markus Keller from comment #34)
> (In reply to David Williams from comment #32)
> > Safer, yes, but a spurious NPE from 7 years ago doesn't seem enough reason
> > to hog the UI thread.
> 
> That was just one example.
> https://bugs.eclipse.org/bugs/buglist.
> cgi?classification=Eclipse&longdesc=MutableActivityManager&longdesc_type=allw
> ordssubstr shows that nobody fully understands threading in this class.

Yes, that does show "nobody fully understands threading in this class". Interesting list, but hard to say if it means more should be done on the UI thread or less.  I just thought it was a good time to try and make UI more responsive and change it later if found to be not possible. It is very tangled up code. 


> Where? In the test and in my workspace, I only see it being called on the UI
> thread. Situations where it's called in other threads can be found in the
> above bug query.

I just looked for "references to" that method and found two other places in that class. Then I tried "looking up" those two, and could not see where they were put on a UI thread, and then again. I just looked a couple of levels, and admit something even "higher up" might put them on the UI Thread. 

readRegistry(boolean)
updateListeners(boolean, Map, Set, Set)

= = = = = = = 

I'll explain a little of my bias. In the back of my mind, I'm thinking that if, with e4, more is "done with EMF" then there is more data "flying around" to update the model, and if only the data is changing, then that should be done on a background thread. And only when really needed, the "minimal amount" done on the UI thread. It also seemed to me that this method and all paths to it were "internal" to the platform, so if some small thing needed to be fixed later, it would be easy to put that small thing on the UI thread, instead of the "entire update of data". My assumptions could be wrong ... just explaining what whey are.
Comment 36 David Williams CLA 2015-11-18 20:10:08 EST
(In reply to Markus Keller from comment #34)

> > I'd suggest just adding "UI" so the UI one is "Identifier Update UI Job"?
> 
> Fair enough.

I'd suggest we use Markus's Gerrit patch. The reason is that I set a breakpoint in notifyIdentifiers(identifierEventsByIdentifierId), and just ran a plain SDK in the debugger (not perf. tests). The only time I saw it called was from the UI thread. And further, all those calls were from the two methods I mentioned earlier. 

readRegistry(boolean)
updateListeners(boolean, Map, Set, Set)

I *never* saw it called from getUpdateJob. 

So, my conclusion is it probably is always "meant to be called" from UI thread, and further that the "inner" part of getUpdateJob must be called pretty rarely so I doubt there would be any performance impact. 

If we change the name to "Identifier Update UI Job" then we could at least monitor it in "real use" from time to time (by using job flags in .options file), and see if it is created/running often in real use, and if so, then come back and revisit. My guess is, though, it's only "hit" in testing ... perhaps during part of shutdown or maybe after an "update" or something fairly rare, like that.
Comment 37 David Williams CLA 2015-11-18 23:04:46 EST
Created attachment 258145 [details]
feature patch with latest Gerrit patch

This is a feature patch, with Markus's latest fix, in case anyone wants to "try it out" before committing. 

I used following .options file (there could be better or fewer options to use?) 

#Master Tracing Options
#Wed Nov 18 11:10:03 EST 2015
org.eclipse.core.jobs/jobs/errorondeadlock=true
org.eclipse.core.jobs/jobs=true
org.eclipse.ui.workbench/debug=true
org.eclipse.core.jobs/jobs/timing=true
org.eclipse.core.jobs/jobs/shutdown=true

On my Windows machine I tested by just opening empty workspace, and opening a few perspectives, clicking around, and changing themes (not sure if effects themes, at all).

Without the patch (a fresh I-build) there were at least 300 "Identifier Update Job" messages printed (some in background, some on Display thread ... hard to count, which is which, at least without too much work). 

With the patch, there were a few score of Identifier Update Job and a few score of Identifier Update UI Job messages printed. If nothing else, batching up the events all by itself may help performance! (a little).
Comment 38 Lars Vogel CLA 2015-11-19 04:32:18 EST
(In reply to David Williams from comment #36)
> I'd suggest we use Markus's Gerrit patch. The reason is that I set a
> breakpoint in notifyIdentifiers(identifierEventsByIdentifierId), and just
> ran a plain SDK in the debugger (not perf. tests). The only time I saw it
> called was from the UI thread. And further, all those calls were from the
> two methods I mentioned earlier. 

Also +1 to Markus patch in this case. My original vote was based on Davids statement that the notifyIdentifiers method is also called from non UI threads which turned out to be false.
Comment 40 Markus Keller CLA 2015-11-20 09:05:36 EST
Pushed my patch with updated job names: "Activity Identifier Update" and "... UI"
Comment 41 David Williams CLA 2015-11-24 17:04:24 EST
Just wanted to verify this fixes the problem in the context where I originally observed it. On my local test system, the threads during that UI Performance tests increases from an initial reading of 22 to 45, and that is well within the range of other Java processes running and the behavior of other tests. Thanks. 

= = = = = = = =

I also investigated some of the comments made in one of the reference documents: 

https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread

With a small test program, found the following results on various machines I have or use. 

using 64 bit, 1.8 VMs (except the one 32 bit test on Window: 
In all but one case, the "out of memory" (no available native threads) exception was thrown. 

OS/Machine                            N threads until exception
Windows 10 (32 bit Java):                 2808 
Windows 10 (64 bit Java):            unlimited * 
Mac OS X: 10.11.1 (El Capitan):           2023
Linux Ubuntu 12:04 (custom kernel):      31703
At eclipse.org: 
Linux SUSE 11 (performance test machine): 4960
Linux SUSE 11 (build.eclipse.org CLI):   13008
Linux SUSE 11 (shared Hudson instance):  11876
Linux SUSE 11 (Platform HIPP instance):  30786

* At least nearly unlimited. My test stops after creating 100,000 threads.

The only surprising part is the differences obtained on the various "eclipse.org" machines. Not sure if this is a feature of SUSE? Or, system load? Or, just the way they are configured? 
I don't see the differences as a problem -- especially in that we should not be creating that many threads anyway :)
Comment 42 David Williams CLA 2015-11-24 17:09:42 EST
CCing webmasters, primarily for awareness, but partially for a minor question. 

Webmasters, do you know anything about "maximum threads allowed" (per user? process? or per system?) on the various Eclipse.org machines. See comment 41. 

As background summary, we had an odd test that would "run wild" on some machines (i.e. mine :) and in tracking that down learned that "maximum threads allowed" varies a lot from OS to OS, and, the surprising part, even from machine to machine, in the case of Eclipse.org machines. 

Just curious. I do not (as of now) think the differences signify a problem of any sort. 

Thanks,
Comment 43 Andrey Loskutov CLA 2015-11-24 17:30:10 EST
Differences on Linux boxes could be related to the different max stack size allowed for the user (check ulimit values) and also for the running JVM (either via -Xss and -Xmx settings or implicitly via different defaults for the different VM versions). Higher stack size => more memory for thread needed => less threads can be created. Not sure if the max available RAM size affects JVM choice for the implicit default values of Xss and Xmx, similar to what happens with "-server".
Comment 44 Mikaƫl Barbero CLA 2015-11-25 04:16:10 EST
"ulimit -u" => max user processes
"cat /proc/sys/kernel/threads-max" => max number of threads on the system

Here are some numbers:

perftests (4 GB RAM)
max user processes       31690
sys threads-max          63381

build.eclipse.org (24 GB RAM)
max user processes      192436
sys threads-max         384873

hipp1/2(64 GB RAM)
max user processes     ~515155
sys threads-max       ~1030311

other hipps (128 GB RAM)
max user processes    ~1032212
sys threads-max       ~2064424

You issue may come from the low stats of the perftest machine compared to others. But note that this machine is not shared and there is only 1 job that run at a time. So 4GB of RAM is closer to what is used by a end user and then more suitable for performance testing, isn't it?
Comment 45 Denis Roy CLA 2015-11-25 07:41:12 EST
> But note that this machine is not shared and there is only 1 job
> that run at a time. So 4GB of RAM is closer to what is used by a end user
> and then more suitable for performance testing, isn't it?

Just one point of interest: the perftest machine (which is virtual) is configured in a way to produce repeatable results regardless of the load on the host. 4 GB RAM is dedicated, as we've also dedicated 1 CPU core to it.

The absolute performance is likely well below what is available to developers (especially nowadays) but the machine will produce repeatable performance.

I tend to prefer running performance tests on slower hardware, as I find regressions are amplified and much easier to spot.
Comment 46 Lars Vogel CLA 2015-11-25 10:29:18 EST
(In reply to David Williams from comment #37)

> On my Windows machine I tested by just opening empty workspace, and opening
> a few perspectives, clicking around, and changing themes (not sure if
> effects themes, at all).

I did not do any measurements, but changing theme feels way faster with this patch. 

Thanks again Andrey.
Comment 47 David Williams CLA 2015-11-25 12:26:21 EST
I have put "backport" in the whiteboard field meaning to consider as a candidate to backport to 4.5.2 release. I think we should IF we can convince ourselves the fix makes a difference to user scenarios. The observed problem that gave rise to the bug was pretty specific to the performance tests and was related to a combination of unusual things that are done in the performance tests. As noted, even that does not occur on all machines, so am not sure what combination of "timing" or "memory size" or whatever gives rise to "excessive threads". 

I would say to fix in the maintenance stream, just for the sake of performance tests, except as far as I know, we will never use 4.5.2 as a "reference build". 

I guess if I can get the performance data collected and displayed properly we might have a better idea if it actually improves performance (over baseline) for tasks, in general, or if it is only when "tearing down" the performance test. 

Hence, "backport" is there to better find and not forget to consider for backporting when the time comes. 


 (In reply to Mikael Barbero from comment #44)
> ... So 4GB of RAM is closer to what is used by an end user
> and then more suitable for performance testing, isn't it?

Yes. Should be fine.
Comment 48 Markus Keller CLA 2015-11-25 13:14:35 EST
(In reply to Lars Vogel from comment #46)
> I did not do any measurements, but changing theme feels way faster with this
> patch.

Changing themes via the Appearance preference page doesn't trigger the code that I changed in the committed fix.

The biggest count of updated activity identifiers I see is
a) on startup
b) on the Capabilities preference page, when I disable a capability, click Apply, re-enable all capabilities, click Apply

Note that (b) doesn't work if org.eclipse.ui.tests is in your workspace. That project contains disabled activities, which block this code path.
Comment 49 Lars Vogel CLA 2015-11-25 14:09:00 EST
David, our "way" of marking a bug for downport is to reopen and set the target.

@Markus, do you think we should downport this one? +1 from me, but you should decide.
Comment 50 David Williams CLA 2015-11-25 16:06:08 EST
(In reply to Lars Vogel from comment #49)
> David, our "way" of marking a bug for downport is to reopen and set the
> target.
> 
> @Markus, do you think we should downport this one? +1 from me, but you
> should decide.

I thought that was only after it was decided to. At this point, I do not think we know enough to say we should. Hence, the reason I marked the whiteboard as a reminder to consider in the future. But, do as you like!
Comment 51 Lars Vogel CLA 2015-11-26 02:21:52 EST
(In reply to David Williams from comment #50)
> But, do as you like!

Just for clarification, AFAIK this is our process, not what I like. IMHO our process is: 
-----
We set the target to the backport release, afterwards we decide, if we want to backport it or not. I yes, we cherry-pick the commit onto the maintenance branch and mark the bug as fixed. If we decide not to downport it, we move the target back to the milestone release and mark the bug as fixed.
------
In case you think our process is different, we should check with Dani. It would be good to have one process we  all agree upon.
Comment 52 Dani Megert CLA 2015-11-26 04:51:58 EST
(In reply to Lars Vogel from comment #51)
> In case you think our process is different, we should check with Dani. It
> would be good to have one process we  all agree upon.

Setting the target milestone is expressing an intent and sets expectations for those that follow the bug, hence the target should only be set if we plan to do the backport.

I didn't look at the bug or fix in detail, but it looks like an interesting candidate to backport.
Comment 53 Markus Keller CLA 2015-11-26 06:42:18 EST
I'm not aware of a scenario where a regular user is affected by this. The biggest count of identifier update UI jobs I've seen is about 290, and the time spent in the updates was always negligible (2ms max).

The fix is low-risk, but I would still only backport it if it helps David when running performance tests.
Comment 54 David Williams CLA 2015-11-26 11:33:29 EST
There is a bit of "the Eclipse Project bugzilla process" written up at 
https://wiki.eclipse.org/Eclipse/Bug_Tracking

A bit out of date (and I know some sub-components "have tweaked" the policy), but it does say "The 'Target Milestone' field lists the Milestone or Release in which a bug is intended to get fixed or was first fixed."

= = = = = = = = = = = = 

AND, due to the recent discussion in bug 483067 I now think that 4.5.2 WILL be used as a baseline some day, so I WOULD like this fixed in the Mars.2 release. So I do think it is marked correctly. Thanks. 

= = = = = = = = = = = =
Comment 55 Eclipse Genie CLA 2015-12-13 12:34:18 EST
New Gerrit change created: https://git.eclipse.org/r/62565
Comment 56 David Williams CLA 2015-12-13 12:36:07 EST
(In reply to Eclipse Genie from comment #55)
> New Gerrit change created: https://git.eclipse.org/r/62565

I'm sure obvious, but his is the back port to maintenance branch. 

Be sure to review carefully, as 'master' used a new Jobs API, so the back port was not a cherry-pick.
Comment 57 David Williams CLA 2015-12-29 17:48:34 EST
(In reply to David Williams from comment #56)
> (In reply to Eclipse Genie from comment #55)
> > New Gerrit change created: https://git.eclipse.org/r/62565
> 
> I'm sure obvious, but his is the back port to maintenance branch. 
> 
> Be sure to review carefully, as 'master' used a new Jobs API, so the back
> port was not a cherry-pick.

If not obvious, from Gerrit, I have change the job name to "Activity Identifier Update UI" and believe it is ready for re-review/merge.
Comment 58 Markus Keller CLA 2016-01-13 07:59:55 EST
Even if you have conflicts, it's still better to cherry-pick the original commit and maintain as much formatting etc. as possible. In this case, e.g. the non-UI Job name should also match the new name in master.

Fixed in R4_5_maintenance with http://git.eclipse.org/c/platform/eclipse.platform.ui.git/commit/?id=374c955735de619731b2a572a6996397dddadc6a
Comment 59 Markus Keller CLA 2016-02-04 13:41:25 EST
Verified in M20160203-1000 on OS X 10.11. GenerateIdentifiersTest also looks good (no excessive amount of Threads created any more.