Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 353015

Summary: Getting a thread dump of a hanging job?
Product: Community Reporter: Kaloyan Raev <kaloyan>
Component: CI-JenkinsAssignee: Eclipse Webmaster <webmaster>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows Vista   
Whiteboard:
Attachments:
Description Flags
hudson thread dump none

Description Kaloyan Raev CLA 2011-07-25 11:38:46 EDT
The libra-indigo job is persistently hanging on the tests. This behavior is not observed when running the build locally. 

I want to have a thread dump to see the reason, but when I call:
https://hudson.eclipse.org/hudson/threadDump

It says that I have no permissions.

How can I get the thread dump?
Comment 1 Denis Roy CLA 2011-07-25 13:11:53 EDT
Let your job hang, let us know and we'll post the thread dump for you.
Comment 2 Eclipse Webmaster CLA 2011-07-25 13:27:10 EDT
Another option is if the job is running on the master you can try starting it with the -Xdebug flag, then you should be able to connect with jdb from build.

-M.
Comment 3 Kaloyan Raev CLA 2011-07-25 15:10:38 EDT
The build is currently hanging hudson-slave1. 
Please, take the thread dump. You may then kill the job to free the slot.
Comment 4 Denis Roy CLA 2011-07-25 16:18:18 EDT
Here's what I'm seeing:

Executor #4 for hudson-slave1 : executing libra-indigo #36

"Executor #4 for hudson-slave1 : executing libra-indigo #36" Id=59 Group=main WAITING on hudson.remoting.UserRequest@2947aa41
	at java.lang.Object.wait(Native Method)
	-  waiting on hudson.remoting.UserRequest@2947aa41
	at java.lang.Object.wait(Object.java:485)
	at hudson.remoting.Request$1.get(Request.java:206)
	at hudson.remoting.Request$1.get(Request.java:172)
	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
	at hudson.Proc$RemoteProc.join(Proc.java:347)
	at hudson.Launcher$ProcStarter.join(Launcher.java:280)
	at hudson.tasks.Maven.perform(Maven.java:259)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:603)
	at hudson.model.Build$RunnerImpl.build(Build.java:172)
	at hudson.model.Build$RunnerImpl.doRun(Build.java:137)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:417)
	at hudson.model.Run.run(Run.java:1337)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:140)



SCM polling for hudson.model.FreeStyleProject@67d77351[libra-indigo]

"SCM polling for hudson.model.FreeStyleProject@67d77351[libra-indigo]" Id=182 Group=main WAITING on hudson.slaves.WorkspaceList@38b1a665
	at java.lang.Object.wait(Native Method)
	-  waiting on hudson.slaves.WorkspaceList@38b1a665
	at java.lang.Object.wait(Object.java:485)
	at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:173)
	at hudson.model.AbstractProject.poll(AbstractProject.java:1230)
	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:425)
	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:454)
	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

	Number of locked synchronizers = 1
	- java.util.concurrent.locks.ReentrantLock$NonfairSync@175c7b86
Comment 5 Denis Roy CLA 2011-07-25 16:19:22 EDT
Actually, just about every running job has the same thread:

"Executor #0 for hudson-slave1 : executing cbi-papyrus-0.7-nightly #1623" Id=55 Group=main WAITING on hudson.remoting.UserRequest@689b9911
	at java.lang.Object.wait(Native Method)
	-  waiting on hudson.remoting.UserRequest@689b9911
	at java.lang.Object.wait(Object.java:485)
	at hudson.remoting.Request$1.get(Request.java:206)
	at hudson.remoting.Request$1.get(Request.java:172)
	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
	at hudson.Proc$RemoteProc.join(Proc.java:347)
	at hudson.Launcher$ProcStarter.join(Launcher.java:280)
	at hudson.tasks.Ant.perform(Ant.java:216)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:603)
	at hudson.model.Build$RunnerImpl.build(Build.java:172)
	at hudson.model.Build$RunnerImpl.doRun(Build.java:137)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:417)
	at hudson.model.Run.run(Run.java:1337)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:140)
Comment 6 Denis Roy CLA 2011-07-25 16:21:54 EDT
On slave1 I see this:

java -Dbuckminster.output.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.output -Dbuckminster.temp.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.temp -Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org" -Dhttps.nonProxyHosts="*.eclipse.org" -Dcheckout.location=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master -Dreference.repository=https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.2-master/lastSuccessfulBuild/artifact/MDT-OCL.p2.repository/ -XX:PermSize=64M -XX:MaxPermSize=128M -jar /shared/common/buckminster-3.7/plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar -application org.eclipse.buckminster.cmdline.headless -data /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buildroot/buckminster.workspace --loglevel info -S /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/org.eclipse.ocl.git/releng/org.eclipse.ocl.releng.buckminster/scripts/ocl-buckminster.script: stdout copier

"java -Dbuckminster.output.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.output -Dbuckminster.temp.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.temp -Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org" -Dhttps.nonProxyHosts="*.eclipse.org" -Dcheckout.location=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master -Dreference.repository=https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.2-master/lastSuccessfulBuild/artifact/MDT-OCL.p2.repository/ -XX:PermSize=64M -XX:MaxPermSize=128M -jar /shared/common/buckminster-3.7/plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar -application org.eclipse.buckminster.cmdline.headless -data /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buildroot/buckminster.workspace --loglevel info -S /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/org.eclipse.ocl.git/releng/org.eclipse.ocl.releng.buckminster/scripts/ocl-buckminster.script: stdout copier" Id=530663 Group=main RUNNABLE (in native)
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:199)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	-  locked java.io.BufferedInputStream@7a7c82c7
	at java.io.FilterInputStream.read(FilterInputStream.java:90)
	at hudson.util.StreamCopyThread.run(StreamCopyThread.java:60)
Comment 7 Kaloyan Raev CLA 2011-07-26 06:28:37 EDT
Thanks, Denis. However, the thread dump of the job process does not show anything really useful. The tests are executed in a separate java process and that thread dump of that process is the interesting one. 

I managed to run the tests with the -Xdebug option and connect remotely from my IDE. I see that there is a thread hanging on the XML Validator that is blocked on reading an XML Schema from the Internet. Perhaps, this is not possible anymore, after some restrictions for accessing Internet were recently applied to the Hudson server. 

I will try to work this around in the test.
Comment 8 Kaloyan Raev CLA 2011-07-26 06:30:13 EDT
Created attachment 200346 [details]
hudson thread dump

Thread-8 is the problematic one hanging on socket read.
Comment 9 Kaloyan Raev CLA 2011-07-26 07:16:51 EDT
I was able to resolve the problem in the tests. 

However, the remote debug session interrupted unexpectedly and it looks like there are still some file locks in the workspace. This prevents successful execution of the libra-indigo job on 'master'. I also cannot clean the workspace on 'master' from the web front-end - it says that some ..../.nfsxxxxxx file cannot be deleted. The job works successfully on hudson-slave1 now. 

Could you clean the workspace of the libra-indigo job on 'master'?
Comment 10 Denis Roy CLA 2011-07-26 08:27:37 EDT
> I see that there is a thread hanging on the XML Validator that is blocked
> on reading an XML Schema from the Internet. Perhaps, this is not possible
> anymore, after some restrictions for accessing Internet were recently applied
> to the Hudson server. 

We haven't applied any further restrictions on Internet access to the Hudson servers, so I'd like to investigate this.  What URL are you trying to access?

I've wiped the job's workspace.
Comment 11 Kaloyan Raev CLA 2011-07-26 08:35:15 EDT
I am not sure about the exact URL, but the following one is a good guess:

http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd

In the thread dump I recognize the XML validator that is triggered by the project builder. The validator needs to read the corresponding XML schema in order to validate the XML file. In WTP we have the so called XML catalog that contains the popular XML schema locally and avoid the validator accessing them from Internet. I resolved the issue by adding the XML catalog plugins as dependency to the test.
Comment 12 Denis Roy CLA 2011-07-26 08:46:03 EDT
If your tests are run from a separate Java process, you "may" need to redefine the proxy options.  We've defined them in the Shell and in Hudson, but we've seen instances where this is not enough.

We've set JAVA_ARGS and JVM-OPTS to:
-Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"
Comment 13 Kaloyan Raev CLA 2011-07-26 09:02:19 EDT
Do you have the same proxy settings in the Maven's settings.xml?

Having them there should make all processes executed by Maven to have these proxy configuration.
Comment 14 Denis Roy CLA 2011-07-26 09:09:21 EDT
Yes, this is our maven settings.xml file:

<settings>
  <proxies>
    <proxy>
      <id>httpproxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.eclipse.org</host>
      <port>9898</port>
      <nonProxyHosts>*.eclipse.org|172.30.206.*</nonProxyHosts>
    </proxy>
    <proxy>
      <id>httpsproxy</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>proxy.eclipse.org</host>
      <port>9898</port>
      <nonProxyHosts>*.eclipse.org|172.30.206.*</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
Comment 15 Kaloyan Raev CLA 2011-07-26 09:22:45 EDT
OK, great. 
Next time if I have the same problem and I cannot resolve it in the tests, I will try adding these proxy settings explicitly to the process. 

I am closing this bug. 

Thanks for the help.