Community
Participate
Working Groups
The libra-indigo job is persistently hanging on the tests. This behavior is not observed when running the build locally. I want to have a thread dump to see the reason, but when I call: https://hudson.eclipse.org/hudson/threadDump It says that I have no permissions. How can I get the thread dump?
Let your job hang, let us know and we'll post the thread dump for you.
Another option is if the job is running on the master you can try starting it with the -Xdebug flag, then you should be able to connect with jdb from build. -M.
The build is currently hanging hudson-slave1. Please, take the thread dump. You may then kill the job to free the slot.
Here's what I'm seeing: Executor #4 for hudson-slave1 : executing libra-indigo #36 "Executor #4 for hudson-slave1 : executing libra-indigo #36" Id=59 Group=main WAITING on hudson.remoting.UserRequest@2947aa41 at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.UserRequest@2947aa41 at java.lang.Object.wait(Object.java:485) at hudson.remoting.Request$1.get(Request.java:206) at hudson.remoting.Request$1.get(Request.java:172) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55) at hudson.Proc$RemoteProc.join(Proc.java:347) at hudson.Launcher$ProcStarter.join(Launcher.java:280) at hudson.tasks.Maven.perform(Maven.java:259) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:603) at hudson.model.Build$RunnerImpl.build(Build.java:172) at hudson.model.Build$RunnerImpl.doRun(Build.java:137) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:417) at hudson.model.Run.run(Run.java:1337) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:140) SCM polling for hudson.model.FreeStyleProject@67d77351[libra-indigo] "SCM polling for hudson.model.FreeStyleProject@67d77351[libra-indigo]" Id=182 Group=main WAITING on hudson.slaves.WorkspaceList@38b1a665 at java.lang.Object.wait(Native Method) - waiting on hudson.slaves.WorkspaceList@38b1a665 at java.lang.Object.wait(Object.java:485) at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:173) at hudson.model.AbstractProject.poll(AbstractProject.java:1230) at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:425) at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:454) at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Number of locked synchronizers = 1 - java.util.concurrent.locks.ReentrantLock$NonfairSync@175c7b86
Actually, just about every running job has the same thread: "Executor #0 for hudson-slave1 : executing cbi-papyrus-0.7-nightly #1623" Id=55 Group=main WAITING on hudson.remoting.UserRequest@689b9911 at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.UserRequest@689b9911 at java.lang.Object.wait(Object.java:485) at hudson.remoting.Request$1.get(Request.java:206) at hudson.remoting.Request$1.get(Request.java:172) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55) at hudson.Proc$RemoteProc.join(Proc.java:347) at hudson.Launcher$ProcStarter.join(Launcher.java:280) at hudson.tasks.Ant.perform(Ant.java:216) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:603) at hudson.model.Build$RunnerImpl.build(Build.java:172) at hudson.model.Build$RunnerImpl.doRun(Build.java:137) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:417) at hudson.model.Run.run(Run.java:1337) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:140)
On slave1 I see this: java -Dbuckminster.output.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.output -Dbuckminster.temp.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.temp -Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org" -Dhttps.nonProxyHosts="*.eclipse.org" -Dcheckout.location=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master -Dreference.repository=https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.2-master/lastSuccessfulBuild/artifact/MDT-OCL.p2.repository/ -XX:PermSize=64M -XX:MaxPermSize=128M -jar /shared/common/buckminster-3.7/plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar -application org.eclipse.buckminster.cmdline.headless -data /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buildroot/buckminster.workspace --loglevel info -S /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/org.eclipse.ocl.git/releng/org.eclipse.ocl.releng.buckminster/scripts/ocl-buckminster.script: stdout copier "java -Dbuckminster.output.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.output -Dbuckminster.temp.root=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buckminster.temp -Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org" -Dhttps.nonProxyHosts="*.eclipse.org" -Dcheckout.location=/opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master -Dreference.repository=https://hudson.eclipse.org/hudson/job/buckminster-mdt-ocl-core-3.2-master/lastSuccessfulBuild/artifact/MDT-OCL.p2.repository/ -XX:PermSize=64M -XX:MaxPermSize=128M -jar /shared/common/buckminster-3.7/plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar -application org.eclipse.buckminster.cmdline.headless -data /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/buildroot/buckminster.workspace --loglevel info -S /opt/users/hudsonbuild/workspace/buckminster-mdt-ocl-core-3.2-master/org.eclipse.ocl.git/releng/org.eclipse.ocl.releng.buckminster/scripts/ocl-buckminster.script: stdout copier" Id=530663 Group=main RUNNABLE (in native) at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) - locked java.io.BufferedInputStream@7a7c82c7 at java.io.FilterInputStream.read(FilterInputStream.java:90) at hudson.util.StreamCopyThread.run(StreamCopyThread.java:60)
Thanks, Denis. However, the thread dump of the job process does not show anything really useful. The tests are executed in a separate java process and that thread dump of that process is the interesting one. I managed to run the tests with the -Xdebug option and connect remotely from my IDE. I see that there is a thread hanging on the XML Validator that is blocked on reading an XML Schema from the Internet. Perhaps, this is not possible anymore, after some restrictions for accessing Internet were recently applied to the Hudson server. I will try to work this around in the test.
Created attachment 200346 [details] hudson thread dump Thread-8 is the problematic one hanging on socket read.
I was able to resolve the problem in the tests. However, the remote debug session interrupted unexpectedly and it looks like there are still some file locks in the workspace. This prevents successful execution of the libra-indigo job on 'master'. I also cannot clean the workspace on 'master' from the web front-end - it says that some ..../.nfsxxxxxx file cannot be deleted. The job works successfully on hudson-slave1 now. Could you clean the workspace of the libra-indigo job on 'master'?
> I see that there is a thread hanging on the XML Validator that is blocked > on reading an XML Schema from the Internet. Perhaps, this is not possible > anymore, after some restrictions for accessing Internet were recently applied > to the Hudson server. We haven't applied any further restrictions on Internet access to the Hudson servers, so I'd like to investigate this. What URL are you trying to access? I've wiped the job's workspace.
I am not sure about the exact URL, but the following one is a good guess: http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd In the thread dump I recognize the XML validator that is triggered by the project builder. The validator needs to read the corresponding XML schema in order to validate the XML file. In WTP we have the so called XML catalog that contains the popular XML schema locally and avoid the validator accessing them from Internet. I resolved the issue by adding the XML catalog plugins as dependency to the test.
If your tests are run from a separate Java process, you "may" need to redefine the proxy options. We've defined them in the Shell and in Hudson, but we've seen instances where this is not enough. We've set JAVA_ARGS and JVM-OPTS to: -Dhttp.proxyHost=proxy.eclipse.org -Dhttp.proxyPort=9898 -Dhttps.proxyHost=proxy.eclipse.org -Dhttps.proxyPort=9898 -Dhttp.nonProxyHosts="*.eclipse.org|172.30.206.*" -Dhttps.nonProxyHosts="*.eclipse.org" -Dftp.proxyHost=proxy.eclipse.org -Dftp.proxyPort=9898 -Dftp.nonProxyHosts="*.eclipse.org"
Do you have the same proxy settings in the Maven's settings.xml? Having them there should make all processes executed by Maven to have these proxy configuration.
Yes, this is our maven settings.xml file: <settings> <proxies> <proxy> <id>httpproxy</id> <active>true</active> <protocol>http</protocol> <host>proxy.eclipse.org</host> <port>9898</port> <nonProxyHosts>*.eclipse.org|172.30.206.*</nonProxyHosts> </proxy> <proxy> <id>httpsproxy</id> <active>true</active> <protocol>https</protocol> <host>proxy.eclipse.org</host> <port>9898</port> <nonProxyHosts>*.eclipse.org|172.30.206.*</nonProxyHosts> </proxy> </proxies> </settings>
OK, great. Next time if I have the same problem and I cannot resolve it in the tests, I will try adding these proxy settings explicitly to the process. I am closing this bug. Thanks for the help.