Build: TPTP-4.2.1.1-200611271752
Follow the steps *exactly* as outlined below. Make sure that a standalone installation of the Agent Controller is configured and started.
1) Extract the attached zip file to a directory. It should contain three files: HelloWorld.class, Classpath.class, and ForClasspath.class.
2) Open the profile launch configuration and create a launch configuration of type "External Java Application".
3) Switch to the Main tab and browse to the "Classpath.class" file.
4) Switch to the Monitor tab and de-select everything but "Execution Time Analysis".
5) Click Apply.
6) Repeat steps 2-5, but select "HelloWorld.class" in step 3.
7) Keep alternating between launching Classpath and HelloWorld.
The launch progress will eventually get stuck at 78%, and the Agent Controller will not respond until it is restarted. This is blocking me from running the automated launch test suites. I can consistently reproduce this problem on my machine.
Created attachment 54719 [details] Test classes
I'm using IBM JRE 1.5:
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pwi32dev-20060511 (SR2))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223-20060504 (JIT enabled)
J9VM - 20060501_06428_lHdSMR
JIT - 20060428_1800_r8
GC - 20060501_AA)
JCL - 20060511a
Andy - please take a look immediately so that an assessment can be made for 4.2.1.1.
I am unable to reproduce this problem. I ran about 50 iterations of these tests without any problems. I tried both with a 4.3 TPTP client and a 4.2.1.1 client. My JVM is slightly different from yours. My JVM is:
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20060124)
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20060124 (JIT enabled)
J9VM - 20051027_03723_lHdSMR
JIT - 20051027_1437_r8
GC - 20051020_AA)
JCL - 20060120
I believe the JVM you are using is one that Paul Slauenwhite was using for a problem he reported last week. I wasn't able to reproduce that problem either (nor was I able to find a place to download that JVM). Can you try this with a different JVM? Kevin says the automated test suites worked for him with 4.2.1.1 on an Itanium-based system.
I suspect that this is a concurrency problem that can't consistently be reproduced. Nevertheless, I can easily reproduce this problem when I alternate between launching the two applications. Navid too has hit similar problems. I will try this with a different JVM and let you know the results shortly.
Created attachment 54747 [details] The Agent Controller log file Here's the agent controller log file in case it's needed.
I can reproduce this with Sun JRE:
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode)
Btw, I have a hyperthreaded machine.
I can reproduce this after the third launch. Make sure that you perform the process launches quickly using the drop down menu of the profile toolbar item. Here's the order: HelloWorld, Classpath, HelloWorld (this process hangs).
I've come across this problem before. Note that someone has also mentioned this problem recently in the newsgroup. See the following entry posted on Nov 27th by Michael Sachs: Profiler - Launching stops at 78%
OK, I was waiting for one agent to terminate before starting the next. When I try it fast, I can reproduce the problem. What happens is that an exception occurs inside ipcStopFlushing. We have an exception handler that catches the error and keeps this from manifesting itself as a crash, but there's a thread waiting for a response to the StopAgentDataFlush message, so that thread hangs. This is very probably caused by the first set of fixes we put into the 4.2.1.1 branch for memory leaks. I don't think it will be easy to fix. I don't really know why the exception is occurring in the ipcStopFlushing routine.
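The failure mode described in this comment can be modeled in a few lines. This is a toy Python sketch, not Agent Controller code: the function names, the simulated exception, and the timeout are all hypothetical. It assumes a handler whose catch-all exception handling prevents a crash but also skips sending the response that another thread is waiting for. The timeout exists only so the demonstration terminates.

```python
import threading


def handle_command(command, response_ready, fail):
    """Hypothetical handler: signals response_ready only if processing succeeds."""
    try:
        if fail:
            # Stand-in for the unexplained exception inside ipcStopFlushing.
            raise RuntimeError("simulated failure while handling " + command)
        response_ready.set()  # normal path: the response is sent
    except Exception:
        # A catch-all handler prevents a crash but also skips the response.
        pass


def send_and_wait(command, fail, timeout=1.0):
    """Send a command and wait for its response; return True if it arrived."""
    response_ready = threading.Event()
    worker = threading.Thread(
        target=handle_command, args=(command, response_ready, fail)
    )
    worker.start()
    # Without a timeout this wait would block forever when the handler fails.
    arrived = response_ready.wait(timeout)
    worker.join()
    return arrived


if __name__ == "__main__":
    print(send_and_wait("StopAgentDataFlush", fail=False))
    print(send_and_wait("StopAgentDataFlush", fail=True))
```

In this model the waiter has no way to tell a slow response from one that will never come, which is consistent with a launch that stalls instead of failing with an error.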
One more bit of information. I still can't reproduce this on my laptop (which is the only place I could run the IBM JVM). The machine I reproduced it on has a Pentium 4 with hyperthreading technology. So maybe the way that changes the timing is the key factor.
Ali - From what Andy has learned so far, it looks like the same issue would exist in 4.3. What is the difference in your 4.2.1.1 testing and 4.3 testing? (Was the hyperthreaded system used only in 4.2.1.1 testing?)
Further testing seems to indicate that this doesn't actually depend on how quickly you invoke the profiling runs. I can reproduce this on my hyperthreading box even if I pause between runs. To verify that you are seeing the same problem that I am, look in your "bin" directory immediately after a crash and see if it contains a file called "tptpParseError.log". I would expect that it does, and that this file contains a line that looks something like this:
11/29/06 16:08:24 "C:\Eclipse\Builds\AC4211_1127\bin\ACServer.exe") Unexpected exception occurred while parsing Cmd in message "<Cmd src="100" dest="127" ctxt="1281"><stopDataFlush iid="org.eclipse.tptp.legacy"></stopDataFlush></Cmd>"
As it turns out, the exception handler for our parser catches a lot more than the parsing errors it was intended to catch, and it records what it has caught in this relatively obscure file. The problem manifests itself the way it does because we are waiting for a response to the stopDataFlush command. I don't think we need to wait, but we are waiting. If I take out the line that waits, then the exception handler masks the problem (which is not to say this fixes the problem). I've made this change in my sandbox and verified that it hides this problem (though I haven't done any testing to verify that it doesn't introduce something else). So, while it is not anything I would be proud of, I think we can make this problem "go away", which might be OK so long as we remember this and actually fix it in an upcoming release.
The other troubling thing about this is that it seems like it should happen in 4.3 as well. I don't see any reason it wouldn't.
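The check described above can be scripted. The following is a minimal Python sketch, assuming the log is a plain text file named "tptpParseError.log" in the Agent Controller's "bin" directory and that the message text matches the line quoted above. The function name and the encoding handling are assumptions, not part of TPTP.

```python
from pathlib import Path

# Text taken from the log line quoted in the comment above.
MARKER = "Unexpected exception occurred while parsing Cmd"


def find_parse_errors(bin_dir, log_name="tptpParseError.log"):
    """Return the lines in <bin_dir>/<log_name> that contain MARKER.

    Returns an empty list when the log file does not exist.
    """
    log_path = Path(bin_dir) / log_name
    if not log_path.is_file():
        return []
    # errors="replace" is a guess: the log's encoding is not stated in this report.
    text = log_path.read_text(errors="replace")
    return [line for line in text.splitlines() if MARKER in line]


if __name__ == "__main__":
    import sys

    # Pass the Agent Controller "bin" directory as the first argument.
    for line in find_parse_errors(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(line)
```

An empty result means either that the file is absent or that it holds no matching line, so it does not by itself rule out the problem.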
It does happen with 4.3 on my system. It also doesn't seem to depend on alternating between the two classes. I was able to reproduce it just by repeatedly (not quickly) launching the Classpath.class run.
I was using my laptop during 4.3 testing to launch the test suites that detected this problem. I had encountered this problem manually on my desktop before but I never had any consistent way of reproducing it. I was hoping that the reliability patches that went in during 4.3 had solved this problem but this doesn't seem to be the case. What puzzles me is why the Agent Controller just completely stops responding after the error. It always needs to be restarted after the error is encountered.
I hit this problem on the _first_ try with the TPTP-4.2.1.1-200611271752A driver installed on my desktop! I'm using an IBM JVM 1.5.
This is why we need to stress test the Agent Controller. I believe most of these problems can be flushed out with proper stress test cases. See bug#160940
Created attachment 54849 [details] Proposed fix for this problem This attachment contains the source changes for the fix to this problem. This patch is based on the 4.2.1.1 code branch, but I think it can also be applied to the 4.3 based code. Since the change may not be approved for either the 4.3 or 4.2.1.1 releases, I'm putting the code here to be accessed when we are ready to put it in a future release.
This is really unfortunate. In reality, both 4.3 and 4.2.1.1 are already closed up. The memory improvements were a big step forward, and we have to accept that as the line for those releases. It seems we need to get this into 4.2.2 and its sister 4.3 stream ASAP, and for sure into HEAD so it can be thoroughly tested.
I found that this scenario and others work fine with the AC from TPTP 4.2.1 when using both TPTP 4.3 and TPTP 4.2.1.1 clients. I don't know what was fixed in the AC for TPTP 4.2.1.1, but we could mention in the Readme that users who encounter one of the problems described in this bug can try the AC from TPTP 4.2.1.
I've checked a fix for this into HEAD. It will be available in 4.4 builds. Also, Marius is right. This bug is not present in 4.2.1 or earlier releases. The fixes in 4.2.1.1 and 4.3 are mostly stability and memory leak fixes. There are significant improvements in those areas, but if someone is running into this problem a lot, the older releases might be useful.
*** Bug 167747 has been marked as a duplicate of this bug. ***
This fix is now in the 4.2.2 code stream as well.
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.