Bug 259060 - repeated DNFs on org.eclipse.jst.pagedesigner.tests
Summary: repeated DNFs on org.eclipse.jst.pagedesigner.tests
Status: NEW
Alias: None
Product: Java Server Faces
Classification: WebTools
Component: Core
Version: 3.0.4
Hardware: PC Windows Vista
Importance: P3 normal
Target Milestone: Future
Assignee: Raghunathan Srinivasan CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-17 01:49 EST by David Williams CLA
Modified: 2010-06-03 14:04 EDT
CC List: 4 users

See Also:


Attachments
6 javacore "dump" files. (1.34 MB, application/octet-stream)
2008-12-17 01:49 EST, David Williams CLA

Description David Williams CLA 2008-12-17 01:49:27 EST
Created attachment 120657
6 javacore "dump" files. 

Recent tests in 3.0 maintenance builds have consistently resulted in DNF for some JSF tests. 

They apparently pass locally with a 512M heap, but fail on Linux with a 728M heap. 

= = =

In the attached zips, 4 of the dumps happened "spontaneously" from running out of memory ... the other two I snagged while the process appeared 'hung' ... not sure those two reveal much ... I think at that point the test was just waiting with a dialog about "out of memory. we recommend you exit".
Comment 1 Raghunathan Srinivasan CLA 2008-12-17 09:47:28 EST
The same tests pass in the 3.1 builds. Are there any other differences in the setup between the two build systems? We will continue to debug this issue.
Comment 2 David Williams CLA 2008-12-17 11:50:14 EST
(In reply to comment #1)
> ... Are there any other difference in the
> setup between the two build system? ...
> 

No, well ... except one test pre-reqs Ganymede and the other Galileo ... but the build system itself is the same. Perhaps a "fix" went into 3.0.4 but not carried over to 3.1 yet? It's also odd that it seems to pass sometimes. If there's more I can do to help, let me know. 

Comment 3 David Williams CLA 2008-12-17 15:53:11 EST
Here's an oddity, in the just-built 
http://build.eclipse.org/webtools/committers/wtp-R3.0-M/20081217145642/M-3.0.4-20081217145642/


There is no "DNF" in the JUnit summary for that build, but looking at the logs, there are still lots of "out of memory" errors. Then, looking closer, I see the pagedesigner tests are not listed at all in the summary. 

I think what happened is that the overall time limit for all JUnit tests was hit, so the whole JUnit cycle came to an abrupt end. This is noticeable too in that the "total number of tests" (8584) is lower in this run than in others (8810). That is, several subsequent suites (e.g. for JSDT) were not run due to this problem.




Comment 4 David Williams CLA 2008-12-17 16:07:19 EST
On another, "local" machine, I get similar DNF failures, but in one case, they all worked, except one:  

testMultiProject

which failed due to "out of memory". Just in case you wanted one test to focus on, that might be a good one? 


Comment 5 Debajit Adhikary CLA 2009-01-27 20:23:59 EST
Would it be possible to get access to the javacore dump files for the JSF test suites that failed? 

Test dump log: 
http://build.eclipse.org/webtools/committers/wtp-R3.0-M/20090127153754/M-3.0.4-20090127153754/testResults/consolelogs/testSysErrorLogs/org.eclipse.jst.pagedesigner.tests.AllTests.error.txt

Comment 6 David Williams CLA 2009-01-29 01:51:23 EST
I can get the dumps, but only if I get them within one hour, or so, of the build/test ... they are erased once the next one starts. I don't save them away anywhere. 

Was it that particular one? Or do you just want one more recent than the 6 I attached already? 

There does seem to me to be a real memory leak. It might be "because of the way the tests are run", but a leak none the less. I say this since when I run the tests in my development environment, the memory just keeps going up and up, apparently with each one, but not sure that's literally true. Because this suite creates 110 projects, in quick succession, I wonder if there are some things done on each import (e.g. build and validation) that never get a chance to really finish, because the next one starts up. Thus, each "job" builds memory up more and more since nothing can finish. (build server is fast :) And, I think sometimes it hits the limit, and sometimes not, depending on the semi-random method of letting some jobs run to completion, but not others. 

Once I saw 50 or 70 "jobs" created and displayed in the debugger, all waiting to finish something. Maybe that's normal, but I don't recall ever seeing that many. (Normally they are re-used at a faster pace than that) so there's maybe 10 or 15. That 70, btw, was on my "slow" developer machine ... the build server may stack up more, since 

I can think of a couple of approaches, to see if some simple "test hacks" could fix the issue. For example, before you import the next project, pause ... either sleep, or better yet, you can explicitly wait until some job finishes ... not sure which one, but the build job is typical to wait for. 
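
To make the "wait before the next import" idea concrete, here is a minimal sketch in plain Java. It is not the pagedesigner test code: the class ThrottledImport, the method importAll, and the 5 ms sleep that stands in for a build job are all hypothetical, and a single-thread executor plays the role of the job queue. In a real Eclipse test the equivalent wait would probably be something like Job.getJobManager().join(ResourcesPlugin.FAMILY_AUTO_BUILD, null), but that is left out here so the example stays self-contained.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledImport {

    /**
     * Simulates importing a number of projects, each of which queues one
     * background "build". Returns the largest number of builds that were
     * outstanding at any one time.
     */
    static int importAll(int projects, boolean waitForBuild) throws Exception {
        // Hypothetical stand-in for the workspace job queue.
        ExecutorService builder = Executors.newSingleThreadExecutor();
        AtomicInteger pending = new AtomicInteger();
        int maxPending = 0;
        try {
            for (int i = 0; i < projects; i++) {
                // "Import" a project: this queues one build.
                maxPending = Math.max(maxPending, pending.incrementAndGet());
                Future<?> build = builder.submit(() -> {
                    try {
                        Thread.sleep(5); // pretend to build and validate
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    pending.decrementAndGet();
                });
                if (waitForBuild) {
                    // The suggested "test hack": block until this build is
                    // done before importing the next project.
                    build.get(30, TimeUnit.SECONDS);
                }
            }
        } finally {
            builder.shutdown();
            builder.awaitTermination(30, TimeUnit.SECONDS);
        }
        return maxPending;
    }
}
```

With waitForBuild set to true there is never more than one build outstanding; with false, the builds can pile up behind each other, which is the behavior suspected above. Waiting on the job itself rather than sleeping for a fixed time avoids guessing how long a build takes on a given machine.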

Another approach, even better in the long run, is to change the test so it doesn't have to create 110 projects! That's a lot. And, this test takes a long long time, compared to most. Ideally you could import one (or two :) projects, and then run all your tests on that one. I know that may not be possible, but thought it worth mentioning. 
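
As a rough illustration of the "import once, run everything on that one" idea, the sketch below shows a lazily created shared fixture in plain Java. The class SharedFixture and its methods are hypothetical and not part of the JSF test code; in a real suite the factory would be whatever imports the test project, and the test framework's own one-time setup hook could serve the same purpose.

```java
import java.util.function.Supplier;

/** Creates an expensive fixture once and hands the same instance to every caller. */
final class SharedFixture<T> {
    private final Supplier<T> factory; // hypothetical: e.g. "import the test project"
    private T instance;
    private int creations;

    SharedFixture(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared fixture, creating it on first use only. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
            creations++;
        }
        return instance;
    }

    /** How many times the factory actually ran. */
    synchronized int creations() {
        return creations;
    }
}
```

This only works if the individual tests leave the shared project in the state they found it; tests that modify the project would still need their own copy.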

If you still want a dump file, ping me and I'll keep an eye open for an opportunity. 
Comment 7 David Carver CLA 2009-04-28 22:18:45 EDT
This is going on 4 months now. Is this planned to be addressed? Failing tests or tests not completing shouldn't be targeted for Future.
Comment 8 Raghunathan Srinivasan CLA 2009-04-30 10:08:38 EDT
These tests consistently pass on our local machines and we have not been able to determine the cause for the random failures in the build environment. We planned to revisit this if we see the failures again. We will split the tests into smaller units to see if that addresses the issue. I am setting the target to RC1 to do this work.
Comment 9 Raghunathan Srinivasan CLA 2009-09-03 14:11:56 EDT
We will split the tests and add them back to the build in the 3.1.2 timeframe.
Comment 10 David Williams CLA 2009-11-16 09:40:27 EST
These DNFs started to happen (again?) in 3.2 builds. 

Had they been removed from there? If so, I changed the way tests are run (bug 295153) and might have re-enabled them. 

In the new scheme, to disable a test suite, you can rename the test.xml file to something like testHOLD.xml. 

If it has been enabled all along, but more DNFs just started happening, another thing that changed with the new test scheme is that the order of the tests is now different. For example, if you re-use the same workspace from one test to another, it might be larger than before. I doubt you re-use a workspace, but just thought I'd mention it, in case the order of your test suites might matter.
Comment 11 David Williams CLA 2009-11-16 10:06:31 EST
(In reply to comment #10)
> These DNFs started to happen (again?) in 3.2 builds. 
> 
> Had they been removed from there? If so, I changed the way tests are ran (bug
> 295153) and might have re-enabled them. 
> 

I looked, and see now these had been removed from the master list. Since we no longer have a master list, can you please disable them by renaming your 'test.xml' to 'testHOLD.xml'? You'll need to do so in both HEAD (3.2) and the maintenance branch (3.1.x) if you have one. I haven't changed to the new builder in maintenance builds, but will soon.
Comment 12 David Williams CLA 2009-11-17 12:55:46 EST
I have (temporarily) hardcoded the exclusion of org.eclipse.jst.pagedesigner.tests until JSF team has time to disable it or remove it from build ... or fix! :)
Comment 13 Raghunathan Srinivasan CLA 2010-03-19 00:24:57 EDT
(In reply to comment #12)
> I have (temporarily) hardcoded the exclusion of
> org.eclipse.jst.pagedesigner.tests until JSF team has time to disable it or
> remove it from build ... or fix! :)

Please add the test back to the build. We would like to investigate any failure.
Comment 14 David Williams CLA 2010-03-19 12:30:34 EDT
(In reply to comment #13)
> (In reply to comment #12)

> 
> Please add the test back to the build. We would like to investigate any
> failure.

I have stopped excluding this test. Probably won't "show up" for its first run until Saturday, or so. 

Good luck.
Comment 15 Raghunathan Srinivasan CLA 2010-06-03 14:04:33 EDT
Mass update of Helios bugs