Bug 322778 - [DB] DatabaseMetaData.getTables hangs/throws when called during test tearDown
Summary: [DB] DatabaseMetaData.getTables hangs/throws when called during test tearDown
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.net4j.db
Version: 4.0
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Stefan Winkler CLA
QA Contact: Eike Stepper CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-16 07:56 EDT by Caspar D. CLA
Modified: 2011-06-23 03:37 EDT
CC List: 2 users

See Also:
stefan: review? (stepper)


Attachments
Stacktrace (6.12 KB, text/plain)
2010-08-16 07:57 EDT, Caspar D. CLA
patch (2.54 KB, patch)
2010-08-18 09:14 EDT, Stefan Winkler CLA
Patch v2 (2.90 KB, patch)
2010-08-24 14:03 EDT, Eike Stepper CLA
Test Runtime Analysis (10.27 KB, text/plain)
2010-08-25 05:37 EDT, Stefan Winkler CLA

Description Caspar D. CLA 2010-08-16 07:56:05 EDT
See summary. This happens immediately on the first test when I run
AllTestsDBDerby. After a while, the call throws an EmbedSQLException.
After that, the run does not appear to continue with the remaining
tests... (Or maybe it's stuck again?)

Will attach stacktraces shortly.

This might be platform-related, or not -- I don't know. (I'm on Linux.)
Comment 1 Caspar D. CLA 2010-08-16 07:57:46 EDT
Created attachment 176667
Stacktrace
Comment 2 Stefan Winkler CLA 2010-08-16 08:03:01 EDT
mhhh ... I just recently made the Derby tests execute again (they were broken - see Bug 321108).

I don't know if this is platform-related, I can try it on Windows if you like.

One problem with Derby is its very restrictive locking mechanism.
If a lock timeout occurs during tearDown, there may be a connection leak somewhere.
Comment 3 Stefan Winkler CLA 2010-08-16 11:01:48 EDT
I can reproduce this on Windows as well.
This seems to mainly affect the Net4jDBTests - I guess there's a deadlock or even a leaked connection somewhere in there...
Comment 4 Stefan Winkler CLA 2010-08-18 04:31:07 EDT
Just as a sidenote/reminder: to debug Derby locking problems, see Bug 276926 which points at some useful Derby logging mechanisms for locking problems.

Concretely, add the following parameters to the VM arguments of the debug configuration:
-Dderby.locks.monitor=true
-Dderby.locks.deadlockTrace=true
-Dderby.language.logStatementText=true
-Dderby.stream.error.file=c:/work/derby-cdo.log
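For runs started from code rather than from a launch configuration, the same four settings can be applied as system properties. This is a sketch assuming they are set before the embedded Derby engine boots (that is, before the first connection is opened); the class `DerbyLockDiagnostics` and its methods are hypothetical and not part of CDO or Net4j.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: enables Derby lock diagnostics from inside a test launcher. */
final class DerbyLockDiagnostics {
  private DerbyLockDiagnostics() {}

  /** Same settings as the -D VM arguments above; the log path is supplied by the caller. */
  static Map<String, String> settings(String errorLogPath) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("derby.locks.monitor", "true");             // report deadlocks and lock timeouts
    props.put("derby.locks.deadlockTrace", "true");       // also dump the lock table on those events
    props.put("derby.language.logStatementText", "true"); // log the text of executed statements
    props.put("derby.stream.error.file", errorLogPath);   // file that receives Derby's log output
    return props;
  }

  /** Call before the embedded engine boots so Derby picks the values up. */
  static void enable(String errorLogPath) {
    settings(errorLogPath).forEach(System::setProperty);
  }
}
```

For example, `DerbyLockDiagnostics.enable("c:/work/derby-cdo.log")` at the start of a suite mirrors the launch configuration above.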

With these settings, the log shows that the connection that ran CREATE/INSERT/SELECT on the test table still holds a lock that getTables is waiting for.

As I suspected, there's a connection leak in the Net4jDB test cases:

connection = store.getConnection();
...
connection = null; // dropped without commit() or close(): the connection and its locks leak

I have introduced a try...finally block with 

connection.commit();
connection.close();

Now it works.
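A minimal sketch of the pattern the patch introduces, written against plain JDBC rather than the actual Net4jDBTest code. It assumes auto-commit is disabled (as the explicit `commit()` suggests); `ConnectionScope.runAndRelease` and the `SqlWork` interface are hypothetical names for illustration. Ending the transaction before `close()` matters with Derby, which refuses to close a connection whose transaction is still active; the rollback branch covers the case where the test body fails.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper illustrating the commit-then-close pattern from the patch. */
final class ConnectionScope {
  private ConnectionScope() {}

  /** Work to run on a borrowed connection, e.g. CREATE/INSERT/SELECT statements. */
  @FunctionalInterface
  interface SqlWork {
    void run(Connection connection) throws SQLException;
  }

  /**
   * Runs the work, commits on success, rolls back on failure and always closes,
   * so no open transaction (and none of its locks) outlives the test.
   */
  static void runAndRelease(Connection connection, SqlWork work) throws SQLException {
    try {
      work.run(connection);
      connection.commit();
    } catch (SQLException | RuntimeException ex) {
      connection.rollback(); // end the failed transaction so its locks are released
      throw ex;
    } finally {
      connection.close();
    }
  }
}
```

In a test, `store.getConnection()` would supply the connection and the lambda would hold the statements that previously ran before `connection = null`.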

Patch comes in a minute. I'm just running the rest of the test suite ...
Comment 5 Stefan Winkler CLA 2010-08-18 09:14:02 EDT
Created attachment 176888
patch

Corrects the connection leak in the Net4jDBTests.
Comment 6 Stefan Winkler CLA 2010-08-18 09:18:12 EDT
... note that this just solves the locking issue.

Derby has similar issues to MySQL and PostgreSQL (see Bug 322972 and Bug 323006). The problem is that these tests have been introduced mainly as a basis for the raw offline replication which (AFAIK) only uses H2. Since the mapping of Java types to DB types is different for each DBMS, we should have a look at the Net4j backends and look into correcting the type mappings. I doubt, however, that we can correct all of them so maybe we end up with disabling some of the failing tests for some of the databases ...
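To illustrate what adapter-specific type mappings can look like, here is a hypothetical sketch using only `java.sql.Types`: a shared default table plus per-DBMS overrides. It does not reflect the actual Net4j DBAdapter API. The Derby override shows one real difference of this kind: Derby has no TINYINT column type, so byte values need a wider type such as SMALLINT.

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: shared Java-to-SQL type defaults plus per-DBMS overrides. */
final class TypeMappings {
  private TypeMappings() {}

  /** Illustrative subset of a default mapping shared by all adapters. */
  static Map<Class<?>, Integer> defaults() {
    Map<Class<?>, Integer> mapping = new HashMap<>();
    mapping.put(Byte.class, Types.TINYINT);
    mapping.put(Short.class, Types.SMALLINT);
    mapping.put(Integer.class, Types.INTEGER);
    mapping.put(Long.class, Types.BIGINT);
    mapping.put(String.class, Types.VARCHAR);
    return mapping;
  }

  /** Derby lacks TINYINT, so a Derby adapter would widen byte values to SMALLINT. */
  static Map<Class<?>, Integer> forDerby() {
    Map<Class<?>, Integer> mapping = defaults();
    mapping.put(Byte.class, Types.SMALLINT);
    return mapping;
  }
}
```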
Comment 7 Caspar D. CLA 2010-08-19 05:43:27 EDT
(In reply to comment #6)

Stefan, thanks for the fix.

> The problem is that these tests have been introduced mainly as a basis
> for the raw offline replication 

If that's the case, then I think we should have something like
DBConfigs.supportsRawOfflineReplication(), analogous to 
hasAuditSupport() and hasBranchingSupport().
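A sketch of how such a capability flag could drive suite composition, so that unsupported tests are left out of a DBMS's suite instead of being silently skipped inside the test class. All type names are hypothetical, as are the test names other than ComplexTests and Net4jDBTest (which appear in this bug); none of this is the actual CDO test framework API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical capability flags a DB test configuration could expose. */
interface DBConfigCapabilities {
  boolean hasAuditSupport();

  boolean hasBranchingSupport();

  /** Proposed flag; defaults to false so each configuration must opt in explicitly. */
  default boolean supportsRawOfflineReplication() {
    return false;
  }
}

/** Hypothetical suite builder: unsupported tests never enter the suite. */
final class SuiteComposer {
  private SuiteComposer() {}

  static List<String> testsFor(DBConfigCapabilities config) {
    List<String> tests = new ArrayList<>();
    tests.add("ComplexTests"); // baseline, runs on every DBMS
    if (config.hasAuditSupport()) tests.add("AuditTest"); // placeholder name
    if (config.hasBranchingSupport()) tests.add("BranchingTest"); // placeholder name
    if (config.supportsRawOfflineReplication()) tests.add("Net4jDBTest");
    return tests;
  }
}
```

The design choice is that a missing capability is visible in the suite definition itself, rather than hidden behind an early return in a test method that then reports as "passed".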

> which (AFAIK) only uses H2.

We have a feature that works with only one specific DBMS? Why would other
DBMSs not be able to support it? Eike?

> maybe we end up with disabling some of the failing tests for 
> some of the databases ...

I think including tests in the general test suites and then skipping them
in the individual test classes is a dubious strategy -- one that we've unfortunately already embarked on, sometimes with comments such as 
"PSQL fails, too - need to investigate" (DBAnnotationsTest.java:74). Of
course we forget about this stuff, which makes some tests look like they 
"pass" while in fact they're skipped.

As for fixing the errors: what I don't get is, why do the respective
DBAdapters work fine for all other tests, but have type and
case-sensitivity problems for these new tests?
Comment 8 Caspar D. CLA 2010-08-20 05:30:49 EDT
As for the original problem: the freezing no longer occurs,
but on my platform the Derby tests are extreeeeemely slow, and regularly
encounter "java.lang.OutOfMemoryError: Java heap space", even 
with -Xmx1024m specified in the run config. Is there another resource
leak somewhere?

And can someone comment how long the full Derby audit test suite (888
tests) takes on Windows? On my Linux setup it's on the order of hours.
Comment 9 Stefan Winkler CLA 2010-08-20 08:42:50 EDT
(In reply to comment #8)
> As for the original problem: the freezing no longer occurs,
> but on my platform the Derby tests are extreeeeemely slow, and regularly
> encounter "java.lang.OutOfMemoryError: Java heap space", even 
> with -Xmx1024m specified in the run config. Is there another resource
> leak somewhere?

I'm running with -Xmx1536m and it works.
 
> And can someone comment how long the full Derby audit test suite (888
> tests) takes on Windows? On my Linux setup it's on the order of hours.

I can confirm that I'm about halfway through the tests after 2.5 hours. But as far as I can remember, Derby was always slow. I think the only reason why we included support for Derby in the first place was the fact that it was the only in-memory DB which was supported out of the box in Eclipse - so it was possible to run the examples directly after getting CDO from p2. 



Eike, can you comment on comment #6 and comment #7?
Is the raw replication feature (and therefore the Net4jDBTests) something that has to be tested and fixed for every database? Or is raw replication H2-only?
Comment 10 Caspar D. CLA 2010-08-23 07:14:23 EDT
(In reply to comment #9)

> I'm running with -Xmx1536m and it works.

Noted. But this level of memory consumption doesn't seem right to me. All
objects used in a specific test should be GC'able after that test, so how
is it that the memory requirements build up to exceed a gigabyte?
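One way to check that reasoning is to record used heap at the end of each test's tearDown and see whether it returns to a baseline or keeps climbing. A minimal sketch with a hypothetical `HeapTrend` helper; `System.gc()` is only a request to the JVM, so the samples are approximate.

```java
/** Hypothetical diagnostic: sample heap usage after each test to spot build-up across tests. */
final class HeapTrend {
  private HeapTrend() {}

  /** Used heap in bytes after requesting a collection (the request may be ignored). */
  static long usedHeapAfterGc() {
    System.gc();
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /**
   * True if every sample exceeds the previous one by at least the tolerance, i.e. memory
   * keeps growing from test to test instead of returning to a baseline.
   */
  static boolean growsSteadily(long[] samples, long toleranceBytes) {
    if (samples.length < 2) return false;
    for (int i = 1; i < samples.length; i++) {
      if (samples[i] < samples[i - 1] + toleranceBytes) return false;
    }
    return true;
  }
}
```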

> I can confirm that I'm about halfway through the tests after 2.5 hours. But as
> far as I can remember, Derby was always slow.

There's slow and there's unbearably slow. In 2.0 the ~500 Derby tests took 
about half an hour on my machine. In 3.0 we have ~900 tests but they
seem to take an order of magnitude more time.
Comment 11 Stefan Winkler CLA 2010-08-24 01:52:23 EDT
(In reply to comment #10)

> There's slow and there's unbearably slow. In 2.0 the ~500 Derby tests took 
> about half an hour on my machine. In 3.0 we have ~900 tests but they
> seem to take an order of magnitude more time.

Ok, I'll try to do some TPTP profiling. I'll try to get a CDO 2.0 workspace set up again to check with 2.0...
Comment 12 Eike Stepper CLA 2010-08-24 14:03:47 EDT
Created attachment 177353
Patch v2
Comment 13 Stefan Winkler CLA 2010-08-24 15:50:19 EDT
(In reply to comment #11)
> Ok, I'll try to do some TPTP profiling. I'll try to get a CDO 2.0 workspace set
> up again to check with 2.0...

Ok, I've run a rather simple test: I modified the Derby test suite to run only the ComplexTests.class test cases. These run in about 980 seconds on my Linux machine with both 2.0 and HEAD, so there seems to be no general performance decrease. My guess is that the additional tests are just more demanding and perhaps also require more memory -- maybe in the Derby driver. And maybe these additional memory requirements also cause a loss in performance somewhere...

Or do you have a concrete example of a testcase which runs significantly slower with HEAD than with 2.0?
Comment 14 Stefan Winkler CLA 2010-08-25 05:37:30 EDT
Created attachment 177399
Test Runtime Analysis

Ok, just to be sure, I ran both test suites overnight, exported the JUnit logs and compared them. The first column lists the TestCase names, the second the runtime with 2.0 (maintenance branch), and the last the runtime with HEAD. As you can see, most TestCases present in both branches have comparable runtimes (the remaining differences may be statistical noise or changes in the test case code).
The report shows that some of the newly added test cases are what really cause the longer runtime. I don't know what the problem is there. But if you (or someone else) think this is a Derby performance issue that should be addressed, please file a new bug and add a pointer to this one.
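The comparison described above can be sketched as follows, assuming each run's JUnit log has already been reduced to a map from test name to runtime in seconds (the log parsing is left out). `RuntimeComparison` is a hypothetical helper, not what produced the attached analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of two JUnit runs, each reduced to test name -> seconds. */
final class RuntimeComparison {
  private RuntimeComparison() {}

  /** Tests that exist only in the newer run, slowest first. */
  static List<String> newTestsBySlowest(Map<String, Double> oldRun, Map<String, Double> newRun) {
    List<String> added = new ArrayList<>();
    for (String name : newRun.keySet()) {
      if (!oldRun.containsKey(name)) added.add(name);
    }
    added.sort((a, b) -> Double.compare(newRun.get(b), newRun.get(a)));
    return added;
  }

  /** Total runtime of the tests present in both runs, as {old, new} seconds. */
  static double[] sharedTotals(Map<String, Double> oldRun, Map<String, Double> newRun) {
    double oldTotal = 0;
    double newTotal = 0;
    for (Map.Entry<String, Double> entry : oldRun.entrySet()) {
      Double now = newRun.get(entry.getKey());
      if (now != null) { // only tests present in both runs are comparable
        oldTotal += entry.getValue();
        newTotal += now;
      }
    }
    return new double[] {oldTotal, newTotal};
  }
}
```

If the shared totals are close while the list of new tests is dominated by a few long runners, the slowdown comes from the added tests rather than a general regression.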
Comment 15 Stefan Winkler CLA 2010-08-25 07:05:08 EDT
Committed to HEAD.
Comment 16 Caspar D. CLA 2010-09-02 00:55:15 EDT
Stefan,

Thanks very much for your detailed analysis, I really appreciate
it.

As you noted, your results make it clear that the
slow execution time of the Derby tests is attributable to the
new tests. In particular, SetFeatureTest, Net4jDBTest, and 
DBStoreTest seem to be very time-consuming. I'll take a closer 
look at those myself to see if there's any way those can be 
made to execute a bit faster.
Comment 17 Eike Stepper CLA 2011-06-23 03:37:17 EDT
Available in R20110608-1407