Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 318998

Summary: [DB] Support queryXRefs()
Product: [Modeling] EMF Reporter: Eike Stepper <stepper>
Component: cdo.dbAssignee: Stefan Winkler <stefan>
Status: CLOSED FIXED QA Contact: Eike Stepper <stepper>
Severity: enhancement    
Priority: P3 CC: erwin, saulius.tvarijonas, vroldanbet
Version: 4.0Keywords: noteworthy
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard: Power to the People
Bug Depends on:    
Bug Blocks: 300149    
Attachments:
Description Flags
draft
none
patch-v1
none
Patch v2 - ready to be committed
none
Excludes derived references
stepper: iplog+
Fixes xref sql query.
stepper: iplog+
Testcase for queryXRefs
stepper: iplog+
Patch combined - for future reference
none
Patch v4 - test 320690 is failing
none
Patch v5 - for future reference none

Description Eike Stepper CLA 2010-07-06 08:20:24 EDT
See bug 300149
Comment 1 Stefan Winkler CLA 2010-07-11 11:42:02 EDT
Created attachment 173979 [details]
draft

first draft of infrastructure - not finished yet

I came across two issues I need clarified first.
a.) what about feature-maps? any idea how XRefs and featuremaps could be interrelated? (I suppose, currently we don't support that, but I'm uncertain whether feature-maps could include potential XRefs)

b.) I'm not sure how to formulate the SQL Query:

Approach 1: make one SQL Query for each individual target ID
"SELECT cdo_id from foobar where foobar.ref = 123"
"SELECT cdo_id from foobar where foobar.ref = 456"

Pro: Simple
Con: Slow

Slow because I query objects which simply cannot be in that field, because they do not conform to the feature specification (e.g., foobar.ref may in no case reference OID456 because it's of another type than OID123).
-> Optimization A: query eClass of each target ID and only query those target IDs which could match

Con: Has to be done for each candidate source reference


Approach 2: make one SQL Query for all target IDs
"SELECT cdo_id from foobar where foobar.ref in (123,456)"
Pro: only one SQL statement gets issued
Con: does not scale - there's a limit to the length of SQL statements. If there's a large amount of target IDs, the SQL might get too long.

-> Optimization A could be applied here as well, so that the candidate list gets shorter.
Con: Still does not scale ultimately


Approach 3: make several SQL queries of type 2 but regard the max length of SQL statements
"SELECT cdo_id from foobar where foobar.ref in (123,...,...,..... < awful lot of stuff here >)"
"SELECT cdo_id from foobar where foobar.ref in (456,...,...,..... < more stuff here >)"

Pro: more scalable than alternative 2 and faster than alternative 1
Con: complex, more implementation effort

Eike, do you have another idea, maybe? Or do you prefer the simple solution (1), perhaps with optimization A?
Comment 2 Eike Stepper CLA 2010-07-12 04:11:39 EDT
(In reply to comment #1)
> Approach 2: make one SQL Query for all target IDs
> "SELECT cdo_id from foobar where foobar.ref in (123,456)"
> Pro: only one SQL statement gets issued
> Con: does not scale - there's a limit to the length of SQL statements. If
> there's a large amount of target IDs, the SQL might get too long.

That's what I had in mind.

> -> Optimization A could be applied here as well, so that the candidate list
> gets shorter.

What was Optimization A?

> Con: Still does not scale ultimately
> 
> 
> Approach 3: make several SQL queries of type 2 but regard the max length of SQL
> statements
> "SELECT cdo_id from foobar where foobar.ref in (123,...,...,..... < awful lot
> of stuff here >)"
> "SELECT cdo_id from foobar where foobar.ref in (456,...,...,..... < more stuff
> here >)"

This is something we can consider if/when someone complains. I expect the target set not to get too large in typical scenarios.
Comment 3 Eike Stepper CLA 2010-07-12 04:12:56 EDT
(In reply to comment #2)
> What was Optimization A?

Oh, I found it! I don't think that we need that right now.
Comment 4 Stefan Winkler CLA 2010-07-13 08:23:02 EDT
Created attachment 174133 [details]
patch-v1

Ok, implemented as agreed.

The implementation passes the one XRef test currently in existence.
Needs to be tested further. (e.g. the case of isMany() source references is not tested at all, currently).
Comment 5 Eike Stepper CLA 2010-07-13 11:48:06 EDT
Created attachment 174168 [details]
Patch v2 - ready to be committed
Comment 6 Stefan Winkler CLA 2010-07-14 10:30:24 EDT
Committed to HEAD
Comment 7 Erwin Betschart CLA 2010-07-21 04:55:40 EDT
Created attachment 174821 [details]
Excludes derived references
Comment 8 Erwin Betschart CLA 2010-07-21 04:57:21 EDT
Please reopen this bug.

Attached a patch which excludes derived features from the xref check.

I confirm to
1) The number of lines that you changed is smaller than 250.
2) You are the only author of these changed lines.
3) You apply the EPL to these changed lines.

for attachment 174821 [details].
Comment 9 Erwin Betschart CLA 2010-07-21 05:22:54 EDT
Created attachment 174824 [details]
Fixes xref sql query.

AbstractListTableMapping.queryXRefs(...) causes some problems:

Original sql query:
SELECT l_t.cdo_source, l_t.cdo_value, l_t.cdo_idx FROM base_whitepage_LayerCondition_layers_listl_t, base_whitepage_LayerConditiona_t WHERE  AND a_t.cdo_id=l_t.cdo_source AND a_t.cdo_version=l_t.cdo_version AND a_t.cdo_branch=l_t.cdo_branch AND cdo_branch=0 AND (cdo_revised=0) AND cdo_value IN (4572)

... FROM base_whitepage_LayerCondition_layers_listl_t ...
should be 
... FROM base_whitepage_LayerCondition_layers_list AS l_t ...

Query is then:
SELECT l_t.cdo_source, l_t.cdo_value, l_t.cdo_idx FROM base_whitepage_LayerCondition_layers_list AS l_t, base_whitepage_LayerCondition AS a_t WHERE  AND a_t.cdo_id=l_t.cdo_source AND a_t.cdo_version=l_t.cdo_version AND a_t.cdo_branch=l_t.cdo_branch AND cdo_branch=0 AND (cdo_revised=0) AND cdo_value IN (4572)

... WHERE  AND a_t.cdo...
should be 
... WHERE some clause not AND
is due to listJoin starting with an AND. Therefore changed order of mainTableWhere and listJoin.

Resulting query:
SELECT l_t.cdo_source, l_t.cdo_value, l_t.cdo_idx FROM base_whitepage_LayerCondition_layers_list AS l_t, base_whitepage_LayerCondition AS a_t WHERE cdo_branch=0 AND (cdo_revised=0) AND a_t.cdo_id=l_t.cdo_source AND a_t.cdo_version=l_t.cdo_version AND a_t.cdo_branch=l_t.cdo_branch AND cdo_value IN (4572)

This causes:
org.h2.jdbc.JdbcSQLException: Ambiguous column name CDO_BRANCH; SQL statement:

Changing:
builder.append(mainTableWhere);
to:
builder.append("a_t." + mainTableWhere);
gives working query:
SELECT l_t.cdo_source, l_t.cdo_value, l_t.cdo_idx FROM base_whitepage_LayerCondition_layers_list AS l_t, base_whitepage_LayerCondition AS a_t WHERE a_t.cdo_branch=0 AND (cdo_revised=0) AND a_t.cdo_id=l_t.cdo_source AND a_t.cdo_version=l_t.cdo_version AND a_t.cdo_branch=l_t.cdo_branch AND cdo_value IN (4572)
Comment 10 Erwin Betschart CLA 2010-07-21 05:24:01 EDT
Attached a patch which fixes the sql query in queryXRefs().

I confirm to
1) The number of lines that you changed is smaller than 250.
2) You are the only author of these changed lines.
3) You apply the EPL to these changed lines.

for attachment 174824 [details].
Comment 11 Stefan Winkler CLA 2010-07-21 05:47:05 EDT
Reopened.

Re: Derived References

I think the better way would be to exclude derived references from the calculation of the candidates in the first place. Else we might get inconsistent behaviour in between different store implementations.


Re: SQL
you are right, this is an error. A testcase would be nice, because apparently this error is in there because the case of isMany()-Xref-candidates is not tested.
Comment 12 Erwin Betschart CLA 2010-07-23 02:50:14 EDT
I'll try to provide a testcase may take some time...

I talked with Eike about the problem with derived / transient references and he said that they should be excluded from the given context in which the search for cross references is done.
Comment 13 Erwin Betschart CLA 2010-07-23 11:06:53 EDT
Created attachment 175075 [details]
Testcase for queryXRefs
Comment 14 Erwin Betschart CLA 2010-07-23 11:08:13 EDT
I confirm to
1) The number of lines that you changed is smaller than 250.
2) You are the only author of these changed lines.
3) You apply the EPL to these changed lines.

for attachment 175075 [details].



Btw.: isMany seems to work after sql & transient fixes...
Comment 15 Eike Stepper CLA 2010-07-24 08:45:38 EDT
Created attachment 175140 [details]
Patch combined - for future reference

I've fixed the non-persistent references issue directly in XRefsQueryHandler.
Comment 16 Eike Stepper CLA 2010-07-24 09:30:51 EDT
Created attachment 175147 [details]
Patch v4 - test 320690 is failing

Test 320690 is failing, but also with MEMStore. Investigating...
Comment 17 Eike Stepper CLA 2010-07-25 05:11:27 EDT
Stefan, I've found and fixed twoi issues with scalar xRefs:

1) Detached revisions must be excluded:
      builder.append(CDODBSchema.ATTRIBUTES_VERSION);
      builder.append(">0 AND ");
      
Stefan, does this have to be considered for many refs, too?

2) The idString must be respected:
      builder.append(" AND ");
      builder.append(valueField);
      builder.append(" IN ");
      builder.append(idString);
      
I've added extra checks in QueryXRefsContext.addXRef() to exclude NULL targets, but stores should optimize them away (DBStore does so now).

---

Another possible issue could be that both locking and xrefCheck can take a considerable amount of time. Should we consider async progress monitoring so that big commits do not time out?

---

Committed to HEAD
Comment 18 Eike Stepper CLA 2010-07-25 05:12:01 EDT
Created attachment 175171 [details]
Patch v5 - for future reference
Comment 19 Eike Stepper CLA 2011-06-23 03:39:35 EDT
Available in R20110608-1407