Bug 329723 - Common Query Language as EMF Query2
Summary: Common Query Language as EMF Query2
Status: NEW
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core
Version: 4.13
Hardware: PC Windows Vista
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-08 23:13 EST by saurav sarkar CLA
Modified: 2020-12-11 10:38 EST (History)
CC List: 9 users

See Also:


Attachments
Junit test cases for EMF Query integration (16.18 KB, application/zip)
2010-11-08 23:13 EST, saurav sarkar CLA

Description saurav sarkar CLA 2010-11-08 23:13:09 EST
Created attachment 182676 [details]
Junit test cases for EMF Query integration

Hi All,

In Conjunction with the bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=256931
Opening this enhancement request.

CDO has a query interface through which any backend query can be executed from the client, but those queries are backend specific. A common query language is needed so that generic queries can be executed from the client. OCL is already in place as a server-side query language. We propose adding EMF Query2 as another option for the backend query language, so that users can express their queries in a textual syntax. For this we need to analyze the requirements and check the feasibility and the time span of the task.

Let's start our discussion about the EMF Query2 integration with CDO here. I have written a test case where EMF Query2 based queries are fired from the client and handled on the server by a class called EMFQueryHandler. The query handler and the test follow the same lines as the OCL tests and the OCL query handler. The queries in the test case are written in an LPG grammar based syntax.

Next steps on this topic:
1. Check the file-based store implementation for CDO and the execution of EMF Query on that store.
2. Improve the test cases accordingly.
3. Analyze the feasibility of the extensibility aspect, so that EMF Query can be mapped to, or understood by, other backend query languages.

How to run the test cases:

1. Unzip the file attached.
2. Put the org.eclipse.emf.cdo.server.emfquery plug-in in your workspace.
3. The 'Files_In_Tests_Plugin' folder contains the files to be put in the org.eclipse.emf.cdo.tests plug-in: the EMF Query test file and the changed manifest.mf, which now has a dependency on EMF Query2 and the CDO server EMF Query plug-in.

Let me know if any issues come up while running the tests.

Please provide your comments/suggestions on this topic.

Thanks and Regards,
Saurav
Comment 1 Victor Roldan Betancort CLA 2011-02-21 10:22:38 EST
Hi Saurav,

I've been trying to run the code provided. I found some issues I would like to discuss.

I adapted the code to the latest state of the Query2 HEAD code. Some plug-in names seem to have changed.

EMFQueryHandler seems to assume that a JVM listener exists so that it can open a session and a view on the server side. This is not necessary, as the server is already aware of the CDOView concept. So the "repository" argument is no longer necessary.

The hardest part was getting an instance of class "Index". IndexFactory seems to assume the code is running in an OSGi container, whereas CDO can run standalone. In such an environment, IndexFactory fails to get the bundle Activator, which is used merely to get a path to a folder where the index file is stored.

Running in an OSGi environment doesn't seem to be a requirement for Query2, as I could run the query properly by creating my own instance of PageableIndexImpl:

String absolutePath = File.createTempFile("emf_query2", ".index").getAbsolutePath();
Index index = new PageableIndexImpl(new PageableIndexImpl.Options(absolutePath, PageableIndexImpl.Options.DISABLED, PageableIndexImpl.Options.DISABLED));
return QueryProcessorFactory.getDefault().createQueryProcessor(index);

However, I had to modify org.eclipse.emf.query2.index so it exports org.eclipse.emf.query.index.internal.impl. I had no other choice, since IndexFactory was completely unusable in a standalone scenario. On the other hand, not exporting the internal packages made things difficult. We usually export internal packages and mark them as internal with x-internal:=true in MANIFEST.MF. I won't discuss here which option is better, but as you see, there is no room for others to play with, extend, or refactor the framework without internal exports ;)

So it looks like IndexFactory could be improved to work in standalone mode. For instance, the caller could provide the path to the index file. Another option is to make the factory check whether OSGi is running.
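
A minimal sketch of both options, assuming a hypothetical helper class IndexLocationSketch that is not part of Query2. The osgiStateDir parameter is an assumed stand-in for the bundle state location, and checking for org.osgi.framework.FrameworkUtil on the classpath is only an approximation of "OSGi is running":

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the Query2 IndexFactory API.
final class IndexLocationSketch {
    private IndexLocationSketch() {
    }

    // Returns true when the OSGi framework classes can be found.
    // Assumption: class presence is a good enough signal for this sketch;
    // a real check would also confirm that the bundle has been started.
    static boolean isOsgiAvailable() {
        try {
            Class.forName("org.osgi.framework.FrameworkUtil", false,
                    IndexLocationSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Picks the index file. A caller-supplied path wins; otherwise the
    // (assumed) OSGi state directory is used when available; otherwise
    // a temporary file is created, as in the workaround above.
    static File resolveIndexFile(String callerPath, File osgiStateDir) {
        if (callerPath != null && !callerPath.isEmpty()) {
            return new File(callerPath);
        }
        if (isOsgiAvailable() && osgiStateDir != null) {
            return new File(osgiStateDir, "emf_query2.index");
        }
        try {
            return File.createTempFile("emf_query2", ".index");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, a standalone CDO server could call resolveIndexFile(null, null) and still get a usable location without touching the bundle Activator.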

Assuming the hacks I had to do, the test case runs fine. The next step would be to investigate the possible benefits of a CDO-specific Index implementation.

Cheers,
Víctor.
Comment 2 Ashwani Kr Sharma CLA 2011-02-28 04:19:36 EST
Hi,

The IndexFactory was changed a few days ago to check whether OSGi is running and to take that into account when retrieving the index location.
Currently, in non-OSGi cases all indexes are kept in memory and no persistence is possible.

But this only improves things partially. We will enhance it so that a custom location for the indexes can also be specified.

Regards,
Ashwani Kr Sharma
Comment 3 Victor Roldan Betancort CLA 2011-02-28 06:57:20 EST
> The IndexFactory was changed few days back to check if osgi is running and take
> care of it while retrieving location of index.
> Currenlty, in non-osgi cases all index are kept in memory and no persistency is
> possible.

I've checked out the code and it seems to work well now, without NPEs :)
Now I don't need to call internal code ;)

> But this only improves stuff partially. We will enhance to ensure custom
> location of indexes can also be specified. 

Looking forward!
Comment 4 saurav sarkar CLA 2011-03-01 01:08:25 EST
Hi Victor,

Sorry, I could not comment earlier because I was on vacation.
Thanks for testing the code.

Please find my comments below.
(In reply to comment #1)
> Hi Saurav,
> I've been trying to run the code provided. I found some issues I would like to
> discuss.
> I adapted the code to the latest status of Query2 HEAD code. There seem to be
> some plugin names changed.
Yes, some names have changed.

> EMFQueryHandler seems to assume there is a JVM listener to be able to open a
> session and view in the server side. This is not necessary, as the server is
> already aware of CDOView concept. So the "repository" argument is no longer
> necessary.
> The hardest part was getting an instance of class "Index". IndexFactory seems
> to assume the code is running in an OSGi container, whereas CDO can run
> standalone. In such environment, IndexFactory fails to get the bundle
> Activator, which is used merely to get a path to a folder where to store the
> index file.
> Apparently running in an OSGi environment doesnt seem to be a requisite for
> Query2, as I could run the query properly creating my own instance of
> PageableIndexImpl:
> String absolutePath = File.createTempFile("emf_query2",
> ".index").getAbsolutePath();
> Index index = new PageableIndexImpl(new PageableIndexImpl.Options(absolutePath,
> PageableIndexImpl.Options.DISABLED, PageableIndexImpl.Options.DISABLED));
>             return
> QueryProcessorFactory.getDefault().createQueryProcessor(index);
> However, I had to modify org.eclipse.emf.query2.index so it exports
> org.eclipse.emf.query.index.internal.impl. I had no other chance, since
> IndexFactory was completely unusable in a standalone scenario. In the other
> hand, making internal packages not exported made things difficult. We usually
> export internal packages and mark them as internal with x-internal:=true in
> MANIFEST.MF. I wont discuss here which option is better, but as you see, there
> is no room for others to play / extend / refactor the framework without
> internal exports ;)
> So it looks like IndexFactory could be improved to work in standalone mode. For
> instance, the caller could decide provide the path to the index file. Another
> option is making the factory to check wheter OSGi is running.

As Ashwani mentioned above, the check for OSGi has been added. We will also enhance the IndexFactory to support persistence.
I have already opened a Bugzilla entry: https://bugs.eclipse.org/bugs/show_bug.cgi?id=338369

> Asumming those hacks I had to do, the test-case runs fine. The next step would
> be investigating on the possible benefits of a CDO-specific Index
> implementation.
> Cheers,
> Víctor.

There were a couple of points I had mentioned above.

(a) To have the File based store implementation and then have the indexer working on that store.
(b) Generic indexer on the types of stores and later on customization.
(c) Mapping of EMF Query with other Query languages.

Please provide your comments on the above points, and add your own points if you have any. We will need your CDO expertise and help to drive the topic forward.


One more point I am not clear about is your statement in the forum post, i.e.

'The good thing about CDO is that a Resource doesn't need to be fully 
loaded to access any of its children EObjects,'.

Could you please explain this point.

Cheers and thanks again,
Saurav
Comment 5 Victor Roldan Betancort CLA 2011-03-01 05:49:49 EST
> There were couple of points i had mentioned above.
> 
> (a) To have the File based store implementation and then have the indexer
> working on that store.

There has been a prototype, but it's obsolete now. I can't see the benefits of providing such a file-based IStore. Having the indexer work with it is certainly not a good reason, as I managed to persist an index referencing CDOObjects in a database...

Furthermore, achieving scalability for huge models with a file-based IStore seems non-trivial to me...

> (b) Generic indexer on the types of stores and later on customization.

To achieve good performance, the initial step would probably be to provide an IStore-specific indexer. An IStore-agnostic indexer would be desirable, but most probably not as fast as the IStore-specific variant.

> (c) Mapping of EMF Query with other Query languages.

This doesn't sound like a feature in which our CDO knowledge would help, huh? ;)

Having abstract notations (like an EMF model) of query languages and performing a model transformation from the source language to the Query2 abstract notation would probably be the way to go ;)

If you mean creating an IStore-specific interpreter of the EMF Query language, then I would agree :P

> One more point i am not clear is your point on the forum post i.e.
> 
> 'The good thing about CDO is that a Resource doesn't need to be fully 
> loaded to access any of its children EObjects,'.
> 
> Could you please explain this point.

Each CDOObject is identified in the framework by a CDOID. You can have random access to any CDOObject by just asking the framework "give me the object with this identifier" using, for instance, CDOView.getObject(CDOID id);

I don't know the internals of the current implementation of the Indexer. Maybe if you could provide more detail on how it works and its goals, I could have a better picture of how such indexer would make sense in a CDO environment.

Correct me if I'm wrong, but from what I know its goal is to avoid loading entire Resources. So when a query is executed, the index helps optimize it by loading only the resources that may contain matches. I guess it's probably more than that...

So "avoid loading a whole resource" is not really a problem in CDO, but a CDO-based indexer implementation may still help with query optimization. Such an index would persist references to objects by their CDOID, and would have random access to those EObjects without loading the resource.

So what I meant is that XMIResources need to be fully loaded to obtain access to their children. CDO gives you EObject access without loading its container tree or Resource.
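
To illustrate the difference in access pattern only, here is a rough sketch. All class names are hypothetical and neither class is a CDO or EMF type; the counters just make visible how many objects each lookup has to materialize:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these classes only illustrate the access pattern.
final class ObjectAccessSketch {
    private ObjectAccessSketch() {
    }

    // File-style resource: a child is reachable only after the whole
    // content has been loaded.
    static final class WholeFileResource {
        private final List<String> serialized;
        int objectsMaterialized = 0;

        WholeFileResource(List<String> serialized) {
            this.serialized = serialized;
        }

        String find(String name) {
            List<String> loaded = new ArrayList<>();
            for (String object : serialized) {
                loaded.add(object); // every object is loaded, wanted or not
                objectsMaterialized++;
            }
            return loaded.contains(name) ? name : null;
        }
    }

    // Repository-style access: every object has an ID and can be fetched
    // on its own, without touching its siblings or container.
    static final class IdAddressedStore {
        private final Map<Long, String> objectsById = new HashMap<>();
        int objectsMaterialized = 0;

        void put(long id, String object) {
            objectsById.put(id, object);
        }

        String getObject(long id) {
            String object = objectsById.get(id);
            if (object != null) {
                objectsMaterialized++;
            }
            return object;
        }
    }
}
```

An index that stores IDs fits the second style: a hit in the index can be resolved with a single getObject-like call.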

Cheers,
Víctor.
Comment 6 saurav sarkar CLA 2011-03-15 13:03:58 EDT
Hi Victor,

Please find the comments below

(In reply to comment #5)
> > There were couple of points i had mentioned above.
> > 
> > (a) To have the File based store implementation and then have the indexer
> > working on that store.
> 
> There has been a prototype, but its obsolete now. I can't seem to understand
> the benefits of providing such file-based IStore. Having the indexer working
> with it is certainly not a good reason, as I managed to persist an index
> referencing CDOObjects in a database...
> 
> Furthermore, achieving scalability of huge models with a file-based IStore
> seems not trivial to me...
> 

You are a better judge of this than I am :)

> > (b) Generic indexer on the types of stores and later on customization.
> 
> To achieve good performance, the initial step would be probably to provide an
> IStore-specific Indexer. A IStore-agnostic Indexer would be desirable but most
> probable not really as fast as the IStore-specific variant.
> 

Yes, I agree. We can start with an IStore-specific version of the indexer. Which store do you think would be a good starting point/POC for the indexer? I suggest starting with a DB store.

> > (c) Mapping of EMF Query with other Query languages.
> 
> This doesn't sound like a feature in which our CDO knowledge would help, huh?
> ;)
> 
> Having abstract notations (like an emf model) of query languages and performing
> model transformation from source language to query2 abstract notation would be
> probably the way to go ;)
> 
Do you mean that each and every query language, e.g. SQL, should have an abstract notation which is then transformed?

> If you mean creating an IStore-specific interpreter of EMF Query language, the
> I would agree :P
>
Yes, I meant that.
 
> > One more point i am not clear is your point on the forum post i.e.
> > 
> > 'The good thing about CDO is that a Resource doesn't need to be fully 
> > loaded to access any of its children EObjects,'.
> > 
> > Could you please explain this point.
> 
> Each CDOObject is identified in the framework by a CDOID. You can have random
> access to any CDOObject by just asking the framework "give me the object with
> this identifier" using, for instance, CDOView.getObject(CDOID id);
> 
> I don't know the internals of the current implementation of the Indexer. Maybe
> if you could provide more detail on how it works and its goals, I could have a
> better picture of how such indexer would make sense in a CDO environment.
> 
> Correct me if wrong, but from what I know its goal is trying to avoid loading
> entire Resources. So if a query is executed, it will help optimizing it by
> loading only resources subject to contain matches. I guess its probably more
> than that...
> 
The indexer contains information about the resources, the EObjects, and the links between them (both forward and backward). All this information can be retrieved without actually loading the resources. The URI and a 'name' attribute are indexed. But if you want more attributes of a model element, the indexer loads the corresponding resource.
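
A rough sketch of that kind of index, assuming a hypothetical class LinkIndexSketch with made-up method names (this is not the Query2 index API). It keeps the name per object URI plus forward and backward links, so those lookups never need the resource itself:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the index described above.
final class LinkIndexSketch {
    // Indexed 'name' attribute, keyed by object URI.
    private final Map<String, String> nameByUri = new HashMap<>();
    // Forward links: source URI -> target URIs.
    private final Map<String, Set<String>> outgoing = new HashMap<>();
    // Backward links: target URI -> source URIs.
    private final Map<String, Set<String>> incoming = new HashMap<>();

    void addObject(String uri, String name) {
        nameByUri.put(uri, name);
    }

    // Records one link in both directions so either end can be queried.
    void addLink(String sourceUri, String targetUri) {
        outgoing.computeIfAbsent(sourceUri, k -> new LinkedHashSet<>()).add(targetUri);
        incoming.computeIfAbsent(targetUri, k -> new LinkedHashSet<>()).add(sourceUri);
    }

    String nameOf(String uri) {
        return nameByUri.get(uri);
    }

    Set<String> targetsOf(String uri) {
        return outgoing.getOrDefault(uri, Collections.emptySet());
    }

    Set<String> sourcesOf(String uri) {
        return incoming.getOrDefault(uri, Collections.emptySet());
    }
}
```

Any attribute other than the name is absent from such an index, which is why a query on it would still force the owning resource to be loaded.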
> So the "avoid loading a whole resource" is not really a problem in CDO, but
> still, a CDO-based indexer implementation may still help on query optimization.
> Such index would persist references to objects by its CDOID, and would have
> random access to those EObjects without loading the resource.
> 
That would mean even viewing attributes won't require loading the resource?


> So what I meant is that XMIResources are needed to be fully loaded to obtain
> access to its children. CDO provides you EObject access without loading its
> container tree or Resource.
> 
> Cheers,
> Víctor.
Things look interesting. I think we can proceed once we decide on the following points.

(a) Type of store and the query language.
(b) EMF model of the query language for that store.
(c) Model transformation language for the transformation.
(d) Changes required to, or API to be opened by, the Query indexer.

Let me know if this looks fine to you. Meanwhile I will explore more of the CDO server code.

cheers,
Saurav
Comment 7 Eike Stepper CLA 2011-03-15 13:17:12 EDT
(In reply to comment #6)
> [...] Yes I agree.We can start with a IStore specific version of indexer.Which store
> would you see can be a good starting point/POC for the indexer? I suggest
> starting with a DB store.

I strongly suggest that you first try to build a backend type agnostic indexer with the generic IStore/IStoreAccessor API. Only if that turns out to be a performance bottleneck (which I don't expect) would I start to do backend specific tricks. I'd even be willing to extend the generic interfaces to better support indexing should it be necessary. But this will only show up while actually trying it ;-)
Comment 8 Eike Stepper CLA 2011-06-23 04:30:22 EDT
Moving all open enhancement requests to 4.1
Comment 9 saurav sarkar CLA 2011-07-07 03:55:17 EDT
Hi All,

Sorry for the delay on this topic.

We are building a storage processor framework in Query2 which deals with heterogeneous backends. The storage processor framework exposes APIs for query mapping, result set conversion, etc. A CDO-based backend becomes one use case, so we have made a CDO Processor which contributes to the storage processor framework. The CDO Processor does not use any indexing mechanism and depends entirely on how the backend performs.

Two ways to approach this topic have been considered.

Approach a (Query core and CDO Processor on the client side)
• Start the CDO server with the Hibernate store.
• Fire an EMF Query from the client.
• The EMF Query is handled by the Query core and eventually passed to the CDO Processor. The CDO Processor maps it to an HQL query, then opens the CDO session and fires the store-specific query.
• The query handlers of the specific store execute the query the very same way they do now.
• The results are then mapped back to a Query2 result set.
• The Query2 result set is shown to the user.
• The query mapping and result set mapping have to be store-specific contributions.

Advantages

(a) No need to change the existing store specific query handlers.
(b) Dirty state handling of resources can happen at client side.
(c) Type safety check of queries happens at client side.


Disadvantages

(a) The client has to learn a new way of expressing queries in CDO, i.e. with the Query APIs.

Approach b (Query core and CDO Processor on the server side)

• It follows the patch already attached.
• A server component called EMF Query Handler is established.
• Start the CDO server with the Hibernate store.
• The client fires the query.
• The EMF Query Handler accepts the query and sends the execution call to the Query core.
• The Query core calls the CDO Processor.
• The CDO Processor then opens the Hibernate session, maps the EMF query to HQL, and executes it. This part is currently tightly bound to the Hibernate store. A more generic implementation could be considered here, possibly with plug-in contributions from specific stores.
• Dirty state handling may have to be done through CDOQueryInfo.getChangeSetData().


Advantages

(a) The client does not have to learn a new way of expressing queries.

Disadvantages
(a) Each store has to change how it contributes the handling of query components to the EMF Query Handler.

In both cases, the query mapping and result set mapping contributions have to be made by the specific stores.
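
As a rough sketch of what such a store-specific contribution could look like, assuming hypothetical names throughout (StoreMappingSketch, HqlLikeMappingSketch, and StorageProcessorSketch are illustrative only and are not the Query2 storage processor API). The mapping here handles just a "select all objects of a type" query:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract a store would contribute: one method per direction.
interface StoreMappingSketch {
    // Generic "all objects of this type" query -> backend query text.
    String mapQuery(String typeName);

    // Backend rows -> generic result entries.
    List<String> mapResults(List<Object[]> rows);
}

// Assumed contribution of a Hibernate-style store; only the simplest
// HQL form ("from <Entity>") is produced.
final class HqlLikeMappingSketch implements StoreMappingSketch {
    @Override
    public String mapQuery(String typeName) {
        return "from " + typeName;
    }

    @Override
    public List<String> mapResults(List<Object[]> rows) {
        List<String> result = new ArrayList<>();
        for (Object[] row : rows) {
            result.add(String.valueOf(row[0])); // first column only
        }
        return result;
    }
}

// Looks up the contribution for the store type currently in use.
final class StorageProcessorSketch {
    private final Map<String, StoreMappingSketch> mappings = new HashMap<>();

    void contribute(String storeType, StoreMappingSketch mapping) {
        mappings.put(storeType, mapping);
    }

    StoreMappingSketch mappingFor(String storeType) {
        StoreMappingSketch mapping = mappings.get(storeType);
        if (mapping == null) {
            throw new IllegalStateException("No mapping contributed for store: " + storeType);
        }
        return mapping;
    }
}
```

Whether this lookup runs on the client (approach a) or on the server (approach b) does not change the contract; it only changes which side has to know the store type.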

A small POC has been done for both solutions, with a simple mapping of an EMF Query to an HQL query.

I would like to solicit comments from you all on this topic with the approaches mentioned above.

cheers,
Saurav
Comment 10 Axel Uhl CLA 2011-07-07 15:44:42 EDT
(In reply to comment #9)
> I would like to solicit comments from you all on this topic with the approaches
> mentioned above.

Hi Saurav,

sorry, I don't understand the difference. It sounds as if in both cases the EMF query is translated to a query that is sent to and processed by CDO. What am I missing?
Comment 11 Victor Roldan Betancort CLA 2011-07-08 09:47:35 EDT
Saurav, correct me if I'm wrong, but from the user's point of view I can only see one difference: users either express queries through the CDO API or through the Query2 APIs. Is that correct?

In that case, maybe it would be nice to have both alternatives?
- users tied to pure EMF APIs can benefit from Query2 back-end optimizations through Query2 API
- users tied to CDO APIs can perform queries through CDOView.createQuery(), and still benefit from Query2 back-end optimizations

So users can do the query in either way :)

But I see a more important difference, implementation-wise, in each approach.

In one approach the query translation to HQL is done on the client side, in the other on the server side. Why not always let the CDO Processor redirect the query to the CDO Query API in both scenarios, and always let the server side take care of query translation / interpretation? The problem with your proposed client-side approach is that the client must know which backend is actually being used in order to translate the query. For instance, if you used an OODB backend implementation, you would need to translate Query2 to a different query language instead of HQL, and the client would have to be aware of that! It would be much better to let that be handled on the server side.

But the real highlight here is the new Query2 storage processor, that makes things way easier for different implementations like CDO or Teneo. I like it!

Let me know what you think :)
Comment 12 saurav sarkar CLA 2011-07-11 06:04:13 EDT
Hi Victor,

Many thanks for the comments provided.
Please scroll down for my opinions :)

(In reply to comment #11)
> Saurav, correct me if wrong, but from the user point of view I can only see one
> difference: users either express queries through CDO API or through Query2
> APIs. Is that correct?

Correct
> In such case, maybe it would be nice to have both alternatives?
> - users tied to pure EMF APIs can benefit from Query2 back-end optimizations
> through Query2 API
> - users tied to CDO APIs can perform queries through CDOView.createQuery(), and
> still benefit from Query2 back-end optimizations
> So users can do the query in either way :)
> But I see a more important difference, implementation-wise, in each approach.
> In one approach query translation to HSQL is done at client side, in the other,
> at server side. Why not just always let CDOProcesor redirect the query to CDO
> Query API in both scenarios, and always let server side take care of query
> translation / interpretation? The problem with your proposed approach at
> client-side is that client-side must know which backend is being actually used
> to do the query translation. For instance, if you used OODB backend
> implementation, you'll need to translate Query2 to a different query language
> instead of HSQL, and client should be aware of that! It would be way better if
> we let that being handled at server side.

Correct me if I am wrong, but I thought this mapping can be done on the server side only, through contributions from the specific stores. So whenever the call for the mapping comes, the mapping is provided by the corresponding store.

> But the real highlight here is the new Query2 storage processor, that makes
> things way easier for different implementations like CDO or Teneo. I like it!
> Let me know what you think :)

Yes the storage processor framework can be quite powerful.
Let me know your further comments.

cheers,
Saurav
Comment 13 Victor Roldan Betancort CLA 2011-07-11 06:18:45 EDT
> Correct me if am wrong here that i thought that this mapping can be done on the
> server side only , by the contribution from specific stores.So whenever the
> call for the mapping comes, mapping is provided by the corresponding store.

I'd say it is not impossible to do it on the client side, but I rather meant that it would be much more convenient to do it on the server side, because each IStore implementation would know how to execute certain Query services.

I probably read between the lines; I thought you were suggesting to do it on the client side...

So my opinion is to do that at server side, indeed.

> Yes the storage processor framework can be quite powerful.
> Let me know your further comments.

Yes, I'd do it this way:
- Clients can execute queries either through the CDO API or the Query2 API.
- In either case, CDOProcessor sends the query to the server through the CDO query API. Both the Query core and CDOProcessor stay on the client side.
- The CDO server receives an incoming query. An IStore-agnostic Query2 interpreter executes the query in terms of the CDO server API.

The real work here lies in the interpreter. And unless Query2 provides an immediate mapping to languages like HQL, an IStore-specific query interpreter won't be much easier to implement, I suspect.

In fact, chances are that we need a parser on the server side, unless we provide the actual Query2 language AST (i.e., as an EMF model) in the CDO Query.
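
A very reduced sketch of the idea, with hypothetical names only (StoreAccessorSketch stands in for a generic store access API and SelectNode for a one-node query AST; neither is CDO or Query2 API). The interpreter receives the AST directly, so no parser is involved, and it only talks to the generic accessor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a store-agnostic interpreter over a tiny query AST.
final class AgnosticInterpreterSketch {
    private AgnosticInterpreterSketch() {
    }

    // Stand-in for a generic, backend-independent store access API.
    interface StoreAccessorSketch {
        List<Map<String, String>> readObjects(String typeName);
    }

    // AST node: select objects of a type where attribute == value.
    static final class SelectNode {
        final String typeName;
        final String attribute;
        final String value;

        SelectNode(String typeName, String attribute, String value) {
            this.typeName = typeName;
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Evaluates the node using only the generic accessor, so the same
    // code runs unchanged on top of any store that implements it.
    static List<Map<String, String>> execute(SelectNode query, StoreAccessorSketch accessor) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> object : accessor.readObjects(query.typeName)) {
            if (query.value.equals(object.get(query.attribute))) {
                matches.add(object);
            }
        }
        return matches;
    }
}
```

The filter runs in the interpreter rather than in the backend here, which is exactly the trade-off between an agnostic interpreter and a store-specific one.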

I'm of course open to discussion, this is just a proposal :)
Comment 14 Eike Stepper CLA 2012-08-14 22:53:19 EDT
Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.
Comment 15 Eike Stepper CLA 2013-06-27 04:08:41 EDT
Moving all outstanding enhancements to 4.3
Comment 16 Eike Stepper CLA 2014-08-19 09:28:04 EDT
Moving all open enhancement requests to 4.4
Comment 17 Eike Stepper CLA 2014-08-19 09:37:25 EDT
Moving all open enhancement requests to 4.4
Comment 18 Eike Stepper CLA 2015-07-14 02:14:33 EDT
Moving all open bugzillas to 4.5.
Comment 19 Eike Stepper CLA 2016-07-31 00:57:33 EDT
Moving all unaddressed bugzillas to 4.6.
Comment 20 Eike Stepper CLA 2017-12-28 01:11:35 EST
Moving all open bugs to 4.7
Comment 21 Eike Stepper CLA 2019-11-08 02:14:49 EST
Moving all unresolved issues to version 4.8
Comment 22 Eike Stepper CLA 2019-12-13 12:47:47 EST
Moving all unresolved issues to version 4.9
Comment 23 Eike Stepper CLA 2020-12-11 10:38:36 EST
Moving to 4.13.