Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 202063 - Loading CDORevision chunks instead of CDOID chunks
Summary: Loading CDORevision chunks instead of CDOID chunks
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core (show other bugs)
Version: 1.0   Edit
Hardware: PC Windows XP
: P4 enhancement (vote)
Target Milestone: ---   Edit
Assignee: Eike Stepper CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 202064
  Show dependency tree
 
Reported: 2007-09-03 07:45 EDT by Simon Mc Duff CLA
Modified: 2010-06-29 09:22 EDT (History)
1 user (show)

See Also:


Attachments
Load in chunk - Beta 1 (27.95 KB, patch)
2007-09-03 22:39 EDT, Simon Mc Duff CLA
no flags Details | Diff
Loasd In Chunk (39.52 KB, patch)
2007-09-04 22:05 EDT, Simon Mc Duff CLA
no flags Details | Diff
Load Chunk Beta 3 (28.29 KB, patch)
2007-09-05 08:38 EDT, Simon Mc Duff CLA
no flags Details | Diff
Load Chunk beta 4 (29.08 KB, patch)
2007-09-05 22:05 EDT, Simon Mc Duff CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Mc Duff CLA 2007-09-03 07:45:14 EDT
You know ... that you did allowed to the the following test.

I created a LoadChunkRevisionRequest
It takes in parameters a list of CDOID.
It return a list of CDORevisionImpl.

After we receive the list of CDOID from LoadChunkRequest I call
LoadChunkRevisionRequest with the return list from LoadChunkRequest.

I can go up to 5800 objects/ sec with that!! I was amazed!!!

Test with referenceChunk at 

200- 5861 objects/sec
300- 5827 objects / sec
500 - 5727 objects / sec
1000 - 5147 object/sec
2000 - 4231 objects / sec 

Performance should go faster as I increased referenceChunk .... Why we do not
see that ?.. Easy.. the first chunk of the collection we fetch... it is a list
of CDOID. So it will not call LoadChunkRevision... More the refenceChunk is big
more we will fetch object sequentially for the first part...
If I change that (harcoded in the code to not fetch CDOID the first part).. I
can go up to 7500 objects / sec.. We should even go faster if instead of CDOID
we fetch directly the object.. (didn`t try it)

Anyway we should take advantage of that.. don`t you think ? We can even have a
thread that will fetch object by chunk...to provide objects to the main
thread.. (to not block it) all the time. We can call it... SmartReadAhead
Thread !


In brief, if we can detect when we should load objects by chunk (like when we
iterate through collection) we can win big time!!!

Let me know what you think !
Comment 1 Simon Mc Duff CLA 2007-09-03 09:53:17 EDT
Do you mind if I create the first version... I already have the code but it is not in place.

CDOViewImpl
setLoadRevisionOfCollectionInChunkSize(default = 1)


Each time we iterate (what ever strategy for the ReferenceByChunk), it will LoadRevision in chunk.

I planned to put it only for CDOStore when we get index from a collection.
Comment 2 Eike Stepper CLA 2007-09-03 10:27:58 EDT
I'd appreciate if you attach the work you've already done, given it's working already ;-)
Optimizing for CDO native objects (CDOStore based) is ok as well.
Comment 3 Simon Mc Duff CLA 2007-09-03 14:53:16 EDT
(In reply to comment #0)
> You know ... that you did allowed to the the following test.
> I created a LoadChunkRevisionRequest
> It takes in parameters a list of CDOID.
> It return a list of CDORevisionImpl.
> After we receive the list of CDOID from LoadChunkRequest I call
> LoadChunkRevisionRequest with the return list from LoadChunkRequest.
> I can go up to 5800 objects/ sec with that!! I was amazed!!!
> Test with referenceChunk at 
> 200- 5861 objects/sec
> 300- 5827 objects / sec
> 500 - 5727 objects / sec
> 1000 - 5147 object/sec
> 2000 - 4231 objects / sec 
> Performance should go faster as I increased referenceChunk .... Why we do not
> see that ?.. Easy.. the first chunk of the collection we fetch... it is a list
> of CDOID. So it will not call LoadChunkRevision... More the refenceChunk is big
> more we will fetch object sequentially for the first part...
> If I change that (harcoded in the code to not fetch CDOID the first part).. I
> can go up to 7500 objects / sec.. We should even go faster if instead of CDOID
> we fetch directly the object.. (didn`t try it)
> Anyway we should take advantage of that.. don`t you think ? We can even have a
> thread that will fetch object by chunk...to provide objects to the main
> thread.. (to not block it) all the time. We can call it... SmartReadAhead
> Thread !
> In brief, if we can detect when we should load objects by chunk (like when we
> iterate through collection) we can win big time!!!
> Let me know what you think !

I was too excited... 

At work, 1090 objects/sec. because my model have many small collection.. If I turned the LoadRevisionByChunk on... I can go up to 1250 objects per seconds.

The best is the following, if I serialize objects that are defined as containment in the model at the same time as the Owner.
I can go around 5777 objects/sec. The fastest I can go it is around 9000 objects/sec... but my model have other references that are not containement relationship ... so it slow me down.


I will post my patch soon. It will include 
- load revision in bulk for collection 
- load revision and all revision that have containment relationship.. 


I will post the patch when I will clean it a bit...put some comments as well.


Comment 4 Simon Mc Duff CLA 2007-09-03 22:39:41 EDT
Created attachment 77609 [details]
Load in chunk - Beta 1

Here my patch. I refactored my code home.. so I didn`t test it at work yet. (I will do tomorrow morning) Since it is 22:45 here.. I assume you will have time to look at it before I go to work and retest it. 


To activate the LoadByChunk

CDOViewImpl.setLoadRevisionCollectionChunkSize( 1000 );

This will Fetch objects in chunk of 1000 for collection. I put the code in CDOStore... but probably I will move it to CDORevisionManagerIMpl. (Same things you did for referenceChunk).

The other feature is always there. Basically if you look in LoadRevisionRequest, LoadRevisionIndication, LoadChunkRevisionRequest, LoadChunkRevisionindication you will see that it calculate what objects it should transfer at the same time.

Session.getAttachedObject(CDORevisionImpl revision, int referenceChunk, HashMap<CDOID, CDORevisionImpl> attachedObjects)
FOr now it attach only attributs that have the following :
 reference, not many relationship, containment.

I put the code in session.. because later we could have rules per Session or views... to what to load and when to load it.

Like I said.. my performance is 7 time faster with these feature.
I don`t what to enter in rules that much.. since they are bothering to maintain. We will see.

Let me know what you think.
Comment 5 Eike Stepper CLA 2007-09-04 02:30:40 EDT
First before I forget to tell, my parents come visit me today and tomorrow. 
So I probably won't have that much time to follow current work.

Your patch looks good. If you agree I'd like to propose a few minor changes.
I hope it's ok that I commit them to CVS so that we have a baseline to
further work on.
Comment 6 Eike Stepper CLA 2007-09-04 03:16:54 EDT
Hmm, don't have the time now to actually perform the changes I'd like to propose.
So here in brief:

LoadChunkRevisionRequest will also be needed with a timeStamp like LoadRevisionByTime
to support CDOAudit. I'd like to see if we can merge the new chunk functionality into
the existing signals (pass ids list in, return CDORevisionImpl[] array).
The returned array can be larger than the ids passed in to return revisions that should
additionally be attached.

We should avoid state in the signals as much as possible due to the new fail over strategies.
They currently reuse a request in a fail over situation. It could turn out that this is not good
or that additional API is needed to cancel/reset a signal. But with the above design of
LoadRevisionXYZRequest (pass n ids in, receive n + prefetchSize revisions) it should no
longer be necessary to carry objectsToAttach as a member field in the signals.

In case you already want to start with the above changes, note that LoadRevisionByVersion
request will only ever be passed a single CDOID to load (plus a version int). Later it could
be enhanced to return multiple versions of the revision with that id. I believe it is currently
not used as a signal from client, only as method call within the server. But this will change
as soon as I have time to add more client side functionality (show history, ...).

I haven't had time so far to go deeply into your changes for CDOStore but I have the feeling 
that it could be not sufficient to only change the get() method. What about the other methods?
Comment 7 Simon Mc Duff CLA 2007-09-04 07:08:39 EDT
(In reply to comment #5)
> First before I forget to tell, my parents come visit me today and tomorrow. 
> So I probably won't have that much time to follow current work.
> Your patch looks good. If you agree I'd like to propose a few minor changes.
> I hope it's ok that I commit them to CVS so that we have a baseline to
> further work on.

Not a problem!! Family is always more important than anything else!! 
I'm always open for changes...
Comment 8 Simon Mc Duff CLA 2007-09-04 07:13:16 EDT
(In reply to comment #6)
> Hmm, don't have the time now to actually perform the changes I'd like to
> propose.
> So here in brief:
> LoadChunkRevisionRequest will also be needed with a timeStamp like
> LoadRevisionByTime
> to support CDOAudit. I'd like to see if we can merge the new chunk
> functionality into
> the existing signals (pass ids list in, return CDORevisionImpl[] array).
> The returned array can be larger than the ids passed in to return revisions
> that should
> additionally be attached.
> We should avoid state in the signals as much as possible due to the new fail
> over strategies.
> They currently reuse a request in a fail over situation. It could turn out that
> this is not good
> or that additional API is needed to cancel/reset a signal. But with the above
> design of
> LoadRevisionXYZRequest (pass n ids in, receive n + prefetchSize revisions) it
> should no
> longer be necessary to carry objectsToAttach as a member field in the signals.
> In case you already want to start with the above changes, note that
> LoadRevisionByVersion
> request will only ever be passed a single CDOID to load (plus a version int).
> Later it could
> be enhanced to return multiple versions of the revision with that id. I believe
> it is currently
> not used as a signal from client, only as method call within the server. But
> this will change
> as soon as I have time to add more client side functionality (show history,
> ...).
> I haven't had time so far to go deeply into your changes for CDOStore but I
> have the feeling 
> that it could be not sufficient to only change the get() method. What about the
> other methods?

What you mean by the get method ? It is a hashset .. very fast.
Which others method. The loadByChunk only apply for collection ? OK.. I see what you mean. method like indexOf.. and everything.. This is coming as well. Like I said.. wasn't completely but I wanted to have your input.

I will work on that and send you the new patch.
Comment 9 Simon Mc Duff CLA 2007-09-04 22:05:29 EDT
Created attachment 77674 [details]
Loasd In Chunk

Now we have AbstractLoadRevisionRequest and AbstractLoadRevisionIndication
Request is taking care of the read. (Client side)
Indication is taking care of the write (Server side)

It always return List<CDORevisionImpl>.

I didn`t modify CDOStore yet.. since I have a question. With the proxy you did...

Let say we have a list of CDOReferenceProxy.... 
We call contains on that list....when the CDOReferenceProxy will be fetch??? 
I don`t see it in the code.

Let me know what you think!
Comment 10 Eike Stepper CLA 2007-09-05 03:51:16 EDT
Some thoughts:

1. You could be right, my changes to CDOStore with respect to CDOReferenceProxy don't seem to be complete, too
2. CDOReferenceProxy.resolve() is only called from CDOStore. Later it could also be called from CDOLegacyImpl.transferXYZ()
    In all cases I have the associated CDOFeature from the calling context. I consider to remove getFeature() from CDOReferenceProxy.
3. I don't see an important difference between
    LoadRevisionRequest <-> LoadChunkRevisionRequest and
    LoadRevisionRequestByTime <-> LoadChunkRevisionRequestByTime
    I suggest that you move your AbstractLoadRevisionRequest code to LoadRevisionRequest (will become the base for all revision loading).
    Then you can remove both LoadChunkRevision versions and use the existing signals.
    If you want me I can provide an initial implementation.
    
Comment 11 Simon Mc Duff CLA 2007-09-05 07:07:16 EDT
(In reply to comment #10)
> Some thoughts:
> 1. You could be right, my changes to CDOStore with respect to CDOReferenceProxy
> don't seem to be complete, too
> 2. CDOReferenceProxy.resolve() is only called from CDOStore. Later it could
> also be called from CDOLegacyImpl.transferXYZ()
>     In all cases I have the associated CDOFeature from the calling context. I
> consider to remove getFeature() from CDOReferenceProxy.


> 3. I don't see an important difference between
>     LoadRevisionRequest <-> LoadChunkRevisionRequest and
>     LoadRevisionRequestByTime <-> LoadChunkRevisionRequestByTime
>     I suggest that you move your AbstractLoadRevisionRequest code to
> LoadRevisionRequest (will become the base for all revision loading).
>     Then you can remove both LoadChunkRevision versions and use the existing
> signals.
>     If you want me I can provide an initial implementation.

The difference is one is optimized to send only 1 id LoadRevisionRequest  and 
LoadChunkRevisionRequest is optimized to send mnany ids. 
If you merge them... it could be possible but it will not be optimized anymore because you will need to have to send extra data to determine which kind of data you weant to send. So far, your stuff seems to be very compress.. so I thought it was important.

LoadRevisionRequest could always send a list of ID... if we have only one ID  it will send the size == 1.

You don't mind sending extra information ?



Comment 12 Eike Stepper CLA 2007-09-05 07:11:07 EDT
I think in this case it is better to have reduce the code and send 4 additional bytes.
You could use an additional boolean to indeicate whether only one or a size and multiple ids will follow.
I leave this up to you ;-)
Comment 13 Simon Mc Duff CLA 2007-09-05 08:38:45 EDT
Created attachment 77718 [details]
Load Chunk Beta 3

Here LoadChunk.....java  are not there anymore.

To simplify everything... I removed the optimization of the minus referenceChunk. I did some test .. and since the page buffer isn't reach... I don't see any performance degradation.

I copied the file from my internal network to here... so maybe I didn't copy them correctly... I will test it at home.. but you can review the changes..
Comment 14 Eike Stepper CLA 2007-09-05 13:46:26 EDT
Is there a special reason why LoadRevisionRequest.ids is a Set and not a List?
Comment 15 Simon Mc Duff CLA 2007-09-05 16:17:17 EDT
(In reply to comment #14)
> Is there a special reason why LoadRevisionRequest.ids is a Set and not a List?

Because the process that will send this information need to be sure that the CDOID are unique. Mainly they will build Set instead of List.

Make sense ?
Comment 16 Simon Mc Duff CLA 2007-09-05 21:11:18 EDT
(In reply to comment #14)
> Is there a special reason why LoadRevisionRequest.ids is a Set and not a List?

We will need also to double check something.

Now it is possible to bring CDORevision on the client side that we already have.

I believe the code doesn`t handle that :

  public synchronized boolean add(CDORevisionImpl revision)
    {
      CDORevisionImpl previousRevision = isEmpty() ? null : super.getFirst();
      if (previousRevision != null && previousRevision.isCurrent())
      {
        previousRevision.setRevised(revision.getCreated() - 1);
      }

      super.addFirst(revision);
      return true;
    }

Should we remove the previousRevision ?
Otherwise we could have a lot of revision of the same objects.
Comment 17 Simon Mc Duff CLA 2007-09-05 22:05:32 EDT
Created attachment 77766 [details]
Load Chunk beta 4

Just a little fix!!


It is working as advertise!! :-)
Comment 18 Eike Stepper CLA 2007-09-06 02:43:41 EDT
 (In reply to comment #15)
> Because the process that will send this information need to be sure that the
> CDOID are unique. Mainly they will build Set instead of List.
> 
> Make sense ?

No, I don't think so.
If I understand your code correctly, the revisions to be sent in addition are all *contained* in to-one references.
If they are *contained* they can only be contained in *one* feature, thus their IDs are unique by definition.
Comment 19 Eike Stepper CLA 2007-09-06 05:41:08 EDT
Committed to CVS.

It was too late to look at your recent patch because I already had applied numerous refactorings.
Please review my changes and propose a new patch for your recent changes.

BTW. did you execute the "CDO2 AllTests" suite to verify your changes?
The suite is far from being complete, but at least they should not fail.
Currently they pass all.

Some more comments:

1. Where does recursion of Session.collectContainedRevisions() stop?

2. CDORevisionResolverImpl.getRevisions() didn't look into the cache.

3. Revisions seemed to be added multiple times to RevisionManager.

4. CDOStore.get() can send 2 synchronous requests now, one after the other.

5. CDOStore.loadAhead() doesn't respect the view discriminator (CDOAudit.timeStamp). 
   NOT SOLVED SO FAR, but should be moved to CDOViewImpl.
   Don't fully understand the logic of CDOStore.loadAhead(), need to further dig...

6. CDORevisionResolverImpl.containsRevision() was incorrect.

7. CDORevisionResolverImpl.containsRevisionByTime() was missing. (No error due to 5.)

8. CDORevisionResolverImpl.getRevisionsByTime() was missing. (No error due to 5.)

Comment 20 Simon Mc Duff CLA 2007-09-06 07:02:23 EDT
(In reply to comment #18)
>  (In reply to comment #15)
> > Because the process that will send this information need to be sure that the
> > CDOID are unique. Mainly they will build Set instead of List.
> > 
> > Make sense ?
> No, I don't think so.
> If I understand your code correctly, the revisions to be sent in addition are
> all *contained* in to-one references.
No you don't understand it correctly.

CDOStore will grab the next ? CDOID that are not already fetch. It is onyl a collection. So it can be duplicate.

 if (cdoFeature.isReference())
    {
      if (cdoFeature.isMany() && result instanceof CDOID)
      {    	  
    	  if (view.getLoadRevisionCollectionChunkSize() > 1 && ! view.getSession().getRevisionManager().isRegisteredRevision((CDOID)result))
    	  {


> If they are *contained* they can only be contained in *one* feature, thus their
> IDs are unique by definition.

On the server side... Yes I take only the contained for now.. but with my smartreadeahead or any process but would like to fetch more data.. we take more.. So I prefer to keep my Map there as well


Make sense ?

Comment 21 Simon Mc Duff CLA 2007-09-06 08:02:11 EDT
Here some others comments:

loadRevision by chunk.. you can have duplicate on the server...
(MultiThread)
In this part of the code you could have duplicate...
In this case you shouldn't send exception. You should instead not insert it.

 int version = revision.getVersion();
      ListIterator<CDORevisionImpl> it = super.listIterator(0);
      while (it.hasNext())
      {
        CDORevisionImpl r = it.next();
        int v = r.getVersion();
        if (v == version)
        {
          throw new IllegalStateException("Duplicate revision");
        }

        if (v < version)
        {
          it.previous();
          break;
        }
      }

      it.add(revision);
      return true;
    }
Comment 22 Eike Stepper CLA 2007-10-16 05:25:14 EDT
Fixed in I200710101638.
Comment 23 Nick Boldt CLA 2008-01-28 16:44:32 EST
Move to verified as per bug 206558.
Comment 24 Eike Stepper CLA 2008-06-10 02:30:34 EDT
Reversioned due to graduation
Comment 25 Eike Stepper CLA 2008-09-11 13:48:32 EDT
CLOSING