Community
Participate
Working Groups
You know ... that you did allowed to the the following test. I created a LoadChunkRevisionRequest It takes in parameters a list of CDOID. It return a list of CDORevisionImpl. After we receive the list of CDOID from LoadChunkRequest I call LoadChunkRevisionRequest with the return list from LoadChunkRequest. I can go up to 5800 objects/ sec with that!! I was amazed!!! Test with referenceChunk at 200- 5861 objects/sec 300- 5827 objects / sec 500 - 5727 objects / sec 1000 - 5147 object/sec 2000 - 4231 objects / sec Performance should go faster as I increased referenceChunk .... Why we do not see that ?.. Easy.. the first chunk of the collection we fetch... it is a list of CDOID. So it will not call LoadChunkRevision... More the refenceChunk is big more we will fetch object sequentially for the first part... If I change that (harcoded in the code to not fetch CDOID the first part).. I can go up to 7500 objects / sec.. We should even go faster if instead of CDOID we fetch directly the object.. (didn`t try it) Anyway we should take advantage of that.. don`t you think ? We can even have a thread that will fetch object by chunk...to provide objects to the main thread.. (to not block it) all the time. We can call it... SmartReadAhead Thread ! In brief, if we can detect when we should load objects by chunk (like when we iterate through collection) we can win big time!!! Let me know what you think !
Do you mind if I create the first version... I already have the code but it is not in place. CDOViewImpl setLoadRevisionOfCollectionInChunkSize(default = 1) Each time we iterate (what ever strategy for the ReferenceByChunk), it will LoadRevision in chunk. I planned to put it only for CDOStore when we get index from a collection.
I'd appreciate if you attach the work you've already done, given it's working already ;-) Optimizing for CDO native objects (CDOStore based) is ok as well.
(In reply to comment #0) > You know ... that you did allowed to the the following test. > I created a LoadChunkRevisionRequest > It takes in parameters a list of CDOID. > It return a list of CDORevisionImpl. > After we receive the list of CDOID from LoadChunkRequest I call > LoadChunkRevisionRequest with the return list from LoadChunkRequest. > I can go up to 5800 objects/ sec with that!! I was amazed!!! > Test with referenceChunk at > 200- 5861 objects/sec > 300- 5827 objects / sec > 500 - 5727 objects / sec > 1000 - 5147 object/sec > 2000 - 4231 objects / sec > Performance should go faster as I increased referenceChunk .... Why we do not > see that ?.. Easy.. the first chunk of the collection we fetch... it is a list > of CDOID. So it will not call LoadChunkRevision... More the refenceChunk is big > more we will fetch object sequentially for the first part... > If I change that (harcoded in the code to not fetch CDOID the first part).. I > can go up to 7500 objects / sec.. We should even go faster if instead of CDOID > we fetch directly the object.. (didn`t try it) > Anyway we should take advantage of that.. don`t you think ? We can even have a > thread that will fetch object by chunk...to provide objects to the main > thread.. (to not block it) all the time. We can call it... SmartReadAhead > Thread ! > In brief, if we can detect when we should load objects by chunk (like when we > iterate through collection) we can win big time!!! > Let me know what you think ! I was too excited... At work, 1090 objects/sec. because my model have many small collection.. If I turned the LoadRevisionByChunk on... I can go up to 1250 objects per seconds. The best is the following, if I serialize objects that are defined as containment in the model at the same time as the Owner. I can go around 5777 objects/sec. The fastest I can go it is around 9000 objects/sec... but my model have other references that are not containement relationship ... so it slow me down. I will post my patch soon. It will include - load revision in bulk for collection - load revision and all revision that have containment relationship.. I will post the patch when I will clean it a bit...put some comments as well.
Created attachment 77609 [details] Load in chunk - Beta 1 Here my patch. I refactored my code home.. so I didn`t test it at work yet. (I will do tomorrow morning) Since it is 22:45 here.. I assume you will have time to look at it before I go to work and retest it. To activate the LoadByChunk CDOViewImpl.setLoadRevisionCollectionChunkSize( 1000 ); This will Fetch objects in chunk of 1000 for collection. I put the code in CDOStore... but probably I will move it to CDORevisionManagerIMpl. (Same things you did for referenceChunk). The other feature is always there. Basically if you look in LoadRevisionRequest, LoadRevisionIndication, LoadChunkRevisionRequest, LoadChunkRevisionindication you will see that it calculate what objects it should transfer at the same time. Session.getAttachedObject(CDORevisionImpl revision, int referenceChunk, HashMap<CDOID, CDORevisionImpl> attachedObjects) FOr now it attach only attributs that have the following : reference, not many relationship, containment. I put the code in session.. because later we could have rules per Session or views... to what to load and when to load it. Like I said.. my performance is 7 time faster with these feature. I don`t what to enter in rules that much.. since they are bothering to maintain. We will see. Let me know what you think.
First before I forget to tell, my parents come visit me today and tomorrow. So I probably won't have that much time to follow current work. Your patch looks good. If you agree I'd like to propose a few minor changes. I hope it's ok that I commit them to CVS so that we have a baseline to further work on.
Hmm, don't have the time now to actually perform the changes I'd like to propose. So here in brief: LoadChunkRevisionRequest will also be needed with a timeStamp like LoadRevisionByTime to support CDOAudit. I'd like to see if we can merge the new chunk functionality into the existing signals (pass ids list in, return CDORevisionImpl[] array). The returned array can be larger than the ids passed in to return revisions that should additionally be attached. We should avoid state in the signals as much as possible due to the new fail over strategies. They currently reuse a request in a fail over situation. It could turn out that this is not good or that additional API is needed to cancel/reset a signal. But with the above design of LoadRevisionXYZRequest (pass n ids in, receive n + prefetchSize revisions) it should no longer be necessary to carry objectsToAttach as a member field in the signals. In case you already want to start with the above changes, note that LoadRevisionByVersion request will only ever be passed a single CDOID to load (plus a version int). Later it could be enhanced to return multiple versions of the revision with that id. I believe it is currently not used as a signal from client, only as method call within the server. But this will change as soon as I have time to add more client side functionality (show history, ...). I haven't had time so far to go deeply into your changes for CDOStore but I have the feeling that it could be not sufficient to only change the get() method. What about the other methods?
(In reply to comment #5) > First before I forget to tell, my parents come visit me today and tomorrow. > So I probably won't have that much time to follow current work. > Your patch looks good. If you agree I'd like to propose a few minor changes. > I hope it's ok that I commit them to CVS so that we have a baseline to > further work on. Not a problem!! Family is always more important than anything else!! I'm always open for changes...
(In reply to comment #6) > Hmm, don't have the time now to actually perform the changes I'd like to > propose. > So here in brief: > LoadChunkRevisionRequest will also be needed with a timeStamp like > LoadRevisionByTime > to support CDOAudit. I'd like to see if we can merge the new chunk > functionality into > the existing signals (pass ids list in, return CDORevisionImpl[] array). > The returned array can be larger than the ids passed in to return revisions > that should > additionally be attached. > We should avoid state in the signals as much as possible due to the new fail > over strategies. > They currently reuse a request in a fail over situation. It could turn out that > this is not good > or that additional API is needed to cancel/reset a signal. But with the above > design of > LoadRevisionXYZRequest (pass n ids in, receive n + prefetchSize revisions) it > should no > longer be necessary to carry objectsToAttach as a member field in the signals. > In case you already want to start with the above changes, note that > LoadRevisionByVersion > request will only ever be passed a single CDOID to load (plus a version int). > Later it could > be enhanced to return multiple versions of the revision with that id. I believe > it is currently > not used as a signal from client, only as method call within the server. But > this will change > as soon as I have time to add more client side functionality (show history, > ...). > I haven't had time so far to go deeply into your changes for CDOStore but I > have the feeling > that it could be not sufficient to only change the get() method. What about the > other methods? What you mean by the get method ? It is a hashset .. very fast. Which others method. The loadByChunk only apply for collection ? OK.. I see what you mean. method like indexOf.. and everything.. This is coming as well. Like I said.. wasn't completely but I wanted to have your input. I will work on that and send you the new patch.
Created attachment 77674 [details] Loasd In Chunk Now we have AbstractLoadRevisionRequest and AbstractLoadRevisionIndication Request is taking care of the read. (Client side) Indication is taking care of the write (Server side) It always return List<CDORevisionImpl>. I didn`t modify CDOStore yet.. since I have a question. With the proxy you did... Let say we have a list of CDOReferenceProxy.... We call contains on that list....when the CDOReferenceProxy will be fetch??? I don`t see it in the code. Let me know what you think!
Some thoughts: 1. You could be right, my changes to CDOStore with respect to CDOReferenceProxy don't seem to be complete, too 2. CDOReferenceProxy.resolve() is only called from CDOStore. Later it could also be called from CDOLegacyImpl.transferXYZ() In all cases I have the associated CDOFeature from the calling context. I consider to remove getFeature() from CDOReferenceProxy. 3. I don't see an important difference between LoadRevisionRequest <-> LoadChunkRevisionRequest and LoadRevisionRequestByTime <-> LoadChunkRevisionRequestByTime I suggest that you move your AbstractLoadRevisionRequest code to LoadRevisionRequest (will become the base for all revision loading). Then you can remove both LoadChunkRevision versions and use the existing signals. If you want me I can provide an initial implementation.
(In reply to comment #10) > Some thoughts: > 1. You could be right, my changes to CDOStore with respect to CDOReferenceProxy > don't seem to be complete, too > 2. CDOReferenceProxy.resolve() is only called from CDOStore. Later it could > also be called from CDOLegacyImpl.transferXYZ() > In all cases I have the associated CDOFeature from the calling context. I > consider to remove getFeature() from CDOReferenceProxy. > 3. I don't see an important difference between > LoadRevisionRequest <-> LoadChunkRevisionRequest and > LoadRevisionRequestByTime <-> LoadChunkRevisionRequestByTime > I suggest that you move your AbstractLoadRevisionRequest code to > LoadRevisionRequest (will become the base for all revision loading). > Then you can remove both LoadChunkRevision versions and use the existing > signals. > If you want me I can provide an initial implementation. The difference is one is optimized to send only 1 id LoadRevisionRequest and LoadChunkRevisionRequest is optimized to send mnany ids. If you merge them... it could be possible but it will not be optimized anymore because you will need to have to send extra data to determine which kind of data you weant to send. So far, your stuff seems to be very compress.. so I thought it was important. LoadRevisionRequest could always send a list of ID... if we have only one ID it will send the size == 1. You don't mind sending extra information ?
I think in this case it is better to have reduce the code and send 4 additional bytes. You could use an additional boolean to indeicate whether only one or a size and multiple ids will follow. I leave this up to you ;-)
Created attachment 77718 [details] Load Chunk Beta 3 Here LoadChunk.....java are not there anymore. To simplify everything... I removed the optimization of the minus referenceChunk. I did some test .. and since the page buffer isn't reach... I don't see any performance degradation. I copied the file from my internal network to here... so maybe I didn't copy them correctly... I will test it at home.. but you can review the changes..
Is there a special reason why LoadRevisionRequest.ids is a Set and not a List?
(In reply to comment #14) > Is there a special reason why LoadRevisionRequest.ids is a Set and not a List? Because the process that will send this information need to be sure that the CDOID are unique. Mainly they will build Set instead of List. Make sense ?
(In reply to comment #14) > Is there a special reason why LoadRevisionRequest.ids is a Set and not a List? We will need also to double check something. Now it is possible to bring CDORevision on the client side that we already have. I believe the code doesn`t handle that : public synchronized boolean add(CDORevisionImpl revision) { CDORevisionImpl previousRevision = isEmpty() ? null : super.getFirst(); if (previousRevision != null && previousRevision.isCurrent()) { previousRevision.setRevised(revision.getCreated() - 1); } super.addFirst(revision); return true; } Should we remove the previousRevision ? Otherwise we could have a lot of revision of the same objects.
Created attachment 77766 [details] Load Chunk beta 4 Just a little fix!! It is working as advertise!! :-)
(In reply to comment #15) > Because the process that will send this information need to be sure that the > CDOID are unique. Mainly they will build Set instead of List. > > Make sense ? No, I don't think so. If I understand your code correctly, the revisions to be sent in addition are all *contained* in to-one references. If they are *contained* they can only be contained in *one* feature, thus their IDs are unique by definition.
Committed to CVS. It was too late to look at your recent patch because I already had applied numerous refactorings. Please review my changes and propose a new patch for your recent changes. BTW. did you execute the "CDO2 AllTests" suite to verify your changes? The suite is far from being complete, but at least they should not fail. Currently they pass all. Some more comments: 1. Where does recursion of Session.collectContainedRevisions() stop? 2. CDORevisionResolverImpl.getRevisions() didn't look into the cache. 3. Revisions seemed to be added multiple times to RevisionManager. 4. CDOStore.get() can send 2 synchronous requests now, one after the other. 5. CDOStore.loadAhead() doesn't respect the view discriminator (CDOAudit.timeStamp). NOT SOLVED SO FAR, but should be moved to CDOViewImpl. Don't fully understand the logic of CDOStore.loadAhead(), need to further dig... 6. CDORevisionResolverImpl.containsRevision() was incorrect. 7. CDORevisionResolverImpl.containsRevisionByTime() was missing. (No error due to 5.) 8. CDORevisionResolverImpl.getRevisionsByTime() was missing. (No error due to 5.)
(In reply to comment #18) > (In reply to comment #15) > > Because the process that will send this information need to be sure that the > > CDOID are unique. Mainly they will build Set instead of List. > > > > Make sense ? > No, I don't think so. > If I understand your code correctly, the revisions to be sent in addition are > all *contained* in to-one references. No you don't understand it correctly. CDOStore will grab the next ? CDOID that are not already fetch. It is onyl a collection. So it can be duplicate. if (cdoFeature.isReference()) { if (cdoFeature.isMany() && result instanceof CDOID) { if (view.getLoadRevisionCollectionChunkSize() > 1 && ! view.getSession().getRevisionManager().isRegisteredRevision((CDOID)result)) { > If they are *contained* they can only be contained in *one* feature, thus their > IDs are unique by definition. On the server side... Yes I take only the contained for now.. but with my smartreadeahead or any process but would like to fetch more data.. we take more.. So I prefer to keep my Map there as well Make sense ?
Here some others comments: loadRevision by chunk.. you can have duplicate on the server... (MultiThread) In this part of the code you could have duplicate... In this case you shouldn't send exception. You should instead not insert it. int version = revision.getVersion(); ListIterator<CDORevisionImpl> it = super.listIterator(0); while (it.hasNext()) { CDORevisionImpl r = it.next(); int v = r.getVersion(); if (v == version) { throw new IllegalStateException("Duplicate revision"); } if (v < version) { it.previous(); break; } } it.add(revision); return true; }
Fixed in I200710101638.
Move to verified as per bug 206558.
Reversioned due to graduation
CLOSING