Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 345049

Summary: Optimize CDORevisionCache implementations
Product: [Modeling] EMF Reporter: Eike Stepper <stepper>
Component: cdo.coreAssignee: Eike Stepper <stepper>
Status: CLOSED FIXED QA Contact: Eike Stepper <stepper>
Severity: enhancement    
Priority: P3 CC: caspar_d, vroldanbet
Version: 4.0Flags: stepper: review? (vroldanbet)
stepper: review? (caspar_d)
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Attachments:
Description Flags
Patch v1
none
Patch v2
none
Patch v3 none

Description Eike Stepper CLA 2011-05-07 03:45:07 EDT
Non-auditing: CDOID --> CDORevision
Auditing: CDOID --> RevisionList
Branching: CDOIDAndVersion --> RevisionList

getObjectType() can be a simple cache lookup for non-branching (because the cache key is just the CDOID).
Comment 1 Eike Stepper CLA 2011-05-07 03:46:09 EDT
Created attachment 195000 [details]
Patch v1
Comment 2 Eike Stepper CLA 2011-05-07 03:47:16 EDT
Vik, please report how this patch impacts time and space efficiency.
Comment 3 Eike Stepper CLA 2011-05-07 03:48:55 EDT
Caspar, there is one test failure in non-auditing: CDOTransactionImpl.removeCrossReferences(). The new non-auditing cache returns null for a clean revision. Do you have an idea onhow to cope with this situation? Or do you think it's simply a bug in the new cache?
Comment 4 Caspar D. CLA 2011-05-09 00:42:39 EDT
Patch can't be applied. It contains a hunk that changes (not: creates)
CDORevisionCacheAuditing.java, but this file doesn't exist yet.

Please post a patch including the creation of that file, and I'll take
a look.
Comment 5 Victor Roldan Betancort CLA 2011-05-09 06:02:18 EDT
Sure, I'll give it a look. 

I performed some tests with a hacked CDORevisionCacheImpl without the typeMap, and it was slightly faster and considerably more space efficient (I could run our app with 256mb heap without OOME. I'll check the patch and compare :)
Comment 6 Victor Roldan Betancort CLA 2011-05-09 06:15:09 EDT
Eike,

correct me if wrong, but... aren't InternalCDORevisions in CDORevisionCacheNonAuditing being hard-referenced? Is there something I'm missing here?

Furthermore, there are some version and branch checks in that implementation. Why is that necessary in a NonAudit scenario?
Comment 7 Eike Stepper CLA 2011-05-10 02:06:10 EDT
(In reply to comment #4)
> Patch can't be applied. It contains a hunk that changes (not: creates)
> CDORevisionCacheAuditing.java, but this file doesn't exist yet.

No clue how I got into this state. Trying to fix...
Comment 8 Eike Stepper CLA 2011-05-10 02:07:24 EDT
(In reply to comment #6)
> correct me if wrong, but... aren't InternalCDORevisions in
> CDORevisionCacheNonAuditing being hard-referenced? 

*Strongly* referenced, yeah. My bad, I'll come up with a new patch today...
Comment 9 Eike Stepper CLA 2011-05-10 04:01:10 EDT
Created attachment 195180 [details]
Patch v2

Caspar, this patch should address the SVN hunk issue.

Vik, the non-auditing issue is not yet addressed. I'll look at that a little later today...
Comment 10 Caspar D. CLA 2011-05-10 05:04:25 EDT
Eike,

What goes wrong here is that the the tx expects it can ask the revManager
for a specific version. But your non-auditing cache implicitly always
evicts any other version in favor of the latest.

I'm not sure if non-auditing (which to me means that the *repo* will not
retain anything other than the latest version) necessarily implies that the
client-side cache refuse to hold any historic versions...

But regardless of your/my opinion on that, the whole issue can be avoided 
by fetching the clean revision from the tx's own cleanRevisions list,
instead of from the revManager, as follows:

  InternalCDORevision cleanRevision = cleanRevisions.get(referencer);

This will make the assertion pass always.
Comment 11 Eike Stepper CLA 2011-05-12 02:47:57 EDT
Created attachment 195461 [details]
Patch v3

I've fixed the bug in CDOTransactionImpl.removeCrossReferences() as Caspar suggested.

And the NonAuditing cache now uses string or soft references according to disableGC.

All tests are passing. Going to commit...

Vik, please have a look again and tell us how the footprint is impacted.
Comment 12 Eike Stepper CLA 2011-05-12 02:48:35 EDT
Committed revision 7686:
- trunk/plugins/org.eclipse.emf.cdo
- trunk/plugins/org.eclipse.emf.cdo.common
- trunk/plugins/org.eclipse.emf.cdo.net4j
- trunk/plugins/org.eclipse.emf.cdo.server
- trunk/plugins/org.eclipse.emf.cdo.tests
Comment 13 Eike Stepper CLA 2011-05-12 02:49:14 EDT
Resolved
Comment 14 Victor Roldan Betancort CLA 2011-05-12 12:58:24 EDT
Hi Eike,

it appears to me (using VisualVM) that soft references are not being evicted. Our tests failed for simple 40 minute import test with 256 mb heap. It did succeed flawlessly with the previous CDORevisionCacheImpl hacked with your email suggestions.

I'll keep making some tests to determine what's going on.
Comment 15 Victor Roldan Betancort CLA 2011-05-12 13:05:38 EDT
Correction,

what is not being evicted are AbstractCDORevisionCache.CacheSoftReference instances. I revisions are being removed correctly, but not CacheSoftReference instances, which according to VisualVM, are being strongly referenced by CDORevisionCacheNonAuditing.revisions HashMap.
Comment 16 Victor Roldan Betancort CLA 2011-05-12 13:20:33 EDT
The problem relies on:

org.eclipse.emf.cdo.internal.common.revision.CDORevisionCacheNonAuditing.removeRevision(CDOID, CDOBranchVersion)

which has following condition:

if (revision != null && ObjectUtil.equals(revision.getBranch(), branchVersion.getBranch()) && revision.getVersion() == branchVersion.getVersion())

if the revision has been already evicted, ref.get() will be null, and the "then block" code won't be executed, which means that the entry on "revisions" hashMap won't be removed.

I suggest we extract revisions.remove(id) from the "then block" right above, so it is executed anyway.
Comment 17 Victor Roldan Betancort CLA 2011-05-12 13:22:55 EDT
reopening for the identified bug on CDORevisionCacheNonAuditing
Comment 18 Victor Roldan Betancort CLA 2011-05-12 16:37:05 EDT
I confirm that, after my proposed fix, everything works as expected :)
Computationally it behaves as good as the previous cache version, but in terms of memory consumption, I'd say the cache consumes 50% less, which gives 50% more room for revisions.

Thanks for the excellent enhacement, Eike :)
Comment 19 Eike Stepper CLA 2011-05-13 06:03:40 EDT
(In reply to comment #16)
> I suggest we extract revisions.remove(id) from the "then block" right above, so
> it is executed anyway.

Isn't that what I already see in my code?

      Reference<InternalCDORevision> ref = revisions.get(id);
      if (ref != null)
      {
        InternalCDORevision revision = ref.get();
        if (revision != null && ObjectUtil.equals(revision.getBranch(), branchVersion.getBranch())
            && revision.getVersion() == branchVersion.getVersion())
        {
          revisions.remove(id);
          return revision;
        }
      }
      else
      {
        revisions.remove(id);
      }
Comment 20 Victor Roldan Betancort CLA 2011-05-13 06:13:46 EDT
No, revisions.remove(id) should be right under ref.get() and above the if clause.
Comment 21 Victor Roldan Betancort CLA 2011-05-13 06:16:48 EDT
Oh, I see you point. Yeah, but that else block is only executed if ref is null (the outer if clause). In the case of the inner If clause, there is no else block, so revision is not removed from the map in case revision == null.
Comment 22 Eike Stepper CLA 2011-05-13 06:25:30 EDT
(In reply to comment #21)
> Oh, I see you point. Yeah, but that else block is only executed if ref is null
> (the outer if clause). In the case of the inner If clause, there is no else
> block, so revision is not removed from the map in case revision == null.

But that is the purpose. It's supposed to be a safety check so that only the version is evicted that is the one that matches the key in the CacheSoftReference. 

Am I missing something? Can you paste your full removeRevision() method?
Comment 23 Victor Roldan Betancort CLA 2011-05-13 06:27:26 EDT
public InternalCDORevision removeRevision(CDOID id, CDOBranchVersion branchVersion)
  {
    synchronized (revisions)
    {
      Reference<InternalCDORevision> ref = revisions.get(id);
      if (ref != null)
      {
        InternalCDORevision revision = ref.get();
        revisions.remove(id);
        if (revision != null && ObjectUtil.equals(revision.getBranch(), branchVersion.getBranch())
            && revision.getVersion() == branchVersion.getVersion())
        {
          return revision;
        }
      }
      else
      {
        revisions.remove(id);
      }
    }

    return null;
  }
Comment 24 Victor Roldan Betancort CLA 2011-05-13 06:34:57 EDT
Lets see if I can explain better.

This method is called "removeRevision". It means that a revision is going to be removed from the cache. However, this method doesn't seem to consider that the soft reference CDORevision could be already GC'ed. That implies ref.get will return null, but ref won't be null, because the actual CacheSoftReference is still strongly referenced by the map.

That's were the ReferenecQueue mechanism kicks in (its polled every 60 seconds) and checks whether there are CacheSoftReference for which the referenced revision has been already evicted, with the purpose of removing it from the map.

Considering such scenario, removeRevision will be invoked by org.eclipse.emf.cdo.internal.common.revision.AbstractCDORevisionCache.work(Reference<? extends InternalCDORevision>).

So lets walktrhough the code: first we execute revision.get(id). That returns an instance of CacheSoftReference. so ref != null, and it executes the block inside. However reg.get == null, because the revision has been evicted. The next if block does not succeed, so "revisions.remove(id)" is skipped and finally null is returned.

That leaves a CacheSoftReference instance in the map, referencing null, and occupying useful heap memory.

Is this more clear now?
Comment 25 Eike Stepper CLA 2011-05-13 06:35:31 EDT
That doesn't seem right to me. In your case we wouldn't need the whole if statement (the return value is rather unimportant here).

The purpose of the if was to protect against this problem:

The actual eviction happens in a background thread, i.e. AbstractCDORevisionCache.work() --> CDORevisionCacheNonAuditing.removeRevision()

The "main" thread could add a newer revision to the cache while the eviction thread still thinks it has to evict the older one. 

I do agree, though, that the branch check does not make much sense in non-auditing.

I also understand that you seem to have an actual problem with the current implementation and maybe that your suggested solution seems to solve it. I just can't understand it...
Comment 26 Eike Stepper CLA 2011-05-13 06:36:17 EDT
Mid-air collision! My comment #25 replies to your comment #23
Comment 27 Victor Roldan Betancort CLA 2011-05-13 06:38:22 EDT
Eike, im sorry but I still don't understand what is a useless CacheSoftReference (referencing null) doing in memory, nor I understand it's purpose.
Comment 28 Eike Stepper CLA 2011-05-13 06:41:45 EDT
I got it!!! What about this:

  public InternalCDORevision removeRevision(CDOID id, CDOBranchVersion branchVersion)
  {
    synchronized (revisions)
    {
      Reference<InternalCDORevision> ref = revisions.get(id);
      if (ref != null)
      {
        InternalCDORevision revision = ref.get();
        if (revision != null)
        {
          if (revision.getVersion() == branchVersion.getVersion())
          {
            revisions.remove(id);
            return revision;
          }
        }
        else
        {
          revisions.remove(id);
        }
      }
    }

    return null;
  }
Comment 29 Victor Roldan Betancort CLA 2011-05-13 06:44:21 EDT
ah, finally :D yeah, thats actually better than what I have. Less branch checks (there are no branches in no audit), better performant cache. I'll test the code. 

Is still the version check needed?

I just need 40 minutes for the smallest stress test, ok? I'll ping you back when checked it
Comment 30 Eike Stepper CLA 2011-05-13 06:44:44 EDT
(In reply to comment #27)
> Eike, im sorry but I still don't understand what is a useless
> CacheSoftReference (referencing null) doing in memory, nor I understand it's
> purpose.

The most impoirtant aspect about CacheSoftReference is that it extends SoftReference and its ctor registers it with a ReferenceQueue. And then it adds key information so that the queue worker can know what the key of the already evicted revision is (used for map cleanup):

private static final class CacheSoftReference extends SoftReference<InternalCDORevision> implements CDORevisionKey
Comment 31 Eike Stepper CLA 2011-05-13 06:45:29 EDT
(In reply to comment #29)
> ah, finally :D yeah, thats actually better than what I have. Less branch checks
> (there are no branches in no audit), better performant cache. I'll test the
> code. 
> 
> Is still the version check needed?
> 
> I just need 40 minutes for the smallest stress test, ok? I'll ping you back
> when checked it

Sure! I'll commit it anyway as soon as my tests have finished and passed ;-)
Comment 32 Eike Stepper CLA 2011-05-13 06:48:27 EDT
Committed revision 7697
Comment 33 Eike Stepper CLA 2011-05-13 14:04:06 EDT
Added a !isEmpty() check in CDORevisionCacheAuditing.getObjectType() to prevent NoSuchElementExceptions:

      if (revisionList != null && !revisionList.isEmpty())
      {
        Reference<InternalCDORevision> ref = revisionList.getFirst();
        ...

I fear that, in addition, we will have to add general synchronization between 
a) the eviction thread and 
b) the group of possible app threads
Comment 34 Eike Stepper CLA 2011-05-13 14:06:36 EDT
Committed revision 7705:
- trunk/plugins/org.eclipse.emf.cdo.common
Comment 35 Eike Stepper CLA 2011-05-13 14:07:12 EDT
> I fear that, in addition, we will have to add general synchronization between
> a) the eviction thread and
> b) the group of possible app threads

That was nonsense. Everything is properly synchronized already!
Comment 36 Eike Stepper CLA 2011-05-14 04:56:45 EDT
Committed revision 7708
Comment 37 Eike Stepper CLA 2011-06-23 03:41:46 EDT
Available in R20110608-1407