Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 340709 - Adapter notification from CDOViewImpl.doInvalidate() can cause deadlock in rare situations
Summary: Adapter notification from CDOViewImpl.doInvalidate() can cause deadlock in ra...
Status: CLOSED WORKSFORME
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.core (show other bugs)
Version: 4.2   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Eike Stepper CLA
QA Contact: Eike Stepper CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 323788 335080
  Show dependency tree
 
Reported: 2011-03-22 17:48 EDT by Eike Stepper CLA
Modified: 2020-05-29 02:17 EDT (History)
2 users (show)

See Also:


Attachments
Patch v1 (4.96 KB, patch)
2011-03-22 17:51 EDT, Eike Stepper CLA
no flags Details | Diff
Patch v2 (6.17 KB, patch)
2011-04-04 00:46 EDT, Eike Stepper CLA
no flags Details | Diff
Test v1 (3.85 KB, patch)
2011-04-11 15:41 EDT, Martin Fluegge CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eike Stepper CLA 2011-03-22 17:48:11 EDT
CDOViewImpl.doInvalidate() is a synchronized method as of bug 335080. It can send several events to listeners and notify EMF adapters while holding the monitor lock. The following scenario results in a deadlock:

1) A listener passes control to a separate thread and waits for its termination (Display.syncExec is the usual suspect)
2) That thread calls back to the CDOView and hits another synchronized block.
Comment 1 Eike Stepper CLA 2011-03-22 17:50:17 EDT
About to attach a patch that does all notifications asynchronously if doInvalidate() has already been called asynchronously...
Comment 2 Eike Stepper CLA 2011-03-22 17:51:41 EDT
Created attachment 191713 [details]
Patch v1

With this patch it seems that only 2 test cases fail. Probably because the test logic has to be rewritten to expect the notifications a little later...
Comment 3 Eike Stepper CLA 2011-04-04 00:46:26 EDT
Created attachment 192433 [details]
Patch v2

Issue with conflict resolvers remains!

The invalidation events that trigger the conflict resolution arrive too late in the game...
Comment 4 Martin Fluegge CLA 2011-04-11 15:41:31 EDT
Created attachment 192968 [details]
Test v1

After a while I managed to create a test cases which reproduces the dead lock. It is only a starting point for the different synchronized places. I also haven't tried your patch yet to see whether the test passed with it applied.
Comment 5 Eike Stepper CLA 2012-06-05 07:28:49 EDT
Moving all open bug reports to 4.1 because the release is very near and it's hghly unlikely that there will be spare time to address 4.0 problems.

Please make sure that your patches can be applied against the master branch and that your problem is not already fixed there!!!
Comment 6 Eike Stepper CLA 2012-08-14 22:50:43 EDT
Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.
Comment 7 Eike Stepper CLA 2012-10-31 14:35:47 EDT
The following code at the beginning of the attached test case makes the test totally useless:

    // In case we discover a deadlock we will throw a TimeOutException
    new PollingTimeOuter()
    {
      @Override
      protected boolean successful()
      {
        return false;
      }
    }.assertNoTimeOut(3000);

Without it the test passes, as expected.
Comment 8 Eike Stepper CLA 2012-10-31 14:36:28 EDT
commit 8e8791fe60f3b7db907c4ee0d9b74080ec5765dc
Comment 9 Eike Stepper CLA 2013-06-27 04:00:42 EDT
Closing
Comment 10 Eike Stepper CLA 2020-05-29 02:14:50 EDT
The test case that was added through this bugzilla is harmful:

package org.eclipse.emf.cdo.tests.bugzilla;

import org.eclipse.emf.cdo.eresource.CDOResource;
import org.eclipse.emf.cdo.session.CDOSession;
import org.eclipse.emf.cdo.tests.AbstractCDOTest;
import org.eclipse.emf.cdo.tests.model1.Category;
import org.eclipse.emf.cdo.transaction.CDOTransaction;
import org.eclipse.emf.cdo.view.CDOViewInvalidationEvent;

import org.eclipse.net4j.util.event.IEvent;
import org.eclipse.net4j.util.event.IListener;

import java.util.concurrent.CountDownLatch;

/**
 * @author Martin Fluegge
 */
public class Bugzilla_340709_Test extends AbstractCDOTest
{
  private transient CountDownLatch latch = new CountDownLatch(1);

  public void test() throws Exception
  {
    CDOSession session = openSession();

    {
      CDOTransaction transaction = session.openTransaction();
      addTransactionListener(transaction);

      CDOResource resource = transaction.createResource(getResourcePath("test"));

      Category category = getModel1Factory().createCategory();
      category.setName("name1");

      resource.getContents().add(category);
      transaction.commit();
    }

    {
      CDOTransaction transaction = session.openTransaction();
      CDOResource resource = transaction.getResource(getResourcePath("test"));

      Category category = (Category)resource.getContents().get(0);
      category.setName("name2");

      transaction.commit();
    }
  }

  private void addTransactionListener(final CDOTransaction transaction)
  {
    transaction.addListener(new IListener()
    {
      @Override
      public void notifyEvent(IEvent event)
      {
        System.err.println(event);
        if (event instanceof CDOViewInvalidationEvent)
        {
          handleNotify(transaction);

          await(latch);
        }
      }

      private void handleNotify(final CDOTransaction transaction)
      {
        Thread handleNotifyThread = new Thread(new Runnable()
        {
          @Override
          public void run()
          {
            msg("DEAD");
            transaction.getRootResource();

            msg("LOCK");
            latch.countDown();
          }
        });

        handleNotifyThread.start();
      }
    });
  }
}

The started thread times out after 15 seconds, long after the test case has finished executing. That produces stack traces that are bogus in the scope of later test cases.

As commit 8e8791fe60f3b7db907c4ee0d9b74080ec5765dc has not added an actual fix for the stated problem (and more generally the stated problem is a known/accepted limitation) I'm going to remove the dangerous test case...
Comment 11 Eike Stepper CLA 2020-05-29 02:17:19 EDT
The removal is committed:
https://git.eclipse.org/c/cdo/cdo.git/commit/?id=b312ebf9f3c0dec37e246d77c80e6a7807490ede