Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 368570 - Incorrect schema invalid error when schema validating in multiple threads.
Summary: Incorrect schema invalid error when schema validating in multiple threads.
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: Core (show other bugs)
Version: 2.4.0   Edit
Hardware: PC Linux
: P3 major (vote)
Target Milestone: ---   Edit
Assignee: Ed Merks CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-13 12:20 EST by Daniel Dracott CLA
Modified: 2012-03-31 14:24 EDT (History)
0 users

See Also:


Attachments
suggested patch (788 bytes, application/octet-stream)
2012-01-13 12:20 EST, Daniel Dracott CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Dracott CLA 2012-01-13 12:20:15 EST
Created attachment 209471 [details]
suggested patch

We are experiencing problems when schema validating multiple instances of an EMF model in different threads concurrently. Specifically, we are getting incorrect errors on pattern validation.

We have identified that the problem lies with the internal regular expression matching code (org.eclipse.emf.ecore.xml.type.internal.RegEx), specifically the RangeToken#match(...) method. This method conditionally calls createMap() to initialize the internal map object, but it is possible for two threads to both attempt to initialize the map simultaneously in this way. When this happens, the accesses of map can interleave in a way that leads to the map not being correctly populated.

Our proposed solution is to synchronize on the RangeToken when checking if the map is initialized in RangeToken#match as shown in the attached patch.
Comment 1 Ed Merks CLA 2012-01-16 06:31:28 EST
Thanks for the bug report and the patch!

I see another possible way to address the problem.  I'm looking for a way that avoids doing any synchronization in the match method, because that hurts performance (not sure how much), but given its done before the null test it incurs a performance penalty for each and every call. Often in cases like this in other places in the EMF runtime, I simply allow the race condition but ensure that it doesn't matter who wins the race.  I.e., there's no harm in computing the map multiple times because each time produces the exact same result.  It might even be better performance even in the case of a race, because avoids a thread having to block ever.   So here's an approach along that line:

    private void createMap() {
        int asize = MAPSIZE/32;                 // 32 is the number of bits in `int'.
        int [] map = new int[asize];
        int nonMapIndex = this.ranges.length;
        // for (int i = 0;  i < asize;  i ++)  this.map[i] = 0;
        for (int i = 0;  i < this.ranges.length;  i += 2) {
            int s = this.ranges[i];
            int e = this.ranges[i+1];
            if (s < MAPSIZE) {
                for (int j = s;  j <= e && j < MAPSIZE;  j ++)
                    map[j/32] |= 1<<(j&0x1f); // s&0x1f : 0-31
            } else {
                nonMapIndex = i;
                break;
            }
            if (e >= MAPSIZE) {
                nonMapIndex = i;
                break;
            }
        }
        this.nonMapIndex = nonMapIndex;
        this.map = map;
        //for (int i = 0;  i < asize;  i ++)  System.err.println("Map: "+Integer.toString(this.map[i], 16));
    }

So we compute the map (and nonMapIndex) in local variables and only change the state shared across threads (the fields) at the very end.   Note that we make sure to assign the this.nonMapIndex first so it won't be possible for another thread to see a non-null map without the nonMapIndex already being properly set.  If multiple threads get into this method at the same time, no problem because each will compute the same result and any subsequent assignments will assign the same nonMapIndex value, i.e., structurally equal map arrays, so another thread won't be affected by which array they use.

Is this something you could test?

Thanks again for reporting the problem and especially for providing a patch too!
Comment 2 Daniel Dracott CLA 2012-01-18 10:30:21 EST
This proposed fix does not exhibit the bug in our current test environment; however, the current revision of the Java Memory Model in the Java Language Specification does *not* guarantee that the assumptions made in this implementation are safe.

Specifically, it is not guaranteed that the ordering of the initialization of map and the assigning of the map to the field will appear as happening in the same order to other Threads in the absence of proper synchronization: if there is no synchronization between the Thread that initialized this.map and a Thread that is reading it, a JVM could legitimately allow that other Thread to see the assignment to this.map without seeing it being completely initialized.

In the the same way, that thread could see the value assigned to this.map without the correct value of this.nonMapIndex.

See http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html for details on why it is not possible to reliably do lazy initialization for use by multiple threads without synchronizing when checking if lazy initialization has occurred.
Comment 3 Ed Merks CLA 2012-01-18 11:52:01 EST
There are other places in the EMF runtime that make similar assumptions so if JIT compilers made a habit of drastic reordering, we have other problems...

a few problems...

In any case, suppose we're really paranoid (which I'm not), so we did the following.

Change the call to create the map like this:

if (this.map == null)  this.map = this.createMap();

I.e., the create method returns the map.  And we make the creation method synchronized

    private static final int MAPSIZE = 256;
    private synchronized int [] createMap() {
        int asize = MAPSIZE/32;                 // 32 is the number of bits in `int'.
        int [] map = new int[asize];
        int nonMapIndex = this.ranges.length;
        for (int i = 0;  i < asize;  i ++)  map[i] = 0;
        for (int i = 0;  i < this.ranges.length;  i += 2) {
            int s = this.ranges[i];
            int e = this.ranges[i+1];
            if (s < MAPSIZE) {
                for (int j = s;  j <= e && j < MAPSIZE;  j ++)
                    map[j/32] |= 1<<(j&0x1f); // s&0x1f : 0-31
            } else {
                nonMapIndex = i;
                break;
            }
            if (e >= MAPSIZE) {
                nonMapIndex = i;
                break;
            }
        }
        this.nonMapIndex = nonMapIndex;
        return map;
        //for (int i = 0;  i < asize;  i ++)  System.err.println("Map: "+Integer.toString(this.map[i], 16));
    }

Note we only assign the shared nonMapIndex a correct value; the order of when or where it's done doesn't matter.  The shared map value can only be assigned after the synchronized method returns.  I don't think it's possible with this approach for another thread to see the map in a non-initialized state (t's only assigned after the synchronization) or to see the nonMapIndex assigned before the map is assigned, (again because the assignment happens after the synchronization).

It may be convoluted, but avoiding making this method expensive seems important.
Comment 4 Ed Merks CLA 2012-01-19 04:41:18 EST
If I read that interesting article correctly, my previous proposal would be safe for Java 5.0 and later if I mark the fields volatile even without any synchronization, right?  I.e., if they're volatile their writes can't be reordered with respect to the initialization code.
Comment 5 Daniel Dracott CLA 2012-01-26 05:40:11 EST
The method in comment 3 is not guaranteed to prevent other threads seeing the original default-initialization values in map; while there is synchronization around the writes into map the assignment of the return value to this.map has no synchronization, nor do the later reads from the field this.map and from the array itself. As such, there is no guarantee that a different thread could not see cached default-initialization values from the map, and thus get the wrong results.

The only way to guarantee that a Thread other than the one that initialized this.map that sees this.map as non-null also sees the array referenced by this.map as being fully initialized is for there to be a synchronizes-with edge from the initialization of this.map to the use; this requires that either
a) the reading Thread takes a lock prior to reading this.map that the initializing thread released after initializing the array,
or b) the reading Thread reads a volatile variable (possibly this.map itself) that the initializing thread wrote to after initializing the array.

Thus, making this.map volatile will ensure sufficient synchronization for Java 5.0 and later, though it may have an additional performance impact around all other accesses to the field this.map.
Comment 6 Ed Merks CLA 2012-01-26 06:22:28 EST
The fix is committed to git for 2.8. 

Thanks for the interesting insights!
Comment 7 Ed Merks CLA 2012-01-26 07:13:23 EST
Changing resolution...
Comment 8 Ed Merks CLA 2012-03-31 14:24:17 EDT
The changes are available in the M6 build.