| Summary: | Cannot load resource on a previously cleared ResourceSet | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Modeling] EMF | Reporter: | Victor Roldan Betancort <vroldanbet> | ||||||||||||||||||||||||||
| Component: | cdo.core | Assignee: | Eike Stepper <stepper> | ||||||||||||||||||||||||||
| Status: | CLOSED FIXED | QA Contact: | Eike Stepper <stepper> | ||||||||||||||||||||||||||
| Severity: | normal | ||||||||||||||||||||||||||||
| Priority: | P3 | CC: | alex.lagarde, bencamara8, stefan.schalomon, stephane.fournier | ||||||||||||||||||||||||||
| Version: | 4.1 | Flags: | stepper:
review-
stepper: review+ |
||||||||||||||||||||||||||
| Target Milestone: | --- | ||||||||||||||||||||||||||||
| Hardware: | PC | ||||||||||||||||||||||||||||
| OS: | Windows 7 | ||||||||||||||||||||||||||||
| Whiteboard: | |||||||||||||||||||||||||||||
| Bug Depends on: | |||||||||||||||||||||||||||||
| Bug Blocks: | 361250 | ||||||||||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||||||||||
|
Description
Victor Roldan Betancort
Created attachment 190374 [details]
Testcase v1
Testcase that demonstrates the issue
Created attachment 192849 [details]
patch v1
Modified CDOViewSetImpl to deregister CDOResource revision when removed from the ResourceSet
Improved test-case
Patch 2 contains patch 1 but patch 1 is not obsolete ;-( Created attachment 192897 [details]
Patch v2 (includes test)
Your patch fixes the problems as indicated by the test logic. However the added cleanup is not complete, as indicated by the additional test logic I've added. Please investigate...
(In reply to comment #4) > Created attachment 192897 [details] > Patch v2 (includes test) > > Your patch fixes the problems as indicated by the test logic. However the added > cleanup is not complete, as indicated by the additional test logic I've added. > Please investigate... Is this ticket planned for RC1 ? What are the missing tasks to make it resolved ? (In reply to comment #5) > Is this ticket planned for RC1? What are the missing tasks to make it resolved? RC1 is definitely past ;-( Vik, please let us know about your plans to address the issue in the new test case... Hi guys, I'm afraid i'll be busy for at least 2 weeks, and then I have 1 week for vacations. GA is at the end of June. Hopefully i'll have some time in the middle of June. Anyway, from my late efforts on Eike's recent requirements, things became a lot trickier, and makes me think we need a whole new approach. Satisfying new constraints on the test-case breaks a lot of things in the test-suite. In any case, we could submit the patch without Eike's new requirements (invalidation of detached elements after ResourceSet cleansing), which passes our current test-suite, and then tackle invalidation later. (In reply to comment #7) > [...] Hopefully i'll have some time in the middle of June. That will be nice, but definitely too late for RC4 (==GA). > Anyway, from my late efforts on Eike's recent requirements, things became a lot > trickier, and makes me think we need a whole new approach. Satisfying new > constraints on the test-case breaks a lot of things in the test-suite. > > In any case, we could submit the patch without Eike's new requirements > (invalidation of detached elements after ResourceSet cleansing), which passes > our current test-suite, and then tackle invalidation later. I have the impression the main challenge is to determine the set of EObjects that are contained in a resource without loading major parts of that resource. I think it goes beyond that. In pure EMF, if you clear Resources from a ResourceSet, those Resources does not become invalid or something like that. Now, that could be inconvenient for the CDO framework, but I don't understand why a CDOResource (and therefore, its content) that is cleared from its ResourceSet should become TRANSIENT. So my question is why is this new requirement necessary. (In reply to comment #9) > I think it goes beyond that. In pure EMF, if you clear Resources from a > ResourceSet, those Resources does not become invalid or something like that. I never said they are supposed to become INVALID. I'd rather expect them to become TRANSIENT. > Now, that could be inconvenient for the CDO framework, but I don't understand > why a CDOResource (and therefore, its content) that is cleared from its > ResourceSet should become TRANSIENT. So my question is why is this new > requirement necessary. By removing the resource from the resource set it gets outside of the managing scope of the CDOView that's associated with the resource set (more exact the CDOViewSet adapter of the ResourceSet). From the perspective of the former view the resource and all contained objects must "look" TRANSIENT (the local CDOState, not as in "removed from the repo"). (In reply to comment #10) > (In reply to comment #9) > > I think it goes beyond that. In pure EMF, if you clear Resources from a > > ResourceSet, those Resources does not become invalid or something like that. > > I never said they are supposed to become INVALID. I'd rather expect them to > become TRANSIENT. sorry, I meant TRANSIENT > > > Now, that could be inconvenient for the CDO framework, but I don't understand > > why a CDOResource (and therefore, its content) that is cleared from its > > ResourceSet should become TRANSIENT. So my question is why is this new > > requirement necessary. > > By removing the resource from the resource set it gets outside of the managing > scope of the CDOView that's associated with the resource set (more exact the > CDOViewSet adapter of the ResourceSet). From the perspective of the former view > the resource and all contained objects must "look" TRANSIENT (the local > CDOState, not as in "removed from the repo"). Yeah, I know that and I agree, but isn't that just a technical limitation of the framework? Maybe Ed could indicate us whether EMF specifies anything in this regard? (In reply to comment #11) > Yeah, I know that and I agree, but isn't that just a technical limitation of > the framework? Maybe Ed could indicate us whether EMF specifies anything in > this regard? What framework? EMF seems entirely unrelated here because EMF does not distinguish between objects that have a remote pendant and those that have not. (In reply to comment #12) > (In reply to comment #11) > > Yeah, I know that and I agree, but isn't that just a technical limitation of > > the framework? Maybe Ed could indicate us whether EMF specifies anything in > > this regard? > > What framework? EMF seems entirely unrelated here because EMF does not > distinguish between objects that have a remote pendant and those that have not. I'm talking about CDO. My point is that if users where using pure EMF APIs, maybe they would expect that Resource cleared from its ResourceSet to be working as if it still where in a ResourceSet (for instance, they can modify, create new EObjects and call save() as if it where in a ResourceSet). (In reply to comment #13) > (In reply to comment #12) > > (In reply to comment #11) > > > Yeah, I know that and I agree, but isn't that just a technical limitation of > > > the framework? Maybe Ed could indicate us whether EMF specifies anything in > > > this regard? > > > > What framework? EMF seems entirely unrelated here because EMF does not > > distinguish between objects that have a remote pendant and those that have not. > > I'm talking about CDO. My point is that if users where using pure EMF APIs, > maybe they would expect that Resource cleared from its ResourceSet to be > working as if it still where in a ResourceSet (for instance, they can modify, > create new EObjects and call save() as if it where in a ResourceSet). That would in fact require to load the entire resource before (or during removal from the resource set). I'm not sure without trying, but it could be already like that if the root objects of the resource transition to TRANSIENT. The reason is the "depopulation" of the associated revisions that turns CDOIDs into EObjects. Leaving the issue of whether the requirement is appropriate or not, I found several challenges in the "turning to transient" mechanism: - I managed to turn the CDOResource to transient. Lot of trivial test-cases started failing, probably related to the issue stated in last comment. - How to determinate which elements, under the CDOResource scope, are actually loaded in the CDOView, and should transit to TRANSIENT? (again, the recurrent containment issue). (In reply to comment #15) > - I managed to turn the CDOResource to transient. Lot of trivial test-cases > started failing, probably related to the issue stated in last comment. I can not judge that without further infos. > - How to determinate which elements, under the CDOResource scope, are actually > loaded in the CDOView, and should transit to TRANSIENT? (again, the recurrent > containment issue). I think there are two possible requirements: a) remove the resource as quickly as possible because we just want to get rid of it. --> only the already loaded objects would have to be derigistered from the view (not easily possible!) b) remove the resource because we want to continue to use it outside of the scope of cdo (and the repo). --> all objects must be loaded (easy to deregister them subsequently, but can be arbitrarily slow, depending on the resource size) Ok, back to this topic.
From the above, I infer that a CDOResource that has been cleared must then be "disconnected", and therefore become transient. From my point of view this is more a technical limitation that a real requirement: CDOResource must become transient because it will loose reachability to it s CDOViewSet (despite CDOResource itself has an attribute that references its managing CDOView).
So, accepting that, we have the two options described above. To me, b) appears more coherent, but for sure will be a performance issue (for instance, in our app I execute a clear frequently, and our models are huge).
option a) is also a performance bottleneck, and also somehow incoherent, because according to the new transient scope, consequence of CDOResource becoming "disconnected", all its content should be reachable, but we will
have a partially loaded tree instead. In such case, I can't see how useful that would be to clients, and that leads me to my initial simplistic approach, which is also not that coherent, but at least practical ("b" is practical, but infeasible performance wise; "a" is infeasible performance wise and also is impractical).
So, my simplistic approach, not totally coherent, but practical: clients clear the ResourceSet. CDOResource will simply be deregistered. Won't become transient. If the resource is cleared, its unlikely it is going to be used again. If so, clients MUST take care of that, so its client's responsibility to deal with a CDOResource that has been cleared.
There is still another way of rethink this whole issue: why must a resource that is cleared be deregistered? Easy: since that CDOResource is already instantiated and registered in the CDOView, there cannot be another instance of the same resource registered. Why is another instance becoming registered? Because the EMF's ResourceSet and ResourceFactory implementation CREATES a new instance each time its requested through ResourceSet.getResource (unless the Resource is already in the ResourceSet). So, I believe its *technically* not possible (due to EMF internals) to recover that instance from CDO instead of creating a new instance.
This is ResourceSet.getResource's internals:
public Resource getResource(URI uri, boolean loadOnDemand)
{
Map<URI, Resource> map = getURIResourceMap();
if (map != null)
{
Resource resource = map.get(uri);
if (resource != null)
{
if (loadOnDemand && !resource.isLoaded())
{
demandLoadHelper(resource);
}
return resource;
}
}
URIConverter theURIConverter = getURIConverter();
URI normalizedURI = theURIConverter.normalize(uri);
for (Resource resource : getResources())
{
if (theURIConverter.normalize(resource.getURI()).equals(normalizedURI))
{
if (loadOnDemand && !resource.isLoaded())
{
demandLoadHelper(resource);
}
if (map != null)
{
map.put(uri, resource);
}
return resource;
}
}
Resource delegatedResource = delegatedGetResource(uri, loadOnDemand);
if (delegatedResource != null)
{
if (map != null)
{
map.put(uri, delegatedResource);
}
return delegatedResource;
}
if (loadOnDemand)
{
Resource resource = demandCreateResource(uri);
if (resource == null)
{
throw new RuntimeException("Cannot create a resource for '" + uri + "'; a registered resource factory is needed");
}
demandLoadHelper(resource);
if (map != null)
{
map.put(uri, resource);
}
return resource;
}
return null;
}
As you can see, there is not way to delegate the gathering of the Resource. Neither it can be done in org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.createResource(URI, String):
public Resource createResource(URI uri, String contentType)
{
Resource.Factory resourceFactory = getResourceFactoryRegistry().getFactory(uri, contentType);
if (resourceFactory != null)
{
Resource result = resourceFactory.createResource(uri);
getResources().add(result);
return result;
}
else
{
return null;
}
}
ResourceFactory implementations just receive a URI instance, nothing else. If there was more information (for instance, the ResourceSet as context to the ResourceFactory) maybe we could have a chance to return the already created Resource instance, instead of creating a new one. However, I don't think there are chances to introduce changes there.
There still may be a chance. Do you see that getURIResourceMap() in ResourceSetImpl.getResource? Yes, nasty hack, but it may be our best shot... Inject CDOResources in that map. If they become cleared, they are cached in that map. Lifecycle of the CDOResource? If the CDOView is still alive, the ResourceSet is the same... why shouldn't the CDOResource instance be the same?
So to sum up. Currently we have an exception raising due to clearing a ResourceSet and then fetching the same resource. If we try to make sense out of this, we reach to the same conclusion Eike did: Due to implementation limitations, CDOResource must become "disconnected" from the repository, to keep consistency in the framework. Technically speaking, this will become a huge bottleneck (we don't have extra containment information to determine which objects to transit to transient). So, our options are:
- 2 horribly performant solutions ("a" and "b")
- simplistic approach, simply deregister CDOResource. Not transitions to TRANSIENT. Clients should take care here if something goes wrong...
- Take advantage of the current ResourceSetImpl implementation, and use ResourceMap for something it probably wasn't meant to be used for...
Well, so far that's all I can come up... Maybe anyone else could think of more solutions...
After some experiments, I confirm its possible to create a fix to this issue by using org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.uriResourceMap to reference the removed Resource instances. After notification, CDOViewSetImpl can add this to the map. This way we don't need to deregister any CDOResource (and therefore neither its content). Next time user calls org.eclipse.emf.ecore.resource.ResourceSet.getResource(URI, boolean), the cached instanced will be retrieved, and ResourceSet logic to create a new instance won't be executed. However, there are some test-cases of our test-suite that start failing... so some extra rework is needed... Created attachment 198270 [details]
patch v3
new patch that fixes the issue in a different manner, avoiding deregistration,
and therefore objects (resource and content) do not need to transit to a different state.
Registered resource is cached instead in ResourceSet. This way we avoid creating new resource instances upon retrieval through ResourceSet.getResource().
The patch includes the test-case
All tests pass
So the current situation is: - patch v2 fixes the issue, but needs rework to achieve consistency of the object graph (all loaded objects under containment must be deregistered) - patch v3 fixes the issue without the need of deregistration (so object graph state remains consistent), but its implemented taking advantage (a.k.a hacky tricks) of the implementation of ResourceSetImpl So patch v2 shouldn't be marked as obsolete... Eike, I'll wait for your review and comments. Created attachment 199003 [details]
Alternative approach v1
Just some minor changes and a "patch rename".
Vik, I'm not very experienced with these resource maps. For example, are they normally holding on to the resources they contain? Can we discuss that next week?
Created attachment 199049 [details]
Alternative approach v2
- URIResourceMap clean-up upon CDOView deactivation
- improved original test-case
- added granularity demonstration
I'd like to step back and look at the scenario from a client's perspective again. IIRC the two major use cases are these calls: 1) resourceSet.getResources().remove(x) 2) resourceSet.getResources().clear() Internally case 2) can be seen as a series of cases 1). I basically have two questions: a) Can we exclude that the removed resource(s) is (are) not used anymore from the time of the removal? b) What happens if the removed resource(s) is (are) dirty? > a) Can we exclude that the removed resource(s) is (are) not used anymore from > the time of the removal? I'd say yes, but in pure EMF, nothing stops your from referencing a ResourceImpl and using it, even when it has been cleared from its ResourceSet. > b) What happens if the removed resource(s) is (are) dirty? I'd say the most convenient would to be to discard changes. Not sure how that would affect to other interrelated Resources... (i.e. cross references between EObjects in different Resources). (In reply to comment #24) > > a) Can we exclude that the removed resource(s) is (are) not used anymore from > > the time of the removal? > > I'd say yes, but in pure EMF, nothing stops your from referencing a > ResourceImpl and using it, even when it has been cleared from its ResourceSet. We could transition it to INVALID to prevent that from happening. > > b) What happens if the removed resource(s) is (are) dirty? > > I'd say the most convenient would to be to discard changes. Not sure how that > would affect to other interrelated Resources... (i.e. cross references between > EObjects in different Resources). If you were to discard the deltas of the resource object then you'd certainly need to discard deltas of all contained objects (see performance problem) plus adjust a bunch of referencing objects. I'd prefer the strategy to disallow removal of dirty resources from the resourceset. > If you were to discard the deltas of the resource object then you'd certainly
> need to discard deltas of all contained objects (see performance problem) plus
> adjust a bunch of referencing objects.
>
> I'd prefer the strategy to disallow removal of dirty resources from the
> resourceset.
The problem of that approach is that pure EMF API clients won't expect such restrictions upon removal of dirty resources, and even if they were aware of the underlying implementation (CDOResource), they would have to use CDO APIs to perform rollback before clearing.
After a discussion through Skype with Eike, we managed to reach a consensus: - CLEAN Resources would be removed and transited to INVALID. Nothing would prevent users to access they contents (contents won't be invalidated). - DIRTY Resources contained in the ResourceSet: if the resources to remove is CLEAN, the same as above would happen. If the Resource to remove is DIRTY, an exception will raise. If we are clearing up the ResourceSet, and some resources are DIRTY, rollback will happen, CLEAN Resources will transit to INVALID, and DIRTY resources will transit to INVALID_CONFLICT. We can't rollback a single Resource in CDO, because the transaction scope is a CDOTransaction, which may contain many interrelated resources, which is different from EMF, where the transaction scope is a single Resource. All that will be managed in CDOViewSetImpl. Alternative approach v2 shall be discarded. This is a variant of initial approach. Created attachment 199331 [details]
patch v3
Found some complexities during the creation the patch.
The main problem is that it is difficult to know if a Resource is dirty if Resource.isTrackingModification() == false. If it is the case, the resource's content may modified, but it still will return resource.isModified() == false.
So the current solution does the following:
- if the resource's view is clean (meaning the resourceset has no dirty resources), then not problem, resource transits to INVALID and it is removed from ResourceSet
- if resource contains dirtyResources, but we have called ResourceSet.clear(), then all CDOTransactions in the CDOViewSet are rolledback, and all resources contained by resourceSet transit to INVALID
- the special case is when we have a mixed situation:
* Case 1: We are removing a resource that is not dirty, but is contained in a ResourceSet with dirty Resources, or
* Case 2: We are removing a resource that is dirty, but there are more dirty resources, or
* Case 3: We are removing a resource that is the only dirty resource of the ResourceSet, and there may be more resources
in the current patch, all these situations throw a CDOException. Our proposal ideally wanted to:
- Case 1. Transit resource to INVALID
- Case 2: CDOException is thrown
- Case 3: Transaction is rolledback, and Resource transits to INVALID
As I said, its difficult to achieve those ideal behaviours without the information of Resource.isModified() being invalid
The whole test-suite passes, meaning that those special cases were a CDOException rises are not happening,
and that leads me to think its not a common use-case scenario.
Created attachment 199337 [details]
patch v3
now, this is the real patch v3 :P
Created attachment 199348 [details]
Patch v4
Created attachment 199349 [details]
Patch v5
Created attachment 199352 [details]
Patch v6
Safer now in case of errors.
Created attachment 199355 [details]
Patch v7
Jesus, I hate this bugzilla. All places *in* CDO where resources are removed from their resource set had to be changed to bypass the new notification handling. In particular resource.delete() and tx.rollback(). Two test cases had to be removed because they were based on assumptions that are now obsolete. All tests seem to succeed now.
Committed revision 8638 Closing. |