
Bug 340094

Summary: MalformedByteSequenceException running standalone
Product: [Modeling] Acceleo
Reporter: Ed Willink <ed>
Component: Core
Assignee: Project Inbox <acceleo-inbox>
Status: CLOSED FIXED
QA Contact:
Severity: major
Priority: P3
CC: laurent.goubet, mariot.chauvin, milesparker, rainer, stephane.begaudeau
Version: 3.0.0   
Target Milestone: ---   
Hardware: PC   
OS: Windows Vista   
Whiteboard:
Bug Depends on: 340127    
Bug Blocks:    

Description Ed Willink CLA 2011-03-15 17:01:27 EDT
With M6, after overriding isInWorkspace to return false (to work around Bug 340091), I get:

java.lang.RuntimeException: Problems running workflow GeneratePivotModel: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Runner.run(Mwe2Runner.java:96)
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Runner.run(Mwe2Runner.java:73)
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Runner.run(Mwe2Runner.java:64)
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Runner.run(Mwe2Runner.java:55)
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Launcher.run(Mwe2Launcher.java:74)
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Launcher.main(Mwe2Launcher.java:35)
Caused by: org.eclipse.emf.ecore.resource.impl.ResourceSetImpl$1DiagnosticWrappedException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.handleDemandLoadException(ResourceSetImpl.java:315)
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.demandLoadHelper(ResourceSetImpl.java:274)
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.getResource(ResourceSetImpl.java:397)
	at org.eclipse.acceleo.common.utils.ModelUtils.load(ModelUtils.java:330)
	at org.eclipse.acceleo.engine.service.AbstractAcceleoGenerator.initialize(AbstractAcceleoGenerator.java:317)
	at org.eclipse.ocl.examples.build.acceleo.GeneratePivotVisitors.<init>(GeneratePivotVisitors.java:109)
	at org.eclipse.ocl.examples.build.acceleo.MyGeneratePivotVisitors.<init>(MyGeneratePivotVisitors.java:40)
	at org.eclipse.ocl.examples.build.utilities.PivotVisitorCodeGenerator.invokeInternal(PivotVisitorCodeGenerator.java:85)
	at org.eclipse.emf.mwe.core.lib.AbstractWorkflowComponent.invoke(AbstractWorkflowComponent.java:126)
	at org.eclipse.emf.mwe.core.lib.Mwe2Bridge.invoke(Mwe2Bridge.java:34)
	at org.eclipse.emf.mwe.core.lib.AbstractWorkflowComponent.invoke(AbstractWorkflowComponent.java:201)
	at org.eclipse.emf.mwe2.runtime.workflow.AbstractCompositeWorkflowComponent.invoke(AbstractCompositeWorkflowComponent.java:31)
	at org.eclipse.emf.mwe2.runtime.workflow.AbstractCompositeWorkflowComponent.invoke(AbstractCompositeWorkflowComponent.java:31)
	at org.eclipse.emf.mwe2.runtime.workflow.Workflow.run(Workflow.java:19)
	at org.eclipse.emf.mwe2.launch.runtime.Mwe2Runner.run(Mwe2Runner.java:94)
	... 5 more
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:713)
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:586)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1740)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1437)
	at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:191)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:798)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
	at org.eclipse.emf.ecore.xmi.impl.XMLLoadImpl.load(XMLLoadImpl.java:181)
	at org.eclipse.emf.ecore.xmi.impl.XMLResourceImpl.doLoad(XMLResourceImpl.java:231)
	at org.eclipse.emf.ecore.resource.impl.ResourceImpl.load(ResourceImpl.java:1497)
	at org.eclipse.emf.ecore.resource.impl.ResourceImpl.load(ResourceImpl.java:1285)
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.demandLoad(ResourceSetImpl.java:255)
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.demandLoadHelper(ResourceSetImpl.java:270)

The 'Invalid byte 1 of 1-byte UTF-8 sequence' error occurs on the .emtl file, which appears to be in a compressed binary format rather than the ASCII XML format used in M5. Has compression been introduced in M6 without being activated standalone?
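The parser fails because it decodes the .emtl bytes as UTF-8, and a binary serialization can contain byte values that are not legal UTF-8 (for example, a continuation byte such as 0x89 in first position). As a hedged, JDK-only illustration (`Utf8Check` and `isValidUtf8` are hypothetical names, not part of EMF or Acceleo), a strict UTF-8 decoder rejects such input in the same way:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    private Utf8Check() {
    }

    /** Returns true if the bytes are well-formed UTF-8, false at the first malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // The condition Xerces reports as "Invalid byte 1 of 1-byte UTF-8 sequence".
            return false;
        }
    }
}
```

An XML header such as `<?xml version="1.0"?>` passes this check, whereas a byte array starting with 0x89 does not.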
Comment 1 Laurent Goubet CLA 2011-03-16 05:41:46 EDT
We have indeed taken advantage of EMF binary resources in M6, and used content types to determine whether we are dealing with an XMI or a binary resource when registering the resource factories... without considering for a moment that content types would be of no help in standalone mode.

Raising the severity, since binary is now the default serialization format, which would cause such errors for all users launching generations standalone.

For now, you can work around the issue by switching the serialization format back: right-click your Acceleo project and change "binary" to "XMI" in the "Acceleo Compiler" section.
Comment 2 Ed Willink CLA 2011-03-18 13:21:08 EDT
I'm seeing this come from the builder too.

Thread [Worker-3] (Suspended (exception MalformedByteSequenceException))	
	UTF8Reader.invalidByte(int, int, int) line: 713	
	UTF8Reader.read(char[], int, int) line: 586	
	XMLEntityScanner.load(int, boolean) line: 1740	
	XMLEntityScanner.skipString(String) line: 1437	
	XMLVersionDetector.determineDocVersion(XMLInputSource) line: 191	
	JAXPConfiguration(XML11Configuration).parse(boolean) line: 798	
	JAXPConfiguration(XML11Configuration).parse(XMLInputSource) line: 764	
	SAXParser(XMLParser).parse(XMLInputSource) line: 148	
	SAXParser(AbstractSAXParser).parse(InputSource) line: 1242	
	SAXParserImpl(SAXParser).parse(InputSource, DefaultHandler) line: 375	
	XMLRootHandler.parseContents(InputSource) line: 174	
	XMLRootElementContentDescriber2.fillContentProperties(InputSource, Map) line: 207	
	XMLRootElementContentDescriber2.checkCriteria(InputSource, Map) line: 132	
	XMLRootElementContentDescriber2.describe(InputStream, IContentDescription, Map) line: 173	
	ContentTypeCatalog.describe(ContentType, ILazySource, ContentDescription, Map) line: 214	
	ContentTypeCatalog.collectMatchingByContents(int, IContentType[], List, ILazySource, Map) line: 190	
	ContentTypeCatalog.internalFindContentTypesFor(ILazySource, IContentType[][], Comparator, Comparator) line: 403	
	ContentTypeCatalog.internalFindContentTypesFor(ContentTypeMatcher, ILazySource, String, boolean) line: 450	
	ContentTypeCatalog.getDescriptionFor(ContentTypeMatcher, ILazySource, String, QualifiedName[]) line: 346	
	ContentTypeCatalog.getDescriptionFor(ContentTypeMatcher, InputStream, String, QualifiedName[]) line: 360	
	ContentTypeMatcher.getDescriptionFor(InputStream, String, QualifiedName[]) line: 86	
	ContentDescriptionManager.readDescription(File) line: 435	
	ContentDescriptionManager.getDescriptionFor(File, ResourceInfo) line: 345	
	File.getContentDescription() line: 275	
	PlatformResourceURIHandlerImpl$WorkbenchHelper.getContentDescription(String, Map<?,?>) line: 369	
	PlatformContentHandlerImpl.contentDescription(URI, InputStream, Map<?,?>, Map<Object,Object>) line: 88	
	PlatformResourceURIHandlerImpl(URIHandlerImpl).contentDescription(URI, Map<?,?>) line: 267	
	ExtensibleURIConverterImpl.contentDescription(URI, Map<?,?>) line: 362	
	ResourceSetImpl$2(ResourceFactoryRegistryImpl).getContentTypeIdentifier(URI) line: 164	
	ResourceSetImpl$2(ResourceFactoryRegistryImpl).getFactory(URI, Map<String,Object>, Map<String,Object>, Map<String,Object>, String, boolean) line: 130	
	ResourceSetImpl$2.delegatedGetFactory(URI, String) line: 450	
	ResourceSetImpl$2(ResourceFactoryRegistryImpl).getFactory(URI, Map<String,Object>, Map<String,Object>, Map<String,Object>, String, boolean) line: 151	
	ResourceSetImpl$2(ResourceFactoryRegistryImpl).getFactory(URI, String) line: 92	
	ResourceSetImpl.createResource(URI, String) line: 422	
	ResourceSetImpl.demandCreateResource(URI) line: 239	
	ResourceSetImpl.getResource(URI, boolean) line: 391	
	ModelUtils.load(URI, ResourceSet) line: 330	
	CreateRunnableAcceleoOperation.run(IProgressMonitor) line: 122	
	AcceleoCompileOperation.doCompileResources(IProgressMonitor) line: 225	
	AcceleoCompileOperation.run(IProgressMonitor) line: 127	
	AcceleoBuilder.incrementalBuild(IResourceDelta, IProgressMonitor) line: 232	
	AcceleoBuilder.build(int, Map, IProgressMonitor) line: 94	
	BuildManager$2.run() line: 717	
	SafeRunner.run(ISafeRunnable) line: 42	
	BuildManager.basicBuild(int, IncrementalProjectBuilder, Map<String,String>, MultiStatus, IProgressMonitor) line: 191	
	BuildManager.basicBuild(IBuildConfiguration, int, IBuildContext, ICommand[], MultiStatus, IProgressMonitor) line: 228	
	BuildManager$1.run() line: 281	
	SafeRunner.run(ISafeRunnable) line: 42	
	BuildManager.basicBuild(IBuildConfiguration, int, IBuildContext, MultiStatus, IProgressMonitor) line: 284	
	BuildManager.basicBuildLoop(IBuildConfiguration[], IBuildConfiguration[], int, MultiStatus, IProgressMonitor) line: 340	
	BuildManager.build(IBuildConfiguration[], IBuildConfiguration[], int, IProgressMonitor) line: 363	
	AutoBuildJob.doBuild(IProgressMonitor) line: 143	
	AutoBuildJob.run(IProgressMonitor) line: 241	
	Worker.run() line: 54
Comment 3 Stephane Begaudeau CLA 2011-03-21 09:28:08 EDT
We use Eclipse extension points to register our content describer, but they are not used in standalone mode, so we will now use a custom ResourceSet (AcceleoResourceSetImpl) that delegates everything except the creation of the ResourceFactoryRegistry.

We also have a custom ResourceFactoryRegistry (AcceleoResourceFactoryRegistry) that delegates everything, but if loading an "emtl" resource fails, we will test the file with our own content describer to determine whether we are dealing with an XMI or a binary resource. If that also fails, we will let EMF use a basic ResourceImpl.
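The fallback described above (try the registered factory, then inspect the file, then settle for a plain ResourceImpl) depends on telling XMI from binary by content. A rough, JDK-only sketch of that inspection follows. It assumes that an XMI file starts with '<' after an optional UTF-8 byte order mark and whitespace, and that anything else is binary; `EmtlFormat` and `EmtlSniffer` are hypothetical names, not Acceleo's actual content describer:

```java
/** Hypothetical classification of an .emtl file; not an Acceleo type. */
enum EmtlFormat { XMI, BINARY, UNKNOWN }

final class EmtlSniffer {
    private EmtlSniffer() {
    }

    /**
     * Guesses the serialization from the first bytes of a file: '<' after an optional
     * UTF-8 byte order mark and XML whitespace means XMI, any other byte means binary,
     * and no significant byte at all means the format cannot be decided.
     */
    static EmtlFormat sniff(byte[] head) {
        int i = 0;
        // Skip a UTF-8 byte order mark (EF BB BF) if present.
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip leading XML whitespace.
        while (i < head.length
                && (head[i] == ' ' || head[i] == '\t' || head[i] == '\r' || head[i] == '\n')) {
            i++;
        }
        if (i == head.length) {
            return EmtlFormat.UNKNOWN; // e.g. an empty file: fall back to a plain resource
        }
        return head[i] == '<' ? EmtlFormat.XMI : EmtlFormat.BINARY;
    }
}
```

In practice the caller would read only the first few bytes of the file (for example with `InputStream.readNBytes(64)`, Java 11+) and pass them in, so a large binary file is never read in full just to classify it.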

I'll close this issue once the standalone test suite has been created.
Comment 4 Ed Willink CLA 2011-03-21 09:34:20 EDT
Please allow me to very strongly discourage the use of a custom ResourceSet; I did
it myself and it makes it very difficult to coexist with other tools that don't
use the 'correct' ResourceSet class. Too much code expects to do new
ResourceSetImpl() and then pass it around.

Please try to just use an eAdapter to extend whatever ResourceSet is in use,
and perhaps provide an initializeResourceSet(xxx) method to prepare it for
use.
Comment 5 Stephane Begaudeau CLA 2011-03-21 11:38:52 EDT
I am not requiring anyone else to use my resource set. When a resource set is provided by someone else, I configure it; when Acceleo creates a resource set for internal use, I use my custom resource set, which is simply a standard resource set with this configuration. The configuration is very simple:

- resourceSet.setResourceFactoryRegistry(new AcceleoResourceFactoryRegistry());

I considered dropping this custom resource set too, since the class is very short; it mainly exists to ensure that whoever uses this resource set does not forget to configure the resource factory registry:

public class AcceleoResourceSetImpl extends ResourceSetImpl {

	/**
	 * The constructor.
	 */
	public AcceleoResourceSetImpl() {
		super();
		resourceFactoryRegistry = new AcceleoResourceFactoryRegistry();
	}
}
Comment 6 Stephane Begaudeau CLA 2011-03-22 10:46:51 EDT
Fix for standalone generation committed to HEAD.

If the user provides an already loaded EObject for the generation, we configure its resource set to use our ResourceFactoryRegistry, and after the generation we restore its original ResourceFactoryRegistry.

If the user only gives us the URI of the model, we load it in our own preconfigured resource set.

Our ResourceFactoryRegistry can also be created with another ResourceFactoryRegistry as a delegate. Used that way, every request for a factory that we can't handle is delegated to it. If the user provides the loaded EObject, the original ResourceFactoryRegistry of its ResourceSet is used as the delegate.
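The configure-then-restore scheme and the delegating registry described above can be modeled without EMF. This is a hedged, JDK-only sketch: `FactoryRegistry`, `DelegatingRegistry`, `ModelContainer` and `withDelegatingRegistry` are hypothetical stand-ins for EMF's `Resource.Factory.Registry`, Acceleo's `AcceleoResourceFactoryRegistry` and a `ResourceSet`, and factories are represented by plain strings:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a resource factory registry: file extension to factory name. */
interface FactoryRegistry {
    String getFactory(String fileExtension);
}

/** Handles the extensions it knows and delegates every other request to the original registry. */
final class DelegatingRegistry implements FactoryRegistry {
    private final Map<String, String> own = new HashMap<>();
    private final FactoryRegistry delegate;

    DelegatingRegistry(FactoryRegistry delegate) {
        this.delegate = delegate;
        own.put("emtl", "binary-or-xmi"); // the one case this registry handles itself
    }

    @Override
    public String getFactory(String fileExtension) {
        String factory = own.get(fileExtension);
        return factory != null ? factory : delegate.getFactory(fileExtension);
    }
}

/** Hypothetical stand-in for a user-provided resource set that exposes its registry. */
final class ModelContainer {
    private FactoryRegistry registry;

    ModelContainer(FactoryRegistry registry) { this.registry = registry; }
    FactoryRegistry getRegistry() { return registry; }
    void setRegistry(FactoryRegistry registry) { this.registry = registry; }
}

final class Generation {
    private Generation() {
    }

    /** Installs the delegating registry for the duration of the task, then restores the original. */
    static <T> T withDelegatingRegistry(ModelContainer container, Supplier<T> task) {
        FactoryRegistry original = container.getRegistry();
        container.setRegistry(new DelegatingRegistry(original));
        try {
            return task.get();
        } finally {
            container.setRegistry(original); // restored even if the task throws
        }
    }
}
```

Restoring in a `finally` block is a choice made for this sketch, so that the caller's registry comes back even if generation fails; the comment above only says the original registry is restored after the generation.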

Unit tests have also been created to verify that binary and XMI resources load correctly in standalone mode.
Comment 7 Ed Willink CLA 2011-03-23 17:29:21 EDT
Works for me in M6a. Ta.
Comment 8 Miles Parker CLA 2011-12-19 16:25:43 EST
Stephane,

I'm having the same problem with my own code. In this case I have a very simple binary resource defined, and I'm determining the type based on the file extension. I've customized the workflow launcher to add some classpath entries, but otherwise it's standard.

Can I ask, how did you modify your setup so that the workflow would accept binary resources without croaking on the UTF error?

thanks,

Miles
Comment 9 Stephane Begaudeau CLA 2011-12-20 03:42:30 EST
If you want to use an EMF model serialized as a binary resource, you have two potential solutions. If your files can only contain binary resources, you just have to register a binary resource factory for their file extension. If a file can contain either a binary or an XMI resource, you need a content describer that looks at the content of the file to determine which factory to use.

1- Use content types to determine whether your resource is a binary or an XMI resource, so that when you load your model, EMF looks at its content type to find the matching factory:

resourceSet.getResourceFactoryRegistry().getContentTypeToFactoryMap().put("myContentType", new CustomBinaryResourceFactoryImpl());

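import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl;
import org.eclipse.emf.ecore.resource.impl.ResourceFactoryImpl;

public class CustomBinaryResourceFactoryImpl extends ResourceFactoryImpl {
    @Override
    public Resource createResource(URI uri) {
        return new CustomBinaryResourceImpl(uri);
    }
}

public class CustomBinaryResourceImpl extends BinaryResourceImpl {
    public CustomBinaryResourceImpl(URI uri) {
        super(uri);
    }
    ....
}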

When creating resources, you can pass the content type explicitly:
resourceSet.createResource(myModelURI, "myContentType");

Do the same with an XMI content type and an XMI-based resource factory.

You then need to register the factories for your content types in plugin.xml, like this:
<extension
      point="org.eclipse.emf.ecore.content_parser">
   <parser
         class="org.eclipse.CustomResourceFactoryImpl"
         contentTypeIdentifier="myXmiContentType">
   </parser>
   <parser
         class="org.eclipse.CustomBinaryResourceFactoryImpl"
         contentTypeIdentifier="myContentType">
   </parser>
</extension>

Then, still in plugin.xml, register the content types themselves (their ids correspond to the contentTypeIdentifier values above), together with the describers that read the file to compute the matching content type:

<extension
      point="org.eclipse.core.contenttype.contentTypes">
   <content-type
         base-type="org.eclipse.core.runtime.xml"
         file-extensions="emtl"
         id="myXmiContentType"
         name="XMI File"
         priority="low">
      <describer
            class="org.eclipse.core.runtime.content.XMLRootElementContentDescriber2">
         <parameter
               name="element"
               value="{}">
         </parameter>
      </describer>
   </content-type>
   <content-type
         describer="org.eclipse.acceleo.model.mtl.resource.CustomBinaryResourceContentDescriber"
         file-extensions="emtl"
         id="myContentType"
         name="Binary File"
         priority="normal">
   </content-type>
</extension>

With the class:
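import java.io.IOException;
import java.io.InputStream;
import org.eclipse.core.runtime.QualifiedName;
import org.eclipse.core.runtime.content.IContentDescriber;
import org.eclipse.core.runtime.content.IContentDescription;
import org.eclipse.core.runtime.content.XMLContentDescriber;
public class CustomBinaryResourceContentDescriber implements IContentDescriber {
    public int describe(InputStream contents, IContentDescription description) throws IOException {
        // An XML file cannot be a binary resource.
        if (new XMLContentDescriber().describe(contents, description) == VALID) {
            return INVALID;
        }
        return VALID;
    }
    public QualifiedName[] getSupportedOptions() {
        return new QualifiedName[0];
    }
}
(We check whether it is an XML file with XMLContentDescriber; if it is, then it is not a binary resource.)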

2- Alternatively, you can register the factory directly for your file extension (if your files can only be serialized as binary resources, so you don't have to analyze the file to see whether it holds a binary or an XMI resource), like this:

(Register your factory in the global registry)
Resource.Factory.Registry.INSTANCE.getExtensionToFactoryMap().put("myExtension", new CustomBinaryResourceFactoryImpl());

(or register it only for your resource set)
resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap().put("myExtension", new CustomBinaryResourceFactoryImpl());

And after that, you can load the resource in the resource set without any trouble.
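Option 2 sidesteps the standalone problem because, as far as I understand EMF's ResourceFactoryRegistryImpl, a match in the extension map is used before any content-type lookup, so the file content is never inspected. The following is a simplified, hypothetical model of that lookup order (extension, then content type, then the "*" default), not EMF's actual code; `FactoryLookup`, its fields and the string factories are illustrative names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Simplified, hypothetical model of factory lookup: extension, then content type, then "*". */
final class FactoryLookup {
    final Map<String, String> extensionToFactory = new HashMap<>();
    final Map<String, String> contentTypeToFactory = new HashMap<>();

    String getFactory(String fileExtension, Supplier<String> contentTypeSniffer) {
        String factory = extensionToFactory.get(fileExtension);
        if (factory == null && !contentTypeToFactory.isEmpty()) {
            // Only now is the file content inspected (the step that needs content describers).
            String contentType = contentTypeSniffer.get();
            if (contentType != null) {
                factory = contentTypeToFactory.get(contentType);
            }
        }
        if (factory == null) {
            factory = extensionToFactory.get("*"); // catch-all default, if registered
        }
        return factory;
    }
}
```

With "myExtension" registered directly, the sniffer supplier is never called; with only a content-type registration, every lookup for an unregistered extension has to inspect the file first.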


Comment 10 Miles Parker CLA 2011-12-20 13:40:22 EST
(In reply to comment #9)
> 
> 2- Or you could register directly the factory for your extension (if your files
> can only be serialized as binary resources so you don't have to analyze the
> file to see if you have a binary or a xmi resource) like this:

Thanks Stephane. And yes, that's the approach I've been taking. In fact, I wrote a blog post about it. :)

http://milesparker.blogspot.com/2011/01/supporting-multiple-resource-types-with.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+MetaBeta+%28meta+beta%29

That all works great, except that it breaks with the MWE workflow loader itself. Here, the loader is essentially running standalone and doesn't seem to pick up the additional extension-to-factory mapping (I'm actually using two extensions, one for the binary format and one for the plain XML format). MWE has some complex injector stuff going on, as you know, and I've tried to create my own runtime resource set initializer, but that doesn't seem to work. Perhaps some additional XML configuration is needed here. So, about what you said:

"we will configure its resource set to use our ResourceFactoryRegistry and after the generation we will restore its original ResourceFactoryRegistry."

Where exactly did you do that?
Comment 11 Stephane Begaudeau CLA 2011-12-21 08:07:20 EST
(In reply to comment #10)
> That all works great, except that it breaks with the MWE workflow loader
> itself. Here, the loader is essentially running standalone, and doesn't seem to
> be picking up the additional factory to extension mapping (I'm actually using
> two extensions, one for the bin and one for the straight xml formats for the
> files). MWE has some complex injector stuff going on as you know, and I've
> tried to create my own runtime resource set initializer but that doesn't seem
> to be working. Perhaps here there is some additional xml configuration that
> needs to be happening. So I was wondering when you said..
> 
> "we will configure its resource set to use our ResourceFactoryRegistry and
> after the generation we will restore its original ResourceFactoryRegistry."
> 
> I was wondering where you did that?

In the Java launcher of an Acceleo generator, the constructors call the initialize(...) methods. In both initialize(...) methods we have this:

resourceFactoryRegistry = resourceSet.getResourceFactoryRegistry();
resourceSet.setResourceFactoryRegistry(new AcceleoResourceFactoryRegistry(resourceFactoryRegistry));

And then, in the method "postGenerate(...)" of this java launcher, we are doing this:

resourceSet.setResourceFactoryRegistry(resourceFactoryRegistry);

All those methods are protected, so you can override them if you want to customize this behavior. Our resource factory registry uses the original resource factory registry as a delegate.

Stephane.