Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 368019

Summary: [generator] JavaIoFileSystemAccess always uses default char encoding
Product: [Modeling] TMF Reporter: Jan Koehnlein <jan>
Component: XtextAssignee: Jan Koehnlein <jan>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: sebastian.zarnekow, stephan.herrmann, tmf.xtext-inbox
Version: 2.2.1Flags: jan: juno+
Target Milestone: M5   
Hardware: All   
OS: All   
Whiteboard:

Description Jan Koehnlein CLA 2012-01-06 06:40:13 EST
Currently, there is no way to set the encoding of files created by the JavaIoFileSystemAccess. It uses a FileWriter which assumes the default encoding. This should also be bad for the standalone compiler.

In the UI scenario (EclipseResourceFileSystemAccess2) we take the encoding from the workspace settings, which is a good idea. 

For model files, we have the IEncodingProvider that takes care of workspace settings in the UI scenario. As different languges could generate to the same target language, reusing the IEncodingProvider does not seem to be an option.

Alternatively, we could store the encoding in the OutputConfiguration causing an asymmetry with the EclipseResourceFSA. 

Any opinions?
Comment 1 Sebastian Zarnekow CLA 2012-01-06 06:59:36 EST
When writing files we should use the encoding provider of the written file type (if available) and not of the language that is contributing the generator. Otherwise the files could probably never be read correctly by the language implementation, could they?
Comment 2 Jan Koehnlein CLA 2012-01-06 07:04:03 EST
(In reply to comment #1)

Yes, but usually there is no provider for the target language. Otherwise we'd need IResourceServiceProviders for all target languages.
Comment 3 Sebastian Zarnekow CLA 2012-01-06 07:15:39 EST
(In reply to comment #2)
> (In reply to comment #1)
> 
> Yes, but usually there is no provider for the target language. Otherwise we'd
> need IResourceServiceProviders for all target languages.

I think it's reasonable to assume that the target language will be read with the default encoding of the system / of the containing eclipse project / folder if no ResourceServiceProvider is available.
Comment 4 Jan Koehnlein CLA 2012-01-17 10:03:37 EST
Pushed to MASTER.
Comment 5 Stephan Herrmann CLA 2012-05-16 12:57:38 EDT
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > 
> > Yes, but usually there is no provider for the target language. Otherwise we'd
> > need IResourceServiceProviders for all target languages.
> 
> I think it's reasonable to assume that the target language will be read with
> the default encoding of the system / of the containing eclipse project / folder
> if no ResourceServiceProvider is available.

Using the default encoding of the containing eclipse project would be fine, but looking at the corresponding commit that doesn't seem to be implemented?

Any recommendations how to achieve that effect?
Comment 6 Sebastian Zarnekow CLA 2012-05-18 17:47:21 EDT
The encoding is determined like this:

protected String getEncoding(IFile fileToBeWritten) throws CoreException {
  return fileToBeWritten.getCharset(true);
}

What problems do you face with that approach?
Comment 7 Sebastian Zarnekow CLA 2012-05-18 17:51:02 EDT
(In reply to comment #6)
> The encoding is determined like this:
> 
> protected String getEncoding(IFile fileToBeWritten) throws CoreException {
>   return fileToBeWritten.getCharset(true);
> }
> 
> What problems do you face with that approach?

Sorry, I missed the actual subject of the ticket. I'm afraid you'll have to provide an IEncodingProvider that tries to read the encoding from the .settings folder. I don't see another option.
Comment 8 Stephan Herrmann CLA 2012-05-18 19:04:06 EDT
I think the following is pretty common:

- workspace has platform dependent default encoding, nobody cares because ...
- all projects properly define UTF-8 as their encoding

Now, shouldn't all files be generated in UTF-8 by default? But they aren't,
they still use the workspace default.

I had hoped that this bug would fix the common problems once and for all :)

Currently, I worked around this by sub-classing JavaIoFileSystemAccess
and letting the generator call myFileSystemAccess.setEncoding(enc).

I have no idea about the impact of providing my on IEncodingProvider,
but I'm afraid doing relevant work there will further degrade performance
when *reading* lots of resources, whereas I only want to influence the
*writing* of resources created by the file system access.

Maybe I'll try to make my file system access smarter to infer the
encoding from any enclosing folder inside the workspace ...
Comment 9 Stephan Herrmann CLA 2012-05-18 19:48:23 EDT
Reopening because I have a solution that seems to deliver what was already
promised in comment 3 :)

(In reply to comment #3)
> I think it's reasonable to assume that the target language will be read with
> the default encoding of the system / of the containing eclipse project / folder
> if no ResourceServiceProvider is available.


Here's what I came up with:

In the JavaIoFileSystemAccess implement getEncoding like so:

	protected String getEncoding(URI fileURI) {
		IResourceServiceProvider resourceServiceProvider = registry.getResourceServiceProvider(fileURI);
		if(resourceServiceProvider != null)
			return resourceServiceProvider.getEncodingProvider().getEncoding(fileURI);
		String containerEncoding = ResourceUtil.findEncodingFromContainer(fileURI);
		if (containerEncoding != null)
			return containerEncoding;
		return encodingProvider.getEncoding(fileURI);
	}

where ResourceUtil.findEncodingFromContainer is this:

	public static String findEncodingFromContainer(URI uri) {
		try {				
			IWorkspaceRoot wsRoot = ResourcesPlugin.getWorkspace().getRoot();
			IContainer container = null;
			if (uri.isPlatformResource()) {
				IPath path = new Path(uri.toPlatformString(true));
				container = wsRoot.getProject(path.segment(0));
				for (int s=1; s<path.segmentCount()-1; s++)
					container = container.getFolder(new Path(path.segment(s)));
			} else if (uri.isFile()) {
				IPath path = new Path(uri.toFileString());
				container = wsRoot.getContainerForLocation(path);
			}
			if (container != null)
				return container.getDefaultCharset();
		} catch (Throwable t) {
			// nop
		}
		return null; // no encoding found
	}

Sorry about the layout ...
Comment 10 Sebastian Zarnekow CLA 2012-05-19 04:30:38 EDT
I'm afaid your solution does not work in general since it assumes a workspace to be present. 

Btw: What your utility does is pretty much the same as what IFile.getCharset(true) does.

Injecting an own IEncodingProvider that deals with the file extension of the generated artifacts is the way to go. There you could do he WS specific stuff but it won't work for Xtext in general.
Comment 11 Stephan Herrmann CLA 2012-05-19 09:00:23 EDT
(In reply to comment #10)
> I'm afaid your solution does not work in general since it assumes a workspace
> to be present. 

Sorry, my code snippet isn't beautiful, but if no workspace is opened
it just catches the IllegalStateException and continues with the
original strategy. So all is clean in fact.
 
> Btw: What your utility does is pretty much the same as what
> IFile.getCharset(true) does.

thanks for the hint.

> Injecting an own IEncodingProvider that deals with the file extension of the
> generated artifacts is the way to go. There you could do he WS specific stuff
> but it won't work for Xtext in general.

Can I register an IEncodingProvider just for the *writing* case??

I'll stick to my special file system access. I just wanted to help that
future users of Xtext wouldn't even have to worry about this.
I understood that your goal was exactly to respect the encoding of
the enclosing folder / project. Why not implement this for the case
where a workspace exists? I'm sure that's what people typically expect.
Comment 12 Sebastian Zarnekow CLA 2012-05-19 09:27:33 EDT
(In reply to comment #11)
> (In reply to comment #10)
> > I'm afaid your solution does not work in general since it assumes a workspace
> > to be present. 
> 
> Sorry, my code snippet isn't beautiful, but if no workspace is opened
> it just catches the IllegalStateException and continues with the
> original strategy. So all is clean in fact.
> 

Well, if you have a workspace available, you could and should use the EclipseResourceFileSystemAccess2 instead of the JavaIoFileSystemAccess. It's not a good idea to introduce a dep to the resources abstraction from a component that's only dependent on Java io files.

> Can I register an IEncodingProvider just for the *writing* case??
> 

Why would you want to do that? If I'm not mistaken, you have a project setting for the encoding which says 'UTF-8' or something. Reading and writing files to that project should all be performed with that encoding, shouldn't it?
Comment 13 Eclipse Webmaster CLA 2017-10-31 11:23:59 EDT
Requested via bug 522520.

-M.