Bug 421776

Summary: [server] Metadata fails on project name with emoji
Product: [ECD] Orion
Reporter: John Arthorne <john.arthorne>
Component: Server
Assignee: Anthony Hunter <ahunter.eclipse>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: ahunter.eclipse, mamacdon
Version: unspecified
Target Milestone: 5.0 M1
Hardware: PC
OS: Windows 7
Whiteboard:
Attachments:
Description: Screen shot of the browser working on Linux; Flags: none
Description: screenshot of legacy vs simple metastore on Windows; Flags: none

Description John Arthorne CLA 2013-11-14 14:38:59 EST
If you have a project name with unprintable characters (e.g., emoji), the simple metadata store can't seem to handle it.  There is a workspace with a project called "ð©". The search indexer has code like this:

for (String projectName : workspace.getProjectNames()) {
  ProjectInfo project = store.readProject(workspace.getUniqueId(), projectName);
  // (rest of the loop body omitted)
}

readProject is returning null, even though the workspace claims to contain that project.
Comment 1 Anthony Hunter CLA 2013-11-14 17:31:06 EST
I can successfully create a project programmatically with emoji characters.

The client can successfully get this list of projects and display it on the editor page and in the navigator. However, you cannot get at anything under the project, nor create files or folders in the project:

{"HttpCode":404,"Message":"File not found: /anthony-OrionContent/Project ὃ6ὃ5/","Severity":"Error","Code":0}
Comment 2 John Arthorne CLA 2013-11-15 17:35:09 EST
So if create succeeds but I then can't access anything in the folder, that sounds like a bug. If we can't represent the name on disk, then creation should fail; if we can represent it, then I should be able to access the contents.
Comment 3 Anthony Hunter CLA 2013-11-18 14:43:17 EST
Created attachment 237537 [details]
Screen shot of the browser working on Linux

This is not a problem running against Linux. The screen shot shows the successful creation and edit of a project and file with emoji characters.

On Windows, however, we return an error 500 because we cannot create files with these characters. We need to return a proper error that is displayed to the user.
Comment 4 Mark Macdonald CLA 2013-11-18 16:28:10 EST
Created attachment 237538 [details]
screenshot of legacy vs simple metastore on Windows

(In reply to Anthony Hunter from comment #3)
> On Windows however, we return an error 500 because we cannot create files
> with these characters, we need to return a proper error that is displayed to
> the user.

I don't think this is a Windows problem. On Windows I can use emojis everywhere when I'm running Orion with the legacy metastore.

But when I use the simple metastore, emojis only work in subfolder and file names. Using an emoji as a top-level folder breaks -- the emoji characters seem to be corrupted by `workspace.json`.

Attaching a pic showing
Comment 5 Mark Macdonald CLA 2013-11-18 16:37:53 EST
(In reply to Mark Macdonald from comment #4)
> Attaching a pic showing

…Attaching a pic showing 
Comment 6 Mark Macdonald CLA 2013-11-18 16:38:55 EST
(In reply to Mark Macdonald from comment #4)
> Attaching a pic showing

So Bugzilla doesn't like emojis either. The character I used in the screenshot was U+1F424, and I tried it in top-level folders, subfolders, and filenames.
Comment 7 Mark Macdonald CLA 2013-11-18 16:49:45 EST
I did my testing on a local server, and noticed that the VM had been running with
> -Dfile.encoding=Cp1252

When I change that to UTF-8, everything just works.

So I think the problem is that the simple metastore relies on the default JVM encoding when it writes your metadata files. It needs to either always write them as UTF-8, or perhaps just encode all non-ASCII characters as escapes (as the legacy metastore did).
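
To illustrate the two options above, here is a minimal sketch in plain Java. It is not Orion code: the class and method names are made up for the example, ISO-8859-1 stands in for Cp1252 (neither can represent U+1F424), and the JSON-style \uXXXX escape format is an assumption, since the legacy metastore's actual escaping is not shown in this bug.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Orion.
final class EncodingRoundTrip {

    /** Encodes text to bytes with the given charset and decodes it back. */
    static String roundTrip(String text, Charset charset) {
        // getBytes() silently substitutes a replacement byte for unmappable characters
        return new String(text.getBytes(charset), charset);
    }

    /** Replaces every non-ASCII UTF-16 code unit with a \\uXXXX escape (assumed format). */
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Project " + new String(Character.toChars(0x1F424));
        System.out.println(name.equals(roundTrip(name, StandardCharsets.UTF_8)));      // true
        System.out.println(name.equals(roundTrip(name, StandardCharsets.ISO_8859_1))); // false
        System.out.println(escapeNonAscii(name)); // Project \ud83d\udc24
    }
}
```

With a single-byte default encoding the name comes back altered, which would be consistent with a lookup by the original name finding nothing; the escaped form is pure ASCII and survives any of these encodings.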
Comment 8 Anthony Hunter CLA 2013-11-18 17:48:12 EST
(In reply to Mark Macdonald from comment #7)
> I did my testing on a local server, and noticed that the VM had been running
> with
> > -Dfile.encoding=Cp1252
> 
> When I change that to UTF-8, everything just works.
> 
> So I think the problem is that the simple metastore relies on the default
> JVM encoding when it writes your metadata files. It needs to either always
> write them as UTF-8, or perhaps just encode all non-ASCII characters as
> escapes (as the legacy metastore did).

I committed a test:
http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?id=477197eabaf25b0ca8ff859b343be152dbf54c2b

This test is successful on Linux but fails on Windows.

However, I can fail the test on Linux by changing the encoding on Linux to ISO-8859-1.

After some quick reading, I see that the FileReader and FileWriter I am using do not set the character encoding explicitly, which matches what you have shown. This needs to be fixed.
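
For reference, one way to pin the encoding is to wrap the file streams in an OutputStreamWriter/InputStreamReader with an explicit charset instead of using FileWriter/FileReader, which fall back to the platform default when no charset is given. This is only a sketch under that assumption; the class name Utf8MetaFile and its methods are hypothetical and this is not necessarily what the eventual fix looks like.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the actual metastore code.
final class Utf8MetaFile {

    /** Writes content as UTF-8, regardless of -Dfile.encoding. */
    static void write(File file, String content) throws IOException {
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            w.write(content);
        }
    }

    /** Reads the whole file back as UTF-8, regardless of -Dfile.encoding. */
    static String read(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Because both sides name the charset, the result should be the same whether the VM runs with Cp1252, ISO-8859-1, or UTF-8 as its default.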
Comment 9 Anthony Hunter CLA 2013-11-19 11:59:25 EST
This problem has been fixed with commit:
http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?id=75f8128ebb004cb0185e6ee4f9973239cad4b022