
Bug 420082

Summary: Cannot open files whose name includes certain character combinations
Product: [ECD] Orion
Reporter: Mark Macdonald <mamacdon>
Component: Client
Assignee: Simon Kaegi <simon_kaegi>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: simon_kaegi
Version: 3.0
Flags: simon_kaegi: review+
Target Milestone: 4.0 RC3   
Hardware: PC   
OS: Windows 7   
Whiteboard:

Description Mark Macdonald CLA 2013-10-22 10:19:33 EDT
Bug occurs in Firefox (25) and Chrome (30).

1. Go to the editor page
2. Create a file named "foo 你好.js"
3. Click on it. 
4. You get a 404 error:
> File not found: /user/project/foo%20你好.js

The file's href in the navigator is #/file/user/project/foo%2520%E4%BD%A0%E5%A5%BD.js (Note the apparent extra "%25" after foo)

Strangely, the following similar filenames all work OK:
> foo bar.js
> 你好.js
> foo你好.js

What seems to break it is the combination of Chinese characters and the encoded space.
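The href and the 404 message above can be compared with a short check. This is only an illustrative sketch using Python's standard `urllib.parse`, not Orion's client code; it shows that decoding the href segment once yields exactly the name in the error message, which is consistent with the space having been percent-encoded twice.

```python
from urllib.parse import quote, unquote

name = "foo 你好.js"
# Last segment of the navigator href quoted in the report.
href_segment = "foo%2520%E4%BD%A0%E5%A5%BD.js"

# What a single round of percent-encoding of the name looks like.
encoded_once = quote(name, safe="")

# Decoding the href once leaves a literal "%20" behind.
decoded_once = unquote(href_segment)
# Only a second decode recovers the original file name.
decoded_twice = unquote(decoded_once)

print(encoded_once)   # foo%20%E4%BD%A0%E5%A5%BD.js
print(decoded_once)   # foo%20你好.js
print(decoded_twice)  # foo 你好.js
```

The once-decoded value matches the path in the "File not found" message, so the server appears to decode one time while the href carries two layers of encoding for the space.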
Comment 1 Mark Macdonald CLA 2013-10-22 10:25:32 EDT
Also, using a semicolon in a filename produces a file that cannot be deleted from the navigator. For example:

1. Create a file named "foo;bar.js"
2. Select it, try to delete. 
3. "File not found: /user/project/foo"  <-- note everything after the semicolon is missing
Comment 2 Simon Kaegi CLA 2013-10-22 17:51:34 EDT
I've pushed a fix worked on by Mark and Ken to address the encoding issues and the raw problem. We still need to address the semi-colon of doom before closing this bug.
Comment 3 Simon Kaegi CLA 2013-10-22 22:49:39 EDT
Semi-colons in path segments are stripped from the request, along with everything after them in that segment (at least in Jetty). For example, when I call req.getRequestURI(), it's as if the semi-colon bit was never there.

The reason is that, in the URI sense, the semi-colon starts a path parameter, and the handling of path parameters is incredibly flaky and inconsistent across servlet containers. Regardless, our server NEEDS to encode semi-colons and likely a few other characters that are reserved in path segments.

more info here -- http://cdivilly.wordpress.com/2011/04/22/java-servlets-uri-parameters/

"Within a path segment, the characters "/", ";", "=", and "?" are reserved"
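The truncation described in this comment can be reproduced outside a servlet container. The sketch below uses Python's `urllib.parse.urlparse` purely as a stand-in for a parser that honours path parameters; it is not Jetty, and the host name `orion.example` is a made-up placeholder.

```python
from urllib.parse import urlparse

# Hypothetical host; the path mirrors the one in the report.
raw = urlparse("http://orion.example/file/user/project/foo;bar.js")
print(raw.path)    # /file/user/project/foo
print(raw.params)  # bar.js

# With the semi-colon percent-encoded, nothing is split off.
encoded = urlparse("http://orion.example/file/user/project/foo%3Bbar.js")
print(encoded.path)    # /file/user/project/foo%3Bbar.js
print(encoded.params)  # (empty string)
```

The unencoded form loses "bar.js" from the path, just like the "File not found: /user/project/foo" message in Comment 1, while the encoded form keeps the segment intact.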
Comment 4 Simon Kaegi CLA 2013-10-23 13:45:37 EDT
Another really problematic character is the beloved ":" (colon). A colon is actually legal in a URI path; however, our server uses Eclipse's Path and IPath, which treat a colon as part of a device.

For both ":" and ";" the correct solution is to encode them. However, the way we're using Java URI on the server ends up being lossy with respect to encoded but legal URI path characters.
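The report does not show the server code, so the following is only a guess at the kind of loss meant here, illustrated in Python rather than with Java's URI class: once an encoded path has been decoded, an encoded ";" can no longer be told apart from a literal one, and parsing the decoded form again truncates it. The host name is a hypothetical placeholder.

```python
from urllib.parse import unquote, urlparse

# An encoded semi-colon arrives intact in the request path.
incoming = urlparse("http://orion.example/file/user/project/foo%3Bbar.js")

# Decoding the path discards the fact that ";" was encoded.
decoded_path = unquote(incoming.path)

# Building a URL from the decoded path and parsing it again
# treats ";" as the start of a path parameter.
reparsed = urlparse("http://orion.example" + decoded_path)

print(decoded_path)   # /file/user/project/foo;bar.js
print(reparsed.path)  # /file/user/project/foo
```

Keeping the path in its encoded form until the last possible moment avoids this kind of round-trip loss.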

For 4.0 the following fixes have been pushed...
1) The file client will encode ";" when it makes requests, to work around the servlet container summarily truncating the pathInfo.
2) The server will prevent creation of files and folders containing ":".
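The two workarounds above can be sketched as follows. This is a rough illustration in Python, not the code that was pushed; `encode_path_segment` and `validate_new_name` are hypothetical helper names, and the sketch encodes every reserved character in the segment rather than only ";".

```python
from urllib.parse import quote, unquote


def encode_path_segment(name: str) -> str:
    """Client side (hypothetical): percent-encode one path segment.

    safe="" means ";", "/", "=", "?" and ":" are all encoded, so the
    container never sees a raw semi-colon in the request path.
    """
    return quote(name, safe="")


def validate_new_name(name: str) -> None:
    """Server side (hypothetical): refuse names containing a colon."""
    if ":" in name:
        raise ValueError(f"File and folder names may not contain ':': {name!r}")


segment = encode_path_segment("foo;bar.js")
print(segment)           # foo%3Bbar.js
print(unquote(segment))  # foo;bar.js

validate_new_name("foo;bar.js")  # accepted
try:
    validate_new_name("foo:bar.js")
except ValueError as err:
    print(err)
```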

--
In both cases we should be able to encode the character and be fine; however, this is more work/risk than is reasonable in 4.0. I've talked briefly with John and shown him code illustrating the issue with Java URI and how our use of it is lossy with respect to encoded but legal URI path characters. I will open a separate bug to look at this in 5.0.