Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 357111

Summary: [DSTORE]File with invalid characters can't be opened in editor
Product: [Tools] Target Management Reporter: Samuel Wu <samuelwu>
Component: RSEAssignee: David McKnight <dmcknigh>
Status: RESOLVED FIXED QA Contact: Martin Oberhuber <mober.at+eclipse>
Severity: normal    
Priority: P3 CC: ddykstal.eclipse, dmcknigh
Version: unspecified   
Target Milestone: 3.4 M7   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 377646    
Attachments:
Description Flags
bad.lst
none
patch to turn a file into binary mode if it fails in text mode
none
updated patch
none
updated patch none

Description Samuel Wu CLA 2011-09-08 11:33:24 EDT
Build Identifier: org.eclipse.rse_3.2.3.R32x_v201105041224-7L78FA98SreKDIw_Io8X4hiODRYp

Our customer has a file on a linux host whose encoding is ISO-8859-1 but this file contains a few x'9c. When opening this file in the editor, the user got the error message RSEF1002 and file was not opened.

Reproducible: Always

Steps to Reproduce:
1. Upload the attached file bad.lst to a linux host as binary
2. Create a dstore connection to the linux host
3. Set the file transfer mode of *.lst to text
4. Open the file bad.lst 
An error message RSEF1002 poped up saying 
"Operation failed. File system input or output error"

The following exception was shown in the details.
ava.nio.charset.UnmappableCharacterException: Input length = 1
java.nio.charset.UnmappableCharacterException: Input length = 1

	at java.nio.charset.CoderResult.throwException(CoderResult.java:272)

	at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:793)

	at org.eclipse.rse.services.files.DefaultFileServiceCodePageConverter.convertFileFromRemoteEncoding(DefaultFileServiceCodePageConverter.java:77)

	at org.eclipse.rse.internal.services.dstore.files.DStoreFileService.download(DStoreFileService.java:847)

	at org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.download(FileServiceSubSystem.java:807)

	at org.eclipse.rse.files.ui.resources.SystemEditableRemoteFile.doDownload(SystemEditableRemoteFile.java:661)

	at org.eclipse.rse.files.ui.resources.SystemEditableRemoteFile.download(SystemEditableRemoteFile.java:632)

	at org.eclipse.rse.internal.files.ui.view.DownloadAndOpenJob.run(DownloadAndOpenJob.java:144)

	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Comment 1 Samuel Wu CLA 2011-09-08 11:34:20 EDT
Created attachment 203007 [details]
bad.lst
Comment 2 Samuel Wu CLA 2011-09-08 11:39:38 EDT
We've suggested the customer to change the transfer mode to binary and the file can be opened properly. But the customer want's something more friendly. 
Wonder whether it's possible to provide an option to on the error message dialog to allow the user to download and open the file as is, just like the binary transfer mode does. A warning message can also be put up to let the user know that the content of the file may get corrupted if he changes and saves the file.
Comment 3 David McKnight CLA 2011-09-08 11:49:37 EDT
Hi Samuel, is the attached file the original file or the one that got corrupted?  I put it on a system via binary transfer and tried to reproduce the problem but I didn't hit any conversion issues.  Which encoding is the customer workspace in?
Comment 4 David McKnight CLA 2011-09-08 11:51:18 EDT
(In reply to comment #2)
> We've suggested the customer to change the transfer mode to binary and the file
> can be opened properly. But the customer want's something more friendly. 
> Wonder whether it's possible to provide an option to on the error message
> dialog to allow the user to download and open the file as is, just like the
> binary transfer mode does. A warning message can also be put up to let the user
> know that the content of the file may get corrupted if he changes and saves the
> file.

That sounds like a useful enhancement but I think something like that would have to wait until the next release.
Comment 5 David McKnight CLA 2011-09-08 11:55:10 EDT
I played a bit more with this and discovered that if the workspace encoding is "UTF-8" the text transfer works however if it's "cp1252", then it fails.  I wonder if there is an incompatibility with those chars and Cp1252.
Comment 6 Samuel Wu CLA 2011-09-08 14:05:13 EDT
I changed the workstation encoding to UTF-8 and the file was opened fine. This is interesting since the character x'9C is not in the ISO-8859-1 character set but it's  in the CP1252 one.
Comment 7 David McKnight CLA 2011-09-08 14:39:14 EDT
(In reply to comment #6)
> I changed the workstation encoding to UTF-8 and the file was opened fine. This
> is interesting since the character x'9C is not in the ISO-8859-1 character set
> but it's  in the CP1252 one.

Yeah, I stepped through the code and noticed that decoding of the IS0-8859-1 buffer is successful.  The exception is hit when an attempt is made to encode the decoded buffer into cp1252:

        ...
	// for conversion to the local encoding
	Charset charset = Charset.forName(localEncoding);
	CharsetEncoder encoder = charset.newEncoder();
	byte[] localBuffer = null;
	// convert to the specified local encoding
--->	ByteBuffer lclBuf = encoder.encode(decodedBuf);
	localBuffer = lclBuf.array();
        ...
Comment 8 Samuel Wu CLA 2011-10-05 21:31:16 EDT
Hi Dave,
Any further update on this one? Thanks.
Comment 9 David McKnight CLA 2011-10-06 11:40:51 EDT
(In reply to comment #8)
> Hi Dave,
> Any further update on this one? Thanks.

Since this character is not available in the ISO-8859-1 charset, I wonder if there's an issue in assuming ISO-8859-1 as the source encoding.  If you change the remote encoding for the remote file to cp1252, does the open work okay?
Comment 10 Samuel Wu CLA 2011-10-06 16:00:28 EDT
Hi Dave,
From your comment #7, it seems to have something to do with the target encoding on the workstation side. Wonder whether a fix can be found for it.
Comment 11 David McKnight CLA 2011-10-06 16:15:12 EDT
(In reply to comment #10)
> Hi Dave,
> From your comment #7, it seems to have something to do with the target encoding
> on the workstation side. Wonder whether a fix can be found for it.

I'm thinking this is a user error in that a file that is not ISO-8859-1 is getting converted from ISO-8859-1 to cp1252.  Since the file isn't ISO-8859-1, I would think the correct thing to do is to indicate it as such prior to the transfer.
Comment 12 Samuel Wu CLA 2011-10-06 16:51:47 EDT
The user already knew the work around (see comment #2). In the old RSE the file was opened in the editor although it contained invalid characters. The user wants the old behavior back.
Comment 13 David McKnight CLA 2011-10-06 16:57:39 EDT
(In reply to comment #12)
> The user already knew the work around (see comment #2). In the old RSE the file
> was opened in the editor although it contained invalid characters. The user
> wants the old behavior back.

The old behaviour would give the user a corrupt file.  The user may not even notice the corruption, but after he/she saves to the remote server, this would amount to data loss.  I suppose we could add an option to open the file anyway, when the corruption is hit but, again, that's an enhancement that can't go into a back release.
Comment 14 Samuel Wu CLA 2011-10-07 08:29:40 EDT
OK. Let's wait for the next release.
Comment 15 David McKnight CLA 2012-03-30 13:03:58 EDT
Created attachment 213399 [details]
patch to turn a file into binary mode if it fails in text mode

Samuel, does the approach in this match make sense to you?
Comment 16 Samuel Wu CLA 2012-04-13 11:08:34 EDT
Hi Dave,
The patch works fine. Can you please port this back to 3.2.2 maintenance stream? Thanks.
Comment 17 David McKnight CLA 2012-04-13 11:23:12 EDT
Adding Dave Dykstal to this.  Dave, can you weigh in on whether you think this approach is a good idea or not?  This is a scenario where a user wanted to use text transfer but with my proposed solution we end up changing to use binary transfer when the text conversion fails.
Comment 18 David McKnight CLA 2012-04-25 11:30:23 EDT
Created attachment 214537 [details]
updated patch
Comment 19 David McKnight CLA 2012-04-25 11:38:21 EDT
Created attachment 214538 [details]
updated patch
Comment 20 David McKnight CLA 2012-04-25 11:39:26 EDT
I've committed the change to the HEAD stream.