
Bug 92022

Summary: GB18030: 4-bytes characters can't be recognized as java program arguments.
Product: [Eclipse Project] JDT    Reporter: Cheng xu <xucheng>
Component: Debug    Assignee: JDT-Debug-Inbox <jdt-debug-inbox>
Status: CLOSED FIXED    QA Contact:
Severity: normal    
Priority: P3    
Version: 3.1   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description                                                       Flags
4-bytes characters can't be recognized as java program parameter  none
Screenshot of common tab                                          none
Screenshot of VM encoding                                         none

Description Cheng xu CLA 2005-04-19 22:50:04 EDT
OS:		Windows XP
Language:	Simplified Chinese
Build level:	20050418
JDK version:	J2RE 1.4.2 IBM Windows 32 build cn142sr1a-20050209 (JIT enabled: jitc)

Summary: GB18030: 4-bytes characters can't be recognized as java program 
arguments.

Steps to recreate problem: 
1. Create a Java project and a Java class as follows.
******************************
public class test {
	public static void main(String[] args) {
		// Print the first program argument, if one was supplied.
		if (args.length > 0) {
			System.out.println(args[0]);
		}
	}
}
******************************
2. Build and Run it as Java Application.
3. Select "Run" --> "Run..." from the menu, select the Java Application launch
configuration, and click the "Arguments" tab.
4. Enter GB18030 characters such as U+3400 (CJK Extension A), U+0680 (Uigur),
U+0F4D (Tibetan), U+1827 (Mongolian), U+A322 (Yi Syllables), and U+A493 (Yi
Radicals) in the "Program arguments" field.
5. Click "Apply" button, then Click "Run" button.
6. Verify arguments in Console. 
	--> Problem: ".classpath" is shown in the console. The 4-byte characters are
not recognized as java program arguments.

Expected Result: The 4-byte character arguments should be displayed correctly
in the console.


Remark:
1. workspace file is attached.
2. If only a single one of these characters is entered as a program argument,
the console shows the 4-byte character as a question mark.
3. The problem does not occur on the RHEL 4.0 and SLES 9 platforms.
4. 4-byte character ranges:
Character Set   GB18030 Range            Unicode Range
--------------------------------------------------------
Extension A     0x8139EE39-0x82358738    0x3400-0x4DB5
Uigur           0x81318132-0x81319934    0x060C-0x06FE
Tibetan         0x8132E834-0x8132FD31    0x0F00-0x0FCF
Mongolian       0x8134D238-0x8134E337    0x1800-0x18A9
Yi Syllables    0x82359833-0x82368F30    0xA000-0xA48F
Yi Radical      0x82368F31-0x82369435    0xA490-0xA4C6
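To check that the sample characters from step 4 really are 4-byte codes, the sketch below encodes each one with the JDK's GB18030 charset and prints the bytes; U+3400 should come out as the first code of the Extension A range in the table. The class name Gb18030Bytes is hypothetical (it is not in the attached workspace), and the sketch assumes a JDK that ships the GB18030 charset.

```java
import java.nio.charset.Charset;

// Hypothetical helper: encode sample code points in GB18030 and print the
// resulting byte sequences, to confirm which ones are 4-byte codes.
public class Gb18030Bytes {
    private static final Charset GB18030 = Charset.forName("GB18030");

    // Encode a single code point and return its GB18030 bytes.
    public static byte[] encode(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(GB18030);
    }

    // Format bytes as uppercase hex without separators.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The sample characters listed in step 4 of the report.
        int[] samples = {0x3400, 0x0680, 0x0F4D, 0x1827, 0xA322, 0xA493};
        for (int cp : samples) {
            byte[] b = encode(cp);
            System.out.printf("U+%04X -> %s (%d bytes)%n", cp, hex(b), b.length);
        }
    }
}
```

The output is pure ASCII, so it reads the same regardless of the console encoding or fonts.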

5. Before executing the above steps, all fonts were set to the GB18030
character set via Window -> Preferences -> General -> Appearance -> Colors and
Fonts.
Comment 1 Cheng xu CLA 2005-04-19 23:10:36 EDT
Created attachment 20095 [details]
4-bytes characters can't be recognized as java program parameter

The 4-byte characters mentioned in the description can be found in the
attached workspace file.
Comment 2 Darin Wright CLA 2005-04-20 11:41:50 EDT
To get DBCS characters to display in the console properly, you must set the 
console encoding (unless the default encoding is correct). This can be done on 
the "common" tab of your associated launch configuration. Please verify you 
have the correct encoding set to display your character set.
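Conceptually, the console encoding selects the charset used to decode the bytes the launched program writes. A minimal sketch of why the program's output encoding and the console encoding must agree (ConsoleDecoding is a hypothetical class, not Eclipse's console code):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: a console receives raw bytes and must decode
// them with some charset. Only the charset that produced the bytes
// round-trips the original text.
public class ConsoleDecoding {
    private static final Charset GB18030 = Charset.forName("GB18030");

    // Simulate the launched program writing text as GB18030 bytes.
    public static byte[] programOutput(String text) {
        return text.getBytes(GB18030);
    }

    // Simulate the console decoding those bytes with a chosen encoding.
    public static String consoleShows(byte[] bytes, Charset consoleEncoding) {
        return new String(bytes, consoleEncoding);
    }

    public static void main(String[] args) {
        String text = new String(Character.toChars(0x3400));
        byte[] bytes = programOutput(text);
        // Matching encoding: the decoded string equals the original.
        System.out.println(consoleShows(bytes, GB18030).equals(text));
        // Mismatched encoding: each of the 4 bytes becomes its own Latin-1 char.
        System.out.println(consoleShows(bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```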
Comment 3 Cheng xu CLA 2005-04-20 20:36:31 EDT
Created attachment 20162 [details]
Screenshot of common tab

The Common tab is already set to GB18030; that is the default setting.
Comment 4 Kevin Barnes CLA 2005-04-21 09:55:53 EDT
Did you set the file encoding VM argument on the target process? (-Dfile.encoding=GB18030)
Comment 5 Cheng xu CLA 2005-04-21 21:20:04 EDT
Created attachment 20216 [details]
Screenshot of VM encoding

I couldn't find a file encoding argument in the variable list, so I entered
"-Dfile.encoding=GB18030" directly in the VM arguments field. Please let me know
if that is wrong, thanks.
The result is still ".classpath".
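For reference, -Dfile.encoding matters because System.out must turn characters into bytes with some charset; on JREs of this era it typically used the default charset, which that property could set. Newer JDKs (JEP 400, JDK 18) changed how file.encoding and the standard-stream encodings relate, so this is version-dependent. The hypothetical StreamEncoding sketch makes the charset explicit and shows that a charset unable to represent a character writes a replacement byte ('?') instead:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical illustration: a PrintStream encodes characters with a
// charset; unmappable characters are replaced rather than rejected.
public class StreamEncoding {
    // Return the bytes a PrintStream using the given encoding writes for text.
    public static byte[] printedBytes(String text, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, encoding);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = new String(Character.toChars(0x3400));
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        // GB18030 can represent U+3400, using a 4-byte code.
        System.out.println("GB18030 bytes:  " + printedBytes(text, "GB18030").length);
        // US-ASCII cannot, so the encoder substitutes a single '?' byte.
        System.out.println("US-ASCII bytes: " + printedBytes(text, "US-ASCII").length);
    }
}
```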
Comment 6 Kevin Barnes CLA 2005-04-22 09:10:03 EDT
Looks like a bug in the way we pass arguments to the process.

New test case (UTF-8):
public class Test {
    public static void main(String[] args) {
        if (args.length > 0) {
            System.out.println(args[0]); // pass \uFEFC as args[0]
        }

        String foo = "\uFEFC";

        System.out.println(foo);
    }
}

arg[0] is displayed as \\FEFC in the variables view
foo is displayed properly in the same view (and the console)
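When the console and the Variables view disagree like this, it can help to print the code points that actually arrived in main(), using ASCII-only output so that no console encoding or font is involved. A hypothetical sketch (ArgDump is not from the report):

```java
// Hypothetical diagnostic: print each argument as Unicode code points so you
// can tell whether the characters reached main() intact.
public class ArgDump {
    // Describe a string as space-separated code points, e.g. "U+0041 U+3400".
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + describe(args[i]));
        }
    }
}
```

If the output shows U+003F (a question mark) where the 4-byte character was typed, the character was already lost before main() ran, rather than during display.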
Comment 7 Darin Wright CLA 2005-04-22 14:29:34 EDT
If you run the same program from a DOS command line, do the correct characters 
appear? The program args are simply passed to the program as typed. We do not 
process unicode characters any differently.
Comment 8 Cheng xu CLA 2005-04-24 22:07:17 EDT
The DOS command line can't display any 4-byte characters. It only supports the
two fonts installed by default.
Comment 9 Kevin Barnes CLA 2005-05-02 12:16:47 EDT
Arguments are passed to the target VM as entered. They are not encoded. We believe this is the 
expected behavior. Do other applications behave differently?
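To illustrate what "passed as entered" looks like at the Java API level: program arguments can be handed to a new process as a list of Strings with no encoding step in Java code, and how they are converted to the native command line and back into the target VM's String[] is handled by the OS and the target launcher. The sketch below uses ProcessBuilder; LaunchSketch and its command() helper are hypothetical, it is not Eclipse's launching code, and it assumes java is on the PATH and the report's test class is on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assemble a JVM command line and start it, passing the
// program arguments through unchanged.
public class LaunchSketch {
    // Build "javaExe [vmArgs] mainClass [programArgs]" as a list.
    public static List<String> command(String javaExe, List<String> vmArgs,
                                       String mainClass, List<String> programArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.addAll(vmArgs);
        cmd.add(mainClass);
        cmd.addAll(programArgs);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        String arg = new String(Character.toChars(0x3400));
        List<String> cmd = command("java",
                Arrays.asList("-Dfile.encoding=GB18030"), "test", Arrays.asList(arg));
        // What the launched VM actually receives depends on how the OS and
        // its launcher convert this command line; this code does not control that.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```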
Comment 10 Cheng xu CLA 2005-05-08 04:37:17 EDT
Closed as a Windows limitation.
Comment 11 Cheng xu CLA 2005-05-08 04:38:57 EDT
Closed as a Windows limitation.