
Bug 340272

Summary: Encoding of generated Entities is always ISO8859-1
Product: [WebTools] Dali JPA Tools Reporter: Burghard Britzke <bubi>
Component: General Assignee: Pascal Filion <pascal.filion>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: bubi, ding870, mail, mathias.griepentrog, neil.hauge, olafzieger, smaragdraichu
Version: 2.3.3   
Target Milestone: 3.0 M7   
Hardware: All   
OS: All   
Whiteboard:
Attachments (Description, Flags):
Using the right encoding when generating a Java file, none
Using the right encoding when generating a Java file, neil.hauge: iplog+

Description Burghard Britzke CLA 2011-03-17 02:19:09 EDT
Build Identifier: 20110301-1815

Generated entities (e.g. with German umlauts, like "schüler") are encoded in ISO-8859-1 even if the text file encoding is set to UTF-8.

Reproducible: Always

Steps to Reproduce:
0. Set the Encoding to UTF-8 using "Project->Properties->Resource"
1. Start a Database Server (e.g. Derby Network Server)
2. Connect to it using the Data Source Explorer
3. Select a Project with the JPA facet
4. Select JPA-Tools->Generate Entities from Tables...
5. Select a Table and click "Next"
6. On the "Table Associations" page, click "Next"
7. On the "Customize Default Entity Generation" page, click "Next"
8. On the "Customize Individual Entities" page, select one Table, enter a Class name with German umlauts (e.g. "äöüß"), and click "Finish"
9. Open the generated Class in the "Java Editor" and see the wrongly encoded content.
Comment 1 Pascal Filion CLA 2011-03-23 08:23:33 EDT
I tried the test case and it seems to be working properly. I have the file encoding set to ISO-8859-1 in the IDE preferences (General->Workspace). I created a Java project and set its encoding to UTF-8. I generated the entities and used äöüß as the file name for one of the classes; the Java editor shows the right encoding.

However, there is one setting that overrides both the IDE-level and project-level encodings: the Content Types in the IDE preferences. In the IDE preferences under General->Content Types, Text->Java Source File, if the Default encoding is set, then it will be picked up.

If I don't set the project level encoding and the IDE level encoding is set to ISO-8859-1, I do get the issue, but if I change the content type for Java Source File to UTF-8, then the problem goes away.
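The precedence described in this comment can be modeled as a small lookup. This is only a sketch of the behavior as reported here, not Eclipse's actual resolution code; the class name, method name, and the use of null for "not set" are all hypothetical.

```java
public class EncodingPrecedenceSketch {

    // Hypothetical helper: returns the encoding that wins, following the
    // order described in this comment. A null argument means "not set".
    static String effectiveEncoding(String contentTypeDefault,
                                    String projectEncoding,
                                    String workspaceEncoding) {
        if (contentTypeDefault != null) {
            return contentTypeDefault;   // Content Types default overrides everything
        }
        if (projectEncoding != null) {
            return projectEncoding;      // then the project-level setting
        }
        return workspaceEncoding;        // finally the IDE (workspace) setting
    }

    public static void main(String[] args) {
        // Nothing set except the workspace: the workspace encoding is used.
        System.out.println(effectiveEncoding(null, null, "ISO-8859-1"));
        // Content type default set: it overrides the workspace encoding.
        System.out.println(effectiveEncoding("UTF-8", null, "ISO-8859-1"));
    }
}
```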
Comment 2 Burghard Britzke CLA 2011-03-23 15:48:29 EDT
(In reply to comment #1)

We reproduced the issue on at least four different installations and two different OS platforms.

On our installations the (General->Workspace) setting has mostly been set to UTF-8 and the (Project->Properties->Resource) setting has been set to "Inherited from container (UTF-8)". The Content Type for Java Source File is not set (i.e. empty).

We observed the following effects for different settings (so far only on Mac OS X, but I will try WinXP tomorrow, too):

In all cases the filenames of the Source Files are encoded correctly!

Workspace->Project->Content Type->Effect
UTF-8->inherited->empty->Class Names with � (diamond with a question mark in it)
UTF-8->inherited->UTF-8->Class Names with � (diamond with a question mark in it)

ISO-8859-1->inherited->empty->Class Names missing the äöü <--- THIS IS WHERE YOU GOT AN ISSUE TOO
ISO-8859-1->UTF-8->empty->Class Names with � (diamond with a question mark in it)
ISO-8859-1->UTF-8->UTF-8->Class Names with � (diamond with a question mark in it)

You said that you get the issue if the IDE-level encoding is ISO-8859-1, but with these settings a class name encoding of ISO-8859-1 is not an issue; it is the expected result.

We can reproduce the issue with various settings, on at least Mac OS X and WinXP.

But let's take a closer look at our config.
Even though I think it does not matter: we use Derby 10.7.1.1 and EclipseLink 2.1.2 (Helios).

Dali Java Persistence Tools Version 2.3.3.v201010220000 (Build id: 20100915173744)  - mentioned above
Comment 3 Burghard Britzke CLA 2011-03-24 04:45:35 EDT
Here are the results for our WinXP tests: 

File names are all OK with umlauts. But in the Project Explorer View, Class Names are cut off before the umlaut (we never started a class name with an umlaut in our tests). In the Editor View the class names are displayed as follows:

UTF-8->inherited->empty->Class Names with � (empty square)
UTF-8->ISO-8859-1->empty->Class Names OK
UTF-8->inherited->ISO-8859-1->Class Names OK

ISO-8859-1->inherited->empty-> OK
ISO-8859-1->UTF-8->empty-> Class Names with � (empty square)
ISO-8859-1->empty->UTF-8-> Class Names with � (empty square) 

On WinXP we have tested with
Version 2.3.1.v201006300000 (Build 20100730021206)
Comment 4 Burghard Britzke CLA 2011-03-25 06:47:23 EDT
Repeated the tests with WinXP and
Dali Java Persistence Tools Version 2.3.3.v201010220000
(Build id: 20100915173744)

The results are the same as with Version 2.3.1 on WinXP, which we tested yesterday.

THE RESULTING ENCODING IS ISO-8859-1 no matter which encoding is chosen in any of

Workspace->Preferences,
Project->Properties or
Content Type->Text->Java Source File.

You can reproduce the error even if you try to generate Entities with UTF-16 encoding.
Comment 5 Pascal Filion CLA 2011-03-31 07:29:45 EDT
I am trying to understand the problem. Are you saying the incorrect display of the class name is in the Project Explorer view? That would mean it's not displaying the file name coming from the file system correctly.
Comment 6 Burghard Britzke CLA 2011-03-31 13:54:49 EDT
(In reply to comment #5)
> I am trying to understand the problem. Are you saying the incorrect display of
> the class name is in the Project Explorer view? Which would means it's not
> displaying the file name coming from the file system correctly.

In the -->Project Explorer View<--
the file names are displayed correctly. The Class Names are cut off before the umlauts.

In the -->Editor View<--
the Class Names are always displayed as if they were ISO-8859-1 encoded. This means:
if the file encoding is set to UTF-8, whether inherited from the workspace settings or set in the project properties, the generated files are displayed incorrectly. If you change the encoding setting for that file after generation, it is displayed correctly.

I think this is a sure sign that the class names (the text in the Java source file) are encoded in ISO-8859-1, when they should be encoded as configured in the project properties or in the workspace settings.
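The symptom described above can be reproduced outside Eclipse. The following is a minimal sketch, assuming the generated text was written as ISO-8859-1 bytes and the editor then decoded those bytes as UTF-8; the class and method names are hypothetical and not part of Dali.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode as ISO-8859-1 (what the generator is suspected of doing),
    // then decode as UTF-8 (what the editor does when the file is marked UTF-8).
    static String writeLatin1ReadUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset: the text survives unchanged.
    static String writeLatin1ReadLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String className = "Sch\u00fcler"; // "Schüler"
        // The lone byte for the umlaut is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        System.out.println(writeLatin1ReadUtf8(className).contains("\uFFFD"));
        System.out.println(writeLatin1ReadLatin1(className).equals(className));
    }
}
```

This matches the observation that switching the file's encoding setting to ISO-8859-1 after generation makes the class name display correctly.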
Comment 7 Pascal Filion CLA 2011-04-04 13:39:50 EDT
Created attachment 192486 [details]
Using the right encoding when generating a Java file

No encoding was set when converting a String into byte[], so the JDK used the system's default encoding. The fix was to use Eclipse's encoding for proper conversion.
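The kind of change described here can be sketched in plain Java. This is an illustration only and not the attached patch: the class and method names are hypothetical, and the charset name is passed in as a parameter. In an Eclipse plug-in that name might come from a call such as `IFile.getCharset()`, but whether the patch uses that call is not stated in this report.

```java
import java.nio.charset.Charset;

public class EncodingFixSketch {

    // Problematic pattern: no charset argument, so the JVM's default
    // charset decides how non-ASCII characters are written.
    static byte[] encodeWithPlatformDefault(String source) {
        return source.getBytes();
    }

    // Corrected pattern: the caller supplies the charset configured
    // for the target file (hypothetical parameter).
    static byte[] encodeWithCharset(String source, String charsetName) {
        return source.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String source = "public class Sch\u00fcler {}";
        byte[] utf8 = encodeWithCharset(source, "UTF-8");
        byte[] latin1 = encodeWithCharset(source, "ISO-8859-1");
        // The umlaut takes two bytes in UTF-8 and one byte in ISO-8859-1.
        System.out.println(utf8.length - latin1.length);
    }
}
```

Passing the charset explicitly makes the output independent of the machine the generator runs on, which is consistent with the report that the result was ISO-8859-1 regardless of the Eclipse settings.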
Comment 8 Pascal Filion CLA 2011-04-05 06:44:58 EDT
Created attachment 192539 [details]
Using the right encoding when generating a Java file

This patch contains only the fix; the previous patch also deleted trailing spaces.
Comment 9 Neil Hauge CLA 2011-04-26 17:42:04 EDT
Committed to head.