Bug 289644 - [fwkAdmin] bundles.info should be in UTF-8
Summary: [fwkAdmin] bundles.info should be in UTF-8
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 3.6 M7
Assignee: Andrew Niefer CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 307472
Depends on:
Blocks: 307472 457176
Reported: 2009-09-16 14:41 EDT by bungeman CLA
Modified: 2015-01-12 13:19 EST (History)
CC List: 3 users

See Also:


Attachments
patch (4.26 KB, patch)
2010-04-09 17:34 EDT, Andrew Niefer CLA
patch (7.04 KB, patch)
2010-04-12 14:18 EDT, Andrew Niefer CLA

Description bungeman CLA 2009-09-16 14:41:19 EDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.13) Gecko/2009080315 Ubuntu/9.04 (jaunty) Firefox/3.0.13
Build Identifier: M20090211-1700

The current code to read and write the bundles.info file uses the default encoding. For example in org.eclipse.equinox.internal.simpleconfigurator.utils.SimpleConfiguratorUtils#readConfiguration(URL, URI)

r = new BufferedReader(new InputStreamReader(url.openStream()));

should be

r = new BufferedReader(new InputStreamReader(url.openStream(), Charset.forName("UTF-8")));

The writer would need to be updated similarly.
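A minimal sketch of what a matched reader and writer could look like, assuming a modern JDK (try-with-resources, java.nio.file). The class and method names here are hypothetical and are not taken from SimpleConfiguratorUtils or from the attached patches; the only point is that both sides name UTF-8 explicitly instead of relying on the default encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class BundlesInfoUtf8 {

    // Writes one entry per line, always encoded as UTF-8.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        }
    }

    // Reads all lines, always decoding the bytes as UTF-8.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With both ends pinned to the same charset, a file written on the build machine decodes identically on any other machine, whatever its default encoding is.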

There is an RCP application for which a bundles.info is generated during the build. Currently this is not a problem as all of the bundle names contain only lower ASCII characters which map the same in almost any character set. However, should there ever be a bundle with an odd name in the future, the bundles.info encoded on the build machine may not deserialize correctly on a customer machine with a different default encoding. A similar situation could occur in the rarer case that the default encoding should change.

Reproducible: Always

Steps to Reproduce:
1. Generate a bundles.info file with a bundle in a file named ಠ_ಠ.jar (note that this is "\u3232_\u3232.jar", as I realize bugzilla may bugger it up itself, or you may not have a font with this glyph).
2. Change the default character set, or move the file to a machine with a different default character set.
3. Watch the jar not get picked up as a bundle.
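The effect of the steps above can be imitated without touching the machine's default character set. The sketch below is an illustration under assumptions: it uses U+0CA0 for the file name and picks ISO-8859-1 as the "other" default encoding purely as an example; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class EncodingMismatchDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when writer and reader use different defaults.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        return new String(bytes, readCharset);
    }

    public static void main(String[] args) {
        String name = "\u0CA0_\u0CA0.jar";
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1; // example of a differing default

        // Same charset on both sides: the name survives.
        System.out.println(name.equals(roundTrip(name, utf8, utf8)));     // true
        // Different charsets: the name is corrupted, so the jar would not be found.
        System.out.println(name.equals(roundTrip(name, utf8, latin1)));   // false
        // ASCII-only names hide the problem because the bytes are identical.
        System.out.println("plain.jar".equals(roundTrip("plain.jar", utf8, latin1))); // true
    }
}
```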



characters != bytes && !plainTextFile.exists()
Comment 1 bungeman CLA 2009-09-16 14:47:09 EDT
It appears I was right, bugzilla did bugger it up. It appears that the bug submission sanitizes the input (converted the character in the edit box which maps to U+3232 to &#3232;) and the display end does it again (converted the & in &#3232; to &amp;). Converting back and forth between text encodings leads to headache.
Comment 2 bungeman CLA 2009-09-16 15:01:27 EDT
It appears I did the conversions wrong myself. The name should read "\u0CA0_\u0CA0.jar". I apparently forgot that &#3232; is decimal and should map to U+0CA0. Sorry for any confusion this may cause.
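The mix-up is between decimal and hexadecimal notation for the same code point: 3232 in decimal is 0CA0 in hexadecimal. A small sketch of the conversion follows; the class and method names are hypothetical, and the helper only handles code points that fit in four hex digits.

```java
// Hypothetical helper for illustration only.
final class CodePointNotation {

    // Formats a decimal code point (as used in an HTML numeric character
    // reference) as a Java-style escape with four hexadecimal digits.
    static String toJavaEscape(int decimalCodePoint) {
        return String.format("\\u%04X", decimalCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(toJavaEscape(3232));        // the escape for U+0CA0
        System.out.println((int) "\u0CA0".charAt(0));  // 3232
        System.out.println((int) '\u3232');            // 12850, a different character
    }
}
```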
Comment 3 Pascal Rapicault CLA 2009-09-16 21:50:23 EDT
See also bug #289544.
Comment 4 John Arthorne CLA 2009-10-28 23:07:55 EDT
See also bug 282554 for a similar problem with eclipse.ini.
Comment 5 Andrew Niefer CLA 2010-04-09 17:34:51 EDT
Created attachment 164437 [details]
patch

proposed patch
Comment 6 Andrew Niefer CLA 2010-04-12 14:18:17 EDT
Created attachment 164605 [details]
patch

Updated patch.  Don't write utf-8 if the simpleconfigurator is an older version that won't be able to read it.
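One way such a compatibility check might be structured is sketched below. This is not the code from the attached patch: the class, the method names, and the version strings used with it are hypothetical placeholders, and the fallback to the platform default is an assumption based on the old behaviour described in this bug.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class EncodingChooser {

    // Compares dotted numeric versions such as "1.0.200" segment by segment.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Picks UTF-8 only when the reading side is new enough to decode it;
    // otherwise falls back to the platform default (assumed old behaviour).
    static Charset chooseWriteCharset(String readerVersion, String firstUtf8AwareVersion) {
        return compareVersions(readerVersion, firstUtf8AwareVersion) >= 0
                ? StandardCharsets.UTF_8
                : Charset.defaultCharset();
    }
}
```

The caller would pass the version of the simpleconfigurator that will read the file and the first version known to read UTF-8; both values are left as parameters here because the actual threshold is not stated in this report.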
Comment 7 Andrew Niefer CLA 2010-04-12 14:18:40 EDT
this patch is released.
Comment 8 Darin Wright CLA 2010-04-19 17:05:55 EDT
*** Bug 307472 has been marked as a duplicate of this bug. ***