Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 328828 - PDE API Tooling crashes if plugin.xml contains non-ascii characters
Summary: PDE API Tooling crashes if plugin.xml contains non-ascii characters
Status: VERIFIED FIXED
Alias: None
Product: PDE
Classification: Eclipse Project
Component: API Tools (show other bugs)
Version: 3.6.1   Edit
Hardware: PC Windows XP
: P3 major (vote)
Target Milestone: 3.7 M4   Edit
Assignee: Olivier Thomann CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-27 09:52 EDT by Holger Herzog CLA
Modified: 2010-11-02 13:04 EDT (History)
3 users (show)

See Also:


Attachments
Eclipse Error Log File (579.29 KB, text/plain)
2010-10-27 09:56 EDT, Holger Herzog CLA
no flags Details
example plugin.xml (701 bytes, application/octet-stream)
2010-10-27 10:03 EDT, Holger Herzog CLA
no flags Details
Proposed fix + regression test (4.64 KB, patch)
2010-10-28 10:03 EDT, Olivier Thomann CLA
no flags Details | Diff
test zip file (701 bytes, application/octet-stream)
2010-10-28 10:03 EDT, Olivier Thomann CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Holger Herzog CLA 2010-10-27 09:52:16 EDT
Build Identifier: M20100909-0800

When having non ASCII characters in plugin.xml files that are analyzed by API tooling (occurs in the source as in the baseline binaries as well), API tooling crashes. In some cases it produces the following log entry: "Error logged from API Tools Core: Unable to parse XML document. SAXParseException: Content is not allowed in trailing section."

As a consequence, filtered API tooling compile error markers may remain unremovable in the projects, which makes API tooling unusable in a professional team setup.

The cause of this bug can be found in the follwing method of class org.eclipse.pde.api.tools.internal.util.Util:

public static char[] getInputStreamAsCharArray(InputStream stream, int length, String encoding) throws IOException {

In the last line of this method (M20100909-0800), the following call is made in the _intention_ to convert a byte buffer's content into a char array:

return charsetDecoder.decode(byteBuffer).array();

In fact, the called method CharBuffer.array() does return an array, but this array is a buffer which *may* be longer that it's payload. This e.g. is the case if the incoming byte buffer containes some UTF-8 escape sequences (i.e. non-ASCII characters). The dimension of the char array initially equals the number of bytes. Given some non-ASCII characters, we have a little bit more bytes than characters. As a consequence, the char array remains beeing a bit too big; in my case beeing padded with trailing 0-bytes.

Util.getInputStreamAsCharArray() returns the whole char buffer array, including the trailing 0-bytes. This later is passed to the SAX parser which crashes indicating invalid content in trailing section. (see above)

Reproducible: Always

Steps to Reproduce:
1. add non ascii-characters into a plugin.xml that is relevant for API tooling (either in the source or in the baseline)
2. activate api tooling or trigger a full rebuild
Comment 1 Holger Herzog CLA 2010-10-27 09:56:22 EDT
Created attachment 181837 [details]
Eclipse Error Log File

The eclipse error log containing many exceptions of the type described.
Comment 2 Olivier Thomann CLA 2010-10-27 09:57:46 EDT
Could you please attach such a plugin.xml file ?
Comment 3 Holger Herzog CLA 2010-10-27 10:03:44 EDT
Created attachment 181838 [details]
example plugin.xml

An example plugin.xml that contains non-ASCII characters (e.g."Ä") and therefore causes this error.
Comment 4 Holger Herzog CLA 2010-10-27 10:51:38 EDT
The following patch fixes the problem in my environment. Of course someone need to review it, because I'm not very familiar with CharBuffers:

	public static char[] getInputStreamAsCharArray(InputStream stream, int length, String encoding) throws IOException {
		Charset charset = null;
		try {
			charset = Charset.forName(encoding);
		} catch (IllegalCharsetNameException e) {
			System.err.println("Illegal charset name : " + encoding); //$NON-NLS-1$
			return null;
		} catch(UnsupportedCharsetException e) {
			System.err.println("Unsupported charset : " + encoding); //$NON-NLS-1$
			return null;
		}
		CharsetDecoder charsetDecoder = charset.newDecoder();
		charsetDecoder.onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE);
		byte[] contents = getInputStreamAsByteArray(stream, length);
		ByteBuffer byteBuffer = ByteBuffer.allocate(contents.length);
		byteBuffer.put(contents);
		byteBuffer.flip();
		CharBuffer charBuffer = charsetDecoder.decode(byteBuffer);
		charBuffer.compact(); // ensure payload starting at 0
		char[] array = charBuffer.array();
		int lengthToBe = charBuffer.position();
		if (array.length > lengthToBe) {
			array = Arrays.copyOf(array, lengthToBe);
		}
		return array;
	}
Comment 5 Olivier Thomann CLA 2010-10-28 10:03:09 EDT
Created attachment 181937 [details]
Proposed fix + regression test

You also need the next attached file inside the same folder than UtilTest.
Comment 6 Olivier Thomann CLA 2010-10-28 10:03:58 EDT
Created attachment 181938 [details]
test zip file
Comment 7 Olivier Thomann CLA 2010-10-29 12:03:07 EDT
Released for 3.7M4.
Michael, please verify.
Comment 8 Olivier Thomann CLA 2010-10-29 12:03:35 EDT
Comment on attachment 181937 [details]
Proposed fix + regression test

ilog+ as this is a derived work on the previous patch.
Comment 9 Olivier Thomann CLA 2010-10-29 12:04:35 EDT
(In reply to comment #8)
> (From update of attachment 181937 [details] [diff])
> ilog+ as this is a derived work on the previous patch.
In fact no patch was submitted as is. So clearing the iplog flag.
Comment 10 Olivier Thomann CLA 2010-10-29 12:36:14 EDT
Fixed.
Comment 11 Michael Rennie CLA 2010-11-01 12:54:05 EDT
All smoke + unit tests pass. Verified.
Comment 12 Holger Herzog CLA 2010-11-02 13:04:49 EDT
Changed Version to 3.6.1 as this is the version where the bug was recognized.