Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 328828

Summary: PDE API Tooling crashes if plugin.xml contains non-ascii characters
Product: [Eclipse Project] PDE Reporter: Holger Herzog <holger.herzog>
Component: API ToolsAssignee: Olivier Thomann <Olivier_Thomann>
Status: VERIFIED FIXED QA Contact:
Severity: major    
Priority: P3 CC: darin.eclipse, Michael_Rennie, Olivier_Thomann
Version: 3.6.1   
Target Milestone: 3.7 M4   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Attachments:
Description Flags
Eclipse Error Log File
none
example plugin.xml
none
Proposed fix + regression test
none
test zip file none

Description Holger Herzog CLA 2010-10-27 09:52:16 EDT
Build Identifier: M20100909-0800

When having non ASCII characters in plugin.xml files that are analyzed by API tooling (occurs in the source as in the baseline binaries as well), API tooling crashes. In some cases it produces the following log entry: "Error logged from API Tools Core: Unable to parse XML document. SAXParseException: Content is not allowed in trailing section."

As a consequence, filtered API tooling compile error markers may remain unremovable in the projects, which makes API tooling unusable in a professional team setup.

The cause of this bug can be found in the follwing method of class org.eclipse.pde.api.tools.internal.util.Util:

public static char[] getInputStreamAsCharArray(InputStream stream, int length, String encoding) throws IOException {

In the last line of this method (M20100909-0800), the following call is made in the _intention_ to convert a byte buffer's content into a char array:

return charsetDecoder.decode(byteBuffer).array();

In fact, the called method CharBuffer.array() does return an array, but this array is a buffer which *may* be longer that it's payload. This e.g. is the case if the incoming byte buffer containes some UTF-8 escape sequences (i.e. non-ASCII characters). The dimension of the char array initially equals the number of bytes. Given some non-ASCII characters, we have a little bit more bytes than characters. As a consequence, the char array remains beeing a bit too big; in my case beeing padded with trailing 0-bytes.

Util.getInputStreamAsCharArray() returns the whole char buffer array, including the trailing 0-bytes. This later is passed to the SAX parser which crashes indicating invalid content in trailing section. (see above)

Reproducible: Always

Steps to Reproduce:
1. add non ascii-characters into a plugin.xml that is relevant for API tooling (either in the source or in the baseline)
2. activate api tooling or trigger a full rebuild
Comment 1 Holger Herzog CLA 2010-10-27 09:56:22 EDT
Created attachment 181837 [details]
Eclipse Error Log File

The eclipse error log containing many exceptions of the type described.
Comment 2 Olivier Thomann CLA 2010-10-27 09:57:46 EDT
Could you please attach such a plugin.xml file ?
Comment 3 Holger Herzog CLA 2010-10-27 10:03:44 EDT
Created attachment 181838 [details]
example plugin.xml

An example plugin.xml that contains non-ASCII characters (e.g."Ä") and therefore causes this error.
Comment 4 Holger Herzog CLA 2010-10-27 10:51:38 EDT
The following patch fixes the problem in my environment. Of course someone need to review it, because I'm not very familiar with CharBuffers:

	public static char[] getInputStreamAsCharArray(InputStream stream, int length, String encoding) throws IOException {
		Charset charset = null;
		try {
			charset = Charset.forName(encoding);
		} catch (IllegalCharsetNameException e) {
			System.err.println("Illegal charset name : " + encoding); //$NON-NLS-1$
			return null;
		} catch(UnsupportedCharsetException e) {
			System.err.println("Unsupported charset : " + encoding); //$NON-NLS-1$
			return null;
		}
		CharsetDecoder charsetDecoder = charset.newDecoder();
		charsetDecoder.onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE);
		byte[] contents = getInputStreamAsByteArray(stream, length);
		ByteBuffer byteBuffer = ByteBuffer.allocate(contents.length);
		byteBuffer.put(contents);
		byteBuffer.flip();
		CharBuffer charBuffer = charsetDecoder.decode(byteBuffer);
		charBuffer.compact(); // ensure payload starting at 0
		char[] array = charBuffer.array();
		int lengthToBe = charBuffer.position();
		if (array.length > lengthToBe) {
			array = Arrays.copyOf(array, lengthToBe);
		}
		return array;
	}
Comment 5 Olivier Thomann CLA 2010-10-28 10:03:09 EDT
Created attachment 181937 [details]
Proposed fix + regression test

You also need the next attached file inside the same folder than UtilTest.
Comment 6 Olivier Thomann CLA 2010-10-28 10:03:58 EDT
Created attachment 181938 [details]
test zip file
Comment 7 Olivier Thomann CLA 2010-10-29 12:03:07 EDT
Released for 3.7M4.
Michael, please verify.
Comment 8 Olivier Thomann CLA 2010-10-29 12:03:35 EDT
Comment on attachment 181937 [details]
Proposed fix + regression test

ilog+ as this is a derived work on the previous patch.
Comment 9 Olivier Thomann CLA 2010-10-29 12:04:35 EDT
(In reply to comment #8)
> (From update of attachment 181937 [details] [diff])
> ilog+ as this is a derived work on the previous patch.
In fact no patch was submitted as is. So clearing the iplog flag.
Comment 10 Olivier Thomann CLA 2010-10-29 12:36:14 EDT
Fixed.
Comment 11 Michael Rennie CLA 2010-11-01 12:54:05 EDT
All smoke + unit tests pass. Verified.
Comment 12 Holger Herzog CLA 2010-11-02 13:04:49 EDT
Changed Version to 3.6.1 as this is the version where the bug was recognized.