Bug 232720 - [BIDI]The result is not correct when preview in DOC/PPT and the result is not consistent when open the ppt file with PPT2003 and PPT 2007 [1202] [1205]
Status: CLOSED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: BIRT
Version: 2.3.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 2.5.1
Assignee: Gang Liu CLA
QA Contact: Xiaodan Wang CLA
URL:
Whiteboard: Non-Auto
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-19 02:09 EDT by Xiaodan Wang CLA
Modified: 2009-06-22 23:01 EDT

See Also:
wenfeng.fwd: iplog+
wenfeng.fwd: pmc_approved+


Attachments
report design (5.25 KB, text/plain)
2008-05-19 02:09 EDT, Xiaodan Wang CLA
screenshot (57.04 KB, image/pjpeg)
2008-05-19 02:14 EDT, Xiaodan Wang CLA
screenshot from PPT 2003 (7.97 KB, image/jpeg)
2008-07-23 13:31 EDT, Lina Kemmel CLA
zip file of doc file, ppt file and screenshot (54.65 KB, application/x-zip-compressed)
2008-07-28 06:09 EDT, Xiaodan Wang CLA
Patch (5.74 KB, patch)
2008-07-28 14:08 EDT, Lina Kemmel CLA
Lina.Kemmel: review?
zip file of doc file, ppt file (2.55 KB, application/x-zip-compressed)
2008-07-29 03:26 EDT, Xiaodan Wang CLA
screenshot in 2007 (28.24 KB, image/pjpeg)
2008-10-15 23:24 EDT, Jun Ouyang CLA
Another screenshot showing it working (68.21 KB, image/jpeg)
2008-12-08 12:10 EST, Lina Kemmel CLA
File generated by ppt 2007 (118.21 KB, application/vnd.ms-powerpoint)
2009-02-25 00:39 EST, Jun Ouyang CLA
File generated by BIRT (7.33 KB, application/vnd.ms-powerpoint)
2009-02-25 00:40 EST, Jun Ouyang CLA
Patch to fix incomplete lang attribute value specification (1.76 KB, patch)
2009-03-18 12:10 EDT, Lina Kemmel CLA
Lina.Kemmel: review?
the zip file of the generated PPT and HTML (2.65 KB, application/x-zip-compressed)
2009-05-20 23:31 EDT, Xiaodan Wang CLA
screenshot (94.90 KB, image/jpeg)
2009-05-20 23:32 EDT, Xiaodan Wang CLA

Description Xiaodan Wang CLA 2008-05-19 02:09:46 EDT
Created attachment 100861 [details]
report design

Description:
The result is inconsistent between PPT 2003 and PPT 2007 when previewing a report that contains Arabic words.

Build number:
2.3.0.v20080519-0630

Steps to reproduce:
1. Preview the attached report design.

Expected result:
The text is reversed.

Actual result:
See the screenshot.

Error log:
N/A
Comment 1 Xiaodan Wang CLA 2008-05-19 02:14:18 EDT
Created attachment 100864 [details]
screenshot
Comment 2 Lina Kemmel CLA 2008-07-23 13:31:16 EDT
Created attachment 108230 [details]
screenshot from PPT 2003

Does the problem occur on PPT 2003 or 2007?

Here is what I see in PPT 2003. Please note that PPT 2003 doesn't render Bidi text correctly on a non-Bidi-enabled OS.
Comment 3 Lina Kemmel CLA 2008-07-23 13:34:06 EDT
The screenshot from PPT 2003 shows Arabic text reordered correctly.
Comment 4 Xiaodan Wang CLA 2008-07-28 06:09:48 EDT
Created attachment 108530 [details]
zip file of doc file, ppt file and screenshot

The Arabic words in the DOC file generated from the report run in the opposite direction to those in the PPT file generated from the same report.
Comment 5 Lina Kemmel CLA 2008-07-28 07:21:24 EDT
I was able to reproduce the problem. It seems to be connected with the OS locale. PPT 2003 requires Bidi as a primary locale, not just basic Bidi enablement.

It did work for me in the Hebrew locale (as shown in the screenshot - attachment 108230 [details]), but I see the problem reported in the Russian locale.
Comment 6 Lina Kemmel CLA 2008-07-28 07:32:46 EDT
I will try to figure out whether some markup can be added for PPT to specify the character run direction explicitly, as we already do for DOC.
Comment 7 Lina Kemmel CLA 2008-07-28 13:42:28 EDT
Sorry, accidentally ran on an old build. The problem is not reproducible for me even in a non-Bidi locale.

Xiaodan, can you please give some details on your runtime environment?

- Which PPT version is experiencing the problem?
- What is the machine locale?
- Does the OS support Bidi as a primary or supplemental language? (On WinXP, "Control Panel -> Regional and Language Options -> Languages -> Install files for complex script..." enables Bidi languages as supplemental ones.)
Comment 8 Lina Kemmel CLA 2008-07-28 14:08:20 EDT
Created attachment 108555 [details]
Patch

Anyway, we can write language and directional attributes out.

The patch includes the following changes:

- The renderer now derives the 'rtl' text style from the run level of the specific text fragment being processed, rather than from the paragraph level as a whole,

- PPT writer adds two new attributes:
  (a) 'dir', with possible values 'ltr' and 'rtl' matching the text style set by 
      PageDeviceRender,
  (b) 'lang', with currently possible values 'HE' (Hebrew), 'AR' (Arabic) and
      'EN-US' (English US). Additional languages can also be addressed if
      necessary.
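
A minimal sketch of the per-run idea described above, not the patch itself. It uses the JDK's java.text.Bidi as a stand-in for BIRT's own Bidi processing (an assumption), and the names RunDirectionSketch and dirPerRun are hypothetical. Each run's direction is taken from its own embedding level (odd means right-to-left) instead of from the paragraph direction:

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

public class RunDirectionSketch {

    // Returns "ltr" or "rtl" for each directional run, in logical order.
    // java.text.Bidi is used here only for illustration; BIRT's renderer
    // obtains run levels through its own layout classes.
    static List<String> dirPerRun(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<String> dirs = new ArrayList<>();
        for (int run = 0; run < bidi.getRunCount(); run++) {
            // An odd embedding level marks a right-to-left run.
            boolean rtl = (bidi.getRunLevel(run) & 1) != 0;
            dirs.add(rtl ? "rtl" : "ltr");
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Latin text followed by three Hebrew letters (alef, bet, gimel).
        System.out.println(dirPerRun("abc \u05D0\u05D1\u05D2"));
    }
}
```

With a mixed string like the one in main, the Latin fragment and the Hebrew fragment form separate runs, so a writer could emit a different 'dir' value for each fragment even though both belong to the same paragraph.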
Comment 9 Xiaodan Wang CLA 2008-07-29 03:26:33 EDT
Created attachment 108609 [details]
zip file of doc file, ppt file

(In reply to comment #7)
> Sorry, accidentally ran on an old build. The problem is not reproducible for me
> even in a non-Bidi locale.
> Xiaodan, can you please give some details on your runtime environment?
> - Which PPT version is experiencing the problem?
> - What is the machine locale?
> - Does the OS support Bidi as a primary or supplemental language? (On WinXP,
> "Control Panel -> Regional and Language Options -> Languages -> Install files
> for complex script..." enables Bidi languages as supplemental ones.)

Lina,
Here is the information you need:
PPT version: (11.8169.8202) SP3
Machine locale: English (United States)
Supplemental language support: the "Control Panel -> Regional and Language Options -> Languages -> Install files for complex script..." checkbox is ticked
Comment 10 Mohamed El-Kholy CLA 2008-08-04 08:22:58 EDT
Hello 
I have the same configuration (US locale and the same version of PowerPoint), and the Arabic text is displayed correctly on my machine.



Comment 11 Jun Ouyang CLA 2008-10-15 02:42:38 EDT
The patch is applied.
Comment 12 Jun Ouyang CLA 2008-10-15 23:24:30 EDT
Created attachment 115215 [details]
screenshot in 2007

The direction is wrong in ppt 2007
Comment 13 Jun Ouyang CLA 2008-10-15 23:25:23 EDT
The direction is correct in ppt 2003.
Comment 14 Lina Kemmel CLA 2008-12-08 12:10:19 EST
Created attachment 119817 [details]
Another screenshot showing it working

Guys, I am puzzled by this bug... It seems to work for me (including PPT 2007 recognizing Arabic characters) on both English and Russian machines.
Comment 15 Lina Kemmel CLA 2009-02-24 11:57:18 EST
Jun,
Can you please create, in PPT 2007 itself, a file containing a few Arabic words separated by white space, and attach it to this bug?
Thanks!
Comment 16 Jun Ouyang CLA 2009-02-25 00:35:52 EST
Lina,

I created a PPT file with PPT 2007 as you suggested, and BIDI worked fine.
I compared the file with the one generated by BIRT and found that the key difference is the attribute "lang". The file created by Powerpoint 2007 uses "lang=3D'AR-DZ'", the file generated by BIRT uses "lang=3D'AR'".

I changed the attribute to "lang=3D'AR-DZ'" or "lang=3D'AR-IQ'" etc., then the BIDI was ok in both 2003 and 2007.
Comment 17 Jun Ouyang CLA 2009-02-25 00:39:48 EST
Created attachment 126674 [details]
File generated by ppt 2007
Comment 18 Jun Ouyang CLA 2009-02-25 00:40:29 EST
Created attachment 126675 [details]
File generated by BIRT
Comment 19 Lina Kemmel CLA 2009-02-25 06:31:35 EST
Jun, this is great news, thanks!

I will change the lang value to AR-XX (probably AR-DZ to be on the safe side) in the code.

(Weird though, it works for us with "lang=3D'AR'" too... That said, reliable testing of the fix cannot be done on our side ;))
Comment 20 Lina Kemmel CLA 2009-03-18 12:10:04 EDT
Created attachment 129247 [details]
Patch to fix incomplete lang attribute value specification

Replaced 'AR' with 'AR-DZ', and also 'HE' with 'HE-IL'.
Comment 21 Wei Yan CLA 2009-05-18 02:04:20 EDT
Deferred to a future release, as we have no resources to resolve the BIDI issues.
Comment 22 Yu Chen CLA 2009-05-18 23:29:37 EDT
Patch applied.
Comment 23 Xiaodan Wang CLA 2009-05-20 23:31:34 EDT
Created attachment 136596 [details]
the zip file of the generated PPT and HTML

These were generated with build (2.5.0.v20090521-0630), and the PPT and HTML results are still not the same.
Comment 24 Xiaodan Wang CLA 2009-05-20 23:32:01 EDT
Created attachment 136597 [details]
screenshot
Comment 25 Xiaodan Wang CLA 2009-05-20 23:32:53 EDT
Reopen for further investigation.
Comment 26 JingwenShen CLA 2009-05-27 04:44:23 EDT
Hi Lina,
After the PPT patch, I found there are still some problems in PPTWriter:

If the user doesn't set the "rtl" property on a text element, "dir=3D'rtl' lang=3D'AR-DZ'" should still be output when the text belongs to UCharacter.UnicodeBlock.HEBREW or UCharacter.UnicodeBlock.ARABIC,
so that Hebrew or Arabic text is displayed correctly.

The code should be corrected as follows:
	private String buildI18nAttributes( String text, boolean rtl )
	{
		if ( text == null )
			return ""; //$NON-NLS-1$

		for ( int i = text.length( ); i-- > 0; )
		{
			UnicodeBlock block = UCharacter.UnicodeBlock.of( text.charAt( i ) );
			// If there is a Hebrew or Arabic content, write the
			// corresponding language attribute
			if ( UCharacter.UnicodeBlock.HEBREW.equals( block ) )
			{
				return " dir=3D'rtl' lang=3D'HE-IL'"; //$NON-NLS-1$
			}
			if ( UCharacter.UnicodeBlock.ARABIC.equals( block )
					|| UCharacter.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
							.equals( block )
					|| UCharacter.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B
							.equals( block )
					|| UCharacter.UnicodeBlock.ARABIC_SUPPLEMENT.equals( block ) )
			{
				return " dir=3D'rtl' lang=3D'AR-DZ'"; //$NON-NLS-1$
			}
		}
		// If no actual RTL content was found (e.g. in case the text
		// consists of sheer neutral characters), indicate Arabic language
		if ( rtl )
			return " dir=3D'rtl' lang=3D'AR-DZ'"; //$NON-NLS-1$
		else
		{
			// XXX Other language attributes can be addressed as needed
			return " dir=3D'ltr' lang=3D'EN-US'"; //$NON-NLS-1$
		}
	}

Comment 27 Lina Kemmel CLA 2009-05-27 05:33:33 EDT
Hi,

I am not sure this is necessary... 
Does it work after the change?

'rtl' here is not a property set by user, but a character run property resolved by the Bidi engine behind the scenes. The outcome of this Bidi resolution depends on various factors, not only intrinsic character properties.
As a rule, 'rtl' will match a run of Hebrew/Arabic and associated characters, and 'ltr' - non-Bidi (e.g. English) literals and associated characters.
However, this can be different, e.g. in presence of control characters.

So I believe we should respect this 'rtl' property rather than intrinsic character properties (decided here based on belonging to a UnicodeBlock).
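
To illustrate the point about control characters, here is a small sketch using the JDK's java.text.Bidi (an assumption standing in for BIRT's Bidi engine; the class and method names are hypothetical). Latin letters wrapped in a right-to-left override (U+202E ... U+202C) still belong to the BASIC_LATIN block, yet under the Unicode Bidi algorithm they resolve to an odd, i.e. right-to-left, level, so a block-based check and a run-level check can disagree:

```java
import java.text.Bidi;

public class LevelVersusBlockSketch {

    // Block-based test: looks only at the intrinsic property of the character.
    static boolean isRtlByBlock(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.HEBREW
                || block == Character.UnicodeBlock.ARABIC;
    }

    // Level-based test: asks the Bidi algorithm for the resolved level
    // of the character at the given offset; odd means right-to-left.
    static boolean isRtlByLevel(String text, int offset) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        return (bidi.getLevelAt(offset) & 1) != 0;
    }

    public static void main(String[] args) {
        // "abc" forced right-to-left by RLO (U+202E), closed by PDF (U+202C).
        String overridden = "\u202Eabc\u202C";
        System.out.println("by block: " + isRtlByBlock(overridden.charAt(1)));
        System.out.println("by level: " + isRtlByLevel(overridden, 1));
    }
}
```

For plain Hebrew or Arabic text without control characters the two checks would normally agree, which may be why the block-based variant appears to work in simple reports.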
Comment 28 JingwenShen CLA 2009-06-09 02:58:52 EDT
Hi Lina,

Yes, it works after that change.
The "rtl" flag in the code is taken from the text style, and the text style is set by the user.

If the user sets "rtl", all Hebrew/Arabic and associated characters need to be displayed reversed, and the report also needs to start from right to left.

If the user doesn't set "rtl", all Hebrew/Arabic and associated characters still need to be displayed reversed, but the report does not need to start from right to left.

So whether or not the user sets the "rtl" flag, the logic in the loop is still needed.
Comment 29 Lina Kemmel CLA 2009-06-09 11:25:43 EDT
Hi Jingwen,

Based on the patch from 2008-07-28 https://bugs.eclipse.org/bugs/attachment.cgi?id=108555&action=diff:

    boolean rtl = text instanceof TextArea
        ? ( ( (TextArea) text ).getRunLevel( ) & 1 ) != 0
        : CSSConstants.CSS_RTL_VALUE
            .equals( style.getProperty( IStyle.STYLE_DIRECTION ) );

-- for TextArea, 'rtl' is taken from the run level and not from the text style.

However, I can't locate this change in the current code base.
I will try to figure out why it is not present.

I totally agree with you that the Arabic/Hebrew characters should usually be reversed regardless of the paragraph direction; however, I think it would be more correct to apply the reversal to characters with an odd run level [which Arabic/Hebrew characters usually have]...
Comment 30 Gang Liu CLA 2009-06-10 03:44:54 EDT
fixed.
Comment 31 Xiaodan Wang CLA 2009-06-22 23:01:26 EDT
Verified in build (2.5.1.v20090623-0630), closed.