Bug 77790

Summary: [Bidi] TextLayout's compliance with the Unicode BiDi algorithm
Product: [Eclipse Project] Platform
Reporter: Pratik Shah <ppshah>
Component: SWT
Assignee: Platform-SWT-Inbox <platform-swt-inbox>
Status: CLOSED WORKSFORME
QA Contact: Felipe Heidrich <eclipse.felipe>
Severity: minor
Priority: P3
CC: matial, nikita, peter, stori
Version: 3.0
Keywords: triaged
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:

Description Pratik Shah CLA 2004-11-03 19:19:53 EST
What underlying algorithm does TextLayout use?  How compliant does TextLayout 
intend to be with the Unicode BiDi algorithm?  Right now, there are several 
discrepancies, e.g., "-10%  \ufeef\ufeeb\ufeec".
Comment 1 Felipe Heidrich CLA 2004-11-04 11:57:19 EST
On Windows we use Uniscribe and on GTK we use PangoLayout (which uses 
fribidi); both algorithms were exhaustively tested against the Bidi Reference 
Code. So I really don't think TextLayout can have "several discrepancies" as 
you say.

Also keep in mind that setting the orientation changes the behaviour of the 
algorithm. In TextLayout, call setOrientation with SWT.RIGHT_TO_LEFT; if you 
are testing with StyledText, please pass the same flag to the constructor.


With the SWT.RIGHT_TO_LEFT flag on, the reordering of the example you gave is:
-10% ABC => CBA %10-

Without SWT.RIGHT_TO_LEFT (base direction = left-to-right) the result should 
be:
-10% ABC => -10% CBA
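
To see how the base direction alone changes the resolved levels, here is a minimal sketch that uses the JDK's java.text.Bidi as a stand-in for Uniscribe and fribidi (an assumption: the JDK engine is not what TextLayout calls, and the level it reports for the minus sign depends on the Unicode version the JDK implements). The class and method names are illustrative only.

```java
import java.text.Bidi;
import java.util.Arrays;

final class BaseDirectionDemo {
    // The reporter's sample: "-10%", two spaces, three Arabic presentation-form letters.
    static final String SAMPLE = "-10%  \ufeef\ufeeb\ufeec";

    /** Returns the resolved embedding level of every char for the given base direction. */
    static int[] levels(String text, boolean rightToLeft) {
        int flag = rightToLeft ? Bidi.DIRECTION_RIGHT_TO_LEFT : Bidi.DIRECTION_LEFT_TO_RIGHT;
        Bidi bidi = new Bidi(text, flag);
        int[] result = new int[text.length()];
        for (int i = 0; i < result.length; i++) {
            result[i] = bidi.getLevelAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same logical text, two base directions: only the levels differ.
        System.out.println("LTR base: " + Arrays.toString(levels(SAMPLE, false)));
        System.out.println("RTL base: " + Arrays.toString(levels(SAMPLE, true)));
    }
}
```

With a left-to-right base the digits stay at level 0, while with a right-to-left base they are raised to level 2; the Arabic letters sit at level 1 in both cases.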


Can I close this PR, or do you have some other scenario you want me to look at?
Comment 2 Pratik Shah CLA 2004-11-04 12:58:21 EST
With SWT.RIGHT_TO_LEFT, shouldn't the correct format be: CBA -10%?
Comment 3 Felipe Heidrich CLA 2004-11-04 13:44:26 EST
I tested with both (uniscribe and fribidi) and the result is: CBA -10%
Comment 4 Randy Hudson CLA 2004-11-04 14:08:34 EST
Felipe, I'm seeing %10-.  Let's ignore the on-screen presentation.
The levels returned for the first 4 characters are 1221.  My understanding is 
that they would have to be 2222 to render "-10%".


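import org.eclipse.swt.SWT;
import org.eclipse.swt.graphics.TextLayout;
import org.eclipse.swt.widgets.Display;

public class LevelTest {
	public static void main(String[] args) {
		final Display display = new Display();
		TextLayout layout = new TextLayout(display);
		layout.setOrientation(SWT.RIGHT_TO_LEFT);
		String text = "-10%";
		layout.setText(text);
		// Print the embedding level TextLayout reports for each character
		for (int i = 0; i < text.length(); i++)
			System.out.println(layout.getLevel(i));
		layout.dispose();
		display.dispose();
	}
}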
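
The link between levels and on-screen order can be checked without SWT. The sketch below feeds hand-written level arrays to the JDK's Bidi.reorderVisually, which applies the reversal step (rule L2) of the Unicode BiDi algorithm. The helper name visualOrder is made up for illustration, and the levels are typed in by hand rather than obtained from TextLayout.

```java
import java.text.Bidi;

final class LevelReorderDemo {
    /** Reorders text into visual (left-to-right on screen) order for the supplied levels. */
    static String visualOrder(String text, int... levels) {
        byte[] byteLevels = new byte[levels.length];
        Character[] chars = new Character[text.length()];
        for (int i = 0; i < chars.length; i++) {
            byteLevels[i] = (byte) levels[i];
            chars[i] = text.charAt(i);
        }
        // Reverses each contiguous run from the highest level down to the lowest odd level.
        Bidi.reorderVisually(byteLevels, 0, chars, 0, chars.length);
        StringBuilder visual = new StringBuilder();
        for (Character c : chars) {
            visual.append(c);
        }
        return visual.toString();
    }

    public static void main(String[] args) {
        System.out.println(visualOrder("-10%", 1, 2, 2, 1)); // %10-
        System.out.println(visualOrder("-10%", 2, 2, 2, 2)); // -10%
    }
}
```

Levels 1221 reverse only "10" before the whole line is reversed, which moves the sign and the percent to the wrong sides; with 2222 the two reversals cancel out and the text keeps its logical order.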
Comment 5 Felipe Heidrich CLA 2004-11-04 14:50:10 EST
be sure added that to bug 77790

Sorry, bad copy&paste.
The result is "CBA %10-"
The levels are 1221.
Comment 6 Randy Hudson CLA 2004-11-04 16:36:14 EST
See test #13:
http://crl.nmsu.edu/~mleisher/ucdata.html
Comment 7 Felipe Heidrich CLA 2004-11-04 17:04:27 EST
Okay, given the string "-10% ABC":
If LTR is set, the output is "-10% CBA" on Windows and GTK.
If RTL is set, the output is "CBA %10-" on Windows and GTK.
SWT results are consistent with the platform and are also consistent between 
GTK and Windows. What else do you need?

If you want we can add a bidi expert to the CC list (like Mati Allouche).
Comment 8 Felipe Heidrich CLA 2004-11-04 17:07:33 EST
See http://fribidi.sourceforge.net/
Comment 9 Randy Hudson CLA 2004-11-04 18:10:33 EST
Felipe, we're not experts either.  Please see our reference URL for the reason 
we were expecting "-10%".  It is not a question of inconsistency across SWT 
platforms.

Also, note that the JDK reports level 2 for all characters:

import java.text.Bidi;

String message = "-10%"; // assumed: the four-character string from comment 4
Bidi bidi = new Bidi(message, Bidi.DIRECTION_RIGHT_TO_LEFT);
for (int i = 0; i < message.length(); i++) {
	System.out.print(bidi.getLevelAt(i) + " ");
}

Results in:

2 2 2 2
Comment 10 Felipe Heidrich CLA 2004-11-04 18:16:22 EST
Hi Mati, Randy and I are trying to understand what the "right bidi 
behavior" is; this time the problem is about the "Unicode BiDi algorithm".
Could you (or another bidi expert, such as Semion or Lina) help us determine 
whether we actually have a bug or not?
Thank you very much.
Comment 11 Matitiahu Allouche CLA 2004-11-05 05:16:45 EST
With SWT.RIGHT_TO_LEFT, the transformation "-10% ABC => CBA %10-" is certainly 
wrong.
The first problem is with the percent sign.  It is classified in Unicode as ET 
(European Terminator), which means that when adjacent to a number, it must be 
handled just like a digit.  In our case, it must be displayed on the right 
of "10".
The situation with the minus sign is more complicated, since its 
classification changed from Unicode 4.0 to Unicode 4.1.  Formerly, it was also 
classified ET, which means that "-10%" must display as "-10%" whether with 
orientation LTR or RTL.
In Unicode 4.1, the plus and minus signs were reclassified as ES (European 
Separator), which means that they will be handled as digits only if they have 
digits adjacent on both sides, which is not the case in our example. As a 
result, with orientation LTR, "-10%" must be displayed as "-10%" (unless 
preceded by a Hebrew word, in which case it must be displayed as "10%-"), and 
with orientation RTL, "-10%" must be displayed as "10%-" (unless preceded by 
an English word, in which case it must be displayed as "-10%").
Depending on whether your implementation is designed to implement Unicode 4.1 
or an older version, you should choose the proper classification for the minus 
sign.
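
The effect of the two classifications can be traced with a small sketch of the weak-type rules (W4 to W6 of the Unicode BiDi algorithm), reduced to the four types that occur in "-10%". This is a simplified illustration and not a full implementation: it assumes an RTL paragraph with no strong characters, and the class, enum and method names are invented for the example.

```java
final class WeakTypeSketch {
    enum Type { EN, ES, ET, ON }

    /** Rules W4-W6, restricted to the four types above. */
    static Type[] resolveWeak(Type[] input) {
        Type[] t = input.clone();
        // W4: a single ES between two EN becomes EN.
        for (int i = 1; i + 1 < t.length; i++) {
            if (t[i] == Type.ES && t[i - 1] == Type.EN && t[i + 1] == Type.EN) {
                t[i] = Type.EN;
            }
        }
        // W5: a run of ET adjacent to EN becomes EN (scan forward, then backward).
        for (int i = 1; i < t.length; i++) {
            if (t[i] == Type.ET && t[i - 1] == Type.EN) t[i] = Type.EN;
        }
        for (int i = t.length - 2; i >= 0; i--) {
            if (t[i] == Type.ET && t[i + 1] == Type.EN) t[i] = Type.EN;
        }
        // W6: any separator or terminator left over becomes a neutral.
        for (int i = 0; i < t.length; i++) {
            if (t[i] == Type.ES || t[i] == Type.ET) t[i] = Type.ON;
        }
        return t;
    }

    /** Levels in an RTL paragraph (base level 1) without strong characters: EN is 2, ON is 1. */
    static String rtlLevels(Type[] resolved) {
        StringBuilder sb = new StringBuilder();
        for (Type type : resolved) {
            sb.append(type == Type.EN ? '2' : '1');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Type[] minusAsET = { Type.ET, Type.EN, Type.EN, Type.ET }; // "-10%", minus as ET
        Type[] minusAsES = { Type.ES, Type.EN, Type.EN, Type.ET }; // "-10%", minus as ES
        System.out.println(rtlLevels(resolveWeak(minusAsET))); // 2222
        System.out.println(rtlLevels(resolveWeak(minusAsES))); // 1222
    }
}
```

Levels 2222 keep the text as "-10%", whereas 1222 leave only the minus sign at the paragraph level, so it ends up on the left and the display becomes "10%-".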
Comment 12 Felipe Heidrich CLA 2004-11-05 11:52:46 EST
Thanks Mati.

Fribidi is Unicode 3.2, so they probably know about this problem, but I will 
report a bug against them anyway.
On Windows, I will play with some flags in Uniscribe but I'm not sure I can do 
anything more than that.
Comment 13 Pratik Shah CLA 2004-11-11 17:03:43 EST
Felipe, is the Uniscribe on Windows compliant with Unicode 4.0.1?

I'm noticing a couple of other discrepancies (on Windows):

1) Another case:
DID YOU SAY 'he said "car MEANS CAR"'?
I believe the correct format for this should be: 
?'he said "RAC SNAEM car"' YAS UOY DID

2) Also, the 4.0.1 Unicode BiDi algorithm mentions that a line should wrap 
whenever a paragraph separator is encountered.  However TextLayout does not do 
that.
Comment 14 Felipe Heidrich CLA 2004-11-11 17:11:31 EST
From MSDN:
"The rules governing the shaping and positioning of glyphs are specified and 
catalogued in The Unicode Standard: Worldwide Character Encoding, Version 2.0, 
Addison-Wesley Publishing Company. "

Comment 15 Matitiahu Allouche CLA 2004-11-12 04:31:23 EST
This is a remark on Comment #13.
It may not be obvious from the text of the Unicode Bidi Algorithm, but the 
example 
DID YOU SAY 'he said "car MEANS CAR"'?
mandates the use of directional formatting characters (LRE, RLE, PDF) at the 
right boundaries between directional runs.  More precisely, the logical stream 
must be:
<RLE>DID YOU SAY '<LRE>he said "<RLE>car MEANS CAR<PDF>"<PDF>'?<PDF>

Did you use the above string in your test?
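
For anyone who wants to try the string with explicit embeddings, the following sketch builds it with the real formatting characters (LRE U+202A, RLE U+202B, PDF U+202C) and runs it through the JDK's java.text.Bidi. Two assumptions are made: uppercase letters are mapped onto Hebrew letters so that they are really right-to-left, and the JDK engine stands in for whichever engine is under test. All helper names are hypothetical.

```java
import java.text.Bidi;

final class EmbeddingDemo {
    static final char LRE = '\u202A', RLE = '\u202B', PDF = '\u202C';

    /** Maps A-Z onto Hebrew letters so that "uppercase" text is truly right-to-left. */
    static String toHebrew(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(c >= 'A' && c <= 'Z' ? (char) ('\u05D0' + (c - 'A')) : c);
        }
        return sb.toString();
    }

    /** Inverse of toHebrew, used only to print the result readably. */
    static String fromHebrew(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(c >= '\u05D0' && c <= '\u05E9' ? (char) ('A' + (c - '\u05D0')) : c);
        }
        return sb.toString();
    }

    // The logical stream from the comment above, with explicit embeddings.
    static final String LOGICAL = toHebrew(RLE + "DID YOU SAY '" + LRE + "he said \""
            + RLE + "car MEANS CAR" + PDF + "\"" + PDF + "'?" + PDF);

    static int[] levels(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_RIGHT_TO_LEFT);
        int[] out = new int[logical.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = bidi.getLevelAt(i);
        }
        return out;
    }

    static String visualOrder(String logical) {
        int[] lv = levels(logical);
        byte[] byteLevels = new byte[lv.length];
        Character[] chars = new Character[lv.length];
        for (int i = 0; i < lv.length; i++) {
            byteLevels[i] = (byte) lv[i];
            chars[i] = logical.charAt(i);
        }
        Bidi.reorderVisually(byteLevels, 0, chars, 0, chars.length);
        StringBuilder sb = new StringBuilder();
        for (Character c : chars) {
            // The formatting characters are invisible, so drop them from the output.
            if (c != LRE && c != RLE && c != PDF) sb.append(c);
        }
        return fromHebrew(sb.toString());
    }

    public static void main(String[] args) {
        System.out.println(visualOrder(LOGICAL));
    }
}
```

Each nested embedding raises the level of the text inside it, so an engine that follows the algorithm should print the display order given in Comment 13.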
Comment 16 Randy Hudson CLA 2004-11-12 11:11:15 EST
We didn't use embedding overrides but need to. Please ignore.
Comment 17 Felipe Heidrich CLA 2009-08-17 16:28:41 EDT
Your bug has been moved to triage, visit http://www.eclipse.org/swt/triage.php for more info.
Comment 18 Leo Ufimtsev CLA 2017-08-03 12:27:23 EDT
This is a one-off bulk update. (The last one in the triage migration).

Moving bugs from swt-triaged@eclipse to platform-swt-inbox@eclipse.org and adding "triaged" keyword as per new triage process:
https://wiki.eclipse.org/SWT/Devel/Triage

See Bug 518478 for details.

Tag for notification/mail filters:
@TriageBulkUpdate