| Summary: | [Bidi] TextLayout's compliance with the Unicode BiDi algorithm | ||
|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Pratik Shah <ppshah> |
| Component: | SWT | Assignee: | Platform-SWT-Inbox <platform-swt-inbox> |
| Status: | CLOSED WORKSFORME | QA Contact: | Felipe Heidrich <eclipse.felipe> |
| Severity: | minor | ||
| Priority: | P3 | CC: | matial, nikita, peter, stori |
| Version: | 3.0 | Keywords: | triaged |
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Windows XP | ||
| Whiteboard: | |||
|
Description
Pratik Shah
In Windows we use Uniscribe and in GTK we use PangoLayout (which uses fribidi); both algorithms were exhaustively tested against the Bidi Reference Code. So I really don't think TextLayout can have "several discrepancies" as you say. Also keep in mind that setting the orientation changes the behaviour of the algorithm: in TextLayout, call setOrientation with SWT.RIGHT_TO_LEFT; if you are testing with StyledText, please pass the same flag to the constructor.

With the SWT.RIGHT_TO_LEFT flag on, the reordering of the example you gave is:
-10% ABC => CBA %10-
Without SWT.RIGHT_TO_LEFT (base direction = left-to-right) the result should be:
-10% ABC => -10% CBA
Can I close this PR, or do you have some other scenario you want me to look at?

With SWT.RIGHT_TO_LEFT, shouldn't the correct format be: CBA -10%

I tested with both (Uniscribe and fribidi) and the result is: CBA -10%

Felipe, I'm seeing %10-. Let's ignore the on-screen presentation. The levels returned for the first 4 characters are 1221. My understanding is that they would have to be 2222 to render "-10%".
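import org.eclipse.swt.SWT;
import org.eclipse.swt.graphics.TextLayout;
import org.eclipse.swt.widgets.Display;

public class LevelTest {
    public static void main(String[] args) {
        final Display display = new Display();
        TextLayout layout = new TextLayout(display);
        layout.setOrientation(SWT.RIGHT_TO_LEFT);
        String text = "-10%";
        layout.setText(text);
        // Print the bidi embedding level of each character.
        for (int i = 0; i < text.length(); i++)
            System.out.println(layout.getLevel(i));
        layout.dispose();
        display.dispose();
    }
}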
be sure added that to bug 77790

Sorry, bad copy&paste. The result is "CBA %10-". The levels are 1221.

See test #13: http://crl.nmsu.edu/~mleisher/ucdata.html

Okay, given the string "-10% ABC":
If LTR is set, the output is "-10% CBA" on Windows and GTK.
If RTL is set, the output is "CBA %10-" on Windows and GTK.
SWT results are consistent with the platform and are also consistent between GTK and Windows. What else do you need? If you want, we can add a bidi expert to the CC list (like Mati Allouche).

Felipe, we're not experts either. Please see our reference URL for the reason we were expecting "-10%". It is not a question of inconsistency across SWT platforms.

Also, note that the JDK reports level 2 for all characters:
java.text.Bidi bidi = new java.text.Bidi(message, java.text.Bidi.DIRECTION_RIGHT_TO_LEFT);
for (int i = 0; i < message.length(); i++) {
    System.out.print(bidi.getLevelAt(i) + " ");
}
Results in:
2 2 2 2
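
For anyone who wants to reproduce the JDK comparison, the snippet above can be wrapped into a small self-contained program. This is only a sketch: the class name `BidiLevels` and the helper `levels` are made up for illustration, and the levels printed for the minus and percent signs may depend on which Unicode version the running JDK implements, so they are not guaranteed to match the "2 2 2 2" reported in this thread.

```java
import java.text.Bidi;

// Hypothetical helper class, not part of SWT or the JDK.
final class BidiLevels {
    /** Returns the resolved embedding level of each char for the given base direction flag. */
    static int[] levels(String text, int baseDirection) {
        Bidi bidi = new Bidi(text, baseDirection);
        int[] result = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            result[i] = bidi.getLevelAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String message = "-10%";
        // Base direction right-to-left, as in the comment above.
        for (int level : levels(message, Bidi.DIRECTION_RIGHT_TO_LEFT)) {
            System.out.print(level + " ");
        }
        System.out.println();
    }
}
```

Comparing this output with the values from `TextLayout.getLevel` on the same string shows whether the two implementations disagree on the sign characters or on the digits themselves.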
Hi Mati, Randy and I are trying to understand what the "right bidi behavior" is; this time the problem is about the "Unicode BiDi algorithm". Could you (or another bidi expert like Semion or Lina) help us determine whether we actually have a bug or not? Thank you very much.

With SWT.RIGHT_TO_LEFT, the transformation "-10% ABC => CBA %10-" is certainly wrong. The first problem is with the percent sign. It is classified in Unicode as ET (European Terminator), which means that when adjacent to a number, it must be handled just like a digit. In our case, it must be displayed on the right of "10".

The situation with the minus sign is more complicated, since its classification changed from Unicode 4.0 to Unicode 4.1. Formerly, it was also classified ET, which means that "-10%" must display as "-10%" whether with orientation LTR or RTL. In Unicode 4.1, the plus and minus signs were reclassified as ES (European Separator), which means that they will be handled as digits only if they have digits adjacent on both sides, which is not the case in our example. As a result, with orientation LTR, "-10%" must be displayed as "-10%" (unless preceded by a Hebrew word, in which case it must be displayed as "10%-"), and with orientation RTL, "-10%" must be displayed as "10%-" (unless preceded by an English word, in which case it must be displayed as "-10%").

Depending on whether your implementation is designed to implement Unicode 4.1 or an older version, you should choose the proper classification for the minus sign.

Thanks Mati. Fribidi is Unicode 3.2, so they probably know about this problem, but I will report a bug against them anyway. On Windows, I will play with some flags in Uniscribe, but I'm not sure I can do anything more than that.

Felipe, is the Uniscribe on Windows compliant with Unicode 4.0.1? I'm noticing a couple of other discrepancies (on Windows):
1) Another case: DID YOU SAY 'he said "car MEANS CAR"'?
I believe the correct format for this should be: ?'he said "RAC SNAEM car"' YAS UOY DID
2) Also, the 4.0.1 Unicode BiDi algorithm mentions that a line should wrap whenever a paragraph separator is encountered. However, TextLayout does not do that.

From MSDN: "The rules governing the shaping and positioning of glyphs are specified and catalogued in The Unicode Standard: Worldwide Character Encoding, Version 2.0, Addison-Wesley Publishing Company."

This is a remark on Comment #13. It may not be obvious from the text of the Unicode Bidi Algorithm, but the example
DID YOU SAY 'he said "car MEANS CAR"'?
mandates the use of directional formatting characters (LRE, RLE, PDF) at the right boundaries between directional runs. More precisely, the logical stream must be:
<RLE>DID YOU SAY '<LRE>he said "<RLE>car MEANS CAR<PDF>"<PDF>'?<PDF>
Did you use the above string in your test?

We didn't use embedding overrides but need to. Please ignore.

Your bug has been moved to triage, visit http://www.eclipse.org/swt/triage.php for more info.

This is a one-off bulk update. (The last one in the triage migration.) Moving bugs from swt-triaged@eclipse to platform-swt-inbox@eclipse.org and adding "triaged" keyword as per new triage process: https://wiki.eclipse.org/SWT/Devel/Triage See Bug 518478 for details. Tag for notification/mail filters: @TriageBulkUpdate |
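
As a footnote to the remark on Comment #13, the logical stream with explicit directional formatting characters can be built in Java roughly as follows. This is a sketch under stated assumptions: the class `EmbeddingExample` and its methods are hypothetical names, and the uppercase Latin letters only stand in for right-to-left letters, as they do throughout this thread, so feeding this exact string to a layout engine will not by itself reproduce the Hebrew case.

```java
// Hypothetical illustration class, not part of SWT.
final class EmbeddingExample {
    static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    /** Builds <RLE>DID YOU SAY '<LRE>he said "<RLE>car MEANS CAR<PDF>"<PDF>'?<PDF>. */
    static String withEmbeddings() {
        return RLE + "DID YOU SAY '"
            + LRE + "he said \""
            + RLE + "car MEANS CAR" + PDF
            + "\"" + PDF
            + "'?" + PDF;
    }

    /** Removes the three formatting characters, giving back the plain text. */
    static String stripControls(String text) {
        return text.replace(String.valueOf(LRE), "")
                   .replace(String.valueOf(RLE), "")
                   .replace(String.valueOf(PDF), "");
    }

    /** Counts occurrences of a character in the text. */
    static int count(String text, char c) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String s = withEmbeddings();
        // Every embedding that is opened (LRE or RLE) should be closed by a PDF.
        System.out.println(count(s, LRE) + count(s, RLE) == count(s, PDF));
        System.out.println(stripControls(s));
    }
}
```

Each nested quotation opens an embedding in its own direction and closes it with a PDF, which is why the number of PDF characters equals the number of LRE and RLE characters combined.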