
Bug 575820

Summary: Regex POSIX \p{IsPunct} and \p{Punct} behave differently with OpenJDK
Product: [Eclipse Project] JDT
Reporter: Von Landon <landonvg>
Component: Core
Assignee: JDT-Core-Inbox <jdt-core-inbox>
Status: CLOSED NOT_ECLIPSE
QA Contact:
Severity: normal
Priority: P3
CC: twolf
Version: 4.18
Target Milestone: ---
Hardware: PC
OS: Windows 10
Whiteboard:

Description Von Landon 2021-09-03 17:30:20 EDT
Using the default JRE installed with Eclipse, evaluating ">".matches("(\\p{IsPunct})") returns false, but it should be true.  Evaluating ">".matches("(\\p{Punct})") returns true as expected.  Switching to Amazon Corretto results in both evaluating to true, as expected.

Comment 1 Thomas Wolf 2021-09-04 07:29:13 EDT
TL;DR: not a JDT problem; it's a difference in Java libraries and/or Unicode versions used by certain Java versions.

I see differences between the Java 1.8 libraries and Java >= 11 libraries.

Per the JLS [1]: "Versions of the Java programming language prior to JDK 1.1 used Unicode 1.1.5. Upgrades to newer versions of the Unicode Standard occurred in JDK 1.1 (to Unicode 2.0), JDK 1.1.7 (to Unicode 2.1), Java SE 1.4 (to Unicode 3.0), Java SE 5.0 (to Unicode 4.0), Java SE 7 (to Unicode 6.0), Java SE 8 (to Unicode 6.2), Java SE 9 (to Unicode 8.0), Java SE 11 (to Unicode 10.0), Java SE 12 (to Unicode 11.0), Java SE 13 (to Unicode 12.1), and Java SE 15 (to Unicode 13.0)."

In Unicode, ">" (U+003E GREATER-THAN SIGN) is not in a punctuation category but in the "Math Symbol" (Sm) general category. I don't know whether that ever changed between Unicode versions.
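
The category claim can be checked directly on whichever JRE is in use, without going through the regex engine. The following is only a minimal sketch: the class name GreaterThanCategory and its two helper methods are made up for illustration, and the only library pieces it relies on are Character.getType(int) and the category constants in java.lang.Character.

```java
public class GreaterThanCategory {

    // True if the code point's Unicode general category is Sm (Math Symbol).
    static boolean isMathSymbol(int codePoint) {
        return Character.getType(codePoint) == Character.MATH_SYMBOL;
    }

    // True if the general category is one of the punctuation (P*) categories.
    static boolean isUnicodePunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Report how the running JRE's Unicode tables classify a few ASCII characters.
        for (char ch : new char[] {'>', '!', '-', '$'}) {
            System.out.println("'" + ch + "': mathSymbol=" + isMathSymbol(ch)
                    + ", punctuation=" + isUnicodePunctuation(ch));
        }
    }
}
```

If the JRE's tables agree with the statement above, ">" should report as a math symbol and not as punctuation, which would be consistent with \p{IsPunctuation} not matching it.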

$ cat RegexTest.java 
public class RegexTest {

  public static void main(String[] args) {
    boolean a = ">".matches("\\p{IsPunct}");
    boolean b = ">".matches("\\p{Punct}");
    boolean c = ">".matches("\\p{IsPunctuation}");
    System.out.println("POSIX Result: IsPunct=" + a + ", Punct=" + b + ", IsPunctuation=" + c);
    a = ">".matches("(?U)\\p{IsPunct}");
    b = ">".matches("(?U)\\p{Punct}");
    c = ">".matches("(?U)\\p{IsPunctuation}");
    System.out.println("Unicode Result: IsPunct=" + a + ", Punct=" + b + ", IsPunctuation=" + c);
  }

}

$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-16.jdk/Contents/Home javac RegexTest.java
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-16.jdk/Contents/Home java RegexTest 
POSIX Result: IsPunct=false, Punct=true, IsPunctuation=false
Unicode Result: IsPunct=false, Punct=false, IsPunctuation=false

$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home javac RegexTest.java 
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home java RegexTest 
POSIX Result: IsPunct=false, Punct=true, IsPunctuation=false
Unicode Result: IsPunct=false, Punct=false, IsPunctuation=false

$ JAVA_HOME=/Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home javac RegexTest.java 
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home java RegexTest 
POSIX Result: IsPunct=false, Punct=true, IsPunctuation=false
Unicode Result: IsPunct=false, Punct=false, IsPunctuation=false

$ JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home javac RegexTest.java 
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home java RegexTest 
POSIX Result: IsPunct=true, Punct=true, IsPunctuation=false
Unicode Result: IsPunct=true, Punct=false, IsPunctuation=false
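
Since only \p{Punct} (without the (?U) flag) gave the same answer on every JRE tested above, code that needs "ASCII punctuation" regardless of Java version could avoid the named properties altogether. The sketch below is one possible workaround, not something taken from the JDK: the class AsciiPunct and the method isAsciiPunct are hypothetical names, and the character ranges are written out by hand to cover the 32 characters the Pattern Javadoc lists for \p{Punct}.

```java
import java.util.regex.Pattern;

public class AsciiPunct {

    // Explicit ASCII ranges: !-/ (0x21-0x2F), :-@ (0x3A-0x40),
    // [-` (0x5B-0x60) and {-~ (0x7B-0x7E). No named property is used,
    // so the result does not depend on how a JRE resolves \p{IsPunct}.
    private static final Pattern ASCII_PUNCT = Pattern.compile("[!-/:-@\\[-`{-~]");

    // True if the whole input is exactly one ASCII punctuation character.
    static boolean isAsciiPunct(String s) {
        return ASCII_PUNCT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {">", "!", "~", "a", "0", " "}) {
            System.out.println("\"" + s + "\" -> " + isAsciiPunct(s));
        }
    }
}
```

The trade-off is that this deliberately ignores non-ASCII punctuation; where Unicode-aware matching is wanted, \p{IsPunctuation} is the property that behaved consistently in the runs above, with the caveat that it does not cover symbols such as ">".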

[1] https://docs.oracle.com/javase/specs/jls/se16/html/jls-3.html#jls-3.1