Using the Pattern Class Fields

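The previous sections covered the commonly used methods of the Pattern class and what they do. This section introduces the static constant fields (the match flags) that Pattern provides. These fields are passed to the overload of compile() that takes a flags argument: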

public static Pattern compile(String regex, int flags)

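Parameters:

regex: the regular expression string
flags: the match flags. This section covers CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, CANON_EQ, UNIX_LINES, LITERAL and COMMENTS one at a time (Java 7 also added UNICODE_CHARACTER_CLASS, which is not covered here). Several flags can be combined with the bitwise OR operator |.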
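As a quick illustration of how flags are passed, here is a minimal sketch (the class FlagCombinationDemo and its helper methods are made up for this example) that combines CASE_INSENSITIVE and MULTILINE with | and compares the result with the equivalent embedded flag expression (?im):

```java
import java.util.regex.Pattern;

public class FlagCombinationDemo {

    // Hypothetical helper: flags combined with the bitwise OR operator
    static boolean withOrFlags(String input) {
        Pattern pattern = Pattern.compile("^be$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        return pattern.matcher(input).find();
    }

    // Hypothetical helper: the same flags written as the embedded expression (?im)
    static boolean withEmbeddedFlags(String input) {
        return Pattern.compile("(?im)^be$").matcher(input).find();
    }

    // Hypothetical helper: no flags, so matching is case-sensitive and
    // ^ / $ only match at the start / end of the whole input
    static boolean withoutFlags(String input) {
        return Pattern.compile("^be$").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "xx\nBE\nyy";
        System.out.println(withOrFlags(input));       // true
        System.out.println(withEmbeddedFlags(input)); // true
        System.out.println(withoutFlags(input));      // false
    }
}
```

Both forms enable the same behavior; the embedded form is handy when only the regex string can be configured.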

CASE_INSENSITIVE

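Enables case-insensitive matching. By default, case-insensitive matching only applies to characters in the US-ASCII charset. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag together with this flag. Case-insensitive matching can also be enabled with the embedded flag expression (?i). Note: specifying this flag may impose a slight performance penalty.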

Example: match the text "hello world" regardless of case.

import java.util.regex.Pattern;

public class Demo4 {

    public static void main(String[] args) {
        // To enable UNICODE_CASE as well, combine the flags like this:
        // Pattern pattern = Pattern.compile("(hello)( )(world)",
        //    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // Pattern.CASE_INSENSITIVE makes the match ignore case
        Pattern pattern = Pattern.compile("(hello)( )(world)", Pattern.CASE_INSENSITIVE);
        System.out.println(pattern.matcher("hello world").matches());
        System.out.println(pattern.matcher("HELLO WORLD").matches());
        System.out.println(pattern.matcher("Hello World").matches());
    }

}

Output:

true
true
true
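The embedded flag (?i) takes effect from the point where it appears in the pattern. A minimal sketch (the class InlineCaseDemo and its matches() helper are hypothetical) shows the difference between putting it at the start and in the middle:

```java
import java.util.regex.Pattern;

public class InlineCaseDemo {

    // Hypothetical helper: does the whole input match the regex?
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (?i) at the start: the whole pattern ignores case
        System.out.println(matches("(?i)hello world", "HeLLo WoRLD")); // true
        // (?i) in the middle: only the part after it ("world") ignores case
        System.out.println(matches("hello (?i)world", "hello WORLD")); // true
        System.out.println(matches("hello (?i)world", "HELLO WORLD")); // false
    }
}
```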

UNICODE_CASE

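Enables Unicode-aware case folding. When this flag is specified, case-insensitive matching (enabled by Pattern.CASE_INSENSITIVE) is done in a manner consistent with the Unicode Standard; without CASE_INSENSITIVE it has no effect. By default, case-insensitive matching only applies to characters in the US-ASCII charset. Unicode-aware case folding can also be enabled with the embedded flag expression (?u). Note: specifying this flag may impose a performance penalty.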
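Since this section has no example of its own, here is a minimal sketch (the class UnicodeCaseDemo and its helper are hypothetical) using the non-ASCII letters ü (\u00FC) and Ü (\u00DC). With CASE_INSENSITIVE alone they are not treated as the same letter; adding UNICODE_CASE, or writing (?iu), changes that:

```java
import java.util.regex.Pattern;

public class UnicodeCaseDemo {

    // Hypothetical helper: does the whole input match regex compiled with the given flags?
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00FC"; // ü
        String upper = "\u00DC"; // Ü
        // ASCII-only case folding: ü and Ü do not match
        System.out.println(matches(lower, Pattern.CASE_INSENSITIVE, upper)); // false
        // Unicode-aware case folding
        System.out.println(matches(lower, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper)); // true
        // Same effect with the embedded flags (?i) and (?u)
        System.out.println(matches("(?iu)" + lower, 0, upper)); // true
    }
}
```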

MULTILINE

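Enables multiline mode. In multiline mode, ^ matches at the beginning of the input and just after any line terminator, and $ matches just before any line terminator and at the end of the input. By default, these expressions only match at the beginning and the end of the entire input sequence ($ may also match just before a final line terminator). Multiline mode can also be enabled with the embedded flag expression (?m). Example: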

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo13 {

    public static void main(String[] args) {
        String str = "Be honest rather clever\r\n" +
                "Being on sea, sail; being on land, settle\n\n" +
                "Be just to all, but trust not all\n\n" +
                "Believe not all that you see nor half what you hear\n\n" +
                "Be slow to promise and quick to perform\n\n" +
                "Between two stools one falls to the ground\n\n" +
                "Better an open enemy than a false friend";
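        System.out.println("========== Default mode ==========");
        // Default mode: ^ only matches at the very start of the input
        Pattern pattern = Pattern.compile("^Be\\s");
        Matcher matcher = pattern.matcher(str);
        while(matcher.find()) {
            System.out.println(matcher.group()
                    + "  position: [" + matcher.start() + ", " + matcher.end() + "]");
        }

        System.out.println("========== Multiline mode ==========");
        // Multiline mode: ^ also matches right after every line terminator
        Pattern patternMultiline = Pattern.compile("^Be\\s", Pattern.MULTILINE);
        Matcher matcherMultiline = patternMultiline.matcher(str);
        while(matcherMultiline.find()) {
            System.out.println(matcherMultiline.group()
                    + "  position: [" + matcherMultiline.start() + ", " + matcherMultiline.end() + "]");
        }
    }

}

Output:

========== Default mode ==========
Be   position: [0, 3]
========== Multiline mode ==========
Be   position: [0, 3]
Be   position: [68, 71]
Be   position: [156, 159]

In default mode only the "Be " at the very start of the text is found. In multiline mode every line that starts with "Be" followed by whitespace is found, while the lines starting with Being, Believe, Between and Better are skipped because \s does not match a letter.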
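Demo13 only exercises ^. A minimal sketch (the class MultilineDollarDemo and its findAll() helper are hypothetical) shows the same effect on $: by default \w+$ only finds the last word of the input, while in multiline mode it finds the last word of every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDollarDemo {

    // Hypothetical helper: collect every match of regex in input
    static List<String> findAll(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\ngamma";
        // Default: $ only matches at the end of the whole input
        System.out.println(findAll("\\w+$", 0, text));                 // [gamma]
        // MULTILINE: $ also matches just before each line terminator
        System.out.println(findAll("\\w+$", Pattern.MULTILINE, text)); // [alpha, beta, gamma]
    }
}
```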

UNIX_LINES

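Enables Unix lines mode. In this mode, only the '\n' line terminator is recognized in the behavior of . (dot), ^ and $. Unix lines mode can also be enabled with the embedded flag expression (?d). Example: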

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo14 {

    public static void main(String[] args) {
        String str = "Be honest rather clever\n" +
                "Being on sea, sail; being on land, settle\n" +
                "Be just to all, but trust not all\n" +
                "Believe not all that you see nor half what you hear\n" +
                "Be slow to promise and quick to perform\n" +
                "Between two stools one falls to the ground\n" +
                "Better an open enemy than a false friend";
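        // Enable both multiline mode and Unix lines mode.
        // To specify several flags, combine them with the bitwise OR operator |
        Pattern patternMultiline = Pattern.compile("^Be\\s",
                Pattern.MULTILINE | Pattern.UNIX_LINES);
        Matcher matcherMultiline = patternMultiline.matcher(str);
        while(matcherMultiline.find()) {
            System.out.println(matcherMultiline.group()
                    + "  position: [" + matcherMultiline.start() + ", " + matcherMultiline.end() + "]");
        }
    }

}

Output:

Be   position: [0, 3]
Be   position: [66, 69]
Be   position: [152, 155]

Because this text uses only '\n' line terminators, the result is the same as with MULTILINE alone; the positions differ from the previous example only because that text used "\r\n" and "\n\n". UNIX_LINES makes a visible difference only when the input contains other line terminators, such as "\r\n".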
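To see what UNIX_LINES actually changes, a minimal sketch with a Windows-style "\r\n" line ending helps (the class UnixLinesDemo and its found() helper are hypothetical). Without the flag, "\r\n" counts as a line terminator; with it, '\r' is just an ordinary character:

```java
import java.util.regex.Pattern;

public class UnixLinesDemo {

    // Hypothetical helper: is there any match of regex in input?
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String windowsText = "a\r\nb"; // Windows-style line ending
        // Default terminators: $ matches right after "a", before "\r\n"
        System.out.println(found("a$", Pattern.MULTILINE, windowsText));                       // true
        // UNIX_LINES: only '\n' ends a line, so '\r' sits between "a" and the line end
        System.out.println(found("a$", Pattern.MULTILINE | Pattern.UNIX_LINES, windowsText));  // false
        // ...and . (dot) is now allowed to match that '\r'
        System.out.println(found("a.$", Pattern.MULTILINE | Pattern.UNIX_LINES, windowsText)); // true
    }
}
```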

LITERAL

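Enables literal parsing of the pattern. When this flag is specified, the string that specifies the pattern is treated as a sequence of literal characters: metacharacters and escape sequences in it have no special meaning. The flags CASE_INSENSITIVE and UNICODE_CASE still affect matching when used together with this flag; the other flags become superfluous.

Example: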

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo15 {

    public static void main(String[] args) {
        String str = "^Be\\s honest rather clever";
        Pattern pattern = Pattern.compile("^Be\\s", Pattern.LITERAL);
        Matcher matcher = pattern.matcher(str);
        while(matcher.find()) {
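            System.out.println(matcher.group()
                    + "  position: [" + matcher.start() + ", " + matcher.end() + "]");
        }
    }

}

Output:

^Be\s  position: [0, 5]

With LITERAL, the characters ^, \ and s lose their special meaning, so the pattern matches exactly the five characters ^Be\s at the start of the text.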
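A minimal sketch (the class LiteralDemo and its helper are hypothetical) showing that . is an ordinary dot under LITERAL, that CASE_INSENSITIVE keeps working alongside it, and that Pattern.quote(), which wraps the text in \Q...\E, gives a similar literal match without the flag:

```java
import java.util.regex.Pattern;

public class LiteralDemo {

    // Hypothetical helper: does the whole input match regex compiled with the given flags?
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Under LITERAL, "." only matches a dot, not any character
        System.out.println(matches("a.b", Pattern.LITERAL, "axb")); // false
        System.out.println(matches("a.b", Pattern.LITERAL, "a.b")); // true
        // CASE_INSENSITIVE still takes effect together with LITERAL
        System.out.println(matches("a.b", Pattern.LITERAL | Pattern.CASE_INSENSITIVE, "A.B")); // true
        // Pattern.quote produces a pattern that matches the text literally
        System.out.println(matches(Pattern.quote("a.b"), 0, "axb")); // false
        System.out.println(matches(Pattern.quote("a.b"), 0, "a.b")); // true
    }
}
```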

DOTALL

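Enables dotall mode. In dotall mode, the expression . matches any character, including a line terminator. By default, . does not match line terminators. Dotall mode can also be enabled with the embedded flag expression (?s) (the s is a mnemonic for "single-line" mode, which is what this mode is called in Perl). Example: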

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo17 {

    public static void main(String[] args) {
        String str = "Be honest rather clever\r\n" +
                "Hello world";
        Pattern pattern = Pattern.compile("^Be.+", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(str);
        while(matcher.find()) {
            System.out.println(matcher.group()
                    + "  position: [" + matcher.start() + ", " + matcher.end() + "]");
        }
    }

}

Output:

Be honest rather clever
Hello world  position: [0, 36]

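By default, . (dot) does not match line terminators (here the \r\n pair). With Pattern.DOTALL specified, . also matches them, so .+ runs across the \r\n and the whole text "Be honest rather clever\r\nHello world" is matched as if it were a single line.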
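For comparison, a minimal sketch (the class DotallCompareDemo and its firstMatch() helper are hypothetical) runs the same pattern on the same text without the flag, with DOTALL, and with the embedded flag (?s):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallCompareDemo {

    // Hypothetical helper: first match of regex in input, or null if there is none
    static String firstMatch(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String str = "Be honest rather clever\r\nHello world";
        // Default: .+ stops before "\r\n", so only the first line is matched
        System.out.println(firstMatch("^Be.+", 0, str)); // Be honest rather clever
        // DOTALL or (?s): .+ runs across the line break to the end of the input
        System.out.println(firstMatch("^Be.+", Pattern.DOTALL, str).length()); // 36
        System.out.println(firstMatch("(?s)^Be.+", 0, str).length());          // 36
    }
}
```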

COMMENTS

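Permits whitespace and comments in the pattern. In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of the line. Comments mode can also be enabled with the embedded flag expression (?x). Example: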

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo16 {

    public static void main(String[] args) {
        String str = "Be honest rather clever";
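        // Whitespace in the pattern is ignored; everything from # to the end of the line is a comment
        Pattern pattern = Pattern.compile("^ B e\\s#this is a comment", Pattern.COMMENTS);
        Matcher matcher = pattern.matcher(str);
        while(matcher.find()) {
            System.out.println(matcher.group()
                    + "  position: [" + matcher.start() + ", " + matcher.end() + "]");
        }
    }

}

Output:

Be   position: [0, 3]

The spaces in "^ B e\s" are ignored, so the pattern is equivalent to ^Be\s. Because whitespace is ignored, a space that must actually be matched in this mode has to be written as an escape, for example \s (any whitespace) or \x20.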
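Comments mode is most useful for laying out a longer pattern over several lines. A minimal sketch, assuming a yyyy-MM-dd date format (the class CommentsDateDemo, the constant DATE_REGEX and the parse() helper are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentsDateDemo {

    // In COMMENTS mode, whitespace is ignored and each "#" comment runs to the
    // end of its line, so every part of the pattern can be annotated
    static final String DATE_REGEX =
            "(\\d{4})   # year\n" +
            "-          # separator\n" +
            "(\\d{2})   # month\n" +
            "-          # separator\n" +
            "(\\d{2})   # day\n";

    // Returns {year, month, day}, or null if input is not in yyyy-MM-dd form
    static String[] parse(String input) {
        Matcher matcher = Pattern.compile(DATE_REGEX, Pattern.COMMENTS).matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = parse("2024-05-17");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]); // 2024 / 05 / 17
        // Spaces in the input are NOT ignored, only spaces in the pattern
        System.out.println(parse("2024 - 05 - 17") == null);               // true
    }
}
```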

CANON_EQ

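Enables canonical equivalence. When this flag is specified, two characters are considered to match if, and only if, their full canonical decompositions match. For example, when this flag is specified, the expression "a\u030A" (the letter a followed by U+030A COMBINING RING ABOVE) matches the string "\u00E5" (the precomposed letter å). By default, matching does not take canonical equivalence into account. Note: there is no embedded flag character for enabling canonical equivalence, and specifying this flag may impose a performance penalty. Example: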

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo12 {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("a\u030A", Pattern.CANON_EQ);
        // å == \u00E5
        Matcher matcher = pattern.matcher("\u00E5");
        while(matcher.find()) {
            System.out.println(matcher.group()
                    + "  position: [" + matcher.start() + ", " + matcher.end() + "]");
        }
    }

}

Output: