Question

I want to be able to output both "==" and "=" as tokens.

For example, the input text file is:

biscuit==cookie apple=fruit+-()

Current output:

biscuit
=
=
cookie
apple
=
fruit
+
-
(
)

The output I want:

biscuit
==
cookie
apple
=
fruit
+
-
(
)

Here is my code:

    Scanner s = null;
    try {
        s = new Scanner(new BufferedReader(new FileReader("input.txt")));
        s.useDelimiter("\\s|(?<=\\p{Punct})|(?=\\p{Punct})");

        while (s.hasNext()) {

            String next = s.next();
            System.out.println(next);
        }
    } finally {
        if (s != null) {
            s.close();
        }
    }

Thanks.

Edit: I would like to be able to keep my current regex.

Answer 1

Split the input string according to the regex below.

    String s = "biscuit==cookie apple=fruit";
    String[] tok = s.split("\\s+|\\b(?==+)|(?<==)(?!=)");
    System.out.println(Arrays.toString(tok));

Output:

[biscuit, ==, cookie, apple, =, fruit]

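Explanation

\\s+ matches one or more whitespace characters.
| OR
\\b(?==+) matches a word boundary only if it is followed by one or more = signs.
| OR
(?<==) looks behind for an = sign.
(?!=) matches that position only if it is not followed by another = sign.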
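
To see what the change buys, here is a small sketch that runs the delimiter from the question and the one from this answer over the same string. The class and method names (SplitComparison, tokens) are made up for illustration; only String.split is standard API.

```java
import java.util.Arrays;
import java.util.List;

class SplitComparison {
    // Delimiter from the question: splits around every punctuation character.
    static final String ORIGINAL = "\\s|(?<=\\p{Punct})|(?=\\p{Punct})";
    // Delimiter from this answer: only splits at the outer edges of a run of '='.
    static final String KEEP_EQUALS = "\\s+|\\b(?==+)|(?<==)(?!=)";

    // Hypothetical helper: split the input and return the pieces as a list.
    static List<String> tokens(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    public static void main(String[] args) {
        String input = "biscuit==cookie apple=fruit";
        System.out.println(tokens(input, ORIGINAL));    // "==" comes out as two "=" tokens
        System.out.println(tokens(input, KEEP_EQUALS)); // "==" stays in one piece
    }
}
```

Note that \\b only fires when the character before the = is a word character, which is presumably why the update below swaps it for (?<!=).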

Update

    String s = "biscuit==cookie apple=fruit+-()";
    String[] tok = s.split("\\s+|(?<!=)(?==+)|(?<==)(?!=)|(?=[+()-])");
    System.out.println(Arrays.toString(tok));

Output:

[biscuit, ==, cookie, apple, =, fruit, +, -, (, )]

Answer 2

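In other words, you want to split on

one or more whitespace characters
a position that has = after it and no = before it (e.g. foo|= where | marks the position)
a position that has = before it and no = after it (e.g. =|foo where | marks the position)

In other words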

    s.useDelimiter("\\s+|(?<!=)(?==)|(?<==)(?!=)");
    //             ^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
    //cases:         1)        2)        3)
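
As a rough sketch of how that delimiter plugs into the Scanner loop from the question, the snippet below scans a String instead of the file so it is self-contained. EqualsAwareScanner and tokens are hypothetical names. This delimiter has no case for + - ( ), so those characters would stay attached to the neighbouring text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EqualsAwareScanner {
    // The three cases from above: whitespace, start of an '=' run, end of an '=' run.
    static final String DELIMITER = "\\s+|(?<!=)(?==)|(?<==)(?!=)";

    // Hypothetical helper: collect every token the Scanner yields for the input.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter(DELIMITER);
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String token : tokens("biscuit==cookie apple=fruit")) {
            System.out.println(token);
        }
    }
}
```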

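Since it looks like you are building a parser, I would suggest using a tool that lets you build a proper grammar, such as http://www.antlr.org/. But if you must stick with regex, another improvement that makes the regex easier to build is to use Matcher#find instead of a Scanner delimiter. That way your regex and code could look like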

    String data = "biscuit==cookie apple=fruit+-()";

    String regex = "<=|==|>=|[\\Q<>+-=()\\E]|[^\\Q<>+-=()\\E]+";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(data);

    while (m.find())
        System.out.println(m.group());

Output (whitespace is not one of the listed characters, so "cookie apple" stays together):

biscuit
==
cookie apple
=
fruit
+
-
(
)

You could also use

    String regex = "<=|==|>=|\\p{Punct}|\\P{Punct}+";
    //                       ^^^^^^^^^^ ^^^^^^^^^^^-- standard cases
    //              ^^ ^^ ^^------------------------- special cases
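
A minimal sketch of the Matcher#find approach with that pattern follows. One deliberate change, which is an assumption on my part and not part of the regex above: the last alternative is [^\\p{Punct}\\s]+ instead of \\P{Punct}+, so whitespace is skipped rather than kept inside a token. FindTokenizer and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTokenizer {
    // Special cases first (<=, ==, >=), then any single punctuation character,
    // then a run of characters that are neither punctuation nor whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("<=|==|>=|\\p{Punct}|[^\\p{Punct}\\s]+");

    // Hypothetical helper: return every match of TOKEN in order.
    static List<String> tokenize(String data) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("biscuit==cookie apple=fruit+-()")) {
            System.out.println(token);
        }
    }
}
```

Because alternatives are tried left to right, the two-character operators have to come before \\p{Punct}; otherwise "==" would be matched one character at a time.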

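This approach also requires you to first read the data from the file and store it in a single String that you will parse. You can find many ways to read text from a file in this question: Reading a plain text file in Java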

So you can use something like

    String data = new String(Files.readAllBytes(Paths.get("input.txt")));

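You can specify the encoding to use when reading the bytes from the file with the constructor String(bytes, encoding). So you can write it as new String(bytes, "UTF-8") or, to avoid typos when choosing the encoding, use one of the constants stored in the StandardCharsets class, as in new String(bytes, StandardCharsets.UTF_8).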
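
Put together, reading the whole file with an explicit charset might look like the sketch below. FileToString and readUtf8 are hypothetical names, and "input.txt" is just the file name from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileToString {
    // Hypothetical helper: read every byte of the file and decode it as UTF-8.
    static String readUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Use the first argument as the path, or fall back to the question's file name.
        Path path = Paths.get(args.length > 0 ? args[0] : "input.txt");
        if (Files.isReadable(path)) {
            System.out.println(readUtf8(path));
        } else {
            System.out.println("Cannot read " + path);
        }
    }
}
```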

Answer 3

You can just qualify those punctuation lookarounds with some extra assertions.

 # "\\s|(?<===)|(?<=\\p{Punct})(?!(?<==)(?==))|(?=\\p{Punct})(?!(?<==)(?==))"

   \s 
|  (?<= == )
|  (?<= \p{Punct} )
   (?!
        (?<= = )
        (?= = )
   )
|  (?= \p{Punct} )
   (?!
        (?<= = )
        (?= = )
   )

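Update

If some characters are not covered by \p{Punct}, just add them as a separate class inside the punctuation subexpressions.

For engines that don't handle certain properties well inside classes, use this ->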

 #  Raw:   \s|(?<===)|(?<=\p{Punct}|[=+])(?!(?<==)(?==))|(?=\p{Punct}|[=+])(?!(?<==)(?==))

    \s 
 |  (?<= == )
 |  (?<= \p{Punct} | [=+] )
    (?!
         (?<= = )
         (?= = )
    )
 |  (?= \p{Punct} | [=+] )
    (?!
         (?<= = )
         (?= = )
    )

For engines that do handle properties inside classes, this is a better one ->

 #  Raw:   \s|(?<===)|(?<=[\p{Punct}=+])(?!(?<==)(?==))|(?=[\p{Punct}=+])(?!(?<==)(?==))

    \s 
 |  (?<= == )
 |  (?<= [\p{Punct}=+] )
    (?!
         (?<= = )
         (?= = )
    )
 |  (?= [\p{Punct}=+] )
    (?!
         (?<= = )
         (?= = )
    )
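
For a quick check in Java, the first form of the regex (the one at the top of this answer) can be tried with Pattern.split, as in the sketch below. PunctSplit and tokens are hypothetical names. The same string could presumably be passed to Scanner#useDelimiter in the question's code, though that is not exercised here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PunctSplit {
    // The question's delimiter, with each punctuation lookaround qualified so
    // that it does not split between two adjacent '=' characters.
    static final Pattern DELIMITER = Pattern.compile(
            "\\s|(?<===)|(?<=\\p{Punct})(?!(?<==)(?==))|(?=\\p{Punct})(?!(?<==)(?==))");

    // Hypothetical helper: split the input on DELIMITER and return the pieces.
    static List<String> tokens(String input) {
        return Arrays.asList(DELIMITER.split(input));
    }

    public static void main(String[] args) {
        for (String token : tokens("biscuit==cookie apple=fruit+-()")) {
            System.out.println(token);
        }
    }
}
```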

Answer 4

(?===)|(?<===)|\s|(?<!=)(?==)|(?<==)(?!=)|(?=\p{P})|(?<=\p{P})|(?=\+)

You can try this. See the demo.

http://regex101.com/r/wQ1oW3/18

How to delimit "=" and "==" in Java when reading
