Question

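I have a string of the form "[(1, 2), (2, 3), (3, 4)]" containing an arbitrary number of elements. I'm trying to split it on the commas that separate the coordinates, that is, to retrieve (1, 2), (2, 3), and (3, 4).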

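Can I do this with a Java regular expression? I'm a complete noob, but I'm hoping Java regex is powerful enough for it. If it isn't, could you suggest an alternative?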

Answer 1

Since Java 5:

Scanner sc = new Scanner(coords); // coords is the input string; Scanner has no no-arg constructor
sc.useDelimiter("\\D+"); // skip everything that is not a digit
List<Coord> result = new ArrayList<Coord>();
while (sc.hasNextInt()) {
    result.add(new Coord(sc.nextInt(), sc.nextInt()));
}
return result;

Edit: we don't know how many coordinates are passed in the string coords.
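For anyone who wants to run this, here is a minimal self-contained sketch of the same Scanner idea. The Coord class is a hypothetical stand-in (the answer does not show it), and the check for an odd number of values is an addition, not part of the original snippet. Note that with "\\D+" as the delimiter a minus sign counts as a delimiter too, so this assumes non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ScannerCoords {
    // Hypothetical value type; the Coord class used in the answer is not shown.
    static final class Coord {
        final int x;
        final int y;

        Coord(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static List<Coord> parse(String coords) {
        List<Coord> result = new ArrayList<>();
        try (Scanner sc = new Scanner(coords)) {
            sc.useDelimiter("\\D+"); // every run of non-digits separates two tokens
            while (sc.hasNextInt()) {
                int x = sc.nextInt();
                if (!sc.hasNextInt()) {
                    // Added safety check: a trailing x without a matching y
                    throw new IllegalArgumentException("odd number of values in: " + coords);
                }
                result.add(new Coord(x, sc.nextInt()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Coord c : parse("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.printf("x=%d, y=%d%n", c.x, c.y);
        }
    }
}
```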

Answer 2

You can use String#split() for this.

String string = "[(1, 2), (2, 3), (3, 4)]";
string = string.substring(1, string.length() - 1); // Get rid of the square brackets.
String[] parts = string.split("(?<=\\))(,\\s*)(?=\\()");
for (String part : parts) {
    part = part.substring(1, part.length() - 1); // Get rid of parentheses.
    String[] coords = part.split(",\\s*");
    int x = Integer.parseInt(coords[0]);
    int y = Integer.parseInt(coords[1]);
    System.out.printf("x=%d, y=%d\n", x, y);
}

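(?<=\\)) is a positive lookbehind: the match must be preceded by ). (?=\\() is a positive lookahead: the match must be followed by (. (,\\s*) means the split happens on the , and any whitespace after it. The \\ are there only to escape regex-specific characters.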
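To see what the lookarounds buy you, here is a small sketch (the class and method names are made up for illustration). Because the lookbehind and lookahead only assert their characters without consuming them, each piece keeps its own parentheses, and the comma inside a pair is left alone since it is not preceded by ).

```java
import java.util.Arrays;

final class LookaroundSplit {
    static String[] splitPairs(String input) {
        // Drop the surrounding [ and ], as in the answer.
        String inner = input.substring(1, input.length() - 1);
        // Only the comma and the whitespace after it are consumed; the ) before
        // and the ( after are asserted but remain part of the pieces.
        return inner.split("(?<=\\)),\\s*(?=\\()");
    }

    public static void main(String[] args) {
        for (String pair : splitPairs("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.println(pair);
        }
    }
}
```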

That said, this particular string is recognizable as the result of List#toString(). Are you sure you're doing things the right way? ;)

Update: as per the comments, you can indeed go the other way round and get rid of the non-digits:

String string = "[(1, 2), (2, 3), (3, 4)]";
String[] parts = string.split("\\D.");
for (int i = 1; i < parts.length; i += 3) {
    int x = Integer.parseInt(parts[i]);
    int y = Integer.parseInt(parts[i + 1]);
    System.out.printf("x=%d, y=%d\n", x, y);
}

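Here \\D means that the split happens on any non-digit (\\d stands for digit). The . after it is there to eliminate the blank matches after the digits. I must admit, though, that I'm not sure how to eliminate the blank matches before the digits. I'm not a trained regex guru yet. Hey, Bart K, can you do it better?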
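One possible way around the blank matches (a sketch, not something from the original answer) is to split on whole runs of non-digits with \\D+ and skip any empty piece. The only empty piece is the leading one, produced by the "[(" at the start of the string. The class name and the odd-count check are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class NonDigitSplit {
    static List<int[]> parse(String input) {
        List<Integer> numbers = new ArrayList<>();
        for (String part : input.split("\\D+")) {
            if (!part.isEmpty()) { // skip the empty piece before the first digit
                numbers.add(Integer.parseInt(part));
            }
        }
        if (numbers.size() % 2 != 0) {
            throw new IllegalArgumentException("odd number of values in: " + input);
        }
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < numbers.size(); i += 2) {
            pairs.add(new int[] { numbers.get(i), numbers.get(i + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (int[] p : parse("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.printf("x=%d, y=%d%n", p[0], p[1]);
        }
    }
}
```

Unlike the "\\D." version, this does not depend on there being exactly two non-digit characters between numbers.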

In the end, it's better to use a parser for this. See Hubert's answer on this topic.

Answer 3

If you don't need the expression to validate the syntax around the coordinates, this should do:

\(\d+,\s\d+\)

This expression will return multiple matches (three for the input in your example).
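As a rough illustration of "multiple matches", the following sketch collects every match of that expression with Matcher.find(). The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CoordMatches {
    // Same expression as in the answer, with backslashes doubled for a Java string literal.
    private static final Pattern PAIR = Pattern.compile("\\(\\d+,\\s\\d+\\)");

    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            matches.add(m.group()); // the whole matched pair, parentheses included
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String pair : findAll("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.println(pair);
        }
    }
}
```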

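In your question, you state that you want to "retrieve (1, 2), (2, 3), and (3, 4)". If you actually need the pair of values associated with each coordinate, you can drop the parentheses and modify the regex to do some capturing: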

(\d+),\s(\d+)

The Java code would look something like this:

import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+),\\s(\\d+)");
        Matcher matcher = pattern.matcher("[(1, 2), (2, 3), (3, 4)]");

        while (matcher.find()) {
            int x = Integer.parseInt(matcher.group(1));
            int y = Integer.parseInt(matcher.group(2));
            System.out.printf("x=%d, y=%d\n", x, y);
        }
    }
}

Answer 4

Will there always be 3 sets of coordinates to analyze?

You could try:

\[(\(\d,\s*\d\)), (\(\d,\s*\d\)), (\(\d,\s*\d\))\]
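If the count really is fixed at three, the groups can be read back out like this. This is only a sketch: the class name is made up, and the pattern here allows multi-digit numbers and optional whitespace after the inner comma, which is an assumption based on the sample input in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreePairs {
    // One captured pair: "(" digits "," optional whitespace digits ")"
    private static final String PAIR = "(\\(\\d+,\\s*\\d+\\))";
    private static final Pattern THREE =
            Pattern.compile("\\[" + PAIR + ", " + PAIR + ", " + PAIR + "\\]");

    static String[] extract(String input) {
        Matcher m = THREE.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("expected exactly three pairs: " + input);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String pair : extract("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.println(pair);
        }
    }
}
```

The obvious drawback is that any other number of pairs is rejected outright, which is why the question about the count matters.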

Answer 5

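If you use regex, you are going to get lousy error reporting, and things will get much more complicated if your requirements change (for instance, if you have to parse the sets in different square brackets into different groups).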

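I recommend just writing the parser by hand; it's about 10 lines of code and shouldn't be very brittle. Track everything you encounter: open parens, close parens, open brackets, and close brackets. It's like a switch statement with 5 options (and a default), really not that bad.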

For a minimal approach, open parens and open brackets can be ignored, so there are really only 3 cases.

This would be the bare minimum.

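// Java sketch (needs java.util.StringTokenizer); `input` is the string to parse
// and `output` is a hypothetical result collector
int valueA = 0;
String lastValue = "";
StringTokenizer tokens = new StringTokenizer(input, "[](),", true);

while (tokens.hasMoreTokens()) {
    String token = tokens.nextToken().trim();
    if (token.isEmpty())
        continue; // skip whitespace-only tokens such as the " " after "),"

    // The token before the ) is the second int of the pair, and the first should
    // already be stored
    if (token.equals(")"))
        output.addResult(valueA, Integer.parseInt(lastValue));

    // The token before the comma is the first int of the pair
    else if (token.equals(","))
        valueA = Integer.parseInt(lastValue);

    // Just store off this token and deal with it when we hit the proper delim
    else
        lastValue = token;
}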

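This is no better than a minimal regex-based solution, except that it will be much easier to maintain and enhance (add error checking, add a stack for paren and square-bracket matching, and check for misplaced commas and other invalid syntax).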

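As an example of extensibility, if you had to place different square-bracket-delimited groups into different output sets, the addition is as simple as: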

    // When we close the square bracket, start a new output group.
    else if(token.equals("]"))
        output.startNewGroup();
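Putting the pieces together, a runnable version of the tokenizer with the group extension might look like the sketch below. The class name and the List-based result are assumptions made for the example (the answer's output object is not shown), and error handling is deliberately left out: malformed input such as "[," simply fails with a NumberFormatException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class GroupedParser {
    static List<List<int[]>> parse(String input) {
        List<List<int[]>> groups = new ArrayList<>();
        List<int[]> current = new ArrayList<>();
        int first = 0;
        String lastValue = "";
        // true = return the delimiters themselves as tokens
        StringTokenizer tokens = new StringTokenizer(input, "[](),", true);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken().trim();
            if (token.isEmpty()) {
                continue; // whitespace between delimiters
            }
            switch (token) {
                case ")": // lastValue holds the second int of the pair
                    current.add(new int[] { first, Integer.parseInt(lastValue) });
                    break;
                case ",": // lastValue holds the most recently seen int
                    first = Integer.parseInt(lastValue);
                    break;
                case "]": // close the current group and start a new one
                    groups.add(current);
                    current = new ArrayList<>();
                    break;
                default: // remember the token until we hit a delimiter
                    lastValue = token;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<int[]>> groups = parse("[(1, 2), (2, 3)] [(3, 4)]");
        System.out.println(groups.size() + " groups"); // prints "2 groups"
    }
}
```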

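Checking the parens is as easy as creating a stack of chars and pushing each [ or ( onto it; then, when you get a ] or ), pop the stack and assert that it matches. Also, when you are done, make sure that stack.size() == 0.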
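A minimal sketch of that stack check, kept separate from the parser for clarity (the class and method names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BracketCheck {
    static boolean isBalanced(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : input.toCharArray()) {
            if (c == '[' || c == '(') {
                stack.push(c);
            } else if (c == ']' || c == ')') {
                if (stack.isEmpty()) {
                    return false; // closer without any opener
                }
                char open = stack.pop();
                if ((c == ']' && open != '[') || (c == ')' && open != '(')) {
                    return false; // closer does not match the most recent opener
                }
            }
        }
        return stack.isEmpty(); // leftover openers mean something was never closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("[(1, 2), (2, 3)]")); // true
        System.out.println(isBalanced("[(1, 2]"));          // false
    }
}
```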

Answer 6

With regex, you can split on the pattern (?<=\)), (note the trailing comma), which uses a positive lookbehind:

String[] subs = str.replaceAll("\\[", "").replaceAll("\\]", "").split("(?<=\\)),"); // pieces after the first keep a leading space; trim() if needed

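With simple string functions, you can remove the [ and ], split on the two characters ")," (in Java, String.split takes a regex, so that is string.split("\\),")), and then put the ) back.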
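Here is a rough sketch of that plain-string variant. The names are made up for illustration, and the trim() is an addition to get rid of the space that follows each separating comma.

```java
final class PlainSplit {
    static String[] splitPairs(String input) {
        // replace(...) works on literal text, so no escaping is needed here.
        String inner = input.replace("[", "").replace("]", "");
        // split(...) takes a regex, so the ) has to be escaped.
        String[] parts = inner.split("\\),");
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i].trim();
            // The split consumed the ) of every piece except the last one.
            parts[i] = part.endsWith(")") ? part : part + ")";
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String pair : splitPairs("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.println(pair);
        }
    }
}
```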

Regex to split a nested coordinate string
