Question

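I know similar questions have been asked before, but I want to do a custom operation and don't know how to go about it. I want to split a string with a regular expression, but this time I know the start and end characters, for example: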

String myString="Google is a great search engine<as:...s>";

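<as: and s> are the start and end characters. The ... in between is dynamic, and I cannot predict its value.

I want to be able to split the string on everything from the opening <as: to the closing s>, with the dynamic string in between.

Something like: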

myString.split("<as:/*s>");

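or something along those lines. I would also like to get all occurrences of <as:..s> in the string. I know this can be done with regular expressions, but I have never done it before. I need a simple, clean way to do this. Thanks in advance.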

Answer 1

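Rather than using .split(), I would simply extract with Pattern and Matcher. This approach finds everything between <as: and s> and puts it in a capture group. Group 1 then holds the text you want.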

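import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args)
{
    final String myString="Google is a great search engine<as:Some stuff heres>";

    // Leading text without '<', then "<as:", the captured contents, and a final "s>"
    Pattern pat = Pattern.compile("^[^<]+<as:(.*)s>$");

    Matcher m = pat.matcher(myString);
    if (m.matches()) {
        // Group 1 is the text between "<as:" and the closing "s>"
        System.out.println(m.group(1));
    }
}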

Output:

Some stuff here

If you need the leading text as well, you can put it in a capture group too.
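As a minimal sketch of that idea, the leading text can be wrapped in a group of its own. The class name PrefixAndTag and the parse helper are made up for illustration and are not part of the original answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PrefixAndTag {
    // Group 1: text before the tag. Group 2: text between "<as:" and the final "s>".
    private static final Pattern PAT = Pattern.compile("^([^<]+)<as:(.*)s>$");

    // Returns {prefix, contents}, or null when the input does not have the expected shape.
    static String[] parse(String input) {
        Matcher m = PAT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("Google is a great search engine<as:Some stuff heres>");
        System.out.println(parts[0]); // Google is a great search engine
        System.out.println(parts[1]); // Some stuff here
    }
}
```

Like the pattern above, this only handles a single tag at the very end of the input.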

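Edit: if there is more than one <as:...s> in the input, the following will collect all of them. Edit 2: reworked the logic and added a check for empty strings.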

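import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> multiEntry(final String myString)
{
    // Everything before the first "<as:" is plain leading text
    String[] parts = myString.split("<as:");

    List<String> col = new ArrayList<>();
    if (! parts[0].trim().isEmpty()) {
        col.add(parts[0]);
    }

    // Group 1: tag contents up to the first "s>"; group 2: any text after the tag
    Pattern pat = Pattern.compile("^(.*?)s>(.*)?");
    for (int i = 1; i < parts.length; ++i) {
        Matcher m = pat.matcher(parts[i]);
        if (m.matches()) {
            for (int j = 1; j <= m.groupCount(); ++j) {
                String s = m.group(j).trim();
                // Skip empty pieces, e.g. when nothing follows the tag
                if (! s.isEmpty()) {
                    col.add(s);
                }
            }
        }
    }

    return col;
}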

Output:

[Google is a great search engine, Some stuff heress, Here is Facebook, More Stuff, Something else at the end]

Edit 3: this approach uses find() in a loop to do the parsing. It also uses optional capture groups.

public static void looping()
{
    final String myString="Google is a great search engine"
            + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
            + "Something else at the end" +
            "<as:Stuffs>" +
            "<as:Yet More Stuffs>";

    // Optional text without '<' (group 1), then an optional <as:...s> tag whose contents are group 3
    Pattern pat = Pattern.compile("([^<]+)?(<as:(.*?)s>)?");

    Matcher m = pat.matcher(myString);
    List<String> col = new ArrayList<>();

    while (m.find()) {
        String prefix = m.group(1);
        String contents = m.group(3);

        // An optional group that did not take part in the match comes back as null
        if (prefix != null) { col.add(prefix); }
        if (contents != null) { col.add(contents); }
    }

    System.out.println(col);
}

Output:

[Google is a great search engine, Some stuff heress, Here is Facebook, More Stuff, Something else at the end, Stuff, Yet More Stuff]
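Since the question also asks for just the occurrences of <as:..s>, and for a split on that delimiter, a narrower variation of the same find() loop might look like the sketch below. The class name TagContents and the methods extract and outside are hypothetical names chosen for illustration. It assumes, like the code above, that a tag ends at the first "s>" after "<as:".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class TagContents {
    // Reluctant (.*?) stops at the first "s>" after "<as:"
    private static final Pattern TAG = Pattern.compile("<as:(.*?)s>");

    // Collects only the text found inside each <as:...s> tag.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Splits on whole tags, returning the text outside them.
    static String[] outside(String input) {
        return TAG.split(input);
    }

    public static void main(String[] args) {
        String s = "Google is a great search engine"
                + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>";
        System.out.println(extract(s));                  // [Some stuff heress, More Stuff]
        System.out.println(Arrays.toString(outside(s))); // [Google is a great search engine, Here is Facebook]
    }
}
```

Note that Pattern.split drops trailing empty strings, so a tag at the very end of the input does not produce an empty last element.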

Additional edit: wrote some quick test cases (with a super hacky helper class) to help validate. All of these pass with the (updated) multiEntry:

public static void main(String[] args)
{
    Input[] inputs = {
            new Input("Google is a great search engine<as:Some stuff heres>", 2),
            new Input("Google is a great search engine"
                    + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
                    + "Something else at the end" +
                    "<as:Stuffs>" +
                    "<as:Yet More Stuffs>" +
                    "ending", 8),
            new Input("Google is a great search engine"
                            + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
                            + "Something else at the end" +
                            "<as:Stuffs>" +
                            "<as:Yet More Stuffs>", 7),
            new Input("No as here", 1),       
            new Input("Here is angle < input", 1),
            new Input("Angle < plus <as:Stuff in as:s><as:Other stuff in as:s>", 3),
            new Input("Angle < plus <as:Stuff in as:s><as:Other stuff in as:s>blah", 4),
            new Input("<as:To start with anglass>Some ending", 2),
    };


    List<String> res;
    for (Input inp : inputs) {
        res = multiEntry(inp.inp);
        if (res.size() != inp.cnt) {
            System.err.println("FAIL: " + res.size() 
            + " did not match exp of " + inp.cnt
            + " on " + inp.inp);
            System.err.println(res);
            continue;
        }
        System.out.println(res);
    }
}
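The helper class itself is not shown. A minimal sketch, inferred only from how the test code uses it (a two-argument constructor plus the inp and cnt fields), might be the following. The field types and comments are assumptions:

```java
// Assumed shape of the helper class; the original was not posted.
class Input {
    final String inp; // the string handed to multiEntry
    final int cnt;    // the expected number of collected entries

    Input(String inp, int cnt) {
        this.inp = inp;
        this.cnt = cnt;
    }
}
```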
