Using regular expressions

Time: 2015-04-28 11:41:04

Tags: java regex hyperlink wiki wikitext

There are two different kinds of wikitext hyperlinks:

[[stack]]
[[heap (memory region)|heap]]

I want to remove the hyperlinks but keep the text:

stack
heap

Currently I run two passes, using two different regular expressions:

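import java.util.regex.Pattern;

public class LinkRemover
{
    // Matches [[target|text]] and captures the displayed text.
    private static final Pattern
    renamingLinks = Pattern.compile("\\[\\[[^\\]]+?\\|(.+?)\\]\\]");

    // Matches [[text]] and captures the text.
    private static final Pattern
    simpleLinks = Pattern.compile("\\[\\[(.+?)\\]\\]");

    public static String removeLinks(String input)
    {
        // First pass: renaming links. Second pass: whatever simple links remain.
        String temp = renamingLinks.matcher(input).replaceAll("$1");
        return simpleLinks.matcher(temp).replaceAll("$1");
    }
}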

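Is there a way to "fuse" the two regular expressions into one that achieves the same result?

If you want to check the correctness of a proposed solution, here is a simple test class: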

// Imports assume JUnit 4; adjust them if you use a different version.
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LinkRemoverTest
{
    @Test
    public void test()
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, and is usually harvested by [[Sheep shearing|shearing]].";
        String expected = "A sheep's wool is the most widely used animal fiber, and is usually harvested by shearing.";
        String output = LinkRemover.removeLinks(input);
        assertEquals(expected, output);
    }
}

1 answer:

Answer 0: (score: 2)

You can make the part up to the pipe optional:

\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]
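As a rough sketch of how this pattern could replace the two passes from the question, here is a single-pass variant. The class name `FusedLinkRemover` and the field name `links` are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.regex.Pattern;

// Hypothetical single-pass counterpart of the question's LinkRemover.
final class FusedLinkRemover
{
    // Optional "target|" prefix, then the displayed text in group 1.
    private static final Pattern
    links = Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    static String removeLinks(String input)
    {
        // One pass handles both [[text]] and [[target|text]].
        return links.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        System.out.println(removeLinks("[[stack]]"));                     // stack
        System.out.println(removeLinks("[[heap (memory region)|heap]]")); // heap
    }
}
```

Because the prefix group is optional, a simple link just skips it and the whole link text ends up in group 1.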

To make sure you always stay between the square brackets, use character classes.

fiddle (click the Java button)

Pattern details:
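To illustrate why the character classes matter, the sketch below compares the pattern above with a looser variant that uses `.` instead. The looser pattern is not from this answer; it is a hypothetical counterexample, and the class and method names are made up.

```java
import java.util.regex.Pattern;

final class CharacterClassDemo
{
    // The answer's pattern: the character classes cannot run past a ']'.
    static final Pattern bounded =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    // Hypothetical looser variant using '.', shown only for contrast.
    static final Pattern unbounded =
        Pattern.compile("\\[\\[(?:.*?\\|)?(.+?)\\]\\]");

    static String strip(Pattern pattern, String input)
    {
        return pattern.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        String input = "[[wool]] and [[Sheep shearing|shearing]]";
        System.out.println(strip(bounded, input));   // wool and shearing
        System.out.println(strip(unbounded, input)); // shearing
    }
}
```

With `.`, the optional prefix keeps expanding from the first `[[` until it finds a pipe, even if that pipe belongs to a later link, so everything in between is swallowed.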

\\[\\[         # literal opening square brackets
(?:            # open a non-capturing group
    [^\\]|]*   # zero or more characters that are not a ] or a |
    \\|        # literal |
)?             # make the group optional
([^\\]]+)      # capture everything up to the closing square bracket
\\]\\]         # literal closing square brackets
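If you would like to keep this annotated layout in the source code, one option is Java's `Pattern.COMMENTS` flag, which ignores whitespace in the pattern and treats `#` as the start of a comment running to the end of the line. The following is only a sketch; the class name `CommentedLinkPattern` and the constant `LINKS` are made up for illustration.

```java
import java.util.regex.Pattern;

final class CommentedLinkPattern
{
    // Same pattern as above, written one piece per line.
    // Each piece ends with "\n" so that its comment stops at the line break.
    static final Pattern LINKS = Pattern.compile(
        "\\[\\[         # literal opening square brackets\n" +
        "(?:            # open a non-capturing group\n" +
        "    [^\\]|]*   # zero or more characters that are not a ] or a |\n" +
        "    \\|        # literal |\n" +
        ")?             # make the group optional\n" +
        "([^\\]]+)      # capture everything up to the closing square bracket\n" +
        "\\]\\]         # literal closing square brackets",
        Pattern.COMMENTS);

    static String removeLinks(String input)
    {
        return LINKS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        System.out.println(removeLinks("by [[Sheep shearing|shearing]].")); // by shearing.
    }
}
```

Note that in this mode a literal space or `#` in the pattern would have to be escaped; this pattern contains neither.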