Question

I want to write a regular expression that removes the brackets around [cent]:

String input1 = "this is a [cent] and [cent] string";
String output1 = "this is a cent and cent string";

But if the brackets are nested, as in:

String input2 = "this is a [cent[cent] and [cent]cent] string";
String output2 = "this is a cent[cent and cent]cent string";

I can only use replaceAll on the string, so how do I build the pattern in the code below? And what should the replacement string be?

Pattern rulerPattern1 = Pattern.compile("", Pattern.MULTILINE);
System.out.println(rulerPattern1.matcher(input1).replaceAll(""));

Update: the nested brackets are well-formed and go at most two levels deep, as in case 2.

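Edit: if the string is "[<centd>[</centd>]purposes[<centd>]</centd>]", then the output should be <centd>[</centd> purposes <centd>]</centd>. Basically, if a bracket sits between a centd opening tag and its closing tag, leave it there; otherwise remove it.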

Answer 1

Description

This regex removes a bracket only when one side of it is whitespace and the other side is not.

Regex: (?<=\s)[\[\]](?=\S)|(?<=\S)[\[\]](?=\s)

Replace with: an empty string
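A minimal sketch of applying this pattern with String.replaceAll, the only call the question allows. The class name SpaceAwareBrackets and method name strip are hypothetical. Note that a bracket at the very start or end of the string is never removed, because the lookbehind or lookahead has no neighbouring character to test; that is why the outer brackets survive in Sample 3.

```java
public class SpaceAwareBrackets {
    // A '[' or ']' with whitespace before it and a non-whitespace char after it,
    // or a non-whitespace char before it and whitespace after it.
    static final String REGEX = "(?<=\\s)[\\[\\]](?=\\S)|(?<=\\S)[\\[\\]](?=\\s)";

    // Hypothetical helper: removes every bracket the pattern matches.
    static String strip(String input) {
        return input.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("this is a [cent] and [cent] string"));
        // prints: this is a cent and cent string
        System.out.println(strip("this is a [cent[cent] and [cent]cent] string"));
        // prints: this is a cent[cent and cent]cent string
    }
}
```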


Summary

Sample 1
- Input: this is a [cent[cent] and [cent]cent] string
- Output: this is a cent[cent and cent]cent string
Sample 2
- Input: this is a [cent[cent] and [cent]cent] string
- Output: this is a cent[cent and cent]cent string
Sample 3
- Input: [<cent>[</cent>] and [<cent>]Chemotherapy services.</cent>]
- Output: [<cent>[</cent> and <cent>]Chemotherapy services.</cent>]

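To handle the edit to the question, this expression finds:

- [<centd>[</centd>] or [<centd>]</centd>], and replaces it with <centd>[</centd> or <centd>]</centd> respectively
- [<centd>] or [</centd>], removing only the outer square brackets

All other square brackets are kept.

Regex: \[(<centd>[\[\]]<\/centd>)\]|\[(<\/?centd>)\]

Replace with: $1$2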
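A sketch of the same pattern in Java via String.replaceAll; the class name CentdBrackets and method name unwrap are hypothetical. Each match fills only one of the two groups, and Java appends nothing for a group that did not participate, so the combined replacement "$1$2" works. The \/ escapes in the regex above are harmless in Java but not needed, so they are dropped here.

```java
public class CentdBrackets {
    // Alternative 1: [<centd>[</centd>] or [<centd>]</centd>]  -> group 1 (inner text)
    // Alternative 2: [<centd>] or [</centd>]                    -> group 2 (the bare tag)
    static final String REGEX =
        "\\[(<centd>[\\[\\]]</centd>)\\]|\\[(</?centd>)\\]";

    // Hypothetical helper: strips the outer brackets around centd tags only.
    static String unwrap(String input) {
        return input.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(unwrap("[<centd>[</centd>]purposes[<centd>]</centd>]"));
        // prints: <centd>[</centd>purposes<centd>]</centd>
    }
}
```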


Sample 4
- Input: [<centd>[</centd>]purposes[<centd>]</centd>]
- Output: <centd>[</centd>purposes<centd>]</centd>

Answer 2

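If it really is only about finding the brackets around "cent", you can use the following (with lookbehind and lookahead):

Edit: to leave some brackets in place, as in the expected output, this is now a combination of positive and negative lookbehinds and lookaheads. In other words, regex is probably not the right solution in general, but it does work for the text provided.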

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CentBrackets {
    public static void main(String[] args) {
        // surrounding
        String test1 = "this is a [cent] and [cent] string";
        // pseudo-nested
        String test2 = "this is a [cent[cent] and [cent]cent] string";
        // nested
        String test3 = "this is a [cent[cent]] and [cent]cent]] string";
        // Group 1: one or more '[' not preceded by "cent" and followed by "cent".
        // Group 2: one or more ']' preceded by "cent" and not followed by "cent".
        Pattern pattern = Pattern.compile("((?<!cent)\\[+(?=cent))|((?<=cent)\\]+(?!cent))");
        for (String test : new String[] {test1, test2, test3}) {
            Matcher matcher = pattern.matcher(test);
            // replaceAll resets the matcher itself, so no find() call is needed first
            System.out.println(matcher.replaceAll(""));
        }
    }
}

Output:

this is a cent and cent string
this is a cent[cent and cent]cent string
this is a cent[cent and cent]cent string

Answer 3

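In the general case, regular expressions are not suited to this. Nested structures form a recursive grammar, not a regular one. (This is why you don't parse HTML with regular expressions, BTW.)

If the brackets are nested only to a bounded depth, you can write a regex for it. But you need to state the nesting depth first, and the regex won't be pretty.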
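To make the bounded-depth point concrete, here is a sketch that builds such a regex for a chosen maximum depth by substituting the depth-(k-1) pattern into the depth-k one. The class BoundedNesting and its methods contents and bracketed are hypothetical names for illustration. Each extra level wraps the whole previous pattern again, which is why the result quickly stops being pretty.

```java
import java.util.regex.Pattern;

public class BoundedNesting {
    // Regex for the contents of brackets nested at most `depth` levels deep.
    // depth 0: text with no brackets at all.
    // depth k: characters that are not brackets, or a [...] pair holding depth k-1 contents.
    static String contents(int depth) {
        String inner = "[^\\[\\]]*";
        for (int k = 1; k <= depth; k++) {
            inner = "(?:[^\\[\\]]|\\[" + inner + "\\])*";
        }
        return inner;
    }

    // One outer bracket pair whose total nesting is at most `depth` (depth >= 1).
    static Pattern bracketed(int depth) {
        return Pattern.compile("\\[" + contents(depth - 1) + "\\]");
    }

    public static void main(String[] args) {
        Pattern two = bracketed(2);
        System.out.println(two.pattern());
        System.out.println(two.matcher("[cent[cent]]").matches());          // true
        System.out.println(bracketed(1).matcher("[cent[cent]]").matches()); // false
    }
}
```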

Answer 4

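Assumptions

From the question, I assume that brackets are nested at most 2 levels deep. I also assume that the brackets are balanced.

I further assume that you do not allow [ or ] to be escaped.

I also assume that when brackets are nested, only the first opening [ and the last closing ] of the inner brackets are kept. Everything else, i.e. the top-level brackets and the remaining inner brackets, is removed.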

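For example:

only[single] [level] outside[text more [text] some [text]moreeven[more]text[bracketed]] still outside

becomes, after replacement:

onlysingle level outsidetext more [text some textmoreevenmoretextbracketed] still outside

No other assumptions are made beyond the above.

If you can make assumptions about the spacing around the brackets, you can use the simpler solution by Denomales. Otherwise, my solution works without such assumptions.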

Solution

private static String replaceBracket(String input) {
    // Search for singly and doubly bracketed text
    Pattern p = Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
    Matcher matcher = p.matcher(input);

    StringBuffer output = new StringBuffer(input.length());

    while (matcher.find()) {
        // Take the text inside the outermost bracket
        String innerText = matcher.group(1);
        int startIndex = innerText.indexOf("[");
        int endIndex;

        String replacement;

        if (startIndex != -1) {
            // 2 levels of nesting
            endIndex = innerText.lastIndexOf("]");

            // Remove all [] except for first [ and last ]
            replacement = 
                // Text before and including first [
                innerText.substring(0, startIndex + 1) + 
                // Text in between, stripped of all the brackets []
                innerText.substring(startIndex + 1, endIndex).replaceAll("[\\[\\]]", "") +
                // Text after and including last ]
                innerText.substring(endIndex);
        } else {
            // No nesting
            replacement = innerText;
        }

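        // quoteReplacement stops '$' or '\' in the text from being read as group references
        matcher.appendReplacement(output, Matcher.quoteReplacement(replacement));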
    }

    matcher.appendTail(output);

    return output.toString();
}

Explanation

The only part worth explaining is the regex. For the rest, see the documentation of the Matcher class.
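For readers less familiar with that part of the Matcher API: the solution relies on the find / appendReplacement / appendTail loop, which lets each match be replaced by text computed in Java rather than by a fixed template. A minimal sketch follows; the class name AppendReplacementDemo and the single-level unwrap rule are hypothetical, chosen only to keep it short. It also shows why Matcher.quoteReplacement matters: without it, the "$5" below would be read as a reference to a nonexistent group 5 and throw.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    // Replaces each single-level [text] with its contents, one match at a time.
    static String unwrapSingleLevel(String input) {
        Matcher m = Pattern.compile("\\[([^\\[\\]]*)\\]").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1); // computed per match
            // Copies the text before the match, then the literal replacement
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unwrapSingleLevel("this is a [cent] and [$5] string"));
        // prints: this is a cent and $5 string
    }
}
```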

"\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]"

In raw form (as it appears when the string is printed):

\[((?:[^\[\]]++|\[[^\[\]]*+\])*+)\]

Let's break it down (whitespace is insignificant):

\[                    # Outermost opening bracket
(                     # Capturing group 1
  (?:
    [^\[\]]++         # Text that doesn't contain []
    |                 # OR
    \[[^\[\]]*+\]     # A nested bracket containing text without []
  )*+
)                     # End of capturing group 1
\]                    # Outermost closing bracket
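Java can compile this commented breakdown directly with the Pattern.COMMENTS flag, in which whitespace is ignored and # starts a comment that runs to the end of the line. A sketch that embeds the breakdown with backslashes doubled for the Java string literal; the class name CommentedBracketRegex is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedBracketRegex {
    // The breakdown above, one regex line per string, joined with newlines
    // so that each '#' comment ends at its own line break.
    static final Pattern OUTER = Pattern.compile(String.join("\n",
        "\\[                    # Outermost opening bracket",
        "(                      # Capturing group 1",
        "  (?:",
        "    [^\\[\\]]++        # Text that doesn't contain []",
        "    |                  # OR",
        "    \\[[^\\[\\]]*+\\]  # A nested bracket containing text without []",
        "  )*+",
        ")                      # End of capturing group 1",
        "\\]                    # Outermost closing bracket"),
        Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = OUTER.matcher("this is a [cent[cent] and [cent]cent] string");
        while (m.find()) {
            System.out.println(m.group(1)); // prints: cent[cent] and [cent]cent
        }
    }
}
```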

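I used the possessive quantifiers *+ and ++ to prevent the regex engine from backtracking. The version with normal greedy quantifiers, \[((?:[^\[\]]+|\[[^\[\]]*\])*)\], still works, but it is slightly less efficient and may cause a StackOverflowError on sufficiently large inputs.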
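A small sketch that runs both versions side by side on the question's nested sample and on an unbalanced string, to show that they find the same groups. The class PossessiveVsGreedy and its firstGroup method are hypothetical names; the sketch does not try to reproduce the StackOverflowError, since that depends on input size and the JVM's stack settings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveVsGreedy {
    // Possessive: never gives back characters once matched.
    static final Pattern POSSESSIVE =
        Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
    // Greedy: same structure, but may backtrack when a later part fails.
    static final Pattern GREEDY =
        Pattern.compile("\\[((?:[^\\[\\]]+|\\[[^\\[\\]]*\\])*)\\]");

    // Group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "this is a [cent[cent] and [cent]cent] string";
        System.out.println(firstGroup(POSSESSIVE, s)); // cent[cent] and [cent]cent
        System.out.println(firstGroup(GREEDY, s));     // cent[cent] and [cent]cent
        System.out.println(firstGroup(GREEDY, "[cent[cent")); // null (unbalanced)
    }
}
```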

Answer 5

You can use a Java Matcher to convert the brackets. I made one for you below:

String input = "this is a [cent[cent] and [cent]cent] string";
Pattern p = Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
Matcher m = p.matcher(input);

Regular expression for converting brackets and nested brackets inside tags
