Question

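I'm trying to use a regular expression to match a string that begins with a <p> tag and contains some specific content. I then want to replace everything from that particular paragraph tag to the end of the page.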

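I've tried the expression <p.*?some content.*</html>, but it grabs the first <p> tag it sees and then runs all the way to the end. I want it to pick only the paragraph tag that precedes the content, while still allowing other content and tags between that paragraph tag and the content.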
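To make the problem concrete, here is a small Java sketch of the behavior described above. The sample HTML, the class name, and the method name are made up for illustration, and DOTALL is an assumption so that '.' also crosses line breaks. The lazy .*? only controls where the match ends its first stretch, not where it starts, so the match still begins at the earliest <p.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchDemo {
    // Returns the text matched by the question's pattern, or null if nothing matches.
    static String firstMatch(String html) {
        // DOTALL is an assumption here: it lets '.' match line breaks too.
        Pattern p = Pattern.compile("<p.*?some content.*</html>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two paragraphs.
        String html = "<html><p>intro</p><p>before some content after</p></html>";
        // The match starts at the first <p>, not at the one nearest the content.
        System.out.println(firstMatch(html));
    }
}
```

The regex engine tries start positions from left to right, so the first <p that can lead to a successful match wins, regardless of how lazy the quantifier after it is.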

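How can I use a regex to find some specific content, backtrack to the first paragraph tag that precedes it, and then select everything from there to the end?

If it helps, I'm using the "Search and Replace" feature of EditPad Pro (though this probably applies to anything that uses regular expressions).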

Answer 1

For simple input, where no other tags appear between the <p and the content, use the regex

<p[^<]*some content.*<\/html>

A safer option, which allows other tags in between but refuses to cross another <p tag, is the regex

<p(?:[^<]*|<(?!p\b))*some content.*<\/html>
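As a rough illustration, the second pattern could be tried in Java as sketched below. This is not the answerer's code: the class name, method name, and sample HTML are hypothetical, DOTALL is assumed so the trailing .* crosses line breaks, and the inner [^<]* is swapped for a single [^<]. That variant accepts the same text but avoids nesting one * inside another, which can backtrack heavily on large inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NearestParagraphDemo {
    // Between "<p" and the content, allow any non-'<' character, or a '<'
    // that does not begin another <p tag (checked by the negative lookahead).
    private static final Pattern NEAREST_P = Pattern.compile(
            "<p(?:[^<]|<(?!p\\b))*some content.*</html>", Pattern.DOTALL);

    // Returns the matched text, or null if nothing matches.
    static String firstMatch(String html) {
        Matcher m = NEAREST_P.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the content sits inside a <b> tag in the second paragraph.
        String html = "<html><p>intro</p><p>before <b>some content</b> after</p></html>";
        System.out.println(firstMatch(html));
    }
}
```

Starting from the first <p>, the group cannot step over the second <p, so that attempt fails and the engine moves on to the paragraph that actually contains the content.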

Answer 2

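First off, this is Java code, but I imagine it can easily be adapted to other regex engines and programming languages.

So, as I understand it, you want to handle the case where some part of the given input begins with a <p> tag that is immediately followed by some target content or phrase. You then want to replace everything after that initial tag with something else?

If that's correct, you could do something like this: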

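import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input; // holds your input text/html (assign before use)
String targetPhrase = "some specific content"; // some target content/phrase
String replacement; // holds the replacement value (assign before use)

// DOTALL lets '.' span line breaks; Pattern.quote treats the phrase as literal text.
Pattern p = Pattern.compile("(<p[^>]*>)" + Pattern.quote(targetPhrase) + ".*$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
// replaceFirst returns a new string; "$1" keeps the opening tag.
String output = m.replaceFirst("$1" + Matcher.quoteReplacement(replacement));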

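Of course, as mentioned in the comments above, you really don't want to use regular expressions on HTML.

Alternatively, if you know the tag is just a plain <p>, with no attributes or anything else, you could try using substring.

For example, if you're looking for "<p>some specific content", you could try the following: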

String input; // your input text/html
String replacement; // the replacement value(s)

int index = input.indexOf("<p>some specific content");
if (index > -1) {
    String output = input.substring(0, index);
    output += "<p>" + replacement;

    // now output holds your modified text/html
}
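The same substring idea can be extended to the question's original requirement (other content and tags between the <p> and the phrase) by locating the phrase first and then searching backwards for the tag. The sketch below assumes a plain <p> with no attributes, as in the snippet above; the class name, method name, and sample strings are hypothetical.

```java
class BacktrackToParagraph {
    // Replaces everything after the nearest "<p>" that precedes the phrase.
    // Returns the input unchanged if the phrase or a preceding "<p>" is missing.
    static String replaceFromNearestParagraph(String input, String phrase, String replacement) {
        int phraseIndex = input.indexOf(phrase);
        if (phraseIndex < 0) {
            return input;
        }
        // lastIndexOf searches backwards, starting at the phrase position.
        int tagIndex = input.lastIndexOf("<p>", phraseIndex);
        if (tagIndex < 0) {
            return input;
        }
        return input.substring(0, tagIndex) + "<p>" + replacement;
    }

    public static void main(String[] args) {
        String input = "<html><p>intro</p><p>before <b>some specific content</b></p></html>";
        System.out.println(replaceFromNearestParagraph(
                input, "some specific content", "new text</p></html>"));
    }
}
```

This avoids regular expressions entirely, at the cost of not handling <p> tags that carry attributes.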

Regex to find content, then backtrack to the initial HTML tag