Regex to find the shortest string with a start word and an end word

Date: 2019-06-02 13:01:26

Tags: regex

I want to write a regular expression that finds an occurrence of a string that begins with a specified start substring and ends with another specified end substring, but has the smallest possible total length. For example, if my start string is bar and my end string is foo when searching the string barbazbarbazfoobazfoo, then I want it to return barbazfoo.

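I know how to do this when one end or the other is just a single character. For example, replacing the words above with characters, I can search with a[^a]*?b to find the string axb in the string axaxbxb. But since I am looking for words rather than characters, I cannot simply exclude a particular letter, because that letter may legitimately appear between the two words.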

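For context, I am trying to read logs on a server and want to find, for example, which users hit a particular error, but there is other information between where the username appears and where the exception information occurs. So I am not looking for a solution that relies on a particular letter (such as f) appearing only in foo in the example above.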


Another example: taken from the first paragraph of this regex tutorial about lookahead and lookbehind.

The text is:

Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions just like the start and end of line, and start and end of word anchors explained earlier in this tutorial. The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.

If my start word is lookaround and my end word is match, then I want to find the substring lookaround actually match. Note that the target words may appear multiple times, with an unknown number of words and characters in between that may share characters with the target words. In the example above, lookaround[^lookaround]*?match returns no match, because that syntax seems to exclude each of the letters l, o, ... individually. I want to know how to exclude a substring rather than individual letters.

1 answer:

Answer 0: (score: 1)

You have to use a tempered greedy token:

First (with word boundaries)

\blookaround\b(?:(?!\b(?:match|lookaround)\b).)*\bmatch\b

Matches lookaround actually matches characters, but then gives up the match

Second (without)

lookaround(?:(?!(?:match|lookaround)).)*match

Matches lookaround actually match
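
As a rough illustration, the two patterns above can be generalized to arbitrary start and end words. The sketch below is in Python, which is an assumption since the question does not name a regex engine. The helper name shortest_between and its whole_words flag are hypothetical and do not come from the answer.

```python
import re


def shortest_between(start: str, end: str, whole_words: bool = False) -> re.Pattern:
    """Build a tempered-greedy-token pattern for the given start and end words.

    Hypothetical helper, not part of the original answer. The resulting
    pattern matches `start`, then any run of characters at which neither
    `start` nor `end` begins, then `end`.
    """
    s, e = re.escape(start), re.escape(end)
    if whole_words:
        # Same idea as the first pattern in the answer: require word boundaries.
        s, e = rf"\b{s}\b", rf"\b{e}\b"
    return re.compile(rf"{s}(?:(?!{s}|{e}).)*{e}")


# One sentence from the tutorial paragraph quoted in the question.
sentence = (
    "The difference is that lookaround actually matches characters, "
    "but then gives up the match, returning only the result: match or no match."
)

with_boundaries = shortest_between("lookaround", "match", whole_words=True)
without_boundaries = shortest_between("lookaround", "match")

print(with_boundaries.search(sentence).group())
print(without_boundaries.search(sentence).group())

# The example from the top of the question.
print(shortest_between("bar", "foo").search("barbazbarbazfoobazfoo").group())

# The character-class attempt from the question excludes single letters,
# so it fails as soon as it reaches the "a" in " actually".
print(re.search(r"lookaround[^lookaround]*?match", sentence))
```

Because the negative lookahead is checked before every consumed character, the match can never run across a later occurrence of the start word, which is what keeps the result as short as possible. Note that . does not match newlines by default, so a log entry spanning several lines would need re.DOTALL or a different character class.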