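I want to write a regular expression that finds an occurrence of a string that begins with a specified start substring and ends with another specified end substring, but has the minimum total length. For example, if my start string is `bar` and my end string is `foo`, then when searching the string `barbazbarbazfoobazfoo` I want it to return `barbazfoo`.

I know how to do this when one end or the other is just a single character. For example, replacing the words above with characters, I can use `a[^a]*?b` to find the string `axb` within the string `axaxbxb`. But because I am looking for words rather than characters, I can't simply exclude a particular letter, since that letter may legitimately appear in between.

For context, I'm reading logs on a server and want to find, for example, which users encountered a particular error, but there is other information between where the username appears and where the exception information occurs. So I am not looking for a solution that relies on the fact that, in the example above, `foo` contains only the letters `f` and `o`.

Another example: the first paragraph of this regex tutorial about lookahead and lookbehind reads:

Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions just like the start and end of line, and start and end of word anchors explained earlier in this tutorial. The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.

If my start word is `lookaround` and my end word is `match`, then I want to find the substring `lookaround actually match`. Note that the target words may occur several times, with an unknown number of words and characters in between that may share characters with the target words. In the example above, `lookaround[^lookaround]*?match` returns no match, because a character class excludes each of the letters `l`, `o`, ... individually. I want to know how to exclude a substring rather than individual letters.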
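Answer 0: (score: 1)
You have to use a tempered greedy token. The first pattern below adds word boundaries, so only the whole words `lookaround` and `match` count as delimiters; the second pattern also accepts them inside longer words: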
\blookaround\b(?:(?!\b(?:match|lookaround)\b).)*\bmatch\b
matches `lookaround actually matches characters, but then gives up the match`
lookaround(?:(?!(?:match|lookaround)).)*match
matches `lookaround actually match`
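
To try both patterns on the question's examples, here is a minimal Python sketch. The helper name `shortest_between` and its `whole_words` parameter are made up for illustration and are not part of the answer. The helper builds the same tempered greedy token from arbitrary start and end strings, passing them through `re.escape` so that any special characters are treated literally.

```python
import re

def shortest_between(start, end, text, whole_words=False):
    """Return all spans that begin with `start`, end with `end`,
    and contain no further occurrence of either delimiter in between."""
    s, e = re.escape(start), re.escape(end)
    if whole_words:
        # Only treat the delimiters as matches when they stand alone as words.
        s, e = rf"\b{s}\b", rf"\b{e}\b"
    # Tempered greedy token: consume one character at a time, but only at
    # positions where neither delimiter begins.
    pattern = rf"{s}(?:(?!{e}|{s}).)*{e}"
    return re.findall(pattern, text)

sentence = ("The difference is that lookaround actually matches characters, "
            "but then gives up the match, returning only the result: "
            "match or no match.")

print(shortest_between("bar", "foo", "barbazbarbazfoobazfoo"))
print(shortest_between("lookaround", "match", sentence))
print(shortest_between("lookaround", "match", sentence, whole_words=True))
```

Because `.` does not match a newline by default, this sketch only finds spans within a single line; passing `re.DOTALL` would be one way to let a span cross line breaks if a log entry covers several lines.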