Question

Updated

I want to find a string inside a large block of text:

 ..."img good img two_apple.txt"

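I want to extract two_apple.txt from the text, but the name can vary: one_apple, three_apple, and so on. When I tried using lookbehinds, the match started from the very first "img" in the text.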

Answer 1

You're misusing lookarounds. It looks like you don't even need one:

pattern = r'src="images/(.+?\.png)"'

That should work for you. As my comment hinted, parsing HTML/XML-style documents with regular expressions is not recommended, but you can do it this way.

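Edit - adapting to your edit:

Now that I understand your problem better, I see why you wanted to use a lookaround. However, since you are looking for a filename, you know the name contains no spaces, so you can make sure the capture group cannot contain a space: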

pattern = r'src="img (\w+?\.png)"'
                    ^ ensure there is a space HERE because of how your text is laid out
                      \w is equivalent to [a-zA-Z0-9_] (any letter, digit, or underscore)

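This stops the match from starting at the first 'img ' string that shows up and swallowing everything after it, and it ensures your capture group contains no spaces.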
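To see the difference concretely, here is a minimal Python sketch. It assumes the sample text from the question as plain text, so the `src="` prefix is dropped and the extension is `.txt`; the variable names are illustrative and not from the original post.

```python
import re

# Sample text from the question (assumed to be plain text, not HTML).
text = 'img good img two_apple.txt'

# A lazy dot still starts at the FIRST 'img ' and expands until it reaches '.txt'.
lazy = re.search(r'img (.+?\.txt)', text)

# \w cannot match a space, so the match can only start at the LAST 'img '.
strict = re.search(r'img (\w+\.txt)', text)

print(lazy.group(1))    # good img two_apple.txt
print(strict.group(1))  # two_apple.txt
```

Laziness only limits how far the match extends to the right. It does not move the starting point, which is why restricting the characters inside the group is what fixes the problem.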

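By using \w, I am assuming you only expect underscores and alphanumeric characters. To allow other characters, create your own character class with [any characters you want to capture in here].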
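As a rough illustration of a custom character class, the sketch below allows hyphens in addition to word characters. The filename `two-apple_v2.txt` is a made-up example, not something from the question.

```python
import re

# Hypothetical filename containing a hyphen, which \w alone does not match.
text = 'img good img two-apple_v2.txt'

# With plain \w the hyphen breaks the match, so nothing is found.
plain = re.search(r'img (\w+\.txt)', text)

# [\w-] accepts word characters plus '-', but still no spaces.
custom = re.search(r'img ([\w-]+\.txt)', text)

print(plain)            # None
print(custom.group(1))  # two-apple_v2.txt
```

Placing the hyphen last inside the brackets keeps it a literal character instead of a range operator.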

Answer 2

" ([^ ]+_apple\.txt)"

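It starts with a space and ends with _apple.txt. The middle part is anything except a space, which stops it from matching "good img two". Use parentheses to capture the part you care about.

Try it here: https://regex101.com/r/wO7lG3/2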
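A short Python sketch of this pattern, run against a few variants of the question's sample text (the list of samples is assumed for illustration):

```python
import re

# Leading space, then a run of non-space characters ending in '_apple.txt'.
pattern = re.compile(r' ([^ ]+_apple\.txt)')

samples = [
    'img good img one_apple.txt',
    'img good img two_apple.txt',
    'img good img three_apple.txt',
]

# search() returns the first match; group(1) is the captured filename.
found = [pattern.search(s).group(1) for s in samples]
print(found)  # ['one_apple.txt', 'two_apple.txt', 'three_apple.txt']
```

Because `[^ ]` cannot cross a space, the attempts that begin at the spaces before "good" and the second "img" fail, and the match is found only at the space before the filename.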

Python regex - using lookbehinds to find my specific text
