Question

My code looks like this:

import re

string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
pattern = r'title=(.*?) color=red'
print(re.compile(pattern).search(string).group(0))

I get

"title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"

But I want to find the content of every "title" that is immediately followed by "color=red".

Answer 1

Do you want the title just before color=red? Then use

.*title=(.*?) color=red

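Demo: https://regex101.com/r/sR4kN2/1

The leading .* is greedy: it consumes as much of the string as it can before the final title that precedes color=red, so only that title is captured.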
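
As a quick check, here is a minimal sketch that runs this pattern against the string from the question. The variable names are only for illustration. Because of the greedy prefix, a single search yields just the title closest to the last color=red.

```python
import re

s = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
     "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

# The greedy .* first runs to the end of the string, then backtracks to the
# last "title=" that is still followed by " color=red".
m = re.search(r'.*title=(.*?) color=red', s)
last_red_title = m.group(1) if m else None
print(last_red_title)  # xxxyyy
```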

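Alternatively, if you know that certain characters never appear in a title, you can simplify things by excluding them with a negated character class. For example, if you know that = never appears: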

title=([^=]*?) color=red

Or, if you know that whitespace never appears:

title=([^\s]*?) color=red
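
To see how the two character-class variants behave, the sketch below applies both with re.findall to the string from the question (the variable names are made up for this example). Since the capture can no longer cross an = or a space, each match is limited to a title that is directly followed by color=red.

```python
import re

s = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
     "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

# The negated classes stop the capture at the first "=" or whitespace,
# so a match cannot run across a neighbouring key=value pair.
no_equals = re.findall(r'title=([^=]*?) color=red', s)
no_space = re.findall(r'title=([^\s]*?) color=red', s)
print(no_equals)  # ['whatIwaht', 'xxxyyy']
print(no_space)   # ['whatIwaht', 'xxxyyy']
```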

A third option is to use a bit of code to find all the red titles (assuming the input always alternates title, color):

import re

# (?: |$) ends each color value at the next space or at the end of the string.
for title, color in re.findall(r'title=(.*?) color=(.*?)(?: |$)', string):
    if color == 'red':
        print(title)
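
The same idea can be wrapped in a small function. This is only a sketch: titles_with_color is a hypothetical helper name, and the sample input is made up so that it strictly alternates title and color, which is the assumption this approach relies on.

```python
import re

def titles_with_color(text, wanted):
    """Return the titles whose paired color equals `wanted` (hypothetical helper)."""
    pairs = re.findall(r'title=(.*?) color=(.*?)(?: |$)', text)
    return [title for title, color in pairs if color == wanted]

# Made-up input that strictly alternates title, color.
sample = "title=abcd color=green title=efgh color=red title=ijkl color=red"
red_titles = titles_with_color(sample, "red")
print(red_titles)  # ['efgh', 'ijkl']
```

If the input does not alternate (as with the red=anything pair in the question's string), the lazy title group can stretch across several pairs, so one of the character-class patterns is the safer choice there.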

Answer 2

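If you want the last match of a sub-expression that occurs before some other expression, the solution is to put a greedy skipper in front of it. For example: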

>>> import re
>>> pattern = '.*title="([^"]*)".*color="#123"'
>>> text = 'title="123" color="#456" title="789" color="#123"'
>>> print(re.match(pattern, text).group(1))
789

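The first .* is greedy, so it skips as much as possible (and therefore skips the first title), then backs up only as far as the last title that still allows the required color to match.

As a simpler example, consider the pattern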

a(.*)b(.*)c

applied to

a1111b2222b3333c

It matches 1111b2222 in the first group and 3333 in the second.
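
The simpler example can be checked directly. The sketch below also adds a lazy variant of the first group for contrast; that variant is not part of the answer and is shown only to make the effect of greediness visible.

```python
import re

text = 'a1111b2222b3333c'

# Greedy first group: backtracks to the LAST "b" that still lets the rest match.
greedy = re.match(r'a(.*)b(.*)c', text).groups()
# Lazy first group (added for contrast): stops at the FIRST "b".
lazy = re.match(r'a(.*?)b(.*)c', text).groups()

print(greedy)  # ('1111b2222', '3333')
print(lazy)    # ('1111', '2222b3333')
```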

Answer 3

Why not skip the regular expression and use some splitting instead:

search_title = False  # True right after a color=red pair has been seen
found = None
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
parts = string.split()
for part in parts:
    key, value = part.split('=', 1)
    if search_title:
        # Only the part directly after color=red is considered.
        if key == 'title':
            found = value
        search_title = False
    if key == 'color' and value == 'red':
        search_title = True
print(found)

Result

xxxy

Regular expressions are great, but sometimes they cause headaches.
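
The loop above reports the title that comes directly after a color=red pair. If the goal is instead every title directly followed by color=red, which is one reading of the question and an assumption here, a split-based variant might look like this sketch (the names are illustrative):

```python
s = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
     "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

tokens = s.split()
red_titles = []
# Walk over adjacent token pairs and keep a title only when the very next
# token is exactly color=red.
for current, following in zip(tokens, tokens[1:]):
    key, _, value = current.partition('=')
    if key == 'title' and following == 'color=red':
        red_titles.append(value)
print(red_titles)  # ['whatIwaht', 'xxxyyy']
```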

Answer 4

Use the re module.

Try this:

>>> string = 'title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red'
>>> import re
>>> re.search('(.*title=?)(.*) color=red', string).group(2)
'whatIwaht'

>>> re.search('(.*title=?)(.*) color=yellow', string).group(2)
'xyxyx'
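
Since the two calls differ only in the color, they could be folded into one function. This is a rough sketch, not part of the answer: title_before_color is a hypothetical name, re.escape guards against special characters in the color, and the (?!\S) lookahead is an added assumption so that a color such as re does not match the start of red.

```python
import re

def title_before_color(text, color):
    """Hypothetical helper: the title closest to the last ' color=<color>' in text."""
    # (?!\S) requires the color value to end at whitespace or end of string.
    pattern = r'.*title=(.*) color=' + re.escape(color) + r'(?!\S)'
    m = re.search(pattern, text)
    return m.group(1) if m else None

s = 'title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red'
print(title_before_color(s, 'red'))     # whatIwaht
print(title_before_color(s, 'yellow'))  # xyxyx
print(title_before_color(s, 'purple'))  # None
```

Returning None when nothing matches avoids the AttributeError that calling .group(2) on a failed re.search would raise.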

How to get the value of a key in a string that is followed by another specific key=value pair

4 answers: