Find a string before a specific phrase

Time: 2018-07-05 17:42:26

Tags: python regex string extract

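Let's say the string representing the phrase is "Holy it is changing again and again".

I want to print the word "changing", which comes right before "again and again", but this word may be different each time. So I need to extract the word that precedes the phrase "again and again". The phrase "holy it is" should not be extracted.

How can I do this in Python?

I considered using a regex, as in "Python regex to match word before <", but I am not sure how to write the code correctly.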

2 answers:

Answer 0: (score: 1)

To match any word followed by "again and again", use this regular expression:

  • ([\w]*) again and again

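To include more characters, such as apostrophes, replace [\w] with [\w'], and add any other characters inside the square brackets in the same way (some characters need to be escaped).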

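  • Holy, it is changing again and again
  • We want to play again, and then play again and again
  • OMG again and again
  • Let's go again and again. We go again and again!
  • I roomba again and again (requires adding ')
  • Foo became A-B-C again and again, Bar and Baz. (requires adding an escaped hyphen)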
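The effect of widening the character class can be sketched as follows. This is a minimal illustration, not part of the original answer; the sample sentences and variable names are made up for the example.

```python
import re

# Three variants of the pattern; only the character class differs.
plain = re.compile(r"([\w]*) again and again")
with_apostrophe = re.compile(r"([\w']*) again and again")
with_hyphen = re.compile(r"([\w'\-]*) again and again")  # hyphen escaped inside the class

# Hypothetical sample sentences
sample_a = "I got ninja'd again and again"
sample_b = "Foo became A-B-C again and again"

plain_a = plain.findall(sample_a)            # only the part after the apostrophe
apos_a = with_apostrophe.findall(sample_a)   # the whole word, apostrophe included
plain_b = plain.findall(sample_b)            # only the part after the last hyphen
hyphen_b = with_hyphen.findall(sample_b)     # the whole hyphenated word

print(plain_a, apos_a, plain_b, hyphen_b)
```

Without the extra characters in the class, the capture stops at the apostrophe or hyphen, so only the tail of the word is returned.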

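To find all matches of the pattern, use

match = re.findall(r"([\w']*) again and again", phrase), where ([\w']*) is any word (a sequence of word characters, including apostrophes). It returns a list of all words that are followed by "again and again".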

import re

# Raw strings keep \w from being treated as a string escape
phrase = "Holy it is changing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['changing']

phrase = "Going again, going again and again, and finishing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['going', 'finishing']

phrase = "Defeated again and again! I got ninja'd again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['Defeated', "ninja'd"]
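One detail worth noting about the pattern above: the `*` quantifier allows the captured group to be empty, so a phrase preceded by punctuation and a space can yield an empty string in the result. A small sketch of the difference between `*` and `+`, using a made-up sentence:

```python
import re

# Hypothetical sentence: the first "again and again" follows a comma and a
# space, so there is no word directly in front of it.
text = "Well, again and again it changed again and again"

# '*' permits a zero-length capture; '+' requires at least one character.
star = re.findall(r"([\w']*) again and again", text)
plus = re.findall(r"([\w']+) again and again", text)

print(star)  # ['', 'changed']
print(plus)  # ['changed']
```

If empty strings are not wanted in the result, `+` is probably the safer choice.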

Answer 1: (score: 0)

import re

text = '''

Holy it is changing again and again
Holy it is not changing again and again
Holy it has changed again and again
Holy it has changed once
Holy it used to change again and again
'''

# Capture the word immediately before " again and again"
prog = re.compile(r'(\w+) again and again')
for line in text.splitlines():
    # search() returns None for lines that do not contain the phrase
    x = prog.search(line)
    if x:
        print(x.group(1))

This outputs:

changing
changing
changed
change
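The same idea can be wrapped in a small helper that works for any target phrase. This is only a sketch: the function name `word_before` and its parameters are hypothetical, and `re.escape` is used so that a phrase containing regex metacharacters is matched literally.

```python
import re

def word_before(line, phrase):
    """Return the word immediately before `phrase` in `line`, or None.

    `word_before` is an illustrative helper, not part of the answers above.
    """
    # re.escape makes the phrase match literally, even if it contains
    # characters such as '(' or '.'
    pattern = r"(\w+) " + re.escape(phrase)
    found = re.search(pattern, line)
    return found.group(1) if found else None

print(word_before("Holy it is changing again and again", "again and again"))
print(word_before("Holy it has changed once", "again and again"))
```

Like `search()` in the loop above, this returns only the first occurrence in the line; `re.findall` would be the variant to use when every occurrence is needed.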