Question

Say I have this string:

"Input:Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux +-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep | +-- Lagos NNP pobj +-- ? . punct"

I would like to get a list like this:

['book VB ROOT', 'Can MD aux',..., '? . punct']

using a regular expression.

I tried:

result = re.findall('\||\+-- (.*?)\+--|\| ', result, re.DOTALL)

Any help would be appreciated.

Answer 1

Without a regular expression, using only built-in functions and methods:

>>> list(filter(bool, map(str.strip, s.replace('+--', '|').split('Parse:')[1].split('|'))))
['book VB ROOT', 'Can MD aux', 'we PRP nsubj', 'hotel NN dobj', 'an DT det', 'in IN prep', 'Lagos NNP pobj', '? . punct']
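The one-liner above packs several steps together. Spelled out, it might look like the sketch below. The function name `split_parse` is made up for illustration, and the code assumes the string always contains `Parse:` (otherwise the `[1]` lookup raises an `IndexError`).

```python
def split_parse(s):
    # Everything after the 'Parse:' marker is the dependency tree.
    tree = s.split('Parse:', 1)[1]
    # Turn every '+--' branch marker into '|' so there is a single delimiter.
    pieces = tree.replace('+--', '|').split('|')
    # Strip whitespace and drop the empty pieces left between adjacent delimiters.
    return [p.strip() for p in pieces if p.strip()]


s = ('Can we book an hotel in Lagos ? Parse: book VB ROOT  +-- Can MD aux  '
     '+-- we PRP nsubj  +-- hotel NN dobj  |   +-- an DT det  |   +-- in IN prep  '
     '|       +-- Lagos NNP pobj  +-- ? . punct')
print(split_parse(s))
```

The list comprehension plays the role of `filter(bool, map(str.strip, ...))` and returns a real list on both Python 2 and Python 3.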

Answer 2

I would use re.split:

>>> import re
>>> s = 'Can we book an hotel in Lagos ? Parse: book VB ROOT  +-- Can MD aux  +-- we PRP nsubj  +-- hotel NN dobj  |   +-- an DT det  |   +-- in IN prep  |       +-- Lagos NNP pobj  +-- ? . punct'
>>> re.split(r'\s*\|?\s*\+\s*--\s*', s.split('Parse:')[1].strip())
['book VB ROOT', 'Can MD aux', 'we PRP nsubj', 'hotel NN dobj', 'an DT det', 'in IN prep', 'Lagos NNP pobj', '? . punct']
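Since the question asked about `re.findall` specifically, here is a rough sketch of how that could work as well. The helper name `find_nodes` is hypothetical, and the pattern assumes that `Parse:` is present and that every node after the first is introduced by `+--`, as in the sample string.

```python
import re

# A node starts at the beginning of the tree or right after '+--',
# and runs (lazily) until the next '|', the next '+--', or the end.
NODE = re.compile(r'(?:^|\+--)\s*(.*?)\s*(?=\||\+--|$)')


def find_nodes(s):
    # Keep only the text after 'Parse:' and trim the surrounding whitespace.
    tree = s.partition('Parse:')[2].strip()
    # With a single capture group, findall returns just the captured node text.
    return NODE.findall(tree)


s = ('Can we book an hotel in Lagos ? Parse: book VB ROOT  +-- Can MD aux  '
     '+-- we PRP nsubj  +-- hotel NN dobj  |   +-- an DT det  |   +-- in IN prep  '
     '|       +-- Lagos NNP pobj  +-- ? . punct')
print(find_nodes(s))
```

The lookahead `(?=...)` is what keeps each match from consuming the following `+--`, so no node gets skipped.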

Answer 3

Here is a version that uses a regular expression but doesn't need to loop over all the parts twice:

import re

def extract(line):
    # Keep only the text after ' Parse: '.
    _, _, parts = line.strip().partition(' Parse: ')
    # Split on ' +-- ', optionally preceded by ' |'.
    return re.split(r'(?: \|)? \+-- ', parts)

line = "Input:Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux +-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep | +-- Lagos NNP pobj +-- ? . punct "
print(extract(line))
# ['book VB ROOT', 'Can MD aux', 'we PRP nsubj', 'hotel NN dobj', 'an DT det', 'in IN prep', 'Lagos NNP pobj', '? . punct']
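Note that the pattern in `extract` expects exactly one space around each separator. If the parser's original indentation is preserved (as in the string used in Answer 2), a whitespace-tolerant variant might look like this sketch. The names `SEPARATOR` and `extract_nodes` are invented here, and allowing several `|` in a row is an assumption about how deeper trees would be printed.

```python
import re

# Optional whitespace, any number of '|' guides, then the '+--' branch marker.
SEPARATOR = re.compile(r'\s*(?:\|\s*)*\+--\s*')


def extract_nodes(line):
    # partition() returns an empty marker when 'Parse:' is absent.
    _, marker, parts = line.partition('Parse:')
    if not marker:
        return []
    return SEPARATOR.split(parts.strip())


single = ("Input:Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux "
          "+-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep "
          "| +-- Lagos NNP pobj +-- ? . punct ")
print(extract_nodes(single))
```

Like `extract`, this still makes a single pass with `re.split`; it just no longer depends on the exact spacing.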

Extracting a string within a string using regex in Python

3 answers: