Question

The question is worded a bit oddly, but I don't know how else to phrase it.

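I'm using WordNet to pull some definitions, and I need a regex to extract the part of speech and the definition from the output. For example, if I look up the word study: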

Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)

I would like to get this:

[('verb', 'to read a book with the intent of learning'), ('verb', 'second definition of study'), ('noun', 'the object of ones study'), ('noun', 'second definition of study')]

I have two regexes that match the pieces I want, but I can't figure out how to step through the data to build the structure I'm after. Any ideas?

Edit:

Adding the regexes:

stripped_defs = re.findall(r'^\s*\d+\..*\(([^)"]+)', definitions, re.M)
pos = re.findall(r'Overview of (\w+)', definitions)

Answer 1

My approach (text is the input text):

First, split on Overview of ...:

>>> re.split(r'Overview of (\w+) study', text)[1:]
['verb', 
'\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n',
'noun',
'\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']

>>> l = re.split(r'Overview of (\w+) study', text)[1:]
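
The split above relies on a documented behaviour of re.split: when the pattern contains a capturing group, the captured text is kept in the result list instead of being thrown away. A minimal sketch of the difference, using a shortened, made-up sample string (sample is an assumption for illustration, not the real WordNet output):

```python
import re

# Shortened stand-in for the WordNet output (hypothetical sample).
sample = "Overview of verb study\n\n1. a (x)\n\nOverview of noun study\n\n1. b (y)"

# No capturing group: the matched separators are discarded.
without_group = re.split(r'Overview of \w+ study', sample)

# With a capturing group: each captured part of speech stays in the list,
# alternating with the text that follows it.
with_group = re.split(r'Overview of (\w+) study', sample)

print(without_group)
print(with_group)
```

The leading empty string (the text before the first match) is why the answer slices the result with [1:].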

Then group this list into [part of speech, text] pairs:

>>> [l[i:i+2] for i in range(0, len(l), 2)]
[['verb', 
  '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n'], 
 ['noun', 
  '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']]

>>> l = [l[i:i+2] for i in range(0, len(l), 2)]

Then we can do:

>>> [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]
[[('verb', 'to read a book with the intent of learning'),
  ('verb', 'second definition of study')],

 [('noun', 'the object of ones study'),
  ('noun', 'second definition of study')]]

To get the desired flat output:

final_list = []
for i in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(i)

print(final_list)

This gives:

[('verb', 'to read a book with the intent of learning'),
 ('verb', 'second definition of study'),

 ('noun', 'the object of ones study'),
 ('noun', 'second definition of study')]
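
As a possible variation, the flattening can also be done in a single comprehension with two for clauses, which avoids the intermediate nested list. This is only a sketch: pairs is a hypothetical stand-in for the paired list l built earlier, and flat is an illustrative name.

```python
import re

# Hypothetical stand-in for the paired list `l` from the steps above.
pairs = [
    ['verb', '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n'
             '2. blah blah blah (second definition of study)\n\n'],
    ['noun', '\n\n1. blah blah blah (the object of ones study)\n'
             '2. yadda yadda yadda (second definition of study)'],
]

# The outer clause walks the (part of speech, text) pairs; the inner clause
# walks every parenthesised definition found in that text.
flat = [(pos, definition)
        for pos, body in pairs
        for definition in re.findall(r'\((.+?)\)', body)]

print(flat)
```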

The full code:

import re

l = re.split(r'Overview of (\w+) study', text)[1:]
l = [l[i:i+2] for i in range(0, len(l), 2)]

# or just `final_list = l` if it doesn't matter
final_list = []

for i in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(i)
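
For comparison, here is a rough sketch of a different approach that reuses the two regexes from the question: walk the output line by line, remember the most recent part of speech, and attach it to each definition line that follows. This is not the method used above, just one way it might be done; TEXT, HEADER, DEFINITION and parse_definitions are names made up for this example.

```python
import re

# Sample input copied from the question (assumed to be the full WordNet output).
TEXT = """Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)
"""

# The two patterns from the question, compiled once.
HEADER = re.compile(r'Overview of (\w+)')
DEFINITION = re.compile(r'^\s*\d+\..*\(([^)"]+)')


def parse_definitions(text):
    """Return (part_of_speech, definition) tuples in order of appearance."""
    results = []
    current_pos = None  # part of speech of the section being read
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current_pos = header.group(1)
            continue
        definition = DEFINITION.match(line)
        # Ignore numbered lines that appear before any "Overview of" header.
        if definition and current_pos is not None:
            results.append((current_pos, definition.group(1)))
    return results


print(parse_definitions(TEXT))
```

Unlike the split-based version, this does not hard-code the looked-up word (study) in the header pattern, at the cost of a slightly longer loop.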
