Question

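If I have this line:

"He is a very good man. He has a [good wife]."

I want to extract the line, plus [good wife] and the index of the first word of [good wife].

So the output would be:

"He is a very good man. He has a [good wife]", good wife: 12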

I tried this:

fi = codecs.open('file', 'r', 'utf-8')
regex = re.compile(r"\[(.*?)\]")
for line in fi.readlines():
    line2= line.split()
    mw = re.findall(regex, line2)
    print (line, mw, line2.index(mw[0]))

But it does not give what I want.

Can anyone help?

Answer 1

You can use re.search:

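>>> import re
>>> def find(s):
...   try:
...     # Text inside the first [...] pair, without the brackets
...     sub = re.search(r"\[(.*?)\]", s).group(1)
...     # After split() the first bracketed word still carries its '['
...     return sub, s.split().index('[' + sub.split()[0])
...   except AttributeError:
...     # re.search returned None: the line has no [...] group
...     return '[]'
... 
>>> print(find('He is a very good man. He has a [good wife].'))
('good wife', 9)
>>> print(find('He is a very good man. He has a good wife.'))
[]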

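Note that re.search captures 'good wife', without the brackets. To look up the index you need to prepend [ to the first word (good), because after splitting on whitespace the token in your string is [good, not good as a separate word.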
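If you would rather not rebuild the '[good' token by hand, a variant is to count the words that come before the opening bracket, using the position reported by the match object. This is only a sketch: the name find_bracketed and the None return value are my own choices, and it assumes the bracket is preceded by whitespace. It also returns the character offset of the first bracketed word, in case that is the kind of index you are after.

```python
import re

BRACKETED = re.compile(r"\[(.*?)\]")

def find_bracketed(line):
    """Return (text, word_index, char_offset) for the first [...] group, or None."""
    match = BRACKETED.search(line)
    if match is None:
        return None
    # Count the whitespace-separated tokens that come before the opening bracket.
    word_index = len(line[:match.start()].split())
    # match.start(1) is the character offset of the first letter inside the brackets.
    return match.group(1), word_index, match.start(1)

print(find_bracketed("He is a very good man. He has a [good wife]."))
# ('good wife', 9, 33)
```

In your loop you would call find_bracketed(line) on each line of the file directly, without calling line.split() first, since re.findall and re.search expect a string rather than a list.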
