Question

I'm trying to use Python to find the index of the word 'the' in the following text:

sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

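If I call sent3.index('the') I get 1, which is the index of the first occurrence of the word. What I'm not sure about is how to find the indices of the other occurrences of 'the'. Does anyone know how I could do this?

Thanks!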

Answer 1

[i for i, item in enumerate(sent3) if item == wanted_item]

Demo:

>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> [i for i, item in enumerate(sent3) if item == 'the']
[1, 5, 8]

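enumerate simply yields (index, value) tuples from an iterable, pairing each value with its position. We can use it to check whether a value is the one we want and, if so, pull out its index.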
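To make the enumerate step concrete, here is a minimal sketch that prints the first few pairs it yields and wraps the comprehension in a small function. The name all_indices is made up for illustration and does not come from Python or NLTK.

```python
sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']

# enumerate yields (index, value) pairs lazily; list() materializes them
pairs = list(enumerate(sent3))
print(pairs[:3])  # [(0, 'In'), (1, 'the'), (2, 'beginning')]


def all_indices(seq, wanted_item):
    """Hypothetical helper: every index at which wanted_item occurs in seq."""
    return [i for i, item in enumerate(seq) if item == wanted_item]


print(all_indices(sent3, 'the'))  # [1, 5, 8]
```

Unlike sent3.index('the'), which raises ValueError when the word is absent, this returns an empty list if there is no match.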

Answer 2

>>> from collections import defaultdict
>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> idx = defaultdict(list)
>>> for i,j in enumerate(sent3):
...     idx[j].append(i)
... 
>>> idx['the']
[1, 5, 8]
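This approach pays off when you need the positions of many different words, since the list is scanned only once. Below is a rough sketch of the same idea as a reusable function; build_index is a hypothetical name, and converting to a plain dict at the end is my own choice so that looking up a missing word does not silently add an empty entry.

```python
from collections import defaultdict


def build_index(tokens):
    """Hypothetical helper: map each token to the list of its positions."""
    idx = defaultdict(list)
    for i, token in enumerate(tokens):
        idx[token].append(i)
    # Return a plain dict so lookups of absent words do not create new keys
    return dict(idx)


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']
idx = build_index(sent3)

print(idx['the'])          # [1, 5, 8]
print(idx.get('sea', []))  # [] because 'sea' never occurs
```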

NLTK: index of a word with multiple occurrences
