I currently have a file containing a list like this:
example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']
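I want to find the index of the sentence in example that contains the word "woke". In this example, the answer should be f("woke") = 3, where f is a function.
I tried tokenizing each sentence first, so that I could find the index of the word: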
>>> from nltk.tokenize import word_tokenize
>>> example = ['Mary had a little lamb' ,
... 'Jack went up the hill' ,
... 'Jill followed suit' ,
... 'i woke up suddenly' ,
... 'it was a really bad dream...']
>>> tokenized_sents = [word_tokenize(i) for i in example]
>>> for i in tokenized_sents:
...     print(i)
...
['Mary', 'had', 'a', 'little', 'lamb']
['Jack', 'went', 'up', 'the', 'hill']
['Jill', 'followed', 'suit']
['i', 'woke', 'up', 'suddenly']
['it', 'was', 'a', 'really', 'bad', 'dream', '...']
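But I don't know how to get from there to the index of the word, or how to link it to the index of the sentence. Does anyone know how to do this?
Answer 0 (score: 1)
You can iterate over each string in the list, split it on whitespace, and check whether your search word is in the resulting list of words. Doing this in a list comprehension returns a list of the indices of the strings that satisfy the condition.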
def f(l, s):
    return [index for index, value in enumerate(l) if s in value.split()]
>>> f(example, 'woke')
[3]
>>> f(example, 'foobar')
[]
>>> f(example, 'a')
[0, 4]
If you prefer to use the nltk library:
from nltk.tokenize import word_tokenize

def f(l, s):
    return [index for index, value in enumerate(l) if s in word_tokenize(value)]
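The question also asks how to link the word's index to the sentence's index. A minimal sketch of one way to do that with plain whitespace splitting is shown below; the function name find_word_positions is hypothetical, and it assumes exact, case-sensitive matches on whitespace-separated tokens.

```python
def find_word_positions(sentences, word):
    """Return (sentence_index, word_index) pairs for every exact match of `word`.

    Assumption: tokens are whitespace-separated and matching is case-sensitive.
    """
    positions = []
    for sent_idx, sentence in enumerate(sentences):
        for word_idx, token in enumerate(sentence.split()):
            if token == word:
                positions.append((sent_idx, word_idx))
    return positions


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

print(find_word_positions(example, 'woke'))  # [(3, 1)]
print(find_word_positions(example, 'a'))     # [(0, 2), (4, 2)]
```

Each pair gives the sentence index first and the position of the word within that sentence second.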
Answer 1 (score: 0)
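def f(tokenized_sents, word):
    # Return the index of the first tokenized sentence containing the word.
    for index, sentence in enumerate(tokenized_sents):
        if word in sentence:
            return index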
To get the indices of all matching sentences:
[index for index, sentence in enumerate(tokenized_sents) if 'woke' in sentence]
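If only the first match is needed, the same generator expression can be passed to the built-in next() with a default, which makes the "not found" case explicit. This is a sketch, not part of the original answer: the name first_sentence_index is hypothetical, and the sentences are tokenized with str.split() here instead of nltk to keep the example self-contained.

```python
def first_sentence_index(tokenized_sents, word, default=None):
    """Return the index of the first token list containing `word`, else `default`."""
    return next(
        (index for index, sentence in enumerate(tokenized_sents) if word in sentence),
        default,
    )


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

# Assumption: whitespace tokenization stands in for word_tokenize.
tokenized_sents = [sentence.split() for sentence in example]

print(first_sentence_index(tokenized_sents, 'woke'))    # 3
print(first_sentence_index(tokenized_sents, 'foobar'))  # None
```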
Answer 2 (score: 0)
If the requirement is to return the first sentence in which the word occurs, you can use something like this:
def func(strs, word):
    for idx, s in enumerate(strs):
        if s.find(word) != -1:
            return idx
example = ['Mary had a little lamb' ,
'Jack went up the hill' ,
'Jill followed suit' ,
'i woke up suddenly' ,
'it was a really bad dream...']
func(example,"woke")
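Note that str.find() matches substrings, so searching for "it" with the function above would return 0, because "little" contains "it". If whole-word matching is wanted, one possible variant uses a regular expression with word boundaries. This is a sketch under that assumption; the name first_whole_word_index is hypothetical.

```python
import re


def first_whole_word_index(strs, word):
    """Return the index of the first string containing `word` as a whole word.

    Returns None when no string matches.
    """
    # \b anchors the match at word boundaries; re.escape guards special characters.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    for idx, s in enumerate(strs):
        if pattern.search(s):
            return idx
    return None


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

print(first_whole_word_index(example, 'woke'))  # 3
print(first_whole_word_index(example, 'it'))    # 4
```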