Iterate over a list of strings and return the items that contain a substring in Python

Time: 2018-07-15 14:13:36

Tags: python string list nltk

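I'm trying to iterate over a list of sentences and pull out only the items that contain a substring (the keyword). When I use return instead of yield in the function, I get a list of characters, whereas with yield I get full sentences. I know yield makes the function a generator, though, and what I want is a complete list of every sentence containing the word. Is .find() causing the problem, or is there a better way to extract items from a list of strings?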

import nltk
from nltk import *
import pandas as pd
f= open("filename.txt").read()
sent_list = sent_tokenize(f)

hunt = "youth" #keyword i'm searching for
def hunter(sent):
    for term in sent:
        if term.find(hunt) is not -1:
            yield term

complete_lst = [term for term in hunter(sent_list)]
df = pd.DataFrame({'key_term_sentences':complete_lst})

2 Answers:

Answer 0: (score: 1)

There are a couple of bugs in your code; one of them is not using split. Once that is fixed, everything works. Here is a working example:

In [31]: sent_list = ['this is first sentence for demo purposes', 
                      'this is second sentence containing youth and youthful', 
                      'this is 3rd sentence which is dummy one btw']

In [32]: hunt = 'youth'

# note that we need two `for` loops since the function takes list of sentences
In [33]: def hunter(sent_list):
    ...:     for sent in sent_list:
    ...:         for term in sent.split():
    ...:             if hunt in term:
    ...:                 yield term
    ...:                 

In [34]: list(hunter(sent_list))
Out[34]: ['youth', 'youthful']

Just to show that it works with term.find(hunt) as well:

In [35]: def hunter(sent_list):
    ...:     for sent in sent_list:
    ...:         for term in sent.split():
    ...:             if term.find(hunt) != -1:
    ...:                 yield term
    ...:                 

In [36]: list(hunter(sent_list))
Out[36]: ['youth', 'youthful']
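
The examples above yield matching words, while the question asks for whole sentences. The following is a minimal sketch of that variant, not part of the original answer. It also shows a likely reason `return` produced single characters: the function exits on the first hit and hands back one string, which the list comprehension then iterates character by character. The names `first_match` and `all_matches` are hypothetical, and the sample sentences are reused from the example above.

```python
sent_list = [
    'this is first sentence for demo purposes',
    'this is second sentence containing youth and youthful',
    'this is 3rd sentence which is dummy one btw',
]
hunt = 'youth'


def first_match(sentences):
    # `return` leaves the function on the first hit, so the caller
    # receives a single string rather than a collection of sentences.
    for sentence in sentences:
        if hunt in sentence:
            return sentence


def all_matches(sentences):
    # Build the complete list of sentences that contain the keyword.
    return [sentence for sentence in sentences if hunt in sentence]


# Iterating over a string walks it one character at a time.
chars = [c for c in first_match(sent_list)]
complete_lst = all_matches(sent_list)

print(chars[:4])
print(complete_lst)
```

`complete_lst` is a plain list, so it can be passed straight to `pd.DataFrame({'key_term_sentences': complete_lst})` as in the question.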

Answer 1: (score: 0)

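A simpler approach is to .split the text into a list of individual sentences. From there you can loop over each sentence, split it into words, and check whether the keyword is one of them.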

hunt = "youth"
def hunter(sent):
    sentences = sent.split('.')  # naive sentence split on periods
    for each in sentences:
        check = each.split(' ')  # split the sentence into words
        for word in check:
            if word == hunt:  # exact whole-word comparison
                print(each)
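
One caveat with comparing whitespace-split words: a token such as "youth," (with trailing punctuation) does not equal "youth". A possible alternative, sketched here under the assumption that a whole-word, case-insensitive match is wanted, uses a regular expression with word boundaries and returns the sentences instead of printing them. The function name `sentences_with_word` and the sample sentences are made up for illustration.

```python
import re


def sentences_with_word(sentences, word):
    # \b marks a word boundary, so "youth" matches "youth," but not "youthful".
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
    return [sentence for sentence in sentences if pattern.search(sentence)]


sample = [
    'The youth, as always, were restless.',
    'A youthful crowd gathered.',
    'Nothing relevant here.',
]

result = sentences_with_word(sample, 'youth')
print(result)
```

If substring matches such as "youthful" should count too, a plain `word in sentence` test is enough and the regular expression can be dropped.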