Changing the arrangement of sentences in a text file in Python

Time: 2018-10-08 14:18:23

Tags: python nlp

I'm new to Python and have a text file "in_file.txt" that contains sentences:

in_file = ['sentence one',
           'sentence two', 
           'sentence has the word bad one', 
           'sentence four', 
           'sentence five', 
           'sentence six', 
           'sentence seven', 
           'sentence has the word bad two', 
           'sentence nine']

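Some of these sentences contain the word "bad". For every line that contains the word "bad", I want to take the 5 sentences above it and join them into one paragraph, as follows (except near the beginning, where 5 sentences may not exist):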

out_file = ['sentence one sentence two',
            'sentence has the word bad sentence four sentence five sentence six sentence seven']

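Then save it in a file "out_file.txt". Thanks for your help, and please let me know if I didn't provide enough explanation. Note that not every sentence in the input file necessarily ends up in the output file. I am only interested in the sentences that sit above a sentence containing the word "bad" and within 5 sentences of it.

Just a starting point: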

with open("in_file.txt", "r") as lines:
    for line in lines:
        # maybe there is an index counter here!
        for word in line.split():
            if word == "bad":
                # then take the above 5 lines
                # add to the out_file
                # return out_file
                pass

2 Answers:

Answer 0: (score: 1)

IIUC, the code below should work for you:

with open("in_file.txt", "r") as f:
    # splitlines() drops the trailing newline that readlines() would keep
    l = f.read().splitlines()
# l = ['sentence one',
#     'sentence two',
#      'sentence has the word bad one',
#      'sentence four',
#      'sentence five',
#      'sentence six',
#      'sentence seven',
#      'sentence has the word bad two',
#      'sentence nine']
final_para = []
previous_index = 0  # index where the current group of lines starts
for index, value in enumerate(l):
    if "bad" in value:
        # join up to 5 lines, starting at the previous match and stopping before this one
        final_para.append(' '.join(l[previous_index:min(index, previous_index + 5)]))
        previous_index = index

print(final_para)
# ['sentence one sentence two', 'sentence has the word bad one sentence four sentence five sentence six sentence seven']

with open('out_file.txt', 'w') as f:
    for item in final_para:
        f.write("%s\n" % item)
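
Note that the slice above takes the first five lines after the previous match. That coincides with "the five lines above" only when matches are at most five lines apart. Below is a minimal sketch of the stricter reading, assuming each paragraph should hold up to five lines immediately above a matching line. The function name `paragraphs_above` and its `keyword` and `window` parameters are hypothetical, not part of the question:

```python
def paragraphs_above(lines, keyword="bad", window=5):
    """Join up to `window` lines directly above each line containing `keyword`."""
    paragraphs = []
    for index, line in enumerate(lines):
        if keyword in line:
            # max() keeps the slice start from going negative near the top
            above = lines[max(0, index - window):index]
            if above:  # skip a match on the very first line
                paragraphs.append(" ".join(above))
    return paragraphs


sample = [
    "sentence one",
    "sentence two",
    "sentence has the word bad one",
    "sentence four",
    "sentence five",
    "sentence six",
    "sentence seven",
    "sentence has the word bad two",
    "sentence nine",
]
print(paragraphs_above(sample))
```

For the sample data this produces the same two paragraphs as the code above; the results differ only when two matching lines are more than five lines apart.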

Answer 1: (score: -1)

with open("in_file.txt", "r") as f:
    l = f.readlines()

# where l is 

l = ['sentence has the word bad one',
         'sentence four',
         'sentence five',
         'sentence six',
         'sentence seven',
         'sentence has the word bad two',
         'sentence nine']

# sentences with "bad"
" ".join(filter(lambda x: "bad" in x, l))
## -> 'sentence has the word bad one sentence has the word bad two'

# sentences without "bad"
" ".join(filter(lambda x: "bad" not in x, l))
## -> 'sentence four sentence five sentence six sentence seven sentence nine'
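
Both filters test for the substring "bad", so a line containing "badge" or "badly" would match as well. If only the whole word should count, a regular expression with word boundaries is one option. This is a minimal sketch, not part of the original answer; the helper name `has_word` and the sample list are hypothetical:

```python
import re


def has_word(line, word="bad"):
    """Return True if `word` occurs in `line` as a whole word (case-sensitive)."""
    # \b matches at a word boundary; re.escape guards against special characters
    return re.search(r"\b" + re.escape(word) + r"\b", line) is not None


l = ["sentence has the word bad one", "sentence with a badge", "sentence four"]

with_bad = [s for s in l if has_word(s)]
without_bad = [s for s in l if not has_word(s)]

print(" ".join(with_bad))
print(" ".join(without_bad))
```

Here "sentence with a badge" lands in `without_bad`, whereas a plain substring test would have counted it as a match.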