Question

I'm using Python 2.7.

I want to go through a .txt file and keep only the sentences that contain one or more words from a list of keywords.

After that, I want to go through the remaining text again with another list of keywords and repeat the process.

I'd like to save the result in a .txt file; everything else can be deleted.
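A minimal sketch of the whole workflow described above, assuming a naive sentence split (every period ends a sentence), case-sensitive substring matching and UTF-8 input. The function names filter_passes and filter_file are hypothetical, and the result is written to a separate output file instead of overwriting the original. It should run on both Python 2.7 and Python 3.

```python
import io
import re


def filter_passes(text, keyword_passes):
    # Naive split: each run of non-period characters plus its closing '.'
    # counts as one sentence (text after the last period is dropped)
    sentences = [s.strip() for s in re.findall(r"[^.]*\.", text)]
    # Each pass keeps only the sentences that contain at least one of its keywords
    for keywords in keyword_passes:
        sentences = [s for s in sentences if any(k in s for k in keywords)]
    return sentences


def filter_file(in_path, out_path, keyword_passes):
    with io.open(in_path, encoding="utf-8") as f:
        text = f.read()
    kept = filter_passes(text, keyword_passes)
    # Write the surviving sentences, one per line, to a separate file
    with io.open(out_path, "w", encoding="utf-8") as f:
        f.write(u"\n".join(kept))
    return kept
```

For example, filter_file('C:\\Python27\\test\\A.txt', 'C:\\Python27\\test\\A_filtered.txt', [['contractual', 'obligation'], ['law', 'employer']]) would first keep the sentences mentioning 'contractual' or 'obligation', and then only those that also mention 'law' or 'employer' (the output file name here is made up).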

I'm new to Python (but loving it!), so don't worry about hurting my feelings; feel free to assume I know very little and dumb it down a bit :)

Here's what I have so far:

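import re

# Read the whole file into one string
f = open('C:\\Python27\\test\\A.txt')
text = f.read()
f.close()

# Match non-period characters, the keyword, more non-period characters, then a period
define_words = 'contractual'
print re.findall(r"([^.]*?%s[^.]*\.)" % define_words, text)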

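So far it pulls out every sentence containing 'contractual'. If I put 'contractual obligations' there, it only pulls out the sentences where those two words appear right next to each other.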

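What I'm stuck on is how to change this into a list of words that are each considered separately, like 'contractual', 'obligations', 'law', 'employer', and so on.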
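One way to extend the regex above to a list, sketched here as an assumption rather than a tested solution: escape each keyword with re.escape and join them with | so the pattern matches a sentence containing any one of them. The helper name sentences_with_any is hypothetical, and matching is still case-sensitive and substring-based.

```python
import re


def sentences_with_any(text, keywords):
    # Escape each keyword so characters like '.' or '(' are matched literally,
    # then join them with '|' so the regex accepts any ONE of them
    alternatives = "|".join(re.escape(word) for word in keywords)
    # Same shape as the original pattern: non-period characters, a keyword,
    # more non-period characters, then the closing period
    pattern = r"[^.]*?(?:%s)[^.]*\." % alternatives
    # Strip the leading spaces/newlines left over from the previous sentence
    return [s.strip() for s in re.findall(pattern, text)]


text = "The quick brown foxy jumps over the lazy dog.\n\nNew line.\n\nYet another nice new line."
print(sentences_with_any(text, ["quick", "another"]))
# ['The quick brown foxy jumps over the lazy dog.', 'Yet another nice new line.']
```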

EDIT, regarding applepi's answer:

I tested it on a small test text:

"The quick brown foxy jumps over the lazy dog.

New line.

Yet another nice new line."

If I put 2 words in the list, like ['quick', 'brown'], I only get one sentence:

Output: ['T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ', 'b', 'r', 'o', 'w', 'n', ' ', 'f', 'o', 'x', 'y', ' ', 'j', 'u', 'm', 'p', 's', ' ', 'o', 'v', 'e', 'r', ' ', 't', 'h', 'e', ' ', 'l', 'a', 'z', 'y', ' ', 'd', 'o', 'g', '.']

So ['quick', 'another'] gives no results at all.

['Yet', 'another'] brings up:

Output: [' ', '\n', '\n', 'Y', 'e', 't', ' ', 'a', 'n', 'o', 't', 'h', 'e', 'r', ' ', 'n', 'i', 'c', 'e', ' ', 'n', 'e', 'w', ' ', 'l', 'i', 'n', 'e', '.']
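A likely explanation for this output: when a list of sentence strings is passed through itertools.chain once more, each string is treated as an iterable of characters. The answer referred to is presumably Answer 3 below, whose original version called list(itertools.chain(*everything)) a second time after the Counter step. A small demonstration:

```python
import itertools

# A list of whole sentences, like the one left after the Counter step
sentences = ["The quick brown foxy jumps over the lazy dog."]

# chain(*sentences) iterates over each string, and iterating over a string
# yields its characters, so the sentence is split into single letters
flattened_again = list(itertools.chain(*sentences))

print(flattened_again[:5])  # ['T', 'h', 'e', ' ', 'q']
```

Assuming that is the code being run, it would also explain the other two results: Answer 3 keeps only sentences found by more than one keyword (count > 1), so ['quick', 'another'], which never occur in the same sentence, return nothing.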

Answer 1

Why not use a list comprehension?

print [sent for sent in text.split('.') 
        if any(word in sent for word in define_words.split()) ]

Or, if you change define_words to a list of strings:

# define_words = ['contractual', 'obligations']
define_words = 'contractual obligations'.split()

print [sent for sent in text.split('.') 
        if any(word in sent for word in define_words) ]
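One caveat with the in test: 'law' in sent is also true for 'lawyer' or 'flaw', and it is case-sensitive. A sketch of the same list comprehension using a regex with word boundaries (\b) and re.IGNORECASE, assuming whole-word, case-insensitive matching is what you want; keep_sentences is a hypothetical name.

```python
import re


def keep_sentences(text, keywords):
    # \b...\b matches whole words only, so 'law' no longer matches 'lawyer';
    # re.IGNORECASE also lets 'yet' match 'Yet'
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(w) for w in keywords),
                         re.IGNORECASE)
    # Same naive split as above: every '.' ends a sentence
    return [sent.strip() for sent in text.split('.') if pattern.search(sent)]


text = "The lawyer left. The law is clear. Yet another line."
print(keep_sentences(text, ["law", "yet"]))  # ['The law is clear', 'Yet another line']
```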

Answer 2

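# Hypothetical inputs: the file from the question and two passes of keywords
filename = 'C:\\Python27\\test\\A.txt'
list_of_lists = [['contractual', 'obligation'], ['law', 'employer']]

def init_contains_useful_word(words_to_search_for):
    # Build a predicate that is True if the sentence contains any of the words
    def contains_useful_word(sentence):
        return any(map(lambda x: x in sentence, words_to_search_for))
    return contains_useful_word

with open(filename, 'r') as f:
    text = f.read()

# Naive sentence split: every period ends a sentence
sentences = text.split(".")

# Each pass keeps only the sentences that survived the previous pass
for words in list_of_lists:
    contains_useful_word = init_contains_useful_word(words)
    sentences = filter(contains_useful_word, sentences)

# Overwrite the file with the surviving sentences, restoring the periods
with open(filename, 'w') as f:
    f.write(" ".join(s.strip() + "." for s in sentences))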

In fact, if you like, you can replace contains_useful_word with your re expression.
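A sketch of that suggestion, assuming the keywords should be matched literally: the predicate compiles one regex per keyword list instead of looping with in. The keyword passes and sample sentences are made up for illustration, and a list comprehension is used in place of filter so it behaves the same on Python 3, where filter returns an iterator.

```python
import re


def init_contains_useful_word(words_to_search_for):
    # One compiled pattern per keyword list; it matches any of the words
    pattern = re.compile("|".join(re.escape(w) for w in words_to_search_for))

    def contains_useful_word(sentence):
        return pattern.search(sentence) is not None
    return contains_useful_word


sentences = "A contractual duty. An employer rule. A contractual employer deal. Other".split(".")

# Hypothetical keyword passes; each pass filters what the previous one kept
for words in [["contractual"], ["employer"]]:
    contains_useful_word = init_contains_useful_word(words)
    sentences = [s for s in sentences if contains_useful_word(s)]

print(sentences)  # [' A contractual employer deal']
```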

Answer 3

I can't comment (I don't have enough reputation), so this isn't technically an answer.

I'm not very familiar with regular expressions, but assuming re.findall() works, you can use the following code:

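import re, itertools
from collections import Counter

f = open('C:\\Python27\\test\\A.txt')
text = f.read()
f.close()

everything = []
define_words = ['contractual', 'obligation', 'law', 'employer']
for k in define_words:
    # All sentences that contain keyword k
    everything.append(re.findall(r"([^.]*?%s[^.]*\.)" % k, text))

# Flatten the list of lists into a single list of sentences
everything = list(itertools.chain(*everything))
# Each sentence is counted once per keyword that found it
counts = Counter(everything)
# Keep only sentences found by more than one keyword
# (no second chain() here: that would split the sentences into characters)
everything = [value for value, count in counts.items() if count > 1]
print everything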

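This goes through the list of keywords and adds each re.findall() result to a list, producing a list of lists. Then I keep only the duplicates (the good values) and turn the list of lists into a single list.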
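To see what the count > 1 step keeps, here is a self-contained run on the small test text from the question's edit. It assumes the same regex as above (with re.escape added), and found_by_two_or_more is a hypothetical name. A sentence is counted once for each keyword that finds it, so only sentences containing at least two of the keywords survive; identical sentences appearing twice in the text would also be counted together.

```python
import itertools
import re
from collections import Counter


def found_by_two_or_more(text, keywords):
    # One findall per keyword, each returning a list of matching sentences
    per_keyword = [re.findall(r"([^.]*?%s[^.]*\.)" % re.escape(k), text)
                   for k in keywords]
    # Flatten once, then count how many keywords found each sentence
    counts = Counter(itertools.chain(*per_keyword))
    return [s for s, c in counts.items() if c > 1]


text = "The quick brown foxy jumps over the lazy dog.\n\nNew line.\n\nYet another nice new line."
print(found_by_two_or_more(text, ["quick", "brown"]))
# ['The quick brown foxy jumps over the lazy dog.']
```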

Error: the real error was that everything was a list of lists, which Counter(everything) doesn't accept. So I flatten it before calling Counter().
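A short demonstration of that error, assuming the nested lists look like the findall results above: Counter stores its elements as dictionary keys, lists are not hashable, and flattening to a list of strings fixes it.

```python
from collections import Counter

# Shaped like the findall results: one list of sentences per keyword
list_of_lists = [["sentence one."], ["sentence one.", "sentence two."]]

raised = False
try:
    Counter(list_of_lists)
except TypeError as err:
    # The elements are lists, which cannot be used as dictionary keys
    raised = True
    print(err)

# After flattening, the elements are strings, which are hashable
flat = [s for sublist in list_of_lists for s in sublist]
counts = Counter(flat)
print(counts["sentence one."])  # 2
```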

Using Python to find sentences that contain a set of keywords
