Search a text file for each word in a list and print the matching lines

Time: 2015-09-02 19:28:59

Tags: python list

I want to search a .txt file for the words in a list and print every line of the file that contains any word from that list.

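I first split the raw_input (called userInput) with .split() to get a list of words. I then filtered that list against a second, blacklist word list to get the final filtered word list. Now I want to search the text file for any of those words.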

exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]

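After splitting userInput with .split() into uqWords, I filter out any word that matches an entry in the exWords list and call the result fqWords. Now I want to search Database.txt for any of the words in fqWords and print the matching lines.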
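The filter above tests `bad in word`, which is a substring test, so short blacklist entries such as 'am' and 'is' also remove ordinary words like 'name' and 'this'. The sketch below compares that filter with an exact-match alternative; the helper names `filter_substring` and `filter_exact` and the sample sentence are illustrative only, not part of the question.

```python
# Blacklist copied from the question.
ex_words = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']

def filter_substring(words):
    # The question's approach: drop a word if ANY blacklist entry occurs inside it.
    return [w for w in words if not any(bad in w for bad in ex_words)]

def filter_exact(words):
    # Alternative: drop a word only if it is exactly equal to a blacklist entry.
    blacklist = set(ex_words)
    return [w for w in words if w not in blacklist]

words = "What is the name of this file ?".split()
print(filter_substring(words))  # 'name' and 'this' are lost here
print(filter_exact(words))
```

Which behaviour is right depends on the intent; exact matching is usually closer to "remove these question words".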

To be specific, my full code is:

import time
import random

Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "

while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    DB = open("Database.txt")
    for line in DB:
        if fqWords in line:
            print (R + line[:-1])
    CDB = open("CodeDB.txt")
    for code in CDB:
        if fqWords in code:
            print (R + code[:-1])
            break
        if fqWords not in (code and line):
            randomError = random.choice(Error)
            print (R + (randomError))
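The core problem in the code above is `if fqWords in line:`. The `in` operator on a string needs a string on its left, so testing a whole list against a line raises `TypeError`; each word has to be tested separately. A minimal sketch of that check (the helper `line_matches` and the sample data are hypothetical):

```python
def line_matches(line, words):
    # True if at least one of the words occurs somewhere in the line.
    return any(word in line for word in words)

fq_words = ['python', 'list']
line = "Search a list in python\n"

try:
    fq_words in line  # what the question's code does: list on the left of `in`
except TypeError as exc:
    print("TypeError: {}".format(exc))

print(line_matches(line, fq_words))
print(line_matches("nothing here\n", fq_words))
```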

2 answers:

Answer 0 (score: 3)

Try using this function:

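def search_for_lines(filename, words_list):
    # Count (and print) every line that contains at least one of the words.
    words_found = 0
    with open(filename) as db_file:
        # enumerate(..., 1) numbers lines starting from 1
        for line_no, line in enumerate(db_file, 1):
            if any(word in line for word in words_list):
                # rstrip drops the trailing newline so no blank line is printed
                print("{} : {}".format(line_no, line.rstrip("\n")))
                words_found += 1
    return words_found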

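Just pass in the name of the file you want to search and the list of words. The function prints each matching line's number together with its content, and returns the number of lines that contain any of the words. While the loop walks through the file line by line, enumerate yields a tuple of the line number and the line itself.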
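A small usage sketch: it writes a throwaway temporary file standing in for Database.txt (its contents are invented for illustration), searches it for one word, and checks the returned count. The function body follows the answer, numbering lines from 1.

```python
import os
import tempfile

def search_for_lines(filename, words_list):
    # Print and count every line containing at least one of the words.
    words_found = 0
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file, 1):
            if any(word in line for word in words_list):
                print("{} : {}".format(line_no, line.rstrip("\n")))
                words_found += 1
    return words_found

# Hypothetical stand-in for Database.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("apples are red\nbananas are yellow\ncherries are red\n")

count = search_for_lines(path, ["red"])  # prints lines 1 and 3
os.remove(path)
```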

To add this to your existing code and search both files, define the function first and then call it right after fqWords is assigned:

import random

def search_for_lines(filename, words_list):
    words_found = 0
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file, 1):
            if any(word in line for word in words_list):
                print("{} : {}".format(line_no, line.rstrip("\n")))
                words_found += 1
    return words_found

Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "

while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    search_for_lines("Database.txt", fqWords)

    words_found = search_for_lines("CodeDB.txt", fqWords)

    if words_found > 0:
        break
    else:
        randomError = random.choice(Error)
        print (R + (randomError))

Answer 1 (score: 0)

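If you don't need to modify a list, use a tuple. For naming identifiers, see PEP 8. To get the difference between two sequences, use a set, e.g. {1,2,3} - {2,3} gives {1}. If you open the same files inside the loop, they are reopened on every iteration, so it is better to move the open calls out of the loop.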
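A short sketch of the set-difference idea applied to the blacklist (the sample inputs are invented). Note that this is exact matching, unlike the substring filter in the question: case and attached punctuation both matter, and multi-word entries such as 'How many' can never equal a single token produced by split().

```python
# Blacklist as an immutable tuple, as suggested above.
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')

diff = {1, 2, 3} - {2, 3}  # set difference

fq_words = frozenset("What is python".split()) - frozenset(ex_words)

# Exact-match semantics: 'Is' (capital I) and 'python?' are NOT removed.
other = frozenset("Is this python?".split()) - frozenset(ex_words)

print(diff)
print(sorted(fq_words))
print(sorted(other))
```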

import random

def get_line_with_words(lines, words):
    """Return a list of (line_number, line) pairs for every line
       that contains any of the words.
    """
    return [(i, line.strip()) for i, line in enumerate(lines, 1) if any(word in line for word in words)]

errors = ("Sorry, I don't understand.", "I don't get it")
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')
prefix = "Rel > "

with open("Database.txt") as db, open("CodeDB.txt") as cdb:
    while True:
        user_input = raw_input("> ")
        uq_words = user_input.split()
        fq_words = frozenset(uq_words) - frozenset(ex_words)

        res1 = get_line_with_words(db, fq_words)
        res2 = get_line_with_words(cdb, fq_words)

        if res1 and res2:
            for n, line in res1 + res2:
                print('{} {} {}'.format(prefix, n, line))
            break

        print('{} {}'.format(prefix, random.choice(errors)))
        db.seek(0)
        cdb.seek(0)
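The `db.seek(0)` and `cdb.seek(0)` calls are needed because iterating over a file object reads it to the end; without rewinding, the next pass through the `while` loop would find no lines at all. A minimal sketch with a temporary file (contents invented) shows the effect:

```python
import tempfile

with tempfile.TemporaryFile(mode="w+") as f:
    f.write("first line\nsecond line\n")
    f.seek(0)               # rewind after writing so the lines can be read
    first_pass = list(f)    # iterating consumes the whole file
    second_pass = list(f)   # already at end of file, so nothing is left
    f.seek(0)               # rewind, as the answer does at the end of each loop
    third_pass = list(f)

print(len(first_pass), len(second_pass), len(third_pass))
```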