How do I create a blacklist that filters out certain lines in Python?

Time: 2017-01-20 03:13:06

Tags: python

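I'm trying to create a function that finds the lines in a file that begin with certain strings (not necessarily as a whole word), for example "aaa one line continues the sentence" or "iii another line continues the sentence", and writes the exact lines it finds to another file called blacklist.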

For example, let's say my file is produced by this function:

def writeletters(self):
    outf = "xfile.txt"
    alphabet = ['a','b','c','d','e','f', 'g', 'h' ,'i']
    with open(outf, "w") as a:
        i = 0
        b = 5
        while i < len(alphabet):
            a.write((alphabet[i] * b) + '\n')
            i += 1

The output is:

aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff
ggggg
hhhhh
iiiii

How can I get the lines that start with "aaa" or "iii" sent or written to another file, leaving only the rest?

bbbbb
ccccc
ddddd
eeeee
fffff
ggggg
hhhhh

To try to achieve this, I wrote the blackList function below, but it clearly doesn't work:

def blackList(self):
    filep = "xfile.txt"
    blacklist = ['aaa', 'iii']
    i = 0
    with open(filep) as bl:
        for line in bl:
            i + 1
            if any(s in line for s in blacklist):
                print blacklist[i]

3 answers:

Answer 0: (score: 2)

You can simplify this a lot:

def blackList(self):
    filep = "xfile.txt"
    output = "output.txt"
    blacklist = ['aaa', 'iii']
    with open(filep, "r") as in_fh, open(output, "w") as out_fh:
        to_write = []
        for line in in_fh:
            for bad_entry in blacklist:
                if line.startswith(bad_entry):  # keep only blacklisted lines
                    to_write.append(line)
                    break  # avoid writing a line twice if several entries match
        out_fh.writelines(to_write)

For a terser but less obvious approach, try the following:

def blacklist_writer(input_file, output_file, blacklist):
    with open(input_file, "r") as in_fh, open(output_file, "w") as out_fh:
        # check l against blacklist in a nested generator
        out_fh.write("".join(l for l in in_fh.readlines() if [b for b in blacklist if l.startswith(b)]))

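It builds a generator over the lines of input_file and checks each line with an inner list comprehension, which collects every blacklist entry that the line starts with. If nothing matches, that list is empty and therefore "falsy", so the line is skipped.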
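As a possible alternative to the nested comprehension above, str.startswith also accepts a tuple of prefixes, so the inner list is not strictly needed. The following is only a minimal sketch: the helper name filter_prefixed and the in-memory sample list are assumptions made for illustration and are not part of the answer.

```python
def filter_prefixed(lines, blacklist):
    """Return the lines that start with any blacklist prefix (hypothetical helper)."""
    prefixes = tuple(blacklist)  # str.startswith accepts a tuple of prefixes
    return [line for line in lines if line.startswith(prefixes)]


# Illustrative data standing in for the lines of xfile.txt.
sample = ["aaaaa\n", "bbbbb\n", "iiiii\n"]
matched = filter_prefixed(sample, ["aaa", "iii"])
```

The result could then be passed to out_fh.writelines() in the same way as to_write in the first version.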

Answer 1: (score: 0)

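You can use a regular expression, but the pattern will vary depending on what you are trying to filter. If you really only want to filter out lines that start with three a's or three i's, you can use re.match():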

import re

regex_pattern = 'a{3}|i{3}'

def writeletters(regex_pattern):
    with open('xfile.txt', 'r') as file:
        for line in file:
            if re.match(regex_pattern, line):
                print(line)  # replace this line with code to write to a file

regex_pattern says "three a's in a row, or three i's in a row". re.match() only matches the given pattern at the start of the string.
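To illustrate that anchoring behaviour, here is a small sketch comparing re.match with re.search. Building the pattern from a blacklist with re.escape is an assumption added for illustration; the answer itself hard-codes the pattern.

```python
import re

blacklist = ["aaa", "iii"]  # illustrative entries, taken from the question
# Join the escaped entries into a single alternation ("aaa|iii" here).
pattern = re.compile("|".join(re.escape(entry) for entry in blacklist))

# re.match only tries the pattern at position 0 of the string.
starts_with_entry = pattern.match("aaaaa\n") is not None
entry_in_middle = pattern.match("bbaaa\n") is not None
# re.search scans the whole string, so it also finds entries in the middle.
found_by_search = pattern.search("bbaaa\n") is not None
```

Escaping matters only if a blacklist entry contains regex metacharacters such as "." or "*"; for plain letters the pattern is unchanged.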

Answer 2: (score: 0)

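I realized that my original attempt at solving this was close. I just needed to print the line instead of an entry from my blacklist, so I'll post my solution as well. (A silly mistake.)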

def blackList(self):
    filep = "xfile"
    blacklist = ['aaa', 'iii']
    out = "blacklist.txt"

    with open(filep) as bl, open(out, "w") as output:
        for line in bl:
            if any(s in line for s in blacklist):
                output.write(line)

The version that actually writes out the contents of the original file without the blacklisted lines is as follows:

def blackList(self):
    filep = "xfile"
    blacklist = ['aaa', 'iii']
    out = "blacklist.txt"

    with open(filep) as bl, open(out, "w") as output:
        for line in bl:
            if not any(s in line for s in blacklist):
                output.write(line)
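Note that `s in line` matches a blacklist entry anywhere in the line, while the question asks about lines that begin with one. The two functions above could also be combined into a single pass. This is only a sketch under assumptions: the function name split_by_blacklist and its path parameters are hypothetical, and it uses a prefix test rather than the substring test used above.

```python
def split_by_blacklist(in_path, kept_path, blacklisted_path, blacklist):
    """Write blacklisted lines to one file and all other lines to another."""
    prefixes = tuple(blacklist)  # str.startswith accepts a tuple of prefixes
    with open(in_path) as src, \
            open(kept_path, "w") as kept, \
            open(blacklisted_path, "w") as blocked:
        for line in src:
            # Prefix test; any(s in line for s in blacklist) would instead
            # match an entry anywhere in the line.
            target = blocked if line.startswith(prefixes) else kept
            target.write(line)
```

For example, split_by_blacklist("xfile.txt", "kept.txt", "blacklist.txt", ["aaa", "iii"]) would put aaaaa and iiiii into blacklist.txt and the remaining lines into kept.txt (kept.txt is an illustrative name).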