Question

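I have a file with many lines, and I need to create a new file that excludes every line containing certain words.

I wrote code that works, but there are a lot of words, so it would be better to store them in a list and check each line against the items of that list. Here is the code: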

infile = open('./infile_test.txt')
newopen = open('./newfile.txt', 'w')

for line in infile:
    if 'ssh' not in line and 'snmp' not in line and 'etc' not in line:
        newopen.write(line)

This is just an example, but suppose infile_test.txt contains the lines below. The new file should be created without lines 2, 4 and 6:

line 1: this is a file test
line 2: ssh, snmp
line 3: the idea is to iterate in each line of this file
line 4: if the list of words (ssh,etc) does not appears in any of the line
line 5: then write the line in another file
line 6: etc
line 7: itens have been removed or not ?

I believe that creating a list:

list = ['ssh', 'snmp', 'etc']

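and then iterating over it to compare each item would be better. I tried adding another "for" loop and using the "all" and "any" functions, but could not get it to work properly.

Does anyone know a better way to achieve this?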

Answer 1

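infile = open('./infile_test.txt')
newopen = open('./newfile.txt', 'w')
words = ['ssh', 'snmp', 'etc']
for line in infile:
    # Assume the line is clean until an unwanted word turns up in it
    found = False
    for word in words:
        if word in line:
            found = True
            break
    # Keep only the lines in which no unwanted word was found
    if not found:
        newopen.write(line)
infile.close()
newopen.close()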

Answer 2

infile = open('./infile_test.txt')
outfile = open('./newfile.txt', 'w')

ignore_list = ['ssh', 'snmp', 'etc']

for line in infile:
    if not any(word in line for word in ignore_list):
        outfile.write(line)

Answer 3

infile = open('./infile_test.txt')
newopen = open('./newfile.txt', 'w')
ignoreList = ['ssh', 'snmp', 'etc']
for line in infile:
    showLine = True
    for i in ignoreList:
        if i in line:
            showLine = False
            break

    if showLine:
        newopen.write(line)

# Don't forget to close the files
infile.close()
newopen.close()

Answer 4

Try this:

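word_list = ['ssh', 'snmp', 'etc']
result_lines = []
# infile and newopen were not defined in the snippet; open them as in the question
with open('./infile_test.txt') as infile, open('./newfile.txt', 'w') as newopen:
    for line in infile:
        # Case-insensitive: keep the line only if no word occurs anywhere in it
        if all(line.lower().find(word.lower()) < 0 for word in word_list):
            result_lines.append(line)
    newopen.writelines(result_lines)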

Answer 5

The complete script:

my_unwanted_words = set(['ssh', 'snmp', 'etc'])
with open("infile_test.txt", 'r') as infile, open("newfile.txt", 'w') as newopen:
    lines = infile.readlines()
    [newopen.write(line) for line in lines if not (set(line.split()) & my_unwanted_words)]

Line 1:

my_unwanted_words = set(['ssh', 'snmp', 'etc'])

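This uses a set to collect the unwanted words. A set only holds unique values, so if you read these words from a file or otherwise gather a large collection of them, you end up with no duplicates. A set also lets you use the intersection operator '&' later in the script.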
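As a quick illustration of those two set properties, here is a minimal sketch. The sample strings are made up for the demonstration and are not read from your file:

```python
# Duplicates collapse automatically when the words go into a set.
unwanted = set(['ssh', 'snmp', 'etc', 'ssh'])

# Whitespace-separated words of one sample line (hypothetical data).
line_words = set("line 6: etc".split())

# '&' returns the words present in both sets; an empty result is falsy.
common = line_words & unwanted
clean = set("line 1: this is a file test".split()) & unwanted
```

Here `common` holds the single shared word and `clean` is empty, which is why `not (... & ...)` works as the "keep this line" test.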

Line 2:

with open("infile_test.txt", 'r') as infile, open("newfile.txt", 'w') as newopen:

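Opening files with 'with' is considered good practice, because it does extra housekeeping for you, such as automatically closing the file once you are done with it. Note that you can open both files on this one line.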
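If you want to see that housekeeping for yourself, this small sketch writes to a throwaway file and then checks the handle afterwards. The temporary path and the name `handle` are only there for the demonstration:

```python
import os
import tempfile

# Throwaway location so the demo does not touch your real files.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as handle:
    handle.write('hello\n')

# Leaving the 'with' block closed the file; no explicit close() was needed.
closed_after_block = handle.closed
```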

Line 3:

    lines = infile.readlines()

lines is now a list of strings, each one representing a line of the original file.

The fourth and final line:

    [newopen.write(line) for line in lines if not (set(line.split()) & my_unwanted_words)]

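This is where the real work happens. It is a list comprehension that only calls newopen.write(line) when there is no intersection (&) between the set of words in the current line, set(line.split()), and your set of unwanted words, my_unwanted_words.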
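If the one-liner is hard to read, the same check can be spelled out as an ordinary loop. This is only a sketch: `keep_line` and `filter_lines` are names I made up, and the sample lines are copied from the question so it runs without any files:

```python
def keep_line(line, unwanted_words):
    """Return True when no whitespace-separated word of line is unwanted."""
    return not (set(line.split()) & unwanted_words)


def filter_lines(lines, unwanted_words):
    """Collect the lines that pass keep_line, in their original order."""
    kept = []
    for line in lines:
        if keep_line(line, unwanted_words):
            kept.append(line)
    return kept


sample = [
    "line 1: this is a file test\n",
    "line 2: ssh, snmp\n",
    "line 3: the idea is to iterate in each line of this file\n",
    "line 4: if the list of words (ssh,etc) does not appears in any of the line\n",
    "line 5: then write the line in another file\n",
    "line 6: etc\n",
    "line 7: itens have been removed or not ?\n",
]
kept = filter_lines(sample, set(['ssh', 'snmp', 'etc']))
```

With this sample, lines 2 and 6 are dropped, but line 4 is still kept, because split() treats '(ssh,etc)' as a single word.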

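My script above is a bit lazy about handing you a final solution. Without further arguments, split() just breaks your line into words on whitespace. So if one of your unwanted words is tucked inside parens or sits next to other punctuation, as on line 4 of your input file, split() will return a troublesome word...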

(ssh,etc)

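...which does not match anything in your unwanted set and therefore gets passed through to newfile.txt. Play with the arguments of split() to work around this. You could also look at Python's re module and replace line.split() with some kind of regular expression.

Good luck!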
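One possible way to do that with re, as a rough sketch rather than the definitive fix: the pattern `\w+` and the helper name `has_unwanted_word` are my own choices, and lowercasing the line is an extra assumption you may not want.

```python
import re

unwanted = set(['ssh', 'snmp', 'etc'])


def has_unwanted_word(line, unwanted_words):
    """Return True if any word of line, ignoring punctuation, is unwanted."""
    # \w+ matches runs of letters, digits and underscores, so '(ssh,etc)'
    # is broken into the separate words 'ssh' and 'etc'.
    words = set(re.findall(r'\w+', line.lower()))
    return bool(words & unwanted_words)


line_4 = "line 4: if the list of words (ssh,etc) does not appears in any of the line"
line_7 = "line 7: itens have been removed or not ?"
drop_4 = has_unwanted_word(line_4, unwanted)
drop_7 = has_unwanted_word(line_7, unwanted)
```

In the script you would then write a line only when `has_unwanted_word(line, my_unwanted_words)` is false. Unlike the plain `word in line` test, this still matches whole words only, so a word such as 'sketch' is not dropped just because it contains 'etc'.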

Check whether any item in a list appears in a line of a file and, if not, write that line to a new file
