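Python script to remove lines from a file that contain words in an array

Time: 2010-06-15 06:00:06

Tags: python

I have the following script. It identifies the lines in a file that I want to remove, based on an array of words, but it does not remove them.

What should I change?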

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin.readlines(): 
        for item in offending: 
                print "got one",line 
                line = line.replace( item, "MUST DELETE" ) 
                line=line.strip()
                fout.write(line)  
    fin.close() 
    fout.close() 

fixup(sourcefile)

4 Answers:

Answer 0 (score: 5)

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin: 
        if True in [item in line for item in offending]:
            continue
        fout.write(line)
    fin.close() 
    fout.close() 

fixup(sourcefile)

Edit: or even better:

for line in fin: 
    if not True in [item in line for item in offending]:
        fout.write(line)
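
For comparison, the same test can be written with the built-in any() and a generator expression, which stops at the first match instead of building a whole list. This is only a sketch in Python 3 syntax (the thread itself uses Python 2.5); the helper names keep_line and filter_lines are made up for illustration and are not part of the answer above.

```python
offending = ["Exception", "Integer", "RuntimeException"]

def keep_line(line, words=offending):
    """Return True when none of the words occurs anywhere in the line."""
    # Substring match, same as `item in line` in the answer above.
    return not any(word in line for word in words)

def filter_lines(lines, words=offending):
    """Yield only the lines that contain none of the words."""
    for line in lines:
        if keep_line(line, words):
            yield line
```

Inside fixup() this would be used as `for line in filter_lines(fin): fout.write(line)`.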

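Answer 1 (score: 2)

The basic strategy is to write a copy of the input file to the output file, but with changes. In your case the change is very simple: you just leave out the lines you don't want.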

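Once the copy has been written safely, you can delete the original file and use os.rename() to rename the temp file to the original file name. I like to write the temp file in the same directory as the original file, both to make sure I have permission to write in that directory and because I don't know whether os.rename() can move a file from one volume to another.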
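
A rough sketch of that temp-file strategy, written in Python 3 syntax rather than the Python 2.5 used in this thread. The function name rewrite_in_place is hypothetical, and it swaps in os.replace() (Python 3.3+) for the delete-then-os.rename() step, because os.replace() overwrites an existing destination on both Windows and POSIX. It uses a substring test, so adjust the matching if you need whole words.

```python
import os
import tempfile

def rewrite_in_place(path, offending):
    """Rewrite `path` without the lines that contain an offending word."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename
    # never has to cross a volume boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fout, open(path) as fin:
            for line in fin:
                if not any(word in line for word in offending):
                    fout.write(line)
        # Both files are closed here; now swap the copy into place.
        os.replace(tmp_path, path)
    except Exception:
        # Something failed before the swap: discard the partial copy.
        os.remove(tmp_path)
        raise
```

The original file is only touched by the final os.replace() call, so a failure while copying leaves it intact.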

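You don't need to say for line in fin.readlines(); it is enough to say for line in fin. When you use .readlines(), you are telling Python to read every line of the input file into memory all at once; when you just use fin, you read one line at a time.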
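
To make the difference concrete, here is a small sketch (Python 3 syntax, with io.StringIO standing in for a real file; count_matching is a made-up helper name). Both approaches see the same lines, but only readlines() holds the full list in memory.

```python
import io

def count_matching(stream, word):
    """Count lines containing `word`, reading one line at a time."""
    count = 0
    for line in stream:  # a file object is its own line iterator
        if word in line:
            count += 1
    return count

sample = io.StringIO("a Exception\nb\nc Exception\n")
all_lines = sample.readlines()   # builds the complete list of lines in memory
sample.seek(0)                   # rewind before reading again
lazy_count = count_matching(sample, "Exception")
```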

Here is your code, modified to make these changes.

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def line_offends(line, offending):
    for word in line.split():
        if word in offending:
            return True
    return False

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin:
        if line_offends(line, offending):
            continue
        fout.write(line)
    fin.close()
    fout.close()
    #os.rename() left as an exercise for the student

fixup(sourcefile)

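If line_offends() returns True, we execute continue and the loop moves on to the next line without running the rest of the loop body. This means the line never gets written. For this simple example, it would be just as good to do it this way: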

    for line in fin:
        if not line_offends(line, offending):
            fout.write(line)

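I wrote it with continue because the main loop often does non-trivial work, and you want to skip all of that work when the test is true. IMHO it is nicer to have a simple "if this line is unwanted, continue" than to indent a whole bunch of stuff inside an if, especially for a condition that might be very rare.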
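
As an illustration of that style, here is a sketch (Python 3 syntax) where the loop does a little more than just write the line. The function process() and the numbering step are invented for this example; only the whole-word test comes from line_offends() above.

```python
def process(lines, offending):
    """Return cleaned, numbered lines, skipping unwanted ones."""
    result = []
    for line in lines:
        # Guard clause: unwanted line, skip the rest of the loop body.
        if any(word in offending for word in line.split()):
            continue
        text = line.strip()
        # A second guard: nothing left after stripping whitespace.
        if not text:
            continue
        # The "real" work stays at the loop's top indentation level.
        result.append("%d: %s" % (len(result) + 1, text))
    return result
```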

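Answer 2 (score: 0)

You are not writing it to the output file. Also, I would use "in" to check whether the strings are present in the line. See the modified script below (untested):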

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 

    for line in fin.readlines(): 
        if not any(item in line for item in offending):
            # There are no offending words in this line
            # write it to the output file
            fout.write(line)

    fin.close() 
    fout.close() 

fixup(sourcefile)

Answer 3 (score: 0)

'''This is a fairly simple implementation, but it should do what you are looking for'''

sourcefile = "C:\\Python25\\PC_New.txt"

filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 

    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin: 
        # Skip any line that contains one of the offending words
        if any(item in line for item in offending):
            print "got one", line
            continue
        fout.write(line)
    fin.close() 
    fout.close() 

fixup(sourcefile)