Multiple files (remove the same string from the start and end lines)

Time: 2014-03-05 17:33:00

Tags: python python-2.7

I have multiple files. Each file has the following format:

<float> <int> <stringSAME>
<float> <int> <stringSAME>
<float> <int> <string>
......
<float> <int> <stringSAME>
......
......
<float> <int> <string>
<float> <int> <stringSAME>
<float> <int> <stringSAME>

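Here the string on lines 1 and 2 is the same, and the string on the last few lines is also the same. It is shown as stringSAME. I now want to remove this stringSAME from the beginning and the end of the file, but keep everything between those stringSAME runs intact. The same process is needed for multiple files with the same format. Please suggest some solutions. I am using Python as my programming language.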

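1 answer:

Answer 0: (score: 0)

There are a few ways to do this, depending on how you plan to use the result afterwards.

If you just want to use those lines as data, we can do this: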

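with open("path/to/file") as f:
    # build a list (a lazy generator would be read only after the file is closed)
    # and strip the newline first, otherwise endswith() never matches
    data = [line.rstrip("\n") for line in f if not line.rstrip("\n").endswith("<stringSAME>")]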
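
Note that the filter above removes every line ending in the repeated string, including any that appear in the middle of the file. If only the leading and trailing runs should go, something along these lines might work. This is a minimal sketch, not a definitive implementation: the helper name `trim_repeated_ends` is made up for illustration, and it assumes that whole lines are to be dropped, that fields are separated by whitespace, and that the string to strip is the third field of the first line unless you pass `marker` yourself.

```python
def third_field(line):
    """Return the third whitespace-separated field (the rest of the line), or None."""
    parts = line.strip().split(None, 2)
    return parts[2] if len(parts) == 3 else None


def trim_repeated_ends(lines, marker=None):
    """Drop the leading and trailing runs of lines whose third field is `marker`.

    Lines between those runs are kept untouched, even if they carry `marker` too.
    Assumption: when `marker` is None, it is taken from the first line.
    """
    lines = [line.rstrip("\n") for line in lines]
    if not lines:
        return []
    if marker is None:
        marker = third_field(lines[0])
        if marker is None:
            return lines  # first line has no third field, nothing to trim

    # skip the run of marker lines at the top
    start = 0
    while start < len(lines) and third_field(lines[start]) == marker:
        start += 1

    # skip the run of marker lines at the bottom
    end = len(lines)
    while end > start and third_field(lines[end - 1]) == marker:
        end -= 1

    return lines[start:end]


sample = [
    "0.1 1 SAME",
    "0.2 2 SAME",
    "0.3 3 foo",
    "0.4 4 SAME",
    "0.5 5 bar",
    "0.6 6 SAME",
]
trimmed = trim_repeated_ends(sample)
```

Here `trimmed` keeps the three middle lines, including the `SAME` line that sits between `foo` and `bar`. To use it on a file, pass the open file object (or `f.readlines()`) as `lines`.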

If you need to change the file itself, I would consider overwriting it afterwards.

with open("path/to/file","w") as f:
    # opening in "w" mode BLANKS the file, so make sure
    # that you've saved that original data somewhere
    for line in data:
        # normalise to exactly one trailing newline per line
        f.write(line.rstrip("\n")+"\n")
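
Because "w" mode empties the file as soon as it is opened, one cautious option is to write the new content to a temporary file first and swap it in only once the write has succeeded. The following is a sketch under a few assumptions: `overwrite_lines` is a hypothetical helper name, `os.replace` needs Python 3.3 or later (on Python 2.7, `os.rename` does the same job on POSIX systems), and the permissions of the original file are not preserved.

```python
import os
import tempfile


def overwrite_lines(path, lines):
    """Replace the contents of `path` with `lines`, one item per line."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temporary file next to the target so the final swap
    # stays on the same filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line.rstrip("\n") + "\n")
        os.replace(tmp_path, path)  # the original is replaced only here
    except Exception:
        os.remove(tmp_path)  # clean up the partial file, keep the original
        raise
```

With this, something like `overwrite_lines("path/to/file", data)` would leave the original untouched if writing fails halfway.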

You can also write it to a new copy of the file:

with open("path/to/sanitized_file","w") as f:
    for line in data:
        # normalise to exactly one trailing newline per line
        f.write(line.rstrip("\n")+"\n")

If you are trying to do this on multiple files, build a list of them first.

import os

list_of_files = ["file1.txt","file2.txt","file3.txt"]
for file in list_of_files:
    in_file = os.path.join("path","to",file)
    out_file = os.path.join("path","to","post proc",file)

# if you have to do it to a whole directory worth of files, try this instead
## import glob
## list_of_files = glob.glob("path/to/dir/*")
## for file in list_of_files:
##     in_file = file
##     head,tail = os.path.split(file)
##     out_file = os.path.join(head,"post proc",tail)
# which simplifies a lot of the following, since a missing input file
# should never happen; the only "not found" case left is the one where
# the post proc directory doesn't exist

    try:
        with open(in_file, 'r') as f:
            # read into a list while the file is still open, stripping newlines
            # so that endswith() can match
            data = [line.rstrip("\n") for line in f
                    if not line.rstrip("\n").endswith("<stringSAME>")]
    except IOError as e:
        # also covers a missing file: FileNotFoundError is a subclass of
        # IOError/OSError on Python 3 and does not exist on Python 2.7
        print("could not read %s: %s" % (in_file, e))
        continue
    try:
        with open(out_file,'w') as f:
            for line in data:
                f.write(line+"\n")
    except IOError as e:
        # raised if the file can't be written or the directory doesn't exist
        print("could not write %s: %s" % (out_file, e))
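
For the whole-directory case, the pieces above could be pulled together roughly like this. It is only a sketch: `clean_directory` and its parameters are names invented here, the output directory name "post proc" is taken from the example above, and the filter is the same simple one as before (it drops every line ending in `marker`, not just the ones at the start and end).

```python
import glob
import os


def clean_directory(src_dir, marker, out_name="post proc"):
    """Write a filtered copy of every file in `src_dir` into `src_dir/out_name`.

    Returns the list of output paths that were written.
    """
    out_dir = os.path.join(src_dir, out_name)
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)  # avoids the "directory doesn't exist" error

    written = []
    for in_file in sorted(glob.glob(os.path.join(src_dir, "*"))):
        if not os.path.isfile(in_file):
            continue  # skip sub-directories, including the output directory
        out_file = os.path.join(out_dir, os.path.basename(in_file))
        try:
            with open(in_file) as f:
                kept = [line.rstrip("\n") for line in f
                        if not line.rstrip("\n").endswith(marker)]
            with open(out_file, "w") as f:
                for line in kept:
                    f.write(line + "\n")
        except IOError as e:
            # report the problem and carry on with the next file
            print("skipping %s: %s" % (in_file, e))
            continue
        written.append(out_file)
    return written
```

A call such as `clean_directory("path/to/dir", "<stringSAME>")` would then leave the originals alone and put the cleaned copies under "path/to/dir/post proc".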