Writing to a file repeats the last number

Time: 2015-12-05 05:09:29

Tags: python python-2.7

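I wrote a quick and dirty Python script for my dad that reads the text files in a given folder and replaces the top lines with a specific format. Sorry for the mix of plus signs (+) and commas (,) in the print statements. The goal is to replace something like this: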

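Sounding: BASF CPT-1          
   Depth:   1.05 meter(s)
TIME    AMPLITUDE  
(ms)

with something like this:

Tempo(ms); Amplitude(cm/s)      Valores provisorios da Sismica; Profundidade[m] =  1.05

I thought I had solved it, until my dad mentioned that every text file has its last number repeated on the final line. Here are some samples of the output:

(output sample links: not enough reputation to post more than 2 links, sorry)

Here is my code: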


#imports
import glob, os, re

#work
is_correct = False
succeeded = 0
failed = 0

while not is_correct:
    print "Please type the folder name: "
    folder_name = raw_input()
    full_path = os.path.dirname(os.path.abspath(__file__)) + "\\" + folder_name + "\\"
    print "---------Looking in the following folder: " + full_path
    print "Is this correct? (Y/N)"
    confirm_answer = raw_input()

    if confirm_answer == 'Y':
        is_correct = True
    else:
        is_correct = False

files_list = glob.glob(os.path.join(full_path, "*.txt"))
print "Files found: ", files_list

for file_name in files_list:
    new_header = "Tempo(ms); Amplitude(cm/s)      Valores provisorios da Sismica; Profundidade[m] ="
    current_file = open(file_name, "r+")
    print "---------Looking at: " + current_file.name
    file_data = current_file.read()
    current_file.close()

    match = re.search(r"Depth:\W(.+)\Wmeter", file_data)
    if match:
        new_header = new_header + match.group(1) + "\n"
        print "Depth captured: ", match.groups()
        print "New header to be added: ", new_header
    else:
        print "Match failed!"

    match_replace = re.search(r"(Sounding.+\s+Depth:.+\s+TIME\s+AMPLITUDE\s+.+\s+)   \d", file_data)
    if match_replace:
        print "Replacing text ..."
        text_to_replace = match_replace.group(1)
        print "SANITY CHECK - Text found: ", text_to_replace
        new_data = file_data.replace(text_to_replace, new_header)
        current_file = open(file_name, "r+")
        current_file.write(new_data)
        current_file.close()
        succeeded = succeeded + 1
    else:
        print "Text not found!"
        failed = failed + 1

    # this was added after I noticed the mysterious repeated number (quick fix)
    # why do I need this?
    lines = file(file_name, 'r').readlines() 
    del lines[-1] 
    file(file_name, 'w').writelines(lines) 

print "--------------------------------"
print "RESULTS"
print "--------------------------------"
print "Succeeded: " , succeeded
print "Failed: ", failed
#template -- new_data = file_data.replace("Sounding: BASF CPT-1\nDepth:  29.92 meter(s)\nTIME    AMPLITUDE  \n(ms)\n\n")

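What exactly am I doing wrong? I'm not sure why the extra number gets added at the end (as you can see in the "modified text file - corrupted" link above). I'm sure it's something simple, but I'm not seeing it. To reproduce the corrupted output, just comment out these lines: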

    lines = file(file_name, 'r').readlines() 
    del lines[-1] 
    file(file_name, 'w').writelines(lines) 

1 answer:

Answer 0 (score: 2)

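The problem is that when you write the new data back, you open the file in mode r+, which means "open the file for reading and writing, positioned at the beginning". Your code then writes the new data starting at the beginning of the file. However, the new data is shorter than what is already in the file, and because r+ does not truncate the file, whatever part of the old content lies beyond the end of the new data is left in place. That leftover is the tail of the old file, i.e. the end of its last line, which is why the last number appears to be repeated.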
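To see the effect in isolation, here is a minimal, self-contained sketch (the file name and contents are made up for the demonstration) that overwrites a file in r+ mode with shorter text, once without and once with a call to truncate(). The expected results in the comments assume \n line endings (Linux/macOS); on Windows, text-mode newline translation changes the byte counts, so the leftover fragment looks slightly different. Calling truncate() right after writing is an alternative to reopening the file with "w".

```python
import os
import tempfile

OLD_TEXT = "HEADER\n1 0.5\n2 0.7\n"  # made-up original file: long header + data
NEW_TEXT = "HD\n1 0.5\n2 0.7\n"      # same data, header 4 characters shorter


def overwrite_in_place(path, new_text, truncate):
    """Reset the file to OLD_TEXT, overwrite it in r+ mode, return the result."""
    with open(path, "w") as f:
        f.write(OLD_TEXT)
    with open(path, "r+") as f:
        f.write(new_text)   # r+ starts at offset 0 and never truncates
        if truncate:
            f.truncate()    # cut the file at the current position (end of new_text)
    with open(path) as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "sample.txt")

without_truncate = overwrite_in_place(path, NEW_TEXT, truncate=False)
with_truncate = overwrite_in_place(path, NEW_TEXT, truncate=True)

print(repr(without_truncate))  # 'HD\n1 0.5\n2 0.7\n0.7\n'  (last number "repeated")
print(repr(with_truncate))     # 'HD\n1 0.5\n2 0.7\n'
```

The leftover is simply the last len(OLD_TEXT) - len(NEW_TEXT) characters of the old file, which here happen to be the final number and its newline.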

Quick fix: in the if match_replace: block, change this line:

current_file = open(file_name, "r+")

to this:

current_file = open(file_name, "w")

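This opens the file in write mode, which truncates it before anything is written. I just tried it, and it works. Once the file is truncated there is no leftover data, so also remove the del lines[-1] workaround; otherwise it deletes a genuine last line (and as written it runs even for files where the replacement failed).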
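Putting the fix into a per-file function might look like the sketch below. The function name rewrite_file is hypothetical; the header text and regular expressions are copied from the question. It reads the file in "r" mode and writes the result back in "w" mode, so no trimming workaround is needed, and it leaves files without a recognisable header untouched. It uses with blocks so each file is closed even if an error occurs, and it runs under both Python 2.7 and Python 3.

```python
import re

# Header prefix and patterns copied from the question's script.
HEADER_PREFIX = ("Tempo(ms); Amplitude(cm/s)      "
                 "Valores provisorios da Sismica; Profundidade[m] =")
DEPTH_PATTERN = re.compile(r"Depth:\W(.+)\Wmeter")
OLD_HEADER_PATTERN = re.compile(
    r"(Sounding.+\s+Depth:.+\s+TIME\s+AMPLITUDE\s+.+\s+)   \d")


def rewrite_file(file_name):
    """Hypothetical helper: replace the old header; return True on success."""
    with open(file_name, "r") as f:     # reading only, so plain "r" is enough
        file_data = f.read()

    depth = DEPTH_PATTERN.search(file_data)
    old_header = OLD_HEADER_PATTERN.search(file_data)
    if not (depth and old_header):
        return False                     # unrecognised file: do not touch it

    new_header = HEADER_PREFIX + depth.group(1) + "\n"
    new_data = file_data.replace(old_header.group(1), new_header)

    with open(file_name, "w") as f:     # "w" truncates, so no old bytes survive
        f.write(new_data)
    return True
```

The main loop then only needs to call rewrite_file(file_name) for each entry in files_list and count True/False results as succeeded/failed.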