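I wrote a quick and dirty Python script for my dad that reads the text files in a given folder and replaces the top lines with a specific format. Apologies for the mix of pluses (+) and commas (,). The goal is to replace something like this:
Sounding: BASF CPT-1
Depth: 1.05 meter(s)
TIME AMPLITUDE
(ms)
with something like this:
Tempo(ms); Amplitude(cm/s) Valores provisorios da Sismica; Profundidade[m] = 1.05
I thought I had this solved until my dad mentioned that every text file had its last number repeated on the final line. Here are some samples of the output:
output sample links - not enough reputation to post more than 2 links, sorry
Here is my code: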
#imports
import glob, inspect, os, re
from sys import argv

#work
is_correct = False
succeeded = 0
failed = 0

while not is_correct:
    print "Please type the folder name: "
    folder_name = raw_input()
    full_path = os.path.dirname(os.path.abspath(__file__)) + "\\" + folder_name + "\\"
    print "---------Looking in the following folder: " + full_path
    print "Is this correct? (Y/N)"
    confirm_answer = raw_input()
    if confirm_answer == 'Y':
        is_correct = True
    else:
        is_correct = False

files_list = glob.glob(full_path + "\*.txt")
print "Files found: ", files_list

for file_name in files_list:
    new_header = "Tempo(ms); Amplitude(cm/s) Valores provisorios da Sismica; Profundidade[m] ="
    current_file = open(file_name, "r+")
    print "---------Looking at: " + current_file.name
    file_data = current_file.read()
    current_file.close()

    match = re.search("Depth:\W(.+)\Wmeter", file_data)
    if match:
        new_header = new_header + str(match.groups(1)[0]) + "\n"
        print "Depth captured: ", match.groups()
        print "New header to be added: ", new_header
    else:
        print "Match failed!"

    match_replace = re.search("(Sounding.+\s+Depth:.+\s+TIME\s+AMPLITUDE\s+.+\s+) \d", file_data)
    if match_replace:
        print "Replacing text ..."
        text_to_replace = match_replace.group(1)
        print "SANITY CHECK - Text found: ", text_to_replace
        new_data = file_data.replace(text_to_replace, new_header)
        current_file = open(file_name, "r+")
        current_file.write(new_data)
        current_file.close()
        succeeded = succeeded + 1
    else:
        print "Text not found!"
        failed = failed + 1

    # this was added after I noticed the mysterious repeated number (quick fix)
    # why do I need this?
    lines = file(file_name, 'r').readlines()
    del lines[-1]
    file(file_name, 'w').writelines(lines)

print "--------------------------------"
print "RESULTS"
print "--------------------------------"
print "Succeeded: " , succeeded
print "Failed: ", failed

#template -- new_data = file_data.replace("Sounding: BASF CPT-1\nDepth: 29.92 meter(s)\nTIME AMPLITUDE \n(ms)\n\n")
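What exactly am I doing wrong? I'm not sure why the extra number gets added at the end (as you can see in the "modified text file - corrupted" link above). I'm sure it's something simple, but I don't see it. To reproduce the corrupted output, just comment out these lines: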
lines = file(file_name, 'r').readlines()
del lines[-1]
file(file_name, 'w').writelines(lines)
Answer 0 (score: 2)
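The problem is that when you write the new data back, you open the file in mode r+, which means "open the file for reading and writing, positioned at the beginning". Your code then writes the data starting from the beginning of the file. However, the new data is shorter than what is already in the file, and because r+ does not truncate the file, the old bytes past the end of the new data are left in place at the end of the file.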
Quick fix: in the if match_replace: section, change this line:
current_file = open(file_name, "r+")
to this:
current_file = open(file_name, "w")
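This opens the file in write mode, which truncates it before anything is written. I just tried it and it works fine. With that change, the quick fix that deletes the last line should no longer be needed.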
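If you would rather keep a single r+ handle for both reading and writing, an alternative (not the change suggested above) is to seek back to the start and truncate explicitly after writing. This is only a sketch in Python 3; rewrite_in_place and transform are hypothetical names, not part of your script.

```python
def rewrite_in_place(path, transform):
    """Read a text file, apply transform to its contents, and write the result back.

    transform is any function that takes the old text and returns the new text.
    """
    with open(path, "r+") as f:
        data = f.read()
        new_data = transform(data)
        f.seek(0)          # go back to the start before writing
        f.write(new_data)
        f.truncate()       # cut the file off at the current position
```

Here the truncate() call does the job that mode "w" does in the quick fix: it discards whatever old content remains after the end of the new data. Opening with "w" is simpler for your script, since you already close the file after reading it.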