我是python和脚本的新手,所以我非常感谢编写python脚本的一些指导。 所以,到了这一点:
我在目录中有大量文件。有些文件是空的,其他文件包含这样的行:
16 2009-09-30T20:07:59.659Z 0.05 0.27 13.559 6
16 2009-09-30T20:08:49.409Z 0.22 0.312 15.691 7
16 2009-09-30T20:12:17.409Z -0.09 0.235 11.826 4
16 2009-09-30T20:12:51.159Z 0.15 0.249 12.513 6
16 2009-09-30T20:15:57.209Z 0.16 0.234 11.776 4
16 2009-09-30T20:21:17.109Z 0.38 0.303 15.201 6
16 2009-09-30T20:23:47.959Z 0.07 0.259 13.008 5
16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5
16 2009-09-30T20:32:10.309Z 0.0 0.239 12.009 5
16 2009-09-30T20:37:48.609Z -0.02 0.256 12.861 4
16 2009-09-30T20:44:19.359Z 0.14 0.251 12.597 4
16 2009-09-30T20:48:39.759Z 0.03 0.284 14.244 5
16 2009-09-30T20:49:36.159Z -0.07 0.278 13.98 4
16 2009-09-30T20:57:54.609Z 0.01 0.304 15.294 4
16 2009-09-30T20:59:47.759Z 0.27 0.265 13.333 4
16 2009-09-30T21:02:56.209Z 0.28 0.272 13.645 6
等等。
我想将这些文件中的这些行放到一个新文件中。但是有一些条件! 如果两个或多个连续行位于6秒的时间窗口内,则只应将具有最高阈值的行打印到新文件中。
所以,就像这样:
原件:
16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5
16 2009-09-30T20:32:10.309Z 0.0 0.239 12.009 5
:
16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5
请记住,来自不同文件的行可能在6s窗口内有来自其他文件的行,因此输出中的行是来自不同文件的阈值最高的行。
解释行内容的代码在这里:
import glob
from datetime import datetime
path = './*.cat'
files=glob.glob(path)
for file in files:
in_file=open(file, 'r')
out_file = open("times_final", "w")
for line in in_file.readlines():
split_line = line.strip().split(' ')
template_number = split_line[0]
t = datetime.strptime(split_line[1], '%Y-%m-%dT%H:%M:%S.%fZ')
mag = split_line[2]
num = split_line[3]
threshold = float(split_line[4])
no_detections = split_line[5]
in_file.close()
out_file.close()
非常感谢提示,指南......
答案 0 :(得分:0)
您在评论中说过,您知道如何将多个文件合并为1个按t
排序,并且6秒窗口从第一行开始,并且基于实际数据。
因此,您需要一种方法来记住每个窗口的最大阈值,并且只有在确定处理了窗口中的所有行之后才能写入。样本实施:
from datetime import datetime, timedelta
from csv import DictReader, DictWriter
fieldnames=("template_number", "t", "mag","num", "threshold", "no_detections")
with open('master_data') as f_in, open("times_final", "w") as f_out:
reader = DictReader(f_in, delimiter=" ", fieldnames=fieldnames)
writer = DictWriter(f_out, delimiter=" ", fieldnames=fieldnames,
lineterminator="\n")
window_start = datetime(1900, 1, 1)
window_timedelta = timedelta(seconds=6)
window_max = 0
window_row = None
for row in reader:
try:
t = datetime.strptime(row["t"], "%Y-%m-%dT%H:%M:%S.%fZ")
threshold = float(row["threshold"])
except ValueError:
# replace by actual error handling
print("Problem with: {}".format(row))
# switch to new window after 6 seconds
if t - window_start > window_timedelta:
# write out previous window before switching
if window_row:
writer.writerow(window_row)
window_start = t
window_max = threshold
window_row = row
# remember max threshold inside a single window
elif threshold > window_max:
window_max = threshold
window_row = row
# don't forget the last window
if window_row:
writer.writerow(window_row)