Question

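I'm new to Python and scripting, so I'd really appreciate some guidance on writing a Python script. So, to the point:

I have a large number of files in a directory. Some of the files are empty; others contain lines like these: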

16 2009-09-30T20:07:59.659Z 0.05 0.27 13.559 6
16 2009-09-30T20:08:49.409Z 0.22 0.312 15.691 7
16 2009-09-30T20:12:17.409Z -0.09 0.235 11.826 4
16 2009-09-30T20:12:51.159Z 0.15 0.249 12.513 6
16 2009-09-30T20:15:57.209Z 0.16 0.234 11.776 4
16 2009-09-30T20:21:17.109Z 0.38 0.303 15.201 6
16 2009-09-30T20:23:47.959Z 0.07 0.259 13.008 5
16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5
16 2009-09-30T20:32:10.309Z 0.0 0.239 12.009 5
16 2009-09-30T20:37:48.609Z -0.02 0.256 12.861 4
16 2009-09-30T20:44:19.359Z 0.14 0.251 12.597 4
16 2009-09-30T20:48:39.759Z 0.03 0.284 14.244 5
16 2009-09-30T20:49:36.159Z -0.07 0.278 13.98 4
16 2009-09-30T20:57:54.609Z 0.01 0.304 15.294 4
16 2009-09-30T20:59:47.759Z 0.27 0.265 13.333 4
16 2009-09-30T21:02:56.209Z 0.28 0.272 13.645 6

and so on.

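I want to collect the lines from all of these files into a new file. But there are some conditions! If two or more consecutive lines fall within a 6-second time window, only the line with the highest threshold should be written to the new file.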

So, for example:

Original:
16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5
16 2009-09-30T20:32:10.309Z 0.0 0.239 12.009 5

In the output file:
16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5
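To make the rule concrete, here is a minimal sketch that applies it to the two example lines. It assumes the threshold is the fifth column (`split_line[4]`, as in the parsing code further down) and treats two lines at most 6 s apart as belonging to the same window; `pick_higher` and `threshold` are hypothetical helper names, not part of any library.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # same format string as in the question's code
WINDOW_SECONDS = 6  # window length from the question


def threshold(line):
    """Threshold is assumed to be the 5th whitespace-separated column."""
    return float(line.split()[4])


def pick_higher(line_a, line_b):
    """Hypothetical helper: if the two lines are at most WINDOW_SECONDS apart,
    return only the one with the higher threshold; otherwise return both."""
    t_a = datetime.strptime(line_a.split()[1], TIME_FORMAT)
    t_b = datetime.strptime(line_b.split()[1], TIME_FORMAT)
    if abs((t_b - t_a).total_seconds()) <= WINDOW_SECONDS:
        return [max(line_a, line_b, key=threshold)]
    return [line_a, line_b]


a = "16 2009-09-30T20:32:10.109Z 0.0 0.283 14.195 5"
b = "16 2009-09-30T20:32:10.309Z 0.0 0.239 12.009 5"
print(pick_higher(a, b))  # only `a` survives: 0.2 s apart and 14.195 > 12.009
```

Comparing just two lines only covers this example; with three or more lines in a row, the window bookkeeping in the answer below is what decides which lines compete with each other.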

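Keep in mind that a line from one file may fall within the same 6 s window as lines from other files, so the line written to the output must be the one with the highest threshold among all of them, regardless of which file it came from.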

Here is the code that parses each line:

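import glob
from datetime import datetime

path = './*.cat'
files = glob.glob(path)

# open the output file once, before the loop; opening it with "w" inside
# the loop would truncate it again for every input file
with open("times_final", "w") as out_file:
    for filename in files:
        # "with" closes every input file, not only the last one
        with open(filename, 'r') as in_file:
            for line in in_file:
                split_line = line.split()  # split on any run of whitespace
                if not split_line:
                    continue  # skip blank lines
                template_number = split_line[0]
                t = datetime.strptime(split_line[1], '%Y-%m-%dT%H:%M:%S.%fZ')
                mag = split_line[2]
                num = split_line[3]
                threshold = float(split_line[4])
                no_detections = split_line[5]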
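A slightly more reusable form of the parsing loop: a sketch that turns one line into a named tuple and returns `None` for blank lines, so empty files and stray empty lines drop out naturally. `Detection` and `parse_line` are hypothetical names; the field names are copied from the variables in the loop, and the raw `line` is kept so the winning row can later be written out unchanged.

```python
from datetime import datetime
from typing import NamedTuple, Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


class Detection(NamedTuple):
    # field names follow the variables in the question's parsing loop
    template_number: str
    t: datetime
    mag: str
    num: str
    threshold: float
    no_detections: str
    line: str  # original text, so the line can be written out as-is


def parse_line(line: str) -> Optional[Detection]:
    """Parse one whitespace-separated line; return None for blank lines."""
    fields = line.split()
    if not fields:
        return None
    return Detection(
        template_number=fields[0],
        t=datetime.strptime(fields[1], TIME_FORMAT),
        mag=fields[2],
        num=fields[3],
        threshold=float(fields[4]),
        no_detections=fields[5],
        line=line.strip(),
    )
```

Inside the loop this would replace the six assignments with `det = parse_line(line)` followed by `if det is None: continue`.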

Any hints or guidance would be much appreciated...

Answer 1

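You said in the comments that you already know how to merge the files into one, sorted by t, and that each 6-second window starts at the first line of the window and is defined by the timestamps in the actual data.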
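For completeness, here is one way to do that merge, as a sketch under the assumption that all files fit in memory. `merge_sorted`, `merge_files` and `timestamp` are hypothetical helpers; `master_data` is the merged file name the code below reads from, and runs of whitespace are collapsed to single spaces so that a reader using `delimiter=" "` can parse every line.

```python
import glob
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def timestamp(line):
    """Parse the timestamp in the second column of a line."""
    return datetime.strptime(line.split()[1], TIME_FORMAT)


def merge_sorted(sources):
    """Merge several iterables of lines into one list sorted by timestamp.
    Blank lines are dropped and whitespace runs collapsed to one space."""
    lines = [" ".join(line.split())
             for src in sources for line in src if line.strip()]
    return sorted(lines, key=timestamp)


def merge_files(pattern="./*.cat", out_path="master_data"):
    """Read every file matching `pattern` and write the merged lines."""
    sources = []
    for path in glob.glob(pattern):
        with open(path) as f:
            sources.append(f.readlines())  # empty files give an empty list
    with open(out_path, "w") as out:
        for line in merge_sorted(sources):
            out.write(line + "\n")
```

If every `.cat` file is already sorted by time and the files are large, `heapq.merge(*sources, key=timestamp)` (Python 3.5+) can stream the merge instead of sorting everything in memory.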

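So you need a way to remember the maximum threshold of each window, and to write a line out only once you are sure every line in that window has been processed. A sample implementation: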

from datetime import datetime, timedelta
from csv import DictReader, DictWriter

fieldnames=("template_number", "t", "mag","num", "threshold", "no_detections")
with open('master_data') as f_in, open("times_final", "w") as f_out:
    reader = DictReader(f_in, delimiter=" ", fieldnames=fieldnames)
    writer = DictWriter(f_out, delimiter=" ", fieldnames=fieldnames,
                        lineterminator="\n")
    window_start = datetime(1900, 1, 1)
    window_timedelta = timedelta(seconds=6)
    window_max = 0
    window_row = None
    for row in reader:
        try:
            t = datetime.strptime(row["t"], "%Y-%m-%dT%H:%M:%S.%fZ")
            threshold = float(row["threshold"])
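        except (TypeError, ValueError):
            # replace by actual error handling; a short row yields None
            # values (TypeError), a malformed value yields ValueError
            print("Problem with: {}".format(row))
            # skip the row, otherwise t/threshold would be undefined or stale
            continue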
        # switch to new window after 6 seconds
        if t - window_start > window_timedelta:
            # write out previous window before switching
            if window_row:
                writer.writerow(window_row)
            window_start = t
            window_max = threshold
            window_row = row
        # remember max threshold inside a single window
        elif threshold > window_max:
            window_max = threshold
            window_row = row
    # don't forget the last window
    if window_row:
        writer.writerow(window_row)
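The same windowing rule can also be written as a pure function over lines, which makes it easy to check against the sample data without any files. This is a sketch assuming the input is already sorted by timestamp and the threshold is the fifth column; `best_per_window` is a hypothetical name. As in the code above, a window is anchored at its first line, so a line 8 s after the window start opens a new window even if it is only 4 s after the previous line.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def best_per_window(lines, seconds=6):
    """Keep only the highest-threshold line of each window.
    A window starts at the first line not covered by the previous window
    and lasts `seconds`. `lines` must already be sorted by timestamp."""
    window = timedelta(seconds=seconds)
    result = []
    window_start = None
    best_line, best_threshold = None, None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        t = datetime.strptime(fields[1], TIME_FORMAT)
        threshold = float(fields[4])
        if window_start is None or t - window_start > window:
            if best_line is not None:
                result.append(best_line)  # previous window is complete
            window_start, best_line, best_threshold = t, line, threshold
        elif threshold > best_threshold:
            best_line, best_threshold = line, threshold
    if best_line is not None:
        result.append(best_line)  # don't forget the last window
    return result
```

Inside `with open("master_data") as f_in:`, `best_per_window(f_in)` works directly because a file object yields lines; the returned lines keep their trailing newline, so they can go straight to `f_out.writelines(...)`. If "consecutive lines within 6 s" is instead meant as a chain (each line compared with the previous line rather than the window start), compare `t` with the previous line's timestamp; that groups lines differently.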

How can I keep only the line with the highest value when several lines fall within the same time window?
