Question

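I have a .txt file full of data that I want to filter. Some lines are duplicates: the only difference is that the timestamp is exactly 2 hours later. The later version of each such duplicate should be dropped (for example, the first line in the sample below). All other lines should be kept and written to a new .txt file.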

1_3_IMM 2016-07-19 16:11:56 00:00:40    2   Sensor Check   #   should go
1_3_IMM 2016-07-19 14:12:40 00:00:33    2   Sensor Check   #   should stay
1_3_IMM 2016-07-19 14:11:56 00:00:40    2   Sensor Check   #   should stay
1_3_IMM 2016-07-19 16:12:40 00:00:33    2   Sensor Check   #   should go
1_4_IMM 2016-07-19 17:23:25 00:00:20    2   Sensor Check   #   should stay
1_4_IMM 2016-07-19 19:23:25 00:00:20    2   Sensor Check   #   should go
1_4_IMM 2016-07-19 19:15:24 00:02:21    2   Sensor Check   #   should stay
1_4_IMM 2016-07-19 19:25:13 00:02:13    2   Sensor Check   #   should stay
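As a minimal sketch of the rule stated above (same machine, same remaining fields, timestamp exactly two hours later), the pairwise check can be written with the standard `datetime` module. The helper names `split_record` and `is_later_duplicate` are hypothetical. The sketch assumes whitespace-separated fields, the sample's `YYYY-MM-DD HH:MM:SS` format, and lines without the trailing `# should ...` annotations.

```python
from datetime import datetime, timedelta

def split_record(line):
    """Split a log line into (machine, timestamp, remaining fields)."""
    machine, date, time, *rest = line.split()
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return machine, stamp, rest

def is_later_duplicate(candidate, original):
    """True if `candidate` repeats `original` exactly two hours later."""
    m1, t1, r1 = split_record(candidate)
    m2, t2, r2 = split_record(original)
    # Subtracting datetimes (not bare times) also handles a midnight rollover
    return m1 == m2 and r1 == r2 and t1 - t2 == timedelta(hours=2)

a = "1_3_IMM 2016-07-19 16:11:56 00:00:40    2   Sensor Check"
b = "1_3_IMM 2016-07-19 14:11:56 00:00:40    2   Sensor Check"
print(is_later_duplicate(a, b))  # True
print(is_later_duplicate(b, a))  # False
```

Comparing the remaining fields as a token list means differing runs of spaces or tabs between columns do not prevent a match.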

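I tried to write some Python code to do this, but I'm afraid it has been too long since I last programmed, and I couldn't get it to work. Could anyone give me some feedback on this? See my code below.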

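from datetime import datetime, timedelta

def parse(line):
    # Split "1_3_IMM 2016-07-19 16:11:56 00:00:40 2 Sensor Check" into
    # machine number, full timestamp and the remaining fields
    machine, date, time, *rest = line.split()
    stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    return machine, stamp, tuple(rest)

def filter_file():
    # Read every line first: the earlier original may appear after its duplicate
    with open("input.txt", "r") as logger_input:
        lines = [line.rstrip("\n") for line in logger_input if line.strip()]
    records = {parse(line) for line in lines}

    with open("output.txt", "w") as output:
        for line in lines:
            machine, stamp, rest = parse(line)
            # DON'T copy the current line to the output file if another line has:
            #   1. the same machine number (e.g. 1_3_IMM),
            #   2. the same remaining fields, and
            #   3. a date/time stamp exactly 02:00:00 earlier
            if (machine, stamp - timedelta(hours=2), rest) in records:
                continue
            output.write(line + "\n")  # newline was stripped when reading

if __name__ == "__main__":
    filter_file()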

Thanks!

Find duplicate lines in a text file (Python)

0 answers: