Splitting txt file data into 24-hour blocks based on timestamps

Time: 2018-03-23 14:33:06

Tags: python python-3.x datetime

I have a txt file of the following form:

Event A       15MAR18 103000       15MAR18 103758    
Event A       16MAR18 120518       16MAR18 121308  
Event B       16MAR18 121203       16MAR18 124543   
Event B       16MAR18 134443       16MAR18 141823 
Event B       16MAR18 151733       16MAR18 155103   
Event B       17MAR18 165013       17MAR18 172343       
Event B       17MAR18 182253       17MAR18 185623     
Event B       17MAR18 195533       17MAR18 202903 
Event A       17MAR18 203738       17MAR18 204028     
Event B       18MAR18 212813       18MAR18 220143     
Event A       18MAR18 221058       18MAR18 222338      
Event B       18MAR18 230103       18MAR18 233423    
Event A       19MAR18 234728       19MAR18 000048       
Event B       20MAR18 003343       20MAR18 010703   
Event A       20MAR18 012508       20MAR18 013418      
Event B       21MAR18 020623       21MAR18 023943       
Event B       21MAR18 033903       21MAR18 041223      
Event B       21MAR18 051143       21MAR18 054503     
Event B       21MAR18 064433       21MAR18 071743     
Event A       22MAR18 074058       22MAR18 075008   
Event B       22MAR18 081713       22MAR18 085023      
Event A       23MAR18 091438       23MAR18 092738     
Event B       23MAR18 094953       23MAR18 102303      
Event A       23MAR18 105148       23MAR18 110418  

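I am trying to split the file into 24-hour blocks based on the time values in the middle column.

E.g. the first line, 15MAR18 103000, would be its own separate list.

The second line would then go into a different list, because the timedelta is > 24 hours. Lines from 16MAR18 120518 to 16MAR18 151733 would be grouped together, and so on...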
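The rule can be checked directly with `datetime.strptime` and `timedelta` from the standard library. A minimal sketch using the first two start times from the sample (the format string `'%d%b%y %H%M%S'` is the one used in the attempt below; `%b` matches the upper-case `MAR` because `strptime` matches month names case-insensitively):

```python
from datetime import datetime, timedelta

# Format of the start/end columns, e.g. '15MAR18 103000'
FMT = '%d%b%y %H%M%S'

first = datetime.strptime('15MAR18 103000', FMT)
second = datetime.strptime('16MAR18 120518', FMT)

gap = second - first
print(gap)                         # 1 day, 1:35:18
print(gap > timedelta(hours=24))   # True, so line 2 starts a new block
```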

My attempt is as follows:

from datetime import datetime, timedelta

List_Segment_1 = []

with open('file.txt', 'r') as input_file:
    lines = input_file.readlines()

# The start timestamp sits in columns 14-27, e.g. '15MAR18 103000'
startTime = datetime.strptime(lines[0][14:28], '%d%b%y %H%M%S')
endTime = startTime + timedelta(hours=24)

for line in lines:
    dates = datetime.strptime(line[14:28], '%d%b%y %H%M%S')

    # <= so that the first line itself is included
    if startTime <= dates < endTime:
        List_Segment_1.append(line)

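I don't know how to do this for the other lines, only for the first segment... The real txt file has hundreds of lines. Is there perhaps a better way to sort the data, e.g. with a dictionary?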
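A dictionary works well for this. A minimal sketch, assuming the lines are sorted by start time and that a new block begins at the first line whose start time is 24 hours or more after the start of the current block; `group_by_24h` is a hypothetical helper name. It splits on whitespace instead of fixed columns, so the exact number of spaces does not matter:

```python
from datetime import datetime, timedelta

FMT = '%d%b%y %H%M%S'       # e.g. '15MAR18 103000'
WINDOW = timedelta(hours=24)

def group_by_24h(lines):
    """Return {segment_number: [lines]}, one 24-hour window per segment."""
    segments = {}
    segment = 0
    window_end = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        # Start timestamp = 4th- and 3rd-last fields, e.g. '15MAR18', '103000'
        start = datetime.strptime(parts[-4] + ' ' + parts[-3], FMT)
        if window_end is None or start >= window_end:
            # This line opens a new window lasting 24 hours from its start
            segment += 1
            window_end = start + WINDOW
        segments.setdefault(segment, []).append(line.rstrip('\n'))
    return segments

sample = [
    'Event A       15MAR18 103000       15MAR18 103758',
    'Event A       16MAR18 120518       16MAR18 121308',
    'Event B       16MAR18 121203       16MAR18 124543',
    'Event B       17MAR18 165013       17MAR18 172343',
]
result = group_by_24h(sample)
print({k: len(v) for k, v in result.items()})  # {1: 1, 2: 2, 3: 1}
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.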

Any help is appreciated. Ideally without pandas or any other external library.

The output should be as follows:

Event A       15MAR18 103000       15MAR18 103758      Segment1
Event A       16MAR18 120518       16MAR18 121308      Segment2 
Event B       16MAR18 121203       16MAR18 124543      Segment2
Event B       16MAR18 134443       16MAR18 141823      Segment2
Event B       16MAR18 151733       16MAR18 155103      Segment2
Event B       17MAR18 165013       17MAR18 172343      Segment3
Event B       17MAR18 182253       17MAR18 185623      Segment3
Event B       17MAR18 195533       17MAR18 202903      Segment3
Event A       17MAR18 203738       17MAR18 204028      Segment3
Event B       18MAR18 212813       18MAR18 220143      Segment4
Event A       18MAR18 221058       18MAR18 222338      Segment4
Event B       18MAR18 230103       18MAR18 233423      Segment4
Event A       19MAR18 234728       19MAR18 000048      Segment5
Event B       20MAR18 003343       20MAR18 010703      Segment5
Event A       20MAR18 012508       20MAR18 013418      Segment5
Event B       21MAR18 020623       21MAR18 023943      Segment6 
Event B       21MAR18 033903       21MAR18 041223      Segment6
Event B       21MAR18 051143       21MAR18 054503      Segment6
Event B       21MAR18 064433       21MAR18 071743      Segment6
Event A       22MAR18 074058       22MAR18 075008      Segment6
Event B       22MAR18 081713       22MAR18 085023      Segment7
Event A       23MAR18 091438       23MAR18 092738      Segment8
Event B       23MAR18 094953       23MAR18 102303      Segment8
Event A       23MAR18 105148       23MAR18 110418      Segment8

3 answers:

Answer 0 (score: 4)

Here is a simple implementation for your problem; modify it as needed:

from datetime import datetime, timedelta

with open('file.txt', 'r') as input_file:
    lines = input_file.readlines()

# The first line opens Segment 1 and its 24-hour window
base_time = datetime.strptime(lines[0][14:28], '%d%b%y %H%M%S')
end_time = base_time + timedelta(hours=24)
segment = 1

for line in lines:
    # Start timestamp of this line, e.g. '15MAR18 103000'
    date = datetime.strptime(line[14:28], '%d%b%y %H%M%S')

    if not (base_time <= date < end_time):
        # Outside the current window: start a new segment at this line
        segment += 1
        base_time = date
        end_time = date + timedelta(hours=24)

    print(line.strip() + '\tSegment {}'.format(segment))

This snippet outputs:

Event A       15MAR18 103000       15MAR18 103758       Segment 1
Event A       16MAR18 120518       16MAR18 121308       Segment 2
Event B       16MAR18 121203       16MAR18 124543       Segment 2
Event B       16MAR18 134443       16MAR18 141823       Segment 2
Event B       16MAR18 151733       16MAR18 155103       Segment 2
Event B       17MAR18 165013       17MAR18 172343       Segment 3
Event B       17MAR18 182253       17MAR18 185623       Segment 3
Event B       17MAR18 195533       17MAR18 202903       Segment 3
Event A       17MAR18 203738       17MAR18 204028       Segment 3
Event B       18MAR18 212813       18MAR18 220143       Segment 4
Event A       18MAR18 221058       18MAR18 222338       Segment 4
Event B       18MAR18 230103       18MAR18 233423       Segment 4
Event A       19MAR18 234728       19MAR18 000048       Segment 5
Event B       20MAR18 003343       20MAR18 010703       Segment 5
Event A       20MAR18 012508       20MAR18 013418       Segment 5
Event B       21MAR18 020623       21MAR18 023943       Segment 6
Event B       21MAR18 033903       21MAR18 041223       Segment 6
Event B       21MAR18 051143       21MAR18 054503       Segment 6
Event B       21MAR18 064433       21MAR18 071743       Segment 6
Event A       22MAR18 074058       22MAR18 075008       Segment 7
Event B       22MAR18 081713       22MAR18 085023       Segment 7
Event A       23MAR18 091438       23MAR18 092738       Segment 8
Event B       23MAR18 094953       23MAR18 102303       Segment 8
Event A       23MAR18 105148       23MAR18 110418       Segment 8
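To produce the labelled file the question asks for instead of printing, the same window logic can write to an output file. A minimal sketch that keeps this answer's fixed-column slice `[14:28]`, so it assumes every line has exactly the column layout of the sample; `write_segmented` is a hypothetical name, and `io.StringIO` stands in for real files in the demo:

```python
import io
from datetime import datetime, timedelta

FMT = '%d%b%y %H%M%S'

def write_segmented(src, dst):
    """Copy lines from src to dst, appending a tab and 'Segment N' to each."""
    base_time = end_time = None
    segment = 0
    for line in src:
        if not line.strip():
            continue  # ignore blank lines, e.g. at the end of the file
        date = datetime.strptime(line[14:28], FMT)  # start-time column
        if end_time is None or not (base_time <= date < end_time):
            # Outside the current window: open a new 24-hour segment
            segment += 1
            base_time = date
            end_time = date + timedelta(hours=24)
        dst.write('{}\tSegment {}\n'.format(line.rstrip(), segment))

# Demo with in-memory files; with real files pass open('file.txt') and
# open('segmented.txt', 'w') instead.
src = io.StringIO(
    'Event A       15MAR18 103000       15MAR18 103758\n'
    'Event A       16MAR18 120518       16MAR18 121308\n'
    'Event B       16MAR18 121203       16MAR18 124543\n'
    'Event B       17MAR18 165013       17MAR18 172343\n'
)
dst = io.StringIO()
write_segmented(src, dst)
print(dst.getvalue())
```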

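Answer 1 (score: 0)

Assuming the days are written as 01-31 (not 1-31), I wrote a solution based on string slicing. It groups the lines by calendar day. You could also use datetime for this logic.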

from pprint import pprint

with open('file.txt', 'r') as input_file:
    lines = input_file.readlines()

previous_day = int(lines[0][14:16])  # day of the first line, e.g. 15
segments = []
day_data = []
for line in lines:
    current_day = int(line[14:16])
    if current_day != previous_day:
        # New day: store the finished list before starting a new one
        segments.append(day_data)
        day_data = []
        previous_day = current_day
    day_data.append(line)

segments.append(day_data)  # store the last day's list as well
pprint(segments)
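As the answer notes, the calendar-day logic can use `datetime` instead of two-digit day numbers, which also removes the dependence on fixed columns. A minimal sketch using `itertools.groupby`, assuming the lines are already sorted by start time (`groupby` only merges consecutive lines); `start_day` and `group_by_day` are hypothetical names. Note that grouping by calendar day is not the same as the 24-hour windows in the question: 19MAR18 234728 and 20MAR18 003343 end up in different groups here, but in the same segment in the expected output.

```python
from datetime import datetime
from itertools import groupby
from pprint import pprint

FMT = '%d%b%y %H%M%S'

def start_day(line):
    """Calendar date of the line's start timestamp."""
    parts = line.split()
    return datetime.strptime(parts[-4] + ' ' + parts[-3], FMT).date()

def group_by_day(lines):
    """List of lists, one per run of consecutive lines on the same day."""
    cleaned = [ln.rstrip('\n') for ln in lines if ln.strip()]
    return [list(group) for _, group in groupby(cleaned, key=start_day)]

sample = [
    'Event A       19MAR18 234728       19MAR18 000048',
    'Event B       20MAR18 003343       20MAR18 010703',
    'Event A       20MAR18 012508       20MAR18 013418',
]
pprint(group_by_day(sample))
```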

Answer 2 (score: 0)

Rather old-fashioned code, but it works. The output is a dictionary.

import datetime

mydict = {}
l_num = 1
with open('file.txt', 'r') as input_file:
    input_file = input_file.readlines()


for i in range(len(input_file)):
    if i == 0:
        # The first line always opens Segment 1
        mydict['Segment ' + str(l_num)] = [input_file[i]]
    else:
        # Columns are separated by 7 spaces; field 1 is the start timestamp
        prevDate = datetime.datetime.strptime(input_file[i-1].split('       ')[1], '%d%b%y %H%M%S')
        Date = datetime.datetime.strptime(input_file[i].split('       ')[1], '%d%b%y %H%M%S')
        # New segment when the gap to the previous line exceeds 24 hours
        if Date - prevDate > datetime.timedelta(hours = 24):
            l_num += 1
            mydict['Segment ' + str(l_num)] = []
        mydict['Segment ' + str(l_num)].append(input_file[i])

Just noticed: I'm using Python 2. I'm not sure whether it runs correctly in Python 3, but I hope it does.
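This answer compares each line with the previous line, so it starts a new segment only when the gap between consecutive start times exceeds 24 hours. The first answer instead measures 24 hours from the first line of the current segment. Both give the same segments for the sample file, but they can differ on other data. A minimal sketch illustrating the difference with hypothetical start times 20 hours apart; `by_gap` and `by_window` are illustrative names, not from either answer:

```python
from datetime import datetime, timedelta

FMT = '%d%b%y %H%M%S'
DAY = timedelta(hours=24)

def by_gap(times):
    """New segment when the gap to the previous time exceeds 24 hours."""
    labels, seg = [], 1
    for i, t in enumerate(times):
        if i > 0 and t - times[i - 1] > DAY:
            seg += 1
        labels.append(seg)
    return labels

def by_window(times):
    """New segment when a time falls outside [start, start + 24 hours)."""
    labels, seg, start = [], 0, None
    for t in times:
        if start is None or not (start <= t < start + DAY):
            seg += 1
            start = t
        labels.append(seg)
    return labels

# Hypothetical start times: no single gap exceeds 24 hours,
# but the third time is 40 hours after the first
raw = ['01APR18 000000', '01APR18 200000', '02APR18 160000']
times = [datetime.strptime(s, FMT) for s in raw]
print(by_gap(times))     # [1, 1, 1]
print(by_window(times))  # [1, 1, 2]
```

Which rule is right depends on whether "24 hours" means a fixed window from the segment start (as the question's description suggests) or a maximum pause between events.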