I have a txt file in the following form:
Event A 15MAR18 103000 15MAR18 103758
Event A 16MAR18 120518 16MAR18 121308
Event B 16MAR18 121203 16MAR18 124543
Event B 16MAR18 134443 16MAR18 141823
Event B 16MAR18 151733 16MAR18 155103
Event B 17MAR18 165013 17MAR18 172343
Event B 17MAR18 182253 17MAR18 185623
Event B 17MAR18 195533 17MAR18 202903
Event A 17MAR18 203738 17MAR18 204028
Event B 18MAR18 212813 18MAR18 220143
Event A 18MAR18 221058 18MAR18 222338
Event B 18MAR18 230103 18MAR18 233423
Event A 19MAR18 234728 19MAR18 000048
Event B 20MAR18 003343 20MAR18 010703
Event A 20MAR18 012508 20MAR18 013418
Event B 21MAR18 020623 21MAR18 023943
Event B 21MAR18 033903 21MAR18 041223
Event B 21MAR18 051143 21MAR18 054503
Event B 21MAR18 064433 21MAR18 071743
Event A 22MAR18 074058 22MAR18 075008
Event B 22MAR18 081713 22MAR18 085023
Event A 23MAR18 091438 23MAR18 092738
Event B 23MAR18 094953 23MAR18 102303
Event A 23MAR18 105148 23MAR18 110418
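I am trying to split the file into segments of 24 hours, based on the time values in the middle column (the start date and time).
E.g. the first row, 15MAR18 103000, would be a separate list of its own.
The second row would then start a different list, because the timedelta is > 24 hours. That list would group the rows from 16MAR18 120518 to 16MAR18 151733 together. And so on...
My attempt is below: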
from datetime import datetime, timedelta

List_Segment_1 = []
with open('file.txt', 'r') as input_file:
    input_file = input_file.readlines()

startTime = datetime.strptime(input_file[0][15:29], '%d%b%y %H%M%S')
endTime = startTime + timedelta(hours=24)
for line in input_file:
    dates = datetime.strptime(line[15:29], '%d%b%y %H%M%S')
    if startTime < dates < endTime:
        List_Segment_1.append(line)
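I don't know how to do this for the remaining rows... it only handles the first segment... The real txt file has hundreds of rows... Maybe there is a better way to sort the data, for example with a dictionary?
Any help is appreciated. Ideally without pandas or any other external libraries.
The output should look like this: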
Event A 15MAR18 103000 15MAR18 103758 Segment1
Event A 16MAR18 120518 16MAR18 121308 Segment2
Event B 16MAR18 121203 16MAR18 124543 Segment2
Event B 16MAR18 134443 16MAR18 141823 Segment2
Event B 16MAR18 151733 16MAR18 155103 Segment2
Event B 17MAR18 165013 17MAR18 172343 Segment3
Event B 17MAR18 182253 17MAR18 185623 Segment3
Event B 17MAR18 195533 17MAR18 202903 Segment3
Event A 17MAR18 203738 17MAR18 204028 Segment3
Event B 18MAR18 212813 18MAR18 220143 Segment4
Event A 18MAR18 221058 18MAR18 222338 Segment4
Event B 18MAR18 230103 18MAR18 233423 Segment4
Event A 19MAR18 234728 19MAR18 000048 Segment5
Event B 20MAR18 003343 20MAR18 010703 Segment5
Event A 20MAR18 012508 20MAR18 013418 Segment5
Event B 21MAR18 020623 21MAR18 023943 Segment6
Event B 21MAR18 033903 21MAR18 041223 Segment6
Event B 21MAR18 051143 21MAR18 054503 Segment6
Event B 21MAR18 064433 21MAR18 071743 Segment6
Event A 22MAR18 074058 22MAR18 075008 Segment6
Event B 22MAR18 081713 22MAR18 085023 Segment7
Event A 23MAR18 091438 23MAR18 092738 Segment8
Event B 23MAR18 094953 23MAR18 102303 Segment8
Event A 23MAR18 105148 23MAR18 110418 Segment8
Answer 0 (score: 4)
Here is a simple implementation for your problem; you should modify it as needed:
from datetime import datetime, timedelta

with open('file.txt', 'r') as input_file:
    lines = input_file.readlines()

# The slice [14:28] picks out the start date and time; adjust it if the
# columns in your file are aligned differently.
base_time = datetime.strptime(lines[0][14:28], '%d%b%y %H%M%S')
end_time = base_time + timedelta(hours=24)
segment = 1
for line in lines:
    date = datetime.strptime(line[14:28], '%d%b%y %H%M%S')
    if not (base_time <= date < end_time):
        # This row falls outside the current 24-hour window: start a new segment.
        segment += 1
        base_time = date
        end_time = date + timedelta(hours=24)
    print(line.strip() + '\tSegment {}'.format(segment))
This snippet outputs:
Event A 15MAR18 103000 15MAR18 103758 Segment 1
Event A 16MAR18 120518 16MAR18 121308 Segment 2
Event B 16MAR18 121203 16MAR18 124543 Segment 2
Event B 16MAR18 134443 16MAR18 141823 Segment 2
Event B 16MAR18 151733 16MAR18 155103 Segment 2
Event B 17MAR18 165013 17MAR18 172343 Segment 3
Event B 17MAR18 182253 17MAR18 185623 Segment 3
Event B 17MAR18 195533 17MAR18 202903 Segment 3
Event A 17MAR18 203738 17MAR18 204028 Segment 3
Event B 18MAR18 212813 18MAR18 220143 Segment 4
Event A 18MAR18 221058 18MAR18 222338 Segment 4
Event B 18MAR18 230103 18MAR18 233423 Segment 4
Event A 19MAR18 234728 19MAR18 000048 Segment 5
Event B 20MAR18 003343 20MAR18 010703 Segment 5
Event A 20MAR18 012508 20MAR18 013418 Segment 5
Event B 21MAR18 020623 21MAR18 023943 Segment 6
Event B 21MAR18 033903 21MAR18 041223 Segment 6
Event B 21MAR18 051143 21MAR18 054503 Segment 6
Event B 21MAR18 064433 21MAR18 071743 Segment 6
Event A 22MAR18 074058 22MAR18 075008 Segment 7
Event B 22MAR18 081713 22MAR18 085023 Segment 7
Event A 23MAR18 091438 23MAR18 092738 Segment 8
Event B 23MAR18 094953 23MAR18 102303 Segment 8
Event A 23MAR18 105148 23MAR18 110418 Segment 8
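Since the question asks about collecting the rows in a dictionary, the same windowing logic could be sketched as a function that returns the groups instead of printing them. The names `parse_start`, `segment_lines` and `window_hours` are made up for this illustration, and locating the start timestamp with `split()` assumes a two-word event name as in the sample; neither is part of the answer above.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


def parse_start(line):
    """Return the start timestamp of a row such as
    'Event A 15MAR18 103000 15MAR18 103758'.

    Assumption: the event name is two whitespace-separated words, so the
    start date and time are fields 2 and 3.
    """
    fields = line.split()
    return datetime.strptime(fields[2] + ' ' + fields[3], '%d%b%y %H%M%S')


def segment_lines(lines, window_hours=24):
    """Group rows into 'Segment N' lists. A new segment starts whenever a
    row falls outside the window opened by the first row of the current one."""
    segments = OrderedDict()
    window = timedelta(hours=window_hours)
    base_time = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start = parse_start(line)
        if base_time is None or not (base_time <= start < base_time + window):
            base_time = start
            key = 'Segment {}'.format(len(segments) + 1)
            segments[key] = []
        segments[key].append(line)
    return segments


sample = [
    'Event A 15MAR18 103000 15MAR18 103758',
    'Event A 16MAR18 120518 16MAR18 121308',
    'Event B 16MAR18 121203 16MAR18 124543',
    'Event B 17MAR18 165013 17MAR18 172343',
]
for name, rows in segment_lines(sample).items():
    print(name, len(rows))
```

To run it on a real file, pass the open file object (or the result of `readlines()`) to `segment_lines` in place of `sample`.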
Answer 1 (score: 0)
Assuming the days are written as 01-31 (not 1-31), I wrote a solution based on string slicing. You could also use datetime for this logic.
from pprint import pprint

with open('file.txt', 'r') as input_file:
    input_file = input_file.readlines()

previous_day = int(input_file[0][14:16])  # day of the first line (15 in the sample)
segments = []
day_data = []
for line in input_file:
    current_day = int(line[14:16])
    if current_day != previous_day:
        # new day: store the finished list before starting a new one
        segments.append(day_data)
        day_data = []
        previous_day = current_day
    day_data.append(line)
segments.append(day_data)  # do not forget the last day
pprint(segments)
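As the answer mentions, the same per-day grouping could also be done with datetime rather than string slicing. Below is a minimal sketch, assuming the start date is the third whitespace-separated field (a two-word event name, as in the sample); the helper names `start_day` and `group_by_day` are made up for illustration. Note that this groups by calendar day, which is not the same thing as a rolling 24-hour window.

```python
from datetime import datetime
from itertools import groupby


def start_day(line):
    """Return the calendar date of a row's start timestamp.

    Assumption: the start date is the third whitespace-separated field.
    """
    return datetime.strptime(line.split()[2], '%d%b%y').date()


def group_by_day(lines):
    """Group consecutive rows that share the same start date."""
    rows = [line.strip() for line in lines if line.strip()]
    return [list(group) for _, group in groupby(rows, key=start_day)]


sample = [
    'Event A 15MAR18 103000 15MAR18 103758',
    'Event A 16MAR18 120518 16MAR18 121308',
    'Event B 16MAR18 121203 16MAR18 124543',
    'Event B 17MAR18 165013 17MAR18 172343',
]
for day_rows in group_by_day(sample):
    print(day_rows)
```

Because whole dates are compared rather than day numbers, a change of month does not need special handling.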
Answer 2 (score: 0)
Rather old-fashioned code, but it works. The output is a dictionary.
import datetime

mydict = {}
l_num = 1
with open('file.txt', 'r') as input_file:
    input_file = input_file.readlines()

def start_time(line):
    # Start date and time are the third and fourth whitespace-separated
    # fields (this assumes a two-word event name such as "Event A").
    return datetime.datetime.strptime(' '.join(line.split()[2:4]), '%d%b%y %H%M%S')

mydict['Segment ' + str(l_num)] = [input_file[0]]
for i in range(1, len(input_file)):
    prevDate = start_time(input_file[i-1])
    Date = start_time(input_file[i])
    # Start a new segment when this row begins more than 24 hours after the previous one.
    if Date - prevDate > datetime.timedelta(hours=24):
        l_num += 1
        mydict['Segment ' + str(l_num)] = []
    mydict['Segment ' + str(l_num)].append(input_file[i])
Just noticed: I am using Python 2. I am not sure whether it will run correctly in Python 3, but I hope it does.
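One thing worth checking before choosing between the answers: this one starts a new segment when the gap to the previous row exceeds 24 hours, whereas answer 0 measures 24 hours from the first row of the current segment. On the sample data the two rules happen to produce the same segments, but in general they can differ. The sketch below uses made-up timestamps and hypothetical function names (`by_window`, `by_gap`) to show the difference.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)


def by_window(times):
    """Segment numbers when 24 hours are measured from the first
    timestamp of the current segment."""
    labels, segment, base = [], 0, None
    for t in times:
        if base is None or not (base <= t < base + DAY):
            segment += 1
            base = t
        labels.append(segment)
    return labels


def by_gap(times):
    """Segment numbers when a new segment starts only if the gap to the
    previous timestamp exceeds 24 hours."""
    labels, segment, prev = [], 0, None
    for t in times:
        if prev is None or t - prev > DAY:
            segment += 1
        prev = t
        labels.append(segment)
    return labels


# Made-up timestamps: each one is less than 24 hours after the previous
# one, but the last is 34 hours after the first.
times = [
    datetime(2018, 3, 15, 0, 0),
    datetime(2018, 3, 15, 20, 0),
    datetime(2018, 3, 16, 10, 0),
]
print(by_window(times))  # [1, 1, 2]
print(by_gap(times))     # [1, 1, 1]
```

Which rule is appropriate depends on whether a segment should be capped at 24 hours in total or merely broken at long pauses.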