I have a text file containing thousands of lines that follow a pattern like the one below.
year date hour:minute:seconds data4 data5 data6 data1:data2:data3:command data_1 value1 started
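I want to read through the text file, find the places where value1 has changed to any other value, extract the year, date, and time elements from those lines, and compute the time difference.
For example, the next matching line with the keywords "command", "data_1", and "started" is this one: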
year_2 date_2 hour_2:minute_2:seconds_2 data4 data5 data6 data1:data2:data3:command data_1 value2 started.
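I want to find the time difference, in minutes or hours. Any guidance on how to implement this would be very helpful. Then I want to look for another data item in the same way, such as: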
year date hour:minute:seconds data4 data5 data6 data1:data2:data3:command data_2 value2 started
and then check for the next similar match:
year date hour:minute:seconds data4 data5 data6 data1:data2:data3:command data_2 value4 started and so on...
The actual values (value1, value2, value3, ...) do not matter here; I only need to know whether the value has changed from the previous one.
command_active = {}
previous_command = ""
seen = "false"
with open("common_files/total_logs" ,"r") as total_logs:
    lines = f.readlines()
    for i in range(0, len(lines)):
        line = lines[i]
        if ":Command" and "started." in line:
            month = line.split()[0]
            day = line.split()[1]
            time = line.split()[2]
            current_command = re.findall(r"Command (.*?) started",data)
            current_command = str(commands_executed).strip("[""]")
            current_command = str(commands_executed).replace("',","\n")
            current_command = str(commands_executed).replace("'","")
            only_command = current_command.split()[0]
            next_line = line[i + 1]
            for i in range(next_line, len(lines)):
                if ":Command" and "started." and only_command in lines[next_line]:
                    month_1 = line.split()[0]
                    day_1 = line.split()[1]
                    time_1 = line.split()[2]
This is as far as I have managed to get.
Required output:
data_1 value1 : 25 minutes
data_2 value2 : 10 minutes and so on.....
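Answer 0 (score: 0)
This is exactly the kind of task Python's generators were made for.
First, we'll make a generator that grabs every line containing the content you are interested in: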
def get_interesting_line(file,*searches):
    with open(file,'r') as f:
        for line in f:
            if all(search in line for search in searches):
                yield line
# Use it like this:
interesting_lines = get_interesting_line(myfile,':command','started.')
next(interesting_lines)
# Or like this:
for line in get_interesting_line(myfile,':command','started.'):
    print(line)
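Then we'll create another generator that takes any other generator/iterator and yields tuples made up of an item and the next item (or, put another way, the previous item and the current item):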
def get_item_pair(iterable):
    # iter() lets this work with lists as well as generators
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: there are no pairs to yield
    for current in iterator:
        yield previous, current
        previous = current
# Use it like this:
line_pairs = get_item_pair(get_interesting_line(myfile,':command','started.'))
next(line_pairs)
# Or like this:
for pair in get_item_pair(get_interesting_line(myfile,':command','started.')):
    print(pair)
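Now we can put them together and create another generator that, following your criteria, spits out 4-tuples made up of the data, the value, the previous time, and the current time (we'll use the datetime module to store the times as datetime objects).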
from datetime import datetime
def get_line_datetime(line):
    #the first three fields are the year, the date and hour:minute:seconds
    year, date, time = line.split()[:3]
    #assuming date is MM/DD format:
    month, day = date.split('/')
    #now hours, minutes, seconds:
    hour, minute, second = time.split(':')
    #datetime() needs integers, not strings
    return datetime(int(year), int(month), int(day), int(hour), int(minute), int(second))
def get_command_file_delta_t_info(file):
    c_lines = get_interesting_line(file,':command','started.')
    line_pairs = get_item_pair(c_lines)
    for line1, line2 in line_pairs:
        #the text between ':command ' and ' started' is "<data> <value>"
        command1 = line1.split(':command ')[1].split(' started')[0]
        command2 = line2.split(':command ')[1].split(' started')[0]
        if command1 != command2:
            #value changed
            #now create a couple of date time objects for initial and final times
            dt_i = get_line_datetime(line1)
            dt_f = get_line_datetime(line2)
            #separate the data name from its value
            data, value2 = command2.split(' ', 1)
            #finally yield a tuple containing the data you want
            yield (data, value2, dt_i, dt_f)
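As an alternative to splitting each field by hand, `datetime.strptime` can parse the first three fields in one call. This is only a sketch: the format string assumes a four-digit year and an MM/DD date, which the question does not confirm, and both `parse_timestamp` and the sample line are made up for illustration.

```python
from datetime import datetime

def parse_timestamp(line):
    # Hypothetical helper: join the first three whitespace-separated
    # fields and parse them, ASSUMING the layout "YYYY MM/DD HH:MM:SS".
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, "%Y %m/%d %H:%M:%S")

# Made-up sample line in the shape described in the question
sample = "2019 03/14 10:05:00 d4 d5 d6 d1:d2:d3:command data_1 value1 started."
t0 = parse_timestamp(sample)
t1 = parse_timestamp(sample.replace("10:05:00", "10:30:00"))
minutes = (t1 - t0).total_seconds() / 60
print(minutes)  # 25.0
```

If your log uses a different date layout, only the format string should need to change.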
This generator produces items in the following format:
#create the generator
my_gen = get_command_file_delta_t_info(myfile)
#Get first generated item:
next(my_gen)
#produces:
# (data1, value1, dt_i1, dt_f1)
#Now we can get the second item, 3rd, etc:
next(my_gen)
# (data2, value2, dt_i2, dt_f2)
#get all the remaining items and do stuff with them:
for item in my_gen:
    do_stuff(item)
# note that the first item in the for statement is actually the 3rd
# item that has been generated. the first two were generated in the
# lines above. my_gen does not "start over".
As noted above, a generator is an iterable object: whenever it is iterated (for example with next() or in a for statement), it simply advances to its next item.
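Here is a tiny sketch of that behaviour with a throwaway generator (`count_to` is a made-up name for this example): items consumed with next() are gone, so a later loop picks up where they left off.

```python
def count_to(n):
    # Hypothetical generator that yields 1, 2, ..., n
    for i in range(1, n + 1):
        yield i

gen = count_to(5)
first = next(gen)    # consumes 1
second = next(gen)   # consumes 2
rest = list(gen)     # only the remaining items; the generator does not restart
print(first, second, rest)
```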
Now we can do whatever we like with the file data, including writing a function that prints it in the format we want:
def print_command_file_delta_t_info(command_file_delta_t_info):
    for data, value, dt_i, dt_f in command_file_delta_t_info:
        #get the time difference
        #time_diff is a datetime.timedelta object
        time_diff = dt_f - dt_i
        #now print the information you want:
        #the :g format drops a trailing ".0", so 25.0 prints as 25
        print("{data} {value} : {minutes:g} minutes".format(data = data, value = value, minutes = time_diff.total_seconds()/60))
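Note that the pairing approach compares each matching line with the very next matching line, whatever data name it carries. If lines for data_1 and data_2 are interleaved in your file, one possible variation is to remember the last value and its time per data name in a dictionary. The following is a minimal sketch rather than a drop-in solution: it assumes the "YYYY MM/DD HH:MM:SS" layout, the sample lines are made up, and `changes_per_data` is a hypothetical name.

```python
from datetime import datetime

def changes_per_data(lines):
    """Yield (data, old_value, minutes) each time a data name's value changes."""
    last_seen = {}  # data name -> (value, time that value was first seen)
    for line in lines:
        if ":command " not in line or "started" not in line:
            continue
        fields = line.split()
        # ASSUMPTION: fields 0-2 look like "2019 03/14 10:05:00"
        when = datetime.strptime(" ".join(fields[:3]), "%Y %m/%d %H:%M:%S")
        # The text between ':command ' and ' started' is "<data> <value>"
        data, value = line.split(":command ")[1].split(" started")[0].split(" ", 1)
        if data not in last_seen:
            last_seen[data] = (value, when)
        elif last_seen[data][0] != value:
            old_value, since = last_seen[data]
            yield data, old_value, (when - since).total_seconds() / 60
            last_seen[data] = (value, when)

# Made-up sample lines in the shape described in the question
sample_lines = [
    "2019 03/14 10:00:00 d4 d5 d6 d1:d2:d3:command data_1 value1 started.",
    "2019 03/14 10:05:00 d4 d5 d6 d1:d2:d3:command data_2 value2 started.",
    "2019 03/14 10:15:00 d4 d5 d6 d1:d2:d3:command data_2 value4 started.",
    "2019 03/14 10:25:00 d4 d5 d6 d1:d2:d3:command data_1 value2 started.",
]
results = list(changes_per_data(sample_lines))
for data, value, minutes in results:
    print("{} {} : {:g} minutes".format(data, value, minutes))
```

Since a file object is itself an iterable of lines, the same function can be fed an open file instead of the list.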