Find a specific keyword after a match in a text file in Python 3.0

Time: 2015-01-19 13:34:19

Tags: python python-3.x text readfile

I have a text file containing thousands of lines that follow a pattern like the one below.

year date hour:minute:seconds data4 data5 data6 data1:data2:data3:command data_1 value1 started

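I want to read through the text file, find the places where value1 has changed to some other value, take the year, date and time elements from those lines, and calculate the time difference.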

For example, the next matching line with the keywords "command", "data_1" and "started" is this one:

year_2 date_2 hour_2:minute_2:seconds_2 data4 data5 data6 data1:data2:data3:command data_1 value2 started.

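I want to find the time difference in minutes or hours. Any guidance on how to implement this would be very helpful. After that, I want to look for another data item in the same way, e.g.: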

   year date hour:minute:seconds data4 data5 data6 data1:data2:data3:command data_2 value2 started 

Then check the next similar match:

   year date hour:minute:seconds data4 data5 data6 data1:data2:data3:command data_2 value4 started and so on...

The actual values (value1, value2, value3, ...) don't matter; I only need to know whether a value has changed from the previous one.

import re

command_active = {}
previous_command = ""
seen = False

with open("common_files/total_logs", "r") as total_logs:
    lines = total_logs.readlines()
    for i, line in enumerate(lines):
        # each substring needs its own "in" test;
        # ":command" and "started." in line  only checks the second one
        if ":command" in line and "started" in line:

            month = line.split()[0]
            day = line.split()[1]
            time = line.split()[2]

            # text between "command" and "started", e.g. "data_1 value1"
            match = re.search(r"command (.*?) started", line)
            if match is None:
                continue
            only_command = match.group(1).split()[0]

            # scan the following lines for the next match of the same command
            for next_line in lines[i + 1:]:
                if ":command" in next_line and "started" in next_line and only_command in next_line:

                    month_1 = next_line.split()[0]
                    day_1 = next_line.split()[1]
                    time_1 = next_line.split()[2]
                    break

This is as far as I could get.

Required output:

   data_1 value1 : 25 minutes
   data_2 value2 : 10 minutes and so on.....

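1 Answer:

Answer 0 (score: 0)

This task is exactly the kind of thing Python's generators were made for.

First, we'll make a generator that grabs every line containing the content you're interested in: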

def get_interesting_line(file,*searches):
    with open(file,'r') as f:
        for line in f:
            if all(search in line for search in searches):
                yield line

# Use it like this:
interesting_lines = get_interesting_line(myfile,':command','started.')
next(interesting_lines)
# Or like this:
for line in get_interesting_line(myfile,':command','started.'):
    print(line)
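To try `get_interesting_line` without a real log, one option is to write a few sample lines to a temporary file first. The sample lines and timestamps below are invented for illustration (they assume a `YYYY MM/DD HH:MM:SS` prefix), and the function is repeated so the snippet runs on its own.

```python
import os
import tempfile

def get_interesting_line(file, *searches):
    # Yield only the lines that contain every search string
    with open(file, 'r') as f:
        for line in f:
            if all(search in line for search in searches):
                yield line

# Hypothetical sample log lines
sample = (
    "2015 01/19 13:34:19 d4 d5 d6 d1:d2:d3:command data_1 value1 started.\n"
    "2015 01/19 13:40:00 d4 d5 d6 unrelated line\n"
    "2015 01/19 13:59:19 d4 d5 d6 d1:d2:d3:command data_1 value2 started.\n"
)

# delete=False so the file can be reopened by name after the with-block closes it
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    matches = list(get_interesting_line(path, ':command', 'started.'))
    print(len(matches))  # 2: the unrelated line is skipped
finally:
    os.remove(path)
```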

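Then we'll create another generator that takes any other generator/iterator and returns tuples made of an item and the next item (in other words, the previous item and the current item):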

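def get_item_pair(iterable):
    iterator = iter(iterable)  # also accepts lists, not just generators
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: yield nothing instead of raising
    for current in iterator:
        yield previous, current
        previous = current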

# Use it like this:
line_pairs = get_item_pair(get_interesting_line(myfile,':command','started.'))
next(line_pairs)
# Or like this:
for pair in get_item_pair(get_interesting_line(myfile,':command','started.')):
    print(pair)
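Here is a quick check of the pairing behaviour on a plain list. The helper is repeated in a form that calls `iter()` first (so lists work, not only generators) and returns quietly on empty input, since in Python 3.7+ a `StopIteration` escaping a generator is turned into a `RuntimeError`.

```python
def get_item_pair(iterable):
    # Yield (previous, current) for each consecutive pair of items
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to pair
    for current in iterator:
        yield previous, current
        previous = current

pairs = list(get_item_pair(['a', 'b', 'c']))
print(pairs)  # [('a', 'b'), ('b', 'c')]
```

On Python 3.10 and later, `itertools.pairwise` from the standard library produces the same pairs.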

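Now we can put them together and create another generator that, according to your criteria, spits out 4-tuples made of the data, the value, the previous time and the current time (we'll use the datetime module to store the times as datetime objects):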

from datetime import datetime

def parse_line(line):
    # Assumes "YYYY MM/DD HH:MM:SS ... :command DATA VALUE started."
    year, date, clock = line.split()[:3]
    month, day = date.split('/')
    hour, minute, second = clock.split(':')
    # datetime() needs integers, not the strings that split() returns
    dt = datetime(int(year), int(month), int(day),
                  int(hour), int(minute), int(second))
    # the two words between ':command ' and ' started', e.g. 'data_1', 'value1'
    data, value = line.split(':command ', 1)[1].split(' started', 1)[0].split()[:2]
    return data, value, dt

def get_command_file_delta_t_info(file):
    c_lines = get_interesting_line(file, ':command', 'started.')
    line_pairs = get_item_pair(c_lines)
    for line1, line2 in line_pairs:
        data1, value1, dt_i = parse_line(line1)
        data2, value2, dt_f = parse_line(line2)
        if value1 != value2:
            # value changed: yield the old value with the times it started and ended
            yield (data1, value1, dt_i, dt_f)
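Pairing consecutive lines compares values across different data keys whenever, for example, a `data_2` line directly follows a `data_1` line. If each data key should be tracked on its own, as the required output suggests, a dictionary keyed by data name is one option. This is a sketch that assumes the timestamp is `YYYY MM/DD HH:MM:SS` and that the data name and value are the two words after `:command`; the function name `changes_per_data` and the sample lines are hypothetical.

```python
from datetime import datetime

def changes_per_data(lines):
    """Yield (data, old_value, start, end) whenever a data key's value changes."""
    last_seen = {}  # data name -> (current value, datetime it started)
    for line in lines:
        if ':command ' not in line or 'started' not in line:
            continue  # not a command line
        fields = line.split()
        # Assumed timestamp layout: YYYY MM/DD HH:MM:SS
        stamp = datetime.strptime(' '.join(fields[:3]), '%Y %m/%d %H:%M:%S')
        # The two words after ':command ', e.g. 'data_1' and 'value1'
        data, value = line.split(':command ', 1)[1].split()[:2]
        previous = last_seen.get(data)
        if previous is None or previous[0] != value:
            if previous is not None:
                yield data, previous[0], previous[1], stamp
            last_seen[data] = (value, stamp)

# Hypothetical sample lines, loosely following the question's layout
log = [
    "2015 01/19 13:34:19 d4 d5 d6 d1:d2:d3:command data_1 value1 started.",
    "2015 01/19 13:40:00 d4 d5 d6 d1:d2:d3:command data_2 value2 started.",
    "2015 01/19 13:50:00 d4 d5 d6 d1:d2:d3:command data_2 value4 started.",
    "2015 01/19 13:59:19 d4 d5 d6 d1:d2:d3:command data_1 value2 started.",
]

for data, value, start, end in changes_per_data(log):
    minutes = (end - start).total_seconds() / 60
    print("{} {} : {:g} minutes".format(data, value, minutes))
# data_2 value2 : 10 minutes
# data_1 value1 : 25 minutes
```

An open file object can be passed instead of the list, since iterating over a file yields its lines.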

As written above, this generator yields items in the following format:

#create the generator
my_gen = get_command_file_delta_t_info(myfile)
#Get first generated item:
next(my_gen)
# -> (data1, value1, dt_i1, dt_f1)
#Now we can get the second item, 3rd, etc:
next(my_gen)
# -> (data2, value2, dt_i2, dt_f2)
#get all the remaining items and do stuff with them:
for item in my_gen:
    do_stuff(item)  # do_stuff: placeholder for your own processing
    # note that the first item in the for statement is actually the 3rd
    # item that has been generated. the first two were generated in the
    # lines above. my_gen does not "start over".

As shown above, a generator is an iterable object that only advances to its next item when it is iterated over (e.g. with next() or in a for statement).
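A tiny stand-alone generator makes the "does not start over" behaviour easy to see. The name `count_up` is just an illustrative example.

```python
def count_up(n):
    # A tiny generator: produces 0, 1, ..., n-1 lazily
    for i in range(n):
        yield i

gen = count_up(4)
first = next(gen)   # 0
second = next(gen)  # 1
rest = list(gen)    # [2, 3]: iteration resumes where next() stopped
again = list(gen)   # []: an exhausted generator does not restart
```

To iterate again from the beginning, call the generator function again (`count_up(4)`) to get a fresh generator object.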

Now we can do whatever we want with the file data, including writing a function that prints it in the format you want:

def print_command_file_delta_t_info(command_file_delta_t_info):
    for data, value, dt_i, dt_f in command_file_delta_t_info:
        #time_diff is a datetime.timedelta object
        time_diff = dt_f - dt_i
        minutes = time_diff.total_seconds() / 60
        #now print the information you want (:g drops a trailing ".0"):
        print("{data} {value} : {minutes:g} minutes".format(data=data, value=value, minutes=minutes))

# usage: print_command_file_delta_t_info(get_command_file_delta_t_info(myfile))
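The minutes/hours conversion can be checked by hand: subtracting two `datetime` objects gives a `timedelta`, and `total_seconds()` divided by 60 or 3600 gives minutes or hours. The two timestamps below are made up to reproduce the "25 minutes" line from the required output.

```python
from datetime import datetime

# Hypothetical start/end times for one "data_1 value1" run
dt_i = datetime(2015, 1, 19, 13, 34, 19)
dt_f = datetime(2015, 1, 19, 13, 59, 19)

time_diff = dt_f - dt_i                   # a datetime.timedelta
minutes = time_diff.total_seconds() / 60  # 25.0
hours = time_diff.total_seconds() / 3600  # about 0.4167

print("{} {} : {:g} minutes".format("data_1", "value1", minutes))
# data_1 value1 : 25 minutes
print("{:.2f} hours".format(hours))
# 0.42 hours
```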