Grep data from multiple files and write it to a single log in Python

Time: 2017-07-07 07:31:32

Tags: python python-2.7 python-3.x

I'm new to Python and hope someone can help me. I want to grep data from multiple files and then merge the data I grepped into one log.

My input files look like this:

Input file1 (200MHz)

Cell_a  freq_100  50  
Cell_a  freq_200  6.8  
Cell_b  freq_100  70  

Input file2 (100MHz)

Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60

This is my expected output:

[cell] [freq] [value_frm_file1] [value_frm_file2] [value_frm_file3] [etc ...]

Expected output example:

Cell_a freq_100 50 100  # 50 is taken from file1, 100 from file2
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60

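I guess the best way is to store the data in a Python dictionary? Could you give an example or tell me how to do it? Here is my code, but I can only get one value at a time. How can I combine the values so that each one ends up in the column for its frequency type?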

import re

for i in cmaxFreqList:  # this list is based on the frequency type, i.e. 200MHz, 100MHz etc.
    file = path + freqfile
    with open(file) as f:
        data = f.readlines()

    for line in data:
        line = line.rstrip('\n')
        freqlength = len(line.split())
        if freqlength == 3:
            searchFreqValue = re.search(r"(\S+)\s+(\S+)\s+(\S+)", line)
            cell = searchFreqValue.group(1)
            freq = searchFreqValue.group(2)
            value = searchFreqValue.group(3)
            print(cell + ' ' + freq + ' ' + value)  # can only print one value at a time

Thanks for your help!

2 answers:

Answer 0: (score: 0)

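Because of how your expected output is formatted, I don't fully understand the question, but here are some hints you can use to iterate over the parameters and values: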

To search for a certain kind of value (i.e. cell, frequency, etc.), you can use the list index method:

parameters = ['Cell_', 'freq_', 'etc'] #Name of the parameters you are looking for

for parameter in parameters:
    for line in data:
        new_list = line.split() 
        position_of_the_value = new_list.index(parameter) + 1 

If you then run

print(new_list[position_of_the_value])

you get the value of that parameter on that line, which you can then store in a list:

parameter1_list = list()
parameter1_list.append(new_list[position_of_the_value])

Finally, build the string to print:

print('Parameter_1 '+ ' '.join(parameter1_list))

This will print something like:

Parameter_1 100 50 200 300

You just need to build the loops that iterate over each parameter and each list so that everything gets printed.
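
The loops described above could be put together roughly as follows. This is only a sketch under a few assumptions: `data` is a list of lines already read from a file (here copied from the question's first input), and the helper name `collect_by_prefix` is made up for illustration. Since `list.index` only finds a token that matches exactly, and the tokens in the sample data are `Cell_a`, `freq_100` and so on, the sketch matches by prefix with `str.startswith` and then takes the token that follows, as in the snippet above.

```python
def collect_by_prefix(data, parameters):
    """For each prefix, collect the token following the first matching token on each line."""
    collected = {parameter: [] for parameter in parameters}
    for line in data:
        tokens = line.split()
        for parameter in parameters:
            for position, token in enumerate(tokens):
                # Prefix match; also make sure a following token exists
                if token.startswith(parameter) and position + 1 < len(tokens):
                    collected[parameter].append(tokens[position + 1])
                    break  # only use the first match on this line
    return collected


# Sample lines copied from the question's first input file
data = ["Cell_a  freq_100  50", "Cell_a  freq_200  6.8", "Cell_b  freq_100  70"]
collected = collect_by_prefix(data, ["Cell_", "freq_"])
for parameter, values in collected.items():
    print(parameter + " " + " ".join(values))
```

With this data, the token after each `freq_` entry is the numeric value, while the token after each `Cell_` entry is the frequency name, so which prefix to search for depends on which column you actually want.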

Answer 1: (score: 0)

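This is a relatively simple task, as long as your files are not huge (i.e. their combined data fits into working memory while you join them). You just need to create a map keyed by (cell_name, freq) (you can use a dict) and append the matching values to it. Once all files have been processed, write the map->value elements to a combined output file, and Bob's your uncle: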

import os
import collections

path = "."  # current folder
freq_list = ["100.dat", "200.dat"]  # a list of files to concatenate

result = collections.defaultdict(list)  # a map to hold a list of our results
for file_name in freq_list:  # go through each file name
    with open(os.path.join(path, file_name), "r") as f:  # open the file
        for line in f:  # go through it line by line
            try:
                cell, freq, value = line.split()  # split it by whitespace into 3 elements
            except ValueError:  # invalid line - it didn't have exactly 3 elements
                continue  # ignore the current line and continue with the next
            result[(cell, freq)].append(value)  # append the value to our result map
with open(os.path.join(path, "combined.dat"), "w") as f:  # open our output file for writing
    # Python dictionaries are unsorted (<v3.6), sort the keys when looping through them
    for element in sorted(result):  # loop through each key in our result map
        # write the key (cell name and frequency) separated by space, add space,
        # write the values separated by space and finally add a new line:
        f.write("{} {}\n".format(" ".join(element), " ".join(result[element])))

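It's not clear from your code what cmaxFreqList contains, but in my example it (freq_list) holds the actual file names. You can of course build the input file names any way you like (just make sure os.path.join(path, file_name) produces a valid path). For example, if 100.dat from the list above contains: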

Cell_a  freq_100  50
Cell_a  freq_200  6.8
Cell_b  freq_100  70

and 200.dat contains:

Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60

the "combined.dat" file will end up as:

Cell_a freq_100 50 100
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60
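
One case the code above does not cover is a (cell, freq) pair that is missing from one of the files: the values from later files would then shift one column to the left. A possible variation, sketched here with in-memory text instead of real files so it can be run as-is, reserves one column per input and pads the gaps. The function name `merge_columns`, the `"-"` placeholder and the sample data (with one line deliberately left out of the second input) are assumptions for illustration, not part of the original answer.

```python
import collections
import io


def merge_columns(sources, missing="-"):
    """Merge 'cell freq value' lines from a list of file-like objects, one column per source."""
    # Every new key starts with a placeholder in each column
    result = collections.defaultdict(lambda: [missing] * len(sources))
    for column, source in enumerate(sources):
        for line in source:
            try:
                cell, freq, value = line.split()
            except ValueError:  # line did not have exactly 3 fields
                continue
            # If a key repeats within one source, the last value wins
            result[(cell, freq)][column] = value
    return ["{} {}".format(" ".join(key), " ".join(result[key])) for key in sorted(result)]


file1 = io.StringIO("Cell_a  freq_100  50\nCell_a  freq_200  6.8\nCell_b  freq_100  70\n")
file2 = io.StringIO("Cell_a freq_100 100\nCell_b freq_100 60\n")  # freq_200 line left out
merged = merge_columns([file1, file2])
print("\n".join(merged))
```

To use this with real files, the opened file objects can be passed in place of the `io.StringIO` objects, since both are iterated line by line in the same way.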