Question

I'm new to Python and hope someone can help me. I want to grep data from multiple files and then merge the data I grepped into a single log.

My input files look like this:

Input file1 (200MHz)

Cell_a  freq_100  50  
Cell_a  freq_200  6.8  
Cell_b  freq_100  70

Input file2 (100MHz)

Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60

This is my expected output format:

[cell] [freq] [value_frm_file1] [value_frm_file2] [value_frm_file3] [etc ...]

Expected output example:

Cell_a freq_100 50 100  # 50 is taken from file1, 100 from file2
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60

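I guess the best way is to store the data in a Python dictionary? Could you give me an example or tell me how to do it? Here is my code, but I can only get one value at a time. How can I combine the values so that each one ends up under its own frequency type?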

for i in cmaxFreqList: #this is the list base on it's frq type, IE 200MHz, 100MHz etc
    file = path + freqfile
    with open (file) as f:
        data = f.readlines()

    for line in data:
        line = line.rstrip('\n')
        freqlength = len(line.split())
        if freqlength == 3:
            searchFreqValue = re.search(r"(\S+)\s+(\S+)\s+(\S+)", line)  # raw string avoids invalid-escape warnings
            cell = searchFreqValue.group(1)
            freq = searchFreqValue.group(2)
            value = searchFreqValue.group(3)
            print(cell + ' ' + freq + ' ' + value)  # can only print one value at a time

Thanks for your help!

Answer 1

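I don't fully understand the question because the expected output is hard to read, but here are some hints you can use to iterate over parameters and values: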

To search for a value of a certain type (i.e. cell, frequency, etc.), you can use the list index method:

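parameters = ['Cell_', 'freq_', 'etc']  # names of the parameters you are looking for

for parameter in parameters:
    for line in data:
        new_list = line.split()
        if parameter not in new_list:  # list.index() raises ValueError when the token is absent
            continue
        position_of_the_value = new_list.index(parameter) + 1  # the value follows the parameter name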

If you run

print(new_list[position_of_the_value])

you get the value of that parameter on that line, which you can then store in a list:

parameter1_list = list()
parameter1_list.append(new_list[position_of_the_value])

Finally, build the string to print:

print('Parameter_1 '+ ' '.join(parameter1_list))

This will print something like:

Parameter_1 100 50 200 300

You just need to construct the loops that iterate over each parameter and each list so that everything gets printed.
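Putting these hints together, a minimal sketch might look like the following. It assumes, hypothetically, that each line alternates parameter names and values (for example `freq 100 volt 50`), which is not the layout shown in the question. The function `collect_values` and the sample lines are illustrative names, not part of the original answer.

```python
from collections import defaultdict


def collect_values(lines, parameters):
    """Collect, for each parameter, the token that follows it on every line."""
    collected = defaultdict(list)
    for line in lines:
        tokens = line.split()
        for parameter in parameters:
            if parameter not in tokens:
                continue  # this line does not mention the parameter
            position = tokens.index(parameter) + 1
            if position < len(tokens):  # guard against a name with no value after it
                collected[parameter].append(tokens[position])
    return collected


# Hypothetical input lines in "name value name value" form
sample_lines = ["freq 100 volt 50", "freq 200 volt 6.8"]
values = collect_values(sample_lines, ["freq", "volt"])

for name, items in values.items():
    print(name + " " + " ".join(items))
```

Keeping one list per parameter in a dictionary avoids declaring a separate `parameter1_list`, `parameter2_list`, and so on by hand.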

Answer 2

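This is a relatively simple task, as long as your files are not huge (i.e. their combined data fits in working memory while you join them). You just need to create a map keyed by (cell_name, freq) (you can use a dict) and append the matching values to it. Once you have gone through all the files, write each key and its values to the combined output file, and Bob's your uncle: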

import os
import collections

path = "."  # current folder
freq_list = ["100.dat", "200.dat"]  # a list of files to concatenate

result = collections.defaultdict(list)  # a map to hold a list of our results
for file_name in freq_list:  # go through each file name
    with open(os.path.join(path, file_name), "r") as f:  # open the file
        for line in f:  # go through it line by line
            try:
                cell, freq, value = line.split()  # split it by whitespace into 3 elements
            except ValueError:  # invalid line - it didn't have exactly 3 elements
                continue  # ignore the current line and continue with the next
            result[(cell, freq)].append(value)  # append the value to our result map
with open(os.path.join(path, "combined.dat"), "w") as f:  # open our output file for writing
    # Python dictionaries are unsorted (<v3.6), sort the keys when looping through them
    for element in sorted(result):  # loop through each key in our result map
        # write the key (cell name and frequency) separated by space, add space,
        # write the values separated by space and finally add a new line:
        f.write("{} {}\n".format(" ".join(element), " ".join(result[element])))

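It is not clear from your code what cmaxFreqList contains, but in my example it (freq_list) contains the actual file names; you can of course build the input file names however you like (just make sure os.path.join(path, file_name) produces a valid path). For example, if the 100.dat listed above contains: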

Cell_a  freq_100  50
Cell_a  freq_200  6.8
Cell_b  freq_100  70

and 200.dat contains:

Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60

the "combined.dat" file will end up as:

Cell_a freq_100 50 100
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60
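One case the approach in this answer does not cover is a (cell, freq) pair that is missing from one of the files: the remaining values would then shift one column to the left. A possible variation, assuming a placeholder such as `-` is acceptable in the output, reserves one slot per input file. The function `combine` and the in-memory sample data are hypothetical names used only for illustration.

```python
def combine(sources, missing="-"):
    """Merge values per (cell, freq) key, keeping one column per source.

    sources: a list of iterables of lines, one iterable per input file.
    """
    result = {}
    for index, lines in enumerate(sources):
        for line in lines:
            parts = line.split()
            if len(parts) != 3:  # skip lines that are not "cell freq value"
                continue
            cell, freq, value = parts
            # Create a row of placeholders the first time a key is seen
            row = result.setdefault((cell, freq), [missing] * len(sources))
            row[index] = value
    # One output line per key, sorted by (cell, freq)
    return [" ".join(key + tuple(row)) for key, row in sorted(result.items())]


# Hypothetical sample data; the second source lacks "Cell_a freq_200"
file1 = ["Cell_a  freq_100  50", "Cell_a  freq_200  6.8", "Cell_b  freq_100  70"]
file2 = ["Cell_a freq_100 100", "Cell_b freq_100 60"]

combined = combine([file1, file2])
for output_line in combined:
    print(output_line)
```

Because an open file object is itself an iterable of lines, the same function could be fed open files instead of lists.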

Grep data from multiple files and then write it to a log in Python
