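I'm new to Python and hope someone can help me. I want to grep data from multiple files and then merge the data I grepped into a single log.
My input files look like this:
Input file1 (200MHz)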
Cell_a freq_100 50
Cell_a freq_200 6.8
Cell_b freq_100 70
Input file2 (100MHz)
Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60
This is my expected output:
[cell] [freq] [value_frm_file1] [value_frm_file2] [value_frm_file3] [etc ...]
Example of the expected output:
Cell_a freq_100 50 100  # 50 is taken from file1, 100 from file2
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60
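I guess the best way is to store the data in a Python dictionary? Can you give an example or tell me how to do it? This is my code, but I can only get one value at a time. How do I combine the values so that each one lines up with its own frequency type?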
import re

for i in cmaxFreqList:  # this list is based on the frequency type, i.e. 200MHz, 100MHz etc.
    file = path + freqfile
    with open(file) as f:
        data = f.readlines()
    for line in data:
        line = line.rstrip('\n')
        freqlength = len(line.split())
        if freqlength == 3:
            searchFreqValue = re.search(r"(\S+)\s+(\S+)\s+(\S+)", line)
            cell = searchFreqValue.group(1)
            freq = searchFreqValue.group(2)
            value = searchFreqValue.group(3)
            print(cell + ' ' + freq + ' ' + value)  # can only print one value at a time
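Thanks for your help!
Answer 0 (score: 0)
Because of how your expected output is laid out, I don't fully understand the question, but here are some hints you can use to iterate over parameters and values:
To search for a certain type of value (i.e. cell, frequency, etc.), you can use the list index method: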
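parameters = ['Cell_', 'freq_', 'etc']  # names of the parameters you are looking for
for parameter in parameters:
    for line in data:
        new_list = line.split()
        # list.index() needs an exact match and raises ValueError otherwise
        if parameter not in new_list:
            continue
        position_of_the_value = new_list.index(parameter) + 1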
If you then run
print(new_list[position_of_the_value])
you get the value of that parameter on that line, which you can then store in a list:
parameter1_list = list()
parameter1_list.append(new_list[position_of_the_value])
Finally, build the string to print:
print('Parameter_1 '+ ' '.join(parameter1_list))
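This will print something like:
Parameter_1 100 50 200 300
You just need to construct the loops so that they iterate over every parameter and every list, in order to print them all.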
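The hints above can be pulled together into a single loop. What follows is only a minimal sketch: it assumes the lines of each file are already in memory (file1_lines and file2_lines are hypothetical stand-ins that reuse the data from the question) and that the values should be grouped by their (cell, freq) pair rather than by a single parameter name.

```python
# Hypothetical in-memory stand-ins for the lines of the two input files.
file1_lines = ["Cell_a freq_100 50", "Cell_a freq_200 6.8", "Cell_b freq_100 70"]
file2_lines = ["Cell_a freq_100 100", "Cell_a freq_200 10.5", "Cell_b freq_100 60"]

values_by_key = {}  # (cell, freq) -> list of values, in file order
for lines in (file1_lines, file2_lines):
    for line in lines:
        fields = line.split()
        if len(fields) != 3:  # skip anything that is not "cell freq value"
            continue
        cell, freq, value = fields
        # setdefault creates the list the first time a key is seen
        values_by_key.setdefault((cell, freq), []).append(value)

# Build one output row per key: the key fields followed by all its values.
combined = [" ".join(key + tuple(values)) for key, values in values_by_key.items()]
for row in combined:
    print(row)
```

On Python 3.7 or later the rows come out in the order the keys were first seen, because dictionaries preserve insertion order there; on older versions the keys would need to be sorted explicitly.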
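Answer 1 (score: 0)
This is a relatively simple task, as long as your files are not huge (i.e. their combined data fits into working memory while you join them). You just need to create a (cell_name, freq) map (you can use a dict) and then append the matching values to it. Once you have gone through all the files, just write the map->value elements to the combined output file and Bob's your uncle: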
import os
import collections

path = "."  # current folder
freq_list = ["100.dat", "200.dat"]  # a list of files to concatenate
result = collections.defaultdict(list)  # a map to hold a list of our results

for file_name in freq_list:  # go through each file name
    with open(os.path.join(path, file_name), "r") as f:  # open the file
        for line in f:  # go through it line by line
            try:
                cell, freq, value = line.split()  # split it by whitespace into 3 elements
            except ValueError:  # invalid line - it didn't have exactly 3 elements
                continue  # ignore the current line and continue with the next
            result[(cell, freq)].append(value)  # append the value to our result map

with open(os.path.join(path, "combined.dat"), "w") as f:  # open our output file for writing
    # Python dictionaries are unsorted (<v3.6), sort the keys when looping through them
    for element in sorted(result):  # loop through each key in our result map
        # write the key (cell name and frequency) separated by space, add space,
        # write the values separated by space and finally add a new line:
        f.write("{} {}\n".format(" ".join(element), " ".join(result[element])))
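One thing to watch for with the append-based map: if a (cell, freq) pair is missing from one of the files, the values that follow shift left, so they no longer line up with the [value_frm_file1] [value_frm_file2] columns asked for in the question. A possible variation, sketched here under assumptions, reserves one slot per file. The helper name merge_columns and the "-" placeholder are made up for illustration, and the input is passed as lists of lines instead of file names to keep the sketch self-contained.

```python
def merge_columns(files_lines, missing="-"):
    """Merge 'cell freq value' lines, keeping one output column per input file."""
    result = {}  # (cell, freq) -> list with exactly one slot per file
    for index, lines in enumerate(files_lines):
        for line in lines:
            fields = line.split()
            if len(fields) != 3:  # ignore lines that are not "cell freq value"
                continue
            cell, freq, value = fields
            # the first time a key is seen, pre-fill its row with placeholders
            row = result.setdefault((cell, freq), [missing] * len(files_lines))
            row[index] = value  # a repeated key in the same file overwrites
    return ["{} {}".format(" ".join(key), " ".join(result[key]))
            for key in sorted(result)]


rows = merge_columns([
    ["Cell_a freq_100 50", "Cell_b freq_100 70"],     # first file
    ["Cell_a freq_100 100", "Cell_a freq_200 10.5"],  # second file
])
for row in rows:
    print(row)
```

With this input, "Cell_a freq_200" gets the placeholder in the first column and "Cell_b freq_100" gets it in the second, so every value stays under the file it came from.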
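It's not clear from your code what cmaxFreqList contains, but in my example it (freq_list) holds the actual file names. You can of course build the input file names however you like (just make sure that os.path.join(path, file_name) constructs a valid path). For example, if 100.dat listed above contains: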
Cell_a freq_100 50
Cell_a freq_200 6.8
Cell_b freq_100 70
and 200.dat contains:
Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60
the "combined.dat" file will end up as:
Cell_a freq_100 50 100
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60
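To try the example without touching real data, the same merge can be run against throwaway files. This is a sketch only: the wrapper name combine is hypothetical, and the sample files are written to a temporary directory with the contents shown above.

```python
import collections
import os
import tempfile


def combine(path, file_names, out_name="combined.dat"):
    """Merge 'cell freq value' files into one file (hypothetical wrapper)."""
    result = collections.defaultdict(list)
    for file_name in file_names:
        with open(os.path.join(path, file_name), "r") as f:
            for line in f:
                try:
                    cell, freq, value = line.split()
                except ValueError:  # not exactly 3 fields, skip the line
                    continue
                result[(cell, freq)].append(value)
    with open(os.path.join(path, out_name), "w") as f:
        for key in sorted(result):
            f.write("{} {}\n".format(" ".join(key), " ".join(result[key])))


with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "100.dat": "Cell_a freq_100 50\nCell_a freq_200 6.8\nCell_b freq_100 70\n",
        "200.dat": "Cell_a freq_100 100\nCell_a freq_200 10.5\nCell_b freq_100 60\n",
    }
    for name, text in samples.items():  # write the sample input files
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    combine(tmp, ["100.dat", "200.dat"])
    with open(os.path.join(tmp, "combined.dat")) as f:
        combined_text = f.read()  # read the result before the directory is removed

print(combined_text)
```

The order of the names passed to combine decides the order of the value columns, so listing "200.dat" first would swap them.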