I have two matrix files that look like this:
File1:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10].....'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}
File2:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,
0.26, 0.11].....'key100',g,l,i,o,+: [0.2, 0.0, 0.23, 0.16, 0.21]}
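Both files have the same keys. I want to average the values across the two files, so that the resulting file looks like this:
Desired output file: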
{'key1',g,l,i,o,+: [0.0, 0.0, 0.94, 0.04, 0.01],'key2',g,l,i,o,+: [0.05, 0.15, 0.925,
0.26, 0.105].....'key100',g,l,i,o,+: [0.15, 0.05, 0.26, 0.175, 0.205]}
I have thought about a Python script I could write, but since I am very new to this, any quick ideas are welcome:
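import gzip
import numpy as np

# Open both gzip-compressed files in text mode
with gzip.open('/home/file1', 'rt') as inFile1, gzip.open('/home/file2', 'rt') as inFile2:
    next(inFile1)  # skip the header line
    next(inFile2)  # assumes the second file has a header line too
    for line1, line2 in zip(inFile1, inFile2):
        # The values start at the seventh tab-separated column
        data = np.array(line1.strip().split('\t')[6:], dtype=float)
        data2 = np.array(line2.strip().split('\t')[6:], dtype=float)
        newdata = (data + data2) / 2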
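Answer 0 (score: 0)
You can use a regular expression to rewrite the string so that it is JSON-compatible. You can then easily convert it to a dict and use plain Python to analyze the data (compare the dicts):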
import re
import json
s = '''{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10],'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}'''
# Raw strings keep \d and \+ from being treated as invalid string escapes
s2 = re.sub(r"'(key\d+)',g,l,i,o,\+", r'"\1"', s)
print(s2)
d = json.loads(s2)
print(d)
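Once both files are loaded as dicts this way, the averaging itself can be a short dict comprehension. The following is a minimal sketch rather than part of the original answer: `to_dict` and `average_dicts` are hypothetical helper names, and the two sample strings are abbreviated from the question's data.

```python
import json
import re

# Hypothetical helper: turn the question's custom format into a dict
def to_dict(text):
    # Replace 'keyN',g,l,i,o,+ with "keyN" so the text becomes valid JSON
    return json.loads(re.sub(r"'(key\d+)',g,l,i,o,\+", r'"\1"', text))

# Hypothetical helper: element-wise mean of the lists stored under each shared key
def average_dicts(d1, d2):
    return {key: [(a + b) / 2 for a, b in zip(d1[key], d2[key])]
            for key in d1 if key in d2}

s1 = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01]}"
s2 = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01]}"
averaged = average_dicts(to_dict(s1), to_dict(s2))
print(averaged)
```

Note that the keys lose their `,g,l,i,o,+` suffix in this approach, so it would have to be re-attached when writing the result back in the original format.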
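Answer 1 (score: 0)
The problem is your data format, as Wodin commented:
What format is this? It looks a bit like a Python dictionary, but ,g,l,i,o,+ makes no sense for a dictionary.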
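A quick way to confirm that the text is not a valid Python literal is to hand it to `ast.literal_eval`. This is only an illustrative check; the `sample` string is abbreviated from the question and `is_valid_literal` is a name chosen for this sketch.

```python
import ast

# One entry in the question's format
sample = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01]}"

try:
    ast.literal_eval(sample)
    is_valid_literal = True
except (SyntaxError, ValueError):
    # The bare names and the "+:" sequence cannot be parsed as a literal
    is_valid_literal = False

print(is_valid_literal)
```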
I tried this with your data; you can take hints from the code below. These are the files I used:
File1.txt:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,0.26, 0.10]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}
File2.txt:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,0.26, 0.11]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}
Code:
import ast
import re

# Group 1 captures the full key label, group 2 the bracketed list of values
pattern = r"('key\w+',g,l,i,o,\+):\s(\[.+?\])"

with open('File1.txt', 'r') as f:
    for line in f:
        average = {}
        for find in re.finditer(pattern, line):
            # Rescan the second file for the entry with the same key
            with open('File2.txt', 'r') as ff:
                for line2 in ff:
                    for find1 in re.finditer(pattern, line2):
                        if find.group(1) == find1.group(1):
                            # literal_eval parses the list text without executing arbitrary code
                            pairs = zip(ast.literal_eval(find.group(2)), ast.literal_eval(find1.group(2)))
                            average[find.group(1)] = [sum(pair) / len(pair) for pair in pairs]
        print(average)
Output:
{"'key2',g,l,i,o,+": [0.05, 0.15000000000000002, 0.925, 0.26, 0.10500000000000001], "'key1',g,l,i,o,+": [0.0, 0.0, 0.94, 0.04, 0.01]}
{"'key3',g,l,i,o,+": [0.0, 0.0, 0.98, 0.02, 0.01], "'key4',g,l,i,o,+": [0.1, 0.2, 0.9, 0.268, 0.1]}
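The code above reopens and rescans the second file for every key found in the first. If that becomes slow on larger files, one alternative is to parse each file once into a dict and then average by key. This is a sketch under the same format assumptions, not the answerer's code: `parse_text` and `average_by_key` are hypothetical names, the inline strings stand in for the file contents, and `re.DOTALL` is added on the assumption that a list may wrap across lines as in the question.

```python
import ast
import re

# Same pattern as above; DOTALL lets a bracketed list span a line break
PATTERN = re.compile(r"('key\w+',g,l,i,o,\+):\s(\[.+?\])", re.DOTALL)

def parse_text(text):
    """Map each full key label to its list of floats."""
    return {m.group(1): ast.literal_eval(m.group(2))
            for m in PATTERN.finditer(text)}

def average_by_key(first, second):
    """Average the value lists element-wise for keys present in both dicts."""
    return {key: [(a + b) / 2 for a, b in zip(values, second[key])]
            for key, values in first.items() if key in second}

# Stand-ins for the contents of the two files
file1_text = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,0.26, 0.10]}"
file2_text = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,0.26, 0.11]}"

result = average_by_key(parse_text(file1_text), parse_text(file2_text))
print(result)
```

With real files, `parse_text(open(path).read())` would replace the inline strings, at the cost of holding each file in memory at once.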