My input data is a list of dicts. Each dict has two positions in which a record can appear, together with the correlation factor between the two records and each record's data source:

    [
        { 'r1': record_1, 'r2': record_2, 'corr': 85, 'r1_source': source_1, 'r2_source': source_2 },
        { 'r1': record_1, 'r2': record_3, 'corr': 90, 'r1_source': source_1, 'r2_source': source_3 },
        { 'r1': record_2, 'r2': record_3, 'corr': 77, 'r1_source': source_2, 'r2_source': source_3 },
        ...
    ]

Each record is represented by a list, and comes from a finite list of unique records.
The output I want is a list of dicts in which each unique record has itself, its source, and its average correlation factor:

    [
        { 'record': record_1, 'source': source_1, 'avg': (85 + 90) / 2 },
        { 'record': record_2, 'source': source_2, 'avg': (85 + 77) / 2 },
        { 'record': record_3, 'source': source_3, 'avg': (90 + 77) / 2 },
    ]
My current solution:

    def average_record_from_match_value(matches):
        averaged_recs = []
        for match in matches:
            # Q1
            if [rec for rec in averaged_recs if rec['record'] == match['r1']] == []:
                a_recs = []
                # Q2
                a_recs.extend([m['corr'] for m in matches if m['r1'] == match['r1']])
                a_recs.extend([m['corr'] for m in matches if m['r2'] == match['r1']])
                # Q3
                r1_value = sum(a_recs) / len(a_recs)
                averaged_recs.append({ 'record': match['r1'],
                                       'source': match['r1_source'],
                                       'match_value': r1_value,
                                       'record_value': r1_value})
            if [rec for rec in averaged_recs if rec['record'] == match['r2']] == []:
                b_recs = []
                b_recs.extend([m['corr'] for m in matches if m['r1'] == match['r2']])
                b_recs.extend([m['corr'] for m in matches if m['r2'] == match['r2']])
                r2_value = sum(b_recs) / len(b_recs)
                averaged_recs.append({ 'record': match['r2'],
                                       'source': match['r2_source'],
                                       'match_value': r2_value,
                                       'record_value': r2_value})
        return averaged_recs
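This works, but I am sure it can be improved. The questions marked by the comments above (Q1, Q2 and Q3) concern the check against averaged_recs, the list built for the records, and the work repeated for each match. Thanks for your help!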
Answer 0 (score: 1)
    lst = [
        { 'r1': 'record_1', 'r2': 'record_2', 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2' },
        { 'r1': 'record_1', 'r2': 'record_3', 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3' },
        { 'r1': 'record_2', 'r2': 'record_3', 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3' },
    ]

    # Group the corr values (and the source) under each record, whether it
    # appears as r1 or as r2.
    tmp_dict = {}
    for d in lst:
        if d['r1'] not in tmp_dict:
            tmp_dict[d['r1']] = {'corr': [], 'source': d['r1_source']}
        if d['r2'] not in tmp_dict:
            tmp_dict[d['r2']] = {'corr': [], 'source': d['r2_source']}
        tmp_dict[d['r1']]['corr'].append(d['corr'])
        tmp_dict[d['r2']]['corr'].append(d['corr'])

    # float() keeps the division from truncating on Python 2.
    print([{ 'record': k, 'source': v['source'], 'avg': sum(v['corr']) / float(len(v['corr'])) } for k, v in tmp_dict.items()])
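The grouping above can also be wrapped in a function that keeps a running total and count per record instead of a list of values. This is only a sketch: the names `average_by_record` and `sample_matches` are made up for illustration, and it assumes the records are hashable (strings here), as in this answer's test data.

```python
def average_by_record(matches):
    """Return one dict per unique record with its source and mean corr."""
    totals = {}  # record -> [source, running sum of corr, number of matches]
    for m in matches:
        # A record contributes the same corr whether it sits in r1 or r2.
        for rec_key, src_key in (('r1', 'r1_source'), ('r2', 'r2_source')):
            entry = totals.setdefault(m[rec_key], [m[src_key], 0.0, 0])
            entry[1] += m['corr']
            entry[2] += 1
    return [{'record': rec, 'source': src, 'avg': total / count}
            for rec, (src, total, count) in totals.items()]


sample_matches = [
    {'r1': 'record_1', 'r2': 'record_2', 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2'},
    {'r1': 'record_1', 'r2': 'record_3', 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3'},
    {'r1': 'record_2', 'r2': 'record_3', 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3'},
]
print(average_by_record(sample_matches))
```

Each match is visited once, so the work grows linearly with the number of matches rather than rescanning the whole list for every record.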
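Answer 1 (score: 1)
My idea: loop over the list and build a dict keyed by every r1 and r2. If the record is the item's r1, insert the item at the head of that record's list; if it is the item's r2, append it to the tail.
Then loop over this dict to produce the output you expect.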
    from collections import defaultdict

    test = [
        { 'r1': 'record_1', 'r2': 'record_2', 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2' },
        { 'r1': 'record_1', 'r2': 'record_3', 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3' },
        { 'r1': 'record_2', 'r2': 'record_3', 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3' },
    ]

    temp = defaultdict(list)
    for item in test:
        temp[item['r1']].insert(0, item)
        temp[item['r2']].append(item)

    result = []
    for key, value in temp.items():
        new_item = {}
        new_item['avg'] = sum(item['corr'] for item in value) * 1.0 / len(value)
        new_item['record'] = key
        new_item['source'] = value[0]['r1_source'] if key == value[0]['r1'] else value[0]['r2_source']
        result.append(new_item)
    print(result)
Output:
[{'avg': 87.5, 'record': 'record_1', 'source': 'source_1'}, {'avg': 81.0, 'record': 'record_2', 'source': 'source_2'}, {'avg': 83.5, 'record': 'record_3', 'source': 'source_3'}]
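Update 1:
If r1 and r2 are lists, we can convert them to tuples to use as dict keys, then convert them back when building the output.
So the code looks like: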
    from collections import defaultdict

    record1 = [1, 2, 3]
    record2 = [4, 5, 6]
    record3 = [7, 8, 9]
    test = [
        { 'r1': record1, 'r2': record2, 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2' },
        { 'r1': record1, 'r2': record3, 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3' },
        { 'r1': record2, 'r2': record3, 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3' },
    ]

    temp = defaultdict(list)
    for item in test:
        temp[tuple(item['r1'])].insert(0, item)
        temp[tuple(item['r2'])].append(item)

    result = []
    for key, value in temp.items():
        new_item = {}
        new_item['avg'] = sum(item['corr'] for item in value) * 1.0 / len(value)
        new_item['record'] = list(key)
        new_item['source'] = value[0]['r1_source'] if list(key) == value[0]['r1'] else value[0]['r2_source']
        result.append(new_item)
    print(result)
Output:
[{'avg': 87.5, 'record': [1, 2, 3], 'source': 'source_1'}, {'avg': 81.0, 'record': [4, 5, 6], 'source': 'source_2'}, {'avg': 83.5, 'record': [7, 8, 9], 'source': 'source_3'}]
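The tuple conversion in Update 1 is needed because a list is unhashable and so cannot be used as a dict key. The short sketch below illustrates that point and the round trip back to a list. It assumes each record is a flat list of hashable items (a nested list would still fail), and the variable names are illustrative only.

```python
record = [1, 2, 3]

# Using the list itself as a key raises TypeError (unhashable type).
try:
    {record: 'source_1'}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# A tuple with the same items is hashable, so it works as a key.
by_record = {tuple(record): 'source_1'}

# Convert the keys back to lists when producing the output.
restored = [list(key) for key in by_record]

print(list_key_ok)
print(restored)
```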
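Answer 2 (score: 0)
Q1 - Dictionaries are by nature made of unique keys, so I don't believe you need to re-check for duplicates this way. You are also iterating over averaged_recs, which is empty on the first pass.
Q2 - You can use `or` in the if clause of the comprehension:

    [m['corr'] for m in matches if m['r1'] == match['r1'] or m['r2'] == match['r1']]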
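One way to combine the two suggestions is sketched below. It is not the asker's code: `average_records` is a hypothetical name, the output uses the `avg` key from the desired output in the question, and it assumes each record is a flat list of hashable items so that `tuple(record)` can serve as a dict key.

```python
def average_records(matches):
    averaged = {}  # tuple(record) -> output dict; dict keys are unique (Q1)
    for match in matches:
        for rec_key, src_key in (('r1', 'r1_source'), ('r2', 'r2_source')):
            record = match[rec_key]
            key = tuple(record)
            if key in averaged:
                continue  # this record has already been averaged
            # One comprehension with `or` replaces the two extend() calls (Q2).
            values = [m['corr'] for m in matches
                      if m['r1'] == record or m['r2'] == record]
            averaged[key] = {'record': record,
                             'source': match[src_key],
                             'avg': sum(values) / float(len(values))}
    return list(averaged.values())
```

This still rescans `matches` once per unique record, like the original. Unlike the two `extend` calls, the `or` form counts a match only once if the same record appears in both `r1` and `r2`.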