Correlation of vectors contained in a nested dictionary

Time: 2019-06-17 07:42:29

Tags: python loops dictionary for-loop python-2.x

I have a nested dictionary with the following structure:

{Cell_name_1 : {KPI_name_1: [value1, value2, ..., valueN], 
                KPI_name_2: [value1, value2, ..., valueN], 
                ..., 
                KPI_name_N: [value1, value2, ..., valueN]}, 
 Cell_name_2 : {KPI_name_1: [value1, value2, ..., valueN], ...}, 
 Cell_name_N : {....}}

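I want to check the correlation between the vectors contained in different cells (I have already defined this method, so it is just a helper function). Suppose: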

vector_1 = [64.0, 66.0, 53.5, 52.1, 54.0] #[values from KPI_name_1 from Cell_name_1]
vector_2 = [84.0, 86.0, 63.5, 72.1, 24.0] #[values from KPI_name_2 from Cell_name_2]

correlation(vector_1, vector_2)
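The question assumes a correlation helper already exists but does not show it. As a hypothetical stand-in (an assumption, not the asker's actual code), a plain Pearson correlation coefficient in pure Python could look like this. On Python 3.10+ the standard library's statistics.correlation computes the same Pearson value.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Hypothetical stand-in for the asker's own helper.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(x)
    # float() keeps the division correct on Python 2 as well
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    # Covariance numerator and the two variance numerators
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


vector_1 = [64.0, 66.0, 53.5, 52.1, 54.0]
vector_2 = [84.0, 86.0, 63.5, 72.1, 24.0]
print(correlation(vector_1, vector_2))
```

The result always lies between -1 and 1; a constant vector is rejected because its variance is zero and the coefficient would divide by zero.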

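I have tried various ways of looping through the dictionary (plain for loops, classic loops with while and conditions, etc.), but I can't find a way to get what I need.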

For example, the dictionary looks like this:

dic_sem = {'16895555': {'KPI_name_1': [64.0, 66.0, 53.5, 52.1, 54.0], 
                        'KPI_name_2': [54.0, 56.0, 23.5, 32.1, 84.0]}, 
           '16894444': {'KPI_name_1': [84.0, 86.0, 63.5, 72.1, 24.0], 
                        'KPI_name_2': [24.0, 26.0, 63.5, 92.1, 84.0]}}

'16895555' and '16894444' are different Cell_names.
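If different KPIs should also be compared across cells (the comment on vector_2 mentions KPI_name_2 of Cell_name_2), one option is to flatten the nested dictionary into (cell, KPI) keys first and then pair up vectors that come from different cells. This is a minimal sketch: the function name cross_cell_correlations is hypothetical, and the asker's correlation(x, y) helper is assumed to be passed in as a parameter.

```python
import itertools


def cross_cell_correlations(data, correlation):
    """Correlate every (cell, KPI) vector with every vector from a different cell.

    data: {cell_name: {kpi_name: [values...]}}
    correlation: any function taking two vectors and returning a number.
    Returns {((cell_a, kpi_a), (cell_b, kpi_b)): result}.
    """
    # Flatten the nested dict into ((cell, kpi), vector) items in a fixed order
    flat = [((cell, kpi), vec)
            for cell, kpis in sorted(data.items())
            for kpi, vec in sorted(kpis.items())]
    results = {}
    # combinations() yields each unordered pair exactly once
    for (key_a, vec_a), (key_b, vec_b) in itertools.combinations(flat, 2):
        if key_a[0] == key_b[0]:
            continue  # skip pairs that come from the same cell
        results[(key_a, key_b)] = correlation(vec_a, vec_b)
    return results
```

With the dic_sem above and the asker's helper, cross_cell_correlations(dic_sem, correlation) returns four entries: each KPI of '16894444' paired with each KPI of '16895555'.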

2 answers:

Answer 0 (score: 1)

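You can iterate over the dictionary and build a dictionary that maps each KPI name (e.g. KPI_name_1) to a list of lists containing your vectors:

from collections import defaultdict

vectors = defaultdict(list)

# Iterate over the inner {KPI_name: vector} dictionaries, one per cell
for value in dic_sem.values():
    # Append this cell's vector to the list for its KPI name
    for k, v in value.items():
        vectors[k].append(v)

print(dict(vectors))

The output will be

{'KPI_name_1': [[64.0, 66.0, 53.5, 52.1, 54.0], [84.0, 86.0, 63.5, 72.1, 24.0]], 'KPI_name_2': [[54.0, 56.0, 23.5, 32.1, 84.0], [24.0, 26.0, 63.5, 92.1, 84.0]]}

Then you can iterate over the values of this dictionary and call correlation accordingly:

for value in vectors.values():
    # value holds one vector per cell, e.g. [vector_from_cell_1, vector_from_cell_2]
    print(value[0], value[1])
    # With exactly two cells, *value unpacks the two vectors into the two arguments
    #correlation(*value)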

Answer 1 (score: 0)

Here itertools.product might help:


  import itertools
  import numpy as np

  # Get vector names (assuming keys present in all cells)
  field_names = list(dic_sem.values())[0].keys()

  # Precompute all pairs of cells 
  all_cell_pairs = list(itertools.product(dic_sem.keys(), dic_sem.keys()))

  corr = {}
  for field in field_names: 
      corr[field] = np.reshape([correlation(dic_sem[c1][field], dic_sem[c2][field]) for c1, c2 in all_cell_pairs], (len(dic_sem), -1))

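Note that we do more than twice the necessary computation here: the correlation matrix is symmetric, so it is enough to compute only the upper or lower triangle (e.g. with itertools.combinations), excluding the diagonal (which is all 1s). But the above should point you in the right direction.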
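Following that suggestion, here is a sketch that uses itertools.combinations so each unordered pair of distinct cells is computed only once, skipping both the diagonal and the mirrored half. Like the answer, it assumes every cell has the same KPI names; the function name pairwise_cell_correlations is hypothetical, the correlation helper is passed in as a parameter, and the result is a plain dictionary keyed by (cell_a, cell_b) rather than a NumPy matrix.

```python
import itertools


def pairwise_cell_correlations(data, correlation):
    """For each KPI, correlate that KPI's vector across every pair of distinct cells.

    Returns {kpi_name: {(cell_a, cell_b): result}}, one entry per unordered pair.
    """
    cells = sorted(data)
    # Assumes every cell has the same KPI names
    field_names = sorted(data[cells[0]]) if cells else []
    corr = {}
    for field in field_names:
        # combinations() gives only the upper triangle: no (a, a), no (b, a) after (a, b)
        corr[field] = {
            (c1, c2): correlation(data[c1][field], data[c2][field])
            for c1, c2 in itertools.combinations(cells, 2)
        }
    return corr
```

For n cells this makes n(n-1)/2 calls per KPI instead of the n² calls of the itertools.product version; to look up a pair in either order, check both (a, b) and (b, a).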