Question

After calling pandas `df.corr()` I have a `corr_matrix` that looks like this:

id | 1   2   3   4   5
---|-------------------
1  | 1  .8  .2  .5 -.1
2  |.1   1 -.4 -.1  .8
.....other ids corr

What I want to get is a DataFrame like this:

1 | 2  3  4  5
--|-----------
1 | 2 ....other
2 | 5
4 | 1
3 | 
  |

Consider the first column:

1 | 
--|
1 | 
2 |
4 |
3 |

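Here the column name 1 is an id, and its values are the ids most correlated with it, sorted from highest to lowest correlation. An id is perfectly correlated with itself, so 1 is at the top. Ids 1 and 5 are negatively correlated, so 5 is not included.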

This is what I am doing:

df = pd.DataFrame(corr_matrix.unstack())
df.sort_values(by=["id", 0], ascending=[True, False], inplace=True)
df.reset_index(inplace=True)
df.rename({"level_0": "alternative", 0: "corr"}, 
           axis='columns', inplace=True)
df = df[~(df['corr'] < 0)]

This gives me:

 alternative id corr
0   3        3  1.000000
1   291      3  0.777778
2   171      3  0.654654
3   567      3  0.654654
4   561      3  0.554826
5   176      3  0.518476
6   579      3  0.518476

Then I did:

corr_dict = a.groupby('id').apply(lambda x: dict(zip(x.alternative, x.corr))).to_dict()

This raises the following error:

TypeError: zip argument #2 must support iteration
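
One likely cause, assuming the grouped frame has a column literally named `corr`: attribute access `x.corr` resolves to the `DataFrame.corr` method rather than the column, so `zip` receives something it cannot iterate over. A minimal sketch with made-up data (the frame `a` and its values are hypothetical) that uses bracket access instead:

```python
import pandas as pd

# Hypothetical long-format data shaped like the output above.
a = pd.DataFrame({
    "alternative": [3, 291, 171, 5, 7],
    "id": [3, 3, 3, 5, 5],
    "corr": [1.0, 0.777778, 0.654654, 1.0, 0.5],
})

# a.corr is the DataFrame.corr method; a["corr"] is the column.
# Bracket access avoids the name collision.
corr_dict = {
    key: dict(zip(group["alternative"], group["corr"]))
    for key, group in a.groupby("id")
}
```

Renaming the column to something that does not clash with a DataFrame method (for example `corr_value`) would probably work as well.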

I am thinking of converting it to a dictionary and then turning that back into a DataFrame with:

from pandas.io.json import json_normalize
new_df = json_normalize(corr_dict)

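I am new to pandas, so is there a simpler way to do this? The ids are actually product ids, and for each product I am trying to find all positively correlated products, in descending order of correlation. I will need to use the result elsewhere later.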
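
One possible shortcut, sketched under the assumption that the correlation matrix is square with product ids as both index and columns: rank each column directly instead of going through `unstack`. The matrix values and the helper name `ranked_positive` are hypothetical, and this sketch keeps only strictly positive correlations.

```python
import pandas as pd

# Hypothetical 3x3 correlation matrix; in practice this would come from df.corr().
corr_matrix = pd.DataFrame(
    [[1.0, 0.8, -0.1],
     [0.8, 1.0, 0.3],
     [-0.1, 0.3, 1.0]],
    index=[1, 2, 3],
    columns=[1, 2, 3],
)

def ranked_positive(col):
    """Return the ids with strictly positive correlation, highest first."""
    positive = col[col > 0].sort_values(ascending=False)
    return pd.Series(positive.index.to_numpy())

# Wrapping each result in a Series lets pandas pad shorter columns with NaN.
ranked = pd.DataFrame(
    {pid: ranked_positive(corr_matrix[pid]) for pid in corr_matrix.columns}
)
```

Each column of `ranked` then lists the ids related to that product, with the product itself first because its self-correlation is 1.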

Update

This works for me, though:

corr_dict= a.groupby('product_id')
my_dict = {}
for product_id, product_id_alternatives_df in corr_dict:
    my_dict[product_id] = list(product_id_alternatives_df.alternative)

I would like to know whether there is a simpler way.
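
A shorter form of the loop above might look like the following sketch. It assumes a frame with `product_id` and `alternative` columns that is already sorted by correlation within each product; the sample data is made up.

```python
import pandas as pd

# Hypothetical long-format frame, already sorted within each product.
a = pd.DataFrame({
    "product_id": [3, 3, 3, 5, 5],
    "alternative": [3, 291, 171, 5, 7],
})

# Select the column, collect each group into a list, then convert to a dict.
my_dict = a.groupby("product_id")["alternative"].apply(list).to_dict()
```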

Update 2

ValueError: arrays must all be same length

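I get this ValueError because the arrays have different lengths. How can I skip those? The lengths differ because the negatively correlated ids were dropped, not because of the ids themselves.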
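
One way to sidestep the length mismatch, assuming the goal is a DataFrame with one column per product, is to wrap each list in a `pd.Series` so that pandas fills the missing positions with NaN. The dictionary below is hypothetical sample data.

```python
import pandas as pd

# Hypothetical dict of lists with different lengths.
my_dict = {3: [3, 291, 171], 5: [5, 7]}

# pd.DataFrame(my_dict) raises a ValueError because the lists differ in length.
# Converting each list to a Series makes pandas align on position
# and fill the gaps with NaN.
new_df = pd.DataFrame({key: pd.Series(values) for key, values in my_dict.items()})
```

Columns that were padded become float, since NaN cannot be stored in a plain integer column.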

Get pandas corr matrix, most correlated ids per column, TypeError

0 answers: