Question

I am iterating over the rows of a DataFrame and sorting each row to find its top 3 values.

Here is a sample dataset:

device_id   s2  s41 s47 s14 s24 s36 s4  s23 s10
3           0   0   0   0.002507676 0   0   0   0   0
5           0   0   0   0   0   0   0   0   0
23          0   0   0   0   0   0   0   0   0
42          0   0   0   0   0   0   0   0   0
61          0   0   0   0   0   0   0   0   0
49          0   0   0   0   0   0   0   0   7.564063476
54          0   0   0   0   0   0   0   0.001098988 0

The loop below sorts each row in descending order so that I can read off the top 3 values:

for index, row in df.iterrows():
    row_sorted = row.sort_values(ascending=False)
    print(index, row_sorted)

This is a sample output:

123 s16    1.054018
    s17    0.000000
    s26    0.000000

I have also tried the following code:

top_n = 3
pd.DataFrame({n: df.T[col].nlargest(top_n).index.tolist()
              for n, col in enumerate(df.T)}).T

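It does everything in one go, but this is the output:

49 s16 s1 s37 - 49 is the row number here.

As you can see, the two outputs do not match; the first one is correct.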

What I am looking for is a final dictionary that has the index as the key and the top 3 columns as the values:

{123: ['s16', 's17', 's26']}

These will be used in the next step to iterate over another dictionary, to_map, which has the following structure: "ID": ["s26", "International", "E", "B_TV"]. From it I will pick "E" and "B_TV".
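For illustration only, the lookup step described above might be sketched as follows. The question does not show how to_map is keyed or consumed, so this assumes the first element of each entry is the column name and that the third and fourth elements are the ones to pick; `top_cols`, `by_column`, and `picked` are hypothetical names.

```python
# Hypothetical stand-in for the desired dictionary from the question.
top_cols = {123: ["s16", "s17", "s26"]}

# Structure as described in the question; "ID" is used literally as the key.
to_map = {"ID": ["s26", "International", "E", "B_TV"]}

# Assumption: entry[0] is the column name, entry[2] and entry[3] are wanted.
by_column = {entry[0]: (entry[2], entry[3]) for entry in to_map.values()}

# For each index, keep the mapped values of the top columns that appear in to_map.
picked = {
    key: [by_column[col] for col in cols if col in by_column]
    for key, cols in top_cols.items()
}
print(picked)
```

Columns with no entry in to_map are silently skipped here; raising an error instead may be preferable if every column is expected to be present.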

Answer 1

Try this vectorized approach:

Sample DF:

In [80]: df = pd.DataFrame(np.random.randint(10, size=(5,7)), columns=['id']+list('abcdef'))
    ...: df = df.set_index('id')
    ...:

In [81]: df
Out[81]:
    a  b  c  d  e  f
id
4   4  0  8  8  4  8
0   2  4  7  3  1  4
9   3  6  5  7  3  4
5   7  6  3  8  9  1
6   3  7  6  1  7  9

Solution:

In [82]: idx = np.argsort(df.values, axis=1)[:, ::-1][:, :3]

In [83]: pd.DataFrame(np.take(df.columns, idx), index=df.index).T.to_dict('list')
Out[83]:
{0: ['c', 'f', 'b'],
 4: ['f', 'd', 'c'],
 5: ['e', 'd', 'a'],
 6: ['f', 'e', 'b'],
 9: ['d', 'b', 'c']}
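The session above uses random data, so it cannot be re-run to the same result. Below is a self-contained sketch of the same argsort idea with the sample values hard-coded. It is a variation rather than the exact code above: it indexes the column names with a plain NumPy array and builds the dictionary with zip, and it passes kind="stable" (an addition here) so that tied values are ordered deterministically.

```python
import numpy as np
import pandas as pd

# Same values as the sample frame printed above, hard-coded instead of random.
df = pd.DataFrame(
    [[4, 0, 8, 8, 4, 8],
     [2, 4, 7, 3, 1, 4],
     [3, 6, 5, 7, 3, 4],
     [7, 6, 3, 8, 9, 1],
     [3, 7, 6, 1, 7, 9]],
    index=pd.Index([4, 0, 9, 5, 6], name="id"),
    columns=list("abcdef"),
)

# Ascending stable argsort per row, then reversed to get largest-first order.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")[:, ::-1]

# Keep the first three column positions and translate them to column names.
top3 = df.columns.to_numpy()[order[:, :3]]

# One dictionary entry per index label.
result = dict(zip(df.index.tolist(), top3.tolist()))
print(result)
```

Because the ascending order is reversed, ties are resolved in favour of the later column: row 4 has three 8s in c, d and f, and they come out as f, d, c.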

PS: to select a different number of columns, replace [:, :3] with [:, :top_n].
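As a rough sketch of that parameterised form, the slicing can be wrapped in a small helper. The name `top_columns` and the demo frame are hypothetical. This version also sorts the negated values with a stable sort, which is a different tie rule from the reversed argsort above: tied values keep their left-to-right column order, which may matter for data with many zeros. It assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

def top_columns(df, top_n=3):
    """Map each index label to the names of its top_n largest columns."""
    values = df.to_numpy()
    # Sorting the negated values ascending gives descending order of the
    # originals; the stable sort keeps ties in left-to-right column order.
    order = np.argsort(-values, axis=1, kind="stable")[:, :top_n]
    names = df.columns.to_numpy()[order]
    return dict(zip(df.index.tolist(), names.tolist()))

# Hypothetical demo data using column names in the style of the question.
demo = pd.DataFrame(
    {"s2": [0.0, 0.5], "s41": [3.0, 0.1], "s47": [1.0, 0.9], "s14": [2.0, 0.0]},
    index=[123, 49],
)

print(top_columns(demo, top_n=2))
print(top_columns(demo))
```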

Select the top three columns for each row and save the result, along with the index, in a dictionary in Python
