Return the index and column name of the n-th largest value in a Pandas data series

Time: 2017-02-10 18:04:23

Tags: python pandas

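How can I (efficiently, for matrices much larger than the example provided) return the column name and index (or row name) of the n-th largest or smallest value?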

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
matrix = df.corr()
matrix
          A         B         C         D
A  1.000000 -0.814913  0.495993 -0.880296
B -0.814913  1.000000 -0.211421  0.551441
C  0.495993 -0.211421  1.000000 -0.414037
D -0.880296  0.551441 -0.414037  1.000000

Then I would do something like this:

def get_n_smallest(matrix, n):
    # can return as two variables, list, tuple, whatever...
    return row_name, col_name

get_n_smallest(matrix,0)
# would return D, A for the value -.880296

1 Answer:

Answer 0: (score: 1)

I think you can use stack to get a Series with a MultiIndex, then remove duplicate values with drop_duplicates, sort with sort_values, and pick the n-th label pair from the index:

np.random.seed(100)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
matrix = df.corr()
print (matrix)
          A         B         C         D
A  1.000000  0.570860 -0.558334 -0.434793
B  0.570860  1.000000 -0.358834 -0.564178
C -0.558334 -0.358834  1.000000  0.170589
D -0.434793 -0.564178  0.170589  1.000000

print (matrix.stack().drop_duplicates().sort_values())
B  D   -0.564178
A  C   -0.558334
   D   -0.434793
B  C   -0.358834
C  D    0.170589
A  B    0.570860
   A    1.000000
dtype: float64

def get_n_smallest(matrix, n):
    return matrix.stack().drop_duplicates().sort_values().index[n]

print (get_n_smallest(matrix,0))
('B', 'D')

print (get_n_smallest(matrix,1))
('A', 'C')

print (get_n_smallest(matrix,2))
('A', 'D')

The same approach, sorted in descending order, gives the n-th largest values:

def get_n_largest(matrix, n):
    return matrix.stack().drop_duplicates().sort_values(ascending=False).index[n]


print (get_n_largest(matrix,0))
('A', 'A')

print (get_n_largest(matrix,1))
('A', 'B')

print (get_n_largest(matrix,2))
('C', 'D')
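
Note that drop_duplicates compares values only. Two different pairs that happen to share the same correlation would collapse into one entry, and the diagonal survives once as ('A', 'A'). A possible alternative, sketched below, is to read only the strict upper triangle of the matrix with NumPy, so that each unordered pair appears exactly once and the diagonal is skipped. The helper name nth_extreme_pair and its largest parameter are hypothetical and not part of the original answer, and the sketch assumes the matrix is square and symmetric, as df.corr() output is. The sample values are the ones from the matrix printed in the question.

```python
import numpy as np
import pandas as pd


def nth_extreme_pair(corr, n, largest=False):
    """Hypothetical helper: labels of the n-th smallest (or largest)
    off-diagonal value of a square, symmetric DataFrame."""
    values = corr.to_numpy()
    # Positions of the strict upper triangle (k=1 skips the diagonal),
    # so every unordered pair of labels is visited exactly once.
    rows, cols = np.triu_indices(values.shape[0], k=1)
    pair_values = values[rows, cols]
    # Ascending order of the pair values; reversed for "largest".
    order = np.argsort(pair_values, kind="stable")
    if largest:
        order = order[::-1]
    pos = order[n]
    return corr.index[rows[pos]], corr.columns[cols[pos]]


# Correlation matrix copied from the question.
labels = list('ABCD')
question_matrix = pd.DataFrame(
    [[1.000000, -0.814913, 0.495993, -0.880296],
     [-0.814913, 1.000000, -0.211421, 0.551441],
     [0.495993, -0.211421, 1.000000, -0.414037],
     [-0.880296, 0.551441, -0.414037, 1.000000]],
    index=labels, columns=labels)

print(nth_extreme_pair(question_matrix, 0))                # ('A', 'D')
print(nth_extreme_pair(question_matrix, 1))                # ('A', 'B')
print(nth_extreme_pair(question_matrix, 0, largest=True))  # ('B', 'D')
```

Because only the upper triangle is read, the pair for -0.880296 comes back as ('A', 'D') rather than ('D', 'A'), and the largest result is the strongest off-diagonal correlation instead of the diagonal 1.0. If only a few extremes of a very large matrix are needed, np.argpartition might be substituted for the full argsort, though that has not been measured here.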