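How do I (efficiently, for matrices much larger than the example provided) return the column name
and index (or row name) of the n-th largest or smallest value?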
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
matrix = df.corr()
matrix
          A         B         C         D
A  1.000000 -0.814913  0.495993 -0.880296
B -0.814913  1.000000 -0.211421  0.551441
C  0.495993 -0.211421  1.000000 -0.414037
D -0.880296  0.551441 -0.414037  1.000000
Then I would like to do something like this:
def get_n_smallest(matrix, n):
    # can return as two variables, list, tuple, whatever...
    return row_name, col_name
get_n_smallest(matrix,0)
# would return D, A for the value -.880296
Answer 0 (score: 1)
I think you can use stack to get a Series with a MultiIndex, then drop_duplicates to remove the duplicates, sort_values to sort, and select the n-th entry through index:
np.random.seed(100)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
matrix = df.corr()
print (matrix)
          A         B         C         D
A  1.000000  0.570860 -0.558334 -0.434793
B  0.570860  1.000000 -0.358834 -0.564178
C -0.558334 -0.358834  1.000000  0.170589
D -0.434793 -0.564178  0.170589  1.000000
print (matrix.stack().drop_duplicates().sort_values())
B  D   -0.564178
A  C   -0.558334
   D   -0.434793
B  C   -0.358834
C  D    0.170589
A  B    0.570860
   A    1.000000
dtype: float64
def get_n_smallest(matrix, n):
    return matrix.stack().drop_duplicates().sort_values().index[n]
print (get_n_smallest(matrix,0))
('B', 'D')
print (get_n_smallest(matrix,1))
('A', 'C')
print (get_n_smallest(matrix,2))
('A', 'D')
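One caveat with the function above: drop_duplicates compares values, so two different pairs that happen to share exactly the same correlation would collapse into one entry, and the diagonal (always 1.0) stays in the result. A minimal sketch of an alternative, assuming the matrix is square and symmetric with identical row and column labels, keeps only the entries strictly above the diagonal before stacking. The name get_n_smallest_pair is hypothetical, and the matrix literal simply repeats the values printed above.

```python
import numpy as np
import pandas as pd

def get_n_smallest_pair(matrix, n):
    # Boolean mask that is True strictly above the diagonal (k=1 skips the diagonal itself).
    upper = np.triu(np.ones(matrix.shape, dtype=bool), k=1)
    # where() replaces everything outside the mask with NaN; after stack() and
    # dropna() exactly one entry per unordered pair of labels is left.
    pairs = matrix.where(upper).stack().dropna()
    return pairs.sort_values().index[n]

labels = list('ABCD')
matrix = pd.DataFrame(
    [[ 1.000000,  0.570860, -0.558334, -0.434793],
     [ 0.570860,  1.000000, -0.358834, -0.564178],
     [-0.558334, -0.358834,  1.000000,  0.170589],
     [-0.434793, -0.564178,  0.170589,  1.000000]],
    index=labels, columns=labels)

print(get_n_smallest_pair(matrix, 0))  # ('B', 'D')
print(get_n_smallest_pair(matrix, 2))  # ('A', 'D')
```

Because the mask is positional, pairs are identified by where they sit in the matrix and not by their value, so ties between different pairs are preserved.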
For the largest values:
def get_n_largest(matrix, n):
    return matrix.stack().drop_duplicates().sort_values(ascending=False).index[n]
print (get_n_largest(matrix,0))
('A', 'A')
print (get_n_largest(matrix,1))
('A', 'B')
print (get_n_largest(matrix,2))
('C', 'D')
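Since the question asks about much larger matrices, sorting every entry may be more work than necessary. The following is only a sketch of one possible approach, not part of the answer above: it takes the upper triangle with NumPy and uses np.argpartition so that only the n selected entries are fully sorted. The function name n_extreme_pairs and its largest flag are hypothetical. It assumes a square, symmetric matrix and 1 <= n <= number of off-diagonal pairs, and it returns the n most extreme pairs at once instead of only the n-th one.

```python
import numpy as np
import pandas as pd

def n_extreme_pairs(matrix, n, largest=False):
    """Return the n smallest (or largest) off-diagonal pairs as (row, col, value) tuples."""
    values = matrix.to_numpy()
    # Positions strictly above the diagonal: one per unordered pair of labels.
    rows, cols = np.triu_indices_from(values, k=1)
    flat = values[rows, cols]
    # Negate the values so that "largest" becomes a "smallest" search.
    keys = -flat if largest else flat
    # argpartition moves the n smallest keys to the front without a full sort.
    picked = np.argpartition(keys, n - 1)[:n]
    # Order only the n selected entries.
    picked = picked[np.argsort(keys[picked])]
    return [(matrix.index[rows[i]], matrix.columns[cols[i]], flat[i]) for i in picked]

labels = list('ABCD')
matrix = pd.DataFrame(
    [[ 1.000000,  0.570860, -0.558334, -0.434793],
     [ 0.570860,  1.000000, -0.358834, -0.564178],
     [-0.558334, -0.358834,  1.000000,  0.170589],
     [-0.434793, -0.564178,  0.170589,  1.000000]],
    index=labels, columns=labels)

for row, col, value in n_extreme_pairs(matrix, 2):
    print(row, col, value)
for row, col, value in n_extreme_pairs(matrix, 2, largest=True):
    print(row, col, value)
```

Unlike get_n_largest(matrix, 0) above, which returns the trivial diagonal pair ('A', 'A'), this sketch never considers the diagonal, so the largest result is the strongest correlation between two different columns.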