Question

Suppose I have a 750x750 matrix stored in a DataFrame, say df.

I want to find the columns containing the 4 highest values in each row, which I can easily do with:

a = df.values
a.sort(axis=1)
sorted_table = a[:,-4::]
b = a[:,::-1]

However, the result I get is just a list, with no index or column names.

[[ 98.      29.      15.      10.]
 [ 93.      91.      75.      60.]
 [ 48.      21.      17.      10.]
.
.
.
...]

What should I do if I want to know which column name each sorted value refers to?

I would like to display:

 df=

c1      c512    c20    c57     c310 
c2      c317    c133   c584    c80
c3      c499    c289   c703    c100
.       .    .    .   ...    .
.       .    .    .   ...    .
.       .    .    .   ...    .
c750    c89    c31    c546     c107

where

  c512 is referring to 98

  c20 is referring to 29

  c57 is referring to 15

and so on.

Answer 1

I doubt this is the best answer, but I think it works. I hate using for loops in pandas, but I couldn't think of a pure pandas way to do it.

import pandas as pd
import numpy as np

#array_size = 10

#--- Generate Data and create toy Dataframe ---
array_size = 750
np.random.seed(1)
data = np.random.randint(0, 1000000, array_size**2)
data = data.reshape((array_size, array_size))
df = pd.DataFrame(data, columns=['c'+str(i) for i in range(1, (array_size)+1)])
df.index = df.columns

#--- Transpose the dataframe to more familiarly sort by columns instead of rows ---
df = df.T

#--- Rank values in dataframe using max method where highest value is rank 1 ---
df = df.rank(method='max', ascending=False)

#--- Create empty dataframe to put data into ---
new_df = pd.DataFrame()

#--- For each column: keep ranks below 5 (the top 4), sort them, reset the index, drop the rank column i ---
for i in df.columns:
  s = df[i][df[i] < 5].sort_values().reset_index().drop(i, axis=1)
  new_df = pd.concat([new_df, s.T])

#--- The new_df index will say 'index', this reassigns the transposed column names to new_df's index
new_df.index = df.columns
print(new_df)

Output:

         0     1     2     3
c1    c479  c545  c614  c220
c2    c249  c535  c231  c680
c3    c657  c603  c137  c740
c4    c674  c424  c426  c127
...    ...   ...   ...   ...
c747  c251  c536  c321  c296
c748   c55  c383  c437  c103
c749  c138  c495  c299  c295
c750  c178  c556  c491  c445
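
As a possible alternative to the loop, here is a minimal sketch using `np.argsort`, which returns the positions of the sorted values rather than the values themselves, so the column labels can be looked up afterwards. The helper name `top_n_columns` and the small 3x5 toy frame are assumptions made for illustration, and the sketch assumes the data is numeric (it negates the values to sort in descending order).

```python
import numpy as np
import pandas as pd


def top_n_columns(df, n=4):
    """Hypothetical helper: per row, return the labels and values of the n largest entries."""
    values = df.to_numpy()
    # argsort on the negated array yields column positions in descending order of value
    order = np.argsort(-values, axis=1, kind="stable")[:, :n]
    # Look up the column labels at those positions
    names = pd.DataFrame(df.columns.to_numpy()[order], index=df.index)
    # Pull the matching values out of each row
    top_values = pd.DataFrame(np.take_along_axis(values, order, axis=1), index=df.index)
    return names, top_values


# Toy data standing in for the 750x750 frame (assumed, not from the question)
toy = pd.DataFrame(
    [[98, 3, 29, 15, 10],
     [60, 93, 75, 1, 91],
     [17, 48, 2, 21, 10]],
    index=["c1", "c2", "c3"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

names, top_values = top_n_columns(toy, n=4)
print(names)
print(top_values)

# Pair each label with its value for the first row
for name, value in zip(names.loc["c1"], top_values.loc["c1"]):
    print(f"{name} is referring to {value}")
```

With tied values, the stable sort keeps the leftmost column first and always returns exactly n labels per row. The rank-based loop above may behave differently here, since `method='max'` can push tied entries past the `< 5` cutoff.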

How do I get the column names using the values in a DataFrame?
