Get certain elements from a Pandas DataFrame

Time: 2018-06-10 15:25:53

Tags: python pandas dataframe

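I have a square 2D pandas DataFrame, and I want to extract the elements that satisfy the following conditions:

  • They must be greater than zero
  • They must be off the main diagonal

For each match I would like to print: the extracted value, the column label, and the row label.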

2 answers:

Answer 0: (score: 2)

First replace the values that are not greater than 0, and the diagonal, with NaN, then reshape with stack:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE')) - 3
print (df)
   A  B  C  D  E
0  5  5  0  4  4
1 -3  1 -1  2 -1
2 -1 -1 -2 -3  5
3  1 -3  6  3 -1
4  1 -2  2  0  1

df = df.where(df > 0)
np.fill_diagonal(df.values, np.nan)

df = df.stack().reset_index()
df.columns=['idx','col','val']
print (df)
   idx col  val
0    0   B  5.0
1    0   D  4.0
2    0   E  4.0
3    1   D  2.0
4    2   E  5.0
5    3   A  1.0
6    3   C  6.0
7    4   A  1.0
8    4   C  2.0
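
The np.fill_diagonal(df.values, np.nan) step above relies on df.values being a writable view of the frame, which may not hold in every pandas version or configuration (for example with Copy-on-Write enabled). A minimal sketch of the same idea using a boolean mask instead, with a small hand-written frame as hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical example data (not the random frame from the answer)
df = pd.DataFrame(
    [[1, 2, -1],
     [3, -4, 0],
     [-2, 5, 6]],
    index=["r0", "r1", "r2"],
    columns=["A", "B", "C"],
)

# True everywhere except on the main diagonal
off_diag = ~np.eye(len(df), dtype=bool)

# True where the value is positive AND off the main diagonal
keep = (df > 0).to_numpy() & off_diag

# where() turns everything else into NaN without mutating df in place;
# dropna() makes the removal of NaN explicit after stacking
out = df.where(keep).stack().dropna().reset_index()
out.columns = ["idx", "col", "val"]
print(out)
```

Because nothing is written into the frame's underlying array, the original df is left untouched.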

Explanation:

stack creates a Series with a MultiIndex:

print (df.stack())
0  B    5.0
   D    4.0
   E    4.0
1  D    2.0
2  E    5.0
3  A    1.0
   C    6.0
4  A    1.0
   C    2.0
dtype: float64
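
Note that the output above contains no NaN rows. As far as I know, the older stack implementation drops NaN by default, while the newer implementation (opt-in via future_stack=True since pandas 2.1) keeps them, so this behaviour may depend on your pandas version. A small sketch, on a hypothetical 2x2 frame, that calls dropna() explicitly so the result is the same either way:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value
small = pd.DataFrame([[1.0, np.nan], [3.0, 4.0]], columns=["A", "B"])

# stack moves the column labels into a second index level
stacked = small.stack()

# Explicitly drop NaN; this is a no-op if stack already removed them
cleaned = stacked.dropna()

print(cleaned)
print(cleaned.index.nlevels)  # number of levels in the MultiIndex
```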

Then reset_index() turns the MultiIndex levels into columns:

print (df.stack().reset_index())

   level_0 level_1    0
0        0       B  5.0
1        0       D  4.0
2        0       E  4.0
3        1       D  2.0
4        2       E  5.0
5        3       A  1.0
6        3       C  6.0
7        4       A  1.0
8        4       C  2.0
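
The default column names level_0, level_1 and 0 are why the answer assigns df.columns afterwards. One possible alternative, sketched here on a hypothetical hand-built Series, is to name the index levels and the value column as part of the same chain:

```python
import pandas as pd

# Hypothetical stacked Series with a two-level index
s = pd.Series(
    [5.0, 4.0, 2.0],
    index=pd.MultiIndex.from_tuples([(0, "B"), (0, "D"), (1, "D")]),
)

# rename_axis names the index levels; reset_index(name=...) names the values
out = s.rename_axis(["idx", "col"]).reset_index(name="val")
print(out)
```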

Answer 1: (score: 2)

You can do this with NumPy by replacing the unwanted numbers with NaN:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-5, 6, (5, 5)))
arr = df.values.astype(float)

np.fill_diagonal(arr, np.nan)  # exclude diagonal
arr[arr <= 0] = np.nan         # filter for > 0

print(arr)

[[nan  2.  4. nan nan]
 [nan nan nan nan  3.]
 [nan nan nan nan nan]
 [ 2. nan  3. nan  4.]
 [nan  4.  1. nan nan]]

nan_filter = ~np.isnan(arr)

# aggregate indices with values
res = np.hstack((np.argwhere(nan_filter), arr[nan_filter][:, None]))

print(res)

[[0. 1. 2.]
 [0. 2. 4.]
 [1. 4. 3.]
 [3. 0. 2.]
 [3. 2. 3.]
 [3. 4. 4.]
 [4. 1. 4.]
 [4. 2. 1.]]
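
The first two columns of res are integer positions, not labels. If the frame has non-default row or column labels, the positions can be mapped back through df.index and df.columns. A minimal sketch under that assumption, with hypothetical labels and data:

```python
import numpy as np
import pandas as pd

# Hypothetical labelled frame
df = pd.DataFrame(
    [[0, 2, -1],
     [3, 7, 0],
     [-2, 5, 1]],
    index=["x", "y", "z"],
    columns=["A", "B", "C"],
)

# Work on a float copy so the original frame is not modified
arr = df.to_numpy(dtype=float, copy=True)
np.fill_diagonal(arr, np.nan)  # exclude diagonal
arr[arr <= 0] = np.nan         # keep only values > 0

# Positions of the surviving entries, in row-major order
rows, cols = np.nonzero(~np.isnan(arr))

# Translate positions into row and column labels
res = pd.DataFrame({
    "row": df.index[rows],
    "col": df.columns[cols],
    "val": arr[rows, cols],
})
print(res)
```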