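I have a square two-dimensional pandas DataFrame, and I want to extract the elements that satisfy the following conditions:
For each extracted element I want to print: the extracted value, the column label, and the row label.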
Answer 0: (score: 2)
First replace the values that are not greater than 0, as well as the diagonal, with NaN, then reshape with stack:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE')) - 3
print(df)
   A  B  C  D  E
0  5  5  0  4  4
1 -3  1 -1  2 -1
2 -1 -1 -2 -3  5
3  1 -3  6  3 -1
4  1 -2  2  0  1
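# keep only values > 0; everything else becomes NaN
df = df.where(df > 0)
# blank out the diagonal in place (relies on df.values being a writable view)
np.fill_diagonal(df.values, np.nan)
# stack() leaves out the NaN cells here and returns a Series indexed by (row, column)
df = df.stack().reset_index()
df.columns = ['idx', 'col', 'val']
print(df)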
   idx col  val
0    0   B  5.0
1    0   D  4.0
2    0   E  4.0
3    1   D  2.0
4    2   E  5.0
5    3   A  1.0
6    3   C  6.0
7    4   A  1.0
8    4   C  2.0
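The in-place `np.fill_diagonal(df.values, ...)` step depends on `df.values` handing back a writable view of the frame's data, which may not hold in every pandas version or for mixed-dtype frames. A minimal sketch of an alternative, assuming the same two conditions (value greater than 0, not on the diagonal), builds a boolean mask instead. The names `raw`, `off_diagonal`, `mask`, and `out` are illustrative and not part of the answer above:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
raw = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list('ABCDE')) - 3

# True where the cell is off the main diagonal
off_diagonal = ~np.eye(len(raw), dtype=bool)
# True where the value is positive AND the cell is off the diagonal
mask = (raw > 0).to_numpy() & off_diagonal

# where() turns masked-out cells into NaN; the explicit dropna() makes the
# result independent of whether stack() itself drops NaN
out = raw.where(mask).stack().dropna().reset_index()
out.columns = ['idx', 'col', 'val']
print(out)
```

Because nothing is written back into the frame, `raw` keeps its original integer values and can be reused afterwards.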
Explanation:
stack creates a Series with a MultiIndex:
print (df.stack())
0  B    5.0
   D    4.0
   E    4.0
1  D    2.0
2  E    5.0
3  A    1.0
   C    6.0
4  A    1.0
   C    2.0
dtype: float64
Then reset_index() creates columns from the MultiIndex:
print (df.stack().reset_index())
   level_0 level_1    0
0        0       B  5.0
1        0       D  4.0
2        0       E  4.0
3        1       D  2.0
4        2       E  5.0
5        3       A  1.0
6        3       C  6.0
7        4       A  1.0
8        4       C  2.0
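The default names `level_0`, `level_1`, and `0` can be avoided by naming the index levels and the values before resetting. The sketch below shows one way to do that and also prints each element as value, column label, row label, as the question asks. The small frame and the names `frame`, `stacked`, and `tidy` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Small hand-made frame (illustrative data, not taken from the answer)
frame = pd.DataFrame(
    [[np.nan, 5.0, np.nan],
     [1.0, np.nan, np.nan],
     [np.nan, 2.0, np.nan]],
    index=['r0', 'r1', 'r2'],
    columns=['A', 'B', 'C'],
)

# Series indexed by (row label, column label); NaN cells removed
stacked = frame.stack().dropna()

# Name the index levels and the values so reset_index() yields readable columns
tidy = stacked.rename_axis(['idx', 'col']).reset_index(name='val')
print(tidy)

# Print value, column label, row label for each remaining element
for (row_label, col_label), value in stacked.items():
    print(value, col_label, row_label)
```

Iterating over the stacked Series directly avoids building the intermediate DataFrame when only the printed triples are needed.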
Answer 1: (score: 2)
You can do this with NumPy by replacing the unwanted numbers with NaN:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-5, 6, (5, 5)))
arr = df.values.astype(float)
np.fill_diagonal(arr, np.nan) # exclude diagonal
arr[arr <= 0] = np.nan # filter for > 0
print(arr)
[[nan  2.  4. nan nan]
 [nan nan nan nan  3.]
 [nan nan nan nan nan]
 [ 2. nan  3. nan  4.]
 [nan  4.  1. nan nan]]
nan_filter = ~np.isnan(arr)
# aggregate indices with values
res = np.hstack((np.argwhere(nan_filter), arr[nan_filter][:, None]))
print(res)
[[0. 1. 2.]
 [0. 2. 4.]
 [1. 4. 3.]
 [3. 0. 2.]
 [3. 2. 3.]
 [3. 4. 4.]
 [4. 1. 4.]
 [4. 2. 1.]]
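The array above holds positional row and column indices (stored as floats), not labels. If the DataFrame has non-default labels, the positions can be mapped back through `df.index` and `df.columns`. A minimal sketch, assuming the same filtering rules and using a made-up labelled frame (the names `keep`, `rows`, `cols`, and `result` are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative frame with non-default labels (hypothetical data)
df = pd.DataFrame(
    [[1, 2, -1],
     [0, 5, 3],
     [4, -2, 6]],
    index=['x', 'y', 'z'],
    columns=['A', 'B', 'C'],
)

arr = df.to_numpy(dtype=float, copy=True)  # float copy, so NaN can be stored
np.fill_diagonal(arr, np.nan)              # exclude the diagonal
keep = arr > 0                             # NaN compares as False, so the diagonal stays out

rows, cols = np.nonzero(keep)              # positional indices in row-major order
result = pd.DataFrame({
    'val': arr[keep],
    'col': df.columns[cols],
    'idx': df.index[rows],
})
print(result)
```

Keeping the labels in their own columns also avoids forcing the indices to float, which `np.hstack` does when it combines them with the values.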