I have a numeric vector values (a Series of numbers in a pandas DataFrame df).
idx values
0 NaN
1 1
2 2
3 NaN
4 NaN
5 33
6 34
7 90
8 NaN
9 5
10 NaN
11 22
12 70
13 NaN
14 672
15 10
16 73
17 9
18 NaN
19 15
I then constructed a logical matrix of the form
array([[1, 1, 1, ..., 0, 0, 0],
[0, 1, 1, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 1, 0, 0],
[0, 0, 0, ..., 1, 1, 0],
[0, 0, 0, ..., 1, 1, 1]])
using the following code, taken from an SO answer that I unfortunately can no longer find.
n=len(df)
k=5
r= n-k+1
mat=np.tile([1]*k+[0]*r, r)[:-r].reshape(r,n)
mat will have shape (r, n), while df['values'] will have shape (n,).
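What is the correct way to fill mat with the values from df['values'], so that each 1 in mat is replaced by the value at the same column position?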
Given the example above, my expected output would be:
array([[NaN, 1, 2, NaN, ..., 0, 0, 0],
[ 0, 1, 2,NaN,NaN, ..., 0, 0, 0],
[ 0, 0, 2,NaN,NaN,33, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 672, 10, 73, 9, 0, 0],
[0, 0, 0, ..., 10,73, 9, NaN, 0],
[0, 0, 0, ..., 73, 9, NaN, 15]])
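Any suggestions on how to achieve this?
I tried using a dot product (hoping it would behave as in MATLAB and replicate the vector r times), but that did not work.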
Answer 0 (score: 2)
You can use numpy.apply_along_axis together with numpy.where:
#!/usr/bin/env python3
import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame([
nan, 1, 2, nan, nan, 33, 34, 90,
nan, 5, nan, 22, 70, nan, 672,
10, 73, 9, nan, 15],
columns=['values'])
n = len(df)
k = 5
r = n - k + 1
mat = np.tile([1] * k + [0] * r, r)[:-r].reshape(r, n)
mat = np.apply_along_axis(lambda row: np.where(row, df['values'], row), 1, mat)
print(mat)
Output:
[[ nan 1. 2. nan nan 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 1. 2. nan nan 33. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 2. nan nan 33. 34. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. nan nan 33. 34. 90. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. nan 33. 34. 90. nan 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 33. 34. 90. nan 5. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 34. 90. nan 5. nan 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 90. nan 5. nan 22. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. nan 5. nan 22. 70. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. nan 22. 70. nan 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. nan 22. 70. nan 672. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 22. 70. nan 672. 10. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 70. nan 672. 10. 73. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. nan 672. 10. 73. 9. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 672. 10. 73. 9. nan 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 10. 73. 9. nan 15.]]
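As a possible simplification, numpy.where broadcasts its arguments, so the per-row lambda may not be necessary. The following is a minimal sketch under that assumption; the names values and filled are illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [nan, 1, 2, nan, nan, 33, 34, 90, nan, 5,
     nan, 22, 70, nan, 672, 10, 73, 9, nan, 15],
    columns=['values'])

n = len(df)
k = 5
r = n - k + 1
mat = np.tile([1] * k + [0] * r, r)[:-r].reshape(r, n)

# values has shape (n,) and mat has shape (r, n); np.where broadcasts
# values across the r rows and keeps 0 wherever the mask is 0.
values = df['values'].to_numpy()
filled = np.where(mat == 1, values, 0.0)
print(filled.shape)  # (16, 20)
```

Plain multiplication mat * values would broadcast as well, but 0 * NaN is NaN, so NaN entries would leak outside each window. The dot product mat @ values sums over each window instead and returns a vector of length r, which is likely why the attempt in the question did not give the desired matrix.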
Answer 1 (score: 1)
Here is one way, shown on a small 2x3 mask:
ary=np.array([[0,1,1],[1,0,1]])
s=df['values'].values
ary1=ary.ravel().copy().astype('float')
ary1[ary1==1]=np.tile(s,len(ary))[ary1==1]
ary1.reshape(len(ary),-1)
Out[446]:
array([[ 0., 1., 2.],
[nan, 0., 2.]])
Input data:
df
idx values
0 NaN
1 1
2 2
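The toy mask in this answer can presumably be swapped for the windowed matrix from the question. Below is a sketch of that, assuming df holds the full 20-value column and mat is built exactly as in the question; flat, mask and result are illustrative names.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [nan, 1, 2, nan, nan, 33, 34, 90, nan, 5,
     nan, 22, 70, nan, 672, 10, 73, 9, nan, 15],
    columns=['values'])

n = len(df)
k = 5
r = n - k + 1
mat = np.tile([1] * k + [0] * r, r)[:-r].reshape(r, n)

s = df['values'].to_numpy()

# Flatten the mask to 1-D floats (astype returns a new array),
# then overwrite every 1 with the value at the same column position.
flat = mat.ravel().astype(float)
mask = flat == 1
flat[mask] = np.tile(s, len(mat))[mask]

result = flat.reshape(len(mat), -1)
print(result.shape)  # (16, 20)
```

The mask is computed once before the assignment, so NaN values written into flat do not affect which positions are selected.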