Given the DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
how can I get something like this:
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN
If we treat the DataFrame as an array indexed by i, j, then diagonal n consists of the elements where abs(i - j) = n.
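To make that definition concrete, here is a minimal sketch that builds the abs(i - j) matrix for the 3x3 example and lists the values that fall on each diagonal. The names `dist` and `groups` are illustrative only and do not come from the original post.

```python
import numpy as np

# Values of the example DataFrame as a plain array (one row per index 0..2).
a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

size = len(a)
i, j = np.indices((size, size))
dist = np.abs(i - j)   # dist[i, j] == k means the element lies on diagonal k

# Collect the values of each diagonal in row-major order.
groups = {k: a[dist == k].tolist() for k in range(size)}
print(dist)
print(groups)
```

Row-major order yields 4, 2, 8, 6 for diagonal 1, whereas the desired output above lists the same values as 2, 4, 6, 8. The grouping is fixed by the definition; only the order within a group is open to choice.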
As a bonus, it would be nice to be able to choose the order:
intercale = True, first_diag = 'left'
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN
intercalate = False, first_diag = 'left'
     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN
intercalate = True, first_diag = 'right'
     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN
intercalate = False, first_diag = 'right'
     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
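There could even be a further degree of freedom for the ordering: going from the bottom corner to the top corner, or the reverse. Another option would be to use the other main diagonal.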
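One possible reading of "the other main diagonal" is to measure distance from the anti-diagonal instead of the main diagonal. The sketch below follows that assumption; `diagonals_by_distance` and its `anti` flag are hypothetical names introduced here, not part of the question or the answers.

```python
import numpy as np

def diagonals_by_distance(a, anti=False):
    """Group the values of a square array by distance from a main diagonal.

    anti=False uses |i - j| (distance from the main diagonal);
    anti=True uses |i + j - (n - 1)| (distance from the anti-diagonal).
    Values within a group are listed in row-major order.
    """
    a = np.asarray(a)
    n = len(a)
    i, j = np.indices((n, n))
    dist = np.abs(i + j - (n - 1)) if anti else np.abs(i - j)
    return [a[dist == k].tolist() for k in range(n)]

a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
print(diagonals_by_distance(a))             # grouped around the main diagonal
print(diagonals_by_distance(a, anti=True))  # grouped around the anti-diagonal
```

Because each group is listed top to bottom, reversing a group (for example `g[::-1]`) would be one way to get the bottom-to-top direction.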
My approach with pandas:
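# Long format: one row per cell, with the column label replaced by its position
df2 = df.reset_index().melt('index').assign(variable = lambda x: x.variable.factorize()[0])
# Distance of each cell from the main diagonal
df2['diag'] = df2['index'].sub(df2['variable']).abs()
# Number the cells within each diagonal, then pivot the diagonals into columns
new_df = (df2.assign(index = df2.groupby('diag').cumcount())
             .pivot_table(index = 'index',columns = 'diag',values = 'value'))
print(new_df)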
diag     0    1    2
index
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN
I would like to know whether there is a simpler way to do this, for example with NumPy.
Answer 0 (score: 7)
Approach #1: Here's one way using NumPy -
import numpy as np
import pandas as pd
def diagonalize(a): # input is array and output is df
    n = len(a)
    r = np.arange(n)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    split_idx = lens.cumsum()
    # A stable sort keeps row-major order within each diagonal for any n
    b = a.flat[idx.ravel().argsort(kind='stable')]
    v = np.split(b,split_idx[:-1])
    return pd.DataFrame(v).T
Sample run -
In [110]: df
Out[110]:
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
2     9    10    11    12
3    13    14    15    16
In [111]: diagonalize(df.to_numpy(copy=False))
Out[111]:
      0     1     2     3
0   1.0   2.0   3.0   4.0
1   6.0   5.0   8.0  13.0
2  11.0   7.0   9.0   NaN
3  16.0  10.0  14.0   NaN
4   NaN  12.0   NaN   NaN
5   NaN  15.0   NaN   NaN
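The `lens` and `split_idx` lines carry most of the bookkeeping in Approach #1, so here is a small sketch of what they hold for n = 4. It only spells out intermediate values of the code above; the `chunks` variable is introduced here for illustration.

```python
import numpy as np

n = 4
# Diagonal 0 has n elements; diagonal k (k >= 1) has 2*(n - k) elements,
# because it appears once above and once below the main diagonal.
lens = np.r_[n, np.arange(2 * n - 2, 0, -2)]
split_idx = lens.cumsum()

print(lens.tolist())        # [4, 6, 4, 2]
print(split_idx.tolist())   # [4, 10, 14, 16]

# Splitting a flat vector of n*n items at split_idx[:-1]
# yields one chunk per diagonal.
chunks = np.split(np.arange(n * n), split_idx[:-1])
print([len(c) for c in chunks])   # [4, 6, 4, 2]
```

The last cumulative value equals n*n, which is why only `split_idx[:-1]` is passed to `np.split`.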
Approach #2: Similar to the previous one, but entirely NumPy-based and without loops -
def diagonalize_v2(a): # input, outputs are arrays
    # Setup params
    n = len(a)
    r = np.arange(n)
    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    # Values in the order of "diagonalization"
    # (stable sort keeps row-major order within each diagonal)
    b = a.flat[idx.ravel().argsort(kind='stable')]
    # Get a mask for the final o/p where elements are to be assigned
    mask = np.arange(lens.max())[:,None]<lens
    # Setup o/p and assign
    out = np.full(mask.shape,np.nan)
    out.T[mask.T] = b
    return out
Sample run -
In [2]: a
Out[2]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])
In [3]: diagonalize_v2(a)
Out[3]:
array([[ 1.,  2.,  3.,  4.],
       [ 6.,  5.,  8., 13.],
       [11.,  7.,  9., nan],
       [16., 10., 14., nan],
       [nan, 12., nan, nan],
       [nan, 15., nan, nan]])
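The masked assignment through the transpose is the least obvious step in Approach #2. The following sketch isolates it with made-up values for `b` (the integers 1 to 9) and the column lengths that n = 3 would produce.

```python
import numpy as np

lens = np.array([3, 4, 2])          # per-column lengths for n = 3
b = np.arange(1, lens.sum() + 1)    # 9 placeholder values, already in column order

# mask[r, c] is True where row r lies within the valid length of column c.
mask = np.arange(lens.max())[:, None] < lens

out = np.full(mask.shape, np.nan)
# Boolean assignment fills positions in row-major order; going through the
# transposed view makes that order run down each column of `out` instead.
out.T[mask.T] = b
print(out)
```

Since `out.T` is a view, the assignment writes into `out` itself: column 0 receives 1, 2, 3, column 1 receives 4 to 7, column 2 receives 8, 9, and the remaining cells stay NaN.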
Generic version: to manage the order, we add two extra input arguments. The solution is mainly inspired by Approach #1 -
def diagonalize_generic(a, intercale = True ,first_diag = 'left'):
    # Setup params
    n = len(a)
    r = np.arange(n)
    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    # Tie-breaker within each diagonal: entries where w is 0 are taken first
    if first_diag=='left':
        w = np.triu(np.ones((n,n), dtype=int))
    elif first_diag=='right':
        w = np.tril(np.ones((n,n), dtype=int))
    else:
        raise ValueError('Wrong first_diag value!')
    # lexsort uses the last key as primary: sort by idx, break ties with w
    order = np.lexsort(np.c_[w.ravel(),idx.ravel()].T)
    split_idx = lens.cumsum()
    o_split = np.split(order,split_idx[:-1])
    f = a.flat
    if intercale:
        # Interleave the two halves of each off-diagonal group
        v = [f[o_split[0]]] + [f[o.reshape(2,-1).ravel('F')] for o in o_split[1:]]
    else:
        v = [f[o] for o in o_split]
    return pd.DataFrame(v).T
Sample runs
Input as array:
In [53]: a
Out[53]:
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
Different scenarios:
In [54]: diagonalize_generic(a, intercale = True, first_diag = 'left')
Out[54]:
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN
In [55]: diagonalize_generic(a, intercale = False, first_diag = 'left')
Out[55]:
     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN
In [56]: diagonalize_generic(a, intercale = True, first_diag = 'right')
Out[56]:
     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN
In [57]: diagonalize_generic(a, intercale = False, first_diag = 'right')
Out[57]:
     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
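To see where the two orderings of diagonal 1 in the `first_diag = 'left'` runs come from, the sketch below traces just that diagonal for the same 3x3 input. It reuses the expressions from `diagonalize_generic`; the `diag1` slice is introduced here for illustration.

```python
import numpy as np

a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
n = len(a)
r = np.arange(n)
idx = np.abs(r[:, None] - r)               # distance from the main diagonal
w = np.triu(np.ones((n, n), dtype=int))    # 0 below the diagonal, 1 on or above it

# lexsort treats the LAST key as primary: sort by idx, break ties with w.
order = np.lexsort((w.ravel(), idx.ravel()))
diag1 = order[n:n + 2 * (n - 1)]           # flat positions belonging to diagonal 1

lower_then_upper = a.flat[diag1]
interleaved = a.flat[diag1.reshape(2, -1).ravel('F')]
print(lower_then_upper.tolist())   # [2, 6, 4, 8]
print(interleaved.tolist())        # [2, 4, 6, 8]
```

Reshaping to two rows puts the lower half in row 0 and the upper half in row 1; reading that in column-major ('F') order alternates between them, which is the `intercale = True` layout. Swapping `np.triu` for `np.tril` flips which half comes first.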