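How can I shift the existing values to the left, i.e. move the values in each row leftward across the columns so that all NaN values end up on the right?
The desired result looks like this: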
id,level_1__value,level_2__value,level_3__value,last_not_null
1,1,nan,nan,2
2,5,nan,nan,5
3,3,5,nan,6
4,7,2,2
5,3,nan,3
...
Below is the code that defines the data frame above:
import pandas as pd
import numpy as np
from numpy import nan
df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'level_1__value': {0: 1.0, 1: nan, 2: 3.0, 3: 4.0, 4: 5.0, 5: nan, 6: 7.0, 7: nan, 8: 34.0}, 'level_2__value': {0: nan, 1: nan, 2: 5.0, 3: 7.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan}, 'level_3__value': {0: nan, 1: 5.0, 2: nan, 3: 2.0, 4: 3.0, 5: nan, 6: nan, 7: 6.0, 8: nan}, 'last_not_null': {0: 2.0, 1: 5.0, 2: 6.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 10.0, 7: 6.0, 8: 34.0}})
display(df)
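Answer 0 (score: 1)
You can use a custom function with Series.dropna, convert the result to a NumPy array, and add Series.reindex to handle the edge case: if every row contains at least one NaN, the output would otherwise have fewer columns than the input: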
c = ['level_1__value', 'level_2__value', 'level_3__value']
f = lambda x: pd.Series(x.dropna().to_numpy()).reindex(range(len(c)))
df[c] = df[c].apply(f, axis=1)
print (df)
id level_1__value level_2__value level_3__value last_not_null
0 1 1.0 NaN NaN 2.0
1 2 5.0 NaN NaN 5.0
2 3 3.0 5.0 NaN 6.0
3 4 4.0 7.0 2.0 2.0
4 5 5.0 3.0 NaN 3.0
5 6 NaN NaN NaN 3.0
6 7 7.0 NaN NaN 10.0
7 8 6.0 NaN NaN 6.0
8 9 34.0 NaN NaN 34.0
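To see why the reindex step matters, here is a minimal sketch on a small hypothetical frame (the column names `a`, `b`, `c` and the values are made up for illustration, not taken from the question). Every row contains at least one NaN, so without padding the result comes back with fewer columns than the input:

```python
import numpy as np
import pandas as pd

cols = ["a", "b", "c"]  # hypothetical column names
demo = pd.DataFrame(
    {"a": [np.nan, np.nan], "b": [1.0, np.nan], "c": [2.0, 3.0]}
)

# Without reindex: rows compact to lengths 2 and 1, so only 2 columns remain.
unpadded = demo[cols].apply(
    lambda row: pd.Series(row.dropna().to_numpy()), axis=1
)

# With reindex: every row is padded back to len(cols) with NaN.
padded = demo[cols].apply(
    lambda row: pd.Series(row.dropna().to_numpy()).reindex(range(len(cols))),
    axis=1,
)

print(unpadded.shape)  # (2, 2)
print(padded.shape)    # (2, 3)
```

Assigning `unpadded` back to three columns would fail because the shapes differ, which is the mismatch the reindex call guards against.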
If performance matters, use Divakar's justify function:
#https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as missing and used to fill the vacated positions
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """
    # Mark the valid (non-missing) entries
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting a boolean mask pushes True to the end; flip it for 'left'/'up'
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
c = ['level_1__value', 'level_2__value', 'level_3__value']
df[c] = justify(df[c].to_numpy(),invalid_val=np.nan )
print (df)
id level_1__value level_2__value level_3__value last_not_null
0 1 1.0 NaN NaN 2.0
1 2 5.0 NaN NaN 5.0
2 3 3.0 5.0 NaN 6.0
3 4 4.0 7.0 2.0 2.0
4 5 5.0 3.0 NaN 3.0
5 6 NaN NaN NaN 3.0
6 7 7.0 NaN NaN 10.0
7 8 6.0 NaN NaN 6.0
8 9 34.0 NaN NaN 34.0
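For the left-shift case only, a more compact NumPy variant is possible. The sketch below is not part of the answer above; it swaps in a different technique (a stable argsort on the NaN mask), and the name `shift_left` and the sample values are hypothetical:

```python
import numpy as np

def shift_left(a):
    """Move non-NaN values to the front of each row, keeping their order."""
    a = np.asarray(a, dtype=float)
    # False (valid) sorts before True (NaN); a stable sort preserves the
    # original left-to-right order among the valid values.
    order = np.argsort(np.isnan(a), axis=1, kind="stable")
    return np.take_along_axis(a, order, axis=1)

values = np.array([
    [np.nan, np.nan, 5.0],
    [3.0, 5.0, np.nan],
    [5.0, np.nan, 3.0],
    [np.nan, np.nan, np.nan],
])
result = shift_left(values)
print(result)
```

Applied to the frame from the question, something like `df[c] = shift_left(df[c].to_numpy())` should work, with `c` being the column list defined in the answer. Unlike `justify`, this sketch handles only NaN as the missing value and only left justification.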