Left-shift all columns in a pandas DataFrame

Time: 2020-10-21 10:14:10

Tags: python pandas

I have a DataFrame in the following format (the original image is not available; the code defining it is given below):

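How can I shift the existing values to the left, i.e. move the values in each row leftward across the columns so that all NaN values end up on the right?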

The desired result looks something like:

id,level_1__value,level_2__value,level_3__value,last_not_null
1,1,nan,nan,2
2,5,nan,nan,5
3,3,5,nan,6
4,4,7,2,2
5,5,3,nan,3
...

Below is the code that defines the DataFrame above:

import pandas as pd
import numpy as np
from numpy import nan

df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'level_1__value': {0: 1.0, 1: nan, 2: 3.0, 3: 4.0, 4: 5.0, 5: nan, 6: 7.0, 7: nan, 8: 34.0}, 'level_2__value': {0: nan, 1: nan, 2: 5.0, 3: 7.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan}, 'level_3__value': {0: nan, 1: 5.0, 2: nan, 3: 2.0, 4: 3.0, 5: nan, 6: nan, 7: 6.0, 8: nan}, 'last_not_null': {0: 2.0, 1: 5.0, 2: 6.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 10.0, 7: 6.0, 8: 34.0}})
display(df)

1 answer:

Answer 0 (score: 1)

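You can use a custom function with Series.dropna, convert the result to a NumPy array, and add Series.reindex for the edge case: if every row has at least one NaN, the output would otherwise have fewer columns than the input: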

c = ['level_1__value', 'level_2__value', 'level_3__value']
f = lambda x: pd.Series(x.dropna().to_numpy()).reindex(range(len(c)))
df[c] = df[c].apply(f, axis=1)
print (df)
   id  level_1__value  level_2__value  level_3__value  last_not_null
0   1             1.0             NaN             NaN            2.0
1   2             5.0             NaN             NaN            5.0
2   3             3.0             5.0             NaN            6.0
3   4             4.0             7.0             2.0            2.0
4   5             5.0             3.0             NaN            3.0
5   6             NaN             NaN             NaN            3.0
6   7             7.0             NaN             NaN           10.0
7   8             6.0             NaN             NaN            6.0
8   9            34.0             NaN             NaN           34.0
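
The edge case that Series.reindex guards against can be reproduced on a small frame. The following is only a sketch: the frame `toy` and its column names are made up for illustration and are not part of the question's data. Every row contains at least one NaN, so without reindex the result has fewer columns than the input.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not from the question): every row has at least one NaN.
toy = pd.DataFrame({"a": [1.0, np.nan],
                    "b": [np.nan, 2.0],
                    "c": [np.nan, 3.0]})
width = toy.shape[1]

# Without reindex: the longest row keeps only 2 values, so only 2 columns come back.
without_reindex = toy.apply(lambda row: pd.Series(row.dropna().to_numpy()), axis=1)

# With reindex: every row is padded back to `width` positions with NaN.
with_reindex = toy.apply(
    lambda row: pd.Series(row.dropna().to_numpy()).reindex(range(width)), axis=1
)

print(without_reindex.shape)
print(with_reindex.shape)
```

Assigning `without_reindex` back to three columns would fail on the length mismatch, which is why the padded version is the one used above.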

If performance matters, use Divakar's justify function:

#https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as missing and used to fill the vacated positions
        (pass np.nan to treat NaN as missing)
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

c = ['level_1__value', 'level_2__value', 'level_3__value']
df[c] = justify(df[c].to_numpy(),invalid_val=np.nan )
print (df)
   id  level_1__value  level_2__value  level_3__value  last_not_null
0   1             1.0             NaN             NaN            2.0
1   2             5.0             NaN             NaN            5.0
2   3             3.0             5.0             NaN            6.0
3   4             4.0             7.0             2.0            2.0
4   5             5.0             3.0             NaN            3.0
5   6             NaN             NaN             NaN            3.0
6   7             7.0             NaN             NaN           10.0
7   8             6.0             NaN             NaN            6.0
8   9            34.0             NaN             NaN           34.0
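
To see why sorting the mask works, the steps that justify takes for axis=1 and side='left' can be traced by hand on a small array. This is a sketch on a made-up 2x3 array `a`, not on the question's data, and it repeats only the NaN branch of the function.

```python
import numpy as np

# Hypothetical 2x3 input used only to trace the steps.
a = np.array([[np.nan, 5.0, np.nan],
              [4.0, np.nan, 2.0]])

# True where a real value is present.
mask = ~np.isnan(a)

# Sorting booleans along each row puts False first and True last;
# flipping the row then moves the True slots to the left.
justified_mask = np.flip(np.sort(mask, axis=1), axis=1)

# Start from an all-NaN array and fill the left-most slots of each row.
out = np.full(a.shape, np.nan)

# Boolean indexing reads and writes in row-major order, and each row has the
# same number of True entries in both masks, so every value stays in its own
# row and keeps its original left-to-right order.
out[justified_mask] = a[mask]

print(out)
```

Because everything here is whole-array NumPy work, no Python-level function is called per row, which is where the speed advantage over apply(..., axis=1) comes from.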