How to collapse columns in pandas on null values?

Posted: 2019-06-13 14:49:54

Tags: python pandas numpy concatenation collapse

Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})


    col1    override1   override2
0     a        b          c
1     a       NaN        NaN
2     NaN      b         NaN
3     NaN     NaN         c
4     NaN     NaN         NaN

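Is it possible to collapse the 3 columns into a single column, where override2 overrides override1, which in turn overrides col1, but where a NaN keeps the value from the column before it? I'm also mainly looking for a way that does not require adding extra columns, ideally a built-in pandas solution.

Here is the output I'm looking for:

  collapsed
0         c
1         a
2         b
3         c
4       NaN
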

6 answers:

Answer 0 (score: 4)

A simple solution is to forward-fill along each row and select the last column. This was mentioned in the comments.

df.ffill(axis=1).iloc[:, -1].to_frame(name='collapsed')

  collapsed
0         c
1         a
2         b
3         c
4       NaN
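To see why taking the last column works: forward-filling along `axis=1` carries each row's most recent non-null value to the right, so the last column ends up holding the rightmost (highest-priority) value that exists. A short sketch on the question's data that prints the intermediate frame; the names `filled` and `collapsed` are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Each NaN is replaced by the nearest non-null value to its left in the same row.
filled = df.ffill(axis=1)
print(filled)

# The last column now holds the rightmost non-null value of each row.
collapsed = filled.iloc[:, -1].to_frame(name='collapsed')
print(collapsed)
```

Row 1 (`a`, NaN, NaN) becomes `a`, `a`, `a`, while row 4 has nothing to carry forward and stays all NaN.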

If you care about performance, we can use a modified version of Divakar's justify function:

pd.DataFrame({'collapsed': justify(
    df.values, invalid_val=np.nan, axis=1, side='right')[:,-1]
})

  collapsed
0         c
1         a
2         b
3         c
4       NaN

Reference.

import numpy as np
import pandas as pd

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as empty; np.nan is detected with pd.notna
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notna(a)   # modified for strings: pd.notna handles object arrays
    else:
        mask = a != invalid_val
    # False sorts before True, so valid entries are pushed to the right/bottom
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # Keep the input dtype so an object (string) array can hold values and NaN
    out = np.full(a.shape, invalid_val, dtype=a.dtype)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
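The key step in `justify` is `np.sort` on a boolean mask: `False` sorts before `True`, so each row's valid positions are pushed to the right end, and boolean-mask assignment then fills them in row-major order. A stand-alone sketch of just that step on the question's data (it does not call `justify` itself; `out` is started as an object array of NaN because the values are strings):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

a = df.to_numpy()               # object array holding strings and NaN
mask = pd.notna(a)              # True where a value is present

# Sorting each row's booleans moves its True entries to the right end.
justified_mask = np.sort(mask, axis=1)

# Boolean indexing walks both masks row by row, so each row keeps its own
# values, in their original order, in the right-justified slots.
out = np.full(a.shape, np.nan, dtype=object)
out[justified_mask] = a[mask]

print(justified_mask)
print(out[:, -1])               # last column = rightmost non-null per row
```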

Answer 1 (score: 4)

Not about performance, but about aesthetics and elegance (-:

df.stack().groupby(level=0).last().reindex(df.index)

0      c
1      a
2      b
3      c
4    NaN
dtype: object
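Broken into steps: `stack()` turns the frame into a long Series indexed by (row, column), `groupby(level=0).last()` takes the last non-null value per row, and `reindex` restores any row that is missing from the grouped result. A sketch that prints the intermediate Series. Assumption: with the legacy `stack` (the default in the pandas versions this answer targets) NaN entries are dropped, so all-NaN rows disappear until `reindex`; newer `stack` implementations keep NaN, but `last()` skips NaN either way, so the final result should be the same.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Long format: one entry per (row label, column name) pair.
stacked = df.stack()
print(stacked)

# Last non-null value per original row label.
per_row = stacked.groupby(level=0).last()

# reindex brings back rows (like row 4) that the legacy stack dropped, as NaN.
collapsed = per_row.reindex(df.index)
print(collapsed)
```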

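Answer 2 (score: 3)

Here is one way:

# DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
# so this line needs pandas < 2.0
df.lookup(df.index , df.notna().cumsum(1).idxmax(1))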
# array(['c', 'a', 'b', 'c', nan], dtype=object)

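Or, equivalently, index the underlying NumPy array and replace idxmax with ndarray.argmax:

# df.index works as row positions here only because it is the default RangeIndex
df.values[df.index, df.notna().cumsum(1).values.argmax(1)]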
# array(['c', 'a', 'b', 'c', nan], dtype=object)
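Since `DataFrame.lookup` no longer exists in pandas 2.x, here is a sketch of the first variant without it. It is an editorial substitute, not part of the original answer: the `idxmax` column labels are converted to integer positions with `Index.get_indexer`, and the mask is cast to `int` before `cumsum` to make the counting explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Running count of non-null values per row; its first maximum is at the
# column holding the row's last non-null value (an all-NaN row gives col1).
last_col = df.notna().astype(int).cumsum(axis=1).idxmax(axis=1)

# Translate column labels to integer positions and index the NumPy array.
col_pos = df.columns.get_indexer(last_col)
result = df.to_numpy()[np.arange(len(df)), col_pos]
print(result)
```

Using `np.arange(len(df))` for the rows also avoids relying on a default RangeIndex.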

Answer 3 (score: 3)

Focusing on performance, here is a NumPy one:

In [106]: idx = df.shape[1] - 1 - df.notnull().to_numpy()[:,::-1].argmax(1)

In [107]: pd.Series(df.to_numpy()[np.arange(len(df)),idx])
Out[107]: 
0      c
1      a
2      b
3      c
4    NaN
dtype: object
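The index arithmetic works because `argmax` on a boolean array returns the position of the first `True`. Reversing the columns with `[:, ::-1]` turns that into the last non-null column, and `n_cols - 1 - pos` maps the reversed position back. A small sketch on the question's data (`n_cols` and `pos_in_reversed` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

mask = df.notnull().to_numpy()       # True where a value is present
n_cols = mask.shape[1]

# First True in the reversed row = last True in the original row.
pos_in_reversed = mask[:, ::-1].argmax(axis=1)
idx = n_cols - 1 - pos_in_reversed
print(idx)

# An all-False row gives argmax 0, i.e. idx = n_cols - 1; that cell is NaN
# anyway, so the row still collapses to NaN.
values = df.to_numpy()[np.arange(len(df)), idx]
print(values)
```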

Answer 4 (score: 3)

Use ffill:

df.ffill(axis=1).iloc[:, -1]
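For comparison, the same priority rule (override2 over override1 over col1) can be written with chained `Series.combine_first` calls, which keep the caller's non-null values and fill its NaN from the argument. This is an alternative technique, not part of the original answer; the name `collapsed` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Highest-priority column first; each combine_first only fills remaining NaN.
collapsed = (df['override2']
             .combine_first(df['override1'])
             .combine_first(df['col1'])
             .rename('collapsed'))
print(collapsed)
```

Unlike `ffill`, this names the priority order explicitly, so it does not depend on the column order in the frame.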

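Answer 5 (score: 1)

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':      ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

print(df)
# Note: this concatenates every non-null value in the row instead of letting
# later columns override earlier ones (row 0 becomes 'abc', row 4 becomes '')
df = df['col1'].fillna('') + df['override1'].fillna('') + df['override2'].fillna('')
print(df)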
