Suppose I have the following DataFrame:
pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
'override1': ["b", np.nan, "b", np.nan, np.nan],
'override2': ["c", np.nan, np.nan, "c", np.nan]})
  col1 override1 override2
0    a         b         c
1    a       NaN       NaN
2  NaN         b       NaN
3  NaN       NaN         c
4  NaN       NaN       NaN
Is it possible to collapse the 3 columns into a single column, where override2 overrides override1, which in turn overrides col1, but where a NaN does not override anything, so the value before it is kept? Also, I am mainly looking for a way that does not require creating an additional column. I am really looking for a built-in pandas solution.
This is the output I am looking for:
  collapsed
0         c
1         a
2         b
3         c
4       NaN
Answer 0 (score: 4)
A simple solution is to forward-fill along the rows and then pick the last column. This was mentioned in the comments.
df.ffill(axis=1).iloc[:, -1].to_frame(name='collapsed')
  collapsed
0         c
1         a
2         b
3         c
4       NaN
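To see why forward-filling works, it can help to look at the intermediate frame. The following is a minimal sketch that rebuilds the question's `df`; the names `filled` and `collapsed` are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Forward-fill left to right: each NaN takes the nearest non-NaN value to its left.
filled = df.ffill(axis=1)
print(filled)

# After filling, the last column holds the right-most non-NaN value of each row.
# A row that is entirely NaN stays NaN.
collapsed = filled.iloc[:, -1].rename('collapsed')
print(collapsed)
```

Because the fill runs from left to right, the column order encodes the priority: the right-most column wins whenever it has a value.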
If you are interested in performance, we can use a modified version of Divakar's justify function:
pd.DataFrame({'collapsed': justify(
    df.values, invalid_val=np.nan, axis=1, side='right')[:, -1]
})
  collapsed
0         c
1         a
2         b
3         c
4       NaN
def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as missing and used as padding
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """
    if invalid_val is np.nan:
        mask = pd.notna(a)  # modified for strings
    else:
        mask = a != invalid_val
    # Sorting a boolean mask pushes the True entries to the end of each row/column
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # Keep the input dtype so that object (string) arrays can be filled
    out = np.full(a.shape, invalid_val, dtype=a.dtype)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
Answer 1 (score: 4)
Not the most performant, but neat and elegant (-:
df.stack().groupby(level=0).last().reindex(df.index)
0      c
1      a
2      b
3      c
4    NaN
dtype: object
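The chain above can be read one step at a time. The sketch below is only an illustration; the intermediate names `stacked`, `last_valid` and `collapsed` are assumptions introduced here, and whether `stack()` drops NaN cells depends on the pandas version in use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Long format: a Series indexed by (row label, column name).
# Depending on the pandas version, NaN cells may or may not be dropped here.
stacked = df.stack()

# last() picks the last non-null value within each original row,
# so the right-most available column wins.
last_valid = stacked.groupby(level=0).last()

# reindex guarantees one entry per original row; a row that vanished
# because it was entirely NaN comes back as NaN.
collapsed = last_valid.reindex(df.index)
print(collapsed)
```

The final `reindex` is what keeps row 4 in the result when every one of its cells is missing.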
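Answer 2 (score: 3)
Here is one approach:
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
df.lookup(df.index, df.notna().cumsum(1).idxmax(1))
# array(['c', 'a', 'b', 'c', nan], dtype=object)
Or, equivalently, work with the underlying numpy arrays and replace idxmax with ndarray.argmax:
df.values[df.index, df.notna().cumsum(1).values.argmax(1)]
# array(['c', 'a', 'b', 'c', nan], dtype=object)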
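The cumulative-count trick can be unpacked as follows. This is a sketch under the assumption that a plain positional row index (`np.arange(len(df))`) is acceptable in place of `df.index`; the names `valid`, `running`, `col_idx` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# True where a cell holds a value.
valid = df.notna().to_numpy()

# Running count of valid cells per row; it reaches its maximum
# exactly at the last valid column.
running = valid.cumsum(axis=1)

# argmax returns the first position of the maximum, i.e. the last valid column.
# For an all-NaN row the count is 0 everywhere, so column 0 (a NaN) is chosen.
col_idx = running.argmax(axis=1)

# Pick one cell per row with paired row/column positions.
result = df.to_numpy()[np.arange(len(df)), col_idx]
print(result)
```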
Answer 3 (score: 3)
With a focus on performance, here is one with NumPy:
In [106]: idx = df.shape[1] - 1 - df.notnull().to_numpy()[:,::-1].argmax(1)
In [107]: pd.Series(df.to_numpy()[np.arange(len(df)),idx])
Out[107]:
0      c
1      a
2      b
3      c
4    NaN
dtype: object
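The index arithmetic in that one-liner is easier to follow when split up. The sketch below is illustrative only; `valid`, `first_from_right`, `idx` and `collapsed` are names introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

valid = df.notnull().to_numpy()

# Reverse the column order so that argmax finds the first True
# counted from the right-hand side.
first_from_right = valid[:, ::-1].argmax(axis=1)

# Convert the reversed position back to a normal column position.
# An all-NaN row yields 0 above, so it maps to the last column (still NaN).
idx = df.shape[1] - 1 - first_from_right

collapsed = pd.Series(df.to_numpy()[np.arange(len(df)), idx])
print(collapsed)
```

Compared with forward-filling, this reads each cell once and never builds a filled copy of the frame, which is a plausible reason for it being faster on large inputs.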
Answer 4 (score: 3)
Using ffill:
df.ffill(axis=1).iloc[:, -1]
Answer 5 (score: 1)
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})
print(df)
# Replace NaN with '' in each column, then concatenate the columns row by row
result = df['col1'].fillna('') + df['override1'].fillna('') + df['override2'].fillna('')
print(result)
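Concatenating the columns joins every non-missing value in a row, so row 0 comes out as 'abc' rather than 'c', and an all-NaN row becomes an empty string. A column-wise sketch that keeps the override priority, swapping in chained `Series.fillna` instead of string concatenation, might look like this; the name `collapsed` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["a", "a", np.nan, np.nan, np.nan],
                   'override1': ["b", np.nan, "b", np.nan, np.nan],
                   'override2': ["c", np.nan, np.nan, "c", np.nan]})

# Start from the highest-priority column and fall back to the
# lower-priority ones only where a value is still missing.
collapsed = (df['override2']
             .fillna(df['override1'])
             .fillna(df['col1'])
             .rename('collapsed'))
print(collapsed)
```

This spells out the priority explicitly, which can be convenient when only a few named columns are involved; for many columns the row-wise forward fill is shorter.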