Question

If I have:

    col1       col2
0   1          np.nan
1   2          np.nan
2   np.nan     3
4   np.nan     4

How would I efficiently get to:

    col1       col2     col3
0   1          np.nan   1
1   2          np.nan   2
2   np.nan     3        3
4   np.nan     4        4

My current solution is:

import numpy as np
import pandas as pd

test = pd.Series([1, 2, np.nan, np.nan])

test2 = pd.Series([np.nan, np.nan, 3, 4])

temp_df = pd.concat([test, test2], axis=1)

init_cols = list(temp_df.columns)

temp_df['test3'] = ""

# Copy each column's non-missing values into test3
for col in init_cols:
    mask = temp_df[col].notna()
    temp_df.loc[mask, 'test3'] = temp_df.loc[mask, col]

Ideally, I would like to avoid using a loop.

Answer 1

It depends on what you want to happen when both columns have a non-null value in the same row.

Take col1 first, then fill the gaps with col2

df['col3'] = df.col1.fillna(df.col2)

Take col2 first, then fill the gaps with col1

df['col3'] = df.col2.fillna(df.col1)
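
The two fill orders only differ on rows where both columns hold a value. Here is a minimal sketch, assuming a hypothetical frame with one overlapping row (the question's data has none). `combine_first` is shown as another pandas spelling that should give the same result as `fillna` with a Series argument when the indexes match:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a value in both columns.
df = pd.DataFrame({
    "col1": [1, 2, np.nan, np.nan],
    "col2": [np.nan, 20, 3, 4],
})

# col1 wins on the overlapping row, col2 only fills the gaps.
col1_first = df["col1"].fillna(df["col2"])

# col2 wins on the overlapping row, col1 only fills the gaps.
col2_first = df["col2"].fillna(df["col1"])

# Alternative spelling of "col1 first, then col2".
same = df["col1"].combine_first(df["col2"])

print(col1_first.tolist())
print(col2_first.tolist())
```

On the question's data, where no row has both values, both orders produce the same col3.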

Average the overlap

df['col3'] = df[['col1', 'col2']].mean(axis=1)

Sum the overlap

df['col3'] = df[['col1', 'col2']].sum(axis=1)
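
For the averaging and summing options, here is a sketch on hypothetical data with one overlapping row and one row that is missing in both columns. The output column names (`avg`, `total`, `total_nan`) are made up for illustration. Selecting the two source columns explicitly keeps any previously added result column out of the calculation, and `min_count=1` is an optional choice that leaves an all-missing row as NaN instead of 0:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 overlaps, row 3 is missing in both columns.
df = pd.DataFrame({
    "col1": [1, 2, np.nan, np.nan],
    "col2": [np.nan, 20, 3, np.nan],
})
src = df[["col1", "col2"]]

# Row-wise mean; NaN values are skipped by default.
df["avg"] = src.mean(axis=1)

# Row-wise sum; a row with no values at all sums to 0.
df["total"] = src.sum(axis=1)

# Require at least one real value, otherwise the result is NaN.
df["total_nan"] = src.sum(axis=1, min_count=1)

print(df)
```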
